Mentawais

March 22nd, 2012

This year I was finally lucky enough to fulfill one of my surfing dreams – a boat trip to the Mentawai Islands. We were on board the “Naga Laut”, and had nine awesome days surfing Nipussi, E-Bay, Telescopes, Macaronis, HT’s, and Bang Bangs – often scoring the break to ourselves. A few pictures of the trip are below:

Outrigger canoes at Padang

Local Padang kids surfing in filthy water after a storm

Sleeping quarters aboard the Naga Laut

Board Rack

Perfect Tropical Island

Accommodation near Playgrounds

Me on a nice right-hander at Nipussi

Local fishermen at Telescopes

My temporary island home in Playgrounds after being caught in a storm

Pro surfers at E-Bay

Future Indonesian champion at HT's

Me on another backhand disaster at Telescopes

A slightly better attempt at Telescopes

Sunset at Playgrounds

Another sunset


Validating image uploads with PHP

December 8th, 2011

I was asked by a colleague recently how I ensure that a file uploaded to a web server is of a particular type. For example, if I have an image gallery, how do I ensure that only valid images are uploaded, and not (as happens more regularly than you’d hope) an image embedded in an MS Word document.

There are a number of naive methods for achieving this, the most common of which are “check the MIME type reported by the browser when uploading”, or the really dumb “check the file extension”. Honestly – do you really think a malicious user wouldn’t think of changing a file extension?

I prefer to use the Linux file command. It identifies files based on their magic number, and can easily distinguish images from malicious executables. Here’s some simple example code:

	function validImage($filename)
	{
		// Check that the requested file exists
		if (!file_exists($filename) || is_dir($filename))
			throw new Exception('Specified file does not exist');

		// Escape the filename so that it cannot inject shell commands
		exec('file -bi ' . escapeshellarg($filename) . ' 2> /dev/null', $output, $returnvalue);

		// A MIME type beginning with "image/" indicates an image file
		if (((int) $returnvalue === 0) && (count($output) > 0))
			return preg_match('/^image\//i', implode(' ', $output)) === 1;

		return false;
	}

The above function attempts to identify a given file, and determines its MIME type. If the MIME string begins with “image/”, then we can assume that the file is indeed an image file. Note that this does not necessarily mean that the file is valid or uncorrupted.

There are a few provisos to be aware of before using this technique:
a) You must be on a Linux-based server (duh)
b) Your PHP installation must allow for calls to exec

This method can be extended to identify any type of file uploaded. It’s also not necessary to use the MIME type reported by file – if you know what you’re looking for you can use the normal output of the command.

If you’re specifically looking to identify images, an alternative is to use ImageMagick’s identify command. This can be slightly more problematic in that ImageMagick will report on PDF documents and video files too – but by carefully inspecting the output you might be able to extract the file type and other useful information (e.g. image dimensions) in one command.
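To illustrate what “identifying a file by its magic number” means, here is a minimal Python sketch. It is not how the file command is implemented: the signature table is a tiny, hand-picked subset, and the function names (sniff_image_type, is_image_file) are made up for this example.

```python
# A few well-known image signatures (deliberately incomplete; the real
# `file` command consults a much larger magic database).
IMAGE_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_image_type(data):
    """Return a MIME type if the leading bytes match a known signature, else None."""
    for signature, mime in IMAGE_SIGNATURES:
        if data.startswith(signature):
            return mime
    return None


def is_image_file(path):
    """Read only the first few bytes of the file and check them."""
    with open(path, "rb") as handle:
        return sniff_image_type(handle.read(16)) is not None
```

As with the PHP function, a matching signature only says that the file starts like an image, not that the rest of it is valid.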


Javascript snippets

October 20th, 2011

Today I had to convert a whole bunch of CSS colours from RGB(x,y,z) to HTML hexadecimal format. There are loads of these colour format converters on the web, and they all seem to do the same thing: input the R, G, and B components into separate boxes, press a button and you’ll get back an HTML colour string.

There’s a problem with all of these though – you have to copy and paste each RGB component into a separate box. If you have lots to do, this will take ages. Why do none of these converters allow you to paste in the whole RGB triplet at once?

In frustration, I wrote this little Javascript function to do the job:

function rgbToHex(rgb) {
	var result = '#',
	    format = function(n) {
	    	// Parse as a decimal number and zero-pad single hex digits
	    	var value = parseInt(n, 10);
	    	result += ((value < 16) ? '0' : '') + value.toString(16);
	    };

	rgb.split(',').forEach(format);
	return result;
}

Not particularly complicated, and it works a treat.

Oh yeah… This requires Javascript 1.6, which means a decent browser. IE < 9 need not apply – sorry.
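If you’d rather batch-convert colours outside the browser, the same idea can be sketched in Python. This version is an assumption-laden variant rather than a port: it accepts the whole “rgb(x, y, z)” string as well as a bare “x,y,z” triplet, and only copes with plain integer components. The name rgb_to_hex is made up for this example.

```python
import re


def rgb_to_hex(text):
    """Convert 'rgb(255, 0, 128)' or '255,0,128' to '#ff0080'."""
    # Pull out every run of digits; anything else (letters, brackets) is ignored
    components = [int(n) for n in re.findall(r"\d+", text)]
    if len(components) != 3 or any(c > 255 for c in components):
        raise ValueError("expected three components in the range 0-255: %r" % text)
    # Format each component as two lowercase hex digits
    return "#" + "".join("%02x" % c for c in components)
```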


Preventing Googlebot from indexing your website

September 21st, 2011

I know – this seems like the opposite of what most people want from Google – but there’s a good reason for it on occasion:

Recently one of the sites we were building was indexed by Google during the development phase. Somehow Googlebot managed to find a way onto the development server, crawled an almost complete e-commerce site, and proudly displayed those results in its search results.

Very soon after that, the site went live and Google crawled that too. It decided that the live version contained duplicate content from the development version, and dropped those pages from its listings altogether – not good. This issue was quickly resolved by some permanent redirects on the development server (and resubmitting the live URL for Google to index), but this should not have been necessary in the first place.

A better way to resolve this issue is to stop Googlebot seeing your pages at all. There are a number of ways to achieve this – the most obvious being a restrictive robots.txt file. There are other ways however, and my preferred method is to use some mod_rewrite trickery. Place the following in your .htaccess, virtualhost or other config files:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} google [NC]
RewriteRule .* - [R=404,L]

Of course, you’d probably want to modify that HTTP_USER_AGENT line to include the user agents of other bots too.

An easy way to test if this is working is by using the User Agent Switcher plugin for Firefox.
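For a quick sanity check of which user agents a pattern will catch, the matching can be mimicked in a few lines of Python. This only illustrates the case-insensitive substring match performed by the RewriteCond line; it is no substitute for testing against the real server, and the names BLOCKED_AGENTS and is_blocked are made up for this example.

```python
import re

# Same pattern as the RewriteCond above; [NC] means "no case"
BLOCKED_AGENTS = re.compile(r"google", re.IGNORECASE)


def is_blocked(user_agent):
    """Return True if the rewrite rule would answer this user agent with a 404."""
    return BLOCKED_AGENTS.search(user_agent) is not None
```

To cover other bots, the pattern could be extended with alternation, for example r"google|bingbot".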

Why have I settled on this method? I have a bash script which I run whenever I create a new site. It sets up a new vhost on our development server, creates appropriate directories, sets appropriate permissions, and reloads Apache2’s configuration. It also includes the above in the vhost file, meaning that I don’t have to worry about remembering to create a robots.txt (and ensures I don’t copy the restrictive robots.txt file to the production server accidentally).


MySQL and Negative Unix Timestamps (Dates pre-1970)

March 24th, 2011

Writing this post as I keep forgetting how to store pre-1970 timestamps in a MySQL datetime field.

The Unix timestamp is an integer representing the number of seconds since January 1st, 1970. I regularly store datetime variables from PHP in a database using:

$db->ExecuteSQL('INSERT INTO table (datefield) VALUES (FROM_UNIXTIME(?))', 'i', $datevalue);

However, this comes unstuck when dealing with pre-1970 dates. Although PHP can easily handle negative timestamp values, MySQL’s FROM_UNIXTIME function cannot, and things start going awry. To fix this issue, use the following:

$db->ExecuteSQL('INSERT INTO table (datefield) VALUES (DATE_ADD(FROM_UNIXTIME(0), INTERVAL ? SECOND))', 'i', $datevalue);

Similarly, converting back to a Unix timestamp using MySQL’s UNIX_TIMESTAMP function will also fail if the date value is before the Unix epoch. To retrieve these values as a Unix timestamp from the database, use:

$db->FetchValue('SELECT TIMESTAMPDIFF(SECOND, FROM_UNIXTIME(0), datefield) AS datefield FROM table');
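
The arithmetic behind both queries is simply “epoch plus (or minus) a number of seconds”. Here is a minimal Python sketch of the same round trip. It assumes naive datetimes treated as UTC, whereas MySQL’s FROM_UNIXTIME(0) depends on the session time zone; the names EPOCH, from_unix and to_unix are made up for this example.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive datetime, treated as UTC (an assumption)


def from_unix(seconds):
    """Mirror of DATE_ADD(FROM_UNIXTIME(0), INTERVAL seconds SECOND)."""
    return EPOCH + timedelta(seconds=seconds)


def to_unix(moment):
    """Mirror of TIMESTAMPDIFF(SECOND, FROM_UNIXTIME(0), moment)."""
    return int((moment - EPOCH).total_seconds())
```

A timestamp of -86400 is one day before the epoch, so from_unix(-86400) gives midnight on 31 December 1969, and to_unix turns that back into -86400.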

Facedown

November 3rd, 2010

A comment on the Whirlpool N900 forums piqued my interest recently. The author was asking about the possibility of switching phone profiles based on the orientation of the phone – more specifically turning the phone to “silent” mode when placed face-down on a desk (or other flat surface). Apparently this is available already on the HTC Desire, and it seemed like the sort of thing the N900 should be able to handle with ease.

I started with a simple proof of concept shell script. It was pretty untidy, but demonstrated that it was certainly feasible to detect the orientation of the phone and change profiles accordingly. Perhaps I should have done some Googling first, but I later spotted some neat Python code by Great Gonzo which did the job so much better than my shell script. With his permission, I’ve borrowed most of his work here.

How does the script work? In simple terms it polls the N900’s accelerometer every now and then to determine the device’s spatial orientation. If the accelerometer reports that it is face down, we also check the phone’s proximity sensor to ensure that it’s lying flat on a surface, and not (for whatever reason) just being held directly overhead.
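
The decision itself can be sketched as a pure Python function, separate from any sensor access. This is not the code from the actual script: the threshold, the sign convention (a large positive z reading meaning “face down”) and the profile names are all assumptions made for illustration, as are the function names.

```python
# Hypothetical threshold: z-axis reading (in milli-g) above which we
# treat the phone as lying face down. The sign convention is an assumption.
FACE_DOWN_THRESHOLD = 900


def is_face_down(z, threshold=FACE_DOWN_THRESHOLD):
    """True when gravity acts almost entirely along the z axis."""
    return z >= threshold


def choose_profile(accel, proximity_covered):
    """Pick a profile from one accelerometer sample and the proximity state.

    accel is an (x, y, z) tuple; proximity_covered is True when the
    proximity sensor reports that something is right in front of the screen.
    """
    x, y, z = accel
    if is_face_down(z) and proximity_covered:
        return "silent"
    return "general"
```

Requiring both conditions is what stops the phone going silent when it is merely held overhead: the accelerometer says “face down” but the proximity sensor sees nothing.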

There have been some comments that polling the accelerometer is a bit of a battery drain. In an ideal world, we’d receive event notifications for when the phone is face down, and in fact the N900 does provide those orientation notifications to interested applications. Unfortunately, we can’t use them because:

  1. The notifications aren’t sent when the screen is locked
  2. Try as I might, I couldn’t get the sample Python code to work in a console application

My gut feeling was that since we’re checking the sensor pretty infrequently (currently once every 3 seconds), it wouldn’t be a significant power drain, but in the spirit of actually knowing, I decided to run some tests. I initially tried to use BatteryGraph to determine the power drain of the Python script. I set the script running and left the phone alone for 2 hours. Unfortunately, BatteryGraph seemed to log data at pretty random intervals, and inspecting the SQLite database it populates didn’t provide enough data to form any sort of valid conclusion.

Plan B: I wrote the following simple Python code to log the battery charge level every minute:

#!/usr/bin/python
import dbus, time
from datetime import datetime

class BatteryLevel():
  def StoreBatteryLevel(self, level):
    # Append a "timestamp,level" line to the log file
    f = open("/home/user/battery-level", "a")
    f.write("%s,%s\n" % (datetime.now(), level))
    f.close()

  def Loop(self):
    # Look up the battery device through HAL on the system bus
    bus = dbus.SystemBus()
    hal_obj = bus.get_object('org.freedesktop.Hal', '/org/freedesktop/Hal/Manager')
    hal = dbus.Interface(hal_obj, 'org.freedesktop.Hal.Manager')
    uids = hal.FindDeviceByCapability('battery')
    dev_obj = bus.get_object('org.freedesktop.Hal', uids[0])
    while True:
      # Record the charge percentage once a minute
      level = dev_obj.GetProperty('battery.charge_level.percentage')
      self.StoreBatteryLevel(level)
      time.sleep(60)

logger = BatteryLevel()
logger.Loop()

With this logging script running, I switched off all running N900 applications, locked the screen and left it alone for an hour. The log file it produced looked like this:


time,battery %
2010-11-03 17:30:45.756291,74
...
2010-11-03 18:27:53.257001,74

So… the battery level hadn’t changed significantly over this time (as you’d expect with nothing running and the screen locked).

Leaving the battery logging running, I then started the “facedown” python script. After another hour, the log file looked like this:


time,battery %
2010-11-03 18:29:53.410619,74
...
2010-11-03 19:34:02.163335,74

This surprised me. I expected to see a slight drop in the battery charge level, but nope… no change whatsoever (I may re-run these tests over a longer period later on, but a little bit of luck is required: any received phone calls, SMS’s or other scheduled tasks that might be running on the phone will interfere with the results). With no discernible difference in battery usage over that 60 minute period, I’m confident that I won’t notice any major changes over a 24 hour period either.
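
For longer runs it’s easier to let a script compare the first and last readings than to eyeball the log. Here is a minimal Python sketch, assuming the “timestamp,level” line format shown above; the function name battery_drop is made up for this example.

```python
def battery_drop(lines):
    """Return the difference between the first and last logged charge levels."""
    levels = []
    for line in lines:
        # The level is whatever follows the last comma on the line
        _timestamp, _sep, level = line.strip().rpartition(",")
        # Skip headers, "..." markers and anything else that is not a number
        if level.isdigit():
            levels.append(int(level))
    if not levels:
        raise ValueError("no battery readings found")
    return levels[0] - levels[-1]
```

It can be fed an open file directly, e.g. battery_drop(open("/home/user/battery-level")), since iterating over a file yields its lines.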

Since the battery drain appears to be fairly minimal, I decided to abandon the QBW idea, and just have the script running all the time. To start the script when the phone boots, we need to add a file to UpStart’s /etc/event.d/ directory. I found this quite confusing – and I think I’m not the only one that’s been baffled by UpStart’s stubborn refusal to run my scripts. However, I eventually got it working using:

description "Facedown Daemon"
author "Rob Poyntz"

service
console none

start on started hildon-desktop
stop on stopping shutdown

exec /usr/bin/run-standalone.sh exec /usr/bin/python /opt/Facedown/facedown.py start

So what’s left? Well I guess the code. You can download it from here.

At some point I need to learn how to package this up into an easy-to-install Debian package, but in the meantime, drop me a line if you’d like a hand getting it running.


On finding the union of sets in PHP

July 30th, 2010

How do we find the union of two sets in PHP? Suppose we have two sets, a = {1, 2, 5, 6}, and b = {1, 2, 4}, and we want to determine a ∪ b = {1, 2, 4, 5, 6}. PHP doesn’t really have a data type that represents a set, but we can instead use the array type. There are a couple of different ways we can achieve this:

1) Use array_merge and array_unique
The array_merge function will append the values of one array to the end of another. In our case, we will end up with some duplicate values as some elements of set b are also contained in set a. To filter out these duplicates, we can use the array_unique function.

$a = array(1, 2, 5, 6);
$b = array(1, 2, 4);
$union = array_unique(array_merge($a, $b));
print_r($union);
//Array
//(
//    [0] => 1
//    [1] => 2
//    [2] => 5
//    [3] => 6
//    [6] => 4
//)

Since a set is not an ordered list, this method produces the correct result.

2) Another way of calculating the union is to use the array union operator (+)
We need to be careful here, as the array union operator acts on the keys of an array, and not its values. In order to calculate the union of the array values, we can use the array_fill_keys function:

$a = array(1, 2, 5, 6);
$b = array(1, 2, 4);
$union = array_keys(array_fill_keys($a, true) + array_fill_keys($b, true));
print_r($union);
//Array
//(
//    [0] => 1
//    [1] => 2
//    [2] => 5
//    [3] => 6
//    [4] => 4
//)

Again, this technique produces the correct result.

So which is the best method to use? If you’re only looking at small sets such as the above, I don’t suppose it makes much difference. However for larger sets, array_merge followed by array_unique is by a long way the slower of the two methods. Try this simple test to see the difference:

  // Set up two arrays, each with 10,000 (not necessarily unique) values
  $a = array();
  $b = array();
  for ($i=0; $i < 10000; $i++)
  {
    $a[] = rand(1, 50000);
    $b[] = rand(1, 50000);
  }

  $startTime = microtime(true);
  $union = array_unique(array_merge($a, $b));
  $arrayMergeDuration = microtime(true) - $startTime;

  $startTime = microtime(true);
  $union = array_keys(array_fill_keys($a, true) + array_fill_keys($b, true));
  $arrayFillDuration = microtime(true) - $startTime;

  echo <<<eot
array_merge method took $arrayMergeDuration s
array_fill_keys method took $arrayFillDuration s

eot;

// Output from my test machine:
array_merge method took 0.330797195435 s
array_fill_keys method took 0.0230300426483 s

This simple test indicates that the array_fill_keys technique is about 14 times quicker when working with arrays of non-trivial size. Obviously, when dealing with large sets, you should consider using a tool custom built for the job (i.e. a relational database), but there may be some cases where this could come in handy. YMMV.
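
For comparison, here is how the same two approaches look in a language with a built-in set type. This is a minimal Python sketch, not a benchmark; the function names are made up for this example, and union_ordered relies on dictionaries preserving insertion order (Python 3.7 or later).

```python
def union_sorted(a, b):
    """Union via the built-in set type; the result is sorted for readability."""
    return sorted(set(a) | set(b))


def union_ordered(a, b):
    """Union via dictionary keys, analogous to the array_fill_keys trick.

    Duplicate keys collapse automatically, and first-seen order is kept.
    """
    return list(dict.fromkeys(a + b))
```

With a = [1, 2, 5, 6] and b = [1, 2, 4], union_ordered returns the values in the same order as the PHP examples above, while union_sorted returns them in ascending order.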


Buttons and IE6

June 16th, 2010

Despite its rapidly declining user base, IE6 is still causing headaches for web developers. This morning I was testing a form I’d built in IE6, after confirming it worked in all other common browsers. The form in question contained a whole bunch of <button type="submit"> tags, and depending on which button you clicked, the server-side code would perform an action, e.g.:

<form method="post" action="blah">
  <div>
    <button type="submit" name="cancel" value="cancel">Cancel</button>
    <button type="submit" name="continue" value="continue">Continue</button>
  </div>
</form>

if (array_key_exists('cancel', $_POST))
  return DoCancel();
else if (array_key_exists('continue', $_POST))
  return DoContinue();
else
  return DoNoAction();

The HTML4.01 spec says:

If a form contains more than one submit button, only the activated submit button is successful.

which means that the above PHP code should work as advertised. Unfortunately, IE6 sends every button along with the form data, not just the “activated” one. This means that when using IE6, even if you click the “continue” button, the above code would run the “DoCancel” method.
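
The effect is easy to reproduce without a browser. Here is a small Python simulation of the dispatch logic above, with a plain dictionary standing in for $_POST; the function name choose_action and its return values are made up for this example.

```python
def choose_action(post):
    """Mimic the PHP if/else chain: the first matching button name wins."""
    if "cancel" in post:
        return "cancel"
    elif "continue" in post:
        return "continue"
    return "none"


# A standards-compliant browser sends only the button that was clicked
compliant_post = {"continue": "continue"}

# IE6 sends every button in the form, whichever one was clicked
ie6_post = {"cancel": "cancel", "continue": "continue"}
```

choose_action(compliant_post) returns "continue" as intended, but choose_action(ie6_post) returns "cancel", because the “cancel” key is always present and is checked first.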

I know, I’m about 6 years too late with this little snippet, but here’s the fix, using a little jQuery:

<!--[if lt IE 7]>
<script type="text/javascript">
(function() {
  function rmBtns(e) {
    e = e || window.event;
    $('button').not(e.target).remove();
  }

  function buttonFix() {
    $('button').click(rmBtns);
  }

  $(document).ready(buttonFix);
})();

</script>
<![endif]-->

Productivity negated by blogging

May 31st, 2010

The Linux command line is incredibly powerful. It can control pretty much any aspect of your Linux system with only a few keystrokes, and can help automate and simplify repetitive tasks. For example, I’ve been working on a project for a magazine publisher recently, and one of the sections in their new website is a “back issues” area where subscribers can download, erm, back issues. I’ve been given 26 PDF documents containing each edition of the magazine dating back to mid-2007 or so, and needed to generate a thumbnail of each magazine cover. Two ways of accomplishing this spring to mind:

1) Open each PDF document, take a screenshot, paste into the GIMP, resize and save. Repeat 25 more times.
2) Use ImageMagick:

for pdf in *.pdf
do
  convert -resize 120x90 "$pdf[0]" "images/${pdf%.pdf}.jpg"
done

Using method 1, I expected it would have taken me around 20 minutes of mind-numbing boredom, repeating the same old steps and trying not to get distracted by reading Slashdot every few images.

Using method 2, the whole batch was completed in 7 seconds. This massive gain in productivity has unfortunately been completely negated by the fact that I felt compelled to document it in this very blog post.

Anyway, the point of this story is that I wanted to make a note of the fact that ImageMagick is able to extract individual pages from a PDF document: the “[0]” part of “$pdf[0]” in the example above tells ImageMagick to use the first frame/page of the PDF document.
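
If the batch job ever needs to run from a script rather than the shell, the command can be built as an argument list, which avoids quoting problems with awkward filenames. Here is a minimal Python sketch; the function name thumbnail_command and its defaults (the images directory and the 120x90 size from the loop above) are assumptions for this example.

```python
import os


def thumbnail_command(pdf_path, out_dir="images", size="120x90"):
    """Build the ImageMagick command that thumbnails the first page of a PDF."""
    # "magazine.pdf" -> "magazine"
    name = os.path.splitext(os.path.basename(pdf_path))[0]
    return [
        "convert",
        pdf_path + "[0]",  # "[0]" selects the first page of the document
        "-resize", size,
        os.path.join(out_dir, name + ".jpg"),
    ]
```

The resulting list can be handed to subprocess.run(command, check=True), assuming ImageMagick is installed and the output directory exists.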


EasyPDO

March 15th, 2010

This will be the last post in my blog regarding database connectivity. I’ve set up a site for my new project called EasyPDO. EasyPDO incorporates all of the ideas I’ve been working on over the last few years with the MySQLi and PDO database connectivity software.

EasyPDO currently supports connections to MySQL, SQLite and PostgreSQL databases. It provides a simple, secure and fast system for interacting with a database, and helps to eliminate the possibility of SQL injection attacks.

I’ll be interested to hear any feedback you may have.