Sunday, March 18, 2012

The Magic Number


Hoho, in this post I'll explain something called a "magic number". Can you already guess what it is? Some kind of number with a magic spell in it? Some kind of magic trick that uses numbers? Or something else?

Unfortunately, the "magic number" in this post has nothing to do with magic in the literal sense. It is related to file formats. Originally, the term referred to a specific set of 2-byte identifiers at the beginning of a file, but since any undecoded binary sequence can be regarded as a number, any feature of a file format that uniquely distinguishes it can be used for identification. Identify what? The file format, of course.
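To make the "any byte sequence can be regarded as a number" point concrete, here is a small Python sketch that reads the first four bytes of a buffer as a big-endian unsigned integer. The sample bytes are the Java class file header listed further down in this post; the function name `leading_number` is just an illustrative choice.

```python
import struct

def leading_number(data: bytes) -> int:
    """Interpret the first 4 bytes of `data` as a big-endian unsigned 32-bit integer."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    # ">I" = big-endian, unsigned 32-bit integer
    (value,) = struct.unpack(">I", data[:4])
    return value

header = bytes([0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x00])
print(hex(leading_number(header)))  # prints 0xcafebabe
```

The same four bytes are simultaneously a "number" (0xCAFEBABE) and a file signature; which view you take is only a matter of interpretation.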

For example, GIF images always begin with the ASCII representation of either "GIF87a" or "GIF89a", depending on which version of the standard they follow. Many file types, especially plain-text files, are harder to spot with this method. HTML files, for example, might begin with the string "<html>" (which is not case-sensitive), with a document type definition that starts with "<!DOCTYPE", or, for XHTML, with the XML declaration, which begins with "<?xml". Such files can also begin with HTML comments, random text, or several empty lines and still be usable HTML.
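The contrast between a binary signature and a text format can be sketched in a few lines of Python. This is an illustration, not a real detector: the GIF test is an exact prefix match, while the HTML test is only a heuristic that skips leading whitespace and compares case-insensitively against the three openings mentioned above. Note that "<?xml" matches any XML document, not only XHTML.

```python
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")
HTML_OPENINGS = (b"<html", b"<!doctype", b"<?xml")

def looks_like_gif(data: bytes) -> bool:
    # Binary format: an exact byte-for-byte prefix comparison is enough.
    return data.startswith(GIF_SIGNATURES)

def looks_like_html(data: bytes) -> bool:
    # Text format: skip blank lines/whitespace, then compare
    # case-insensitively against a few likely openings.
    head = data.lstrip().lower()
    return head.startswith(HTML_OPENINGS)

print(looks_like_gif(b"GIF89a\x01\x00"))                # True
print(looks_like_html(b"\n\n  <!DOCTYPE html><html>"))  # True
print(looks_like_html(b"<!-- comment --><html>"))       # False
```

The last call shows the weakness described above: a page that starts with an HTML comment is perfectly usable, yet this simple heuristic does not recognize it.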

Detecting file types by magic number offers a better guarantee that the format will be identified correctly, and it can often determine more precise information about the file. On the other hand, magic number tests can be quite complex: each file must effectively be tested against every possibility in the magic database, which is inefficient, especially when displaying large lists of files.
Also, data must be read from the file itself, which increases latency compared with metadata stored in the directory. When file types don't lend themselves to being recognized this way, the system must fall back to metadata. It is, however, the best way for a program to check whether a file it has been told to process is in the correct format. A well-designed magic number test is a good way to detect a file that is corrupt or of the wrong type. Conversely, a valid magic number doesn't guarantee that the file is intact and of the right type.
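Here is a rough Python sketch of the lookup described above. `MAGIC_DB` and `EXTENSION_TYPES` are hypothetical, tiny stand-ins for a real magic database (such as the one used by the Unix `file` command). Every signature is tried in turn, and only when none matches does the code fall back to metadata, here the file name extension. The function takes the already-read leading bytes so the example needs no real files.

```python
import os

# Hypothetical mini magic database: (offset, signature bytes, type name).
MAGIC_DB = [
    (0, b"GIF87a", "GIF image"),
    (0, b"GIF89a", "GIF image"),
    (0, b"%PDF", "PDF document"),
    (0, b"\xca\xfe\xba\xbe", "Java class file"),
]

# Hypothetical metadata fallback: guess the type from the extension.
EXTENSION_TYPES = {".txt": "plain text", ".html": "HTML document"}

def identify(path: str, data: bytes) -> str:
    # Test the data against every entry; this is the cost mentioned above.
    for offset, signature, type_name in MAGIC_DB:
        if data[offset:offset + len(signature)] == signature:
            return type_name
    # No signature matched: fall back to metadata (the file name extension).
    _, ext = os.path.splitext(path)
    return EXTENSION_TYPES.get(ext.lower(), "unknown")

print(identify("report.bin", b"%PDF-1.4 ..."))  # PDF document
print(identify("notes.txt", b"hello"))          # plain text
print(identify("broken.pdf", b"%PDF"))          # PDF document
```

The last call illustrates the caveat above: a file holding nothing but the four-byte header is still reported as a PDF, because a matching magic number says nothing about whether the rest of the file is intact.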

In computer programming, "magic number" has multiple meanings:
  • A constant numerical or text value used to identify a file format or protocol.
  • Distinctive unique values that are unlikely to be mistaken for other meanings.
  • Unique values with unexplained meaning, or multiple occurrences of such values, which could (preferably) be replaced with named constants.
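The third meaning is about code style rather than file formats. In this Python illustration, the first function buries the Java class file signature as an unexplained literal, while the second gives the same value a name. `JAVA_CLASS_MAGIC` is an illustrative name, not a standard constant from any library.

```python
# Unexplained "magic number": the reader has to know what 0xCAFEBABE means.
def is_class_file_unclear(header: bytes) -> bool:
    return int.from_bytes(header[:4], "big") == 0xCAFEBABE

# Same check with a named constant: the intent is self-documenting,
# and the value is defined in one place if it is needed elsewhere.
JAVA_CLASS_MAGIC = 0xCAFEBABE

def is_class_file(header: bytes) -> bool:
    return int.from_bytes(header[:4], "big") == JAVA_CLASS_MAGIC
```

Both functions behave identically; only the readability and maintainability differ.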

Some examples of magic numbers in files:
  • Compiled Java class files (bytecode) start with hex "CAFEBABE". When compressed with Pack200, the bytes are changed to CAFED00D.
  • GIF image files have the ASCII code for "GIF89a" (47 49 46 38 39 61) or "GIF87a" (47 49 46 38 37 61).
  • JPEG image files begin with FF D8 and end with FF D9. JPEG/JFIF files contain the ASCII code for "JFIF" (4A 46 49 46) as a null-terminated string. JPEG/Exif files contain the ASCII code for "Exif" (45 78 69 66), also as a null-terminated string, followed by more metadata about the file.
  • Standard MIDI music files begin with the ASCII code for "MThd" (the MIDI header chunk, 4D 54 68 64), followed by more metadata about the file.
  • Unix script files usually start with a shebang, "#!" (23 21), followed by the path to an interpreter.
  • PostScript files and programs start with "%!" (25 21).
  • PDF files start with "%PDF" (hex 25 50 44 46).
  • etc. 
You can find more examples here.  ^_^b
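Two of the examples in the list go beyond a plain prefix match, so here is a hedged Python sketch of each. The JPEG check looks at both the FF D8 start marker and the FF D9 end marker; the shebang helper pulls the interpreter path out of a "#!" line. The function names are illustrative, and the shebang parsing is a simplification (for instance, it reports "/usr/bin/env" rather than the program `env` will launch).

```python
from typing import Optional

JPEG_START = b"\xff\xd8"  # SOI (start of image) marker
JPEG_END = b"\xff\xd9"    # EOI (end of image) marker

def jpeg_markers_ok(data: bytes) -> bool:
    # Checking both ends catches some truncated files that a header-only
    # test would accept, though it still cannot prove the file is intact.
    return data.startswith(JPEG_START) and data.endswith(JPEG_END)

def shebang_interpreter(data: bytes) -> Optional[str]:
    """Return the interpreter path from a "#!" line, or None if there is none."""
    if not data.startswith(b"#!"):
        return None
    # Take the first line, drop the "#!", trim spaces (and a trailing "\r").
    first_line = data.split(b"\n", 1)[0][2:].strip()
    if not first_line:
        return None
    # Anything after the first space is treated as arguments.
    return first_line.split()[0].decode("utf-8", errors="replace")

print(jpeg_markers_ok(b"\xff\xd8...\xff\xd9"))           # True
print(jpeg_markers_ok(b"\xff\xd8..."))                   # False (truncated)
print(shebang_interpreter(b"#!/bin/sh\necho hi\n"))      # /bin/sh
print(shebang_interpreter(b"#! /usr/bin/env python3\n")) # /usr/bin/env
```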
