The Golden Middle Path - a blog by Amit K Mathur

File Formats

What is a file format

A file format is the way your software application’s data is laid out
in a file on the disk when you save.

For example, if you create a document using a spreadsheet application,
the file format must have ways to store how the data is organised in
rows and columns, where you have applied colours, and so on, so that
when you later re-open the saved document in the application it looks
exactly as you created it. File formats are ways of encoding the
information in a document so that it can be stored in files on disk.

Why are they needed

As stated above, they are needed so that you can
save your application’s data to disk. A disk file is really just one long string
of bytes, like a very long sentence. So, a file format specifies how the
information in a document is converted to a string of bytes and back again.
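
To make that concrete, here is a minimal sketch in Python of a made-up
format for a tiny spreadsheet. The "GRID" signature, the field sizes and
the function names are all my own assumptions for illustration; no real
spreadsheet application stores its files this way.

```python
import struct

MAGIC = b"GRID"  # hypothetical 4-byte signature identifying this made-up format


def encode_grid(rows):
    """Convert a list of equal-length rows of strings into bytes."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    # Header: signature, then row and column counts as little-endian 16-bit integers.
    out = [MAGIC, struct.pack("<HH", n_rows, n_cols)]
    for row in rows:
        for cell in row:
            data = cell.encode("utf-8")
            # Each cell: a 16-bit length followed by the UTF-8 text.
            out.append(struct.pack("<H", len(data)))
            out.append(data)
    return b"".join(out)


def decode_grid(blob):
    """Rebuild the rows from bytes produced by encode_grid."""
    if blob[:4] != MAGIC:
        raise ValueError("not a GRID file")
    n_rows, n_cols = struct.unpack_from("<HH", blob, 4)
    pos = 8  # first byte after the 8-byte header
    rows = []
    for _ in range(n_rows):
        row = []
        for _ in range(n_cols):
            (length,) = struct.unpack_from("<H", blob, pos)
            pos += 2
            row.append(blob[pos:pos + length].decode("utf-8"))
            pos += length
        rows.append(row)
    return rows


sheet = [["Item", "Price"], ["Tea", "40"]]
blob = encode_grid(sheet)          # this is what would be written to disk
assert decode_grid(blob) == sheet  # re-opening gives back the same document
```

The point is only that both directions follow the same written rules:
anyone who knows the layout can write the decoder without ever seeing
the encoder.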

There is another reason for file formats: they allow documents created
in one application to be opened using another application when the two
applications understand a common file format. For example, you can
create a drawing using a drawing application, save it, then open
it in your word processor.

OK, what’s the controversy

Well, my goal is just to make you aware of the idea of file formats,
not to go into any controversy. However, you should know that one
exists. So, here it is.

There are two types of formats: open and closed. Open formats are those that
have a written specification which is available to everybody. Anybody can write an
application to read or write a file in that format. Closed ones are
those whose specifications are not public. There are some formats which fall
somewhere in between: their specifications are not public but are available
from the creators for a fee and some legal formalities.

But, why should I care

As mentioned above, file formats are vital for data exchange between
applications. For example, your digital camera stores an image in TIFF
format but you can later convert it to JPG to put it on your blog. The
camera need not be programmed to produce every possible image
format because it produces the image in an open format. Anyone can
write an application to read the camera’s captured image in TIFF and
convert it to a different format. It also helps with
future-proofing. Since your camera produces images in an open format,
there is a good chance that new image editing software will support
it even after your camera vendor stops supporting your
camera, since anybody can read your image files because they are in an open
format.

File formats are used extensively in EDA (Electronic Design Automation), which is where I became
aware of them. EDA vendors produce software to create electronic
designs. Electronic design work happens in steps: first comes schematic capture,
followed by simulation, then synthesis, and then placement and
routing. The software for each of these steps can come from
a different vendor, and hence the tools need to talk to each
other. However, when a tool is being written, the developer does
not know exactly which tools
will come before and after it. So, the industry has agreed to use
standard file formats: each software tool is built with the ability to import a
standard file format and export its data in that format. Problem solved.
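
Here is a rough sketch in Python of that hand-off. The JSON layout, the
function names and the toy design are assumptions made up for this
example; real EDA formats are far richer and do not look like this.

```python
import json

# Hypothetical "standard" interchange format for this sketch: a JSON object
# with a "cells" list and a "nets" mapping. Real EDA formats are different.


def export_design(cells, nets):
    """What an upstream tool would do: write its result in the shared format."""
    return json.dumps({"cells": cells, "nets": nets}, sort_keys=True)


def import_design(text):
    """What a downstream tool would do: read the shared format, nothing else."""
    data = json.loads(text)
    return data["cells"], data["nets"]


# Tool A (say, synthesis) knows nothing about tool B (say, placement).
handoff = export_design(["and1", "or1"], {"n1": ["and1", "or1"]})

# Tool B only needs to understand the format, not tool A.
cells, nets = import_design(handoff)
# Toy "placement": put the cells side by side in a single row.
placement = {name: (index, 0) for index, name in enumerate(cells)}
```

Neither function refers to the other tool, only to the agreed format,
which is what lets tools from different vendors be chained together.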

Postscript: One of the popular EDA file formats is LEF & DEF, created by Cadence
Design Systems and later open sourced. I worked on Cadence’s
original LEF & DEF reader and writer, which was also the first project I
worked on as a software professional.


1 Comment Added

  1. tan
    tan Oct 26, 2009 at 9:38 AM

    I agree file formats are quite crucial in organizing data. I have worked/am working with media file formats like mp4/asf etc. With media file formats a key problem is data locality and the ability to process packets fast (to enable thin clients). While newer formats will take care of the locality, the processing requirement is something which I feel could be worked upon. Doing away with variable size headers and variable elements for each media packet would be great.
