The Intelligent File Format: Part 1

February 20th, 2006

Category: Conceptual Design

One of the most frustrating things to any computer professional is the wide variety of file formats. Day in and day out we deal with documents, archives, images, multimedia, and other files that only open in a specific program. We try to make our lives easier by using multi-format programs like PowerArchiver, OpenOffice, GIMP, and VLC. However, these programs often fail to render the file contents accurately. Or worse, we come across an old file in a format that’s no longer supported and spend hours trying to find tools to open it.

Perhaps even more frustrating is that many standards have been proposed to solve these issues, but they quickly become ineffective as new technologies make them obsolete. For example, who is going to encode their video in standard MPEG format when options such as DivX, Sorenson, and MPEG-4 exist?

The problem is even worse for software developers, who spend inordinate amounts of time reverse engineering formats and code in order to gain compatibility with just one format! Having to reverse engineer a large number of formats becomes an impossible task that encourages developers to find less-than-ideal shortcuts. (For example, MPlayer redistributes the Windows Media Codecs with a custom linker.)

Some Operating Environments (most notably the Apple Newton) have tried to solve this problem with “standard” interfaces between user programs. This often fails, however, as the programs go out of date, and the interfaces change. What’s needed is a way to magically obtain access to the data inside any file. A way to obtain the structure of the data without reverse engineering someone else’s parser.

A Bit of History

I’ve long been impressed by the number of file systems supported by Linux and FreeBSD. Even with incomplete support, these operating systems have made life easy for those of us with dual boot systems and multiple drives. Sadly, many operating systems don’t reciprocate. Some fail because the developers don’t want to support other OSes, but many fail because they can’t reciprocate. The file system format may be too new, or perhaps the OS is no longer under development. Either way, getting even a modicum of support seems difficult, if not impossible.

This led me to think about ways of improving the situation. One Linux project actually used the NTFS driver from Windows to provide write support. Could the concept work the other way around? Could the Linux driver be compiled for Windows or specially linked against? Possibly. Wouldn’t it be nice if a standard existed for file system drivers?

Then an idea occurred to me. What would happen if the beginning of file systems embedded a driver for accessing the disk? If the driver was in some sort of neutral format (similar to the X Windows drivers), then any OS could access the file system! In fact, the precise format of a given file system would become irrelevant. You could format your disk to the best file system for your need, and feel confident that it would work in any OS.

Or so the theory goes. I wasn’t quite starry-eyed enough to believe that Microsoft, Linus, the FreeBSD Foundation, and Apple would suddenly all become agreeable just because I said so. Still, the concept had merit, so I began to work on a prototype for a File-System-in-a-file. My thought was that such a library could prove the concept as well as provide an excellent choice for developers looking to keep all their program data inside a file (à la Access MDB and Outlook PST files) while still allowing the developer to plug in a more robust format at any time. If the concept could be taken one step further, perhaps it could become useful on external Flash drives, many of which are stuck with the sub-optimal FAT file system for compatibility reasons.

The Logical Conclusion

While this concept was exciting in and of itself, it didn’t even begin to scratch the surface of what was possible. It wasn’t long before I considered the fact that a file system is nothing more than a hierarchical database. There’s nothing inherently special about it, so why can’t the file system payload be replaced with some other sort of data? As long as the embedded driver can read the format and produce some sort of usable data structure, there’s no reason why the concept couldn’t be extended to all types of data! Images, documents, multimedia, archives, and more could all be converted to self-describing formats.

Of course, like any technological innovation, the concept is not without its pitfalls. Issues that need to be addressed are:

  • File Size - Embedding a driver will add overhead to the format.
  • Upgradability - Files are tied to a specific version of the format.
  • Interface - How do we link the APIs at runtime?
  • Security - What’s to stop embedded code from launching a virus?
  • Portability - How do we embed code that can work on all platforms?
  • Performance - How do we provide maximum I/O throughput to files that are performance sensitive?

Let’s go over each of these items and investigate the issue in detail.

File Size

There’s no denying that a driver in the header would mean an instant increase in file size. For small files, this can easily double the size. It’s even conceivable that the embedded driver could be larger than the original file itself!

However, there are mitigating factors to consider:

  1. Disk space is cheap. Adding a few kilobytes per file is unlikely to produce any appreciable increase in storage requirements.
  2. Bandwidth in modern systems is far greater than it used to be. Adding a few kilobytes will not increase transfer times to any noticeable degree.
  3. The driver can be compressed using a standard compression algorithm. This may reduce its size considerably.
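To make the third point concrete, here is a minimal sketch of compressing a driver with DEFLATE from the standard java.util.zip package. The class and method names are hypothetical, and DEFLATE is only one plausible choice of "standard compression algorithm"; nothing here is prescribed by the format itself.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: the choice of DEFLATE is an assumption for illustration.
class DriverCompression {

    // Compress the embedded driver bytes before writing them to the file header.
    static byte[] compress(byte[] driver) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(driver);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restore the original driver bytes when the file is opened.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and not finished means the input is truncated or unsupported.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported stream");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("driver bytes are not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

How much this saves depends entirely on the driver; very small drivers may not shrink at all.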

Upgradability

Since the driver is embedded with the file, files become tied to the specific version of software that they were written with. If the driver software has bugs, these bugs will continue to propagate as long as the file is in circulation. On the other hand, this also reduces the number of version incompatibilities by ensuring that the original software is always available to parse the file.

Many software packages rewrite files anyway, so this is generally not as big of an issue as it may seem.

Interface

One of the more challenging aspects of this scheme is how to link the driver. One would assume that an Image would have a very different interface from a File Archive. Runtime linking tends to be hard enough without adding completely unknown interfaces to the mix. And how do we document the available APIs to anyone who wishes to load the file?

Thus we need a way to store sufficient meta-data about the APIs to allow for proper runtime linking.
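One way to picture such meta-data is a small header in front of the driver that names the interface the driver implements. The sketch below is purely illustrative: the magic number, the field names, and the field order are all assumptions made for this example, not part of any defined format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical container layout: magic, interface name, interface version, driver, payload.
class IntelligentFile {
    static final int MAGIC = 0x49464631; // "IFF1" in ASCII, an arbitrary marker

    final String interfaceName; // which API the driver claims to implement, e.g. "Image"
    final int interfaceVersion; // lets a host reject interfaces it does not understand
    final byte[] driver;        // embedded driver code
    final byte[] payload;       // the actual file data, in whatever format the driver reads

    IntelligentFile(String interfaceName, int interfaceVersion, byte[] driver, byte[] payload) {
        this.interfaceName = interfaceName;
        this.interfaceVersion = interfaceVersion;
        this.driver = driver;
        this.payload = payload;
    }

    // Serialize the header and both blocks, each block prefixed by its length.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(MAGIC);
            out.writeUTF(interfaceName);
            out.writeInt(interfaceVersion);
            out.writeInt(driver.length);
            out.write(driver);
            out.writeInt(payload.length);
            out.write(payload);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parse the header without interpreting the payload; that is the driver's job.
    static IntelligentFile fromBytes(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            if (in.readInt() != MAGIC) {
                throw new IllegalArgumentException("missing magic number");
            }
            String name = in.readUTF();
            int version = in.readInt();
            byte[] driver = readBlock(in);
            byte[] payload = readBlock(in);
            return new IntelligentFile(name, version, driver, payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static byte[] readBlock(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IllegalArgumentException("negative block length");
        }
        byte[] block = new byte[length];
        in.readFully(block);
        return block;
    }
}
```

With a layout like this, a host program could read interfaceName first and decide whether it knows how to talk to the driver before loading any code.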

Security

The greatest hazard posed by this format is that it allows arbitrary code to run every time a file is loaded. This turns any file into a potential virus, even if it isn’t executable!

What we need is a secure environment to run this code inside of. Such an environment would have to have a foolproof method against accessing the file system, network resources, GUI, and program memory. It can’t allow for buffer overflows, and it must be capable of guaranteeing that the file handle passed to it can’t be used against the parent program.
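The last requirement, keeping the file handle from being turned against the parent program, can be illustrated in isolation. In this sketch the driver never receives a handle at all, only a read-only view of the payload bytes. The Driver interface and runDriver method are hypothetical names, and this is one small piece of the puzzle rather than a sandbox: it does nothing to stop a driver from calling other APIs.

```java
import java.nio.ByteBuffer;

class ReadOnlyPayload {

    // Hypothetical driver contract: the driver sees bytes, never a file handle.
    interface Driver {
        int checksum(ByteBuffer payload);
    }

    // Hand the driver a read-only window onto the payload region only.
    static int runDriver(Driver driver, byte[] fileContents, int payloadOffset, int payloadLength) {
        ByteBuffer view = ByteBuffer.wrap(fileContents, payloadOffset, payloadLength)
                .slice()             // hide everything outside the payload region
                .asReadOnlyBuffer(); // any attempt to write throws ReadOnlyBufferException
        return driver.checksum(view);
    }
}
```

Because the view is a slice, the driver cannot seek to bytes before or after the payload, and because it is read-only, the driver cannot reach the backing array to modify it.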

Portability

Above all else, the embedded driver must be portable. It does no good to invent a universal file format if it can’t leave the confines of the x86 platform. Thus the best solution is to use either a portable scripting language or a language capable of executing on a Virtual Machine.

Performance

With files growing considerably in size, performance has become a major concern. Multimedia files in particular tend to be sensitive to I/O performance, meaning that the scripting language or VM must be capable of using the maximum system throughput without compromising security or portability.

The Obvious Choice

With all these constraints and issues in mind, the choice becomes extremely narrow. Scripting languages like JavaScript and Perl are portable, but tend toward lower performance. Virtual Machines like Smalltalk and .NET have performance, but not high security. The only choice left to us is Java.

The reasons for using Java are:

  • Security is a core feature, not an add-on. Any chunk of code can be perfectly firewalled off from the rest.
  • Java is portable to all major platforms, and can be ported to many more.
  • Java performance has increased considerably over the years, making it one of the fastest choices on the market. In simple algorithmic usage (e.g. decoders, cryptography, compression, etc.) some benchmarks have shown Java matching or even beating native code.
  • Java Reflection makes it easy to load a dynamic library, no matter what its source.
  • Java can interface with nearly all languages. If you want to use portable file functionality in your C program, for example, there is nothing stopping you from using a JNI interface to load the data.
  • Java bytecodes are small and compress well. They are regularly much smaller than a comparable native program.
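As a rough illustration of the Reflection point, the sketch below instantiates a driver by name and calls one of its methods without any compile-time knowledge of the class. ExampleImageDriver is a stand-in invented for this example; a real host would presumably define the class from the bytes embedded in the file through a custom ClassLoader rather than look up a class that is already on the classpath.

```java
import java.lang.reflect.Method;

class ReflectiveLoader {

    // Stand-in for a driver that would normally arrive inside the file itself.
    public static class ExampleImageDriver {
        public int width() { return 640; }
        public int height() { return 480; }
    }

    // Instantiate a driver by class name and call a no-argument method by name.
    static Object call(String className, String methodName) {
        try {
            Class<?> type = Class.forName(className);
            Object driver = type.getDeclaredConstructor().newInstance();
            Method method = type.getMethod(methodName);
            return method.invoke(driver);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot call " + className + "." + methodName, e);
        }
    }
}
```

A call such as ReflectiveLoader.call(ReflectiveLoader.ExampleImageDriver.class.getName(), "width") returns the boxed result of width(). The same lookup could be driven by interface meta-data stored in the file.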

Haven’t I heard this before?

As many readers may note, this concept is not without precedent. Self-extracting Zip files and installers have commonly used a similar technique to distribute their payloads. While these earlier concepts have not been quite as far-reaching as what is described here, they are certainly predecessors to the Intelligent File Format.

XML files have also led the way by encouraging file formats that are common and easy to share. While the idea of a central repository of all XML DTDs and Schemas never came to pass, the overall concept is still going strong and is the basis for many cross-platform protocols such as SOAP and XML-RPC.

Go to Part 2 ->
