Random Tales of Total Geekery

The importance of being HTML

Posted by Q on November 4, 2007

Or, adventures in converting a saved-from-MS-Word HTML file into the readable (and proprietary) BBeB format of the Sony Reader.

Since I started taking the Reader to work every day, my To-Be-Read pile has been under considerable attack: a one-hour journey each way is extremely good time for gobbling up a book in 2-4 sittings. Since the Reader’s cache memory is still too under-powered for instant viewing of new books, I try to convert each txt/html/lit/doc/pdf/etc. file into .lrf for easy viewing.

And, as Art Spiegelman says, here my troubles began…

You see, the problem is conversion. My laptop currently plays host to a wide array of conversion utilities, which include:

  1. ABC Palm converter (for pdb into everything else)
  2. ABC Lit to htm converter
  3. BBeB Binder (for converting txt/html into BBeB)
  4. Book Designer (editing txt/html/pdb etc)
  5. CLit (hehe) (.Lit to html)
  6. Dreamweaver 8 (don’t ask)
  7. HTML Book Fixer
  8. v HTML merger (for combining 2+ html files into 1)
  9. libprs500 (of course)
  10. PDB Reader Converter
  11. Mozilla Seabird
  12. Page Breeze Html editor

Now, if I were lucky enough to find and download the Kingdoms of Thorn and Bone trilogy by Gregory Keyes, the usual sequence of events would be:

First up, I’d discover that the gentle soul who scanned and proofed the release had used a different format for each of the three books: the first is MS Word-to-HTML, the second is RTF, and the third is clean, non-garbled HTML. So my übermensch el geeko instincts kick in:

  • First approach: open the first book in MS Word > save as RTF >> the file size balloons to 4 MB
  • Second approach: open the file in Mozilla Firefox > Ctrl+A, Ctrl+C > new HTML page in Dreamweaver 8 > Ctrl+V >> error: “insufficient space to handle operation”
  • Third approach: open the file in Mozilla Firefox > Ctrl+A, Ctrl+C > new HTML page in Seabird HTML Composer > Ctrl+V >> the laptop goes unresponsive for 2 minutes, then character entity references kick in, and it’s a tad difficult to read text buried in & codes
  • Fourth approach: open the file in Book Designer > clean up the HTML > prepare for second-pass surgery in HTML Book Fixer (which doesn’t allow _ or [ in filenames) >> file rejected for too many errors
  • Fifth approach: open the file in BBeB Converter to try a direct conversion >> the system stalls for a couple of minutes and the operation exits
  • Sixth and final approach: open the file in IE > Ctrl+A, Ctrl+C > new HTML page in Page Breeze > Ctrl+V > Save As “Keyes, Gregory – The Briar King (v1.1).html” >> open in BBeB Binder > Save As LRF >> save successful!

Total time taken to figure out the proper way: 45 minutes.

The second and third files did not take as much time; I prefer the two-pass approach of Book Designer > BBeB Binder whenever it works. Still, all told, it takes me an hour to transfer three files from my laptop to the Reader in a format of my choice. Phew! Talk about work.

The same story repeats with a Gardner Dozois sci-fi collection (16th edition). The file is a humongous, horrifyingly huge 6 MB unpacked RTF. I pass it through WordPad to clear unnecessary formatting, save it as a new file from Seabird HTML Composer, manually edit the HTML to add a Table of Contents, and then convert it to LRF with BBeB Binder.
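The manual Table of Contents step is the most scriptable part of that routine. Below is a minimal sketch, assuming the chapter titles are marked up as h1 to h3 headings that are not nested inside each other; it uses regular expressions rather than a real HTML parser, which is fine for a one-off conversion but not for arbitrary markup. The script name `add_toc.py`, the `toc1`, `toc2`, … anchor ids and the file names below are all hypothetical.

```python
"""Minimal sketch: add a linked Table of Contents to an HTML e-book.

Assumptions (not from the post): chapter titles are <h1>-<h3> elements that
are not nested inside each other, and a regex pass is good enough for a
one-off conversion (it is not a full HTML parser).
"""
import re
import sys

# <h1>...</h1>, <H2 class=x>...</H2>, etc.; \1 makes the closing tag match.
HEADING = re.compile(r"<(h[1-3])\b([^>]*)>(.*?)</\1\s*>", re.I | re.S)
TAG = re.compile(r"<[^>]+>")
OLD_ID = re.compile(r"""\s+id\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.I)


def add_toc(doc: str) -> str:
    entries = []  # (heading level, anchor id, plain-text title)

    def tag_heading(m):
        tag, attrs, inner = m.group(1), m.group(2), m.group(3)
        anchor = f"toc{len(entries) + 1}"  # hypothetical id scheme
        title = " ".join(TAG.sub("", inner).split())  # drop inline tags, squeeze spaces
        entries.append((int(tag[1]), anchor, title))
        # Replace any existing id with ours, keep the other attributes.
        return f'<{tag} id="{anchor}"{OLD_ID.sub("", attrs)}>{inner}</{tag}>'

    body = HEADING.sub(tag_heading, doc)
    items = "\n".join(
        f'<li style="margin-left:{(level - 1) * 2}em"><a href="#{anchor}">{title}</a></li>'
        for level, anchor, title in entries
    )
    toc = f"<h1>Table of Contents</h1>\n<ul>\n{items}\n</ul>\n"
    # Insert right after the opening <body> tag, or at the very top if there is none.
    m = re.search(r"<body\b[^>]*>", body, re.I)
    pos = m.end() if m else 0
    return body[:pos] + "\n" + toc + body[pos:]


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # latin-1 maps each byte to one character, so any ASCII-compatible
        # encoding (cp1252, UTF-8, ...) passes through unchanged.
        with open(sys.argv[1], encoding="latin-1", newline="") as f:
            result = add_toc(f.read())
        with open(sys.argv[2], "w", encoding="latin-1", newline="") as f:
            f.write(result)
    else:
        print("usage: add_toc.py in.html out.html")
```

Something like `python add_toc.py dozois.html dozois_toc.html` would then produce a file with a linked contents list at the top, ready for the BBeB Binder step. Indenting lower-level headings with an inline margin keeps the output to plain HTML, with no stylesheet for a converter to lose.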

Which brings me to the whole point. HTML is without a doubt my favorite format for storing files: it has the neatness of plain text, handles formatting like RTF, keeps images in check like PDF, and doesn’t change styles on you. It’s no wonder that EPUB, the IDPF’s open format now accepted as the industry’s de facto standard, is based on an XML architecture (basically XHTML + CSS).

In ebook piracy, too, fans usually prefer to release new books in HTML. The only problem is that a multitude of HTML editors of varying quality are in use, so a huge chunk of files end up with dirty formatting.
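Much of that dirt, in files saved from MS Word, is Word’s own markup: `<o:p>` tags, `class=MsoNormal`, `mso-*` style properties and conditional comments. A minimal cleanup pass could look like the sketch below, assuming the input is Word’s “Web Page” HTML export. It handles only these common cases, uses regular expressions rather than a full HTML parser, and the script and file names are hypothetical.

```python
"""Minimal sketch: strip the most common MS Word cruft from saved-as-HTML files.

Assumptions (not from the post): the input is Word's "Web Page" HTML export,
and regular expressions are acceptable for a one-off cleanup (this is not a
full HTML parser, so unusual markup may slip through).
"""
import re
import sys

# Hidden blocks such as <!--[if gte mso 9]><xml>...</xml><![endif]-->
CONDITIONAL_BLOCK = re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.I | re.S)
# Stray markers such as <![if !supportLists]> or <!--[endif]-->; the text
# between a pair of markers is kept.
CONDITIONAL_MARK = re.compile(r"<!(?:--)?\[(?:if[^\]]*|endif)\](?:--)?>", re.I)
# Office namespace tags like <o:p></o:p>; only the tags go, not their content.
OFFICE_TAGS = re.compile(r"</?(?:o|w|v|st1):[^>]*>", re.I)
# class=MsoNormal, class="MsoListParagraph", ...
MSO_CLASS = re.compile(r"""\s+class\s*=\s*(["']?)Mso\w*\1""", re.I)
# style='...' or style="..." (Word often uses single quotes here)
STYLE_ATTR = re.compile(r"""\s+style\s*=\s*(["'])(.*?)\1""", re.I | re.S)


def _clean_style(m) -> str:
    """Keep non-Word declarations; drop the attribute if none are left."""
    quote, body = m.group(1), m.group(2)
    kept = [d.strip() for d in body.split(";")
            if d.strip() and not d.strip().lower().startswith("mso-")]
    return f" style={quote}{';'.join(kept)}{quote}" if kept else ""


def clean_word_html(doc: str) -> str:
    doc = CONDITIONAL_BLOCK.sub("", doc)
    doc = CONDITIONAL_MARK.sub("", doc)
    doc = OFFICE_TAGS.sub("", doc)
    doc = MSO_CLASS.sub("", doc)
    return STYLE_ATTR.sub(_clean_style, doc)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # latin-1 maps each byte to one character, so any ASCII-compatible
        # encoding (cp1252, UTF-8, ...) passes through unchanged.
        with open(sys.argv[1], encoding="latin-1", newline="") as f:
            cleaned = clean_word_html(f.read())
        with open(sys.argv[2], "w", encoding="latin-1", newline="") as f:
            f.write(cleaned)
    else:
        print("usage: clean_word_html.py in.html out.html")
```

The idea is to hand Book Designer or BBeB Binder a leaner starting file, for example via `python clean_word_html.py briar_king.html briar_king_clean.html`; whether that avoids the stalls and rejections described above will depend on the file.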

As for the reading device makers, I hope some thought is also going into the input formats their devices accept. It’s clear that the versatility of HTML makes it stand head and shoulders above other formats (and no DRM too, hurray!), but unless the pains of conversion are brought under control, the ebook market has very little room to grow: it will remain a playground for us enthusiasts.
