File Format Hell

I’ve had a long journey with text file formats.

First there was… uh… whatever MS Works saves files as. That was back in the Windows 3.1 era. That sucked. Then we went to Word… MS Word, and MS Word would not open MS Works files. Go figure. So there was the lovely time of opening files, ripping text from 50 pages of bad binary translation and putting it in Word.

Ok, so after Word I found WordPerfect. No, not that crappy blue screen one either. I found the joy, the wonder, the amazement of WordPerfect 8. To this day, it brings a tear to my eyes. WordPerfect opens Word, great! Absolutely nothing opens WordPerfect except WordPerfect. Not so great.

Then later I got a copy of Lotus SmartSuite. LWP files, baby. Used that for a while, mainly because of Approach. Office 2000 came out, tried that. (Yet again MS changes Word file formats. Meanwhile WPD files can be opened by any version of WordPerfect after 6…)

Then on to WordPerfect 9. Oh, how I miss thee.

There was even some StarOffice in there, and OO.o version, uh… something. (Did they do a 0.9 release? Maybe it was just 1.0…)

Ok, then I go to a Mac.

So after much crying and pouting and general sleepless nights I get MS Office. Thank the Gods that MS Office for the Mac opens MS Office for Windows files. (Yes I was worried, don’t you remember Word 6 -> Word 95 issues? But they are the same… NO THEY ARE NOT!)

But.

Yup, what the hell do I do with SWF, LWP, WPD, WKS (I think), and various Quattro Pro and 1-2-3 files (we don’t even think about databases, we learned that lesson from Access 2000)??

Well, again, there was crying and screaming.

Out comes NeoOffice (OpenOffice.org for the Mac). Along with it comes ODT. Sounds great, sounds wonderful, sounds like another file format… And I realized that for 99% of my stuff I don’t need all that crap. (It is really just crap.) What I need has been in front of me the whole time.

RTF

So that is my new fixation, moving everything to RTF.

Now now, you wait, Mr ODT, I ain’t saying nothing bad about you. In a way you are the next hope, possibly the next RTF. As an open standard there is hope that ODT will find itself on any platform out there. That would rock. Then you could get all the crap in there too. (and spreadsheets would be nice, I like spreadsheets.)

But for now, RTF (and maybe even TXT as back-ups) will do just fine. The goal is the future. A future where I don’t have to find a PC to install Lotus on to get a file that I swore I had backed up somewhere else in WPD format, which Neo opens now, but I didn’t, so NOW I gotta find one.

And do you know how hard it is to find a PC in a Mac house? Sheesh.

BitTorrent File System

This is a paper to discuss an idea for a distributed cloud computing system. This system would distribute and hold its data on nodes that are not co-located.

From the front end, the system would be identical to any other cloud computing solution. The user could make calls to retrieve or store data from the web (or other applications).

On the back end, the system would differ from a standard cloud system in that instead of being an array of centralized servers, the system would use a P2P method of distributing the data and loads throughout itself.

Let’s take the example of incoming data.

Data is sent to the cloud. This data is then processed by the RAID software and divided as required (RAID 5, 7, etc.). The distributed part then takes each piece of that RAID split; call them bits (not those bits) for this paper.
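To make the split concrete, here is a minimal sketch of the idea, assuming a simple single-parity scheme in the spirit of RAID 5 (a real RAID 5 rotates parity across disks, which is skipped here). The function names `split_with_parity`, `rebuild_missing` and `join` are made up for this paper and are not part of any RAID software.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size pieces plus one XOR parity piece."""
    size = -(-len(data) // k)                # ceiling division
    padded = data.ljust(size * k, b"\0")     # pad so every piece is the same size
    pieces = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, pieces)
    return pieces + [parity]


def rebuild_missing(pieces: List[Optional[bytes]]) -> List[bytes]:
    """Recover at most one lost piece (marked None) by XOR-ing the others."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost piece")
    repaired = list(pieces)
    if missing:
        present = [p for p in pieces if p is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired


def join(pieces: List[bytes], original_length: int) -> bytes:
    """Drop the parity piece and the padding to get the original data back."""
    return b"".join(pieces[:-1])[:original_length]
```

Each element of the list returned by `split_with_parity` is one "bit" in the sense used in this paper, and losing any single one of them is survivable.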

Now each bit will be distributed to multiple clients via a BitTorrent-like P2P system. One bit would then be copied onto 2, 4, or 8 nodes (however many are deemed necessary for reliability).
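One way to decide which nodes get the copies is sketched below. It is an assumption of mine, not how BitTorrent itself works: it uses rendezvous-style hashing, where every node is ranked by a hash of the bit's ID combined with the node's name, and the first few nodes in that ranking hold the copies. The function name `place_replicas` and the node names are hypothetical.

```python
import hashlib
from typing import List


def place_replicas(piece_id: str, nodes: List[str], copies: int) -> List[str]:
    """Pick `copies` distinct nodes for one bit, deterministically."""
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")

    def rank(node: str) -> str:
        # Every (bit, node) pair gets its own pseudo-random rank.
        return hashlib.sha256(f"{piece_id}:{node}".encode()).hexdigest()

    return sorted(nodes, key=rank)[:copies]
```

Because the ranking depends only on the IDs, anyone who knows the node list can work out where a bit should live without asking a central index, and a node dropping out only moves the copies that were on that node.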

Then when the data is called, the RAID will call the data just as it normally would. However, the system will retrieve the data in this BitTorrent fashion by calling it from the available nodes. Nodes that have dropped out or are slower will be tagged and recorded, and the data will be pulled from the better sources.

Once assembled by the BitTorrent layer, the bit will be presented back to the RAID as desired. The RAID will assemble the file as required and present it back to the web request.
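The retrieval step could look something like the sketch below. Everything in it is hypothetical: the `Node` class is an in-memory stand-in for a real network client, and `fetch_piece` is a name invented for this paper. The one idea borrowed from BitTorrent is checking each downloaded piece against a known hash before trusting it (SHA-256 is used here as an assumption).

```python
import hashlib
from typing import Dict, List


class Node:
    """In-memory stand-in for a remote node (an assumption for this sketch)."""

    def __init__(self, name: str, latency_ms: float, online: bool = True):
        self.name = name
        self.latency_ms = latency_ms
        self.online = online
        self.store: Dict[str, bytes] = {}

    def get(self, piece_id: str) -> bytes:
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        return self.store[piece_id]


def fetch_piece(piece_id: str, expected_sha256: str,
                nodes: List[Node], failures: List[str]) -> bytes:
    """Try the fastest nodes first; record every node that fails us."""
    for node in sorted(nodes, key=lambda n: n.latency_ms):
        try:
            data = node.get(piece_id)
        except (ConnectionError, KeyError):
            failures.append(node.name)       # dropped out or lost the bit
            continue
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        failures.append(node.name)           # copy is corrupt
    raise LookupError(f"no healthy copy of {piece_id} found")
```

The `failures` list is the "tagged and recorded" part: it is the raw material for deciding which nodes to trust less next time.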

As stated, the nodes themselves will be tagged and their performance recorded. Highly reliable nodes will be called on more frequently than less reliable ones. The system would ensure that a certain percentage of each bit's copies lived on higher-reliability nodes rather than on lower-reliability ones. The BitTorrent client would use this data to shift bits onto different nodes, repopulating data as required.

Lower reliability nodes are not completely useless. They can be used to help with this repopulation as well as for storage of less frequently requested data. This logic would be based on available cloud space, amount of traffic, even peak times and peak availability.
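A rough sketch of how that scoring and placement rule might work is below. The exponential moving average, the 0.8 threshold, and the names `ReliabilityTracker` and `choose_nodes` are all assumptions made for illustration; the paper does not commit to any particular formula.

```python
from typing import Dict, List


class ReliabilityTracker:
    """Keeps a running 0..1 score per node from success/failure reports."""

    def __init__(self, alpha: float = 0.2, initial: float = 0.5):
        self.alpha = alpha        # how quickly new results outweigh old ones
        self.initial = initial    # score assumed for a node we know nothing about
        self.scores: Dict[str, float] = {}

    def record(self, node: str, success: bool) -> None:
        old = self.scores.get(node, self.initial)
        sample = 1.0 if success else 0.0
        self.scores[node] = (1 - self.alpha) * old + self.alpha * sample

    def score(self, node: str) -> float:
        return self.scores.get(node, self.initial)


def choose_nodes(tracker: ReliabilityTracker, nodes: List[str], copies: int,
                 min_reliable: int, threshold: float = 0.8) -> List[str]:
    """Put at least `min_reliable` copies on nodes scoring >= threshold."""
    if min_reliable > copies:
        raise ValueError("min_reliable cannot exceed copies")
    ranked = sorted(nodes, key=tracker.score, reverse=True)
    reliable = [n for n in ranked if tracker.score(n) >= threshold]
    others = [n for n in ranked if tracker.score(n) < threshold]
    if len(reliable) < min_reliable:
        raise ValueError("not enough high-reliability nodes")
    # Guaranteed reliable copies first, then let weaker nodes carry the rest.
    chosen = reliable[:min_reliable] + others[:copies - min_reliable]
    # Top up from spare reliable nodes if the weaker pool ran short.
    chosen += reliable[min_reliable:][:copies - len(chosen)]
    if len(chosen) < copies:
        raise ValueError("not enough nodes overall")
    return chosen
```

Filling the remaining copies from the weaker pool is a deliberate choice here: it keeps the less reliable nodes doing useful work instead of sitting idle.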

Now the question is: Why? Why divide this system up into all of these nodes and introduce another step in the process over the current system? The answer is that this is not the current system. This is where the ‘distributed’ part comes in.

For the nodes we create a client program. This is installed on a computer and allows for configuration of amount of space, location, etc. Then this computer is now a node. So instead of setting up a huge server farm, node software can be installed on multiple computers spanning anywhere there is an internet connection.

Imagine if all the computers in a college computer lab donated just a single gig of space to being a node. Then maybe all of the computers in an Apple store or a Best Buy. A PS3 or Xbox client could be made to contribute. Even an iPhone or BlackBerry client offering as little as 10 MB of space could be used.

This would significantly reduce outside influences on data availability. Things like power outages, natural disasters, even high traffic load due to sporting events could be worked around by spreading the data geographically both on an individual node level and on a node collection level.
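Spreading copies geographically could be as simple as the sketch below, which assumes each node reports a region label when it signs up. The function name `spread_across_regions` and the region labels are hypothetical. It takes one node from each region in turn before using a second node from any region.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, List


def spread_across_regions(nodes: Dict[str, str], copies: int) -> List[str]:
    """Pick `copies` nodes, touching as many different regions as possible.

    `nodes` maps a node name to its region label.
    """
    by_region = defaultdict(list)
    for name, region in sorted(nodes.items()):
        by_region[region].append(name)
    groups = [by_region[region] for region in sorted(by_region)]
    # Interleave: first node of every region, then second of every region, ...
    interleaved = [name
                   for round_ in zip_longest(*groups)
                   for name in round_
                   if name is not None]
    if copies > len(interleaved):
        raise ValueError("not enough nodes for the requested number of copies")
    return interleaved[:copies]
```

With copies of a bit in three different regions, a power outage or a storm in one of them still leaves two copies reachable.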

The space would be tallied, prioritized by various parameters and prepared for use.

Security will be a concern of people using this. How do I know that my data is safe out there on someone’s personal PC? First, only a small part of any one file would be located on any one node. The next upstream information available to the node's owner is the BitTorrent client, which only knows where other copies of the bits are. That person would have to go up an additional step into the RAID and then back down through the BitTorrent layer to find a usable chunk of any one file.

This is the same argument for the security of the node provider as well. For example, should a pirated movie be uploaded to the cloud, the nodes themselves would get parts so small that none of them would have enough for a reasonable argument that it was known to be there.

Extra security could be imposed by making the node into an encrypted image. This would further ensure the security, but may have a negative impact on the speed of the node. This would need to be investigated.

This distributed cloud computing allows for a more robust system by decentralizing the hardware as well as allowing for expandability beyond boundaries such as building size and electrical power. It would take the one last part of open source cloud computing, the cloud itself, and allow it to be open as well.

A system such as this could be used on a grand scale, one large cloud, or in smaller forms, several small clouds that could be specialized. Just like BitTorrent itself, there could be multiple gateways (torrent trackers, or cloud cars?) into the cloud.

Life After the Word Processor?

I’ve been using NeoOffice for the Mac since its 0.0.1 stage. My word processor needs were pretty simple: WordPerfect. Since that wasn’t possible, being on a Mac and all, I went for a different approach: Free.

NeoOffice, which is a version of OpenOffice.org with a whole bunch of Mac-awesome packed in it, has come a long long way since those first experimental patches that allowed it to do things like print. And in that time when I wanted to write anything, stories, newsletters, posts, notes and ideas, I would fire up Neo, write it down and save it.

So what happened? I have all of these files on my computer. A single book idea can take folders within folders, files upon files. Character sketches, outlines, scene ideas, background stories, and of course the work itself.

I started looking into other ways of storing information. For my first try I had some basic criteria: portable, cross platform, easy to use. First thought was a Wiki. I set up MediaWiki on one of my sites. This, however, created the need for the internet. So I threw in another requirement: offline.

I found a wiki-on-a-stick called TiddlyWiki. A single HTML file you store on your thumb drive, your Dropbox, anywhere you want basically, it lets you do Wiki-ness and Journal-ness. I used this for ideas, characters, research (I think half is just Wikipedia links) and occasional scene writing. This was my scrap paper, my non-linear notebook. One day I’ll show it off.

Later I participated in some Mac software bundle. I believe it was Mac Heist 2, but I could be wrong. It came with a program called Mac Journal, which I have blogged about here as my new ‘toy’. It hooked up to this blog, downloading my content, and letting me upload from it.

I started using it for a notebook, weaning off of the TiddlyWiki slowly. It was Mac only, so I still had that portable itch, but it was good for notes and research for sure. Without the Wiki-ness it didn’t have the internal links (like linking a character’s name from an idea to the page of his sketch), but it allowed for more robust entries. TiddlyWiki was a text file only. Mac Journal allowed for images and video as well. Along with some Mac-awesome.

Months later I am only kinda sold. It is a great tool to store information. I use it for school, recipes, and general scrap paper. But for writing? When I open that TiddlyWiki to look something up, it still FEELS more useful.

One thing I am trying to avoid is having TOO many note taking programs. I did try Evernote, which helped with the portable problem, but its client doesn’t hook up to WordPress. There were a few others, but in the end I ditched them all, not because they were bad, but because I was spreading myself too thin. Why have files in Google Docs, TiddlyWiki, MediaWiki, Dropbox, hard drives, thumb drives, saved on my iPod, on my phone… see where this is going? Soon you can’t find anything, which is way worse than the inconvenience of 50 files per story.

As it stands now, I still use NeoOffice to write my stories. (Next post is about that.) But for notes, outlining, etc., I currently have Mac Journal, which is a fantastic program btw, and TiddlyWiki. I think as long as I have the XO, the TiddlyWiki will stick around. In the end, quick and cross platform is just too good to give up.