Bits n' bytes




Posted by: stak
Posted on: 2007-12-25 16:32:46

Been almost a month since my last post, so it's probably time for another update...

Life is generally good, although I have a cold now, which is kind of annoying. On the other hand, it was nice enough to wait until vacations started before striking, so I can't really complain. Probably one of the most convenient times to have a cold, actually, since I'm mostly just sitting around at home anyway.

(Inline comments; post continues below.)

Posted by anonymous at 2007-12-25 16:57:51
You are missing the turkey. :P

A few days ago I saw this post on Slashdot. Curious, I took a look at the puzzle and ended up spending most of the night working on it with various random people on the net. It was pretty fun. One of the clues hidden fairly early in the puzzle was a reference to a Google group (this one), so people who were working on the puzzle used it to communicate and collaborate (which was the point of the whole exercise).

I went to bed at around 3am after being unable to figure out the final clue; by the time I woke up somebody had figured it out and solved the challenge. Turns out it was put together by N-BRAIN, a software development company, to promote their new collaborative development environment coming out in January. The software itself doesn't sound overly promising (to me at least) but still, the challenge was pretty fun.

One of the guys who participated blogged about it (here and a more detailed analysis here if you're interested). For me, it was just a lot of fun because it combined: (a) solving puzzles, (b) hacking up one-off scripts and programs on the fly to solve a particular sub-piece of the puzzle, or to try and analyze data, and (c) the thrill of trying to be the first to figure out the solution. With that kind of cocktail, how can you go wrong?


In more recent news, I've been wanting to eventually move away from GMail. It's pretty hard to do just because GMail is so damn nice to use, but I think in the long run it's probably for the best. Anyway, now that GMail supports IMAP in addition to their broken POP3, downloading the contents of your GMail account is a lot easier. I wrote a quick PHP script (since my local PHP install includes the IMAP module) to download all my email and save it locally.
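For anyone who wants to try the same thing, here is a minimal sketch of that kind of download loop. It is not the PHP script mentioned above; it uses Python's standard imaplib instead, and the host name, credentials, mailbox name, and output directory in main() are placeholders or assumptions on my part.

```python
import imaplib
from pathlib import Path


def download_mailbox(conn, mailbox, out_dir):
    """Save every message in `mailbox` as a numbered .eml file; return the count."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Open read-only so downloading does not mark anything as read.
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot select {mailbox!r}")
    status, data = conn.search(None, "ALL")
    saved = 0
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        # A successful fetch puts a (header, body) tuple first in the list.
        if status != "OK" or not parts or not isinstance(parts[0], tuple):
            continue
        (out_dir / f"{num.decode()}.eml").write_bytes(parts[0][1])
        saved += 1
    return saved


def main():
    # All values below are placeholders, not taken from the post.
    conn = imaplib.IMAP4_SSL("imap.gmail.com")
    conn.login("user@example.com", "password")
    print(download_mailbox(conn, "INBOX", "mail-backup"))
    conn.logout()
```

main() is deliberately not called here; run it by hand once the placeholders are filled in.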

It actually downloaded my chat logs too, which was nice, but the format of the chat logs is really weird. It seems that chats are broken up into fairly small chunks and stored as separate messages, potentially with a different encoding. That's not too hard to piece back together, but what's really annoying is that the chat data is in HTML and the lines are reverse-ordered in the message (most recent at the top of the message). That's just plain weird. If that's actually how the data is stored in GMail internally, then the GMail backend must be doing a lot of work to piece the chat back together when you view it in the GMail interface. And if that's NOT how it's stored internally, then the IMAP engine must be doing a lot of work to generate that. Makes no sense either way.
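Piecing the chunks back together could look roughly like the sketch below. It assumes the HTML has already been stripped, that each chunk arrives as raw bytes plus the name of its character set, that the chunks themselves are in chronological order, and that each chunk has one chat line per text line, newest first. The function name and the input shape are made up for illustration.

```python
def reassemble(chunks):
    """Return chat lines oldest first.

    chunks: list of (payload_bytes, charset) pairs, oldest chunk first.
    Inside each payload the lines are newest first, one per text line.
    """
    lines = []
    for payload, charset in chunks:
        # Each chunk may use its own encoding, so decode them one at a time.
        text = payload.decode(charset)
        # Flip the lines within the chunk back into chronological order.
        lines.extend(reversed(text.splitlines()))
    return lines
```

Decoding per chunk rather than once at the end is the point here: if two chunks really do use different encodings, joining the bytes first would garble one of them.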

Anyway, I had roughly 19,800 messages to download (including received mail, sent mail, and chat logs) weighing in at about 740 MB. The GMail interface reports that I'm using "685 MB (11%) of [my] 6029 MB." I wonder if the mismatch is due to not counting email headers or something. If so, there might be a way there to trick GMail and store a lot more data than it lets you :) On the other hand, the discrepancy might just be due to attachments being base64-encoded or something. Meh.
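A quick back-of-the-envelope check on the base64 theory: base64 turns every 3 bytes into 4 characters, so an attachment grows by about a third on the wire (slightly more once MIME line breaks are counted). If the whole 55 MB gap came from that, and if GMail's figure counts attachments at their decoded size (that part is purely my assumption), it would take about 165 MB of raw attachments to explain it. The sketch below just does that arithmetic; the function names are made up.

```python
import base64


def mime_base64_ratio(n_bytes=57000):
    """Measured size ratio of MIME-style base64 (wrapped lines) to raw data."""
    raw = bytes(n_bytes)
    return len(base64.encodebytes(raw)) / len(raw)


def attachment_mb_needed(reported_mb, downloaded_mb, overhead=1 / 3):
    """Raw attachment size that would explain the gap, if base64 is the only cause.

    overhead is the extra size per raw byte; 1/3 ignores the line breaks.
    """
    return (downloaded_mb - reported_mb) / overhead


ratio = mime_base64_ratio()
needed = attachment_mb_needed(685, 740)
```

Having roughly a quarter of the mailbox be attachments sounds plausible, so the base64 explanation at least fits the numbers, though it doesn't rule out the header theory.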

More interesting was that on two of the 19826 messages, GMail's IMAP server seemed to choke and return 0 bytes. Not only that, but all messages after that were 0 bytes too (I had to skip past them in order to get the rest). It seemed like the worker processing my requests died on those messages. I was able to fetch the headers for the messages and figure out which messages they were, and then opened them up in GMail's web interface. Both of them had attachments that seemed to be giving their virus scanner difficulty (the message displayed above the attachments was "Oops... the virus scanner has a problem right now. Download at your own risk, or try again later.") I'm assuming their buggy virus scanner was scanning all the emails before the IMAP server was allowed to push them out to me, and when the virus scanner died on the attachment, that killed the IMAP process too. Unfortunately, I haven't been able to figure out a way to get the original email data for those two messages yet (the "Show original" function doesn't work for those messages either).
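One way to automate the skipping is sketched below. The idea is to treat an empty body as a sign that the session is wedged, note the message number, open a fresh connection, and carry on with the next message. Reconnecting is my assumption about what gets things moving again, and `connect` is a hypothetical helper you would supply yourself (it should log in and select the mailbox).

```python
def fetch_all(connect, numbers):
    """Fetch each message number; return (bodies, failed).

    connect: zero-argument callable returning a logged-in imaplib-style
             connection with the mailbox already selected (hypothetical).
    numbers: iterable of message sequence numbers.
    """
    conn = connect()
    bodies, failed = {}, []
    for num in numbers:
        status, parts = conn.fetch(str(num), "(RFC822)")
        body = parts[0][1] if parts and isinstance(parts[0], tuple) else b""
        if status != "OK" or not body:
            # Zero bytes back: record the message and assume the session is
            # dead, so later fetches do not come back empty as well.
            failed.append(num)
            conn = connect()
            continue
        bodies[num] = body
    return bodies, failed
```

The `failed` list is what you would then chase down by hand, for example by fetching just the headers to work out which messages they are. The sketch does not catch exceptions raised by a dropped connection.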

(Inline comments; post continues below.)

Posted by stak at 2007-12-25 17:29:59
Oh boy. The chat log thing just got a little weirder. Turns out every so often the GMail server compacts the chat logs so that they're not all over the place. Unfortunately, it does so in the wrong order. So now a chat that chronologically looks like this:

me: hi
you: hi
me: one
....
me: thirty-one
me: thirty-two
....
me: one hundred!
me: bye
you: bye


might get split like so:

message 1:
me: thirty-one
...
me: one
you: hi
me: hi


message 2:
you: bye
me: bye
me: one hundred!
...
me: thirty-two


and then recombined by the compacter into this:

me: thirty-one
...
me: one
you: hi
me: hi
you: bye
me: bye
me: one hundred!
...
me: thirty-two


And at this point, there are no embedded timestamps in the messages, so it's literally impossible to put back together in chronological order. And of course, there's no way for an average user to actually tell anybody at Google about this bug. (Oy, those of you reading this who work at Google, please file a bug report or something!)
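Here is a toy model of the problem, in case the example above is hard to follow. It is only a sketch of the behaviour as I understand it (chunks joined oldest first, with the lines inside each chunk left newest first), not anything known about how GMail actually works, and the function names are made up.

```python
def restore(chunks):
    """Rebuild chronological order from separate newest-first chunks."""
    return [line for chunk in chunks for line in reversed(chunk)]


def compact(chunks):
    """Model of the compaction: join chunks without un-reversing their lines."""
    return [line for chunk in chunks for line in chunk]


original = ["me: hi", "you: hi", "me: one", "me: two", "me: bye", "you: bye"]
# Two stored chunks, each newest first, split after the third line.
chunks = [original[:3][::-1], original[3:][::-1]]
merged = compact(chunks)

separate_ok = restore(chunks) == original          # chunks still separate
merged_as_is_ok = merged == original               # reading the merged message as-is
merged_reversed_ok = merged[::-1] == original      # reversing the merged message
# Only works if you somehow still know where the first chunk ended.
with_boundary_ok = restore([merged[:3], merged[3:]]) == original
```

In this model the thing that gets thrown away by the merge is the position of the chunk boundary, and with no timestamps in the lines there is nothing left to recover it from.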
Posted by varun at 2007-12-25 21:09:38
Huh. I hadn't noticed this until you mentioned it. Now that you do, I see the problem. Oddly, the Chats tag appears correctly online, so I wonder why this happens.
Posted by Ben at 2007-12-28 12:12:24
done.

Anyway, that concludes this edition of bits n' bytes.


 
 
(c) Kartikaya Gupta, 2004-2025. User comments owned by their respective posters. All rights reserved.