Blog




Screwing up - 2010-01-02 20:05:19

Stolen from Slashdot: The Neuroscience of Screwing Up.

It's an excellent article based on observations of scientists at work. I was particularly interested in the DLPFC as described in the article, since I can't see any benefit from having it. Evolutionarily speaking, it should have been eliminated a while ago. And I totally agree with the article that explaining your thoughts and/or problems to somebody else will often directly help you find a solution. I've often started a blog entry about some thought I had, only to realize halfway through that it didn't really make as much sense as I thought it did. Those blog entries never see the light of day, but they do help me filter out some of my misconceptions before they become too deeply ingrained. Add that to the list of reasons why blogging is useful (provided you actually blog about thoughts and ideas, rather than just journaling what you did).

[ 2 Comments... ]

An ode to architecture - 2009-12-23 20:29:15

I was reading this article about MinWin and what they're trying to accomplish. I hadn't really understood (or cared) what it was before, but after reading that article I gotta say, I'm really glad that they're doing it. I can feel their pain in trying to untangle large piles of old code, because I've been doing much the same thing at work for the past little while.

There's code that's been lying around for 8-9 years, predating anybody on my team. There are some fundamental differences between code like this and "new" code. For one, there's nobody around who still understands what the code does, and obviously any documentation is so out of date that its only purpose is to mislead you. The only way to understand it is by examining the code itself - you have to read it, poke at it, and break it.

I believe that in pretty much any real-world production system, built under real-world constraints, there will be some code like this. In order to maintain it, or even to rewrite it, being able to read and understand code is a fundamental skill. I think of this as another argument against writing documentation. If developers need to acquire the skill of reading code anyway, then you might as well use that skill on new code as well. This makes the documentation redundant.

Of course, that's not all. Usually when reading new code, variable names and classes do have some relationship to the concepts they represent. The older the code is, the weaker that relationship becomes, because the concepts get shifted and skewed whereas the names do not. As far as I can tell the main reason this happens is because it's just a chore to rename things, particularly in systems where the code has been branched into different versions. Integrating fixes after you've renamed things (particularly in Java, where you have to rename the file if you rename the class) is a major pain with no tangible benefit.

I feel the blame here lies mostly on revision control systems. Renaming a file in, say, Bazaar is trivial compared to the same operation in Perforce. An even better revision control system (which incidentally is also my solution to the subjective readability problem) would store the parsed syntax tree of the code rather than a flat text file so that operations like variable renaming could be tracked as a single change and integrated into branches trivially. Such an RCS would also have a long list of other advantages, but I'm not going to get into that until I start writing one :)
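To make that concrete, here is a hypothetical sketch (all names invented) of what tracking a rename as a single semantic operation might look like: if identifiers were stored symbolically rather than as flat text, the same one-line change could be replayed on any branch regardless of how the surrounding text has diverged.

```java
import java.util.*;

// Hypothetical sketch: an RCS that stores identifiers symbolically can record
// a rename as one operation ("symbol #id is now called newName") that applies
// cleanly to any branch, instead of as a text diff touching every usage site.
public class RenameSketch {
    record Rename(int symbolId, String newName) {}

    // A trivially simplified "syntax tree": code refers to symbols by id,
    // and a single table maps symbol ids to their current names.
    static Map<Integer, String> applyRename(Map<Integer, String> symbols, Rename r) {
        Map<Integer, String> out = new HashMap<>(symbols);
        out.put(r.symbolId(), r.newName());
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> branch = Map.of(1, "cnt", 2, "total");
        // The same Rename object can be replayed on any branch as one change.
        Rename fix = new Rename(1, "lineCount");
        System.out.println(applyRename(branch, fix).get(1)); // lineCount
    }
}
```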

Another thing mentioned in the MinWin article is how "countless spaghetti strands extend outwards from the core of Windows to the layers higher up in Windows" - this is basically the programmer's version of dependency hell. When you have dependencies running amok between different parts of the code, everything gets really bad really fast. It becomes easy to end up with circular dependencies - to solve that you either end up compiling both pieces of code as a unit so the compiler can deal with it, or changing one of the dependencies to be some sort of runtime/reflection thing, which makes the code an order of magnitude harder to follow. Code like this is also (by definition) not very modular, and so is hard to unit-test.

When we were writing Mango, one of the design principles we enforced was that even though all the code was compiled together as a unit for production use, the packages in the code were arranged in a DAG. This allowed us to build -- and more importantly, test -- subsets of the rendering engine with each layer adding more functionality to the previous subset. The MinWin team seems to be realizing similar benefits in being able to build standalone subsets of Windows for different purposes. In my subjective opinion, of all the design decisions we made, this was probably the single most useful one. Without it, the code would have collapsed in on itself and become an unmaintainable mess within a year, given the rate at which we were churning out code.
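A minimal sketch of the kind of constraint this implies (the package names here are made up): model packages as nodes and dependencies as edges, and fail the build if the graph contains a cycle, i.e. if it is not a DAG.

```java
import java.util.*;

// Sketch: verify that a package dependency graph is a DAG using a
// depth-first search that flags back edges (a back edge means a cycle).
public class DagCheck {
    static boolean isDag(Map<String, List<String>> deps) {
        Map<String, Integer> state = new HashMap<>(); // 0 absent, 1 in progress, 2 done
        for (String node : deps.keySet()) {
            if (hasCycle(node, deps, state)) return false;
        }
        return true;
    }

    static boolean hasCycle(String node, Map<String, List<String>> deps,
                            Map<String, Integer> state) {
        Integer s = state.get(node);
        if (s != null) return s == 1; // revisiting an in-progress node: cycle
        state.put(node, 1);
        for (String dep : deps.getOrDefault(node, List.of())) {
            if (hasCycle(dep, deps, state)) return true;
        }
        state.put(node, 2);
        return false;
    }

    public static void main(String[] args) {
        // Layered engine: each layer depends only on the layers below it.
        Map<String, List<String>> layered = Map.of(
            "render", List.of("layout"),
            "layout", List.of("dom"),
            "dom", List.of());
        System.out.println(isDag(layered)); // true

        // One upward dependency (dom -> render) and the layering is gone.
        Map<String, List<String>> tangled = Map.of(
            "render", List.of("layout"),
            "layout", List.of("dom"),
            "dom", List.of("render"));
        System.out.println(isDag(tangled)); // false
    }
}
```

With a check like this wired into the build, the tangled version is rejected before anyone can depend on the cycle.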

Enforcing that design decision from the start was key, though. As I'm discovering with my current refactoring efforts, it is extremely difficult to handle code that was developed without that sort of modularity. Coercing the code into a more elegant design requires several passes of refactoring and lots of time. I'm just thankful I have a smaller codebase to deal with than the tar pit that is Windows.

[ 0 Comments... ]

EEP - 2009-12-16 11:18:01

Problem: social networking cloud services (e.g. Facebook) have to rely on advertising for revenue. This obviously annoys users and isn't very reliable.

Problem: people spend a lot of time at work on these social networking sites, in some cases prompting their employers to block access to said sites. This results in lowered employee morale (although arguably better productivity).

Problem: larger companies attempt to replicate the service in-house. Invariably, communication and collaboration software developed specifically for enterprises is crap. In addition, employees need to maintain two different social network accounts with overlapping functions.

Solution: the Enterprise Enhancement Proxy (EEP). A proxy server that is purchased by the enterprise from the cloud operator and placed inside the enterprise firewall. This proxy serves a dual purpose.

(1) It serves as a proxy for all communication to the public cloud, allowing the enterprise network administrators to filter or block certain types of traffic at a more granular level than currently possible (e.g. you probably do not need to publish Facebook videos while at work, although status updates might be acceptable).

(2) It interfaces with an enterprise-local database to seamlessly (or maybe seamfully) integrate public and enterprise-specific social data. The resulting view would include social data from both public and enterprise networks, and any interactions with the enterprise data would remain within the enterprise database itself.

So, for example, let's say Twitter develops an EEP and it is deployed at some corporation. When employees on the corporate network go to see their Twitter feed on twitter.com (or any app that accesses the Twitter API), the EEP will intercept the request and mix tweets from their co-workers into the resulting view. Any replies from an employee to a co-worker will get intercepted by the EEP and saved to the enterprise database, and will not be visible on the public internet.
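A toy sketch of that merge-and-intercept logic (all names and types here are hypothetical; a real EEP would operate at the HTTP/API layer rather than on in-memory lists):

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch of the EEP's two jobs: splice enterprise-local posts
// into the public feed on the way in, and divert replies addressed to
// co-workers into the enterprise store on the way out.
public class EepSketch {
    record Post(String author, String text, long time) {}

    static Set<String> coworkers = Set.of("alice", "bob");
    static List<Post> enterpriseStore = new ArrayList<>();

    // Inbound: merge public and enterprise feeds, newest first.
    static List<Post> mergeFeeds(List<Post> publicFeed, List<Post> enterpriseFeed) {
        return Stream.concat(publicFeed.stream(), enterpriseFeed.stream())
                .sorted(Comparator.comparingLong(Post::time).reversed())
                .collect(Collectors.toList());
    }

    // Outbound: a reply to a co-worker never reaches the public network.
    // Returns the post to forward publicly, or null if it was kept internal.
    static Post intercept(Post outgoing, String recipient) {
        if (coworkers.contains(recipient)) {
            enterpriseStore.add(outgoing);
            return null;
        }
        return outgoing;
    }

    public static void main(String[] args) {
        List<Post> pub = List.of(new Post("celebrity", "hello world", 100));
        List<Post> ent = List.of(new Post("alice", "shipping friday", 200));
        System.out.println(mergeFeeds(pub, ent).get(0).author()); // alice

        Post reply = new Post("me", "@alice sounds good", 300);
        System.out.println(intercept(reply, "alice") == null); // true
        System.out.println(enterpriseStore.size()); // 1
    }
}
```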

This solution allows enterprises to retain full control over their intellectual property while also taking full advantage of the services provided by public social networks. It also provides the social networks with a more reliable revenue stream. The users benefit too by not having to maintain separate social network accounts for public and corporate use. Everybody's a winner!

[ 8 Comments... ]

Gmail downloader - 2009-10-15 23:04:13

If anybody else would like to download the contents of their Gmailbox into nice little RFC822 message files, here's a quick-and-dirty (read: no error detection or handling) Java program to do it. Requires Java 1.6 for the console stuff, but can be ported to 1.5 or 1.4 in a pinch.

import java.util.*;
import java.io.*;
import java.net.*;
import javax.net.*;
import javax.net.ssl.*;

public class GmailDownloader {
    public static void main( String[] args ) throws Exception {
        int start = 1;
        int end = -1;
        if (args.length > 0) {
            start = Integer.parseInt( args[0] );
            if (args.length > 1) {
                end = Integer.parseInt( args[1] );
            }
        }

        Console console = System.console();
        String username = console.readLine( "Enter username: " );
        String password = new String( console.readPassword( "Enter password: " ) );

        SocketFactory sf = SSLSocketFactory.getDefault();
        Socket socket = sf.createSocket( "imap.gmail.com", 993 );
        InputStream in = socket.getInputStream();
        BufferedReader br = new BufferedReader( new InputStreamReader( in ) );

        OutputStream out = socket.getOutputStream();
        PrintWriter pw = new PrintWriter( out );
        br.readLine(); // consume the server greeting line

        pw.print( "A LOGIN " + username + " " + password + "\r\n" );
        pw.flush();
        br.readLine();

        pw.print( "B SELECT \"[Gmail]/All Mail\"\r\n" );
        pw.flush();
        for (int i = 0; i < 3; i++) br.readLine();
        String numMsgs = br.readLine();
        for (int i = 0; i < 3; i++) br.readLine();
        if (end < 0) {
            StringTokenizer st = new StringTokenizer( numMsgs );
            st.nextToken();
            end = Integer.parseInt( st.nextToken() );
            System.out.println( "Found " + end + " messages" );
        }

        System.out.println( "Downloading messages from [" + start + "] to [" + end + "]" );
        for (int i = start; i <= end; i++) {
            System.out.println( "Downloading message " + i );
            pw.print( "ZZ FETCH " + i + " RFC822\r\n" );
            pw.flush();
            br.readLine();
            String fn = i + ".msg";
            while (fn.length() < 12) {
                fn = "0" + fn;
            }
            PrintWriter file = new PrintWriter( new File( fn ) );
            outer: while (true) {
                String s = br.readLine();
                while (s.endsWith( ")" )) {
                    String t = br.readLine();
                    if (t.startsWith( "ZZ OK" )) {
                        if (s.length() > 1) {
                            file.println( s.substring( 0, s.length() - 1 ) );
                        }
                        break outer;
                    }
                    file.println( s );
                    s = t;
                }
                file.println( s );
            }
            file.close();
        }

        pw.print( "C LOGOUT\r\n" );
        pw.flush();

        in.close();
        out.close();
        socket.close();
    }
}


The above code is in the public domain, so feel free to do whatever with it. To use:

javac GmailDownloader.java
java GmailDownloader


If it dies partway through and you want to resume, just add the message number you want to start at as a parameter:

java GmailDownloader 42

[ 7 Comments... ]

OpenBSD install help - 2009-10-13 23:09:58

For anybody else out there who's stuck on trying to get OpenBSD booting on a macppc machine (mine is a 12" Powerbook specifically), the magic that is missing from the OpenBSD install manual/instructions is as follows:

After installing OpenBSD, when you power up your system, hold alt+command+o+f to boot into Open Firmware. Then, at the prompt, type "boot hd:,ofwboot bsd" and hit enter. If this successfully results in OpenBSD booting up, you can make this the default behavior by booting back into Open Firmware and typing "setenv boot-device hd:,ofwboot" and "setenv boot-file bsd" at the prompt.

Note that the above worked for me when I installed OpenBSD to take up the whole disk; I don't know what the procedure is if you're dual-booting with Mac OS or any other OS.

[ 0 Comments... ]

Compromise - 2009-08-11 22:26:27

Schneier's post on self-enforcing protocols reminded me of a similar scheme I thought of a while ago for negotiating the price of something when there's a single buyer and seller. In order to get the best price for both parties, the buyer should write down the maximum price he is willing to pay. The seller should write down the minimum price he is willing to accept. If the buyer's number is greater than or equal to the seller's number, then the price is the average of the two numbers. If the buyer's number is smaller, then there's no deal. The key is that this process happens only once, so neither side has a chance to cheat and adjust their number later. If they want the deal to actually happen then they should be honest about their max/min bids, and that automatically results in the fairest price for both parties.
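The scheme above is simple enough to state in a few lines of code; here is a sketch (names and the -1 "no deal" convention are my own):

```java
// Sketch of the single-shot pricing protocol: each side commits to one
// number, and the deal price (if there is a deal) is the midpoint.
public class SealedBid {
    // Returns the agreed price, or -1 if buyerMax < sellerMin (no deal).
    static double settle(double buyerMax, double sellerMin) {
        if (buyerMax < sellerMin) return -1;
        return (buyerMax + sellerMin) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(settle(120, 100)); // 110.0 : deal at the midpoint
        System.out.println(settle(90, 100));  // -1.0  : no deal
    }
}
```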

[ 1 Comment... ]

The golden goose - 2009-07-08 08:13:55

Item the first: Google pushes Apps out of beta
Item the second: Security expert blesses Google Native Client.
Item the third: Google announces the Google Chrome OS.

I had pretty much given up hope on GoOSe ever becoming a reality, but it looks like it's here at last. Now that it's here, though, I'm not all that sure it's such a good thing. I believe this OS will be the next Windows - Google didn't come up with Google Native Client for no reason. Add that to the mix and you have the perfect OS. Unfortunately, everybody using this gives a lot of power to Google. Far more power than Microsoft ever had, because Google will have all your data.

And I'm still waiting for those kiosks.

[ 0 Comments... ]

XML - 2009-05-02 18:15:43

I was thinking about a conversation I had the other day where somebody was considering using XML for marshalling/transferring data between two networked hosts. I felt, and still feel, that using XML for something like that is a poor choice. After thinking about it more, I realized that XML sucks when used in communication protocols, but is still useful when used as a data storage format.

The key difference, I think, is the fact that communication protocols are only used while there are entities communicating. If all of those entities cease to operate, then the protocol is effectively dead and/or useless. The communication protocol, therefore, is transient in nature. With data storage, though, it is the opposite. If you save a file, that file is going to stay around as long as you want it to, even if all the apps used to manipulate that file no longer exist.

This difference means that the data storage format must be self-documenting, whereas the communication protocol does not need to be. If you want to recover the data even after all the manipulating apps are gone, you need to be able to look into the file and figure out what is what without being able to look at any source code - that's what XML is great at. With the communication protocol... who cares? If all the communicating entities are gone, just invent a new protocol and be on your merry way.

The other thing XML claims to be good at is extensibility. The claim is that XML is a well-defined, structured format, and it is easy to create schemas and extend files with more tags/attributes as necessary. While that is true, it is not a property specific to XML. Binary formats can be just as extensible as XML; they're just not as human-readable. You can reserve bytes and bake in room for backwards-compatible expansion into any well-designed binary protocol. And in both cases (XML and binary formats) any expansion to the protocol will require updates to the implementations that read/write the protocol, so there's no magical advantage to XML in that respect either.

The advantage with binary protocols is that they're more efficient - both in terms of bandwidth and processing time. A switch(read()) loop will outperform a SAX parser by multiple orders of magnitude, and so they make far more sense to use in a communication protocol. With data formats, you could at least argue that self-documentation is important for data persistence and recovery, and therefore conclude that XML would be a better choice.
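A sketch of what such a switch(read()) loop might look like (the tags and wire format here are made up), including a length-prefixed escape hatch that gives the backwards-compatible extensibility argued for above:

```java
import java.io.*;

// Sketch of a tag-based binary protocol: each record starts with a one-byte
// tag, and unknown tags carry their own length so an old reader can skip
// fields added later -- the "reserved room" style of extensibility.
public class BinaryReader {
    static final int TAG_NAME = 1;   // followed by 1 length byte + UTF-8 bytes
    static final int TAG_COUNT = 2;  // followed by a 4-byte big-endian int

    static void parse(InputStream in) throws IOException {
        DataInputStream d = new DataInputStream(in);
        int tag;
        while ((tag = d.read()) != -1) {
            switch (tag) {
                case TAG_NAME:
                    byte[] name = new byte[d.readUnsignedByte()];
                    d.readFully(name);
                    System.out.println("name=" + new String(name, "UTF-8"));
                    break;
                case TAG_COUNT:
                    System.out.println("count=" + d.readInt());
                    break;
                default:
                    // Unknown tag: skip its declared length, stay compatible.
                    d.skipBytes(d.readUnsignedByte());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.write(TAG_NAME); out.write(2); out.write('h'); out.write('i');
        out.write(TAG_COUNT); out.writeInt(42);
        out.write(99); out.write(1); out.write(0); // unknown tag, gets skipped
        parse(new ByteArrayInputStream(buf.toByteArray()));
    }
}
```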

I don't recall anybody ever really making the distinction between these two categories in which XML is commonly used. At first XML was new and cool and people used it for everything. Then there was a wave (as with all new technologies) where some people decided XML was no longer cool and denounced it as bloated and useless. Now its use is split between people who think it is awesome and people who disagree, rather than between the cases where it is appropriate and the cases where it is not. An unfortunate state of affairs indeed.

[ 4 Comments... ]

The art of cereal - 2009-04-17 22:53:55

So, since there was such a rabid response to my Facebook proclamation that I would determine the best cereal once and for all, I have decided to blog my cereal battles for everybody to follow along (and also for myself to keep track). I've spawned off another sub-blog for it. The first battle is here and in general you can see all the cereal battles at this URL (which also has its own RSS feed). Enjoy!

[ 0 Comments... ]

The Nature of Challenge - 2009-04-03 22:22:42

There's a documentary in 4 parts on Youtube about the philosophy of parkour. It's the best one I've come across at explaining parkour to beginners/non-practitioners, and is well worth the time to watch, even if you care nothing about parkour. It's a bit long (each part is just under 10 minutes), so feel free to watch it in pieces.

The Nature of Challenge - Part 1, Part 2, Part 3, and Part 4.

[ 3 Comments... ]


(c) Kartikaya Gupta, 2004-2025. User comments owned by their respective posters. All rights reserved.