XML



Posted by: stak
Tags: code
Posted on: 2009-05-02 18:15:43

I was thinking about a conversation I had the other day where somebody was considering using XML for marshalling/transferring data between two networked hosts. I felt, and still feel, that using XML for something like that is a poor choice. After thinking about it more, I realized that XML sucks when used in communication protocols, but is still useful when used as a data storage format.

The key difference, I think, is the fact that communication protocols are only used while there are entities communicating. If all of those entities cease to operate, then the protocol is effectively dead and/or useless. The communication protocol, therefore, is transient in nature. With data storage, though, it is the opposite. If you save a file, that file is going to stay around as long as you want it to, even if all the apps used to manipulate that file no longer exist.

This difference means that the data storage format must be self-documenting, whereas the communication protocol does not need to be. If you want to recover the data even after all the manipulating apps are gone, you need to be able to look into the file and figure out what is what without being able to look at any source code - that's what XML is great at. With the communication protocol... who cares? If all the communicating entities are gone, just invent a new protocol and be on your merry way.

The other thing XML claims to be good at is extensibility. The claim is that XML is a well-defined, structured format, and it is easy to create schemas and extend files with more tags/attributes as necessary. While that is true, it is not a property specific to XML. Binary formats can be just as extensible as XML; they're just not as human-readable. You can reserve bytes and bake room for backwards-compatible expansion into any well-designed binary protocol. And in both cases (XML and binary formats) any expansion to the protocol will require updates to the implementations that read/write the protocol, so there's no magical advantage to XML in that respect either.
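To make the extensibility point concrete, here is a minimal sketch of one common way to leave room for expansion in a binary format: tag-length-value (TLV) records. The tag numbers, field names, and the 1-byte tag / 2-byte length layout are all made up for illustration; they are not from any real protocol. Because every record carries its own length, a reader that only knows the older tags can skip over a newer one without breaking.

```python
import struct

# Hypothetical field tags, invented for this sketch.
TAG_NAME = 1
TAG_AGE = 2
TAG_EMAIL = 3  # imagine this was added in a later version of the format


def encode(fields):
    """Pack (tag, payload) pairs as: 1-byte tag, 2-byte big-endian length, payload."""
    out = bytearray()
    for tag, payload in fields:
        out += struct.pack(">BH", tag, len(payload))
        out += payload
    return bytes(out)


def decode(data, known_tags):
    """Return {tag: payload} for known tags; unknown tags are skipped via their length."""
    result = {}
    offset = 0
    while offset < len(data):
        tag, length = struct.unpack_from(">BH", data, offset)
        offset += 3  # size of the tag + length header
        payload = data[offset:offset + length]
        offset += length
        if tag in known_tags:
            result[tag] = payload
    return result


# A "new" writer emits all three fields.
message = encode([(TAG_NAME, b"alice"), (TAG_AGE, bytes([30])), (TAG_EMAIL, b"a@b.c")])

# An "old" reader knows only the first two tags and ignores the third.
old_view = decode(message, {TAG_NAME, TAG_AGE})
new_view = decode(message, {TAG_NAME, TAG_AGE, TAG_EMAIL})
```

The old reader still parses the message; it just never sees the email field. Actually using the new field requires updating the reader, which is the same situation as adding a new element to an XML file.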

The advantage with binary protocols is that they're more efficient - both in terms of bandwidth and processing time. A switch(read()) loop will outperform a SAX parser by multiple orders of magnitude, and so binary encodings make far more sense to use in a communication protocol. With data formats, you could at least argue that self-documentation is important for data persistence and recovery, and therefore conclude that XML would be a better choice.
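For anyone who hasn't written one, a rough sketch of what a switch(read()) style loop looks like follows. The opcodes and message layouts are hypothetical, invented for this example, and the if/elif chain stands in for a C-style switch. It only illustrates the shape of such a decoder (read one byte, dispatch on it, read a fixed or length-prefixed payload); it is not a benchmark and says nothing by itself about how much faster this is than XML parsing.

```python
import io
import struct

# Hypothetical opcodes, invented for this sketch.
OP_PING = 0x01  # no payload
OP_MOVE = 0x02  # payload: two signed 16-bit big-endian integers (dx, dy)
OP_TEXT = 0x03  # payload: one length byte, then that many UTF-8 bytes


def read_exact(stream, n):
    """Read exactly n bytes or fail; a short read means a truncated message."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended in the middle of a message")
    return data


def read_messages(stream):
    """Decode messages until the stream is exhausted."""
    messages = []
    while True:
        head = stream.read(1)
        if not head:  # clean end of stream
            break
        op = head[0]
        if op == OP_PING:
            messages.append(("ping",))
        elif op == OP_MOVE:
            dx, dy = struct.unpack(">hh", read_exact(stream, 4))
            messages.append(("move", dx, dy))
        elif op == OP_TEXT:
            length = read_exact(stream, 1)[0]
            messages.append(("text", read_exact(stream, length).decode("utf-8")))
        else:
            raise ValueError(f"unknown opcode {op:#x}")
    return messages


# Three messages back to back: a ping, a move by (3, -4), and the text "hi".
wire = bytes([OP_PING]) + struct.pack(">Bhh", OP_MOVE, 3, -4) + bytes([OP_TEXT, 2]) + b"hi"
decoded = read_messages(io.BytesIO(wire))
```

The whole stream here is 10 bytes for three messages, which is the bandwidth half of the argument. Note that nothing in those bytes tells you what they mean without the code above.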

I don't recall anybody ever really making the distinction between these two categories in which XML is commonly used. At first XML was new and cool and people used it for everything. Then there was a wave (as with all new technologies) where some people decided XML was no longer cool and denounced it as bloated and useless. Now its use is split between people who think it is awesome and people who disagree, rather than people using it where it is appropriate and avoiding it where it is not. An unfortunate state of affairs indeed.

Posted by Fai at 2009-05-02 21:18:25
You're confusing the actual protocol (the order of operations and what they mean) with the encoding of the protocol. I agree that communication could use a more efficient encoding (though it is nice that you can read packets on the wire, and parts of it are already binary), but I do not at all agree that the protocol is useless in communication. What if you had two guys who were talking to each other, and one of them died and needed to be replaced? Are you going to rewrite both and invent a new protocol? You obviously also need it for any scaled-out service, say the internet. Protocol is more important in communication, not less.
Posted by stak at 2009-05-02 23:17:14
Yeah, you're right. I meant the encoding of the protocol.
Posted by Varun at 2009-05-02 22:13:10
I think your friend was getting confused between the different OSI layers - I'd personally opt to use XML for layers 6 and 7, and even in layer 5 for very specific applications, but below that, XML usually gets in the way.

There's also the readability argument from an earlier blog post about whitespacing. XML being human-readable means that you're going to spend less time trying to decode the format, and more time reading said format. Sure, it may be less efficient, but to that argument I say: if you're really in an environment so constrained that you have to worry about the overhead of XML, you shouldn't be considering using XML. For everything else (which is just about every non-embedded system, and a large chunk of embedded systems), XML is an excellent choice to either store data or to communicate it.
Posted by stak at 2009-05-02 23:21:54
Unless you're reverse-engineering the format, human-readability of the format is a non-issue. You'll be using software tools to do it anyway. Also, see this post for my views on the last bit of your argument.

 
 
(c) Kartikaya Gupta, 2004-2022. User comments owned by their respective posters. All rights reserved.