Source:  http://blog.unto.net/work/on-rss-and-atom/
On RSS and Atom
July 4th, 2006 by DeWitt Clinton

RSS is great. No, I’ll go further than that. RSS, as a representation of an idea, is perhaps the single most influential cultural shift of the post-2001 technical and business community. RSS is the embodiment of the notion of sharing and syndication. Businesses will do well to heed the lessons being taught by people like Dave Winer and Robert Scoble. Users and customers alike want open access to data, and the ideas behind RSS will go a long way toward realizing their needs.

That said, RSS (the format) itself isn’t always the answer. I worry that people are sometimes pushing a particular implementation (RSS 2.0) over the ideas behind the technology (content syndication). That’s not to say that the marketing message of “adopt RSS, your users will love you” is a bad one. It’s not; it certainly helps drive the concepts home in a concrete way that anyone, even the non-technical, can understand.

As important as it is as a cultural shift, RSS 2.0 as a format does have a few shortcomings. And one of those shortcomings is particularly worrisome, as it diminishes the overall value of syndicating content to begin with.

The issue is technical but hopefully a simple illustration can demonstrate:

Can you spot the differences between the following snippets of RSS:

  <description>
    The quick brown fox jumps over the lazy dog.
  </description>

  <description>
    The quick brown fox &lt;em&gt;jumps&lt;/em&gt; over the lazy dog.
  </description>

  <description>
    <![CDATA[The quick brown fox <em>jumps</em> over the lazy dog.]]>
  </description>

If you’re a human then you’ll probably have no problem spotting that the first one is plain text, the second one is XML-escaped HTML, and the third is HTML wrapped in an XML CDATA section. If presented in a web browser, in an HTML <div/> tag perhaps, then a human will have no trouble interpreting the content.

But if you’re a computer, it isn’t quite that easy. To a computer, the contents of an RSS <description/> element are opaque. The best a computer can do with it is hope to render it for a human to interpret.
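To make the point concrete, here is a minimal Python sketch (standard library `xml.etree.ElementTree` only; the variable names are illustrative) showing what a conforming XML parser actually hands an application for the three `<description>` snippets above, with the surrounding whitespace trimmed:

```python
import xml.etree.ElementTree as ET

# The three <description> snippets from the example, whitespace trimmed.
snippets = [
    "<description>The quick brown fox jumps over the lazy dog.</description>",
    "<description>The quick brown fox &lt;em&gt;jumps&lt;/em&gt; over the lazy dog.</description>",
    "<description><![CDATA[The quick brown fox <em>jumps</em> over the lazy dog.]]></description>",
]

# Parse each snippet and keep only what the application gets: a string.
texts = [ET.fromstring(s).text for s in snippets]
for t in texts:
    print(repr(t))

# The escaped and CDATA variants are now identical strings.
print(texts[1] == texts[2])  # True
```

The escaped and CDATA forms collapse to the same string, and nothing in the feed says whether that string is prose that happens to contain angle brackets or HTML meant to be rendered.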

This works fine for the bulk of syndicated content on the web today. Blogs can spit out XML-escaped content and blog readers can display that content for a person to read.

But what if you wanted to put something interesting inside a syndicated content feed? What if you wanted to put valid XHTML in a feed? You went through the trouble of writing XHTML, why should it be flattened to an opaque blob of “maybe plain text maybe escaped HTML but I’m not really sure”?

What if you added semantic microformat markup to your HTML? If you’re using an opaque data format, then you may as well have spared yourself the effort, as no client will know it’s there.

Or what if you wanted to put some other structured data in your syndicated content feed? Geospatial data, perhaps. Product data. Or perhaps Google’s GData format. If it’s syndicated over RSS, no one will ever know.

So the problem with the RSS syndication format is that it is lossy: information you had when writing the data is lost when it is passed over the wire.

Again, this isn’t a problem for many of the early scenarios in the blogging world. But as we learn that more and more content can and should be syndicated, the format itself can either help or hinder our applications’ capabilities.

Fortunately all is not lost. While I don’t want to get embroiled in a format war, I will say that I’ve found the Atom 1.0 standard to meet the needs of nearly every single problem that I’ve thrown at it. Amazingly so, actually. I’ve been consistently impressed with how well the authors of the Atom syndication format anticipated the needs of the advanced content syndication community. There has yet to be a use-case that I’ve explored — and I work with some thorny ones — in which Atom has let me down.

That, and the Atom Syndication Format specification is the single best technical spec I’ve ever read. Seriously, give it a read just to see good spec writing in action. It’s concise, accurate, unambiguous, and contains the right amount of illustrative detail.

Atom 1.0 addresses the issue of opaque content by including a very simple, but fully-defined, “type” attribute on all elements that can contain content.

For example:

  <content type="text">
    The quick brown fox jumps over the lazy dog.
  </content>

  <content type="html">
    The quick brown fox &lt;em&gt;jumps&lt;/em&gt; over the lazy dog.
  </content>

Or even:

  <content type="xhtml" xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <xhtml:div>
      The quick brown fox <xhtml:em>jumps</xhtml:em> over the lazy dog.
    </xhtml:div>
  </content>
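By contrast, a client reading Atom can branch on the `type` attribute instead of guessing. The following Python sketch assumes only the three values shown above (Atom also permits MIME media types in `type`, which this sketch simply rejects), and the function name `read_content` is hypothetical:

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

def read_content(content):
    """Interpret an Atom content element according to its type attribute.

    Returns (kind, payload): a string for "text" and "html",
    or the xhtml:div element itself for "xhtml".
    """
    ctype = content.get("type", "text")  # Atom's default type is "text"
    if ctype in ("text", "html"):
        # For "html" the parser has already turned &lt;em&gt; back into <em>;
        # unlike RSS, the attribute says whether that string is markup.
        return ctype, (content.text or "").strip()
    if ctype == "xhtml":
        div = content.find(f"{{{XHTML}}}div")
        if div is None:
            raise ValueError('type="xhtml" requires an xhtml:div child')
        return ctype, div  # a real element tree, not an opaque string
    raise ValueError(f"content type not handled in this sketch: {ctype}")

xhtml_snippet = f"""<content type="xhtml" xmlns:xhtml="{XHTML}">
  <xhtml:div>
    The quick brown fox <xhtml:em>jumps</xhtml:em> over the lazy dog.
  </xhtml:div>
</content>"""

kind, div = read_content(ET.fromstring(xhtml_snippet))
emphasized = [em.text for em in div.iter(f"{{{XHTML}}}em")]
print(kind, emphasized)  # xhtml ['jumps']
```

Because the XHTML arrives as elements rather than a string, the client can query it (here, finding the emphasized words), which is exactly what opaque RSS content rules out.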

The content generator (a human editor, a blog authoring tool, a publishing protocol such as APP, or anything else) has full knowledge of the type of content being syndicated. If you syndicate that content via a format like Atom, then that information is not lost.

For more details regarding the technical differences between RSS and Atom, I recommend reading this page on RSS 2.0 and Atom 1.0 compared. In the article the authors outline several other important advantages of Atom. Though to me personally it is the simple issue of content type that makes the rest of the issues pale in comparison.

Put it this way — I couldn’t be doing half of the work that I’m doing right now on search syndication without Atom. Sending back search results snippets over RSS is one thing. Syndicating rich search content is an entirely different thing, and that requires a non-lossy syndication format.

My recommendation to application developers today is to use Atom 1.0, not RSS, as the basis for your content syndication.

You should absolutely continue to read and support RSS of all flavors. As Postel said, “be conservative in what you do; be liberal in what you accept from others.” In the context of content (or search) syndication, this may mean being able to read all sorts of formats, but only writing the one format that preserves the data the best. And today, I believe that format to be Atom 1.0.
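Being liberal in what you accept starts with recognizing what you were handed. Below is a minimal Python sketch, assuming the input is well-formed XML (many real-world feeds are not), that guesses the format from the root element; `sniff_feed_format` and its return labels are illustrative names, not part of any spec:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def sniff_feed_format(document: str) -> str:
    """Guess a feed's format from its root element (well-formed XML only)."""
    root = ET.fromstring(document)
    if root.tag == f"{{{ATOM_NS}}}feed":
        return "atom-1.0"
    if root.tag == f"{{{RDF_NS}}}RDF":
        return "rss-rdf"  # RSS 1.0 (and 0.90) use an rdf:RDF root
    if root.tag == "rss":
        # RSS 0.91, 0.92 and 2.0 share <rss> and carry a version attribute.
        return "rss-" + root.get("version", "unknown")
    return "unknown"

print(sniff_feed_format('<rss version="2.0"><channel/></rss>'))  # rss-2.0
print(sniff_feed_format(f'<feed xmlns="{ATOM_NS}"/>'))  # atom-1.0
```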

Fortunately, as Atom 1.0 preserves more information than RSS 2.0, it is trivial to transform an Atom feed into an RSS feed. A simple XSLT suffices to provide RSS output when you absolutely need it. Of course, now that Atom 1.0 has been ratified as a standard, it is unlikely that any major application won’t support it natively.
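The text suggests XSLT for this; the same one-way mapping can be sketched in Python. This version assumes only `type="text"` and `type="html"` content and assumes the consuming RSS reader treats `<description>` as HTML. It shows where the information goes: the type is consumed during conversion and does not survive in the output. The function name is hypothetical:

```python
import html
import xml.etree.ElementTree as ET

def atom_content_to_rss_description(content):
    """Flatten an Atom text/html content element into an RSS <description>.

    Assumes the RSS reader will treat <description> as HTML.
    """
    ctype = content.get("type", "text")
    body = (content.text or "").strip()
    if ctype == "text":
        # Plain text must become HTML-safe, or a literal "<" would be
        # mistaken for markup by the reader.
        body = html.escape(body, quote=False)
    elif ctype != "html":
        raise ValueError(f"content type not handled in this sketch: {ctype}")
    desc = ET.Element("description")
    desc.text = body  # ElementTree escapes the markup again on output
    return desc

src = ET.fromstring('<content type="text">Use &lt;em&gt; for emphasis</content>')
print(ET.tostring(atom_content_to_rss_description(src), encoding="unicode"))
# <description>Use &amp;lt;em&amp;gt; for emphasis</description>
```

Going the other way, from that `<description>` back to a typed Atom `<content>`, would require exactly the guess that the RSS format cannot answer.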

I don’t really care if RSS becomes a generic brand name for content syndication, just like “Kleenex” has for tissues. I think it is fine if engineers recommend to their directors, “we should support RSS in our applications. Content syndication is what our customers want.”

Though it might be a good idea to add “well, we’re actually building it using Atom. It’s the same thing, only a little better, and we can still speak RSS with people who need it. But don’t worry, that’s just a technical detail.”

It is just a technical detail. But it is a technical detail that we, the engineers, should be very concerned with…

41 Responses to “On RSS and Atom”

  1. Sam Ruby Says:

    You are missing the required xhtml:div in your final example.

  2. DeWitt Clinton Says:

    Whoops. Thanks, Sam. That was embarrassing…!

  3. Taka Says:

    >>> If you’re using an opaque data format, then you may as well have spared yourself the effort, as no client will know it’s there.

    This is no longer true since we’ve just released the latest version of Awasu that addresses exactly this issue. “Metadata modules” let Awasu extract specific (non-RSS) information out of an XML feed (using XPath) and associate it with the parent feed item in the archive database:

    http://www.awasu.com/weblog/?p=324

    You can see screenshots of the various metadata modules we’ve already written in action here:

    http://www.awasu.com/downloads/2.2.3/alpha2/demo/

    Each pic is a feed item (as it appears in Awasu) that shows some additional information that was embedded in it and has been extracted by Awasu. And because it’s driven by config files, anyone can add support for any “opaque data format” they please.

    We’ve long seen RSS as simply a transport layer for information, not just marked-up text for display to a user, but structured, customer-specific data that can be extracted and used by a smart client:

    http://www.awasu.com/weblog/?p=240
    http://www.awasu.com/weblog/?p=312

  4. Taka Says:

    Oops, I quoted the wrong thing. It shoulda been:

    >>> If [your structured data is] syndicated over RSS, no one will ever know.

    It’s been a long day and it’s very early in the A.M. Sigh… :-(

  5. Sam Ruby Says:

    Taka’s example feed appears to be Atom 1.0.

  6. My Manual Attention Recorder » On RSS and Atom Says:

    […] http://www.unto.net/unto/work/on-rss-and-atom/ […]

  7. Scobleizer - Tech Geek Blogger » Lead engineer at Amazon recommends Atom over RSS Says:

    […] DeWitt Clinton has a long, and thoughtful, post on RSS and Atom today. If you’re a geek/developer it’s worth reading and thinking about. Thanks to Niall for linking me to it. […]

  8. Taka Says:

    >>> Taka’s example feed appears to be Atom 1.0.

    Atom is the only format that lets microformats like hCalendar be embedded in item descriptions in a non-opaque fashion.

    We agree with everything DeWitt had to say about the advantages of Atom over RSS. If you just want to move a bit of HTML around, RSS is fine but as soon as you want to do anything remotely serious, its shortcomings become very apparent and you have to switch over to Atom.

  9. Randy Holloway Unfiltered » Should developers prefer Atom over RSS? Says:

    […] Earlier today I read a post from Dewitt Clinton (via Niall), lead engineer for Amazon’s A9. In his post, he is recommending to developers that they prefer Atom over RSS for applications that support syndication. The reason, in essence, is that Atom is more expressive and supports more scenarios in a technically superior way for developers. […]

  10. Raja Says:

    Whether we like it or not, RSS has become the lingua franca of content syndication. Even if it is inferior to Atom (I don’t know if it is), having multiple standards would hurt the idea of content syndication. The best thing is for everyone to follow one standard (RSS would be the logical choice because it is the most ubiquitous) and improve it as things evolve. If there are multiple camps, then everyone loses. Please learn from the past history. Don’t repeat the same mistakes.

  11. re:Domino | RSS or ATOM? What’s your opinion? Says:

    […] . Here is his blog entry about On RSS and Atom, he then posted a follow up to respond to some of the comments. […]

  12. RSS Part 2 » 7 And A Crescent Says:

    […] This morning I had the pleasure of reading a really insightful article by DeWitt Clinton on the technical aspects of RSS. In the article, DeWitt talks about some of the reasons why he likes the Atom syndication format over the RSS format. One of the commenters on the article named Raja had this to say: […]

  13. TAG Says:

    > single most influential cultural shift of the post-2001 technical and business community.

    There is nothing new in RSS. Similar technology - Active Channels was installed on user desktops with IE 4 in Windows 98.
    http://en.wikipedia.org/wiki/Channel_Definition_Format

  14. Ugo Cei Says:

    TAG: And nobody used it, so it never had any kind of influence nor did it produce a cultural shift. No wonder nobody remembers it.

    It’s true that RSS isn’t particularly innovative, technically, but that’s not what counts.

    Raja: that’s just FUD. We should not design and implement a better format because of fear that it will destroy the syndication landscape? That’s akin to saying that ODF shouldn’t be promoted because it hurts the idea of document exchange between word processors: after all we already have Word format and other applications like OpenOffice are able to process it, so DOC is the “lingua franca” of document exchange. Gimme a break.

  15. ~ awasu ~ Says:

    […] It all started when DeWitt Clinton, a senior developer at Amazon’s A9 project wrote about the advantages of Atom over RSS: But what if you wanted to put something interesting inside a syndicated content feed? What if you wanted to put valid XHTML in a feed? You went through the trouble of writing XHTML, why should it be flattened to an opaque blob of “maybe plain text maybe escaped HTML but I’m not really sure”? […]

  16. Reverberations » Blog Archive » RSS vs Atom Revisited Says:

    […] Atom is technically superior, more comprehensive and has more possibilities. From an engineer’s (and a purist’s) point of view - agreed. But RSS […]

  17. Kosso Says:

    Does Atom support enclosures? And multiple ones at that?
    If so, I would look at creating a toolset to podcast in both formats.

    However, that does not mean feeds won’t be broken. So many publishing tools are broken. RSS is ‘simpler’ than Atom.

    If you could show me an example of a podcastable feed in Atom, I’ll make a tool to publish that feed and enclosure.

    I don’t want to fan a feed war, but I want to judge by trying to build a feed publishing tool which works.

    regards,
    Kosso

  18. kosso’s braingarden » Atom and RSS - again… Says:

    […] A response to this post from the lead engineer at Amazon. […]

  19. RSS vs Atom, Ontological Drift at The Sound of Crickets Chirping Says:

    […] Atom wins (sort of). I second the sentiments expressed by DeWitt Clinton (of Amazon A9 fame): On RSS and Atom. […]

  20. Roger Benningfield Says:

    DeWitt: I agree with you about the pleasures of the @type attribute on atom:content, but bear in mind that things aren’t as rosy as they seem.

    Take your example, for example. :) Some Atom parsers are going to choke on it because you’ve declared the XHTML namespace on the content element rather than on the div itself, which is the more common practice. Others may choke on the namespace prefixes within the content, or simply pass it out to a browser which will promptly fail to render it as unknown markup.

    RSS caught on because anyone with a minimal XML toolset could parse it effectively. With Atom, OTOH, you’ve gotta have some pretty robust tools, or hack together some workarounds based upon your knowledge of the format and the syndication world in general.

  21. The Reach » Blog Archive » Atom Vs. RSS As A Content Syndication Preference Says:

    […] “The gloves come off”, as we would say here in the U.S - in other words, a fight has begun.  Competing specifications for content syndication, long thought a dead issue, have been revived by this post from DeWitt Clinton and response by Robert Scoble.  DeWitt then follows up with another response, for a good ongoing discussion. […]

  22. DeWitt Clinton Says:

    Roger:

    Thanks for adding that valuable point about client support re: Atom. While Atom does make some things easier (and other things possible) for the technically sophisticated client, you are totally correct that it may also introduce some new complexity.

    Since we’re on the topic, I’m actually happy with the way Atom handles XML namespaces. As you clearly understand, the namespace scoping itself is standard XML, and it is there to keep the entire document well-formed. If the client uses a conformant XML parser then the namespaces should automatically be handled appropriately.

    That said, I imagine that many of today’s RSS clients don’t bother with full XML compliance. (RSS doesn’t have its own namespace URI, if I recall correctly.) But Atom clients will probably be using off-the-shelf XML parsers, as conforming to XML standards is certainly a goal of the Atom specification.

    I’m glad you commented on this — it is important to look at both sides of that equation.

    Cheers,

    -DeWitt
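DeWitt’s point about conformant parsers can be checked directly. In this Python sketch (the fragments are illustrative and omit the Atom namespace on `<content>` for brevity), the same XHTML payload is declared once with an `xhtml:` prefix and once with a default namespace on the `div`; a namespace-aware parser reports identical element names for both:

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# The same payload, declared with a prefix on <content> ...
prefixed_doc = (f'<content type="xhtml" xmlns:xhtml="{XHTML}">'
                '<xhtml:div>fox <xhtml:em>jumps</xhtml:em></xhtml:div></content>')
# ... and with a default namespace on the div itself.
default_ns_doc = (f'<content type="xhtml"><div xmlns="{XHTML}">'
                  'fox <em>jumps</em></div></content>')

def element_names(doc):
    """List every element's expanded {namespace}name, in document order."""
    return [el.tag for el in ET.fromstring(doc).iter()]

print(element_names(prefixed_doc) == element_names(default_ns_doc))  # True
```

The prefix is a serialization detail; what the parser exposes is the namespace URI plus the local name.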

  23. Sam Ruby Says:

    “RSS caught on because anyone with a minimal XML toolset could parse it effectively.”

    Counter examples can be found here:

    http://www.intertwingly.net/blog/2004/05/28/detente

    The truth is that every parser will have bugs. And as of the summer of 2006, robust, namespace aware XML parsers are available for pretty much every language on every platform.

    I’ll also note that the trend with Microsoft and Mozilla based tools is to parse even old versions of RSS with a “real” XML parser.

    I agree with DeWitt’s basic point. Consumers should do the best job they can with data they find in each of the various RSS formats out there. But with Atom, the intent of even the example that Roger cites is perfectly clear.

  24. Content & Communication » Blog Archive » RSS v Atom Says:

    […] Scobelized post on the differences between RSS and Atom. He comes down heavily on Atom’s side. I am involved in promoting RSS feeds (another debate in itself) from the BBC through projects like Feed Factory. We currently support flavours 1.0 and 2.0 of RSS but not Atom. Although this is live debate and open to change. […]

  25. Mark Says:

    > Does Atom support enclosures?

    Yes. [link rel="enclosure" href="…" type="…"] Each entry can have as many as you like. Each enclosure may also have a length attribute, but it’s optional.

    http://www.atomenabled.org/developers/syndication/atom-format-spec.php#rel_attribute
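For Kosso’s question, here is a minimal Python sketch of an entry carrying two enclosures. The URLs, file names, and sizes are placeholders, the helper `add_enclosure` is hypothetical, and a complete entry would also need the `id`, `title`, and `updated` elements the Atom spec requires:

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # write Atom as the default namespace

def add_enclosure(entry, href, media_type, length=None):
    """Append <link rel="enclosure">; Atom makes the length attribute optional."""
    link = ET.SubElement(entry, f"{{{ATOM}}}link",
                         rel="enclosure", href=href, type=media_type)
    if length is not None:
        link.set("length", str(length))
    return link

entry = ET.Element(f"{{{ATOM}}}entry")
add_enclosure(entry, "http://example.com/episode1.mp3", "audio/mpeg", 12345678)
add_enclosure(entry, "http://example.com/episode1.ogg", "audio/ogg")

# Reading them back: every link whose rel is "enclosure".
enclosures = [(link.get("href"), link.get("type"), link.get("length"))
              for link in entry.findall(f"{{{ATOM}}}link")
              if link.get("rel") == "enclosure"]
print(ET.tostring(entry, encoding="unicode"))
```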

  26. Gizbuzz » Battle of the feeds Says:

    […] A lead developer at Amazon has some interesting thoughts on the merits of the two leading syndication XML formats, Atom and RSS. Interestingly he comes out firmly on the side of Atom: While I don’t want to get embroiled in a format war, I will say that I’ve found the Atom 1.0 standard to meet the needs of nearly every single problem that I’ve thrown at it. Amazingly so, actually. I’ve been consistently impressed with how well the authors of the Atom syndication format anticipated the needs of the advanced content syndication community. There has yet to be a use-case that I’ve explored — and I work with some thorny ones — in which Atom has let me down. […]

  27. Randy Charles Morin Says:

    You said: What if you wanted to put valid XHTML in a feed?

    Then do it. RSS allows XHTML too! People have been doing it for years.

  28. James M Snell Says:

    Randy, to be clear, RSS has to be extended to support XHTML. The RSS spec does not allow for XHTML directly (or any content type other than text and escaped markup) meaning that if you want to do anything beyond simple blogs and news type content, you have to hack and extend the format in ways that most current feed readers will be incapable of supporting. Atom solves that problem.

  29. inkBlots » XHTML in RSS 2.0: Pipe Dream? Says:

    […] The RSS Public board has started discussing XHTML in RSS 2.0 again, sparked by Amazon’s DeWitt Clinton’s post about RSS versus Atom. […]

  30. links for 2006-07-06 at Known Stranger Says:

    […] DeWitt Clinton’s Unto.net » Blog Archive » On RSS and Atom Atom vs RSS (tags: web) […]

  31. Roger Benningfield Says:

    DeWitt: “If the client uses a conformant XML parser then the namespaces should automatically be handled appropriately.”

    Well, it kinda depends on what you mean by “handled”. :) I can handle the XML all fine and dandy, but when I toString() the XHTML payload so I can stuff it in a database, I get the namespace prefixes along with the XHTML. That leaves me falling back to regex, which is never good for anyone.

    Sam: “And as of the summer of 2006, robust, namespace aware XML parsers are available for pretty much every language on every platform.”

    Namespace-aware and Atom-friendly aren’t the same things, unfortunately. A little company called Adobe makes a product that parses XML and yet has no idea that xml:base exists, for example. Because I know Atom pretty well, I know to expect people like you and Tim and DeWitt doing funky stuff with your feeds, and can brute-force my way through the problem. Not everyone is in my position.

    Is that an Atom problem? Absolutely not. As I said earlier, I’m a fan of atom:content in particular, and have been encouraging people to support its use in RSS for ages. But IMO, folks need to be conservative with it throughout the foreseeable future.

    * Don’t namespace-prefix XHTML content.
    * Avoid use of xml:base.
    * Don’t prefix core Atom elements, either.

    …and so on. Anything else, IMO, is just gonna slow client uptake by the weekend script warriors who gave RSS its initial boost.
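Roger’s `toString()` problem does not strictly require regular expressions. One approach, sketched here in Python under the assumption that the consumer wants unprefixed HTML for storage, is to copy the `xhtml:div`, strip the XHTML namespace from each tag, and serialize the copy; serializing the original element directly would instead emit a namespace prefix and declaration. The helper name is hypothetical:

```python
import copy
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

def xhtml_div_to_html(div):
    """Serialize an xhtml:div as unprefixed HTML without touching the original."""
    div = copy.deepcopy(div)
    div.tail = None  # tostring() would otherwise include trailing text
    for el in div.iter():
        if isinstance(el.tag, str) and el.tag.startswith(f"{{{XHTML}}}"):
            el.tag = el.tag.split("}", 1)[1]  # "{...xhtml}em" -> "em"
    return ET.tostring(div, encoding="unicode", method="html")

payload = (f'<content type="xhtml" xmlns:xhtml="{XHTML}"><xhtml:div>'
           'The quick brown fox <xhtml:em>jumps</xhtml:em> over the lazy dog.'
           '</xhtml:div></content>')
div = ET.fromstring(payload).find(f"{{{XHTML}}}div")
print(xhtml_div_to_html(div))
# <div>The quick brown fox <em>jumps</em> over the lazy dog.</div>
```

Because the renaming works on the parsed tree, it does not care which prefix (if any) the publisher chose.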

  32. Sam Ruby Says:

    Roger, xml:base is a prime example. Weekend script warriors produce feeds with relative URIs aplenty. The Feed Validator has warned about this since, well, basically forever, but such warnings have proven to be about as successful as the prohibition against alcohol was nearly a century ago.

    It is true that xml:base can’t be handled automatically by a generic parser, as specific knowledge as to which attributes and elements are URIs is required.
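Sam’s last point in code: a generic parser reports `xml:base` as an ordinary attribute, and it is the application that must know `atom:link/@href` is a URI. A minimal Python sketch, assuming `href` on `atom:link` is the only URI of interest (Atom has others, for example the `src` attribute of `atom:content` and the `atom:icon` and `atom:logo` elements); the feed and URLs are made up:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "http://www.w3.org/2005/Atom"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"  # how xml:base is reported

def resolve_links(elem, base, out=None):
    """Collect absolute atom:link hrefs, applying nested xml:base values."""
    if out is None:
        out = []
    # An element's own xml:base applies to it and to its descendants.
    base = urljoin(base, elem.get(XML_BASE, ""))
    if elem.tag == f"{{{ATOM}}}link" and "href" in elem.attrib:
        out.append(urljoin(base, elem.get("href")))
    for child in elem:
        resolve_links(child, base, out)
    return out

feed = f"""<feed xmlns="{ATOM}" xml:base="http://example.org/blog/">
  <entry xml:base="2006/07/">
    <link href="on-rss-and-atom"/>
  </entry>
</feed>"""

# The retrieval URI of the feed document is the outermost base.
print(resolve_links(ET.fromstring(feed), "http://example.org/feed.xml"))
# ['http://example.org/blog/2006/07/on-rss-and-atom']
```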

  33. alexking.org: Blog > Around the web Says:

    […] On RSS and Atom […]

  34. Just Shelley » Safe for eyes…maybe Says:

    […] As such, I agree with DeWitt Clinton that providing type information for syndication feed consumers is imperative–especially if you have sites that provide a great deal of structured data. Where I don’t agree is that I don’t provide multiple feeds at my site. One feed is sufficient. […]

  35. Subject Code » links for 2006-07-08 Says:

    […] DeWitt Clinton’s Unto.net » Blog Archive » On RSS and Atom (tags: xml programming) […]

  36. Kirit Says:

    The xml:base thing is a red herring to anybody who already produces RSS 2.0 feeds. I had to solve full URI for my RSS 2.0 and just do the same thing with my Atom feed. There’s no way I’m going to assume anything nearly XML conformant when I’m publishing.

    I can’t really see anybody not also producing RSS if they’re producing Atom, especially for web sites. Other use cases will be different, but many of them already have their own XML document types (like NewsML). These may get eaten by Atom in the future, but that’s a whole different ball game to Atom v RSS.

  37. Dan Kohn Says:

    Kirit said: “I can’t really see anybody not also producing RSS if they’re producing Atom, especially for web sites.” Does Google count as anyone? See , which has no RSS equivalent.

    Given that all feed readers now support Atom in addition to RSS, there is no longer any reason to support RSS. Just create an Atom feed.

  38. Dan Kohn Says:

    That URL was: h**ps :// username:password@mail.google.com/mail/feed/atom

  39. ~ awasu ~ Says:

    […] The other major feature addresses some of the questions that people around the net are starting to ask: (1) can I get at the extra information that many publishers are embedding in their feeds and (2) can I actually do anything with all this information coming in (other than just sit in front of the screen and watch it fly by)? […]

  40. kL Says:

    And the holy XSLT transform you’ve mentioned is here:
    http://atom.geekhood.net/

    Actually it wasn’t as trivial as it seemed – there are quirks like html2text or iso2rfc date conversions… but it could be done.
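As an illustration of one of those quirks: Atom dates follow RFC 3339, while RSS 2.0’s `<pubDate>` uses RFC 822-style dates. Here is a minimal Python sketch of that conversion, assuming timestamps that `datetime.fromisoformat` accepts (older Python versions are strict about fractional-second digits); the function name is hypothetical:

```python
from datetime import datetime
from email.utils import format_datetime

def rfc3339_to_rfc822(stamp):
    """Convert an Atom (RFC 3339) timestamp to an RSS 2.0 pubDate string."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so spell UTC as an explicit offset first.
    if stamp[-1:] in ("Z", "z"):
        stamp = stamp[:-1] + "+00:00"
    # Keeps the original offset; format_datetime writes it as e.g. "-0700".
    return format_datetime(datetime.fromisoformat(stamp))

print(rfc3339_to_rfc822("2006-07-04T12:00:00Z"))
# Tue, 04 Jul 2006 12:00:00 +0000
```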

  41. DeWitt Clinton Says:

    Impressive, kL!