RSS and its role in Information Management
The Problem
Internet users have largely given up trying to keep up to date with the vast amount of content being published on their ‘favourite’ web sites, let alone the slow-moving sites that they need to track but are not motivated to visit ‘just in case’. Portal vendors have tried to help by letting users aggregate pieces of many web sites in one place, minimising the number of sites a person needs to visit, particularly in the context of a business process.
Proprietary approaches to syndication, the publish and subscribe model of information access, have been tried several times on the internet, for example Internet Explorer channels and PointCast personalised news feeds. AvantGo continues to find a niche publishing channels to PDAs.
Email has become flooded with newsletters, status updates, ‘just in case’ cc emails and application-specific notifications.
RSS to the rescue
Recently, however, with the advent of RSS, we now have an open and simple way for applications to publish, for users to locate and subscribe, and for subscribed content to be accessed, processed and ultimately scanned and consumed, discussed, archived and subsequently retrieved.
Background
For information on RSS, have a look at this web site:
http://radio.userland.com/allAboutRSS
For its history, you are best referred to Dave Winer's history of RSS at:
http://blogs.law.harvard.edu/tech/rssVersionHistory
Given the simplicity of RSS (most people assume RSS stands for Really Simple Syndication), many people overlook its huge potential. That potential arises not from RSS itself but from the concept of operation it enables (although its popularity must in part be attributed to its simplicity), and from the passionate way its users promote that concept, right now mainly through their own blogs and their newfound productivity in subscribing to changes in internet content.
This short article attempts to describe why I think RSS, and the Concept of Operations associated with it, is important. So let's start with what I mean by a Concept of Operations: a set of conceptual ways of doing things that are enabled by the bit of standard technical glue we call RSS. There are probably many more concepts than the ones I describe here, but the list of keywords above (publish, locate, subscribe, access, process, scan and consume, discuss, archive and retrieve) is enough to be going on with. The key thing to note is that before RSS there were largely only application-specific ways of achieving most of these activities. RSS has enabled an interoperable, standards-based environment in which hundreds of different applications work together throughout the information lifecycle. The result is a rapid evolution of applications built on a stable standard (although there is some fragmentation across RSS versions and the competing Atom syndication specification, this has little practical impact for end users).
Concept of Operations
To help show why I think RSS is important, I have used the following conceptual information lifecycle (I have ignored the process of information creation in this article). The lifecycle starts with ‘applications’ that need to publish data. Users then need to locate that data and, if they are interested, subscribe to it; the subscribed content is then accessed, processed and ultimately scanned and consumed, discussed, archived and subsequently retrieved.
When RSS is applied to these conceptual activities, it enables a particularly effective Concept of Operations. I will describe each part of the lifecycle, where RSS fits in and how it helps. In a separate article I will describe how I have selected a particular set of applications and how they have been applied, to the great benefit of my own personal productivity.
Applications
OK, so most people think of RSS as a way of publishing news, blogs or what's-new information from web sites. But the scope of RSS can be much broader: I find it best to think of it as a means of notification. Many applications need to notify people about things, and RSS is the way to do it. Microsoft employees have been evangelising RSS for a while, and recently executives from Bill Gates down have started to talk up the technology, which is important if it is to break into mainstream IT. In terms of some familiar Microsoft products, you can imagine RSS being used to notify people about changes to Exchange Public Folder contents, SharePoint lists (including document libraries), event log contents, Content Management Server web sites, SharePoint approvals and so on.
To use some examples from industry, you can imagine RSS feeds for service bulletins describing problems with cars, subscribed to by garages; change notification for important documents such as aircraft maintenance manuals; tracking of documents posted to knowledge management tools; and publishing new customers or key events that happen in a CRM system.
Seen this way, it becomes clear why I prefer to think of RSS as a way for applications to publish change. The word change can be misleading, though, because in some cases the published content may be largely static. That is no problem for RSS: the benefit is that if the content does change, the consumer will be notified.
Publish
RSS defines a simple XML format for publishing information. The format defines a channel, or feed, which is made up of a number of items. Each item can contain a small number of elements, including a title and a description. Multimedia content can be included as enclosures, and application-specific elements and attributes can be added through XML namespaces. The full RSS 2.0 specification can be found at:
http://blogs.law.harvard.edu/tech/rss.
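To make the format concrete, here is a minimal Python sketch that builds an RSS 2.0 document with the standard library's ElementTree. The function name build_feed and the dictionary shape used for items are hypothetical; the element names (channel, title, link, description, lastBuildDate, item, guid) come from the RSS 2.0 format.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_feed(title, link, description, items, build_date):
    """Return an RSS 2.0 document as a string.

    `items` is a list of dicts with 'title', 'link', 'description' and 'guid'
    keys; this shape is a hypothetical choice for the sketch.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    # RSS 2.0 dates use the RFC 822 style, e.g. "Sat, 22 May 2004 09:00:00 +0000"
    ET.SubElement(channel, "lastBuildDate").text = format_datetime(build_date)
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "description").text = item["description"]
        # isPermaLink="false": the guid is an opaque identifier, not a URL
        ET.SubElement(node, "guid", isPermaLink="false").text = item["guid"]
    return ET.tostring(rss, encoding="unicode")
```

An application such as a document library could call build_feed whenever a document changes and serve the result at a fixed URL.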
From a publishing standpoint, then, RSS at its simplest provides a very simple way to list things that have changed. Each channel can carry a build date (lastBuildDate) telling consumers when the feed was last rebuilt, and each item can carry a GUID (a unique identifier) that lets consumers avoid displaying the same item more than once. To avoid re-downloading an unchanged feed at all, aggregators typically rely on ordinary HTTP conditional requests. In collaborative scenarios an item can also include a URL (the comments element) pointing to a page where comments on the item can be recorded.
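On the consuming side, the build date and GUIDs can be used roughly as follows. This is a minimal Python sketch, assuming a hypothetical per-feed state dictionary kept by the client. Because guid is optional in RSS 2.0, it falls back to the item's link and then its title as the identity key; that fallback is my own choice, not something the specification mandates.

```python
import xml.etree.ElementTree as ET


def new_items(feed_xml, state):
    """Return the titles of items not seen before, updating `state` in place.

    `state` is a hypothetical per-feed dict holding the last lastBuildDate
    seen and the set of item keys already shown to the user.
    """
    channel = ET.fromstring(feed_xml).find("channel")
    build_date = channel.findtext("lastBuildDate")
    # Same build date as last time: the feed has not been rebuilt, skip it.
    if build_date is not None and build_date == state.get("last_build"):
        return []
    state["last_build"] = build_date
    seen = state.setdefault("seen", set())
    fresh = []
    for item in channel.findall("item"):
        # guid is optional, so fall back to link, then title
        key = item.findtext("guid") or item.findtext("link") or item.findtext("title")
        if key not in seen:
            seen.add(key)
            fresh.append(item.findtext("title"))
    return fresh
```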
In many ways RSS is so trivial that it's difficult to see what all the fuss is about. The reason is simple and powerful: RSS is a standard. It allows many different applications to publish in a standard way, which means that the REST OF THE LIFECYCLE, i.e. the remaining parts of my little Concept of Operations, can also be standardised regardless of the application! That's the power. Prior to RSS, the only thing to rival it in ubiquity was email, and email became so successful for the very same reasons: the simplicity and standardisation of SMTP. If you want to understand a bit more about how email and RSS compare, check out this link:
http://weblogs.asp.net/alexbarn/archive/2004/05/22/139461.aspx
and this link:
http://www.windley.com/2004/03/04.html#a1072
Location
There are probably six main scenarios for locating information published via RSS:
1. You locate an information source while browsing the web and see a little orange XML image that links to an RSS feed, which you can then subscribe to. This is the most common experience today: you find something interesting and subscribe to it, so that if anything new and interesting arrives from the same source you are automatically notified, without having to visit that web page ‘just in case’.
2. Someone sends you a link via email, in a document, etc.
3. A web site or application presents an RSS feed to you as a value-added service; for example, a software vendor may provide a feed with information about updates to its software, or a holiday company may provide a feed of special offers you may be interested in.
4. You use one of the growing number of search engines that regularly search thousands of RSS feeds for the search terms you specify and return the list of matches as an RSS feed.
5. As above, but generalised to search other content in addition to RSS feeds and return the results to you as an RSS feed.
6. You subscribe to a feed because of a specific business scenario; in a corporate environment the subscription may be automated and part of a formal role definition, in the same way that you might have corporate-standard web browser favourites configured for you.
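Scenario 1 relies on spotting a feed link on a page. Many sites also advertise their feeds in the page's HTML head using a link element with rel="alternate" and a feed MIME type, a convention usually called feed autodiscovery, which lets software locate feeds without the user hunting for the orange image. The sketch below, in Python using only the standard library, collects such links; the class and function names are hypothetical, and I have included the Atom MIME type as well as the RSS one.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects <link rel="alternate" type="application/rss+xml" href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us;
        # self-closing <link ... /> tags are routed here too.
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        feed_type = (a.get("type") or "").lower()
        if "alternate" in rels and feed_type in FEED_TYPES and a.get("href"):
            # resolve relative hrefs against the page address
            self.feeds.append(urljoin(self.base_url, a["href"]))


def find_feeds(html_text, base_url):
    """Return absolute URLs of the feeds a page advertises."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html_text)
    return finder.feeds
```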
By this time you are probably thinking that most of these scenarios are currently served by email, and you would be correct. The key questions are how well they are served, and how extensible and scalable email is by comparison with RSS; that's what I hope to answer as this article progresses.
Subscription
At this point things start to get interesting, because the standards part of the process has nearly come to an end and the innovation can start. Many applications, serving many diverse needs, allow you to subscribe to either a single RSS feed or, normally using some form of aggregator, to many feeds. Single feeds can be displayed as part of web portals, emailed to you, or downloaded and reviewed offline on almost all client platforms and operating systems. Most people, however, use either a web-based aggregator or an offline aggregator that runs on the client device; these can be standalone or integrated with your email client or even your web browser. The key thing about most aggregators is that they automatically and periodically poll what are often hundreds of applications (mostly web sites) to check for changes, and download those changes so that they can be processed en masse at your convenience.
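The periodic polling that aggregators do can be sketched as below. This is a minimal Python sketch, not how any particular aggregator works: the Subscription class, its per-feed polling interval and the injected fetch function are all hypothetical, chosen so the scheduling logic can be shown without a network.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


@dataclass
class Subscription:
    """One subscribed feed (hypothetical structure for this sketch)."""
    url: str
    interval: timedelta                  # how often this feed should be polled
    last_polled: Optional[datetime] = None


def poll_due(subscriptions: List[Subscription], now: datetime,
             fetch: Callable[[str], str]) -> Dict[str, str]:
    """Fetch every feed whose polling interval has elapsed.

    Returns {url: feed_text} for the feeds polled in this round and
    records the poll time on each of those subscriptions.
    """
    results = {}
    for sub in subscriptions:
        if sub.last_polled is None or now - sub.last_polled >= sub.interval:
            results[sub.url] = fetch(sub.url)  # a real aggregator would do an HTTP GET here
            sub.last_polled = now
    return results
```

A real aggregator would call poll_due on a timer and hand each returned feed to its processing step, which is what lets hundreds of sources be checked without the user visiting any of them.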
It's also worth noting that RSS 2.0 supports subscribing to PUSH notifications that a channel has changed, via the channel's cloud element. By convention, subscriptions to notifications need to be renewed every 25 hours. Notifications take the form of XML-RPC or SOAP messages from the publisher to the subscriber. The action taken by the subscriber on receipt of a message is not defined, but it would usually be to read the associated channel.
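The registration step can be sketched as follows, assuming the rssCloud interface described alongside the RSS 2.0 specification, in which a channel advertises its notification service in a cloud element with domain, port, path, registerProcedure and protocol attributes. The order of the five registration parameters (notify procedure, port, path, protocol, list of feed URLs) is my reading of that interface and should be checked against the spec; the function names are hypothetical. The sketch only prepares the call and tracks renewal.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

RENEWAL_PERIOD = timedelta(hours=25)  # subscriptions lapse unless renewed within 25 hours


def cloud_registration(feed_xml, feed_url, notify_procedure, port, path):
    """Build (endpoint, procedure, args) for registering with a channel's cloud.

    Returns None if the channel does not advertise a cloud element.
    """
    cloud = ET.fromstring(feed_xml).find("channel/cloud")
    if cloud is None:
        return None
    endpoint = "http://{}:{}{}".format(cloud.get("domain"), cloud.get("port"), cloud.get("path"))
    # Assumed parameter order: notify procedure, port, path, protocol, feed URLs.
    # Here the subscriber simply asks to be notified using the cloud's own protocol.
    args = (notify_procedure, port, path, cloud.get("protocol"), [feed_url])
    return endpoint, cloud.get("registerProcedure"), args


def needs_renewal(registered_at, now):
    """True once the last registration is 25 hours old or older."""
    return now - registered_at >= RENEWAL_PERIOD
```

For an "xml-rpc" cloud, the prepared call could then be sent with the standard library's xmlrpc.client.ServerProxy, and the subscriber would need its own XML-RPC listener on the given port and path to receive notifications.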
Access
This is the easy bit: fetch the XML file over HTTP.
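Fetching is simple, but a client that checks a feed every hour can avoid re-downloading an unchanged file with an HTTP conditional request. This minimal Python sketch, using only urllib from the standard library, sends the ETag and Last-Modified values saved from the previous response and treats a 304 Not Modified reply as ‘nothing new’; the function names are hypothetical.

```python
import urllib.request
from urllib.error import HTTPError


def build_request(url, etag=None, last_modified=None):
    """Build a GET request carrying the validators from the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def fetch_feed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified), or None if the feed is unchanged."""
    try:
        with urllib.request.urlopen(build_request(url, etag, last_modified)) as resp:
            return resp.read(), resp.headers.get("ETag"), resp.headers.get("Last-Modified")
    except HTTPError as err:
        if err.code == 304:  # 304 Not Modified: nothing to download
            return None
        raise
```

The caller stores the returned ETag and Last-Modified values with the subscription and passes them back on the next poll.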
Processing
Though it's not strictly necessary, most RSS clients then perform some form of processing, for example applying a style sheet or other transform to the XML to create an HTML representation. In addition, some aggregate multiple channels and items into news pages. There is much variety and innovation at the client end. In a separate post following this one, I will describe how my own environment is set up. If you want client software, I suggest you check out the following link:
http://www.ourpla.net/cgi-bin/pikie.cgi?RssReaders
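The transform-and-aggregate step can be sketched without a style sheet. Python's standard library has no XSLT processor, so this minimal sketch walks the XML with ElementTree instead: it merges several RSS 2.0 feeds, sorts the items by their pubDate and renders a simple HTML news list. The function name and HTML layout are illustrative assumptions.

```python
import html
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime


def news_page(feeds):
    """Merge the items of several RSS 2.0 feeds into one HTML list, newest first."""
    entries = []
    for feed_xml in feeds:
        channel = ET.fromstring(feed_xml).find("channel")
        source = channel.findtext("title", default="")
        for item in channel.findall("item"):
            date = item.findtext("pubDate")
            if not date:
                continue  # this sketch simply skips undated items
            when = parsedate_to_datetime(date)
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)  # treat "-0000" dates as UTC
            entries.append((when, source,
                            item.findtext("title", default=""),
                            item.findtext("link", default="#")))
    entries.sort(key=lambda entry: entry[0], reverse=True)
    # escape everything taken from the feeds before putting it into HTML
    rows = ['<li><a href="{}">{}</a> ({})</li>'.format(
                html.escape(link), html.escape(title), html.escape(src))
            for _, src, title, link in entries]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"
```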
Scanning and consuming/reading
Most people don't read every item they subscribe to. They scan the items, flag the ones of interest in some way and then read them, and finally either delete them or archive them for later retrieval. I used the word consume rather than read because, in the general case, there is no reason why a person has to be reading the feed: if RSS is a general-purpose notification mechanism, then there may be automated consumers of those notifications.
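An automated consumer can be as simple as a rule that flags matching items. The sketch below is a minimal Python illustration with hypothetical names, using plain case-insensitive substring matching as a stand-in for whatever rules a real system would apply; think of a garage's system watching a service-bulletin feed for particular models or parts.

```python
import xml.etree.ElementTree as ET


def flag_items(feed_xml, keywords):
    """Return titles of items whose title or description mentions any keyword.

    Matching is case-insensitive substring search, a deliberately simple
    (hypothetical) rule standing in for a person scanning the feed.
    """
    words = [k.lower() for k in keywords]
    flagged = []
    for item in ET.fromstring(feed_xml).iter("item"):
        parts = [item.findtext("title"), item.findtext("description")]
        text = " ".join(p for p in parts if p).lower()
        if any(w in text for w in words):
            flagged.append(item.findtext("title"))
    return flagged
```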
Discussion
Most RSS hosting systems provide the ability to comment on an item and to comment on comments, and authors can be notified when comments are made on their items. This provides a simple collaborative environment, although in many cases client-side support for threaded discussion is not as efficient as one would expect, given the maturity of the NNTP newsreaders that provide similar capabilities.
Retrieval
RSS items are increasingly seen as a valuable information resource in their own right, as are their associated comments. A person's own archive of feeds is often very valuable for the ‘I know I read about this somewhere’ type of query, and web-based RSS search engines are valuable for broader queries, especially for following breaking news of any kind. For more information on search engines for news, including RSS, see here: