Thursday, May 1, 2008

Intro to Enterprise RSS

Firstly, what is RSS and how does it relate to what is familiar to us in the "Enterprise"?
In and of itself, RSS is nothing special. It is just a format for "Latest Headlines". Roughly, it says that Content A is about something and its full posting is located at http://A, while Content B is about something else and lives at http://B. Whereas before the popularization of "syndication" users had to visit each site of interest to peruse updated content, RSS now lets content producers effectively "push" updates to us while we busy ourselves with more important tasks.
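To make the "Latest Headlines" idea concrete, here is a minimal sketch of what a feed carries and how a client might read it. The sample feed, its channel title, and the function name list_headlines are made up for illustration; only the Content A / Content B entries mirror the description above.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical RSS 2.0 document with two headlines.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Headlines</title>
    <link>http://example.com/</link>
    <description>Latest headlines</description>
    <item>
      <title>Content A</title>
      <link>http://A</link>
    </item>
    <item>
      <title>Content B</title>
      <link>http://B</link>
    </item>
  </channel>
</rss>"""


def list_headlines(feed_xml):
    """Return (title, link) pairs for every item in the feed, in document order."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


for title, link in list_headlines(SAMPLE_FEED):
    print(title, "->", link)
```

Each item is little more than a headline plus the address of the full posting, which is all a reader needs in order to decide whether to click through.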

Value in RSS is introduced by the Publisher who takes ownership of publishing the latest headlines in a known or easily discoverable location that does not change while keeping that list of headlines up to date. Value is passed on to the Consumer through software that knows how to deliver fresh content to you, the Knowledge Worker.

A Use Case
Let's say a Team wants to maintain a high-level, chronological log of project activity and make this log available for others to peruse at individual convenience (buzzwords: asynchronous consumption of syndicated content). The classical way of disseminating this information has been to spend man-hours designing these headlines into a Newsletter and then to email the Newsletter to everyone. Those who care and those who don't would all receive this email, aka spam.

Now let's say this Team hosts an RSS file that is auto-magically updated on a daily basis by some publishing system that you, the user, should not really have to know or care about. Let's say it's Sharepoint (since Sharepoint 2007 does this very well...). The system is pre-configured so that whenever a member of the Team publishes an article via Sharepoint, the RSS file representing all such articles is automatically updated with the latest headlines.
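As a rough illustration of the publishing side, the sketch below regenerates a feed from a list of published articles. It is not how Sharepoint does it internally; the function name build_feed and the example URLs are assumptions made purely for this example.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, articles):
    """Build an RSS 2.0 document (as a string) from (headline, url) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Latest headlines for " + title
    for headline, url in articles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = headline
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")


# Hypothetical project log: the publishing system would call this again
# every time a Team member publishes a new article.
articles = [
    ("Milestone 2 reached", "http://intranet.example/team/milestone-2"),
    ("Kickoff notes", "http://intranet.example/team/kickoff"),
]
feed_xml = build_feed("Team Project Log", "http://intranet.example/team", articles)
```

The point is that the feed is a by-product of publishing: nobody on the Team designs or mails anything.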

Those interested in latest headlines from that group can direct their RSS Client to keep an "eye" on that Feed. Specifically, your client "pulls" the latest content on an interval, directly from the "source" without involving Newsletters and without spamming your Inbox with content of suspect relevance to you.
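The "pull on an interval" behavior can be sketched as a simple loop. This is an illustration only: poll_feed and its parameters are hypothetical names, and the fetch_items callable stands in for whatever actually downloads and parses the Feed.

```python
import time


def poll_feed(fetch_items, seen_links, interval_seconds, rounds, sleep=time.sleep):
    """Pull a feed `rounds` times and return only the items not seen before.

    fetch_items: callable returning a list of (title, link) pairs.
    seen_links:  set of links already shown to the user; updated in place.
    """
    fresh = []
    for i in range(rounds):
        for title, link in fetch_items():
            if link not in seen_links:
                seen_links.add(link)
                fresh.append((title, link))
        if i < rounds - 1:
            # Wait before asking the source again.
            sleep(interval_seconds)
    return fresh
```

Note that the Feed owner never contacts the client; the client asks the source again and again and filters out what it has already seen.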

Two important terms here are the RSS Feed, which is owned and updated by the content owner, and the RSS Reader (or RSS Client), through which the end-user maintains subscriptions of interest and can always see the "latest" headlines coming from the content owner.

For further details on what RSS is and is not, please take a look at the Wikipedia entry on RSS.

Why are we even bothering with "centralized" or "IT"-driven RSS?
Let's take a very practical (and conservative), risk-minded perspective on the infrastructure common to most Fortune 500, enterprise-size companies. The assumption here is that RSS Clients of all sorts run rampant on the Intranet and that RSS content is treated like any other web content. This means that all outbound requests for web-hosted content on the "out there" internet, as well as the content retrieved from the web, go through a Proxy tier commonly maintained by some division of Corporate IT.

Imagine everyone (that's us, to be referred to as "we") in this hypothetical Firm maintaining a list of 50-100 subscriptions to external and internal sources of content. Let's also say that because we always subscribe to Content of Interest and therefore content of Relevance, we wish our RSS Clients to "keep an eye" on each feed and to pull interesting updates to our subscriptions every 30 minutes.

In reality, what this means is that everyone in the Firm is asking the CNNs of the world for updates on ALL headlines of interest every 30 minutes.
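A back-of-the-envelope calculation shows the scale. The subscription counts and the 30-minute interval come from the scenario above; the headcount of 10,000 and the 8-hour working day are assumptions picked only to make the arithmetic concrete.

```python
def feed_requests_per_day(employees, subscriptions, poll_minutes, hours_online=8):
    """Estimate outbound feed requests per day if every client polls every feed."""
    polls_per_day = hours_online * 60 // poll_minutes
    return employees * subscriptions * polls_per_day


# Assumed: 10,000 employees, clients running for an 8-hour day.
low = feed_requests_per_day(10_000, 50, 30)    # 8,000,000 requests
high = feed_requests_per_day(10_000, 100, 30)  # 16,000,000 requests
print(low, high)
```

Under these assumptions the Firm generates somewhere between 8 and 16 million feed requests a day, most of them for content that has not changed since the last poll.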

A sidebar on Proxies and their role within our environment is important here. We know that we access the "outside" internet through Proxies. Why Proxies? Proxies do some security things. More importantly, Proxies do content caching as well as network and load optimization, all of which are essential to maintaining healthy connectivity to the internal and external networks. Proxies are critical components of an enterprise's infrastructure and are, essentially, Gatekeepers to the Web. Most Proxy infrastructures are probably designed to accommodate Humans perusing sites. Systems that do demand a high volume of network content are usually provisioned, accounted for, and paid for separately as a specific requirement. The Business Unit requiring such special capacity will usually take on this cost. Recall that our original assumption was that no such "special" considerations were in place for RSS traffic.

Your RSS Client asking for updates from the web is equivalent to your Computer perusing sites. Your Computer can do a lot of things quickly, such as pull 50-100 subscription updates within a matter of seconds: something that would take us much more time to do by hand. What does this mean? It means that the amount of traffic to and from the outside Internet increases drastically. This happens because the polling of many CNNs is condensed into a short amount of time, and it happens for every individual with RSS subscriptions, repeatedly over the course of the day. In other words, unrestrained RSS Clients Gone Wild in our environment will create too much unplanned traffic through the Network and will overload Proxy servers that were not provisioned to handle such load. The risk here is disrupting daily routing and, more importantly, business-critical applications relying on the network. Enterprises should deem this an unacceptable risk. Does this mean RSS should not be allowed in the environment? Absolutely not!

Solution
As mentioned before, Proxies do some caching. For example, when you browse to the CNN.com homepage and I browse to the CNN.com homepage, the Proxy is smart enough to realize that CNN.com is not a unique resource and that more than one person wants, and will probably continue to want, to see the same content. Rather than using up Networking resources to deliver content to each of us from CNN.com directly, the Proxy will simply cache that page locally and serve the local (cached) copy for subsequent requests. This is economical.

The exact same model needs to apply for all RSS traffic in and out of our environment. Something needs to sit between your RSS Client and the RSS Feed Owner to provide exactly the same type of mediation and optimization.
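The mediation described above can be sketched as a small time-based cache that sits between many RSS Clients and the Feed owners. This is a toy model under stated assumptions, not any vendor's design: the class name FeedCache, the 30-minute time-to-live, and the injected fetch and clock callables are all hypothetical.

```python
import time


class FeedCache:
    """Serve feeds from a local copy, going upstream at most once per TTL per feed."""

    def __init__(self, fetch, ttl_seconds=1800, clock=time.monotonic):
        self._fetch = fetch          # callable: url -> feed body
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # url -> (fetched_at, body)
        self.upstream_requests = 0   # how often we really went to the source

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # fresh enough: serve the cached copy
        body = self._fetch(url)
        self.upstream_requests += 1
        self._entries[url] = (now, body)
        return body
```

With such a layer in place, a thousand Clients asking for the same Feed within the same half hour cost the Firm one outbound request rather than a thousand.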

To enable RSS within the Firm, the Firm must deploy some RSS-specific Proxy-like infrastructure to Gatekeep RSS traffic. IT should own this task.

Now that RSS is going through a centralized "hub", what else can we do with it?
When data is managed in a centralized location, all sorts of interesting analytics and intelligence-driven applications can come into play. For example, subscription data and read/unread status information can, in and of themselves, yield a rough ranking of popularity. Tagging of content can yield human-driven meta information that democratizes categorization of content coming in from a very large and daunting repository: the web. Popularity ranking and democratized categorization together yield personalized, intelligent recommendations and filtering. All these buzzwords imply a boost to productivity by highlighting information we "know" is relevant to us and by not spamming us with content that is not.
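As a minimal sketch of the "rough ranking of popularity" idea, the snippet below counts how many people subscribe to each Feed. The data layout (a mapping from user to subscribed Feed URLs), the user names, and the function name rank_feeds are assumptions for illustration.

```python
from collections import Counter


def rank_feeds(subscriptions):
    """Rank feeds by number of distinct subscribers, most popular first.

    subscriptions: dict mapping a user name to a list of feed URLs.
    Ties are broken alphabetically by URL so the order is deterministic.
    """
    counts = Counter(url for feeds in subscriptions.values() for url in set(feeds))
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical subscription data held by the central hub.
subscriptions = {
    "ann": ["http://A", "http://B"],
    "bob": ["http://A"],
    "cy": ["http://A", "http://B", "http://C"],
}
print(rank_feeds(subscriptions))
```

A real system would presumably weigh read/unread status and tags as well, but even this simple count is only possible because the subscription data lives in one place.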

This subscription-based content-delivery mechanism, when coupled with such intelligence, is a valuable paradigm in an environment plagued with info-glut, tight deadlines, and the ever-present abuse of emails. How many mailgroups do you belong to?

Why not simply invest in more Proxies?

  1. General-purpose proxy servers are extremely expensive: a single box can cost as much as $80,000
  2. Proxies are not specialized to provide the bells and whistles mentioned in the previous section and therefore under-deliver on the value of syndication within an Enterprise environment

Taking it a step further
Let's imagine that we have this piece of infrastructure sitting in-house, that allows us to aggregate, categorize, rank and disseminate content not only intelligently but also via a standardized and globally-acceptable content format: RSS (XML). When the Firm federates this system with already-existing and already invested-in internal Collaborative Platforms, syndicated, relevant content does not have to originate from traditional "news" sources alone. In fact, such syndication can expose internally generated data of all sorts. Think improved portal adoption through Dashboarding, Agile Project Management and Knowledge Bases, system status notifications and so on. For specific ideas on how to leverage Blogging and syndication for Agile Product Management and Knowledge Bases, take a look at my Blog's series called "Taking Baby Steps".

Taking it one more step further
RSS in the Enterprise can drive Social discovery of peer-recommended and peer-generated content by virtue of mining this centralized repository of content and subscriptions. Cleverly federated with Collaborative systems like Sharepoint, the Enterprise RSS Server can socialize boring document repositories and lists in Sharepoint. Imagine "following" content generated, reviewed and starred by the head of your department. Newsfeeds of social activity made popular by Facebook can be easily constructed by marrying Enterprise RSS infrastructure with Collaborative infrastructure such as Sharepoint. And this is only the beginning... For example, semantic analysis of subscriptions and consumed content can yield better understanding of relevancy and quality of recommendations as well as intelligent filters and subscriptions.

Conclusion
Centralized, enterprise-class RSS infrastructure is an enabler of more than news consumption. Social, asynchronous feedback loops are critical aspects of collaboration and are made possible by investing in RSS as infrastructure.

Vendors such as NewsGator, Attensa, and KnowNow are currently leading the market in providing the Enterprise with a "Buy not Build" option for exactly such infrastructure. Which of these "gets" Enterprise requirements best? I urge you to compare their respective offerings against the specs outlined in my Blog Post entitled "Why Vendors Don't Get Enterprise2.0". I have. Not surprisingly, when compared against the core requirements and the extra credit outlined above, the competition among the three is not as close as analysts and the laggards of the three would like you to believe...
