Wednesday, May 14, 2008

NYBI Meetup #3 Recap

The Recap
The third Meetup was fully dedicated to discussion. It was driven by a few PowerPoint slides from one NYBI member's presentation given at the CTO Summit in Summer of 2007.

One slide of interest was about a proposed BI Ontology reproduced, in part, below:

BI Ontology
  • Decision Space
    • Decision Purpose
    • Decision Frequency
    • Decision Impact
  • Tool Space
    • Data
    • Models
    • Experience
    • Feedback
There was another level of detail for each sub-bullet that will become available when I post the original slides. Ontologies are important because they provide a shared and common understanding of the concepts presented and discussed in the forum. Just as we hope to derive consensus on Trends and Truths within the BI space, we also hope to establish a descriptive (and shared) language about BI with the aid of the proposed Ontology.

An introductory slide or two spoke to the "State of Affairs" as well as to "This is how we came to be here" and spurred two full hours of discussion.

Some highlights:
  • Most organizations looking to introduce BI into their fold succeed in rolling out a Data Warehouse. Rolling out Reporting infrastructure is the difficult part. If the organization is lucky enough to succeed, this reporting infrastructure tends to be the most expensive investment.
  • The industry itself evolved from ad-hoc tools to very structured tools and mechanisms. The approach to successful BI implementation became a Trade and a science in and of itself.
  • The BI Ecosystem is very large. There are a few established Tools and Tool Vendors, but the community of people that provides quality reporting services for these Tools is significantly larger.
  • Vendor tools, however, are going through mergers and the Vendors are losing independence. Will a community emerge to not just support the Vendors but to actually drive innovation in the space and offset the aforementioned loss of independence by Vendors?
  • Paper distribution has been consistently declining over the last 4 years to be replaced by digital systems. Are paper statements in existence today exercises in Data Summarization? Why are our banking statements and our bills still so confusing and uninformative?
  • In turn, with an encroaching demand from end-users to be able to do real-time analytics and trending of their own personal data, why are banks so slow to spice up their online banking systems? Are new sites like Mint.com and QuickenOnline setting a new standard in personal finance analytics?
  • How much longer will we wait for the higher activities of analysis? Is the next step in BI Tools that aid in the decision-making process rather than just visualization? Can "actions" exposed by these new BI Tools be contextually relevant and drive specific end-results, such as yielding a competitive advantage?
  • Volume of data is now manageable. Tools that are making sense of this volume are driving innovation.
  • Is the notion of Business Intelligence simply too constrained to fully encompass the possibilities? Are we transitioning to a more general-purpose, behavioral, Collective Intelligence that enables an organization to tap into data previously inaccessible or simply discarded? Can this aid an organization's process of innovation and feed competitive advantage?
  • Is it possible to phase out manual Knowledge Management along with paper distribution and plug in systems that interpret data on a Semantic level?
  • Are Personalization of Services and Individual Privacy on opposing sides? Are the notions of privacy on the Internet and on a Corporate Intranet so drastically different that most innovation in Collective Intelligence will, in fact, occur on the Intranet, where Individual Privacy is a more manageable Can of Worms?
Truths and Trends
  • Truth: There are lots more data sources of interest than what is currently accessible on the internet
  • Truth: Improving on what computers tend to do well does not necessarily result in value to the end-user
  • Truth: Audio processing is more difficult than Video processing
  • Trend: Use of Voice and Video as datasources is a new frontier. We wonder how much of it is native analysis vs metadata processing.
  • Trend: More real-time decision-making at the hands of the user.
  • Trend: Enterprise environment will give way to enterprise community.
  • Trend: Web2.0 Collaborative tools are becoming more accepted in the enterprise as both the enterprises and the vendors mature
Next Meetup
SAP Business Objects folks will be joining us to help us explore Xcelsius, their new analytics tool. They will also speak to the rest of the near-future product road-map. Details are available on the http://nybimeetup.org site.

Thursday, May 1, 2008

Intro to Enterprise RSS

First, what is RSS, and how does it relate to what is familiar to those of us in the "Enterprise"?
In and of itself, RSS is nothing special. It is just a format for "Latest Headlines". Roughly, it says that Content A is about something and the full posting of the same is located at http://A; Content B is about something else and lives at http://B. Whereas prior to the popularization of "syndication" the user had to go to sites of interest to peruse updated content, RSS now allows content producers to simply "push" updates to us as we busy ourselves with more important tasks.
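
To make the "Latest Headlines" idea concrete, here is a minimal sketch of consuming a feed with Python's feedparser library; the feed URL is just a placeholder, not a real source.

    # Minimal sketch of reading an RSS feed with feedparser (pip install feedparser).
    # The URL below is a placeholder for illustration only.
    import feedparser

    feed = feedparser.parse("http://example.com/teamnews/rss.xml")

    print(feed.feed.title)                    # the feed's own title
    for entry in feed.entries:                # each entry is one "headline"
        print(entry.title, "->", entry.link)  # Content A is about X and lives at http://A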

Value in RSS is introduced by the Publisher, who takes ownership of publishing the latest headlines in a known or easily discoverable location that does not change, while keeping that list of headlines up to date. Value is passed on to the Consumer through software that knows how to deliver fresh content to you, the Knowledge Worker.

A Use Case
Let's say a Team wants to maintain a high-level, chronological log of project activity and make this log available for others to peruse at their individual convenience (buzzwords: asynchronous consumption of syndicated content). The classical way of disseminating this information has been to spend man-hours designing these headlines into a Newsletter format and to send the Newsletter out to everyone via email. Those who care and those who don't care alike would receive this email, aka spam.

Now let's say this Team hosts an RSS file that is auto-magically updated on a daily basis by some publishing system that you, the user, should not really know or care about. Let's say it's Sharepoint (since Sharepoint 2007 does this very well...). The system is pre-configured in such a manner that whenever a member of the Team publishes an article via Sharepoint, the RSS file representing all such published articles is automatically updated with the latest headlines.

Those interested in the latest headlines from that group can direct their RSS Client to keep an "eye" on that Feed. Specifically, your client "pulls" the latest content on an interval, directly from the "source", without involving Newsletters and without spamming your Inbox with content of suspect relevance to you.
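
Here is a rough sketch of that interval-based "pull", assuming a purely hypothetical Sharepoint-published feed URL; a real RSS Client would also persist which headlines it has already shown you.

    # Rough sketch of an RSS Client's polling loop. The feed URL is hypothetical,
    # and real readers persist seen items rather than keeping them in memory.
    import time
    import feedparser

    FEED_URL = "http://sharepoint.example.com/sites/team/rss.xml"  # placeholder
    seen_links = set()

    while True:
        for entry in feedparser.parse(FEED_URL).entries:
            if entry.link not in seen_links:        # only surface new headlines
                seen_links.add(entry.link)
                print("New:", entry.title, entry.link)
        time.sleep(30 * 60)                         # "keep an eye" every 30 minutes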

Two important terms here are: RSS Feed - that which is owned and updated by the content owner, and RSS Reader (or RSS Client) through which the end-user maintains subscriptions of interest and by which the end-user can always see the "latest" headlines coming from the Content Owner.

For further details on what RSS is and is not, please take a look at the Wikipedia entry on RSS.

Why are we even bothering with "centralized" or "IT"-driven RSS?
Let's take a very practical (and conservative), risk-minded perspective on infrastructure common to most Fortune 500, enterprise-size companies. The assumption here is that RSS Clients of all sorts run rampant on the environment's Intranet and RSS content is treated like any other web content. This means that all outbound requests for web-hosted content on the "out there" internet, as well as the content retrieved from the web, go through a Proxy tier commonly maintained by some division of Corporate IT.

Imagine everyone (that's us, to be referred to as "we") in this hypothetical Firm maintaining a list of 50-100 subscriptions to external and internal sources of content. Let's also say that because we always subscribe to Content of Interest and therefore content of Relevance, we wish our RSS Clients to "keep an eye" on each feed and to pull interesting updates to our subscriptions every 30 minutes.

In reality, what this means is that everyone in the Firm is asking the CNNs of the world for updates on ALL headlines of interest every 30 minutes.
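
To put rough numbers on that statement (the head count and subscription figures below are hypothetical, chosen only to show the order of magnitude):

    # Back-of-envelope estimate of outbound feed requests; all figures are hypothetical.
    employees = 10_000                 # size of our imaginary Firm
    feeds_per_person = 75              # middle of the 50-100 subscription range
    polls_per_day = 24 * 60 // 30      # one poll per feed every 30 minutes = 48

    requests_per_day = employees * feeds_per_person * polls_per_day
    print(requests_per_day)            # 36,000,000 requests through the Proxy tier per day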

A sidebar on Proxies and their role within our environment is important here. We know that we access the "outside" internet through Proxies. Why Proxies? Proxies do some security things. More importantly, Proxies do some content caching, networking, and load optimization that are essential to maintaining healthy connectivity to the internal and external networks. Proxies are critical components of an enterprise's infrastructure and are, essentially, Gatekeepers to the Web. Most Proxy infrastructures are probably designed to accommodate Humans perusing sites. Systems that demand a high volume of network content are usually provisioned, accounted for, and paid for separately as a specific requirement. The Business Unit requiring such special capacity will usually take on this cost. Recall that in our original assumption no such "special" considerations were in place for RSS traffic.

Your RSS Client asking for updates from the web is equivalent to your Computer perusing sites. Your Computer can do a lot of things quickly, such as pull 50-100 subscription updates within a matter of seconds: something that would take us much more time to do by hand. What does this mean? It means that the amount of traffic to and from the outside Internet increases drastically, because the polling of many CNNs is condensed into a short window of time, and this happens per individual, over and over, throughout the day. In other words, unrestrained RSS Clients Gone Wild in our environment will create too much unplanned traffic through the Network and will overload Proxy servers that were not provisioned to handle such load. The risk here is disrupting the daily routine and, more importantly, business-critical applications relying on the network. Enterprises should deem this an unacceptable risk. Does this mean RSS should not be allowed in the environment? Absolutely not!

Solution
As mentioned before, Proxies do some caching. For example, when you browse to the CNN.com homepage and I browse to the CNN.com homepage, the Proxy is smart enough to realize that CNN.com is not a unique resource and that more than one person wants to, and probably will want to, see the same content. Rather than using up Networking resources to deliver content to you from CNN.com directly, the Proxy will simply cache that page locally and serve out the local (cached) copy for subsequent requests. This is economical.

The exact same model needs to apply for all RSS traffic in and out of our environment. Something needs to sit between your RSS Client and the RSS Feed Owner to provide exactly the same type of mediation and optimization.

To enable RSS within the Firm, the Firm must deploy some RSS-specific Proxy-like infrastructure to Gatekeep RSS traffic. IT should own this task.
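
As a purely illustrative sketch (not any vendor's implementation), such a gatekeeper could cache each upstream feed and serve the local copy to every reader inside the Firm, touching the source at most once per caching window:

    # Illustrative sketch of an RSS-specific caching relay: clients ask this hub
    # for feeds, and the hub fetches each upstream source at most once per window.
    import time
    import urllib.request

    CACHE_SECONDS = 30 * 60            # refresh upstream at most every 30 minutes
    _cache = {}                        # feed URL -> (fetched_at, raw feed XML)

    def get_feed(url):
        now = time.time()
        if url in _cache and now - _cache[url][0] < CACHE_SECONDS:
            return _cache[url][1]      # serve the local (cached) copy
        with urllib.request.urlopen(url) as response:
            body = response.read()     # one upstream fetch serves many readers
        _cache[url] = (now, body)
        return body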

Now that RSS is going through a centralized "hub", what else can we do with it?
When data is managed in a centralized location, all sorts of interesting analytics and intelligence-driven applications can come into play. For example, subscription data and read/unread status information in and of itself can yield a rough ranking of popularity. Tagging of content can yield human-driven meta information that democratizes categorization of content coming in from a very large and daunting repository: the web. Popularity ranking and democratized categorization yield personalized, intelligent recommendations and filtering. All these buzzwords imply a boost to productivity by way of highlighting information we "know" is relevant to us and by way of not spamming us with content that is not.
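
As a toy illustration of the simplest of these, here is a popularity ranking built from centralized read counts; the feeds and numbers are made up.

    # Toy popularity ranking from centralized read data; feeds and counts are made up.
    read_counts = {
        "http://example.com/markets.xml": 420,
        "http://example.com/it-status.xml": 95,
        "http://example.com/hr-news.xml": 12,
    }

    for feed_url, reads in sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{reads:5d}  {feed_url}")   # most-read feeds first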

This subscription-based content-delivery mechanism, when coupled with such intelligence, is a valuable paradigm in an environment plagued with info-glut, tight deadlines, and the ever-present abuse of emails. How many mailgroups do you belong to?

Why not simply invest in more Proxies?

  1. General-purpose proxy servers commonly purchased are extremely expensive; they can cost as much as $80,000 per box
  2. Proxies are not specialized to provide the bells and whistles mentioned in the previous section and therefore under-represent the value of syndication within an Enterprise environment

Taking it a step further
Let's imagine that we have this piece of infrastructure sitting in-house, that allows us to aggregate, categorize, rank and disseminate content not only intelligently but also via a standardized and globally-acceptable content format: RSS (XML). When the Firm federates this system with already-existing and already invested-in internal Collaborative Platforms, syndicated, relevant content does not have to originate from traditional "news" sources alone. In fact, such syndication can expose internally generated data of all sorts. Think improved portal adoption through Dashboarding, Agile Project Management and Knowledge Bases, system status notifications and so on. For specific ideas on how to leverage Blogging and syndication for Agile Product Management and Knowledge Bases, take a look at my Blog's series called "Taking Baby Steps".

Taking it one more step further
RSS in the Enterprise can drive Social discovery of peer-recommended and peer-generated content by virtue of mining this centralized repository of content and subscriptions. Cleverly federated with Collaborative systems like Sharepoint, the Enterprise RSS Server can socialize boring document repositories and lists in Sharepoint. Imagine "following" content generated, reviewed and starred by the head of your department. Newsfeeds of social activity made popular by Facebook can be easily constructed by marrying Enterprise RSS infrastructure with Collaborative infrastructure such as Sharepoint. And this is only the beginning... For example, semantic analysis of subscriptions and consumed content can yield better understanding of relevancy and quality of recommendations as well as intelligent filters and subscriptions.

Conclusion
Centralized, enterprise-class RSS infrastructure is an enabler of more than news consumption. Social, asynchronous feedback loops are critical aspects of collaboration and are made possible by investing in RSS as infrastructure.

Vendors such as NewsGator, Attensa, and KnowNow are currently leading the market in providing the Enterprise with a "Buy not Build" option for exactly such infrastructure. Which of these "gets" Enterprise requirements best? I urge you to compare their respective offerings against the specs outlined in my Blog Post entitled "Why Vendors Don't Get Enterprise2.0". I have. Not surprisingly, when compared against the core requirements and the extra credit, as outlined above, competition among the three is not as close as analysts and the laggards of the three would like you to believe...