04 September 2013

Dissecting the Summarization Process

This is in effect a mid-2013 progress update. As with many of my blog posts, this is as much a status update for me to get a better handle on where I'm at as it is to broadcast my progress.

mendicott.com is a blog reflecting on my journey with the overall project. This blog started seven years ago, in 2006, with my inquiry into The difference between a web page and a blog.... I had then returned from something like five years of world travel to find the digerati fawning over the blogosphere. At first, I failed to see the difference between a blog and a content management system (CMS) for stock standard web pages. Upon closer examination, I began to realize that the real difference lay in the XML syndication of blog feeds into the real-time web.

meta-guide.com is an attempt to blueprint, or tutorialize, the process. My original Meta Guide 1.0 development in ASP attempted to create automated, or robotic, web pages based on XML feeds from the real-time web. Meta Guide 2.0 development was based on similar feed bots, or Twitter bots, in an attempt to automate, or at least semi-automate, the rapid development of large knowledgebases from social media via knowledge silos. Basically, I use knowledge templates to automatically create the knowledge silos, or large knowledgebases. The knowledge templates are based on my own, proprietary "taxonomies", or more precisely faceted classifications, painstakingly developed over many years.

gaiapassage.com aims to be an automated, or semi-automated, summarization of the knowledge aggregated from social media by feed bots via the proprietary faceted classifications, or knowledge templates. Right now, I'm doing a semi-automated summarization process with Gaia Passage, which consists of automated research in the form of knowledge silos being "massaged" in different ways, but ultimately manually writing the summarization in natural language. This is allowing me to analyze and attempt to dissect the processes involved in order to gradually prototype automation. Summarization technologies, and in particular summarization APIs, are still in their infancy. Examples of currently available summarization technologies include automatedinsights.com and narrativescience.com. The overall field is often referred to as automatic summarization.

In the future, the Gaia Passage human readable summarizations will need to be converted into machine readable dialog system knowledgebase format. The dialog system is basically a chatbot, or conversational user interface (CUI) into a specialized database, called a knowledgebase. Most, common chatbot knowledgebases are based on, or compatible with, XML, such as AIML for example. Voice technologies, both output and input, are generally an additional layer on top of the text based dialog system.

The two main bottlenecks I've come up against are what I like to call artificial intelligence middleware, or frameworks, the "glue" to integrate the various processes, as well as adequate dialog system tools, in particular chatbot knowledgebase tools with both "frontend" and "backend" APIs (application programming interface), in other words a dialog system API on the frontend with a backend API into the knowledgebase for dynamic modification. My favorite cloud based "middleware" is Yahoo! Pipes, which is generally referred to as a mashup platform (aka mashup enabler) for feed based data; however, there are severe performance issues with Yahoo! Pipes -- so, I don't really consider it to be a production ready tool. Like Yahoo! Pipes, my ideal visual, cloud based AI middleware could or should be language agnostic -- eliminating the need to decide on a single programming language for a project. I have also looked into scientific computing packages, such as LabVIEW, Mathematica, and MATLAB, for use as potential AI middleware. Additionally, there are a variety of both natural language and intelligent agent frameworks available. Business oriented cloud based integration, including visual cloud based middleware, is often referred to as iPaaS (integration Platform as a Service), integration PaaS or "Integration as a Service".

The recent closure of the previously open Twitter API with OAuth has set my feed bot, or "smart feed", development back by years. Right now, I'm stuck trying to figure out the best way to use the new Twitter OAuth with Yahoo! Pipes, for instance via YQL, if at all. And if that were not enough, the affordable and user-friendly dialog system API, verbotsonline.com, that I was using went out of business. There are a number of dialog system API alternatives, even cloud based dialog systems, but they are neither free nor cheap, especially for significant throughput volumes. Still to do: 1) complete the Gaia Passage summarizations, 2) make Twitter OAuth work, use a commercial third party data source (such as datasift.com, gnip.com or topsy.com), or abandon Twitter as a primary source (for instance concentrate on other social media APIs instead, such as Facebook), 3) continue the search for a new and better dialog system API provider.

Most basically, the Gaia Passage project is a network of robots that will not only monitor social media buzz about both the environment and tourism but also interpret the inter-relations, cause and effects, between environment and tourism -- such as how climate change effects the tourism industry both negatively or positively, or even what effects the weather has on crime trends for a particular destination -- as well as querying these interpreted inter-relations, or "conclusions", via natural language. If this can be accomplished with any degree of satisfaction, either fully automated or semi-automated, then the system could just as easily be applied to any other vertical. Proposals from potential sponsors, investors, or technology partners are welcomed, and may be sent to mendicot [at] yahoo.com.

No comments: