Showing posts with label taxonomy.

08 January 2008

Books, metadata and chatbots… in search of the XML Rosetta Stone

I am an author and I build chatbots (aka chatterbots). A chatbot is a conversational agent, driven by a knowledgebase. I am currently trying to understand the best way to convert a book into a chatbot knowledgebase.

A knowledgebase is a form of database, and a chatbot is actually a type of search… an anthropomorphic, and therefore ergonomic, form of search. This simple fact is usually obscured by the jargon of “natural language processing”, which may or may not involve actual voice input or output.

According to the ruling precepts of the “Turing test”, chatbots should be as conversational as possible, and this is what distinguishes them from pure “search”. With chatbots there is a significant element of “smoke and mirrors”, which brings the human psychological element into the machine in the form of cultural, linguistic and thematic assumptions and expectations, so that the exchange becomes, in a sense, a sort of “mind game”.

I’m actually approaching this from two directions: I would also like to be able to feed RSS into a chatbot knowledgebase, and to my knowledge there is currently no working example of this. Parsing RSS into AIML (Artificial Intelligence Markup Language), the most common chatbot dialect, is problematic and has yet to be cracked effectively. So my thinking turned to breaking a book down into a form that resembles RSS. The Wikipedia “List of XML markup languages” turned up a number of attempts to add metadata to books.
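To make the RSS-to-AIML problem concrete, here is a minimal sketch of the most naive mapping: each RSS 2.0 `<item>` becomes one AIML `<category>`, with the item title (uppercased, punctuation stripped) as the pattern and the description as the reply. The feed content is invented for illustration, and the helper names (`normalize_pattern`, `rss_to_aiml`) are hypothetical; this is not an existing converter.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A tiny RSS 2.0 feed, invented for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example feed</title>
<item><title>Eco-lodges open in Belize</title>
<description>Three new eco-lodges opened this month.</description>
<link>http://example.com/belize</link></item>
</channel></rss>"""

def normalize_pattern(text):
    """Uppercase the words and drop punctuation, as AIML 1.x patterns expect."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return " ".join(w.upper() for w in words)

def rss_to_aiml(rss_xml):
    """Turn each RSS <item> into one AIML <category> keyed on its title."""
    root = ET.fromstring(rss_xml)
    categories = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        desc = item.findtext("description", default="")
        categories.append(
            "<category><pattern>%s</pattern><template>%s</template></category>"
            % (normalize_pattern(title), escape(desc))
        )
    return '<aiml version="1.0.1">\n%s\n</aiml>' % "\n".join(categories)

print(rss_to_aiml(SAMPLE_RSS))
```

The resulting pattern, `ECO LODGES OPEN IN BELIZE`, only fires if a user types the headline word for word, which is exactly why whole titles make poor triggers.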

Dr. Wallace, the originator of AIML, recently replied on the pandorabots-general group that RSS title fields would usually be too specific to be useful as chatbot concept triggers. However, I believe utilities such as the Yahoo! Term Extraction API could be used to create tags for feed items, which might then prove more useful when mapped to AIML patterns….
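One way tags could be mapped to AIML patterns is the common AIML 1.x “keyword” idiom: four categories per tag, so the tag triggers whether it appears alone, at the end (`_ TAG`), at the start (`TAG *`) or in the middle (`_ TAG *`) of an input, with the wildcard variants redirecting via `<srai>` to the bare keyword. The tags below are hand-written placeholders standing in for term-extraction output; this sketch does not call the Yahoo! Term Extraction API, and the function names are hypothetical.

```python
import re
from xml.sax.saxutils import escape

def to_pattern(tag):
    """Normalize a tag into AIML 1.x pattern words: uppercase, punctuation dropped."""
    return " ".join(w.upper() for w in re.findall(r"[A-Za-z0-9]+", tag))

def keyword_categories(tag, response):
    """Return four AIML categories so `tag` matches alone or at the start, middle or end of an input."""
    key = to_pattern(tag)
    categories = [
        "<category><pattern>%s</pattern><template>%s</template></category>"
        % (key, escape(response))
    ]
    # The wildcard variants only redirect (<srai>) to the bare keyword category.
    for pattern in ("_ " + key, key + " *", "_ " + key + " *"):
        categories.append(
            "<category><pattern>%s</pattern><template><srai>%s</srai></template></category>"
            % (pattern, key)
        )
    return categories

# Hypothetical tags for one feed item, standing in for term-extraction output.
item = {"description": "Three new eco-lodges opened in Belize this month.",
        "tags": ["ecotourism", "eco-lodges"]}

aiml_body = []
for tag in item["tags"]:
    aiml_body.extend(keyword_categories(tag, item["description"]))
print('<aiml version="1.0.1">\n' + "\n".join(aiml_body) + "\n</aiml>")
```

If two feed items share a tag, their bare-keyword categories would collide, so a real converter would need to merge them, for example into one template with `<random>` alternatives.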

My supposition is that a *good* book index is in effect a “taxonomy” of that book. Paragraphs would generally be too large to meet the specialized “conversational” needs of a chatbot, while the results of a conventional concordance would be too general to be useful…. If RSS as we know it is currently too specific to function effectively in a chatbot, what if the index terms were mapped back to the sentences they refer to, as “tags”, somewhat like RSS?
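Mapping an index back onto sentences is essentially an inversion: a book index maps each term to the places it occurs, and flipping it gives each place its list of terms, i.e. its tags. This minimal sketch assumes, hypothetically, an index that points to numbered sentences rather than pages; the terms and numbers are invented.

```python
from collections import defaultdict

def invert_index(index):
    """Turn {term: [sentence ids]} into {sentence id: sorted [terms]}."""
    tags = defaultdict(list)
    for term, sentence_ids in index.items():
        for sid in sentence_ids:
            tags[sid].append(term)
    return {sid: sorted(terms) for sid, terms in tags.items()}

# Hypothetical index entries pointing at sentence numbers.
book_index = {"ecotourism": [1, 4], "backpackers": [2], "carbon offsets": [4]}
print(invert_index(book_index))
# {1: ['ecotourism'], 4: ['carbon offsets', 'ecotourism'], 2: ['backpackers']}
```

A real index points to pages, so an extra step would be needed to locate the sentence on each page that actually mentions the term.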

I figure that if you can break a book down into a sentence “concordance” fairly quickly, you could then point that at something like the Yahoo! Term Extraction API to generate relevant keywords (or “tags”) for each sentence, which could in turn be used in AIML as triggers for those sentences in a chatbot…. Is there such a beast as a “sentence parser” for a corpus such as an ordinary book? All I want to do at this point is extract all the sentences and line them up, as a conventional concordance does with individual words.
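A crude sentence concordance can be sketched with a regular expression that splits after `.`, `!` or `?` when the next word starts with a capital letter or quote. Each sentence is then numbered and given a few tags. The `naive_tags` function is a hypothetical stand-in for a term-extraction service (it simply picks the most frequent non-stopword words), not a reproduction of the Yahoo! API.

```python
import re
from collections import Counter

# Split after ., ! or ? when followed by whitespace and an uppercase letter or quote.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'])")

# A deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "that", "this", "with", "from", "have", "which",
             "would", "there", "their", "about", "into", "they", "were"}

def split_sentences(text):
    """Collapse whitespace (including line breaks), then split into sentences."""
    text = " ".join(text.split())
    return [s for s in SENTENCE_END.split(text) if s]

def naive_tags(sentence, k=3):
    """Hypothetical term extractor: the k most frequent content words of the sentence."""
    words = [w.lower() for w in re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)]
    counts = Counter(w for w in words if len(w) > 3 and w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

def sentence_concordance(text):
    """Number each sentence and attach its tags: (number, sentence, tags)."""
    return [(n, s, naive_tags(s)) for n, s in enumerate(split_sentences(text), start=1)]

sample = """Backpackers love cheap hostels.  Ecotourism protects
fragile habitats! Is the tourism here sustainable?"""
for n, sentence, tags in sentence_concordance(sample):
    print(n, sentence, tags)
```

The regular expression wrongly splits after abbreviations such as “Dr.” in “Dr. Wallace”; a trained splitter such as NLTK's `sent_tokenize` (Punkt) handles those cases better on real book text.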

There are a number of desktop chatbots today that use proprietary Windows speech recognition; however, to my knowledge there are currently no chatbots available online or via VoIP that accept voice input (*not* IM or IRC bots)…. So I’ve also spent some time lately looking into VoiceXML (VXML), CCXML and Voxeo's CallXML, as well as the Speech Recognition Grammar Specification (SRGS) and the mythical voice browser…. The only thing I could find online that actually accepts voice input for processing is Midomi.com, which takes a hummed tune as input for tune recognition…. Apparently goog411, which is basically interactive voice response (IVR) rather than true speech recognition, is as close as it gets to a practical hybrid online/offline voice search application at this time. So, what if Google could talk?

01 November 2007

Destination Meta-Guide.com 2.0 Update, November 2007

I spent a few weeks last month in the Australian bush on Aboriginal land with a group of old friends.

Since I’ve come back to my ecological niche in the green Shire, I’ve been inspired to overhaul the “Destination Meta-Guide.com 2.0” (http://www.meta-guide.com/). It is now effectively a “daily green travel newspaper” for virtually every country on Earth. The Destination Meta-Guide.com 2.0 combines elements of both the collaborative “Web 2.0” and the semantic web or “Web 3.0”.

Besides maps and photos for every country, it contains relevant news stories selected by my improved green travel taxonomy, as well as the latest green travel news for all the countries listed in the right-hand sidebar.

The Destination Meta-Guide.com 2.0 also contains a green-travel mini-guide for each country, consisting of up to ten of the most recent postings to the green-travel group about that country. This represents the collaborative, or “Web 2.0”, element; however, the green-travel group has existed since 1991, which predates not only the graphical web we know today but also the concept of “Web 2.0” itself.

Further, the Destination Meta-Guide.com 2.0 contains an automated selection of green travel links for each country, drawn specifically from that country’s national domain. So if you operate in a specific country but have no web presence under its national top-level domain, you should get one in order to be listed here.
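Drawing links from a national domain comes down to checking whether a URL's host ends in the country-code top-level domain. This is a minimal sketch assuming a simple suffix test; the Meta-Guide's actual selection logic is not described here, and the example links are invented.

```python
from urllib.parse import urlparse

def in_national_domain(url, cctld):
    """True if the URL's host ends with the given country-code top-level domain."""
    host = (urlparse(url).hostname or "").lower()
    return host.endswith("." + cctld.lower().lstrip("."))

# Hypothetical candidate links for Belize (.bz).
links = [
    "http://www.example.bz/eco-lodges",
    "http://belize-tours.example.com/",
    "https://reef.example.org.bz/volunteer",
]
print([u for u in links if in_national_domain(u, "bz")])
# ['http://www.example.bz/eco-lodges', 'https://reef.example.org.bz/volunteer']
```

The second link is skipped even though it is about Belize, which is why a site without a national-domain address would not be picked up by this kind of filter.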

The biggest recent additions are four pages for every country, each searching one group of sources for green travel topics: Development Agencies, Development Banks, UN Agencies, and international NGOs. In particular, this application represents semantic web or “Web 3.0” technology, in that taxonomic filtering at multiple levels effectively creates semantic relationships, increasing relevance.
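“Taxonomic filtering at multiple levels” can be pictured as a chain of filters, where an item is kept only if it passes every level: it comes from the chosen source group, it mentions the country, and it mentions a green travel term. This is a sketch under stated assumptions, not the Meta-Guide's code; the source domains, terms and items are all hypothetical placeholders.

```python
from urllib.parse import urlparse

# Hypothetical source groups and green-travel terms (placeholders, not the real lists).
SOURCE_GROUPS = {
    "Development Banks": {"devbank.example"},
    "UN Agencies": {"unagency.example"},
}
GREEN_TERMS = ("ecotourism", "sustainable tourism", "responsible tourism")

def from_group(link, group):
    """Level 1: the item's host belongs to a domain in the chosen source group."""
    host = (urlparse(link).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in SOURCE_GROUPS[group])

def matches(item, country, group):
    """Keep an item only if it passes the source, country and topic levels."""
    if not from_group(item["link"], group):
        return False
    text = (item["title"] + " " + item["summary"]).lower()
    return country.lower() in text and any(t in text for t in GREEN_TERMS)

items = [
    {"link": "http://www.devbank.example/news/1", "title": "Loan for Belize",
     "summary": "Financing for sustainable tourism on the coast."},
    {"link": "http://www.devbank.example/news/2", "title": "Belize roads",
     "summary": "A highway upgrade."},
    {"link": "http://news.example.com/3", "title": "Belize ecotourism boom",
     "summary": "Visitor numbers rise."},
]
print([i["title"] for i in items if matches(i, "Belize", "Development Banks")])
# ['Loan for Belize']
```

Each level on its own is a weak signal; intersecting them is what narrows the results to items that relate the country, the source type and the topic to one another.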

These pages are highly configurable, so any suggestions you might have about what to include or exclude will be most welcome! Feedback of any kind is encouraged, public or private.

I need your help now to continue this project. I am requesting donations of any amount via paypal.com to sponsor my research and development. In return, I will include a link of your choice in recognition of your contribution, and I have created a detailed Sponsors page for this purpose at http://www.mendicott.com/meta-guide/sponsors.asp

08 June 2007

green-travel taxonomy

Over the past 10 years or so I’ve been gradually developing a taxonomy, or classification system, for “green travel”, or more accurately green or sustainable tourism, an extension of my work with globetrotting and backpacker tourism. In other words, what are the key concepts involved in responsible tourism? A two-dimensional taxonomy becomes an ontology when applied in three dimensions, as relationships among the concepts emerge. Taxonomies and ontologies are useful in artificial intelligence applications, such as bots.
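The “two dimensions versus three” image maps loosely onto a common way of representing the two in code: a taxonomy as a tree of broader/narrower concepts, and an ontology as that tree plus named relationships that cut across its branches. The concepts and relations below are hypothetical examples, not the actual green-travel taxonomy.

```python
# Taxonomy: each concept points to one broader concept, forming a tree.
BROADER = {
    "ecotourism": "green travel",
    "sustainable tourism": "green travel",
    "carbon offsets": "sustainable tourism",
}

# Ontology-style layer: named relationships across branches of the tree.
RELATIONS = [
    ("ecotourism", "supports", "conservation"),
    ("carbon offsets", "mitigates", "climate change"),
    ("air travel", "contributes to", "climate change"),
]

def ancestors(concept):
    """Walk up the taxonomy from a concept to its root."""
    path = []
    while concept in BROADER:
        concept = BROADER[concept]
        path.append(concept)
    return path

def relations_of(concept):
    """All named relationships in which the concept appears as subject or object."""
    return [r for r in RELATIONS if concept in (r[0], r[2])]

print(ancestors("carbon offsets"))  # ['sustainable tourism', 'green travel']
print(relations_of("climate change"))
```

The tree alone can only answer “what is this a kind of?”; the relation triples let a bot connect, say, carbon offsets and air travel through climate change, even though they sit in different parts of the hierarchy.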

I’ve spent much of the past decade tinkering with and tweaking http://meta-guide.com, which emerged from the old green-travel.com site. Today this would be called a “mashup”. Lately, I seem to have hit on a particularly useful algorithm, and have in effect taught the meta-guide to tell me everything in the popular press about “green travel” happening in the world today, in a more useful format, country by country, for nearly every “country” on Earth…. In particular, it returns the latest information about climate change and global warming in relation to tourism, in addition to ecotourism and sustainable tourism developments, etc.

I recommend trying the random country feature; let me know what you think, either in the green-travel group at http://groups.yahoo.com/group/green-travel or directly!

Marcus Endicott http://mendicott.com