GaiaPassage.com is subtitled "Marcus L Endicott's favorite tips for green travel around the world". I'm calling it a deep green, eco-centric travel guide to the whole Earth. My Gaia Passage project will be a handwritten ecotourism guide to the entire world, organized around the roughly 250 country code top-level domains (ccTLDs). The general idea is to write a "white paper" for every country in the world on environmental and cultural conditions and issues, and on who is doing what about them, examining both how those issues affect tourism and how tourism affects those issues. Anyone could write a lot about any one of these topics, but the idea here is to provide "snapshots", or "bite-sized" summaries, of only the best information and contacts. The name "Gaia Passage" originally came from my pre-Internet (mid-1980s) travel tips newsletter. The site is a work in progress; so far, I've completed the entire Western Hemisphere:
GaiaPassage.com is handwritten, but based on automated research and an automated outline. Primary research is based on data mining 20 years of Green Travel archives. Secondary research is based on several years of Meta Guide Twitter bot archives. Significance is based on primary sources in the form of root website domains, and/or secondary sources in the form of Wikipedia entries. In other words, if there is no root website domain or Wikipedia entry, then something is unlikely to be included. (However, almost anything may be included in Wikipedia, if properly referenced.)
I have noticed that many websites of smaller concerns are going offline, apparently due to the economic downturn. However, social media such as Twitter and Facebook present affordable alternatives to owning a root domain website, and I will take these into consideration when appropriate. (In other words, when something is really cool.) I have also noticed a lot of people using Weebly to make free websites. (Note: GaiaPassage.com currently uses the free Google Sites platform.)
In the early evolution of a website, especially for large projects, it's important to first have the "containers" in place as "placeholders", which is no small task in itself. With roughly 250 countries and territorial entities, that's a whole year of full-time work for one man, revising one country per working day. This would mean initial completion by December 2013. Eventually, GaiaPassage.com entries may morph into socialbots, or conversational assistants, containing not only all the knowledge about sustainable tourism gleaned from past Green Travel archives, but also current knowledge resulting from the Meta Guide Twitter bots.
In my previous blog post, 250 Conversational Twitter Bots for Travel & Tourism, I detailed my 250 Meta Guide Twitter bots, one for every country and territory with an Internet ccTLD. Basically, I've spent the past five years working on artificial intelligence and conversational agents - and tweeting about it all the while (links below). I had been using Twitter extensively as a framework; however, Twitter has become increasingly protectionist, most dramatically illustrated by the high-profile 2012 Twitter-LinkedIn divorce. The Twitter API has become a moving target, and it is just too costly for me to keep playing catch-up. In short, I find the "Facebook complex" of Twitter management immensely annoying, and I decided to stop contributing original content; so, my New Year's resolution was to stop tweeting manually, at least for all of 2013. Further, my excellent dialog system API, VerbotsOnline.com, went out of business in 2012. Any other good dialog system API I found to replace it turned out to be much too expensive. As a result, all my conversational agents are shut down, at least for 2013. My hope is that the sector will shake out and/or advance during the year, and that better, or at least more affordable, conversational tools will become available next year.
Following up on my previous post of January 2008, “Corpus linguistics & Concgramming in Verbots and Pandorabots”, you can now see the demo of this VagaBot at http://www.mendicott.com . The results of this trial were not satisfying, due to the limitation that the VKB engine at verbotsonline.com cannot produce consecutive, or random, responses from identical inputs or triggers (basically tags). In other words, responses to identical input hang on the first response and do not cycle through the series of alternatives. Apparently, a commercial implementation of the Verbots platform does allow for the consecutive firing of related replies. Thanks to Matt Palmerlee of Conversive, Inc. for increasing the online knowledgebase storage to accommodate this trial and demo.

Dr. Rich Wallace has recently blogged a very helpful post, “Saying a list of AIML responses in order”, on his Alicebot blog at http://alicebot.blogspot.com . After considerable fiddling, I have successfully installed Program E on my Windows desktop under WampServer (Apache, MySQL, PHP). I have also found a very easy commercial product for importing RSS feeds into MySQL. Next, I will try to bridge the RSS database and the Program E AIML database with Extensible Stylesheet Language Transformations (XSLT), using the previously mentioned xsl-easy.com database adapters, as well as implement Dr. Wallace's "successor" function on the Program E AIML platform; a rough sketch of the feed-to-AIML idea follows below. Once I get the prototype working on my desktop, I will then endeavor to replicate it on a remote server for public access.

The long-term goals of Project VagaBot are to create a conversational agent that can not only “read” books, but also web feeds, and “learn” to reply intelligently to questions, in this case on “green travel”; in effect, an anthropomorphic frontend utilizing not only my book, "Vagabond Globetrotting 3", but also my entire http://meta-guide.com feed resources as backend. I am not aware of another project that currently makes the contents of a book available through a conversational agent, nor one that “learns” from web feeds. I hope to eventually be able to send the VagaBot avatar into smartphones using both voice output and input. I would be very interested in hearing from anyone interested in investing in, or otherwise supporting, this development.
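Since the XSLT bridge is still on the drawing board, here is a minimal Python sketch of the same idea, assuming a plain RSS 2.0 feed and mapping each item's title to an AIML pattern and its description to the template. The feed URL, the pattern-cleaning rules, and the output filename are my own illustrative assumptions; the actual pipeline described above would use the xsl-easy.com database adapters and Program E rather than this script.

```python
# Minimal sketch of the RSS-to-AIML bridge idea: turn each feed item into an
# AIML <category>, with the item title as the <pattern> and the description
# as the <template>. The feed URL and filenames are hypothetical.
import re
import urllib.request
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

FEED_URL = "http://meta-guide.com/feed"  # hypothetical feed location

def clean_pattern(text):
    """AIML patterns are conventionally uppercase words with no punctuation."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return " ".join(w.upper() for w in words)

def rss_to_aiml(feed_xml):
    """Return an AIML document string built from the feed's <item> elements."""
    root = ET.fromstring(feed_xml)
    categories = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        description = (item.findtext("description") or "").strip()
        if not title or not description:
            continue
        categories.append(
            "  <category>\n"
            f"    <pattern>{clean_pattern(title)}</pattern>\n"
            f"    <template>{escape(description)}</template>\n"
            "  </category>"
        )
    return ('<?xml version="1.0" encoding="UTF-8"?>\n<aiml version="1.0">\n'
            + "\n".join(categories) + "\n</aiml>\n")

if __name__ == "__main__":
    with urllib.request.urlopen(FEED_URL) as response:
        feed_xml = response.read()
    with open("vagabot_feed.aiml", "w", encoding="utf-8") as out:
        out.write(rss_to_aiml(feed_xml))
```

The resulting AIML file could then be loaded into Program E like any other knowledgebase; cycling through alternative templates in order would still rely on Dr. Wallace's "successor" approach.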
One of the definitions of semantic, as in Semantic Web or Web 3.0, is the property of language pertaining to meaning; meaning being significance arising from relation, for instance the relation of words. I don’t recall hearing about corpus linguistics before deciding to animate my book and make it interactive. Apparently there is a long history of corpus linguistics attempting to derive rules from natural language, such as the work of George Kingsley Zipf. As someone with a degree in psychology, I do know something of cognitive linguistics and its reaction to the machine-mind paradigm.
John McCarthy, the man who coined the term, called artificial intelligence "the science and engineering of making intelligent machines,” which today is referred to as "the study and design of intelligent agents." Wikipedia defines intelligence as “a property of the mind that encompasses… the capacities to reason, to plan, to solve problems, to think abstractly, to comprehend ideas, to use language, and to learn.” Computational linguistics has emerged as an interdisciplinary field involved with “the statistical and/or rule-based modeling of natural language.”

In publishing, a concordance is an alphabetical list of the main words used in a text, along with their immediate contexts or relations. Concordances are frequently used in linguistics to study the body of a text. A concordancer is the program that constructs a concordance. In corpus linguistics, concordancers are used to retrieve sorted lists from a corpus or text. Concordancers that I looked at included AntConc, Concordance Software, WordSmith Tools and ConcApp. I found ConcApp, and in particular the additional program ConcGram, to be most interesting. (Examples of web-based concordancers include KWICFinder.com and the WebAsCorpus.org Web Concordancer.)

Concgramming is a new computer-based method for categorising word relations and deriving the phraseological profile, or ‘aboutness’, of a text or corpus. A concgram constitutes all of the permutations generated by the association of two or more words, revealing all of the word association patterns that exist in a corpus. Concgrams are used by language learners and teachers to study the importance of the phraseological tendency in language.

I was in fact successful in stripping out all the sentences from my latest book, VAGABOND GLOBETROTTING 3, by simply reformatting them as paragraphs with MS Word. I then saved them as a CSV file, actually just a text file with one sentence per line. I was able to make a little utility which ran all those sentences through the Yahoo! Term Extraction API, extracting key terms and associating those terms with their sentences in the form of XML output, with terms as titles and sentences as descriptions. Using the great XSLT editor xsl-easy.com, I could convert that XML output quickly and easily into AIML with a simple template.

The problem I encountered was that all those key terms extracted from my book sentences, when tested, formed something like second-level knowledge that you couldn’t get out of the chatbot unless you already knew the subject matter…. So I then decided to try adding the concgrams to see if that could bridge the gap. I had to get someone to create a special program to marry the two-word concgrams from the entire book (minus the 100 most common words in English) to their sentences, in a form I could use; a rough sketch of that pairing step follows below.

It was only then that I began to discover some underlying differences between the verbotsonline.com and pandorabots.com chatbot engine platforms. I've been using verbotsonline because it seemed easier and cheaper than adding a mediasemantics.com character to the pandorabot. However, there is a 2.5 MB limit on verbotsonline knowledgebases, which I've reached three times already. Also, verbotsonline.com does not seem to accept multiple identical patterns with different templates; at least, the AIML-Verbot Converter apparently removes the “duplicate” categories.
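For illustration, here is a minimal Python sketch of that concgram-to-sentence pairing, under my own simplifying assumptions: two-word concgrams are approximated as word pairs co-occurring within a sentence, the stop list is a truncated placeholder for the 100 most common English words, and the input is the one-sentence-per-line text file described above. The special program I actually had written may well differ in its details.

```python
# Sketch: pair two-word "concgrams" (approximated here as word pairs that
# co-occur within a sentence) with the sentences they come from.
# Assumptions: sentences.txt is the one-sentence-per-line file exported from
# the book, and STOP_WORDS stands in for the 100 most common English words.
import re
from collections import defaultdict
from itertools import combinations

STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "you", "that", "it"}  # truncated placeholder

def content_words(sentence):
    """Lowercased alphabetic tokens, minus the common-word stop list."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]

def concgram_index(sentences):
    """Map each unordered word pair to the sentences in which both words occur."""
    index = defaultdict(list)
    for sentence in sentences:
        for pair in combinations(sorted(set(content_words(sentence))), 2):
            index[pair].append(sentence)
    return index

if __name__ == "__main__":
    with open("sentences.txt", encoding="utf-8") as f:
        sentences = [line.strip() for line in f if line.strip()]
    index = concgram_index(sentences)
    # Each pair could then become an AIML pattern (e.g. "GREEN TRAVEL") whose
    # template draws on the associated sentences.
    for (w1, w2), hits in sorted(index.items(), key=lambda kv: -len(kv[1]))[:20]:
        print(f"{w1} {w2}: {len(hits)} sentence(s)")
```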
In Verbots, spaces automatically match to zero or more words, so wildcards are only necessary to match partial words. This means that in Verbots, words are automatically wildcarded, which makes it much easier to achieve matches. So far, I have been unable to replicate this simple behavior with AIML, which makes AIML more precise or controllable, but perhaps less versatile, at least in this case. Even with the AIML knowledgebase replicated eight times, using the following patterns, I could not duplicate in Pandorabots the results the Verbots achieve with one file, wildcarding on all words in a phrase or term:

dog cat
dog cat *
_ dog cat
_ dog cat *
dog * cat
dog * cat *
_ dog * cat
_ dog * cat *

The problem I encountered with AIML when trying to “star” all words was that a star at the beginning of a pattern accepted only one word, and not more, while replacing it with the underscore apparently affects pattern prioritization. So there I am at the moment, stuck between Verbots and Pandorabots, not being able to do what I want with either: verbotsonline for lack of capacity and the inability to convert “duplicate” categories into VKB, and Pandorabots for the inability to conform to my fully wildcarded spectral word association strategy….
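For completeness, here is a small Python sketch of that eightfold pattern expansion, purely for illustration: an optional leading underscore, an optional star between the two words, and an optional trailing star. The function name and the uppercasing (AIML patterns are conventionally uppercase) are my own choices, not requirements of either platform.

```python
# Sketch: expand a two-word concgram (e.g. "dog cat") into the eight AIML
# pattern variants discussed above.
from itertools import product

def pattern_variants(first, second):
    """Return the eight wildcarded AIML patterns for a word pair."""
    variants = []
    for lead, middle, trail in product([False, True], repeat=3):
        parts = []
        if lead:
            parts.append("_")          # optional leading wildcard
        parts.append(first.upper())
        if middle:
            parts.append("*")          # optional wildcard between the words
        parts.append(second.upper())
        if trail:
            parts.append("*")          # optional trailing wildcard
        variants.append(" ".join(parts))
    return variants

if __name__ == "__main__":
    for pattern in pattern_variants("dog", "cat"):
        print(pattern)
```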