Willo

Willo is a chatbot that uses a conversational interface to engage children in an open-ended exploration of the world. Willo asks questions about natural phenomena and encourages children to develop possible solutions based on their own ideas and experiences.

With inquiry-based learning, new knowledge is gained through an active and sometimes messy process of exploration and sense-making. Our pre-existing ideas act as the scaffolding for new concepts; analogous and adjacent ideas act as opportunities for critical thinking. What often holds us back from inquiry-based teaching is a scarcity of time, patience, or know-how. Computers offer infinitely scalable time and patience, while know-how is limited only by the sophistication of the algorithms we build.

Willo starts with a launching question, such as “Where do puddles go after it rains?”, “What happens to frogs during the winter?”, or “What makes wind blow?”, and then facilitates a conversation that builds on the child’s own ideas and experiences.

Existing conversational interfaces such as Alexa, Google Home, and Siri are primarily designed to execute narrow tasks such as “Turn on the lights”, “Play music by Bob Marley”, or “How old is the earth?”. These platforms function by diagnosing a user’s intent and executing a specific corresponding action. IBM Watson Conversation and API.AI (owned by Google) are platforms that help content developers create these shallow or linear interactions. However, the complexity and fluidity of natural conversations require more sophisticated tools and structures.

Zoomed-out view of the conversation-building interface.

A conceptual flow of an interaction with Willo.

To solve this problem, I designed and developed a decision-tree interface (in JavaScript) that allows users to easily visualize and modify Willo’s conversational flow. The branches of the tree are traversed based on conditions such as specific keywords, machine learning classification (using a support vector machine), and similarities between the user’s response and the subsequent output. The conversational structure emanates from one or more trunks that represent the core ideas, from which sprout branches containing specific related topics. Continued interaction with Willo generates data that broadens, deepens, and adds sophistication to the model, both by improving the machine learning algorithm and by allowing content developers to modify or add new branches.
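The traversal described above can be sketched in a few lines of Python. This is a minimal illustration, not Willo’s actual implementation (the real interface is written in JavaScript). The names `Node`, `has_keyword`, `has_intent`, and `resembles` are hypothetical, the classifier is a toy stand-in for the support vector machine, and word overlap (Jaccard similarity) is only one assumed way to measure similarity between a reply and a later output.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A condition takes the child's reply and says whether a branch applies.
Condition = Callable[[str], bool]


def tokens(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def has_keyword(*keywords: str) -> Condition:
    """Condition: at least one of the keywords appears in the reply."""
    wanted = {k.lower() for k in keywords}
    return lambda reply: bool(wanted & set(tokens(reply)))


def has_intent(label: str, classify: Callable[[str], str]) -> Condition:
    """Condition: a classifier (assumed; e.g. an SVM) assigns this label."""
    return lambda reply: classify(reply) == label


def resembles(text: str, threshold: float = 0.5) -> Condition:
    """Condition: word overlap (Jaccard) with `text` reaches the threshold."""
    target = set(tokens(text))

    def check(reply: str) -> bool:
        words = set(tokens(reply))
        union = words | target
        return bool(union) and len(words & target) / len(union) >= threshold

    return check


@dataclass
class Node:
    """One step of the conversation: a prompt plus conditional branches."""
    prompt: str
    branches: List[Tuple[Condition, "Node"]] = field(default_factory=list)
    fallback: Optional["Node"] = None

    def next(self, reply: str) -> Optional["Node"]:
        """Return the first branch whose condition matches, else the fallback."""
        for condition, child in self.branches:
            if condition(reply):
                return child
        return self.fallback


def toy_classifier(reply: str) -> str:
    """Hypothetical stand-in for the trained classifier."""
    return "unsure" if "know" in tokens(reply) else "other"


# A tiny tree for the launching question (prompts are illustrative only).
sun = Node("What does the sun do to the water?")
ground = Node("Where does the water go once it is in the ground?")
hint = Node("Have you ever watched a puddle on a sunny day?")
retry = Node("Tell me more about what you think happens.")
root = Node(
    "Where do puddles go after it rains?",
    branches=[
        (has_keyword("sun", "dry", "dries", "evaporate", "evaporates"), sun),
        (has_keyword("ground", "soil", "dirt"), ground),
        (has_intent("unsure", toy_classifier), hint),
    ],
    fallback=retry,
)
```

Calling `root.next("The sun dries them up")` returns the `sun` node, and a reply that matches no condition falls through to `retry`. Branches are checked in order, so more specific conditions should be listed first.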

Currently, Willo exists online as an animation that plays on the screen as a child responds to a series of expounding and cascading questions. I have developed content for a single question, “Where do puddles go after it rains?”, as an archetypal interaction around which I have built the tools that allow for fluid, complex, and extended conversation. Eventually, Willo will be a physical device similar to Alexa that might sit at the child’s bedside to inspire her natural curiosity about the world.

Voice-to-Text: Willo uses annyang, a JavaScript library that wraps Google’s voice-to-text API and manages continuously streaming voice recognition in Chrome.
Text-to-Voice: Willo uses IBM Watson to synthesize its spoken responses. Specifically, it uses the Allison voice, which offers extended Speech Synthesis Markup Language (SSML) functionality to add expressiveness and make the voice sound more natural.
Server: A Python server functions as the brain of Willo, identifying responses and generating further prompts. Text from the client is preprocessed with spaCy, a natural language processing (NLP) library. Using spaCy, each sentence is divided into distinct ideas by identifying the relationships between words and treating each verb as the root of one idea within the longer sentence. The resulting phrases are vectorized using the lemma (dictionary form) of each word. Ideas are then classified with a support vector machine (a machine learning algorithm) using scikit-learn. Specific keywords are also extracted from the sentence for narrower classification or verification.
Conversation Construction: The conversation is built using a custom drag-and-drop JavaScript interface. The interface allows the content developer to control the flow of the conversation based on specific conditions such as identified keywords or the more general classification of intents. Nodes can be grouped into functional blocks for reuse in different conversational contexts. The interface also allows the content developer to jump to specific positions in the conversation and test them in real time, both by watching the conversational flow and by listening to the audio feedback.
Continuous Learning: Every time the system encounters a phrase that it has not previously classified, it adds that phrase to a database for approval. Once approved, the phrase is incorporated into the training corpus for future classifications.
Further Development: The conversational complexity that a chatbot needs to communicate with children is extensive. For example, one response Willo received to the question “Where do puddles go after it rains?” was a song about the evaporation cycle. The complexity of the problem might require a more sophisticated solution. Machine learning might be added to aid in the construction of the conversation, for instance, by actively suggesting additions or modifications of branches based on collected data. Further advancements in voice-to-text systems would improve the robustness of the system, since children’s voices often fluctuate dramatically in volume and tone based on their confidence in a particular response. Increased accuracy across a wider range of environmental conditions would bring Willo closer to my dream of a fully functional device. Continuing advancements in text-to-voice will add character to the system, which will increase engagement across the interaction. With increased sophistication, a visual input might significantly improve the system, as children are often physically expressive in their ideas, particularly new and exciting ones.
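The approval step described under Continuous Learning can be sketched as a small review queue. This is a minimal illustration under stated assumptions, not Willo’s actual code: the class `TrainingCorpus`, its method names, and the labels `evaporation` and `absorption` are hypothetical, and an in-memory dictionary stands in for the database. Retraining the classifier on the approved data is assumed to happen elsewhere and is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so repeated phrases compare equal."""
    return " ".join(phrase.lower().split())


@dataclass
class TrainingCorpus:
    """Approved training phrases plus a queue of phrases awaiting review."""
    labeled: Dict[str, str] = field(default_factory=dict)  # phrase -> label
    pending: Dict[str, str] = field(default_factory=dict)  # phrase -> suggestion

    def observe(self, phrase: str, suggested_label: str) -> bool:
        """Queue a phrase for review if it is new; return True if queued."""
        key = normalize(phrase)
        if not key or key in self.labeled or key in self.pending:
            return False
        self.pending[key] = suggested_label
        return True

    def approve(self, phrase: str, label: Optional[str] = None) -> None:
        """Move a pending phrase into the corpus, optionally correcting its label.

        Raises KeyError if the phrase is not waiting for review.
        """
        key = normalize(phrase)
        suggested = self.pending.pop(key)
        self.labeled[key] = label or suggested

    def reject(self, phrase: str) -> None:
        """Discard a pending phrase without adding it to the corpus."""
        self.pending.pop(normalize(phrase), None)

    def training_data(self) -> Tuple[List[str], List[str]]:
        """Return parallel lists of phrases and labels for retraining."""
        phrases = list(self.labeled)
        return phrases, [self.labeled[p] for p in phrases]


# Example: one known phrase, then a new reply arrives and is approved.
corpus = TrainingCorpus(labeled={"the sun dries it up": "evaporation"})
queued = corpus.observe("It goes into the ground", "absorption")
corpus.approve("It goes into the ground")
```

Keeping pending phrases separate from approved ones means a misclassified reply never reaches the training data until a person has confirmed or corrected its label.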