Willo is a chatbot that uses a conversational interface to engage children in an open-ended exploration of the world. Willo asks questions about natural phenomena and encourages children to develop possible solutions based on their own ideas and experiences.
Voice-to-Text: Willo uses annyang, a JavaScript library that wraps the browser’s speech recognition API (backed by Google’s voice-to-text service in Chrome) and manages continuous streaming voice recognition.
Text-to-Voice: Willo uses IBM’s Watson to synthesize spoken responses. Specifically, it uses the Allison voice, which supports extended Speech Synthesis Markup Language (SSML) features that add expressiveness and make the voice sound more natural.
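As an illustration of what SSML markup looks like, here is a minimal sketch that wraps plain sentences in standard SSML tags before they would be sent to a synthesis service. The function name `to_ssml` and its parameters are hypothetical, and only the generic `<speak>`, `<prosody>`, and `<break>` elements are used; the Allison-specific expressive extensions are not shown.

```python
from xml.sax.saxutils import escape


def to_ssml(sentences, rate="slow", pause_ms=300):
    """Wrap plain sentences in standard SSML with a pause between them.

    Hypothetical helper: the markup Willo actually sends is not documented.
    """
    pause = f'<break time="{pause_ms}ms"/>'
    body = pause.join(
        # escape() protects characters such as & and < that would break the XML
        f'<prosody rate="{rate}">{escape(s)}</prosody>'
        for s in sentences
    )
    return f"<speak>{body}</speak>"


ssml = to_ssml(["Where do puddles go after it rains?", "What do you think?"])
print(ssml)
```

The resulting string is what would be passed as the text payload of a synthesis request, with the content type set to SSML rather than plain text.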
Server: A Python server functions as the brain of Willo, identifying responses and generating further prompts. Text from the client is preprocessed with spaCy, a natural language processing (NLP) library. Using spaCy, each sentence is divided into distinct ideas by identifying the relationships between words and treating each verb as the root of one idea within the longer sentence. The resulting phrases are vectorized using the lemma (dictionary form) of each word. Ideas are then classified with a support vector machine (a machine learning algorithm) using scikit-learn. Specific keywords are also extracted from the sentence for narrower classification or verification.
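The classification step might look roughly like the sketch below. It assumes the phrases have already been split and lemmatized upstream (the spaCy step is not reproduced here), uses a simple bag-of-words vectorizer as a stand-in for whatever vectorization Willo actually uses, and trains on a tiny invented corpus. The labels, phrases, and function names are all hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical training data: lemmatized phrases and their intent labels.
train_phrases = [
    "water go up sky",
    "sun dry water",
    "water turn into cloud",
    "water soak ground",
    "dirt drink water",
    "water sink into grass",
]
train_labels = [
    "evaporation", "evaporation", "evaporation",
    "absorption", "absorption", "absorption",
]

# Bag-of-words features feeding a linear support vector machine.
classifier = make_pipeline(CountVectorizer(), SVC(kernel="linear"))
classifier.fit(train_phrases, train_labels)


def classify_idea(lemmatized_phrase):
    """Return the predicted intent label for one lemmatized phrase."""
    return str(classifier.predict([lemmatized_phrase])[0])


def extract_keywords(lemmatized_phrase, vocabulary):
    """Return the words of the phrase that appear in a keyword vocabulary."""
    return {word for word in lemmatized_phrase.split() if word in vocabulary}


print(classify_idea("sun make water go sky"))
print(extract_keywords("sun make water go sky", {"sun", "ground", "cloud"}))
```

Keeping keyword extraction separate from the classifier mirrors the description above: the classifier gives the general intent, and the keywords allow a narrower check on top of it.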
Conversation Construction: The conversation is built using a custom drag-and-drop JavaScript interface. The interface lets the content developer control the flow of the conversation based on specific conditions, such as identified keywords or the more general classification of intents. Nodes can be grouped into functional blocks for reuse in different conversational contexts. The content developer can also jump to a specific position in the conversation and test it in real time, both by watching the conversational flow and by listening to the audio feedback.
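One way to picture the structure such an interface produces is a graph of prompt nodes whose outgoing edges carry keyword or intent conditions. The sketch below is only an assumed data model for illustration; the class names, fields, and node identifiers are hypothetical and do not describe Willo's actual storage format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Edge:
    target: str                          # id of the node to move to
    keywords: frozenset = frozenset()    # fires if any of these keywords was heard
    intent: Optional[str] = None         # fires if the classified intent matches


@dataclass
class Node:
    prompt: str
    edges: list = field(default_factory=list)
    fallback: Optional[str] = None       # where to go when no edge fires


def next_node(node, keywords, intent):
    """Return the id of the first edge whose condition matches, else the fallback."""
    for edge in node.edges:
        if edge.keywords and edge.keywords & keywords:
            return edge.target
        if edge.intent is not None and edge.intent == intent:
            return edge.target
    return node.fallback


graph = {
    "puddles": Node(
        prompt="Where do puddles go after it rains?",
        edges=[
            Edge("sun", keywords=frozenset({"sun", "hot"})),
            Edge("ground", intent="absorption"),
        ],
        fallback="rephrase",
    ),
}

print(next_node(graph["puddles"], {"sun"}, None))
```

In this model, a reusable functional block would simply be a group of nodes with a single entry id that several edges can point to.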
Continuous Learning: Every time the system encounters a phrase it has not previously classified, it adds that phrase to a database for approval. Once approved, the phrase is incorporated into the training corpus for future classifications.
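The approval loop can be sketched as a small review queue. This is an in-memory illustration under assumed names (`ReviewQueue`, `observe`, `approve`); the real system stores pending phrases in a database, whose schema is not described here.

```python
class ReviewQueue:
    """Hold unseen phrases until a reviewer approves them with a label."""

    def __init__(self, corpus):
        self.corpus = dict(corpus)   # phrase -> approved label
        self.pending = []            # phrases awaiting review

    def observe(self, phrase):
        """Queue a phrase for review if it is new to both corpus and queue."""
        if phrase not in self.corpus and phrase not in self.pending:
            self.pending.append(phrase)

    def approve(self, phrase, label):
        """Move a reviewed phrase into the training corpus."""
        self.pending.remove(phrase)
        self.corpus[phrase] = label


queue = ReviewQueue({"sun dry water": "evaporation"})
queue.observe("sun dry water")        # already known, so it is ignored
queue.observe("worm drink puddle")    # new, so it waits for approval
queue.approve("worm drink puddle", "absorption")
print(queue.corpus)
```

After approval, the classifier would be retrained on the enlarged corpus so that the new phrase influences future classifications.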
Further Development: The conversational complexity that a chatbot needs in order to communicate with children is extensive. For example, one response Willo received to the question “where do puddles go after it rains?” was a song about the evaporation cycle. A problem of this complexity might require a more sophisticated solution. Machine learning might be added to aid in the construction of the conversation, for instance by actively suggesting additions to or modifications of branches based on collected data. Further advancements in voice-to-text systems would improve the robustness of the system, since children’s voices often fluctuate dramatically in volume and tone depending on their confidence in a particular response. Increased accuracy across a wider range of environmental conditions would bring Willo closer to my dream of a fully functional device. Continuing advancements in text-to-voice will add character to the system, which will increase engagement across the interaction. With increased sophistication, visual input might significantly improve the system, as children are often physically expressive about their ideas, particularly new and exciting ones.