Amazon Alexa Skill - Python - DynamoDB - Voiceflow
The goal of this project was to develop a functional prototype of an Alexa Skill to be presented at the "Festival for Art & Research in Bavaria" (Hi!A). Our team aimed to showcase the versatility of voice user interface (VUI) applications and to promote children's literacy and creativity through an interactive storytelling experience.
We crafted an interactive story driven by the VUI, with Alexa acting as an omniscient narrator and presenting users with choices that influence the story line. The narrative follows Erik Schneider, a character stranded in a foreign realm, and tasks users with guiding him home.
The skill is currently undergoing certification by Amazon and is not yet publicly available. If you would like to try it out, contact me so I can add you to the beta testing list.
Additionally, a full project report (currently only in German) is available on GitHub.
Our process began with an ideation phase, where we explored various themes and concepts for our project, ultimately landing on an interactive story driven by a VUI.
With the concept finalized, we proceeded to design the interaction model using Voiceflow, a visual design tool for voice applications. We crafted user flows and mapped out the various decision points within the story, focusing on the user experience and intuitive controls. It was important to maintain a lighthearted tone, keeping the story suitable for children and adolescents by minimizing references to violence while still keeping it exciting.
Within the frontend architecture, we defined intents and their corresponding utterances. Intents represent the user's objectives, guiding the skill's response to user input. For instance, to enable users to call Erik, we established an "Anrufen" (Call) intent, supported by various utterances such as "Erik anrufen" (Call Erik), "Anruf" (Call), "anrufen" (calling), or "Ich möchte Erik anrufen" (I want to call Erik).
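For illustration, the interaction model pairs each intent with its sample utterances. Below is a minimal sketch of the "Anrufen" intent, written as a Python dictionary mirroring the JSON structure of the interaction model; the actual model contains more intents, utterances, and metadata.

```python
# Illustrative excerpt only: the "Anrufen" intent and the sample utterances
# mentioned above, mirroring the shape of the JSON interaction model.
anrufen_intent = {
    "name": "Anrufen",
    "samples": [
        "Erik anrufen",
        "Anruf",
        "anrufen",
        "Ich möchte Erik anrufen",
    ],
}
```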
To ensure smooth user interaction, our team tested the skill thoroughly. It was important to anticipate and prevent situations where users employed phrases not recognized as utterances, which would prompt error responses from Alexa. Additionally, we resolved conflicts among intents sharing similar utterances, preventing misinterpretations and ensuring accurate intent recognition.
The backend architecture comprised three components: a Lambda function, a utilities file, and DynamoDB integration. The utilities file is a collection of pre-programmed functions provided by Amazon that streamlines common development tasks. DynamoDB, a NoSQL database service, was used to store user data (specifically usernames) to enable personalized interactions within the Alexa Skill.
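As a minimal sketch of the kind of DynamoDB access involved, assuming a hypothetical table and key schema (the skill's actual table layout and helper functions may differ):

```python
import boto3
from typing import Optional

# Hypothetical table name and key schema, for illustration only.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("StorySkillUsers")


def save_username(user_id: str, username: str) -> None:
    """Persist the username so later sessions can greet the user by name."""
    table.put_item(Item={"user_id": user_id, "username": username})


def load_username(user_id: str) -> Optional[str]:
    """Return the stored username, or None if the user is new."""
    item = table.get_item(Key={"user_id": user_id}).get("Item")
    return item.get("username") if item else None
```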
The Lambda function, deployed on Amazon Web Services (AWS) servers, is the backbone of the Alexa Skill, executing code in response to user interactions with Alexa. Here we implemented handler classes to manage the intents defined in the frontend. After a handler processes a request, it generates a textual response that is converted into speech for Alexa's output. Additionally, we used the Alexa Skills Kit Sound Library to incorporate immersive sound effects.
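A sketch of what such a handler class can look like with the ASK SDK for Python, using the "Anrufen" intent from the earlier example; the dialogue text and the soundbank path are placeholders, not the ones used in the skill:

```python
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.handler_input import HandlerInput
from ask_sdk_core.utils import is_intent_name
from ask_sdk_model import Response


class AnrufenIntentHandler(AbstractRequestHandler):
    """Handles the "Anrufen" intent: the user calls Erik and the story continues."""

    def can_handle(self, handler_input: HandlerInput) -> bool:
        return is_intent_name("Anrufen")(handler_input)

    def handle(self, handler_input: HandlerInput) -> Response:
        # SSML allows mixing a Sound Library effect into the spoken response;
        # the soundbank URL below is a placeholder for a real library entry.
        speech = (
            '<audio src="soundbank://soundlibrary/placeholder/phone_ring_01"/> '
            "Du rufst Erik an."  # "You call Erik."
        )
        return (
            handler_input.response_builder
            .speak(speech)
            .ask("Was möchtest du als Nächstes tun?")  # "What do you want to do next?"
            .response
        )
```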
Following the initial development phase, we conducted usability testing sessions to gather feedback from our users. We made iterative improvements based on user insights, such as expanding the story line, introducing more narrative branches, and improving dialogue flow. We also opted for subtle sound effects over music to maintain narrative immersion, incorporating audio cues at pivotal story moments.
While we are not currently planning further development of the project, we recognize several areas for potential improvement.
We could introduce additional story lines across diverse genres, such as science fiction or spy thrillers, to expand the skill's appeal and cater to a wider audience. Error messages that guide users when unexpected input is detected would enhance clarity and the user experience. We could also add proactive suggestions for alternative phrases so that users of varying fluency levels can navigate the skill smoothly and stay engaged.
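One possible way to realize such guidance, shown here only as a sketch rather than a planned implementation, is a handler for the built-in AMAZON.FallbackIntent that suggests known phrases instead of returning a generic error:

```python
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.handler_input import HandlerInput
from ask_sdk_core.utils import is_intent_name
from ask_sdk_model import Response


class FallbackIntentHandler(AbstractRequestHandler):
    """Catches unrecognized phrases and suggests valid ones instead of failing."""

    def can_handle(self, handler_input: HandlerInput) -> bool:
        return is_intent_name("AMAZON.FallbackIntent")(handler_input)

    def handle(self, handler_input: HandlerInput) -> Response:
        # Suggest a phrase the interaction model is known to understand.
        speech = (
            "Das habe ich leider nicht verstanden. "
            "Du kannst zum Beispiel sagen: Erik anrufen."
        )  # "I didn't catch that. You could say, for example: call Erik."
        return handler_input.response_builder.speak(speech).ask(speech).response
```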