Introduction
To build a reliable product, it is essential for all the underlying pieces to work coherently and predictably. In traditional software engineering, this predictability is ensured by rigorous testing methodologies, including unit, functional, integration, regression, and performance testing.
However, achieving reliability in AI-driven applications is inherently more complex. The dynamic nature of AI models can lead to varied responses to the same inputs, complicating efforts to standardize behavior. Moreover, evaluating these responses involves a degree of subjectivity; a response that works for one scenario might not be suitable for another. Given the diverse needs of users, constructing a comprehensive test set that adequately addresses all potential use cases is challenging.
Recognizing these challenges, Convai is committed to providing our users with the tools they need to navigate this landscape with confidence. To facilitate this, we are introducing a new feature: the Testing Framework. This feature will allow users to construct, manage, and utilize their datasets effectively. By enabling automated testing of their Characters against these datasets, users can verify the consistency and appropriateness of responses throughout both the development and deployment stages.
Our goal with the Testing Framework is to bridge the gap between the dynamic nature of AI-driven conversational agents and the need for reliable, predictable interactions. We believe that by offering this capability, we can significantly enhance the confidence and satisfaction of our users, ensuring their Characters behave as intended, regardless of the complexities involved.
Dataset Creation
Understanding the importance of efficiency and user convenience, we have tailored the dataset creation process to be as effortless as possible. In doing so, we have chosen to integrate this functionality seamlessly into the existing workflow, eliminating the need for our users to adapt to a new system. As users engage with their Characters, they can contribute to their dataset directly through Convai’s feedback system. As depicted in Figure 1, users can offer immediate feedback on any interaction with the Character using a simple thumbs up or thumbs down indicator. Any interaction with feedback is automatically added to the test dataset, streamlining dataset creation without imposing additional tasks on the user.
To further enhance the value and relevance of the feedback, we encourage users to provide additional details regarding their evaluations. This optional feature is designed to deepen our understanding of the user's expectations and the context of their feedback. For instance, if a Character is expected to respond within 3-4 lines and a response exceeds this limit, the user can mark the interaction with a "thumbs down" and annotate it with details about the excessive length. In the future, this detailed feedback will be used to automate the evaluation process within the Testing Framework.
Please note that users can also give a “thumbs up” to interactions that they like. Adding good interactions to the test dataset helps ensure that behavior that already works does not break as new changes are made to the Character.
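To make the shape of such an entry concrete, here is a minimal Python sketch of what a feedback-derived test case could hold. The `TestCase` class and its field names are illustrative assumptions for this post, not the actual schema used by Convai.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TestCase:
    """One entry in a Character's test dataset, created from user feedback (illustrative)."""
    user_query: str                   # what the user said to the Character
    character_response: str           # what the Character replied
    rating: str                       # "thumbs_up" or "thumbs_down"
    annotation: Optional[str] = None  # optional details explaining the rating

# A "thumbs down" interaction annotated with the reason it fell short.
too_long = TestCase(
    user_query="Tell me about your village.",
    character_response="Well, it all began three hundred years ago when ...",
    rating="thumbs_down",
    annotation="Response should stay within 3-4 lines.",
)

# A "thumbs up" interaction kept so that behavior which already works is protected.
good_greeting = TestCase(
    user_query="Hello! Who are you?",
    character_response="I'm the village healer. How can I help you today?",
    rating="thumbs_up",
)
```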
Tagging
To better manage their test cases (as we will see in the section below), users can optionally tag each response. These tags can then be used to group test cases on the Testing tab.
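As a rough illustration of how tags can organize a dataset, the snippet below groups a handful of test cases by tag. The dictionary fields and tag names are made up for the example and are not part of Convai's schema.

```python
from collections import defaultdict

# Illustrative test cases; field and tag names are placeholders.
test_cases = [
    {"user_query": "Where is the blacksmith?", "rating": "thumbs_up", "tags": ["directions"]},
    {"user_query": "Tell me about your village.", "rating": "thumbs_down", "tags": ["lore", "length"]},
    {"user_query": "What do you sell?", "rating": "thumbs_up", "tags": ["shop"]},
]

def group_by_tag(cases):
    """Group test cases under each of their tags, as the Testing tab does conceptually."""
    groups = defaultdict(list)
    for case in cases:
        for tag in case.get("tags", []):
            groups[tag].append(case)
    return groups

for tag, cases in group_by_tag(test_cases).items():
    print(f"{tag}: {len(cases)} test case(s)")
```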
Automated Testing
To streamline the testing process and enhance user convenience, we are excited to introduce a dedicated "Testing" tab for each Character, as illustrated in Figure 2. This new tab presents a unified view of all interactions alongside their detailed feedback, offering users an intuitive and efficient way to manage their test cases. On this tab, users can filter test cases by tag and run the selected cases with a single click.
Steps involved in testing a Character
- Modify the Character as needed.
- On the Testing Tab, choose a subset of test cases or all test cases and run them with a single click.
- This will initiate a rerun of all selected test cases against the modified Character.
- Upon completion, the system generates and displays the new outputs, providing immediate insight into the testing outcomes.
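The sketch below ties these steps together: it selects test cases by tag, re-sends each stored query to the modified Character, and pairs the previous and new responses for review. The `ask_character` callable and the field names are stand-ins for this example, not part of the Convai API; on the platform, this selection and rerun happens with a single click on the Testing tab.

```python
def run_selected(test_cases, selected_tags, ask_character):
    """Rerun the test cases matching the selected tags and pair old and new outputs."""
    selected = [
        case for case in test_cases
        if not selected_tags or set(case.get("tags", [])) & set(selected_tags)
    ]
    results = []
    for case in selected:
        results.append({
            "user_query": case["user_query"],
            "previous_response": case["character_response"],
            "new_response": ask_character(case["user_query"]),
            "rating": case["rating"],  # the feedback attached when the case was created
        })
    return results

# Example: rerun only the "lore" cases against a stand-in Character.
cases = [{
    "user_query": "Tell me about your village.",
    "character_response": "Well, it all began three hundred years ago when ...",
    "rating": "thumbs_down",
    "tags": ["lore", "length"],
}]
for row in run_selected(cases, ["lore"], lambda q: "A short reply within 3-4 lines."):
    print(row["user_query"], "->", row["new_response"])
```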
Enhancing Development and Ensuring Reliability
We recognize the dynamic nature of Characters; their evolution is an integral part of enhancing user experience. Over time, we expect our users will need to update backstories, personalities, and knowledge bank information. With these changes, it's crucial to maintain a seamless user experience. Our Testing Framework is designed to instill confidence among creators, ensuring that modifications enhance rather than disrupt established behavior. This dedication to "Consistency and Reliability" extends beyond our immediate users, positively affecting the end-user experience as well.
Moreover, we are committed to supporting our users throughout their development journey. By leveraging our Testing Framework, creators can significantly streamline their testing process. This efficiency reduces the time and resources typically required, facilitating quicker iterations and faster deployments. Our goal is to empower users to focus more on innovation and less on the mechanics of testing, accelerating the path from concept to launch.