Software QA is undergoing a sea change due to generative AI-driven testing. That raises the question of how to practice responsible AI in software testing. This post offers eleven considerations for responsible testing when using generative AI (GenAI).
First, let’s note that responsible AI is an emerging area of AI governance covering ethics, morals and legal values in the development and deployment of beneficial AI. As a governance framework, responsible AI documents how a specific organization addresses the challenges around AI in the service of good for individuals and society.
Two Ways to Use Generative AI in Software Testing
Before getting to responsible use considerations, it’s important to differentiate between the two ways to use GenAI for software testing.
- You may attempt to use a Large Language Model (LLM), e.g., ChatGPT or Bard, as part of a cobbled-together toolset.
- Or, you can use an integrated testing platform that is powered by GenAI, e.g., the Appvance IQ (AIQ) testing platform.
The former poses significant challenges, first for responsible use and then for building a complete testing toolchain, compared with simply using an integrated platform off the shelf. This is because an LLM tool may generate test plans, but it takes considerable work to embed those plans in a testing regime, to turn them into usable automation, and to ensure responsible use.
Accordingly, the considerations below distinguish, where relevant, between responsible use in A) LLM-driven testing and B) GenAI-powered, platform-driven testing.
- Accuracy and Reliability: Review the generated scripts for accuracy and reliability, as the AI might not fully understand the nuances of your application under test. Such review is facilitated in a platform like AIQ, but will inevitably be ad hoc in an LLM-driven testing regime.
- Bias: Regularly review the generated scripts for biases, which may stem from biases present in the training data, and correct any you find. While the training data and script generation are evident in a GenAI-powered testing platform, they are ad hoc in an LLM-driven testing regime, so the latter takes considerable work to examine.
- Security and Privacy: These concerns are much greater in an LLM-driven testing regime. A GenAI-powered testing platform makes the handling and generation of testing data straightforward. Please refer to Pros & Cons of Using Production and Generated Data for Software Testing for more on this issue.
- Intellectual Property: Consider intellectual property rights and ensure that the use of AI-generated scripts doesn’t infringe on any copyrights or patents.
- Transparency: Clearly communicate to stakeholders that AI is being used to generate testing scripts, and explain any limitations or potential risks associated with this approach. If the generated scripts are used as part of a larger project, document the use of AI and provide context on how it contributes to the project.
- Dependency: It is essential to maintain human involvement in the testing process, especially for oversight of AI-generated scripts. Periodically review the effectiveness of AI-generated scripts and adapt your approach as needed.
- Ethical Considerations: Consider the ethical implications of using AI for generating testing scripts. For instance, AI-driven testing leads to dramatic leaps in productivity, which may raise concerns about the impact on jobs. However, in our experience this productivity boost goes towards closing the coverage gap left by a non-AI-driven testing operation.
- Training and Support: Provide training and support to the team members who will be using the AI-generated scripts to ensure that they can effectively provide oversight of those scripts, and then use and interpret the test results. Encourage continuous learning and adaptation as the technology evolves.
- Monitoring and Evaluation: Continuously monitor the performance and outcomes of the AI-generated scripts to identify any issues or areas for improvement. Evaluate the overall impact of using AI on your software testing processes and make adjustments as needed.
- Documentation: Thoroughly document the process of using AI to generate scripts, including the configuration, input data, and any modifications made to the generated scripts. This documentation helps in troubleshooting, auditing and improving the process over time. As with other elements of the software testing process, this is much easier when using an AI-powered platform than when using a cobbled-together toolchain that includes an LLM chatbot.
- Limitation Awareness: Be aware of the limitations of the AI model being used. For instance, a general-purpose LLM tool like ChatGPT is neither a domain-specific model nor an integrated testing platform, so it might not be fully aware of the intricacies and specifics of certain software testing methodologies and technologies. In all cases, cross-verify the generated scripts and, if needed, seek expert advice for more complex and critical scenarios.
Your probability of success is vastly higher when using a GenAI-powered testing platform like AIQ than it is when building around a generic LLM tool like ChatGPT. That is because a comprehensive platform is appropriately robust compared to a single-function, general-purpose language model. And specific to the topic of this post, a purpose-built test automation platform doesn’t have the open-ended responsible AI concerns that a generic tool introduces into a testing regime.
In both cases, the above considerations will stand you in good stead for responsible use of GenAI in software testing.