TruEra has just launched TruLens, open-source software designed to test applications built on large language models (LLMs) such as the GPT series. TruEra is one of the few vendors offering tools for this aspect of LLM app development, and it says TruLens gives enterprises a quick and easy way to evaluate and iterate on their LLM applications and to reduce the risk of hallucination and bias in production. Best of all, it's available for free starting today.
LLMs are all the rage, but building applications on top of these models can be a tedious, experimental process that relies on humans scoring responses by hand. With TruLens, TruEra is addressing this gap by introducing a programmatic method of evaluation called “feedback functions.” These functions score the output of an LLM application for quality and efficacy by analyzing both the text generated by the LLM and the response’s metadata.
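To make the idea concrete, here is a minimal sketch of what a feedback function could look like in plain Python. It is not the TruLens API: the `LLMRecord` container, the `latency_s` metadata key, both scoring functions, and their thresholds are hypothetical names chosen for illustration. The only part taken from the description above is the shape of the idea, a function that reads a response's text or metadata and returns a score.

```python
from dataclasses import dataclass, field


@dataclass
class LLMRecord:
    """Hypothetical container for one app response plus its metadata."""
    prompt: str
    response: str
    metadata: dict = field(default_factory=dict)


def verbosity_feedback(record: LLMRecord, max_words: int = 50) -> float:
    """Text-based score: 1.0 up to max_words, then shrinking as the response grows."""
    n_words = len(record.response.split())
    if n_words <= max_words:
        return 1.0
    return max_words / n_words


def latency_feedback(record: LLMRecord, budget_s: float = 2.0) -> float:
    """Metadata-based score: 1.0 within the latency budget, lower beyond it."""
    latency = record.metadata.get("latency_s")  # assumed metadata key
    if latency is None:
        return 0.0  # no measurement recorded, so give no credit
    if latency <= budget_s:
        return 1.0
    return budget_s / latency


def evaluate(record: LLMRecord) -> dict:
    """Run every feedback function and collect the scores by name."""
    return {
        "verbosity": verbosity_feedback(record),
        "latency": latency_feedback(record),
    }
```

Because each score is a plain number between 0 and 1, two versions of an app can be compared by running the same prompts through both and averaging the results, with no human scoring in the loop.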
TruLens can be added to the development process with a few lines of code. Once it’s up and running, users can create their own feedback functions, customized to specific use cases, or use the out-of-the-box options. Currently, the software provides feedback functions that test for truthfulness, question-answering relevance, harmful or toxic language, user sentiment, language mismatch, response verbosity, and fairness and bias. It also logs how often the LLM is called within the app, giving an easy way to track usage costs.
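The usage-logging idea can be sketched the same way. The wrapper below is an illustration only and does not reflect how TruLens records usage: the `UsageTracker` class, the whitespace-based token count, and the price per 1,000 tokens are all assumptions made for the example.

```python
from typing import Callable


class UsageTracker:
    """Hypothetical wrapper that counts LLM calls and estimates their cost."""

    def __init__(self, llm_call: Callable[[str], str],
                 price_per_1k_tokens: float = 0.002):  # assumed price, not a real rate
        self.llm_call = llm_call
        self.price_per_1k_tokens = price_per_1k_tokens
        self.calls = 0
        self.tokens = 0

    def __call__(self, prompt: str) -> str:
        response = self.llm_call(prompt)
        self.calls += 1
        # Crude proxy: count whitespace-separated words instead of real tokens.
        self.tokens += len(prompt.split()) + len(response.split())
        return response

    @property
    def estimated_cost(self) -> float:
        return self.tokens / 1000 * self.price_per_1k_tokens


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "a canned reply"


tracked_llm = UsageTracker(fake_llm)
tracked_llm("What is a feedback function?")
```

After a test run, `tracked_llm.calls` and `tracked_llm.estimated_cost` give a rough picture of how heavily the app leans on the model.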
According to an Accenture survey, 98% of global executives agree that AI foundation models will play an important role in their organizations’ strategies in the next three to five years. This suggests that tools like TruLens could soon see increased demand from enterprises.
Other offerings for LLM applications
While testing LLM-driven applications for performance and response accuracy is the need of the hour, only a handful of players have launched solutions to deal with it. These include Datadog’s OpenAI model monitoring integration, Arize’s Phoenix solution, and Israel-based Mona Labs’ just-launched generative AI monitoring solution.
TruEra, for its part, says TruLens is best suited to the development phase of building an LLM app.
“This is actually the phase that most companies are in today — they are experimenting with development and really have an acute need for tools to help them iterate faster and home in on application versions that are both effective at their tasks and risk-minimizing. You can, of course, use it on both development and production models,” said Anupam Datta, cofounder, president and chief scientist at TruEra, in an interview with NeuralNation.
