What role does data labeling play in the era of generative AI? Snorkel AI aims to change the way organizations curate and prepare data for it. Data labeling has long been a critical part of how data scientists prepare data for machine learning (ML) and artificial intelligence (AI). Snorkel AI is now extending that work with GenFlow, a new service for building generative AI applications, and Snorkel Foundry, which helps organizations build customized large language models (LLMs).
According to Alex Ratner, CEO and cofounder at Snorkel AI, “How you curate, sample, filter and clean data ends up having a tremendous impact on the resulting foundation model that you get out. In other words, you can’t just dump in a random mix of garbage data, and expect these models to turn out well.” Snorkel Foundry targets this data curation step. Organizations point the service at a data repository during the pre-training phase, and it helps data scientists assemble the right mix of data to meet business objectives while reducing bias and the risk of hallucination.
Getting generative AI to work without good data is a hallucination
One common risk facing general-purpose generative AI tools is hallucination, where a model produces responses that are not accurate. Ratner explains that hallucinations fundamentally occur when a model has not been trained for a specific task or, more importantly, does not have all the information it needs to be accurate. Snorkel Foundry addresses this by providing tooling and management capabilities for giving feedback and filtering out poor-quality data points, so that the generative AI model can produce better output.
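To make the idea of filtering out poor-quality data points concrete, here is a minimal Python sketch of rule-based corpus filtering. It is not Snorkel Foundry's implementation or API. The `Doc` type, the function names, and the thresholds are all hypothetical, chosen only to illustrate the kind of heuristic checks a curation step might apply before pre-training.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """A single candidate training document (hypothetical type)."""
    text: str


def too_short(doc: Doc, min_words: int = 5) -> bool:
    # Very short fragments rarely carry useful training signal.
    return len(doc.text.split()) < min_words


def mostly_non_alpha(doc: Doc, max_ratio: float = 0.4) -> bool:
    # Flag text dominated by symbols or digits (for example, markup debris).
    if not doc.text:
        return True
    non_alpha = sum(
        1 for ch in doc.text if not (ch.isalpha() or ch.isspace())
    )
    return non_alpha / len(doc.text) > max_ratio


def filter_corpus(docs: List[Doc]) -> List[Doc]:
    """Keep documents that pass every check and are not duplicates."""
    seen = set()
    kept = []
    for doc in docs:
        # Normalize case and whitespace so near-identical copies collide.
        key = " ".join(doc.text.lower().split())
        if key in seen or too_short(doc) or mostly_non_alpha(doc):
            continue
        seen.add(key)
        kept.append(doc)
    return kept
```

In practice the thresholds would be tuned against the target domain, and rules like these would likely be combined with human review rather than used alone.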
Why data labeling isn’t dead
Despite the hype around generative AI, Ratner argues that in the long run, most of the enterprise value from AI will come from more traditional predictive AI. Data labeling remains important for predictive AI tasks, such as classifying fraud. With generative AI, feedback is still needed, but it takes a different form. Rather than labeling an item as one type or another, a person indicates that they prefer one summary or response over another. Snorkel AI is working to make this feedback more programmatic, faster, and better managed.
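The contrast between the two kinds of feedback can be sketched in a few lines of Python. This is an illustration only, not Snorkel AI's method: the `ClassLabel` and `Preference` types and the `prefer_grounded` rule are hypothetical names, and the word-overlap heuristic is just one simple way a preference might be expressed programmatically instead of one comparison at a time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassLabel:
    """Predictive AI feedback: an item is assigned one category."""
    item: str
    label: str  # e.g. "fraud" or "legitimate"


@dataclass
class Preference:
    """Generative AI feedback: one response is preferred over another."""
    prompt: str
    chosen: str
    rejected: str


def unsupported_words(response: str, source: str) -> int:
    # Count words in the response that never appear in the source text,
    # a crude proxy for content the source does not support.
    source_vocab = set(source.lower().split())
    return sum(1 for w in response.lower().split() if w not in source_vocab)


def prefer_grounded(
    prompt: str, source: str, a: str, b: str
) -> Optional[Preference]:
    """Hypothetical rule: prefer the response with fewer unsupported words.

    Returns None (abstains) when the rule cannot separate the two.
    """
    ua = unsupported_words(a, source)
    ub = unsupported_words(b, source)
    if ua == ub:
        return None
    if ua < ub:
        return Preference(prompt, chosen=a, rejected=b)
    return Preference(prompt, chosen=b, rejected=a)
```

A rule like this lets one written heuristic produce many preference pairs, and abstaining on ties leaves the ambiguous cases for a human reviewer.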