Data scarcity is a serious problem for many data scientists.
That might sound ridiculous (“isn’t this the age of Big Data?”), but in many domains there simply isn’t enough labelled data to train performant models with traditional ML approaches.
In classification tasks, the lazy approach to this problem is to “throw AI at it”: take an off-the-shelf pre-trained LLM, add a clever prompt, and Bob’s your uncle.
But LLMs aren’t always the best tool for the job. At scale, LLM pipelines can be slow, expensive, and unreliable.
An alternative is to use a fine-tuning technique designed specifically for few-shot scenarios, where labelled training data is scarce.
In this article, I’ll introduce you to a favourite technique of mine: SetFit, a fine-tuning framework that can help you build highly performant NLP classifiers with as few as 8 labelled samples per class.