In a world dominated by large language models, MIT CSAIL researchers argue that smaller models should not be overlooked, particularly for widely deployed natural language processing applications. They have developed an approach that addresses both the computational expense of big models and the privacy concerns they raise. By focusing on the concept of textual entailment, the researchers trained a smaller “entailment model” that outperforms models 500 times its size on certain language understanding tasks, without the need for human-generated annotations. This framing also lets the model adapt to new tasks via zero-shot adaptation.
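The entailment reformulation can be pictured as follows: a classification task is recast as premise–hypothesis pairs, and the label whose hypothesis the text most strongly entails wins. The sketch below is illustrative only; `entailment_score` is a hypothetical keyword-based stand-in for a trained entailment model, and the hypothesis template is an assumption, not taken from the paper.

```python
# Sketch: zero-shot classification via textual entailment.
# A real system would score entailment with a trained NLI model;
# here `entailment_score` is a hypothetical stand-in.

def entailment_score(premise: str, hypothesis: str) -> float:
    """Toy stand-in for an entailment model: the fraction of the
    hypothesis's content words that also appear in the premise."""
    stop = {"this", "text", "expresses", "a", "sentiment"}
    words = [w for w in hypothesis.lower().split() if w not in stop]
    if not words:
        return 0.0
    premise_words = set(premise.lower().split())
    return sum(w in premise_words for w in words) / len(words)

def zero_shot_classify(text: str, labels: list[str]) -> str:
    """Recast classification as entailment: build one hypothesis per
    label and pick the label whose hypothesis is most entailed."""
    hypotheses = {lab: f"this text expresses a {lab} sentiment" for lab in labels}
    return max(labels, key=lambda lab: entailment_score(text, hypotheses[lab]))

print(zero_shot_classify("what a wonderful, positive experience",
                         ["positive", "negative"]))  # -> positive
```

Because the label set lives only in the hypotheses, the same scorer can be pointed at a new task (new labels, new template) with no retraining, which is what makes the entailment view attractive for zero-shot adaptation.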
The researchers found that self-training, in which the model learns from its own predictions without human supervision, significantly improves performance on downstream tasks such as sentiment analysis, question answering, and news classification. Their models outperformed Google’s LaMDA and FLAN models, GPT models, and other supervised algorithms in zero-shot capability. To mitigate the risk of incorrect labels accumulating during self-training, they developed an algorithm called “SimPLE” (Simple Pseudo-Label Editing) that reviews and revises the model’s self-generated labels, improving overall quality and robustness against adversarial data.
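One simplified way to picture pseudo-label editing in self-training (a sketch under assumptions, not the authors' SimPLE algorithm): collect several stochastic predictions per unlabeled example, keep the majority-vote label only when agreement is high, and drop the rest so noisy pseudo-labels never enter the next training round.

```python
# Sketch of pseudo-label editing for self-training (simplified;
# not the exact SimPLE algorithm). Each unlabeled example gets
# several stochastic predictions; a pseudo-label survives only
# when the votes agree above a confidence threshold.

from collections import Counter

def edit_pseudo_labels(votes_per_example, threshold=0.75):
    """votes_per_example: list of lists of predicted labels
    (e.g. from multiple dropout passes over the same example).
    Returns (index, label) pairs for examples whose majority
    label reaches the agreement threshold."""
    kept = []
    for i, votes in enumerate(votes_per_example):
        label, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= threshold:
            kept.append((i, label))
    return kept

votes = [
    ["pos", "pos", "pos", "neg"],   # 75% agreement -> kept
    ["pos", "neg", "pos", "neg"],   # 50% agreement -> dropped
    ["neg", "neg", "neg", "neg"],   # unanimous     -> kept
]
print(edit_pseudo_labels(votes))  # -> [(0, 'pos'), (2, 'neg')]
```

Filtering the middle example is the point of the editing step: an ambiguous pseudo-label that entered training would reinforce the model's own mistake, the failure mode the editing pass is designed to prevent.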
While the study demonstrates promising results, there are limitations, particularly in applying entailment models to multi-choice tasks. Nonetheless, the researchers believe their approach offers a more scalable, trustworthy, and cost-effective path to language modeling, showing that compact language models can match or exceed much larger counterparts on benchmark understanding tasks. The paper outlining their research will be presented at the Meeting of the Association for Computational Linguistics in July.