How to evaluate RAG with classification metrics

RAG: a standard industry practice for Question-Answering GenAI chatbots

Retrieval-Augmented Generation (RAG) is the go-to approach for enabling LLMs to perform Question Answering on proprietary documents, such as business knowledge bases. The main struggle: how to evaluate RAG? While there are different business/research questions behind RAG, one is the …
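Since the post is about scoring RAG with classification metrics, here is a minimal sketch of the core idea, under assumptions not stated in the teaser: treat each retrieved chunk as a relevance prediction and compute precision, recall, and F1 against a hand-labeled set of relevant documents per question. The function name `retrieval_prf` and the document IDs are hypothetical.

```python
# Hypothetical sketch: scoring RAG retrieval as a classification task.
# `retrieved` and `relevant` are assumed inputs (sets of document IDs).

def retrieval_prf(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Precision/recall/F1 over retrieved document IDs for one question."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Example: the retriever returned 3 chunks, 2 of which are truly relevant.
p, r, f = retrieval_prf({"doc1", "doc2", "doc7"}, {"doc1", "doc2", "doc4"})
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.67 recall=0.67 f1=0.67
```

Averaging these per-question scores over an evaluation set gives a single retrieval quality number to track.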
Papers

Cache me if you Can: an Online Cost-aware Teacher-Student Framework to Reduce the Calls to Large Language Models (EMNLP 2023)

We propose a framework for reducing calls to LLMs by caching previous LLM responses and using them to train a local, inexpensive model. We measure the trade-off between performance and cost. Experimental results show that significant cost savings can be obtained with only slightly lower performance.
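The abstract describes a cache-plus-student loop. Below is an illustrative skeleton of that pattern, not the paper's exact algorithm: answer from the cache or a confident local student when possible, and fall back to the paid LLM only otherwise. `Student` and `call_llm` are hypothetical stand-ins.

```python
# Illustrative skeleton of a cost-aware LLM cache (assumed design, not the paper's).

class CachedLLM:
    def __init__(self, student, call_llm, threshold: float = 0.9):
        self.cache: dict[str, str] = {}   # prompt -> cached LLM response
        self.student = student            # cheap local model trained on cached pairs
        self.call_llm = call_llm          # expensive remote LLM call
        self.threshold = threshold        # min student confidence to skip the LLM

    def answer(self, prompt: str) -> str:
        if prompt in self.cache:                  # exact cache hit: free
            return self.cache[prompt]
        label, confidence = self.student.predict(prompt)
        if confidence >= self.threshold:          # confident student: no LLM call
            return label
        response = self.call_llm(prompt)          # fall back to the expensive LLM
        self.cache[prompt] = response
        self.student.fit(self.cache)              # refresh the student on new pairs
        return response
```

The `threshold` parameter is where the performance/cost trade-off the abstract mentions becomes tunable: lower it to save more calls at the price of more student mistakes.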
Making LLMs Worth Every Penny: Resource-Limited Text Classification in Banking (ACM ICAIF 2023)

Standard full-data classifiers in NLP demand thousands of labeled examples, which is impractical in data-limited domains. Few-shot methods offer an alternative, utilizing contrastive learning techniques that can be effective with as few as 20 examples per class.
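To make the few-shot setting concrete, here is a minimal embedding-based baseline in the spirit of such methods (an illustrative sketch, not the paper's setup): encode a handful of labeled texts per class with a sentence encoder and classify by nearest class centroid. The model name and the banking intent labels are assumptions for the example.

```python
# Minimal few-shot baseline: nearest-centroid over sentence embeddings.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed available

def few_shot_classifier(examples: dict[str, list[str]]):
    """examples maps a class label to a handful of labeled texts."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    labels = list(examples)
    centroids = np.stack([
        encoder.encode(texts).mean(axis=0) for texts in examples.values()
    ])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

    def predict(text: str) -> str:
        vec = encoder.encode([text])[0]
        vec /= np.linalg.norm(vec)
        return labels[int(np.argmax(centroids @ vec))]  # cosine similarity

    return predict

# Two hypothetical banking intents with two examples each.
predict = few_shot_classifier({
    "card_lost": ["I lost my credit card", "My card was stolen yesterday"],
    "balance":   ["What is my account balance?", "How much money do I have?"],
})
print(predict("Someone took my debit card"))  # expected: card_lost
```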