Evaluating and Enhancing Custom AI Chat Services: Towards Increasing the Speed, Accuracy and Cost Efficiency of Retrieval Augmented Generation Pipelines

Dale, Erik

Master thesis

Open

no.uia:inspera:222274016:128446466.pdf (1.384Mb)

Permanent link

https://hdl.handle.net/11250/3141895

Publication date

2024

Collections

Master's theses in Information and Communication Technology [508]

Abstract

Custom AI chat services that use Retrieval Augmented Generation are becoming increasingly common across sectors and businesses. When such services are used in professional settings such as a workplace, they must be fast, cost-effective, and reliable. The Large Language Models behind these services are often provided as a service with limited access to the software itself, which makes fine-tuning for downstream tasks a challenge. This thesis introduces a Retrieval Augmented Generation pipeline designed to improve the time and cost efficiency, as well as the reliability, of AI chat services tailored to specific user needs. These improvements are achieved without modifying the underlying architecture of the Large Language Models. Instead, the pipeline relies on prompt engineering strategies, including prompt compression and prompt classification, to improve performance and efficiency. Because embeddings play a critical role in these services, the thesis also evaluates embedding models to identify the one best suited to the pipeline. The results show that the enhanced Retrieval Augmented Generation pipeline can reduce cost by almost 69% compared to a standard Retrieval Augmented Generation pipeline that uses none of the proposed enhancements. They also show that an open-source embedding model can increase efficiency by as much as 41% compared to similar OpenAI models. These results highlight the potential of the method to enhance custom AI chat services.
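To make the idea of prompt classification and prompt compression concrete, the following is a minimal sketch and not the pipeline from the thesis. It assumes a toy classifier that skips retrieval for small talk, a toy compressor that keeps only the passage sharing the most words with the query, a whitespace word count as a stand-in for tokens, and a hypothetical price per 1,000 tokens. All function names and values are illustrative assumptions.

```python
import re

# Hypothetical price per 1,000 prompt tokens; not a figure from the thesis.
PRICE_PER_1K_TOKENS = 0.01


def count_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def needs_retrieval(query: str) -> bool:
    """Toy prompt classifier: small talk is answered without retrieved context."""
    small_talk = {"hi", "hello", "thanks", "thank you", "bye"}
    return query.strip().lower().rstrip("!.?") not in small_talk


def compress(passages: list[str], query: str, keep: int = 1) -> list[str]:
    """Toy prompt compression: keep the passages with the most query-word overlap."""
    query_words = set(re.findall(r"\w+", query.lower()))

    def overlap(passage: str) -> int:
        return len(query_words & set(re.findall(r"\w+", passage.lower())))

    return sorted(passages, key=overlap, reverse=True)[:keep]


def build_prompt(query: str, passages: list[str], enhanced: bool) -> str:
    """Assemble the prompt, optionally applying classification and compression."""
    if enhanced:
        passages = compress(passages, query) if needs_retrieval(query) else []
    context = "\n".join(passages)
    return f"{context}\n\nQuestion: {query}".strip()


def cost(prompt: str) -> float:
    """Estimated prompt cost under the hypothetical price above."""
    return count_tokens(prompt) / 1000 * PRICE_PER_1K_TOKENS


def cost_reduction(baseline_cost: float, enhanced_cost: float) -> float:
    """Relative saving: 1 - enhanced / baseline."""
    return 1 - enhanced_cost / baseline_cost


if __name__ == "__main__":
    passages = [
        "Vacation requests are submitted through the HR portal.",
        "The cafeteria opens at eight.",
        "Printers are on the second floor.",
    ]
    query = "How do I submit vacation requests?"
    baseline = build_prompt(query, passages, enhanced=False)
    enhanced = build_prompt(query, passages, enhanced=True)
    print(f"reduction: {cost_reduction(cost(baseline), cost(enhanced)):.0%}")
```

The saving in this toy example comes only from sending fewer context words to the model; it says nothing about the figures reported in the thesis, which depend on the actual models, tokenizer, and data used there.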

Publisher

University of Agder