Evaluating and Enhancing Custom AI Chat Services: Towards Increasing the Speed, Accuracy and Cost Efficiency of Retrieval Augmented Generation Pipelines
Master thesis
Permanent link
https://hdl.handle.net/11250/3141895
Publication date
2024
Abstract
Custom AI chat services that use Retrieval Augmented Generation are becoming increasingly common across sectors and businesses. When such services are used in professional settings such as the workplace, they must be fast, cost-effective, and reliable. The Large Language Models behind these services are often offered as a hosted service with limited access to the underlying software, which makes fine-tuning them for downstream tasks challenging. This thesis introduces a Retrieval Augmented Generation pipeline designed to improve the time efficiency, cost efficiency, and reliability of AI chat services tailored to specific user needs. These improvements were achieved without modifying the underlying architecture of the Large Language Models. Instead, they rely on prompt engineering strategies, including prompt compression and prompt classification, to raise performance and efficiency. Given the critical role of embeddings in these services, the thesis also explores a range of embedding models to identify the one that yields the greatest improvement. The results show that the enhanced pipeline can reduce costs by almost 69% compared with a standard Retrieval Augmented Generation pipeline that uses none of the proposed enhancements. They also show that an open-source embedding model can increase efficiency by as much as 41% compared with similar OpenAI models. These results highlight the potential of this approach for improving custom AI chat services.
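The abstract stresses the critical role of embeddings in these services. The sketch below is a generic illustration only and is not the pipeline developed in this thesis. It shows the retrieval step of a Retrieval Augmented Generation pipeline under simple assumptions: documents and the query are represented as embedding vectors, the documents are ranked by cosine similarity to the query, and the best matches are inserted into the prompt. The document collection, the hand-written three-dimensional vectors, and the function names `retrieve` and `build_prompt` are all hypothetical. A real service would obtain the vectors from an embedding model. Prompt compression and prompt classification are not shown.

```python
import math

# Hypothetical document store: id -> (embedding vector, passage text).
# In a real pipeline the vectors would come from an embedding model
# (open-source or hosted); these 3-dimensional vectors are hand-written
# purely for illustration.
DOCUMENTS = {
    "vacation_policy": ([0.9, 0.1, 0.0], "Employees get 25 days of paid leave."),
    "expense_policy": ([0.1, 0.9, 0.1], "Receipts must be submitted within 30 days."),
    "it_support": ([0.0, 0.2, 0.9], "Contact IT via the internal helpdesk portal."),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, k=1):
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc_id: cosine_similarity(query_vec, DOCUMENTS[doc_id][0]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, query_vec, k=1):
    """Augment the user question with the text of the retrieved passages."""
    context = "\n".join(DOCUMENTS[doc_id][1] for doc_id in retrieve(query_vec, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Hypothetical vector, assumed to be the embedding of the question below.
    q_vec = [0.85, 0.15, 0.05]
    print(retrieve(q_vec, k=2))
    print(build_prompt("How many vacation days do I get?", q_vec))
```

Because the ranking depends entirely on the vectors, swapping in a different embedding model can change which passages reach the Large Language Model. This is one reason the choice of embedding model matters for both answer quality and efficiency.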