A Competitive LLM-based Hybrid RAG System with Optimized Integration of Dense, Sparse, and Cached Retrieval on Edge Devices
Authors: Jhing Fa Wang, Din Yuen Chan, Kuo Sheng Hu
Abstract: In this paper, an innovative hybrid RAG system is constructed by optimally integrating dense retrieval, sparse keyword-based retrieval, and a retrieval cache mechanism. The proposed system addresses the main challenges of deploying LLMs on edge devices: limited memory, constrained compute resources, and high latency. Edge-device implementations of common RAG systems often yield retrieval accuracy below 70% and end-to-end response latency above 5 seconds on standard question-answering benchmarks. The proposed system effectively reduces redundant computation and inference latency while improving retrieval precision. Experiments demonstrate that our system, termed CLH-RAG, achieves retrieval accuracy of over 80% with an average response latency of under 2 seconds. Consequently, CLH-RAG is highly competitive with existing RAG systems for edge-device deployment, delivering high-quality real-time LLM inference.
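For illustration only, the integration described in the abstract can be sketched as a weighted fusion of a dense (embedding-similarity) score and a sparse (keyword) score, fronted by a retrieval cache so repeated queries skip recomputation. This is a minimal stand-in, not the authors' implementation: the toy bag-of-words "dense" scorer, the IDF-weighted keyword scorer, the `alpha` fusion weight, and the `lru_cache` cache are all illustrative assumptions.

```python
from collections import Counter
from functools import lru_cache
import math

# Toy document store standing in for the edge device's knowledge base.
DOCS = (
    "edge devices have limited memory and compute",
    "retrieval augmented generation combines search with LLMs",
    "caching repeated queries reduces inference latency",
)

def sparse_score(query: str, doc: str) -> float:
    # IDF-weighted keyword overlap, a crude stand-in for BM25-style sparse retrieval.
    df = Counter(w for text in DOCS for w in set(text.split()))
    d = set(doc.split())
    return sum(math.log(len(DOCS) / df[w]) for w in set(query.split()) if w in d)

def dense_score(query: str, doc: str) -> float:
    # Bag-of-words cosine similarity, a toy stand-in for an embedding model.
    q, d = Counter(query.split()), Counter(doc.split())
    dot = sum(q[w] * d[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

@lru_cache(maxsize=256)  # retrieval cache: a repeated query is served without re-scoring
def retrieve(query: str, alpha: float = 0.5, k: int = 1) -> tuple:
    # Fuse dense and sparse scores; alpha trades embedding similarity against keyword match.
    ranked = sorted(
        DOCS,
        key=lambda doc: alpha * dense_score(query, doc) + (1 - alpha) * sparse_score(query, doc),
        reverse=True,
    )
    return tuple(ranked[:k])
```

In a deployed system the cache key would typically be a normalized query (or its embedding) rather than the raw string, and the fusion weight would be tuned on a validation set; both details are omitted here.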
Keywords: Edge AI, Hybrid Retrieval, Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), Retrieval Cache.
Conference Name: International Conference on Artificial Intelligence and Software Engineering (ICAISE-25)
Conference Place: Lisbon, Portugal
Conference Date: 5th Nov 2025