GenAI provides abundant opportunities to enhance workflows and drive even more business value than what it’s already capable of achieving. Retrieval-Augmented Generation (RAG) emerges as a key component of GenAI that assists in improving productivity and unlocking unique use cases.
RAG manages vast data to provide valuable insights and data-driven recommendations for informed decision-making. Equally, it also contributes to boosting productivity by expediting various operational processes within the organisation.
This article aims to provide insight into a practical use case and inform business leaders of the capabilities they could adopt by incorporating this facet of GenAI technology into their respective organisations.
YLD’s approach with audio-based RAG for client success
YLD has been working with a client that deals with many customer support inquiries regularly. Given the volume of direct interaction with customers, the client maintains a customer service team that handles an influx of customer inquiries daily.
Aligned with the client’s objective to transition towards a more automated and digital-first approach as they scale, YLD had the opportunity to demonstrate our software engineering, data engineering, data science, and horizontal GenAI expertise.
We aimed to enhance the overall customer experience by capturing and interpreting phone conversations. In real-time, we used RAG techniques and tooling to extract relevant information from a knowledge base and empower agents to efficiently assist their customers.
Additionally, this work proves beneficial in more use cases where audio is utilised for language processing or summarisation. For instance, meeting recordings are given a summary at the end of the interaction.
The technology also goes beyond mere summarisation by extracting the general sentiment of conversations. Sentiment analysis allows the extraction of specific data from calls to facilitate a more detailed level of analysis for quality assurance purposes. Notably, agents maintain control over the information they share with customers and can align the system accordingly to improve the result suggestions it provides in the knowledge base they reference.
Audio-based RAG functionality and architecture
For anything related to audio language processing, Whisper is the go-to tool. Released a few years ago by OpenAI, it surpassed any other solution available at the time. It provides an incredibly accurate audio transcription. Either in real-time or via raw files.
On the other hand, Bedrock is an AWS-managed service that uses open and closed Large Language Models (LLMs), like Anthropic’s Claude or Meta’s Llama. With it, you can interact with the models in a stateless way without worrying about deploying them yourself. Additionally, you also don’t have to leave your AWS account and access a third-party service.
Given that you can interact with an LLM via an API, it means that you can use it for any use case you can think of, and that means Retrieval-Augmented Generation. All you need is your content indexed (either in OpenSearch or in a vector database).
In this use case, we transcribed the raw audio exchange between the agent and the customer. The system then breaks up transcripts into chunked portions, identifies relevant topics, and retrieves corresponding documents from the knowledge base. Those documents are then prompted to the AWS Bedrock model to summarise an insight based on the real-time transcript of the call and presented through a user-friendly widget. This widget enables agents to access and provide relevant information to clients in real time.
With this advancement, agents can find the right information quickly as the model provides accurate results. As a result, agents deliver top-quality service more effortlessly.
In addition, with the full transcripts (no longer real-time), we could generate and record a call summary that can be reviewed and indexed for other use cases.
The system architecture comprises several key components. Firstly, the transcripts are processed through an orchestrator app that interacts with the AWS Cloud infrastructure. The orchestrator app serves as the intermediary for handling the flow of transcripts within the system.
In the audio-based RAG process, the Transcribe Application captures and processes the audio stream to generate call transcripts. Within the Customer Support Backoffice App, software elements hosted on AWS Cloud oversee both the Bedrock Summarisation Model and the Bedrock Embedding Model. The embedding model creates the embeddings that we can use to find relevant content in our knowledge base.
Finally, the summarisation model condenses the entire conversation into a concise summary, similar to an AI assistant. The summary is provided post-conversation, offering a valuable feature for customer service-related tasks.
Other use cases for audio-based RAG
Generally, GenAI is not intended to replace direct customer engagement entirely. Other language processing features, such as using avatars in chatbots, automating customer service tasks, and many others, help businesses handle daily inquiries more efficiently.
Furthermore, RAG services could enhance knowledge sharing throughout the support department. An extensive and well-written knowledge base for agents is crucial for leveraging past customer experience and delivering optimal service. However, merely having a knowledge base is insufficient, since users must also be able to easily access it quickly.
Improve decision-making with real-time data
Making data-led decisions is essential for business growth in any industry. By leveraging RAG technology in tandem with tools like AWS Bedrock and transcription tools such as Whisper, businesses can unlock a multitude of improvements, enhancing workflow efficiency. Each incremental step taken towards achieving desired outcomes contributes to sustainable business growth.
Contact us to discuss how we can collaborate on your software engineering, data engineering, MLOps, data analysis, data science, or GenAI needs.