Application of Large Language Model (LLM) Guardrails in Mortgage
Written by Tela Mathias, COO and Managing Partner at PhoenixTeam
Recently I was asked about the use of guardrails in our product and put together a white paper to formalize our internal documentation and thinking. I was curious to explore the relationship between LLMs and the humans who use them, and to see how other fields, like robotics, think about the problem. Guardrails refer to the policies, protocols, and technical measures we put in place to prevent AI systems from producing undesirable or harmful results. Frankly, the concept of guardrails applies to the use of ANY technology, especially in mortgage, where the consequences of “getting it wrong” can be so significant.
Of course, Sheridan’s classification scheme always comes up, but I was also impressed with the work of Jenay M. Beer, “Toward a Framework for Levels of Robot Autonomy in Human-Robot Interaction.” Sheridan and Beer both offer ways of thinking about how humans interact with technology. Based on the need and what can go wrong as we work to meet that need (think of a borrower applying for a home loan), we need to care about how we ensure technology is doing its job. The question is not whether we can automate a use case but whether we should. And if we “should” automate a use case, what safeguards can we put in place to ensure the results are good for humankind?
This is where guardrails come in.
At PhoenixTeam, we think about a broad set of guardrails applied throughout the product, and then specific controls within each of the four categories. We also have a variety of additional controls on our roadmap and dedicate about 30% of our development capacity to improving safety and responsible use.
- We employ the OpenAI GPT-4o stack through a PhoenixTeam OpenAI “team” account, which gives us access to both the OpenAI application programming interface (API) and the ChatGPT experience. This business account type ensures that any data we exchange is not used to train OpenAI models.
- All AI-generated content is grounded using retrieval-augmented generation (RAG). This technique retrieves relevant information from a trusted dataset to provide context for the AI’s responses. By anchoring outputs to specific, verified sources, we significantly reduce the likelihood of hallucinations or contextually irrelevant content (a minimal sketch of this pattern appears after this list).
- We have spent hundreds of hours engineering our prompts to keep outputs grounded, minimize hallucinations, and optimize results. By providing clear instructions and context within the prompts, we minimize the potential for the AI to generate incorrect or unrelated information.
- We have implemented a multistep process that incorporates the option for human oversight at each step of the generation process to help ensure the accuracy and appropriateness of AI-generated content (see the second sketch after this list). Having a human in the loop (HITL) strikes the right balance between enabling innovation and mitigating the risk of unintended results, and it is an important component of Acting Comptroller of the Currency Michael J. Hsu’s remarks on AI risk management toll gates.
- Although we have selected the OpenAI stack as the initial foundation model behind Burst, ultimately this is the customer’s choice. Customers may prefer to use an open-source model, or they may seek copyright and intellectual property protections beyond those offered by OpenAI. We work with enterprise customers to put the model of their choice behind Burst.
- We have introduced limited systematic evaluation guardrails to monitor AI outputs continually. We are actively assessing multiple evaluation frameworks for future implementation to enhance our monitoring capabilities and help ensure consistent compliance with responsible AI standards.
- We currently store the chunk from which each statement was generated in our relational database (RDB), and we plan to implement additional visible content tracing to individual end-state artifacts later this year (the third sketch after this list shows the basic idea). This will enhance transparency by allowing users to trace outputs back to their original sources, fostering trust and accountability.
- For speed and efficiency, we have avoided model fine-tuning in Burst. We intend to evaluate the costs and benefits of fine-tuning, and this discovery effort is on our roadmap for next year. Fine-tuning could allow us to tailor the AI models more precisely to our specific use cases and responsible AI guidelines.
- Our use cases do not require or accept any personally identifiable information about consumers (or any human). We make extensive use of process, policy and regulatory information in the public domain to enrich client process-based information and generate artifacts from this knowledge.
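To make the RAG bullet above more concrete, here is a minimal sketch of the grounding pattern, not Burst’s implementation: the corpus, embedding model, prompt wording, and top_k value are illustrative assumptions, and the example assumes the official OpenAI Python client with an API key in the environment.

```python
# Minimal RAG sketch: retrieve trusted chunks, then answer only from that context.
# Corpus contents, model names, and prompt wording are illustrative, not Burst internals.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts with an OpenAI embedding model."""
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

# A small curated corpus stands in for the trusted, verified source documents.
corpus = [
    "TRID requires the Loan Estimate to be delivered within three business days of application.",
    "The Closing Disclosure must be received at least three business days before consummation.",
]
corpus_vectors = embed(corpus)

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Return the top_k corpus chunks most similar to the question (cosine similarity)."""
    q = embed([question])[0]
    scores = corpus_vectors @ q / (np.linalg.norm(corpus_vectors, axis=1) * np.linalg.norm(q))
    return [corpus[i] for i in np.argsort(scores)[::-1][:top_k]]

def grounded_answer(question: str) -> str:
    """Ask the model to answer strictly from the retrieved context."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(question))
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer strictly from the provided context. "
                                          "If the context does not contain the answer, say so."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(grounded_answer("When must the Loan Estimate be delivered?"))
```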
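The human-in-the-loop bullet describes a multistep pipeline with an approval gate at each step. The sketch below illustrates only that shape; the step names and the generate() stub are hypothetical stand-ins, not our actual generation pipeline.

```python
# Minimal HITL sketch: each step's draft must be approved before it flows downstream.
from dataclasses import dataclass

@dataclass
class StepResult:
    step: str
    draft: str
    approved: bool

def generate(step: str, upstream: list[StepResult]) -> str:
    """Placeholder for an LLM call that drafts the artifact for this step."""
    prior = "; ".join(r.draft for r in upstream if r.approved)
    return f"[draft {step} built from: {prior or 'source documents'}]"

def human_gate(step: str, draft: str) -> bool:
    """Pause for a reviewer decision before the draft is used by later steps."""
    print(f"\n--- {step} ---\n{draft}")
    return input("Approve this output? [y/n] ").strip().lower() == "y"

def run_pipeline(steps: list[str]) -> list[StepResult]:
    results: list[StepResult] = []
    for step in steps:
        draft = generate(step, results)
        approved = human_gate(step, draft)
        results.append(StepResult(step, draft, approved))
        if not approved:
            print(f"Stopping: '{step}' was rejected and needs human rework.")
            break
    return results

if __name__ == "__main__":
    run_pipeline(["epic summary", "user stories", "acceptance criteria"])
```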
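Finally, the content-tracing bullet comes down to persisting the source chunk alongside each generated statement so an output can later be traced back to the text that grounded it. This sketch uses SQLite and an invented table name to show the idea; our actual schema and database differ.

```python
# Minimal traceability sketch: store each generated statement with its source chunk.
# Table and column names are illustrative, not the Burst schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE generated_statement (
        id INTEGER PRIMARY KEY,
        statement TEXT NOT NULL,      -- the AI-generated sentence shown to the user
        source_chunk TEXT NOT NULL,   -- the retrieved chunk that grounded it
        source_document TEXT NOT NULL -- where that chunk came from
    )
    """
)

conn.execute(
    "INSERT INTO generated_statement (statement, source_chunk, source_document) VALUES (?, ?, ?)",
    (
        "The Loan Estimate must be delivered within three business days of application.",
        "TRID requires the Loan Estimate to be delivered within three business days of application.",
        "TRID policy summary, section 2",
    ),
)

# Later, a user or auditor can trace any statement back to its source.
for statement, chunk, document in conn.execute(
    "SELECT statement, source_chunk, source_document FROM generated_statement"
):
    print(f"Statement: {statement}\n  Grounded by: {chunk}\n  From: {document}")
```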
We still have a lot to learn, and we continue to explore the use cases we implement in our solution, the operational processes that support the use of our application, and how what we do fits into the broader ecosystems of our clients. Please come see this in action and ask your questions at our demonstration at MBA. Chatbots are great, but there is so much more to applying genAI in mortgage. Come understand how Sheridan’s mental model for assistance versus automation applies to mortgage use cases, see a live demo of Phoenix Burst, and be ready to ask all your questions about building production applications with genAI.
Thomas Sheridan (born December 23, 1929) is an American professor emeritus of mechanical engineering and applied psychology at the Massachusetts Institute of Technology. He is a pioneer of robotics and remote-control technology. Jenay M. Beer is an associate professor at the University of Georgia (UGA) Institute of Gerontology, with a joint appointment in the College of Public Health (Department of Health Promotion and Behavior) and the School of Social Work.