How Financial Services Can Harness LLMs Safely & Effectively
The financial services sector is grappling with a fundamental question: how do you deploy cutting-edge AI technology in one of the world's most regulated industries?
While LLMs promise significant operational improvements and new capabilities, banks and financial firms face complex challenges around compliance, explainability and risk management that are seldom as acute in other sectors.
To explore these challenges, we spoke with three industry leaders: Simon Thompson, Head of AI, ML and Data Science at GFT; Richard Doherty, Wealth & Asset Management Leader at Publicis Sapient; and Richard Harmon, Vice President and Global Head of Financial Services at Red Hat.
Their insights reveal practical approaches to deploying LLMs safely whilst maintaining regulatory compliance and managing the inherent risks of AI in critical financial processes.
How should financial institutions balance the transformative potential of LLMs with the regulatory and compliance requirements that govern financial services?
Richard Harmon (RH): The use of sophisticated algorithms and tools within a solution needs to be balanced against the corresponding regulatory and compliance requirements, including a key requirement for enhanced explainability.
The EU AI Act, which the European Parliament adopted on 13 March 2024 and which entered into force on 1 August 2024, is the most comprehensive AI-focused regulation globally so far. It aims to take a balanced (i.e., risk-adjusted) view on what is permitted, taking into account a range of determining factors to ensure robust and trustworthy AI systems.
These include the need for explainability, comprehensive documentation, stringent process and data governance, continuous human oversight, proactive risk management and meticulous auditability.
The image below provides one simple, principled approach to take into account when planning any gen AI-based solution: the design needs to ensure that the solution is "Trustworthy", as this is crucial for adoption.
Regulatory compliance will and should limit the use of gen AI if there is not a sufficient degree of explainability and transparency so that the solution outcomes are understood by all parties.
Accountability for outputs helps maintain ethical standards, and sound principles around explainability are also required for debugging, refining and explaining the models.
Wide-scale adoption of gen AI models will ultimately require better tools to help users understand how models arrive at decisions or outcomes.
What do you see as the most promising use cases for LLMs in finance beyond customer service and document processing?
Simon: LLMs are text-based generative AIs, and I think that's the clue. Where generating text is going to be helpful, that's where the promising use cases are.
For example, creating credit memos or underwriting reports can be tedious and repetitive, and LLMs can supercharge the employees doing this.
Of course, careful thought has to be applied to preventing crazy AI-slop text from being fired into critical processes.
Step-by-step generation requiring employee interaction can produce a user experience that is more like piloting an excavator than either digging trenches by hand or watching a robot build something.
Richard Doherty (RD): Beyond customer service and document processing, LLMs show exceptional promise in areas like real-time risk monitoring, personalised investment advisory and regulatory change management.
For instance, using LLMs to scan, summarise and contextualise global regulatory changes can give compliance teams a real-time edge, turning a reactive function into a proactive capability.
Similarly, AI-driven decision support for portfolio managers, based on massive cross-market and news analysis, has the potential to fundamentally reshape investment workflows.
Real-time risk monitoring is another game-changer. LLMs can process vast streams of market data, news, and regulatory communications to identify emerging risks before they show up in traditional metrics. This transforms risk management from reactive monitoring to predictive intelligence.
How should financial institutions approach the challenge of ensuring LLM outputs are accurate, explainable and auditable for regulatory purposes?
Simon: LLMs produce long, complex outputs. This is in sharp contrast to traditional ML approaches, which produce binary decisions or regression outputs.
Because of this, it can be very challenging to evaluate them. Another issue is that LLMs are a moving target; API providers are under pressure to constantly innovate, producing new models seemingly every week.
The output of the new model will probably be better than the output of the old model, but it will definitely be different because that difference is the point of the innovation.
One approach is to use LLM output as a booster for human output rather than as the finished product. The human is still the accountable entity, but in GFT's experience people's productivity can be massively boosted when they are supported by LLM-enabled tools.
This approach also allows other AI methods to be used in concert with LLMs to validate output and provide extra information to the human.
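As an illustration of the "booster" pattern described above, the Python sketch below treats model output as a draft that cannot enter a downstream process until a named person signs it off. It is a minimal sketch under stated assumptions, not a description of GFT's tooling: `Draft`, `approve`, `release`, the reviewer name and the model identifier are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Draft:
    """LLM-generated text awaiting human review (hypothetical structure)."""
    text: str
    model_version: str               # recorded so later model changes are visible
    approved_by: Optional[str] = None


def approve(draft: Draft, reviewer: str, edited_text: Optional[str] = None) -> Draft:
    """Record the accountable human, optionally with their edited wording."""
    if not reviewer.strip():
        raise ValueError("a named reviewer is required")
    return Draft(
        text=edited_text if edited_text is not None else draft.text,
        model_version=draft.model_version,
        approved_by=reviewer,
    )


def release(draft: Draft) -> str:
    """Only drafts approved by a human may enter a downstream process."""
    if draft.approved_by is None:
        raise PermissionError("draft has not been approved by a human reviewer")
    return draft.text


# Placeholder values for illustration only.
draft = Draft(text="Credit memo: draft wording from the model.",
              model_version="example-model-v1")
signed = approve(draft, reviewer="j.smith",
                 edited_text="Credit memo: wording checked by analyst.")
print(release(signed))
```

Calling `release(draft)` on the unapproved draft raises `PermissionError`, which is the point of the design: the model contributes text, but a person remains the accountable entity.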
RD: Accuracy, explainability, and auditability are non-negotiable for regulatory acceptance. Institutions should be building evaluation pipelines alongside model development, where outputs are continuously benchmarked against business logic, edge cases and regulatory requirements.
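To make the idea of an evaluation pipeline concrete, here is a minimal Python sketch of a harness that runs fixed test cases through a model and reports pass or fail for each. It is illustrative only: `EvalCase`, `run_evals` and `stub_model` are hypothetical names, the stub stands in for a real model call, and simple keyword checks stand in for the richer business-logic and regulatory checks a real pipeline would need.

```python
from typing import Callable, Dict, List, NamedTuple


class EvalCase(NamedTuple):
    name: str
    prompt: str
    must_include: List[str]      # wording the output has to contain
    must_not_include: List[str]  # wording that would breach policy


def run_evals(generate: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case through the model and report pass/fail per case."""
    results = {}
    for case in cases:
        output = generate(case.prompt).lower()
        has_required = all(term.lower() in output for term in case.must_include)
        has_banned = any(term.lower() in output for term in case.must_not_include)
        results[case.name] = has_required and not has_banned
    return results


def stub_model(prompt: str) -> str:
    """Placeholder for a real model call; returns canned text."""
    return "Past performance is not a reliable indicator of future results."


# Invented cases for illustration.
cases = [
    EvalCase("risk_warning", "Describe fund X.", ["past performance"], ["guaranteed"]),
    EvalCase("fee_disclosure", "Describe fund X.", ["ongoing charge"], []),
]
results = run_evals(stub_model, cases)
print(results)
```

With the canned stub, the first case passes and the second fails because the fee wording is missing. Re-running the same cases whenever the model or prompt changes is what turns this into a continuous benchmark.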
Retrieval-augmented generation (RAG), fine-tuning on institution-specific data and post-hoc explainability tooling are part of a growing toolkit to ensure LLMs operate within transparent and reviewable boundaries.
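The retrieval half of RAG can be sketched in a few lines of Python. The example below is a simplified illustration, not a production design: it ranks documents by plain word overlap where a real system would more likely use embeddings and a vector index, it stops at building the prompt rather than calling a model, and the policy ids and texts are invented.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: List[Tuple[str, str]], k: int = 2) -> List[Tuple[str, str]]:
    """Rank (doc_id, text) pairs by word overlap with the question."""
    question_words = tokenize(question)
    ranked = sorted(documents,
                    key=lambda doc: len(question_words & tokenize(doc[1])),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: List[Tuple[str, str]]) -> str:
    """Assemble a prompt asking the model to answer only from cited passages."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the passages below and cite their ids.\n"
        f"{context}\n"
        f"Question: {question}"
    )


# Invented internal policies for illustration.
policies = [
    ("POL-1", "Client complaints must be acknowledged within five business days."),
    ("POL-2", "Trade confirmations are sent to clients on the day of execution."),
    ("POL-3", "Staff must complete annual anti-money-laundering training."),
]
question = "How quickly must client complaints be acknowledged?"
top = retrieve(question, policies, k=1)
print(build_prompt(question, top))
```

Because the prompt carries document ids, a reviewer can trace each statement in the answer back to the passage it came from, which is what makes this pattern attractive for reviewable outputs.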
The key is implementing continuous monitoring systems that track model performance in production environments, flagging deviations from expected behaviour before they impact business operations or compliance.
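One simple way to flag deviations from expected behaviour is a rolling window over recent production checks. The Python sketch below assumes each model response has already been marked as passing or failing some validation; `DeviationMonitor` and its thresholds are hypothetical and would need tuning to a real workload.

```python
from collections import deque
from typing import Deque


class DeviationMonitor:
    """Raises an alert when too many recent responses failed validation."""

    def __init__(self, window: int, max_failures: int):
        self.recent: Deque[bool] = deque(maxlen=window)  # True = failed check
        self.max_failures = max_failures

    def record(self, failed: bool) -> bool:
        """Add one observation; return True if an alert should be raised."""
        self.recent.append(failed)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and sum(self.recent) > self.max_failures


# Ten healthy responses followed by three failures (invented data).
monitor = DeviationMonitor(window=10, max_failures=2)
observations = [False] * 10 + [True] * 3
alerts = [monitor.record(failed) for failed in observations]
first_alert = alerts.index(True)
print(first_alert)
```

The alert fires on the third consecutive failure (index 12 in the list), once the window holds more failures than the allowed maximum. Waiting for a full window avoids false alarms during start-up.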
Most importantly, you need comprehensive audit trails that regulators can follow and understand. This means building transparency into your AI systems from the ground up, not trying to retrofit it later.
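An audit trail is easier to trust when later edits are detectable. One common technique, shown here as a minimal Python sketch rather than anything the interviewees prescribe, is to chain entries together with hashes so that changing any earlier record invalidates everything after it. The function names, field names and values are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, event: Dict) -> str:
    """Hash the previous entry's hash together with this event."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: List[Dict], event: Dict) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev_hash": prev_hash,
                "hash": _digest(prev_hash, event)})


def verify(log: List[Dict]) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: List[Dict] = []
append_entry(audit_log, {"step": "prompt", "model": "example-model-v1", "user": "analyst-17"})
append_entry(audit_log, {"step": "output", "reviewed_by": "j.smith", "decision": "approved"})
print(verify(audit_log))

audit_log[0]["event"]["user"] = "someone-else"  # simulate tampering
print(verify(audit_log))
```

The first check prints `True` and the second prints `False`, because the altered first entry no longer matches its stored hash. A real deployment would also need secure storage and access controls; the chain only makes tampering evident.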
RH: It comes back to the importance of taking into account explicit upfront requirements for explainability and trustworthiness when designing a gen AI-based solution.
This should be done within the context of an enterprise AIOps platform, which can support consistency with these requirements as well as standardise and streamline the full model lifecycle management process.
Taking advantage of agentic AI capabilities with embedded LLMs can help establish autonomous oversight processes. These agents are capable of monitoring, validating and rectifying inaccurate, biased, or misleading outputs. Properly configured LLMs can subsequently learn from the evidence generated by this process.
What strategies are you considering to mitigate the risks of model bias and hallucination when deploying LLMs in critical financial processes?
RD: Bias and hallucinations are manageable, but only with the right strategies. This means curated training data, continuous testing in production-like environments, human-in-the-loop validation, and architecture choices that prioritise factual grounding.
Leading institutions are implementing layered safeguards: combining internal knowledge bases, usage guardrails and real-time monitoring to prevent off-policy behaviour in sensitive contexts.
You need protection at multiple levels: at the data level through diverse, representative training sets; at the architecture level through design choices that prioritise accuracy over fluency; and at the operational level through continuous validation and monitoring.
The most effective approach involves systematic safeguards that operate throughout the AI system, not just at the endpoints. This creates multiple checkpoints that catch potential issues before they affect critical business decisions.
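The layered approach can be pictured as a chain of independent checks that every answer must pass before it is used. The Python sketch below is deliberately simple and rests on assumptions: the blocked phrases, the crude "every number must appear in a source" grounding rule, and the function names are all hypothetical stand-ins for an institution's real guardrails.

```python
from typing import Callable, List, Optional

# A check returns a reason string if it blocks the answer, otherwise None.
Check = Callable[[str, List[str]], Optional[str]]


def blocked_terms(answer: str, sources: List[str]) -> Optional[str]:
    """Usage guardrail: reject wording the institution never allows."""
    for term in ("guaranteed return", "risk-free"):
        if term in answer.lower():
            return f"blocked term: {term}"
    return None


def grounded_in_sources(answer: str, sources: List[str]) -> Optional[str]:
    """Grounding check (crude): every number in the answer must appear in a source."""
    corpus = " ".join(sources)
    for token in answer.split():
        cleaned = token.strip(".,;:()%")
        if cleaned.replace(".", "", 1).isdigit() and cleaned not in corpus:
            return f"unsupported figure: {cleaned}"
    return None


def run_safeguards(answer: str, sources: List[str], checks: List[Check]) -> List[str]:
    """Run every layer and collect the reasons for any failures."""
    reasons = []
    for check in checks:
        reason = check(answer, sources)
        if reason is not None:
            reasons.append(reason)
    return reasons


# Invented source text and answers for illustration.
sources = ["The fund's ongoing charge is 0.45% per year."]
checks = [blocked_terms, grounded_in_sources]
good = run_safeguards("The ongoing charge is 0.45% per year.", sources, checks)
bad = run_safeguards("This risk-free fund returned 12% last year.", sources, checks)
print(good)
print(bad)
```

The first answer passes both layers and yields an empty list; the second is stopped twice, once for the banned phrase and once for a figure that appears in no source. Running all layers rather than stopping at the first failure gives reviewers the full picture of what went wrong.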
RH: A comprehensive suite of tools is essential for rigorously testing models throughout their entire lifecycle, from initial data sourcing and algorithm development to output evaluation, explanation and utilisation.
In essence, what we now term "hallucinations" are, in my view, what we have always had in all types of models - i.e., model error!
As with other areas, open source within the AI context helps to drive innovation and safety through thousands of community-supported projects. Leveraging global open-source community efforts is crucial.
These communities offer a vast array of collaboratively developed tools, extending beyond the limited scope of a single firm's developers.
Enterprises must develop and implement a comprehensive AI platform that mandates rigour and consistency, alongside stringent ethical and privacy standards, irrespective of the user, the algorithms, or the use case.
How do you view the evolution of human-AI collaboration in finance, particularly for roles requiring complex decision-making and client relationships?
Simon: One thing that hasn't been well developed is the responsiveness of LLM-based technology to human partnerships. Employees have responded to LLMs with curiosity and openness, at least mostly.
On the other hand, LLMs are an implacable and indifferent technology. For example, a GPT model is the same tomorrow as it is today, no matter what trials and tribulations you've shared with it.
I think that until AI starts growing and adapting with its users, collaboration will be stunted and one-sided.
RD: We don't see LLMs replacing human expertise in finance, but elevating it. Especially in complex decision-making and client advisory roles, AI becomes a trusted co-pilot: summarising information, highlighting risks or identifying unseen patterns, so human professionals can make faster, better-informed decisions.
The most effective collaborations will come from reimagining workflows, not just adding AI into existing ones. When AI handles information synthesis and pattern recognition, human professionals can focus on relationship building, strategic thinking and the nuanced judgement that clients value most.
This is particularly powerful in client-facing roles, where AI can provide real-time insights during conversations, suggest personalised solutions based on comprehensive analysis, and ensure no relevant considerations are overlooked, all while preserving the human connection that remains central to financial services.
RH: This needs to be complemented by a range of tools that not only ensure transparency and explainability but also provide clear and concise methods of communication as part of any advisory role where decisions are deemed to be critical.
There is a wide range of tools that need to be developed and enhanced, but one approach that I find highly intuitive is to have an automated feedback process from human to machine that allows the LLM to learn from this interaction to improve the level of understanding and trust in the analysis.
Ultimately, for critical functions or decisions, there must be a mutually balanced engagement with a heavy dose of advisory-based limitations on the LLMs, where the outcomes can have critical implications for humans as well as society.
Looking ahead, what infrastructure and talent investments do you believe will be essential for financial institutions to successfully leverage LLMs at scale?
Simon: This depends on how much of the value creation in financial institutions becomes commoditised and how much value remains in the parts of the processes that can still be a source of differentiation.
If the bulk of a financial services institution's (FSI's) business process can be commoditised with LLMs, then that FSI should put very little investment into infrastructure and talent to support it.
Instead, the focus must shift to parts of the process that need differentiation. Things like sovereign ownership and information assurance may be what drives this, or it might be possible for FSIs to use their data and skills to outcompete generalist hyperscaler models in specialist areas.
As compute grows and LLM technology itself democratises and commoditises, the pendulum could well swing back to the folk who own their own AI.
RD: To scale LLMs safely and effectively, investment in both infrastructure and talent is critical. On the infrastructure side, secure data pipelines, scalable model hosting environments and integration layers across business systems are foundational.
These aren't nice-to-haves - they're the backbone that enables reliable, compliant AI operations at enterprise scale.
On the talent side, it's not just about AI engineers; it's about bringing together product managers, risk leaders, compliance professionals, and domain experts to co-create solutions.
The most successful AI initiatives require cross-functional teams that understand both the technology and the business context.
The institutions investing here today are building the muscle they'll need to lead tomorrow. This isn't just about adopting AI; it's about developing the organisational capabilities to innovate responsibly and execute at scale.
RH: AI safety will become a growing area of focus and investment. A core element of this will be the need for firms to invest in an enterprise-wide AIOps platform that integrates a range of capabilities to support robust model lifecycle management as well as security requirements.
As well as supporting AI safety, this also helps give firms the agility to comply with rapidly evolving operational and regulatory requirements.
Open-source AI tools that are community-developed and carefully managed can ensure a high degree of operational resiliency. They can provide the ability to run critical AI workloads on any platform while providing the required accessibility and agility transparently and consistently across the enterprise.
Investments in wide-scale enablement and learning programmes are critical to a firm's ability to effectively and safely integrate gen AI capabilities into business workflow processes and customer-facing capabilities in a trustworthy manner.
Agentic AI offers exciting transformative capabilities, not only for executing complex workflows, but potentially supporting autonomous oversight capabilities as part of a robust AI Safety framework.
These capabilities can improve current methods used for monitoring, validating and correcting outputs that are inaccurate, biased, or misleading.

