Four Steps for Leveraging Synthetic Data in Banking
Tobias Hann is the CEO of MOSTLY AI, a Vienna-based technology company that enables privacy-preserving big data across a range of industries. The company leverages state-of-the-art generative deep neural networks with a built-in privacy mechanism to retain valuable information while rendering the re-identification of any individual impossible. Hann, an executive leader, serial entrepreneur, consultant, and business angel, explores the uses of synthetic data within the banking industry.
In 2022, synthetic data will be a key technology for tackling data management challenges across the domains of privacy, predictive analytics, security, and overall data-centricity. Today's AI-powered synthetic data generation algorithms digest real data, learn its features, correlations, and patterns in great detail, and can then generate unlimited amounts of completely artificial, synthetic data matching the statistical qualities of the originally ingested dataset. The new, synthetic datasets are scalable, privacy-compliant, and contain all of the original meaning without the burden of sensitive information. Below is a four-step guide to leveraging synthetic data in the banking industry.
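To make the idea concrete, here is a minimal sketch of the pattern described above: fit a generative model to real tabular data, then sample entirely new rows with similar statistics. It uses a simple Gaussian mixture from scikit-learn as a stand-in, and all data is fabricated for illustration; production tools such as MOSTLY AI's rely on deep generative networks with built-in privacy mechanisms, not this toy model.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

# Fabricated "real" data: two correlated numeric features
rng = np.random.default_rng(42)
income = rng.normal(55_000, 12_000, size=5_000)
balance = income * 0.3 + rng.normal(0, 4_000, size=5_000)
real = pd.DataFrame({"income": income, "balance": balance})

# Learn the joint distribution of the real data
model = GaussianMixture(n_components=5, random_state=0).fit(real)

# Sample as many completely artificial rows as needed
synthetic_values, _ = model.sample(10_000)
synthetic = pd.DataFrame(synthetic_values, columns=real.columns)

# The synthetic data mirrors the original statistics...
print(real.corr().round(2))
print(synthetic.corr().round(2))
# ...but no synthetic row corresponds to any real individual.
```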
1) Democratise data access to increase data literacy
To create a truly data-centric organisation, a data-driven culture must be cultivated from the top down. Providing constant access to data in every corner of your organisation will be the single most important ingredient of your success. This is no small feat in institutions traditionally built on values of secrecy, embodied in air-gapped data silos. Thanks to synthetic data generators, you will be able to provide on-demand access to decision-ready synthetic datasets not only to your core data science team but also to citizen data scientists, who can extend the analytics universe of your organisation. By setting up self-service synthetic data lakes, you can reduce wait times and administrative overhead and, most importantly, introduce a continuous, iterative, data-driven attitude across departments and operating companies.
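As one illustration of what such self-service access could look like, below is a minimal sketch of a hypothetical internal endpoint that serves pre-generated synthetic twins of approved datasets. The service, paths, and dataset layout are assumptions for illustration only, not any particular bank's implementation; a real deployment would add authentication and auditing.

```python
from pathlib import Path
from flask import Flask, abort, send_file

app = Flask(__name__)
# Hypothetical location of the pre-generated "twin" data lake
SYNTHETIC_LAKE = Path("/lake/synthetic")

@app.route("/synthetic/<dataset>")
def get_synthetic(dataset: str):
    """Serve the synthetic twin of a named dataset; raw data never leaves."""
    if not dataset.isidentifier():  # crude guard against path traversal
        abort(400)
    path = (SYNTHETIC_LAKE / dataset).with_suffix(".parquet")
    if not path.is_file():
        abort(404, "No synthetic twin has been generated for this dataset yet.")
    return send_file(path, as_attachment=True)

if __name__ == "__main__":
    app.run(port=8080)
```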
Real-life example
One of the top 10 banks in North America created a twin data lake, enabling any employee to point to any data source and get its synthetic version with minimal effort and no privacy risk. The initiative increased overall data literacy, and citizen data scientists now derive data-driven insights from the synthetic data lake daily.
2) Streamline privacy compliance
By making fresh, large batches of synthetic data flow freely through your organisation, you can minimise your exposure to privacy risks and ensure compliance, because synthetic data is privacy-compliant by design. Generated synthetic datasets bear no direct relationship to the original data points; they are completely artificial. These synthetic versions retain statistical meaning but no longer classify as personal information. Synthetic data is GDPR- and HIPAA-compliant, and free to use, share, and hold. CISOs should plan accordingly: cross-border synthetic data sharing is still possible after Schrems II and will remain so due to the intrinsically private nature of synthetic data.
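One common sanity check behind such claims is to verify that no synthetic record is a near-copy of a real one. Below is a sketch of a distance-to-closest-record (DCR) test using scikit-learn; it is one heuristic among several, not a substitute for a formal re-identification assessment, and the variable names are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

def distance_to_closest_record(real: np.ndarray, synthetic: np.ndarray) -> np.ndarray:
    """Distance from each synthetic row to its nearest real row."""
    scaler = StandardScaler().fit(real)           # put features on one scale
    nn = NearestNeighbors(n_neighbors=1).fit(scaler.transform(real))
    distances, _ = nn.kneighbors(scaler.transform(synthetic))
    return distances.ravel()

# Usage with numeric feature matrices (hypothetical `real` / `synthetic`):
# dcr = distance_to_closest_record(real.to_numpy(), synthetic.to_numpy())
# print(f"min: {dcr.min():.3f}  median: {np.median(dcr):.3f}")
# A cluster of near-zero distances would flag memorised real records.
```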
Real-life example
A large multinational enterprise conducted an HR analysis of more than 90,000 employees using synthetic data. Due to legal regulations, operating companies couldn’t touch employees’ sensitive, raw data. Using the synthetic version of the data, they could identify patterns leading to employee churn, optimise HR processes, and improve talent acquisition and retention rates.
3) Collaborate with external vendors using synthetic data lakes
Finance has traditionally relied on third-party expertise, and in some respects this is likely to remain the case. Organisations that are new to analytics, and to data science specifically, or that lack sufficient in-house talent, should look to third-party AI and analytics vendors. Synthetic data is a vital ingredient of agile, cost-effective, and ultimately successful Proof of Concept (PoC) processes. Innovation efforts are often highly dependent on research cooperation with academia and on cross-industry collaborations with other players in the financial space.
Real-life example
A Fortune 100 bank needed to evaluate 1,000+ vendors and start-ups annually. In 80% of these evaluations, the process involved handing over sensitive datasets to external organisations. This step took 3.5 months per PoC, since the data had to be manually selected, sanitised, anonymised, and individually approved in each specific case. This labour-intensive procedure generated $25,000 of internal costs per PoC, resulting in $25mn of annual costs for external data sharing.
The bank created a Rapid PoC Sandbox where the most commonly requested data assets were proactively converted to synthetic versions. Using this newly created sandbox, vendors can now test their solutions in a controlled, privacy-safe environment.
As a result, the average data delivery time decreased to three weeks, a 70% reduction. According to the bank's estimates, the average cost of a PoC is now only $5,000, an 80% reduction. The annual savings from this initiative amount to over $10mn.
4) Develop customer-centric banking products
The pandemic accelerated digital product usage across all demographics, and what had previously been discretionary or aspirational digital transformation became an imperative for survival. According to Deloitte, a staggering 44% of retail banking customers said they were using their primary bank's mobile app more often. It is no surprise that banks and financial service providers find themselves in turbocharged competition, where customer experience makes or breaks a product more than ever. It is crucial to develop, test, and improve products based on real insights, using realistic, rich data throughout the entire development cycle.
Real-life example
One of the largest retail banks in Europe sought to improve its mobile app to meet the modern-day expectations of its customers. Bank policies and privacy regulations prevented the team from accessing customer data, leaving them to populate their product development systems with dummy data and even their own personal banking data. These datasets, however, came nowhere near the scale or complexity the team required.
They transformed sensitive customer transaction datasets into realistic synthetic copies that accurately reflect customers' statistical features, while meeting all privacy compliance regulations the bank required. The product came to life when the team fuelled their development environment with AI-powered synthetic data, providing actionable insights into individual customers' behaviour patterns.
The department responsible for data security and management fulfilled the production team's data requests in hours, where legacy anonymisation processes would have taken months. They delivered what quickly became the number one banking app in the target country, with an average rating of 4.6 stars across app stores and countless reviews praising its seamless user experience.
Banks today need to accelerate digital transformation: build on customer trust and establish true data-centricity by developing synthetic data capabilities that produce accurate, privacy-compliant synthetic truths to be used as drop-in replacements for sensitive, raw data.