Erik Kooistra: Towards a model risk management framework for AI models

Erik Kooistra: Towards a model risk management framework for AI models

Risicomanagement Kunstmatige intelligentie Technologie
Erik Kooistra (foto archief Probability & Partners) 980x600.jpg

By Erik Kooistra, Service Line Lead Model Validation at Probability & Partners

The AI revolution introduces significant new risks, which include issues with interpretability, biases, stability, lack of supervision and inaccurate training sets. Banks, insurance companies, pension funds and asset managers need to update their model risk frameworks to address these challenges.

In the financial sector, the recent integration of AI technologies demands a thorough approach to model risk management (MRM). It involves tailoring traditional MRM practices to suit the nuances of advanced AI models within machine learning (ML), reinforced learning, natural language processing (NLP) and generative AI (GenAI). As these technologies, particularly GenAI and NLP, become increasingly integral to crucial decision-making processes within financial institutions, understanding and mitigating the inherent risks associated with them becomes essential.

A fundamental step in managing model risks for AI methodologies is recognizing their limitations. A starting point in understanding these limitations would be to group AI methodologies in holistic classifications. One can easily lose overview of the range of AI methodologies and their applications with the rapidly evolving integration of artificial intelligence in the financial industry.

One can group AI methodologies in many different classifications such as: ML, reinforcement learning, GenAI, NLP, computer vision, speech & audio processing, robotics & control systems, expert systems & rule-based AI, anomaly detection and meta-learning & AutoML. Each AI classification can be further grouped into different types of AI models with inherent traits within that AI classification.

Consequently, one would define MRM standards tailored to each type of AI model. With increasing complexity and more dynamic recalibration of AI models, the model risk standards would need to be enhanced accordingly.

Let us take machine learning as an example, which is often seen as the cornerstone of AI in the financial sector. Machine learning models can be further categorized into supervised learning, unsupervised learning, semi-supervised learning and self-supervised learning.

Supervised learning, encompassing regression and classification techniques, is extensively used for credit scoring, fraud detection, and algorithmic trading. Through techniques such as clustering and dimensionality reduction, unsupervised learning can uncover hidden patterns in market data and customer behavior. This enhances segmentation and personalization efforts, allowing financial institutions to better understand their customers and market dynamics. For unsupervised learning models, MRM would emphasize the dynamic recalibration of these models to ensure accuracy and reliability over time.

Integrating AI MRM into the traditional model lifecycle

Next to recognizing the limitations and characteristics of different types of AI methodologies, one can adopt MRM standards for AI methodologies into the traditional model lifecycle, used in many traditional MRM frameworks. By adapting the model lifecycle stages one can address the unique challenges posed by AI systems.

03062024 - Probability & Partners - Figuur 1

Some examples are:

  • The initiation phase should include a clear definition of requirements of the AI model, addressing both the purpose of the AI model and its intended domain and scope. This stage must also ensure compliance with relevant regulations, such as the newly introduced AI act, and involve potential users to align the AI system's capabilities with actual business needs. It is crucial to align the inherent risks and characteristics of the AI model with the financial institution's risk appetite statement.
  • During the model development, AI models require rigorous testing to address biases and ensure data quality and quantity. This involves employing qualified modelers, thorough back-testing, stress testing, and ensuring sound theoretical foundations are applied. Impact analysis and feature engineering are also pivotal to develop robust AI models capable of handling diverse scenarios. One could also consider implementing a challenger model (based on a traditional non-AI methodology) to benchmark the AI model.
  • AI model validation extends beyond traditional tests. Next to an independent review by qualified teams to assess the model’s assumptions, theoretical background, and regulatory compliance, model validation must specifically focus on the AI's interpretability and its potential biases, ensuring these are thoroughly examined and mitigated.
  • The implementation of AI models should adhere to sound architecture principles and include comprehensive test plans, user testing and regression testing. This stage should ensure the AI model performs reliably across different scenarios, including edge cases. Continuous monitoring and performance assessments are critical to ensure product readiness and operational integrity.
  • In the model use phase, ongoing monitoring becomes vital. AI models, due to their complexity, require dynamic calibration and real-time feedback mechanisms to adapt to new data and conditions. It is important that the human users keep the original scope and purpose of the model in mind, since dynamic calibration can significantly impact the model's behavior. User documentation and clear communication about the model's capabilities and limitations are essential to maintain transparency and trust. In addition, it is recommendable to keep track of AI models, their applications, and the type of AI model in light of the new AI act regulation.

Throughout the model’s lifecycle, continuous risk management is essential. This includes updating the MRM framework based on new data, AI methodologies and insights, and adapting to changes in the operational environment and the regulatory landscape. It is paramount in the deployment of AI technologies to incorporate ethical and legal considerations in the MRM framework. Ensuring that these deployments comply with regulations, adhere to ethical guidelines, and align with organizational values is critical to prevent misuse and mitigate any negative societal impacts, while at the same time allowing financial institutions to adapt to the evolving AI landscape.

Model risks of large language models

Recently, LLMs, such as GPT-3 and GPT-4 from OpenAI and other sources such as Claude3, Llama3 and Gemini, have been increasingly utilized across various domains in the financial industry due to the advantages of enhanced efficiency, accuracy, decision-making and their chatbot capabilities.

Large language models can be classified as a subfield of NLP models, which involves the interaction between computers and human language, and generative models, since they produce new text that is coherent and contextually relevant based on the patterns and knowledge they have learned during training.

LLMs use deep learning techniques, particularly neural networks with many layers to learn from large language datasets, and are designed to understand, interpret, and generate human language in a way that is both meaningful and useful.

Next to the broad considerations of integrating AI MRM into the model lifecycle, there are specific model risks introduced by the usage of the Large Language models.

Given the limitations of LLMs, financial institutions must critically assess potential biases in the models’ training data and understand their impact on outputs. Ensuring accuracy and reliability requires rigorous verification, cross-referencing, and evaluating the quality of training data. Employing diverse datasets and bias-correcting strategies are crucial to mitigate demographic, cultural, or ideological biases in the data.

LLMs are also susceptible to adversarial attacks – intentionally crafted inputs that can manipulate outputs. Thus, assessing the model's robustness and implementing defensive measures is imperative. As an example, one could consider implementing data vaults for protection of the input data of the LLM.

In addition, the ‘black box’ nature of LLMs can make it challenging to discern how decisions are derived. Enhancing model transparency and explainability can assists stakeholders in understanding the decision-making process, thereby reducing risks associated with uninterpretable or biased decisions.

In conclusion, financial institutions leverage the power of AI technologies and large language models in particular, these technologies have many advantages, such as improving the decision-making process and higher efficiency. At the same time, it becomes essential for financial institutions using these technologies to embrace a robust AI MRM framework on top of the traditional MRM framework to ensure these tools are used responsibly and effectively, thus safeguarding institutional integrity and customer trust.