Rethinking AI Software Due Diligence in the Age of LLMs and Generative AI

Written by Egon Wuchner | Oct 29, 2025 7:48:19 AM

In recent years, the capabilities, availability and acceptance of artificial intelligence (AI) have transformed products and markets beyond recognition. The integration of AI features in software or software controlled products has become essential for growth, profitability and company evaluations. However, this pressure is causing companies to adopt AI without building up the necessary expertise or preparing the organisation for the specifics of AI research and development (R&D) and maintenance.

The result is AI software products that are not fully developed or understood by the organisation, creating new risks (software acting like a black box) for the company and thus for potential investors. Especially within M&A and private equity transactions, a software due diligence of LLM- and GenAI-driven products— must evolve rapidly to address these still rather new and unique technology risks.

Traditional software due diligence focuses on code quality and architecture, automated R&D processes, people/team capabilities, and the scalability of the team and software, as well as its documentation. In part 5 of the blog series on "Software Due Diligence: The Key To Successful M&A Deals", we explore how due diligence should be conducted for AI-based software. We focus on technological areas (e.g. the AI stack and practices) that require additional attention, expertise, and tools.

Our insights aim to make investors aware of the additional risks associated with the development and use of AI features in software-based businesses. While AI features for sure drive the company valuation, one should be aware that it will also increase the levels of potential risk.

LLM-Based vs. Proprietary ML Models: Three Flavors of AI Integration – and Why They Matter for Due Diligence

Before diving into AI software due diligence, it’s worth taking a step back to understand the different patterns of AI integration found in today’s software products. Not every “AI-powered” solution works the same way – and those differences have major implications for valuation, risk assessment, and scalability.

Broadly speaking, there are three main categories:

A. Proprietary ML Models – Purpose-Built Intelligence

This category includes products and services that rely on custom-trained, in-house machine learning models. Their key characteristics are:

Purpose-built for specific use cases (e.g. fraud detection, churn prediction, quality inspection)
Developed internally with domain-specific data pipelines
Deterministic behavior – the same input yields the same output
IP and data control remain fully with the company

Takeaway: Proprietary ML systems are technically complex and costly to maintain, but they provide long-term defensibility through owned IP, data assets, and differentiation.

B. LLM / GenAI-Driven Products – AI as a Service

In this concept, companies integrate external Large Language Models (LLMs) via APIs from providers such as OpenAI, Anthropic, or Google Gemini. Common traits are:

Integration through prompt engineering or Retrieval Augmented Generation (RAG) architectures
Often used for text summarization, chatbots, or document analysis
Enable rapid product iteration, but create dependencies on external providers and pricing models

Takeaway: Ideal for fast innovation and prototyping, but with clear dependency, compliance, and data privacy risks due to limited control over model behavior and data handling.

C. Fine-Tuned Foundation Models – The Middle Ground

These models are not built from scratch, nor are they simply consumed via a static API.
Instead, they are fine-tuned versions of existing foundation models (LLMs/GenAI), trained further with proprietary data to fit specific use cases. Examples include:

BERT models fine-tuned for legal clause classification
T5 models trained to summarize clinical notes
LLaMA models adapted for brand-aligned customer support responses
GPT-style models preference-tuned to comply with internal content policies

Takeaway: This approach blends flexibility and customization with higher technical investment and ongoing model management needs.

Why This Distinction Matters in Due Diligence

For investors and technology assessors, understanding which AI integration pattern a target company uses is critical.

LLM-based products (B & C) carry greater dependency, unpredictability, and regulatory exposure, since they rely on external providers and non-deterministic models.
Proprietary ML products (A), on the other hand, demand more engineering effort and maintenance, but offer stronger IP defensibility, data ownership, and control.

These differences call for distinct due diligence lenses – both when assessing technical robustness and when estimating long-term business value.

Understanding the Role of LLMs/GenAI in the Product

The first step in any diligence process involving LLM/GenAI-Driven Products is to determine how central the AI components are to the product’s core value proposition. There’s a major difference between using a generative model for auto-generating blog summaries and augmenting workflows with GPT-driven assistants or applications completely built on retrieval-augmented generation (RAG)..

Key questions to assess are the following:

Is AI a fundamental part of how the product functions, or is it an enhancement?
Does the product rely on licensed APIs?
How big is the dependency on a single vendor (vendor lock-in)?
What is the model pricing since commercial growth and scalability (e.g. 10x user growth) are highly dependent on it?
What is the potential latency impact on user experience and the product?
How is user- or customer-generated data protected from being used for future training of the used LLM?
Does the AI infrastructure meet the growth requirements in case of self-hosted LLM/GenAI model(s)?

Answering these questions helps establish whether the AI capability is a strategic differentiator — or an implementation detail that can be replaced or removed with minimal impact.

Observability and Fall-Back Management

Large Language Models (LLMs) and generative AI systems are inherently non-deterministic — identical inputs may yield divergent outputs. This probabilistic behavior introduces operational and compliance risks that traditional QA frameworks cannot fully capture.

Due diligence must therefore assess observability, control, and resilience in real-world operation:

Output monitoring: Are model responses logged, categorized, and periodically reviewed for accuracy and policy compliance?
Hallucination detection: Does the system identify or flag misleading, false, or low-confidence outputs?
Prompt management: Are prompts, templates, and test cases version-controlled, peer-reviewed, and benchmarked?
Model versioning: How are LLM updates validated, and is there a structured regression test to detect quality drift?
Fall-back mechanisms: Are escalation paths in place for uncertain or unsafe outputs, including human-review triggers or alternative response models?

For products operating in regulated sectors such as finance, legal, healthcare, or defense, the evidentiary standard for monitoring, documentation, and fall-back management must be significantly higher.

Team Capabilities

LLM-driven software requires specialized expertise that extends beyond traditional ML or DevOps roles. A sustainable AI organization demonstrates depth in:

Prompt engineering and Retrieval-Augmented Generation (RAG)
Model evaluation and benchmark design
Data governance and observability tooling

Key due diligence questions include:

Does the team apply best practices in prompt design, version control, and behavior monitoring?
Is critical model knowledge concentrated in a few individuals?
How much of the AI development is performed in-house versus outsourced?
What is the retention and documentation strategy for key technical roles?

In early-stage startups, key-person dependency is often the largest operational risk if know-how is undocumented or non-transferable.

Security and Abuse Prevention

AI systems expand the attack surface well beyond traditional software security. Due diligence must evaluate defensive depth against new vectors, including:

Prompt-injection attacks and adversarial inputs designed to override safeguards
Abuse detection systems for inappropriate or policy-violating content generation
Third-party safeguards — e.g., content filters, usage limits, and safety layers imposed by API providers

A robust security posture combines preventive controls, active monitoring, and clear incident-response playbooks for AI-specific threats.

Ethics, Bias, and Social Responsibility

Modern AI diligence extends beyond performance to ethical accountability.
Regulators and investors increasingly expect proactive governance around bias and transparency. Due diligence should confirm:

Whether bias or fairness audits have been conducted on core models.
If toxicity and alignment evaluations are performed prior to deployment.
The presence of human-in-the-loop oversight for high-impact use cases.
Transparent user communication when AI systems influence or generate outcomes.

Failure to evidence these controls may expose the acquirer to reputational and regulatory risk — even when technical performance is strong.

Conclusion: Future-Proofing AI Due Diligence

Generative AI blurs the line between deterministic code and probabilistic reasoning.
This evolution requires a new form of software due diligence — one that combines technical depth, regulatory literacy, and ethical awareness.

For investors, the objective is not only to identify technology risks, but to recognize capabilities that create defensible value: robust data provenance, mature MLOps, explainable model behavior, and credible governance.

When executed rigorously, AI-focused due diligence becomes more than risk management — it becomes the foundation for confident investment, responsible innovation, and long-term competitive advantage in the AI-driven economy.

Act Now: Integrate AI Expertise Into Your Transaction Evaluation

Whether you're acquiring a tech company or investing in a data-driven business model – if AI plays a role, you need new criteria, new methods, and partners with hands-on experience.

👉 Cape of Good Code combines deep technology analysis with AI-specific expertise – delivering clear insights on the viability, scalability, and sustainability of your target technology in no time.

📞 Schedule a non-binding initial consultation or request our AI Software Due Diligence.

Links

[0] Photo by Solen Feyissa

View full post