You have seen the demo. The AI reads a filing, extracts the rate tables, summarizes the methodology. It looks right. It sounds right. But when you check the loss development factors against the source PDF, two of the eight values are wrong, and there is no way to tell which ones without re-reading the entire document yourself.
That is the core problem with insurance AI verification today: the output is plausible, but the work to confirm it is nearly as much as the work to do it from scratch. For actuaries, compliance teams, and anyone whose signature goes on a regulatory filing, “plausible” is not a professional standard. Verifiable is.
The scale of demand makes this gap more urgent. PwC’s 2025 Global Actuarial Modernization Survey found that 87% of insurers are actively modernizing their actuarial functions, with 94% citing efficiency as the primary driver. Yet automation maturity across actuarial tasks averages just 2.5 out of 5, and actuaries still spend more than half their time on data preparation. The ambition is there. The tools that meet the verification bar are what’s lagging.
Insurance AI will not be adopted at scale because it can generate answers. It will be adopted when the people responsible for those answers can verify them faster than they could have produced them manually. That standard, verification-first, is not just good engineering. Regulators are already codifying it.
Verification-first is not a feature. It is the minimum viable standard for AI that works in insurance.
The Accountability Problem
Insurance work is signed work. An actuary who certifies a rate filing is personally accountable for the accuracy of every number in it, under the Actuarial Standards of Practice (ASOP No. 56), under state regulatory requirements, and under their professional code of conduct.
This is not a technicality. ASOP No. 56, the Actuarial Standards Board’s standard on modeling, requires actuaries to understand the models they use, validate model output, evaluate and mitigate model risk, and document their methodology and assumptions. When an actuary uses a model developed by others, including an AI system, they are still responsible for the output. The standard is explicit: “an actuary using a model developed by others in which the actuary is responsible for the model output is subject to this standard.”
The International Actuarial Association (IAA) reinforced this in January 2026 with a consultation draft on professional considerations for actuaries interacting with AI systems. The paper addresses accountability directly: actuaries must consider their professional responsibility under applicable laws, standards, and codes of conduct when using AI, including understanding the AI system’s limitations and the conditions under which its outputs may be unreliable. It emphasizes that professional judgment cannot be delegated to a model.
This creates a fundamental constraint that consumer AI products do not face. ChatGPT does not sign rate filings. An actuary does. And that signature carries weight under law, professional standards, and regulatory requirements. Any AI tool that enters this workflow must be built to support verification, not just production. (For a closer look at where AI actually delivers value in actuarial work today, see AI for Actuaries: What Actually Works in 2026.)
The Regulatory Ground Has Already Shifted
The gap between “AI is coming to insurance” and “regulators expect you to govern your AI” has closed faster than most carriers anticipated. Three regulatory actions in the last three years have drawn the lines.
- Colorado SB 21-169 Signed (State Law). First state law to prohibit unfair discrimination from insurers’ use of algorithms and predictive models. Requires governance frameworks and officer attestation.
- Colorado Regulations Take Effect (State Law). Implementing regulations go live. Division of Insurance begins expanding governance requirements beyond life insurance to auto and health.
- NAIC Model Bulletin Adopted (NAIC Model). Sets baseline AI governance expectations for insurers: documentation of inputs, outputs, and decision processes; testing and validation; third-party vendor oversight.
- NYDFS Circular Letter No. 7 (NY Regulation). New York requires AI governance frameworks for underwriting and pricing, proxy assessments for protected classes, and board-level accountability for AI outcomes.
- 25 Jurisdictions Adopt NAIC Bulletin (Adoption Wave). Connecticut adds annual AI certification. Virginia strengthens verification language. Iowa becomes first state to define “bias” and “outcomes testing” in this context.
NAIC Model Bulletin (December 2023)
The NAIC Model Bulletin on the Use of Artificial Intelligence Systems by Insurers, adopted December 4, 2023, sets the baseline. It requires insurers to establish a written AI governance program that ensures decisions made or supported by AI systems comply with all applicable laws, including unfair trade practices, unfair discrimination, and rate adequacy standards. The bulletin applies across the entire insurance lifecycle: product development, underwriting, pricing, claims, and distribution.
The bulletin’s governance expectations are specific. Insurers must maintain documentation of AI system inputs, outputs, and decision processes. They must conduct testing and validation to ensure AI systems produce accurate, non-discriminatory results. And they must extend these governance requirements to third-party AI vendors.
As of May 2026, 25 jurisdictions have adopted the model bulletin, including Connecticut (which added an annual AI certification requirement), Virginia (which strengthened verification language), and Iowa (which became the first state to formally define “bias” and “outcomes testing” in this context). The direction is clear: more states, more specific expectations.
New York Circular Letter No. 7 (July 2024)
New York went further. NYDFS Circular Letter No. 7, issued July 11, 2024, addresses the use of AI systems and external consumer data in underwriting and pricing. It requires insurers to establish governance and risk management frameworks, conduct proxy assessments to ensure AI inputs do not serve as proxies for protected classes, and perform quantitative assessments of disproportionate adverse effects.
The circular letter places accountability at the board and senior management level. It does not expect executives to perform day-to-day AI implementation work, but it makes them responsible for overall outcomes. It also extends to third-party vendors: insurers must conduct appropriate oversight of any AI system provided by a vendor, regardless of whether the insurer understands the system’s internal workings.
Colorado SB 21-169
Colorado SB 21-169, signed in 2021 and implemented through regulations effective in late 2023, was the first state law to specifically prohibit unfair discrimination resulting from insurers’ use of algorithms and predictive models. The law requires insurers to establish risk management frameworks, provide attestation by officers that those frameworks are in place, and conduct ongoing monitoring of algorithmic outcomes.
Colorado’s approach is notably prescriptive. The Division of Insurance has since expanded its governance regulations beyond life insurance to private passenger auto and health benefit plans, and has developed draft regulations for quantitative testing of algorithms, requiring insurers to demonstrate that their models do not unfairly discriminate based on race or ethnicity.
What This Means in Practice
Taken together, these frameworks create a regulatory environment where “we used AI” is not an explanation; it is the beginning of one. Regulators expect to see documentation of what the AI did, how it was validated, what governance controls were applied, and who is accountable for the output. An AI tool that produces answers without audit trails is not just operationally risky. It is a compliance liability.
What Insurance AI Verification Actually Requires
Verification-first is a set of specific capabilities that an AI system must have to operate in regulated insurance workflows. Here is what it means in practice.
Source Citation on Every Output
Every number, every extracted value, every summarized finding must link back to a specific source: a page in a filing, a cell in a rate table, a paragraph in a DOI bulletin. Not “this information comes from Filing XYZ” but “this value of 1.0847 appears on page 47 of SERFF Tracking Number ABCD-123456789, Table 3, Row 12.”
This is the difference between a summary and a work product. A summary tells you what the AI concluded. A verification-ready output tells you where to look to confirm it.
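A minimal sketch of what a per-value citation could look like as a data structure. The class and field names (`SourceLocation`, `CitedValue`) are hypothetical, chosen for illustration, and the tracking number is the placeholder from the example above, not a real filing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLocation:
    """Where a value lives in the source filing (hypothetical schema)."""
    tracking_number: str  # SERFF tracking number of the filing
    page: int
    table: str
    row: int

    def describe(self) -> str:
        return f"SERFF {self.tracking_number}, p.{self.page}, {self.table}, Row {self.row}"


@dataclass(frozen=True)
class CitedValue:
    """An extracted number that cannot be constructed without its citation."""
    label: str
    value: float
    source: SourceLocation


# Placeholder citation taken from the example in the text, not a real filing.
factor = CitedValue(
    label="extracted value",
    value=1.0847,
    source=SourceLocation("ABCD-123456789", page=47, table="Table 3", row=12),
)
print(f"{factor.value} <- {factor.source.describe()}")
```

Making the citation a required field, rather than optional metadata, means an uncited value fails at construction time instead of surfacing during review.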
Transparent Intermediate Steps
For any calculation, whether a rate indication, a competitive comparison, or a loss development selection, the intermediate steps must be visible. What inputs were used. What methodology was applied. What assumptions were made. If an AI produces a final number without showing its work, it has the same evidentiary value as a guess.
This directly maps to ASOP No. 56’s requirements for model governance: actuaries must understand the model, validate its outputs, and document the assumptions and methodology. An AI system that hides its reasoning makes compliance with these standards functionally impossible.
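One way to make intermediate steps visible is to record each step's inputs and result as the calculation runs. The sketch below assumes a simplified two-step calculation; the `Trace` class, the function name, and all dollar figures are illustrative, not taken from any filing or product.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Records each step of a calculation so a reviewer can replay it."""
    steps: list = field(default_factory=list)

    def record(self, description: str, inputs: dict, result: float) -> float:
        self.steps.append({"description": description, "inputs": inputs, "result": result})
        return result


def developed_loss_ratio(reported_losses, ldf, earned_premium, trace):
    """Develop losses to ultimate, then divide by premium, logging both steps."""
    ultimate = trace.record(
        "Develop reported losses to ultimate (reported x LDF)",
        {"reported_losses": reported_losses, "ldf": ldf},
        reported_losses * ldf,
    )
    return trace.record(
        "Loss ratio (ultimate losses / earned premium)",
        {"ultimate_losses": ultimate, "earned_premium": earned_premium},
        ultimate / earned_premium,
    )


# Illustrative inputs only.
trace = Trace()
ratio = developed_loss_ratio(800_000.0, 1.25, 2_000_000.0, trace)
for number, step in enumerate(trace.steps, start=1):
    print(number, step["description"], step["inputs"], "->", step["result"])
```

A reviewer can check each recorded step against the source independently, rather than reverse-engineering a single final number.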
Reproducibility
If a different analyst runs the same analysis on the same source data, the AI should produce the same result. Stochastic variation in large language models, where the same prompt produces different outputs on different runs, is a fundamental problem for regulated work. Insurance AI verification requires deterministic outputs for deterministic tasks, or at minimum, confidence intervals and methodology documentation for tasks with inherent variability.
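A reproducibility check can be sketched as running the same task on the same inputs several times and comparing a hash of each output. The `rate_premium` function below is a stand-in with made-up factors; the hashing approach is one reasonable choice, not a prescribed method.

```python
import hashlib
import json


def fingerprint(payload: dict) -> str:
    """Hash a canonical JSON encoding so equal content yields an equal digest."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def rate_premium(policy: dict) -> dict:
    """Stand-in for a deterministic rating step (illustrative factors only)."""
    premium = round(policy["base_rate"] * policy["territory_factor"], 2)
    return {"policy_id": policy["policy_id"], "premium": premium}


def is_reproducible(task, inputs: dict, runs: int = 3) -> bool:
    """Run the task repeatedly and require a single distinct output digest."""
    digests = {fingerprint(task(inputs)) for _ in range(runs)}
    return len(digests) == 1


policy = {"policy_id": "P-001", "base_rate": 500.0, "territory_factor": 1.1}
print(is_reproducible(rate_premium, policy))
```

Storing the digest alongside the output also gives a second analyst a cheap way to confirm they reproduced the same result from the same source data.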
Human Review Checkpoints
Verification-first does not mean the AI verifies itself. It means the system is designed so that human experts can verify efficiently. This is the critical distinction: AI should reduce the time required for human review, not eliminate it. The goal is to shift the actuary’s role from performing the analysis to confirming it, but confirming it requires that the analysis be presented in a reviewable format with clear source trails.
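As a rough sketch, a review checkpoint can be modeled as a gate that stays closed until a named person signs off. The `Checkpoint` class, the checkpoint descriptions, and the reviewer name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Checkpoint:
    """A point in the workflow where a named human must confirm the AI's work."""
    description: str
    reviewer: Optional[str] = None

    @property
    def approved(self) -> bool:
        return self.reviewer is not None

    def sign_off(self, reviewer: str) -> None:
        if not reviewer.strip():
            raise ValueError("sign-off requires a named reviewer")
        self.reviewer = reviewer


def ready_to_certify(checkpoints: list) -> bool:
    """True only when there is at least one checkpoint and all are signed off."""
    return bool(checkpoints) and all(c.approved for c in checkpoints)


checkpoints = [
    Checkpoint("Extracted rate tables match source pages"),
    Checkpoint("Selected loss development factors are reasonable"),
]
print(ready_to_certify(checkpoints))  # nothing signed off yet
checkpoints[0].sign_off("A. Actuary")
checkpoints[1].sign_off("A. Actuary")
print(ready_to_certify(checkpoints))
```

The system never approves its own output here: the only path to an approved state is a human sign-off.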
Each capability, with the standard it maps to:
- Source citation on every output. Every number links to a filing tracking number, page, table, row, and column. Not “from Filing XYZ” but the exact location in the source PDF. (NAIC Model Bulletin: documentation of AI system inputs, outputs, and decision processes.)
- Transparent intermediate steps. For any calculation, the inputs, methodology, assumptions, and intermediate results are visible. A final number without shown work has the same evidentiary value as a guess. (ASOP No. 56: actuaries must understand, validate, and document model methodology.)
- Reproducibility. Same analyst, same data, same result. Stochastic variation in LLM outputs is a fundamental problem for regulated work. Deterministic outputs for deterministic tasks. (NYDFS CL No. 7: governance frameworks must ensure fair, transparent, compliant decisions.)
- Human review checkpoints. The system is designed for efficient expert verification, not self-verification. AI shifts the actuary from performing the analysis to confirming it. (IAA, Jan 2026: professional judgment cannot be delegated to a model.)
Why Generic AI Fails the Insurance AI Verification Test
General-purpose AI tools, the models built for consumer search, content generation, and broad enterprise applications, were not designed for this standard. Their architecture optimizes for plausibility, not verifiability. They are trained to produce the most probable next word, not the most accurate extracted value.
A large language model that generates a well-phrased summary of a rate filing may get eight out of ten facts right. In consumer applications, that is impressive. In insurance, those two wrong facts could mean an incorrectly calculated premium, a compliance violation, or a DOI objection that delays a market entry by months. (The downstream cost of those delays is staggering; we broke down the economics in The $20K-Per-State Problem.) The failure mode is particularly dangerous because the wrong output looks exactly like the right output. There is no formatting error, no obvious flag. Just a confidently stated number that happens to be incorrect.
We tested this directly. In our research on self-verifying domain agents, we ran a coding agent through insurance rating tasks ranging from 17 to 128 steps. Using a general-purpose approach, mean per-policy error was 103.7%. In other words, the calculated premium missed the correct amount, on average, by more than the correct amount itself. A domain-specific compiled language with built-in verification reduced that error to 0%. The gap between “close enough” and “correct” is not a tuning problem. It requires a fundamentally different architecture.
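To make the metric concrete, here is a small sketch of how a mean per-policy error could be computed. It assumes the error is the absolute difference between calculated and correct premium, divided by the correct premium; the study's exact definition is not restated here, and the premiums below are made-up values, not study data.

```python
def mean_per_policy_error(calculated, correct):
    """Mean of |calculated - correct| / correct across policies, as a percentage."""
    errors = [abs(c - t) / t for c, t in zip(calculated, correct)]
    return 100.0 * sum(errors) / len(errors)


# Made-up premiums for illustration only.
correct = [1000.0, 1000.0, 1000.0, 1000.0]
calculated = [1000.0, 500.0, 2500.0, 3000.0]
print(mean_per_policy_error(calculated, correct))
```

Under this definition, an underestimate of a positive premium can contribute at most 100% for a single policy, so a mean above 100% implies that some premiums overshot the correct amount by a wide margin.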
An actuary who signs a rate filing is personally accountable for every number in it. No AI vendor’s terms of service change that.
The U.S. Treasury’s December 2024 report on AI in financial services, based on 103 comment letters from financial firms and stakeholders, specifically highlighted bias, explainability, and hallucination as key risks. Multiple respondents noted that emerging AI technologies are “introducing new risk, leading firms to be cautious about deploying them broadly in customer-facing applications.”
The EY European Financial Services AI Pulse Survey (2025), which polled 410 C-level executives across European banking, insurance, and wealth management, found that 30% of financial services companies have no or limited controls in place to ensure AI is free from bias. Meanwhile, 57% of leaders said their organization’s approach to technology-related risk was insufficient to address challenges from emerging AI.
These numbers describe an industry that is moving fast on adoption and slowly on verification. That gap is where regulatory risk lives.
What to Demand from Insurance AI Tools
If you are evaluating AI tools for actuarial, pricing, or filing work, the verification question should come before the capability question. A tool that can extract 500 rate tables is worthless if you cannot confirm that extraction number 347 is correct without re-reading the source PDF yourself.
Five criteria that matter:
- Source citation on every output. Not a summary bibliography, but a per-value, per-finding citation that links directly to the source document and location. If the tool cannot do this, it is not ready for professional use.
- Domain-specific architecture. A system that understands insurance document structures (SERFF filings, rate manuals, amendment chains, exhibit formats) will produce more accurate and more verifiable output than a general-purpose model answering insurance questions. The difference is not marginal.
- Verification packet, not just results. The output should include the evidence, not just the conclusion. For a rate table extraction: the extracted values, the source PDF locations, and a mechanism to compare them. For a competitive analysis: the specific filed factors, filing tracking numbers, and effective dates behind every comparison.
- Auditability under regulatory scrutiny. Given the NAIC model bulletin, NYDFS Circular Letter No. 7, and Colorado’s governance regulations, the output from any AI tool used in insurance workflows may need to be produced during an examination or investigation. Build this into your evaluation: if a regulator asks “how did you arrive at this number?” can you answer from the AI tool’s output alone?
- Honesty about limitations. Any vendor that tells you AI replaces actuarial judgment either does not understand the work or is not being straight with you. The IAA’s January 2026 guidance is clear: professional judgment cannot be delegated to a model. Look for tools that are explicit about what they automate (research, extraction, calculation) and what they leave to you (selection, judgment, certification).
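The verification-packet criterion can be sketched as a comparison between what the tool extracted and what a reviewer reads at the cited location. Everything below is an assumption for illustration: the `PacketEntry` schema, the entry numbers, the values, and the filing names are placeholders.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketEntry:
    """One extracted value plus the evidence needed to check it (hypothetical schema)."""
    entry_id: int
    extracted: float
    source: str  # human-readable location in the filing


def mismatches(entries, reviewed, tolerance=1e-9):
    """Return entries whose extracted value differs from the reviewer's reading.

    `reviewed` maps entry_id -> value read from the source by a human.
    Entries the reviewer has not checked are skipped, not assumed correct.
    """
    flagged = []
    for entry in entries:
        if entry.entry_id in reviewed and not math.isclose(
            entry.extracted, reviewed[entry.entry_id], abs_tol=tolerance
        ):
            flagged.append(entry)
    return flagged


# Placeholder entries; filing names and values are invented.
packet = [
    PacketEntry(346, 0.9500, "Filing XYZ, p.47, Table 3, Row 11"),
    PacketEntry(347, 1.0847, "Filing XYZ, p.47, Table 3, Row 12"),
]
spot_check = {346: 0.9500, 347: 1.0874}  # reviewer reads 1.0874 at the cited cell
flagged = mismatches(packet, spot_check)
for entry in flagged:
    print(f"Entry {entry.entry_id} disagrees with source at {entry.source}")
```

Because each entry carries its own source location, confirming extraction number 347 means opening one page, not re-reading the whole PDF.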
The Standard Is Already Set
The trajectory is clear. Twenty-five jurisdictions have adopted the NAIC’s AI governance expectations. New York and Colorado have gone further with their own frameworks. The professional standards, from ASOP No. 56 to the IAA’s January 2026 guidance, are equally direct: actuaries own their output regardless of what produced it. And the industry surveys confirm that most organizations know their current AI controls are insufficient.
That gap will close, but it will close from the verification side, not the capability side.
The regulatory direction points one way: more states adopting AI governance requirements, more specific documentation expectations, more examination questions about how AI-generated output was validated.
Verification-first is not a feature. It is the minimum viable standard for AI that works in insurance.
Effective AI builds specialized AI agents for actuarial and insurance product workflows, including filing research, rate table extraction, competitive analysis, and exhibit generation. Every output ships with a verification packet: source citations, intermediate calculations, and traceable findings. If you are evaluating AI tools and want to see what that looks like in practice, see a verification packet.