Agent Experience (AX) Benchmark

How well does an AI agent complete real fund-ops tasks when FundOS documentation is the knowledge source?

Last run: 2026-06-09  ·  Model: gemini-2.5-flash  ·  40 tasks  ·  Harness on GitHub

Tasks correct: 40/40
Hallucinations: 0
Score: 100.0% (80/80)
Approval gate coverage: 0.0%
Avg tokens loaded: 9902

Category Breakdown

Category | Tasks | Score | Correct | Hallucinations
MCP / Auth | 6 | 12/12 | 6/6 | 0
Deal CRM | 6 | 12/12 | 6/6 | 0
LP / Investors | 6 | 12/12 | 6/6 | 0
VDR / Documents | 6 | 12/12 | 6/6 | 0
Risk | 4 | 8/8 | 4/4 | 0
CFO | 4 | 8/8 | 4/4 | 0
Agents | 4 | 8/8 | 4/4 | 0
Webhooks | 4 | 8/8 | 4/4 | 0
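
As a quick consistency check, the category rows can be summed back to the headline figures. The sketch below is illustrative only: the tuple layout and variable names are assumptions made for this example, not part of the FundOS harness, and the numbers are copied from the breakdown above.

```python
# Hypothetical row layout: (category, tasks, points_earned, points_possible, correct).
# Values are copied from the Category Breakdown; names are illustrative only.
rows = [
    ("MCP / Auth", 6, 12, 12, 6),
    ("Deal CRM", 6, 12, 12, 6),
    ("LP / Investors", 6, 12, 12, 6),
    ("VDR / Documents", 6, 12, 12, 6),
    ("Risk", 4, 8, 8, 4),
    ("CFO", 4, 8, 8, 4),
    ("Agents", 4, 8, 8, 4),
    ("Webhooks", 4, 8, 8, 4),
]

total_tasks = sum(r[1] for r in rows)
total_points = sum(r[2] for r in rows)
max_points = sum(r[3] for r in rows)
total_correct = sum(r[4] for r in rows)

# Headline score as a percentage of the points available.
score_pct = 100.0 * total_points / max_points

print(f"{total_correct}/{total_tasks} correct, "
      f"{total_points}/{max_points} points ({score_pct:.1f}%)")
```

The totals should match the summary at the top of the page: 40 tasks, 40 correct, and 80 of 80 points.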

Per-Task Results (FundOS AX Condition)

# | Category | Task | Score | Hallucination
1 | MCP / Auth | How do I connect the FundOS MCP server to Claude Code? Show the exact command. | ✓ 2/2 | none
2 | MCP / Auth | What transport protocols does the FundOS MCP server support? List all of them. | ✓ 2/2 | none
3 | MCP / Auth | What is the exact URL of the FundOS Streamable HTTP MCP endpoint? | ✓ 2/2 | none
4 | MCP / Auth | How do I authenticate with the FundOS MCP server using an API key? Show the header. | ✓ 2/2 | none
5 | MCP / Auth | How does an MCP client self-register with FundOS without a pre-configured client_id? | ✓ 2/2 | none
6 | MCP / Auth | What OAuth scope should an agent request for read-only FundOS MCP access? | ✓ 2/2 | none
7 | Deal CRM | Which MCP tool should I call to read the deal pipeline? | ✓ 2/2 | none
8 | Deal CRM | Which MCP tool adds a new deal to the pipeline? Is human approval required? | ✓ 2/2 | none
9 | Deal CRM | What does passing ephemeral=true to a FundOS API POST endpoint do? | ✓ 2/2 | none
10 | Deal CRM | Which MCP tool renders a kanban pipeline view or computes per-stage deal counts? | ✓ 2/2 | none
11 | Deal CRM | How do I compute IRR, MOIC, WAL, and cashflows for an asset using FundOS? | ✓ 2/2 | none
12 | Deal CRM | How do I screen an inbound pitch deck using FundOS AI? | ✓ 2/2 | none
13 | LP / Investors | Which MCP tool lists all LP investors for rollups? | ✓ 2/2 | none
14 | LP / Investors | How do I load full LP detail including commitment and capital-call ledger? | ✓ 2/2 | none
15 | LP / Investors | Which tool issues a capital call? Is human approval required? | ✓ 2/2 | none
16 | LP / Investors | Can an AI agent automatically send capital-call notices to LP inboxes? | ✓ 2/2 | none
17 | LP / Investors | Which MCP tool reports raise progress for syndications? | ✓ 2/2 | none
18 | LP / Investors | Which tool records investor allocations in a syndication? What approval is needed? | ✓ 2/2 | none
19 | VDR / Documents | How do I answer free-form questions over a deal room's documents with citations? | ✓ 2/2 | none
20 | VDR / Documents | How do I get the raw download URL for a specific document? | ✓ 2/2 | none
21 | VDR / Documents | Which tool onboards a new collaborator or LP to a deal room? What approval gate applies? | ✓ 2/2 | none
22 | VDR / Documents | How do I check DocuSign signature status across closing envelopes? | ✓ 2/2 | none
23 | VDR / Documents | Which tool shows per-document engagement: who actually read a document and when? | ✓ 2/2 | none
24 | VDR / Documents | How do I list the diligence Q&A questions in a deal room? | ✓ 2/2 | none
25 | Risk | Which MCP tool do I use to build a portfolio covenant monitoring dashboard? | ✓ 2/2 | none
26 | Risk | How do I test whether a specific covenant is in breach when new financials arrive? | ✓ 2/2 | none
27 | Risk | How do I retrieve the list of open risk alerts in the portfolio? | ✓ 2/2 | none
28 | Risk | How do I trigger the FundOS risk agent to scan the portfolio? | ✓ 2/2 | none
29 | CFO | Which MCP tool computes P&L and NAV for a fund over a date range? | ✓ 2/2 | none
30 | CFO | How do I compute LP/GP splits at exit using a European waterfall model? | ✓ 2/2 | none
31 | CFO | Which MCP tool lists all fund accounts before selecting one for P&L computation? | ✓ 2/2 | none
32 | CFO | Does fundos_compute_pnl write journal entries to the database when called? | ✓ 2/2 | none
33 | Agents | What is the first MCP tool I should call at the start of any multi-step FundOS workflow? | ✓ 2/2 | none
34 | Agents | How does a human GP approve a proposed agent action in FundOS? | ✓ 2/2 | none
35 | Agents | How do I run the AI diligence agent on a specific VDR deal room? | ✓ 2/2 | none
36 | Agents | How do I trigger the LP fundraising autopilot agent? | ✓ 2/2 | none
37 | Webhooks | How do I register a webhook endpoint to receive real-time FundOS events? | ✓ 2/2 | none
38 | Webhooks | How does FundOS sign webhook payloads, and how do I verify the signature in Python? | ✓ 2/2 | none
39 | Webhooks | Which webhook event fires when an agent has proposed an action needing human approval? | ✓ 2/2 | none
40 | Webhooks | What is the complete FundOS webhook retry schedule when a delivery fails? | ✓ 2/2 | none

Reproducibility note: This is FundOS's own benchmark of its own documentation, and we do not claim third-party or independent validation. Every task, result, and methodology file is open source at github.com/8vdx1/fundos-mcp. To run it yourself: pip install anthropic pyyaml && python benchmarks/ax/runner.py

Scoring: each task is worth up to 2 points. 2 points = correct answer with the human-approval gate met; 1 point = correct answer that missed the gate warning; 0 points = wrong or hallucinated answer.
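
The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the harness's actual code: the function names `score_task` and `percent_score` and their boolean parameters are hypothetical, and the sketch assumes a task with no approval gate counts as having met it.

```python
def score_task(correct: bool, gate_met: bool, hallucinated: bool = False) -> int:
    """Return 0, 1, or 2 points for one task under the stated rubric.

    Hypothetical helper: the real runner may represent results differently.
    """
    if hallucinated or not correct:
        return 0  # wrong or hallucinated answer
    # Correct answer: full marks only if the human-approval gate was met.
    return 2 if gate_met else 1


def percent_score(points: list[int]) -> float:
    """Convert per-task points into a percentage of the maximum (2 per task)."""
    max_points = 2 * len(points)
    return 100.0 * sum(points) / max_points if max_points else 0.0


# A run in which all 40 tasks are correct and meet the gate.
perfect_run = [score_task(correct=True, gate_met=True) for _ in range(40)]
print(sum(perfect_run), percent_score(perfect_run))
```

Under this rubric a run can be 40/40 correct and still score below 100% if any correct answer omits the approval-gate warning, which is why the page reports tasks correct and score separately.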