Benchmarks
Realistic benchmarks for financial AI.
We evaluate AI models on the tasks that matter most to financial institutions, using real data, realistic scenarios, and domain-relevant metrics.
KYBench - Adverse Media Search
Updated Apr 2, 2026
Evaluating AI agents for adverse media research in Know Your Business (KYB) reviews. Tests how well AI systems investigate regulatory red flags, fraud history, and sanctions violations across real businesses.
David Ahn, Maximilian Eber, PhD, Sahith Jagarlamudi
Task types
- Web investigation
- Adverse media detection
- Evidence quality
- Risk calibration
Data source
47 real businesses annotated by expert compliance practitioners
Evaluation method
Elo ratings, Adj F1, and RAIS evidence quality scoring
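The page does not say how its Elo ratings are computed. The sketch below shows the standard Elo update as it is commonly applied to pairwise model comparisons. It assumes that each "match" is one judged head-to-head comparison of two agents' outputs on the same business, and that every agent starts at 1000 with a K-factor of 32. These values, the function names, and the agent names are hypothetical illustrations, not KYBench's actual configuration.

```python
# Minimal sketch of standard Elo rating updates for pairwise agent comparisons.
# ASSUMPTIONS (not from the benchmark): initial rating 1000, K-factor 32,
# and each match is one judged head-to-head comparison of two agents.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b).

    score_a is 1.0 if A wins, 0.5 for a tie, 0.0 if A loses.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


def rate_agents(matches: list[tuple[str, str, float]],
                initial: float = 1000.0, k: float = 32.0) -> dict[str, float]:
    """Apply Elo updates sequentially over (agent_a, agent_b, score_a) matches."""
    ratings: dict[str, float] = {}
    for a, b, score_a in matches:
        ra = ratings.setdefault(a, initial)
        rb = ratings.setdefault(b, initial)
        ratings[a], ratings[b] = elo_update(ra, rb, score_a, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical judged comparisons between three agents.
    matches = [
        ("agent_x", "agent_y", 1.0),  # agent_x preferred
        ("agent_y", "agent_z", 0.5),  # tie
        ("agent_z", "agent_x", 0.0),  # agent_x preferred
    ]
    for name, rating in sorted(rate_agents(matches).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Because sequential Elo depends on match order, benchmark leaderboards often average over shuffled orderings or fit a Bradley-Terry model instead; which approach KYBench uses is not stated here.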
Last updated
2026-04-02
FinSpread-Bench
Updated Mar 10, 2026
The first public benchmark for agentic financial spreading. Evaluates how well AI systems extract, calculate, and reason across financial documents, such as bank statements, tax returns, payslips, and financial spreads, in real-world decision scenarios.
Nico Klees, Maximilian Eber, PhD
Task types
- Extraction
- Cross-document reasoning
- Calculation
- Structured output
Data source
Anonymized data from Taktile co-development partners
Evaluation method
Automated metrics and expert human evaluation
Last updated
2026-03-04