About
What is this?
OCR Arena is a free playground for testing and evaluating leading foundation VLMs and open-source OCR models on document parsing tasks. Upload a document, measure accuracy, and vote for the best models on a public leaderboard.

OCR Arena was built by the team at Extend. We initially launched with 10+ models, powered by our friends over at Baseten, and new models will be added as they're released. Have feedback or want to see additional OCR models? Let us know via email or X above.
Why did we build OCR Arena?
Document processing has become foundational to building AI applications, and OCR is evolving faster than ever. New models are released frequently, but evaluating them remains difficult. Benchmarks only tell part of the story, and most teams care about how models perform on their own documents and edge cases. Our goal is to reduce the friction of testing new models and make OCR evaluation open, unbiased, and grounded in real-world performance.
How does ELO rating work?
OCR Arena uses the ELO rating system to rank models based on head-to-head battles. When users vote for the better output, the winning model gains points while the losing model loses points.
The rating change is calculated using the standard ELO formula:
New Rating = R + K × (S - E)
R = Current rating
K = 20 (rating volatility factor)
S = Actual score (1 for win, 0.5 for draw, 0 for loss)
E = Expected score = 1 / (1 + 10^((R_opponent - R) / 400))
A higher rating indicates that a model has consistently outperformed others in blind comparisons. All models start at 1500 ELO, and ratings update after each vote.
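To make the update concrete, here is a minimal Python sketch of a single rating update using the formula above. The function names and draw handling are our own illustration; K = 20, the 400 scale factor, and the 1500 starting rating come from this page.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """E = 1 / (1 + 10^((R_opponent - R) / 400))"""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 20) -> tuple[float, float]:
    """Return the new ratings for models A and B after one vote.

    score_a is the actual score for A: 1 for a win, 0.5 for a draw, 0 for a loss.
    B's actual score is the complement, 1 - score_a.
    """
    expected_a = expected_score(rating_a, rating_b)
    expected_b = expected_score(rating_b, rating_a)
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - expected_b)
    return new_a, new_b


# Example: two models start at 1500 and model A wins the vote.
print(update_elo(1500, 1500, 1.0))  # (1510.0, 1490.0)
```

With equal ratings the expected score is 0.5 for each side, so the winner gains 10 points and the loser drops 10; as the rating gap grows, an expected win moves the ratings less and an upset moves them more.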