Eval Board / MEAT-bench v3

The Leaderboard

Every instrument, human and silicon, on one eval board. Each domain is scored 0–100, and MEAT-Elo rolls those domain scores into a single number. The point of the manifesto, rendered as a table: no instrument wins everywhere, and the cheapest instrument for your task is rarely the strongest one.

Rank	Instrument	MEAT-Elo ↓	Biology	Physical	Social	Math	Creative	Logistics	Tier	Status	Cost/tok	Value
1	The Retired Engineer (MEAT, 120B)	64	35	60	50	95	55	88	pro	ready	2.50	26
2	The Lawyer (MEAT, 240B)	60	30	25	96	60	75	72	pro	⏳ 429	8.00	8
3	The Olympic Athlete (MEAT, 60B)	60	45	99	60	25	50	80	pro	ready	4.00	15
4	The Average Adult (MEAT, 70B)	59	45	65	68	50	60	65	pro	ready	1.00	59
5	Mythos (SILICON, ~1.8T rumored)	58	80	1	78	96	88	2	enterprise	⏳ 429	5.00	12
6	The Surgeon (MEAT, 405B)	52	99	22	55	70	35	28	pro	ready	9.50	5
7	The Teenager (MEAT, 7B)	52	25	78	40	45	70	55	free	ready	0.30	173
8	Kimi (SILICON, ~600B MoE)	49	68	1	64	84	76	2	enterprise	⏳ 429	1.20	41
9	Frontier-7B (SILICON, 7B)	31	40	1	42	55	48	1	enterprise	ready	0.40	78
10	The Toddler (MEAT, 1B)	30	5	30	35	8	92	12	free	ready	0.10	300

Methodology: MEAT-Elo is the unweighted mean of the six domain scores, rounded to the nearest integer. Value = MEAT-Elo ÷ cost-per-token. Silicon scores 1 or 2 on embodied domains (Physical, Logistics) because no model can wash a car end-to-end. All figures are deadpan satire and reflect no real product's capabilities.
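The two formulas in the methodology note can be checked against the board with a few lines of Python. This is a minimal sketch, not the board's actual scoring code: the names `BOARD`, `meat_elo`, and `value` are hypothetical, and two details are assumptions inferred from the published numbers rather than stated anywhere: halves round up, and Value is computed from the already rounded MEAT-Elo (the Lawyer's Value of 8 matches 60 ÷ 8.00, not 59.67 ÷ 8.00).

```python
from decimal import Decimal, ROUND_HALF_UP

# Four rows copied from the board: (domain scores, cost per token).
# Domain order: Biology, Physical, Social, Math, Creative, Logistics.
BOARD = {
    "The Retired Engineer": ((35, 60, 50, 95, 55, 88), "2.50"),
    "The Lawyer": ((30, 25, 96, 60, 75, 72), "8.00"),
    "Mythos": ((80, 1, 78, 96, 88, 2), "5.00"),
    "Frontier-7B": ((40, 1, 42, 55, 48, 1), "0.40"),
}


def round_half_up(x: Decimal) -> int:
    """Round to the nearest integer, with .5 going up (assumed convention)."""
    return int(x.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def meat_elo(scores) -> int:
    """Unweighted mean of the domain scores, rounded to an integer."""
    return round_half_up(Decimal(sum(scores)) / Decimal(len(scores)))


def value(scores, cost_per_token: str) -> int:
    """Rounded MEAT-Elo divided by cost per token (assumed to use the rounded Elo)."""
    return round_half_up(Decimal(meat_elo(scores)) / Decimal(cost_per_token))


if __name__ == "__main__":
    for name, (scores, cost) in BOARD.items():
        print(name, meat_elo(scores), value(scores, cost))
```

Under those assumptions the sketch reproduces the MEAT-Elo and Value columns for these four rows. `Decimal` is used instead of floats so that exact halves such as 31 ÷ 0.40 = 77.5 round the same way every time.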

Reading the board

No universal winner

The Surgeon tops Biology and bottoms out on Physical. Mythos tops Math and scores no higher than 2 on anything needing a body. Capability is a profile, not a number.
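The "profile, not a number" reading can be made concrete by asking who leads each domain. The sketch below is illustrative only; `SCORES` is transcribed from the board, and `domain_leaders` is a hypothetical helper name.

```python
DOMAINS = ("Biology", "Physical", "Social", "Math", "Creative", "Logistics")

# Domain scores copied from the board, in DOMAINS order.
SCORES = {
    "The Retired Engineer": (35, 60, 50, 95, 55, 88),
    "The Lawyer": (30, 25, 96, 60, 75, 72),
    "The Olympic Athlete": (45, 99, 60, 25, 50, 80),
    "The Average Adult": (45, 65, 68, 50, 60, 65),
    "Mythos": (80, 1, 78, 96, 88, 2),
    "The Surgeon": (99, 22, 55, 70, 35, 28),
    "The Teenager": (25, 78, 40, 45, 70, 55),
    "Kimi": (68, 1, 64, 84, 76, 2),
    "Frontier-7B": (40, 1, 42, 55, 48, 1),
    "The Toddler": (5, 30, 35, 8, 92, 12),
}


def domain_leaders(scores):
    """Map each domain to the instrument with the highest score in it."""
    leaders = {}
    for i, domain in enumerate(DOMAINS):
        leaders[domain] = max(scores, key=lambda name: scores[name][i])
    return leaders


if __name__ == "__main__":
    for domain, name in domain_leaders(SCORES).items():
        print(f"{domain}: {name}")
```

On this data the six domains have six different leaders, and the instrument ranked first overall leads only Logistics.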

Cheap ≠ best

The Toddler, the Teenager, and Frontier-7B lead on Value (capability per dollar) while trailing on raw MEAT-Elo. Route by task, not by leaderboard rank.
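One way to read "route by task" is: pick the cheapest instrument that clears a quality bar in the domain you care about. The sketch below is a hypothetical router, not anything the board ships; `route` and `min_score` are illustrative names, and the Status column is ignored for simplicity.

```python
from typing import Optional

DOMAINS = ("Biology", "Physical", "Social", "Math", "Creative", "Logistics")

# name: (domain scores in DOMAINS order, cost per token), copied from the board.
BOARD = {
    "The Retired Engineer": ((35, 60, 50, 95, 55, 88), 2.50),
    "The Lawyer": ((30, 25, 96, 60, 75, 72), 8.00),
    "The Olympic Athlete": ((45, 99, 60, 25, 50, 80), 4.00),
    "The Average Adult": ((45, 65, 68, 50, 60, 65), 1.00),
    "Mythos": ((80, 1, 78, 96, 88, 2), 5.00),
    "The Surgeon": ((99, 22, 55, 70, 35, 28), 9.50),
    "The Teenager": ((25, 78, 40, 45, 70, 55), 0.30),
    "Kimi": ((68, 1, 64, 84, 76, 2), 1.20),
    "Frontier-7B": ((40, 1, 42, 55, 48, 1), 0.40),
    "The Toddler": ((5, 30, 35, 8, 92, 12), 0.10),
}


def route(domain: str, min_score: int) -> Optional[str]:
    """Return the cheapest instrument scoring at least min_score in domain."""
    i = DOMAINS.index(domain)
    qualified = [
        (cost, name)
        for name, (scores, cost) in BOARD.items()
        if scores[i] >= min_score
    ]
    if not qualified:
        return None  # nobody on the board clears the bar
    return min(qualified)[1]  # lowest cost wins


if __name__ == "__main__":
    print(route("Math", 80))
    print(route("Physical", 70))
```

With a bar of 80 on Math this picks Kimi over the higher-ranked Retired Engineer and Mythos, and with a bar of 70 on Physical it picks the Teenager over the Olympic Athlete. A real router would also need to skip instruments whose Status shows ⏳ 429.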

Silicon's hard wall

Every LLM scores 1 on Physical and no more than 2 on Logistics. You cannot prompt your way into moving an atom; those columns belong to meat.