Chess | Peval Competition

Description

Given a chess position in FEN, write a prompt that makes the LLM output the best next move in UCI notation (e.g., e2e4, a7a8q). The move is scored against Stockfish 17. The LLM should output exactly one move, and no tools or code are allowed.

Evaluation

Each submission is evaluated on 250 positions from real games spanning different Elo ranges, game phases, and levels of tactical complexity. For each position, the grader has a table of all legal moves, precomputed with Stockfish at a fixed budget. Each entry pairs a move with its evaluation in centipawns, from the perspective of the side to move, after the move is applied:

[
  ["e5h8", 636],
  ["e5g7", 297],
  ["e5e4", 126],
  ...
]

The grader computes the centipawn loss (CPL) of each move returned by the LLM as $\text{CPL}=\max(0,\text{eval}_{\text{best}}-\text{eval}_{\text{move}})$, where $\text{eval}_{\text{best}}$ is the highest evaluation in the table, then maps it smoothly to a score in $[0,1]$:

$$\text{Score}=\frac{1}{1+\text{CPL}/\lambda}\quad\text{with}\ \lambda=150$$

Small evaluation noise is ignored: any move with $\text{CPL}\leq10$ scores 1. Invalid moves score 0. The final score is the mean over all 250 positions.
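The scoring rule can be sketched in a few lines of Python. This is an illustration, not the grader's actual code: the function names `centipawn_loss` and `score_move` are hypothetical, the table holds only the three entries shown in the example (the real table is longer), and it assumes that moves above the noise threshold are scored with their raw CPL.

```python
LAMBDA = 150.0   # lambda from the score formula
NOISE_CPL = 10   # CPL at or below this threshold scores 1

def centipawn_loss(table, move):
    """Return the CPL of `move`, or None if it is not in the legal-move table."""
    evals = dict(table)
    if move not in evals:
        return None
    best = max(evals.values())
    return max(0, best - evals[move])

def score_move(table, move):
    """Map a move to a score in [0, 1] (assumed reading of the rules above)."""
    cpl = centipawn_loss(table, move)
    if cpl is None:          # invalid move
        return 0.0
    if cpl <= NOISE_CPL:     # small evaluation noise is ignored
        return 1.0
    return 1.0 / (1.0 + cpl / LAMBDA)

# Only the three entries listed in the example table; the rest is elided there.
table = [["e5h8", 636], ["e5g7", 297], ["e5e4", 126]]

for move in ("e5h8", "e5g7", "e5e4", "a1a1"):
    print(move, round(score_move(table, move), 3))
```

With these entries, e5g7 loses 339 centipawns and scores about 0.307, so even the second-best move here costs more than two thirds of the available score.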

Submission Requirements

Standard rules apply.
The output should be a single, valid chess move in UCI format inside <answer> tags.
Maximum input length of 100,000 characters.
Maximum output of 16,384 tokens.
No tool-calling.
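
To check a prompt's output format locally before submitting, a sketch like the following may help. It is not the grader's parser: the function name `extract_move`, the choice to read the last `<answer>` block, and the strict lowercase matching are all assumptions. It checks UCI syntax only; whether the move is legal depends on the position.

```python
import re

# Hypothetical patterns; the grader's actual parsing rules are not documented here.
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)
# UCI syntax: from-square, to-square, optional promotion piece (q, r, b, n).
UCI_RE = re.compile(r"^[a-h][1-8][a-h][1-8][qrbn]?$")

def extract_move(output):
    """Return the UCI move in the last <answer> block, or None if absent or malformed."""
    matches = ANSWER_RE.findall(output)
    if not matches:
        return None
    candidate = matches[-1]
    return candidate if UCI_RE.match(candidate) else None

print(extract_move("The knight is pinned. <answer>e2e4</answer>"))
print(extract_move("<answer> a7a8q </answer>"))
print(extract_move("<answer>Nf3</answer>"))  # SAN, not UCI
```

Reading the last block rather than the first is a design choice for outputs that revise an earlier answer while reasoning; a stricter check would reject any output with more than one block.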