About

Peval is a competition and AI eval platform that benchmarks models using optimized prompts across various tasks and scenarios. Anyone can compete in the ongoing competitions by writing prompts to be evaluated. As models and techniques improve, the prompts become more effective and push models to their limits.

Currently, there is 1 type of competition:

  1. Competitions: Prompts tested against a Q&A-style test set.

Rules

  1. No cheating (e.g. hard-coding answers).
  2. No sharing solutions before a competition ends (discussing techniques is fine).
  3. No automated submissions.
  4. 1 account per participant.

We reserve the right to disqualify submissions and ban accounts that violate these rules.

General FAQs

How do I start?

Browse competitions and find one you like. Read the spec, select a model, write a prompt, then submit it.

Is it free to submit?

The platform is free to use, but users need to provide their own API keys to submit with most models. Currently, users get 5 free submissions/week with select models.

What models can I use?

Each competition has a different set of models available, which are subsets of the models the platform supports. If you'd like to use a model or provider that is not available, DM @fiveoutofnine to request it.

Can I submit multiple times?

Yes.

Are there token limits?

Yes. There may be token limits for both prompts and responses depending on the competition. Read the spec for details.

How is the leaderboard calculated?

The leaderboard for the models is still WIP, but it'll be an aggregate of all competitions' leaderboards. Leaderboards rank by highest score, with cost as the tiebreaker.

Competitions FAQs

How is scoring calculated?

Varies by competition. Most use exact match or similarity scoring. Check each competition's specific rules.

Do competitions end?

Yes. Check each competition's details for the end date.

Can I view other users' prompts?

You can view them on each competition's leaderboard page after the competition ends.

Why do scores change after competitions end?

Scores can change because all submissions are re-evaluated against a larger, hidden test set after the competition ends. This prevents overfitting to the public test set.

Is there a system prompt/does the model know the competition details?

Yes, each competition's spec is provided with the default system prompt. You can optionally update the system prompt by clicking in the submission form's input box.

Any instances of {{OVERVIEW}} in the system prompt will be replaced with the competition's spec.
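As a rough sketch of how that substitution behaves, the following helper replaces every occurrence of the placeholder with the competition's spec. The function name and prompt text are illustrative assumptions; only the {{OVERVIEW}} placeholder comes from the FAQ above.

```typescript
// Hypothetical sketch of the system-prompt templating described above.
// Every occurrence of the {{OVERVIEW}} placeholder is replaced with the
// competition's spec text.
function renderSystemPrompt(template: string, spec: string): string {
  // split/join replaces all occurrences, not just the first.
  return template.split("{{OVERVIEW}}").join(spec);
}

// Example (illustrative prompt text):
const prompt = renderSystemPrompt(
  "You are competing in a prompt competition.\n{{OVERVIEW}}",
  "Answer each question with a single word."
);
```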

Are there prizes?

Competitions may have USDC prizes. Funds are stored in a smart contract on Base, and anyone can contribute to the prize pools via the deposit function:

/// @notice Deposit USDC into a specific competition's prize pool.
/// @param _slug The competition identifier.
/// @param _amount The amount of USDC to deposit.
function deposit(string calldata _slug, uint256 _amount) external;

_slug is everything that comes after /competition/ in the URL, lowercased.
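For illustration, the slug rule above can be sketched as a small helper: take everything after /competition/ in the URL and lowercase it. The function name and example domain are hypothetical; only the derivation rule comes from the text.

```typescript
// Hypothetical helper implementing the _slug rule described above:
// everything after "/competition/" in the URL, lowercased.
function slugFromUrl(url: string): string {
  const marker = "/competition/";
  const idx = url.indexOf(marker);
  if (idx === -1) throw new Error("not a competition URL");
  return url.slice(idx + marker.length).toLowerCase();
}

// Example (placeholder domain):
// slugFromUrl("https://example.com/competition/Word-Golf") → "word-golf"
```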

Prizes are distributed at the end of each competition. Distributions may differ, but generally, the platform will take 20%, and the winner will receive the remaining 80%.

Can I recover my deposits?

Yes. Accidentally sent non-USDC tokens or ETH can also be recovered. To recover an accidental deposit, contact @fiveoutofnine before the competition ends.

Contact

If you have any questions, ideas, issues, or requests, DM @fiveoutofnine.