Pay for LLM usage as you go.

# Model Performance and Pricing
> **Note**
> All prices are in USD per 1,000 tokens. MMLU scores indicate model performance on the Massive Multitask Language Understanding benchmark.
## High-Performance Models
| Model | Input | Output | MMLU | Details |
|---|---|---|---|---|
| llama3-swiss 🇨🇭 | $0.015 | $0.045 | 85.2% | Advanced, Efficient, Recommended |
| gpt-4o | $0.045 | $0.09 | 92.3% | Leading Performance |
| claude-sonnet | $0.005 | $0.02 | 88.7% | Multilingual, Writing, Coding |
## Balanced Models
| Model | Input | Output | MMLU | Details |
|---|---|---|---|---|
| llama-swiss-medium 🇨🇭 | $0.005 | $0.01 | 79.2% | Strong for Size |
| mixtral-swiss-big 🇨🇭 | $0.01 | $0.02 | N/A | Advanced Multilingual |
| mistral-medium | $0.00375 | $0.01125 | 77.3% | Efficient, Multilingual |
| mixtral-swiss-medium 🇨🇭 | $0.003 | $0.01 | 77.3% | Efficient, Multilingual |
| gpt-4 | $0.045 | $0.09 | 86.5% | Consistent |
| claude-opus | $0.022 | $0.10 | 86.8% | Strong Reasoning |
## Efficient Models
| Model | Input | Output | MMLU | Details |
|---|---|---|---|---|
| gpt-3.5-turbo-1106 | $0.0015 | $0.003 | ~70% | Legacy |
| mistral-tiny | $0.00042 | $0.00126 | 60.1% | Compact, Fast, Cost-Effective |
| mistral-small | $0.0012 | $0.0036 | 70.6% | Balanced Speed/Quality |
## Performance Notes
- MMLU scores marked with (1) indicate single-shot performance
- Scores marked with (5-shot) use few-shot learning
- N/A indicates pending benchmark data
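The per-1,000-token prices above translate into a per-request cost straightforwardly. Here is a minimal sketch of that calculation; the `estimate_cost` helper and the `PRICES_PER_1K` table are illustrative (the table copies a few rows from this page), not part of the provider's client library.

```python
# Hypothetical helper: estimate the USD cost of one request from the
# per-1,000-token prices listed above. Token counts would come from
# your API response's usage data.

PRICES_PER_1K = {  # model -> (input USD, output USD) per 1,000 tokens
    "llama3-swiss": (0.015, 0.045),
    "mixtral-swiss-medium": (0.003, 0.01),
    "mistral-tiny": (0.00042, 0.00126),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    price_in, price_out = PRICES_PER_1K[model]
    return (input_tokens / 1000) * price_in + (output_tokens / 1000) * price_out

# Example: 2,000 input tokens and 500 output tokens on llama3-swiss
cost = estimate_cost("llama3-swiss", 2000, 500)
print(f"${cost:.4f}")  # → $0.0525
```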
## MMLU Scores

While MMLU scores provide a useful metric for comparing language models, they represent only one dimension of performance. The benchmark measures how well models handle a standardized set of tasks, but it does not fully capture broader skills such as multilingual comprehension, information retention, domain-specific knowledge, programming proficiency, or complex reasoning. In practice, different tasks place different demands on a model's architecture and training data, so performance can vary considerably across these domains. MMLU should therefore be treated as a helpful indicator rather than a definitive measure of a model's overall quality or suitability for a given application. Source: LLM Leaderboard
## Rate Limiting
All endpoints have a combined limit of CHF 50 per month. If you would like to increase it, please contact support with your estimated usage and use case.
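Since the cap applies across all endpoints combined, it can help to track cumulative spend on the client side before the server-side limit kicks in. The sketch below is a hypothetical client-side tracker, not part of the provider's API; note that the listed prices are in USD while the cap is in CHF, so converting between the two is left to the caller.

```python
# Minimal client-side budget tracker (an illustrative sketch; the
# monthly cap itself is enforced server-side). The budget value and
# any USD-to-CHF conversion are assumptions supplied by the caller.

class BudgetTracker:
    def __init__(self, monthly_budget: float):
        self.monthly_budget = monthly_budget
        self.spent = 0.0

    def can_afford(self, estimated_cost: float) -> bool:
        """True if the request fits within the remaining budget."""
        return self.spent + estimated_cost <= self.monthly_budget

    def record(self, actual_cost: float) -> None:
        """Add the real cost of a completed request to the running total."""
        self.spent += actual_cost

tracker = BudgetTracker(monthly_budget=50.0)
if tracker.can_afford(0.05):
    # ... send the request, then record its actual cost ...
    tracker.record(0.05)
print(f"Remaining: {tracker.monthly_budget - tracker.spent:.2f}")  # → Remaining: 49.95
```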