LLM service cost scenarios
Based on crowdsourced cost data from LiteLLM for
- Daily
- Annual
Carbon
Enter your assumed rate of energy consumption (kWh per million tokens) and grid emission factor (grams of CO₂ equivalent per kWh) to produce a back-of-the-envelope estimate of carbon emissions from model inference.
- Daily (kgCO₂e)
- Annual (kgCO₂e)
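The arithmetic behind this estimate can be sketched as follows. This is a minimal illustration, not the calculator's actual code: the function name, the daily token volume, and the 365-day annualization are assumptions made for the example.

```python
def carbon_estimate_kg(tokens_per_day: float,
                       kwh_per_mtok: float,
                       grid_g_per_kwh: float) -> dict:
    """Back-of-the-envelope inference emissions in kgCO2e.

    tokens_per_day  -- assumed daily token volume (hypothetical input)
    kwh_per_mtok    -- assumed energy use, kWh per million tokens
    grid_g_per_kwh  -- grid emission factor, gCO2e per kWh
    """
    daily_kwh = tokens_per_day / 1_000_000 * kwh_per_mtok
    daily_kg = daily_kwh * grid_g_per_kwh / 1000  # grams -> kilograms
    # Assumption: the annual figure is simply 365 identical days.
    return {"daily_kg": daily_kg, "annual_kg": daily_kg * 365}


# Illustrative inputs only: 50 Mtok/day, 0.4 kWh/Mtok, 400 gCO2e/kWh.
example = carbon_estimate_kg(tokens_per_day=50_000_000,
                             kwh_per_mtok=0.4,
                             grid_g_per_kwh=400)
print(example)
```

With these inputs the daily figure comes to roughly 8 kgCO₂e and the annual figure to roughly 2,900 kgCO₂e. Both inputs are uncertain by an order of magnitude or more, so the result is best read as a range check.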
kWh/Mtok references
| Study / model | kWh/Mtok | Context | Original metric | Source |
|---|---|---|---|---|
| GPT-3 175B (GPT-3 paper) | ~13 | GPT-3 on 2020 infra | 0.4 kWh per 100 pages (~30k tokens) | GPT-3 paper |
| Husom et al. (older ChatGPT) | ~9 | GPT-3-era ChatGPT | ~9 mWh/token | Husom et al. summary |
| LLaMA-65B — Samsi et al. | ~0.8–1.1 | LLaMA-65B on V100/A100 | ~3–4 J/token | From Words to Watts |
| Local Llama-3 8B (Baquero) | ~0.17 | 8B model on Apple M3 | <200 J for ~333 tokens | Baquero CACM blog |
| Llama-3.3-70B (Lin 2025) | ~0.11 | 8×H100, FP8, batch 128 | 0.39 J/token | llm-tracker entry |
| Generic 4×H100 node (Arthur) | ~2.7–5.2 (best); ~10–20 (worst) | 4×H100; batch size affects J/token | 9.6–72 J/token | Arthur writeup |
| GPT-4o (Epoch AI) | ~0.375 | 0.3 Wh/query, 800-token assumption | 0.3 Wh per typical GPT-4o query | Epoch AI |
| ChatGPT average query (Altman) | ~0.425 | 0.34 Wh/query, 800-token assumption | 0.34 Wh per ChatGPT query | Altman – Gentle Singularity |
| Gemini Apps – full stack | ~0.30 | 0.24 Wh per prompt, 800-token assumption | 0.24 Wh median text prompt | Google Cloud blog |
| Gemini Apps – active chips only | ~0.125 | 0.10 Wh per prompt, 800-token assumption | 0.10 Wh median prompt | Google Cloud blog |
| Mistral Large 2 (LCA) | ~7–11 | Lifecycle analysis; CO₂-based | 1.14 gCO₂e per 400-token response | Mistral LCA |
| DitchCarbon H100 example | ~23 | Single H100, no batching | 0.008 kWh per ~350 tokens | DitchCarbon blog |
| Older “3 Wh per ChatGPT response” estimate | ~3.75 | 3 Wh/query, 800-token assumption | 3 Wh per response | kmaasrud summary |
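The kWh/Mtok column is derived from each source's original metric by unit conversion. The sketch below shows the two conversions that appear most often in the table; the helper names are hypothetical, and the 800-token default mirrors the per-query assumption stated in the Context column.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def joules_per_token_to_kwh_per_mtok(j_per_token: float) -> float:
    """Convert J/token to kWh per million tokens."""
    return j_per_token * 1_000_000 / JOULES_PER_KWH


def wh_per_query_to_kwh_per_mtok(wh_per_query: float,
                                 tokens_per_query: int = 800) -> float:
    """Convert Wh/query to kWh per million tokens.

    tokens_per_query is an assumption (800 in the table above);
    the result scales inversely with it.
    """
    wh_per_token = wh_per_query / tokens_per_query
    return wh_per_token * 1_000_000 / 1000  # Wh -> kWh


# Reproduce a few rows from the table (rounded for display).
print(round(joules_per_token_to_kwh_per_mtok(0.39), 3))  # Llama-3.3-70B row
print(round(wh_per_query_to_kwh_per_mtok(0.3), 3))       # GPT-4o row
print(round(wh_per_query_to_kwh_per_mtok(0.24), 3))      # Gemini full-stack row
```

Because the per-query figures depend on the assumed token count, halving `tokens_per_query` doubles the resulting kWh/Mtok value.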
Grid Emission Factors (gCO₂e/kWh)
Location-based scope 2
Location-Based Grid Factors (Lifecycle-ish)
| Grid Type | Practical Estimate (gCO₂e/kWh) | Notes |
|---|---|---|
| Very clean grid | ~50 | Hydro / nuclear / high-renewable grids (e.g., Nordics, France, Quebec) |
| Typical rich-country | 350–450 | OECD averages; mixed gas + renewables + some coal |
| Coal-heavy grid | 700–800 | India, South Africa, SE Asia regions; close to coal plant lifecycle values |
Fuel-Specific Lifecycle “Best Estimate” Factors
| Fuel / Technology | Best Estimate (gCO₂e/kWh) | Notes |
|---|---|---|
| 100% coal | ~900 | Conventional coal plant fleet (median lifecycle) |
| 100% natural gas (NGCC) | ~450 | Modern combined-cycle gas plants |
| Coal with CCS | ~250–350 | Depends heavily on configuration & capture rate |
| NGCC with CCS | ~130–200 | Lifecycle; assumes effective methane control |
| Wind / Solar / Nuclear | ≤50 | Representative lifecycle numbers for high-quality clean generation |
Approximate Grid Factors by Cloud Region
| Cloud Region | Approx. Grid EF (gCO₂e/kWh) | Location / Grid Basis |
|---|---|---|
| GCP us-central1 | ≈350 | Iowa, US (state grid mix) |
| GCP europe-west1 | ≈110 | Belgium (country average) |
| GCP us-east1 | ≈255 | South Carolina, US (state grid mix) |
| AWS us-west-2 | ≈135 | Oregon, US (state grid mix) |
| AWS us-east-1 | ≈270 | Northern Virginia, US (state grid mix) |
| AWS eu-central-1 | ≈420 | Frankfurt, Germany (national grid mix) |
Model router
Simulate a model router that distributes traffic across multiple models. Adjust the traffic percentage for each model to calculate blended costs.
Blended Cost
- Daily:
- Annual:
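A blended cost of this kind is a traffic-weighted average of per-model prices. The following sketch assumes a single price per million tokens for each model (ignoring any input/output price split) and uses placeholder prices and volumes, not real LiteLLM data. The function name and the route format are hypothetical.

```python
def blended_cost_per_mtok(routes: list[tuple[float, float]]) -> float:
    """Traffic-weighted cost per million tokens.

    routes -- list of (traffic_percent, usd_per_mtok) pairs;
              the percentages must sum to 100.
    """
    total_percent = sum(percent for percent, _ in routes)
    if abs(total_percent - 100.0) > 1e-9:
        raise ValueError(f"traffic shares sum to {total_percent}, expected 100")
    return sum(percent / 100.0 * price for percent, price in routes)


# Placeholder example: 70% of traffic to a cheap model, 30% to an expensive one.
routes = [(70.0, 0.50), (30.0, 10.00)]
per_mtok = blended_cost_per_mtok(routes)

mtok_per_day = 20  # assumed daily volume, in millions of tokens
daily_cost = per_mtok * mtok_per_day
annual_cost = daily_cost * 365  # assumption: 365 identical days

print(f"blended: ${per_mtok:.2f}/Mtok, daily: ${daily_cost:.2f}, annual: ${annual_cost:.2f}")
```

Rejecting shares that do not sum to 100 is a design choice for the sketch; a real router simulation might instead normalize the shares.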