LLM service cost scenarios

Based on crowdsourced cost data from LiteLLM.

carbon

Enter your assumed rate of energy consumption (kWh per million tokens) and grid emission factor (grams of CO₂ equivalent per kWh) to produce a back-of-the-envelope estimate of carbon emissions from model inference.

  • Daily kgCO₂e
  • Annual kgCO₂e

million tokens and kWh per year
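
As a rough sketch of the arithmetic behind this estimate, the Python snippet below multiplies token volume by energy intensity and by the grid emission factor. The function name `estimate_emissions`, the 365-day year, and the example inputs (50 million tokens per day, 0.4 kWh/Mtok, 400 gCO₂e/kWh) are assumptions chosen for illustration, not values or code taken from the calculator.

```python
def estimate_emissions(mtok_per_day, kwh_per_mtok, grid_g_per_kwh, days_per_year=365):
    """Back-of-the-envelope inference emissions.

    mtok_per_day   -- millions of tokens processed per day (assumed input)
    kwh_per_mtok   -- energy intensity in kWh per million tokens
    grid_g_per_kwh -- grid emission factor in gCO2e per kWh
    """
    daily_kwh = mtok_per_day * kwh_per_mtok
    daily_kg = daily_kwh * grid_g_per_kwh / 1000.0  # grams -> kilograms
    return {
        "daily_kg_co2e": daily_kg,
        "annual_kg_co2e": daily_kg * days_per_year,
        "annual_mtok": mtok_per_day * days_per_year,
        "annual_kwh": daily_kwh * days_per_year,
    }


# Illustrative inputs only; substitute your own assumptions.
result = estimate_emissions(mtok_per_day=50, kwh_per_mtok=0.4, grid_g_per_kwh=400)
print(result)
```

Because the result is a simple product, the estimate scales linearly with each input: halving either the kWh/Mtok figure or the grid factor halves the emissions.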

kWh/Mtok references
Study / model | kWh/Mtok | Context | Original metric | Source
GPT-3 175B (GPT-3 paper) | ~13 | GPT-3 on 2020 infra | 0.4 kWh per 100 pages (~30k tokens) | GPT-3 paper
Husom et al. (older ChatGPT) | ~9 | GPT-3-era ChatGPT | ~9 mWh/token | Husom et al. summary
LLaMA-65B (Samsi et al.) | ~0.8–1.1 | LLaMA-65B on V100/A100 | ~3–4 J/token | From Words to Watts
Local Llama-3 8B (Baquero) | ~0.17 | 8B model on Apple M3 | <200 J for ~333 tokens | Baquero CACM blog
Llama-3.3-70B (Lin 2025) | ~0.11 | 8×H100, FP8, batch 128 | 0.39 J/token | llm-tracker entry
Generic 4×H100 node (Arthur) | ~2.7–5.2 (best); ~10–20 (worst) | 4×H100; batch size affects J/token | 9.6–72 J/token | Arthur writeup
GPT-4o (Epoch AI) | ~0.375 | 0.3 Wh/query, 800-token assumption | 0.3 Wh per typical GPT-4o query | Epoch AI
ChatGPT average query (Altman) | ~0.425 | 0.34 Wh/query, 800-token assumption | 0.34 Wh per ChatGPT query | Altman – Gentle Singularity
Gemini Apps – full stack | ~0.30 | 0.24 Wh per prompt, 800-token assumption | 0.24 Wh median text prompt | Google Cloud blog
Gemini Apps – active chips only | ~0.125 | 0.10 Wh per prompt, 800-token assumption | 0.10 Wh median prompt | Google Cloud blog
Mistral Large 2 (LCA) | ~7–11 | Lifecycle analysis; CO₂-based | 1.14 gCO₂e per 400-token response | Mistral LCA
DitchCarbon H100 example | ~23 | Single H100, no batching | 0.008 kWh per ~350 tokens | DitchCarbon blog
Older “3 Wh per ChatGPT response” estimate | ~3.75 | 3 Wh/query, 800-token assumption | 3 Wh per response | kmaasrud summary
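
The kWh/Mtok column is derived from each source's original metric by unit conversion. A minimal sketch of those conversions in Python follows. The helper names are hypothetical, and the 800-token default mirrors the per-query assumption stated in the table.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def joules_per_token_to_kwh_per_mtok(j_per_token):
    """Convert J/token to kWh per million tokens."""
    return j_per_token * 1_000_000 / JOULES_PER_KWH


def wh_per_query_to_kwh_per_mtok(wh_per_query, tokens_per_query=800):
    """Convert Wh/query to kWh per million tokens for an assumed query length."""
    wh_per_token = wh_per_query / tokens_per_query
    return wh_per_token * 1_000_000 / 1000.0  # Wh -> kWh


def kwh_per_tokens_to_kwh_per_mtok(kwh, tokens):
    """Convert 'kWh for N tokens' to kWh per million tokens."""
    return kwh / tokens * 1_000_000


llama_70b = joules_per_token_to_kwh_per_mtok(0.39)   # Llama-3.3-70B row
gpt_4o = wh_per_query_to_kwh_per_mtok(0.3)           # GPT-4o row
gpt_3 = kwh_per_tokens_to_kwh_per_mtok(0.4, 30_000)  # GPT-3 row
print(round(llama_70b, 3), round(gpt_4o, 3), round(gpt_3, 1))
```

These three calls reproduce the table's figures of roughly 0.11, 0.375, and 13 kWh/Mtok. The per-query rows are sensitive to the assumed query length: doubling `tokens_per_query` halves the resulting kWh/Mtok.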
Grid Emission Factors (gCO₂e/kWh)

Location-based scope 2

Location-Based Grid Factors (Lifecycle-ish)

Grid Type | Practical Estimate (gCO₂e/kWh) | Notes
Very clean grid | ~50 | Hydro / nuclear / high-renewable grids (e.g., Nordics, France, Quebec)
Typical rich-country | 350–450 | OECD averages; mixed gas + renewables + some coal
Coal-heavy grid | 700–800 | India, South Africa, SE Asia regions; close to coal plant lifecycle values

Fuel-Specific Lifecycle “Best Estimate” Factors

Fuel / Technology | Best Estimate (gCO₂e/kWh) | Notes
100% coal | ~900 | Conventional coal plant fleet (median lifecycle)
100% natural gas (NGCC) | ~450 | Modern combined-cycle gas plants
Coal with CCS | ~250–350 | Depends heavily on configuration & capture rate
NGCC with CCS | ~130–200 | Lifecycle; assumes effective methane control
Wind / Solar / Nuclear | ≤50 | Representative lifecycle numbers for high-quality clean generation

Cloud Region | Approx. Grid EF (gCO₂e/kWh) | Location / Grid Basis
GCP us-central1 | 350 | Iowa, US (state grid mix)
GCP europe-west1 | 110 | Belgium (country average)
GCP us-east1 | 255 | South Carolina, US (state grid mix)
AWS us-west-2 | 135 | Oregon, US (state grid mix)
AWS us-east-1 | 270 | Northern Virginia, US (state grid mix)
AWS eu-central-1 | 420 | Frankfurt, Germany (national grid mix)
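
To see how much region choice matters, the cloud-region factors above can be applied to a fixed annual energy figure. The sketch below is illustrative only: the factors are copied from the table, while the function name `annual_kg_by_region` and the 7,300 kWh/year workload are assumptions.

```python
# Approximate grid emission factors (gCO2e/kWh), copied from the cloud region table.
REGION_GRID_EF = {
    "GCP us-central1": 350,
    "GCP europe-west1": 110,
    "GCP us-east1": 255,
    "AWS us-west-2": 135,
    "AWS us-east-1": 270,
    "AWS eu-central-1": 420,
}


def annual_kg_by_region(annual_kwh, factors=REGION_GRID_EF):
    """Return (region, kgCO2e) pairs sorted from lowest to highest emissions."""
    rows = [(region, annual_kwh * ef / 1000.0) for region, ef in factors.items()]
    return sorted(rows, key=lambda row: row[1])


# 7300 kWh/year is an assumed workload, not a measured value.
ranked = annual_kg_by_region(annual_kwh=7300)
for region, kg in ranked:
    print(f"{region:18s} {kg:8.1f} kgCO2e/year")
```

With these factors, the same workload emits almost four times as much in the highest-factor region as in the lowest, which is simply the ratio 420 / 110.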
model router

Simulate a model router that distributes traffic across multiple models. Adjust the traffic percentage for each model to calculate blended costs.

Blended Cost

  • Daily:
  • Annual:
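
The blended cost is a traffic-weighted average of per-model costs. A minimal sketch in Python, under simplifying assumptions: each model has a single price per million tokens (real pricing typically distinguishes input and output tokens), and the model names, prices, and token volume are hypothetical placeholders rather than LiteLLM data.

```python
def blended_cost(models, mtok_per_day, days_per_year=365):
    """Traffic-weighted cost across models.

    models -- list of (name, traffic_share, usd_per_mtok); shares must sum to 1.
    """
    total_share = sum(share for _, share, _ in models)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"traffic shares sum to {total_share}, expected 1.0")
    usd_per_mtok = sum(share * price for _, share, price in models)
    daily = usd_per_mtok * mtok_per_day
    return {
        "usd_per_mtok": usd_per_mtok,
        "daily_usd": daily,
        "annual_usd": daily * days_per_year,
    }


# Hypothetical models and prices, for illustration only.
example = [("small-model", 0.7, 0.5), ("large-model", 0.3, 10.0)]
costs = blended_cost(example, mtok_per_day=50)
print(costs)
```

In this example, routing 70% of traffic to the cheaper model gives a blended price of about 3.35 USD per million tokens, compared with 10 USD if all traffic went to the larger model.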