# Awesome Free LLM APIs

Skill by ara.so — Daily 2026 Skills collection.

A curated list of LLM providers offering permanent free tiers for text inference — no trial credits, no expiry. All endpoints listed are OpenAI SDK-compatible unless noted.
## Provider Overview

### Provider APIs (trained/fine-tuned by the company)
| Provider | Notable Models | Rate Limits | Region |
|---|---|---|---|
| Cohere | Command A, Command R+, Aya Expanse 32B | 20 RPM, 1K req/mo | 🇺🇸 |
| Google Gemini | Gemini 2.5 Pro, Flash, Flash-Lite | 5–15 RPM, 100–1K RPD | 🇺🇸 (not EU/UK/CH) |
| Mistral AI | Mistral Large 3, Small 3.1, Ministral 8B | 1 req/s, 1B tok/mo | 🇪🇺 |
| Zhipu AI | GLM-4.7-Flash, GLM-4.5-Flash, GLM-4.6V-Flash | Undocumented | 🇨🇳 |
### Inference Providers (host open-weight models)
| Provider | Notable Models | Rate Limits | Region |
|---|---|---|---|
| Cerebras | Llama 3.3 70B, Qwen3 235B, GPT-OSS-120B | 30 RPM, 14,400 RPD | 🇺🇸 |
| Cloudflare Workers AI | Llama 3.3 70B, Qwen QwQ 32B | 10K neurons/day | 🇺🇸 |
| GitHub Models | GPT-4o, Llama 3.3 70B, DeepSeek-R1 | 10–15 RPM, 50–150 RPD | 🇺🇸 |
| Groq | Llama 3.3 70B, Llama 4 Scout, Kimi K2 | 30 RPM, 1K RPD | 🇺🇸 |
| Hugging Face | Llama 3.3 70B, Qwen2.5 72B, Mistral 7B | $0.10/mo free credits | 🇺🇸 |
| Kluster AI | DeepSeek-R1, Llama 4 Maverick, Qwen3-235B | Undocumented | 🇺🇸 |
| LLM7.io | DeepSeek R1, Flash-Lite, Qwen2.5 Coder | 30 RPM (120 with token) | 🇬🇧 |
| NVIDIA NIM | Llama 3.3 70B, Mistral Large, Qwen3 235B | 40 RPM | 🇺🇸 |
| Ollama Cloud | DeepSeek-V3.2, Qwen3.5, Kimi-K2.5 | 1 concurrent, light usage | 🇺🇸 |
| OpenRouter | DeepSeek R1, Llama 3.3 70B, GPT-OSS-120B | 20 RPM, 50 RPD (1K with $10+) | 🇺🇸 |
## Getting API Keys
Each provider has its own key management page. Store keys as environment variables — never hardcode them:

```bash
export GROQ_API_KEY="your_groq_key"
export GEMINI_API_KEY="your_gemini_key"
export OPENROUTER_API_KEY="your_openrouter_key"
export MISTRAL_API_KEY="your_mistral_key"
export COHERE_API_KEY="your_cohere_key"
export CEREBRAS_API_KEY="your_cerebras_key"
export GITHUB_TOKEN="your_github_pat"
export HF_TOKEN="your_huggingface_token"
export NVIDIA_API_KEY="your_nvidia_key"
export CLOUDFLARE_API_TOKEN="your_cf_token"
export CLOUDFLARE_ACCOUNT_ID="your_cf_account_id"
```
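Before running the examples in this list, it can help to verify which keys are actually set. A minimal sketch (the helper name and key list are illustrative, not part of any provider SDK):

```python
import os

# Keys used by the examples in this document; extend as needed.
REQUIRED_KEYS = [
    "GROQ_API_KEY", "GEMINI_API_KEY", "OPENROUTER_API_KEY",
    "MISTRAL_API_KEY", "COHERE_API_KEY", "CEREBRAS_API_KEY",
    "GITHUB_TOKEN", "HF_TOKEN", "NVIDIA_API_KEY",
]

def missing_keys(env=os.environ) -> list[str]:
    """Return the names of any keys that are unset or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All provider keys are set.")
```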
---
---

## OpenAI SDK Integration

All providers (except Ollama Cloud) are OpenAI SDK-compatible — just swap the `base_url` and `api_key`.

### Python

```python
from openai import OpenAI
import os

# ── Groq ──────────────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

# ── Google Gemini ─────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key=os.environ["GEMINI_API_KEY"],
)
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Explain quantum entanglement."}],
)

# ── Mistral AI ────────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://api.mistral.ai/v1",
    api_key=os.environ["MISTRAL_API_KEY"],
)
response = client.chat.completions.create(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "Write a haiku about code."}],
)

# ── OpenRouter ────────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
response = client.chat.completions.create(
    model="deepseek/deepseek-r1",  # free model on OpenRouter
    messages=[{"role": "user", "content": "What is 2+2?"}],
    extra_headers={
        "HTTP-Referer": "https://yourapp.com",  # optional but recommended
        "X-Title": "My App",
    },
)

# ── Cerebras ──────────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ["CEREBRAS_API_KEY"],
)
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Tell me a joke."}],
)

# ── NVIDIA NIM ────────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)
response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Summarize this text."}],
)

# ── GitHub Models ─────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"],
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Draft an email."}],
)

# ── Cohere (OpenAI-compatible endpoint) ───────────────────────────────────────
client = OpenAI(
    base_url="https://api.cohere.com/compatibility/v1",
    api_key=os.environ["COHERE_API_KEY"],
)
response = client.chat.completions.create(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "Translate to French: Hello world"}],
)
```
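Model IDs drift as providers rotate their catalogs. Most OpenAI-compatible endpoints also expose `GET /models`, so you can discover valid IDs at runtime instead of hardcoding them. A sketch using `requests` (the `BASE_URLS` registry and helper names are illustrative, and not every provider implements `/models`):

```python
import requests

# Base URLs taken from the tables above (illustrative subset).
BASE_URLS = {
    "groq": "https://api.groq.com/openai/v1",
    "cerebras": "https://api.cerebras.ai/v1",
    "mistral": "https://api.mistral.ai/v1",
    "openrouter": "https://openrouter.ai/api/v1",
}

def extract_ids(payload: dict) -> list[str]:
    """Pull model IDs out of an OpenAI-style /models response body."""
    return [m["id"] for m in payload.get("data", [])]

def list_model_ids(provider: str, api_key: str) -> list[str]:
    """GET /models on an OpenAI-compatible endpoint (network call)."""
    resp = requests.get(
        f"{BASE_URLS[provider]}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    resp.raise_for_status()
    return extract_ids(resp.json())

# Example (requires a valid key and network access):
# print(list_model_ids("groq", os.environ["GROQ_API_KEY"]))
```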
### JavaScript / TypeScript

```typescript
import OpenAI from "openai";

// ── Groq ──────────────────────────────────────────────────────────────────────
const groq = new OpenAI({
  baseURL: "https://api.groq.com/openai/v1",
  apiKey: process.env.GROQ_API_KEY,
});
const completion = await groq.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(completion.choices[0].message.content);

// ── OpenRouter with free model router ────────────────────────────────────────
const openrouter = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
  defaultHeaders: {
    "HTTP-Referer": "https://yourapp.com",
    "X-Title": "My App",
  },
});
// Use the free models router — automatically picks an available free model
const freeCompletion = await openrouter.chat.completions.create({
  model: "openrouter/free",
  messages: [{ role: "user", content: "What is the capital of France?" }],
});

// ── Mistral ───────────────────────────────────────────────────────────────────
const mistral = new OpenAI({
  baseURL: "https://api.mistral.ai/v1",
  apiKey: process.env.MISTRAL_API_KEY,
});
const mistralCompletion = await mistral.chat.completions.create({
  model: "mistral-small-latest",
  messages: [{ role: "user", content: "Explain async/await in JavaScript." }],
});
```

### Cloudflare Workers AI
Cloudflare uses a slightly different auth pattern:
```python
import requests, os

ACCOUNT_ID = os.environ["CLOUDFLARE_ACCOUNT_ID"]
API_TOKEN = os.environ["CLOUDFLARE_API_TOKEN"]

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/"
    "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"messages": [{"role": "user", "content": "What is Cloudflare Workers?"}]},
)
result = response.json()
print(result["result"]["response"])
```

```typescript
// Cloudflare Workers runtime (inside a Worker)
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const ai = new Ai(env.AI);
    const response = await ai.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
      messages: [{ role: "user", content: "Hello from Workers AI!" }],
    });
    return Response.json(response);
  },
};
```

### Ollama Cloud (Non-OpenAI API)
Ollama Cloud uses the Ollama API format, not the OpenAI format:

```python
import requests, os

response = requests.post(
    "https://ollama.com/api/chat",
    headers={"Authorization": f"Bearer {os.environ['OLLAMA_API_KEY']}"},
    json={
        "model": "deepseek-v3.2",
        "messages": [{"role": "user", "content": "What is 2 + 2?"}],
        "stream": False,
    },
)
print(response.json()["message"]["content"])
```

Using the `ollama` Python client:

```python
import ollama, os

client = ollama.Client(
    host="https://ollama.com",
    headers={"Authorization": f"Bearer {os.environ['OLLAMA_API_KEY']}"},
)
response = client.chat(
    model="qwen3.5",
    messages=[{"role": "user", "content": "Write a poem about the sea."}],
)
print(response["message"]["content"])
```
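Ollama's chat endpoint also supports streaming: with `"stream": true` it returns newline-delimited JSON chunks, each carrying a partial `message.content`. A hedged sketch over raw HTTP (the NDJSON chunk shape is assumed from Ollama's local API; verify against Ollama Cloud's docs):

```python
import json
import requests

def parse_chunk(line: bytes) -> str:
    """Extract the text delta from one NDJSON line of an Ollama stream."""
    return json.loads(line).get("message", {}).get("content", "")

def stream_ollama_chat(model: str, prompt: str, api_key: str):
    """Yield response text chunks from Ollama's streaming chat API."""
    with requests.post(
        "https://ollama.com/api/chat",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        },
        stream=True,
    ) as r:
        r.raise_for_status()
        for line in r.iter_lines():
            if line:
                yield parse_chunk(line)

# Example (requires a valid key and network access):
# for text in stream_ollama_chat("qwen3.5", "Hi!", os.environ["OLLAMA_API_KEY"]):
#     print(text, end="", flush=True)
```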
---

## Hugging Face Inference API

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://router.huggingface.co/novita/v3/openai",
    api_key=os.environ["HF_TOKEN"],
)
response = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Summarize the theory of relativity."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```

## Streaming Responses
```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Write a short story about a robot."}],
    stream=True,
)
for chunk in stream:
    # Some providers send a final chunk with no choices; guard before indexing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

```typescript
const stream = await groq.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Write a haiku." }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```

## Provider Fallback Pattern
Cycle through providers when rate limits are hit:

```python
from openai import OpenAI, RateLimitError
import os

PROVIDERS = [
    {
        "name": "Groq",
        "base_url": "https://api.groq.com/openai/v1",
        "api_key": os.environ.get("GROQ_API_KEY"),
        "model": "llama-3.3-70b-versatile",
    },
    {
        "name": "Cerebras",
        "base_url": "https://api.cerebras.ai/v1",
        "api_key": os.environ.get("CEREBRAS_API_KEY"),
        "model": "llama-3.3-70b",
    },
    {
        "name": "Mistral",
        "base_url": "https://api.mistral.ai/v1",
        "api_key": os.environ.get("MISTRAL_API_KEY"),
        "model": "mistral-small-latest",
    },
    {
        "name": "OpenRouter",
        "base_url": "https://openrouter.ai/api/v1",
        "api_key": os.environ.get("OPENROUTER_API_KEY"),
        "model": "openrouter/free",
    },
]

def chat_with_fallback(messages: list[dict], **kwargs) -> str:
    for provider in PROVIDERS:
        if not provider["api_key"]:
            continue
        try:
            client = OpenAI(
                base_url=provider["base_url"],
                api_key=provider["api_key"],
            )
            response = client.chat.completions.create(
                model=provider["model"],
                messages=messages,
                **kwargs,
            )
            return response.choices[0].message.content
        except RateLimitError:
            print(f"Rate limited on {provider['name']}, trying next...")
            continue
        except Exception as e:
            print(f"Error on {provider['name']}: {e}, trying next...")
            continue
    raise RuntimeError("All providers exhausted.")
```

Usage:

```python
answer = chat_with_fallback(
    messages=[{"role": "user", "content": "What is the speed of light?"}]
)
print(answer)
```
---

## OpenRouter Free Models Router

OpenRouter provides a special router that automatically selects available free models:

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Use the free router — picks from 29+ free models automatically
response = client.chat.completions.create(
    model="openrouter/free",
    messages=[{"role": "user", "content": "Explain recursion."}],
)

# Or use model fallbacks for priority ordering
response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "Explain recursion."}],
    extra_body={
        "route": "fallback",
        "models": [
            "deepseek/deepseek-r1",
            "meta-llama/llama-3.3-70b-instruct:free",
            "openrouter/free",
        ],
    },
)
```
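To see which models the free router can pick from, OpenRouter's public catalog endpoint lists every model with its pricing; free models price both prompt and completion at zero. A sketch (the pricing-string convention is an assumption worth verifying against the live response):

```python
import requests

def is_free(model: dict) -> bool:
    """A model is free when both prompt and completion prices are '0'."""
    pricing = model.get("pricing", {})
    return pricing.get("prompt") == "0" and pricing.get("completion") == "0"

def list_free_models() -> list[str]:
    """Fetch OpenRouter's public model catalog and keep zero-cost entries."""
    resp = requests.get("https://openrouter.ai/api/v1/models")
    resp.raise_for_status()
    return [m["id"] for m in resp.json()["data"] if is_free(m)]

# Example (network call, no API key needed for the catalog):
# print(list_free_models())
```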
---

## LangChain Integration

```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
import os

# Works with any OpenAI-compatible provider
llm = ChatOpenAI(
    model="llama-3.3-70b-versatile",
    openai_api_base="https://api.groq.com/openai/v1",
    openai_api_key=os.environ["GROQ_API_KEY"],
    temperature=0.7,
)
response = llm.invoke([HumanMessage(content="What are the SOLID principles?")])
print(response.content)

# Gemini via LangChain
gemini = ChatOpenAI(
    model="gemini-2.0-flash",
    openai_api_base="https://generativelanguage.googleapis.com/v1beta/openai/",
    openai_api_key=os.environ["GEMINI_API_KEY"],
)
```
---

## Rate Limit Reference
| Provider | RPM | RPD | Notes |
|---|---|---|---|
| Groq | 30 | 1,000 | 14,400 RPD for Llama 3.1 8B only |
| Cerebras | 30 | 14,400 | — |
| Gemini Flash | 15 | 1,500 | Not in EU/UK/CH |
| Gemini 2.5 Pro | 5 | 25 | Not in EU/UK/CH |
| GitHub Models | 10–15 | 50–150 | Varies by model tier |
| OpenRouter (free) | 20 | 50 | 1K RPD after $10+ purchase |
| Mistral | 1 req/s | — | 1B tokens/month cap |
| NVIDIA NIM | 40 | — | — |
| Cloudflare Workers AI | — | — | 10K neurons/day |
| Cohere | 20 | — | 1K requests/month |
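To stay under a provider's RPM ceiling proactively rather than reacting to 429s, a small client-side sliding-window limiter can gate requests before they are sent. A minimal sketch (the class name is illustrative; the injectable clock and sleep exist only to make it testable):

```python
import time
from collections import deque

class RpmLimiter:
    """Sliding-window limiter: block until a request slot is free."""

    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        self.rpm = rpm
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of requests in the last 60 s

    def acquire(self) -> None:
        """Wait until sending one more request stays within `rpm` per minute."""
        while True:
            now = self.clock()
            # Drop timestamps that have aged out of the 60-second window.
            while self.sent and now - self.sent[0] >= 60:
                self.sent.popleft()
            if len(self.sent) < self.rpm:
                self.sent.append(now)
                return
            # Sleep until the oldest request leaves the window.
            self.sleep(self.sent[0] + 60 - now)

# Usage: limiter = RpmLimiter(30); limiter.acquire() before each API call.
```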
## Common Troubleshooting
**AuthenticationError**
- Double-check the env var is set: `echo $GROQ_API_KEY`
- Ensure the key is for the correct provider
- Some providers (GitHub Models) require a classic PAT, not a fine-grained token

**RateLimitError**
- Implement exponential backoff or use the fallback pattern above
- Switch to a provider with higher limits (Cerebras: 14,400 RPD)
- For Groq, use `llama-3.1-8b-instant` for the 14,400 RPD limit

**Model not found**
- Check the exact model ID on the provider's docs/dashboard
- OpenRouter free models have a `:free` suffix: `meta-llama/llama-3.3-70b-instruct:free`
- Cloudflare models use a `@cf/` prefix: `@cf/meta/llama-3.3-70b-instruct-fp8-fast`

**Gemini free tier unavailable**
- The free tier is not available in EU, UK, or Switzerland
- Use a VPN or switch to a different provider like Groq or Mistral

**Ollama Cloud not working with OpenAI SDK**
- Ollama Cloud uses its own API format — use the `ollama` Python package or raw HTTP

**OpenRouter 50 RPD limit**
- Make a one-time $10 credit purchase to unlock 1,000 RPD for free models permanently
- Alternatively, use the `openrouter/free` router to distribute across all free models
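The exponential-backoff advice above can be sketched as a small wrapper; `with_backoff` is an illustrative helper, not a library function. In real use, pass the SDK's `RateLimitError` as the retryable type:

```python
import random
import time

def with_backoff(call, retries: int = 5, base: float = 1.0,
                 retryable=(Exception,), sleep=time.sleep):
    """Retry `call` with exponential backoff plus jitter on retryable errors."""
    for attempt in range(retries):
        try:
            return call()
        except retryable:
            if attempt == retries - 1:
                raise  # out of attempts, surface the last error
            # Delay doubles each attempt: base, 2*base, 4*base, ...
            sleep(base * 2 ** attempt + random.uniform(0, 0.5))

# e.g. with_backoff(
#     lambda: client.chat.completions.create(model=..., messages=...),
#     retryable=(RateLimitError,),
# )
```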
## Choosing the Right Provider
- **Need highest RPD?** → Cerebras (14,400 RPD)
- **Need smartest free model?** → Gemini 2.5 Pro (if not in EU/UK/CH)
- **Need EU-hosted?** → Mistral AI (France)
- **Need most model variety?** → OpenRouter (29+ free models) or Cloudflare (48+ models)
- **Need fastest inference?** → Groq (purpose-built inference chips)
- **Need reasoning model?** → DeepSeek-R1 on Groq/OpenRouter/Kluster AI
- **Need vision?** → Gemini Flash, Llama 4 Scout (Groq), GLM-4.6V-Flash (Zhipu)
- **No rate limit concern?** → Cloudflare (10K neurons/day, compute-based)