# z-ai-api

Z.ai API integration for building applications with GLM models. Use when working with Z.ai/ZhipuAI APIs for: (1) Chat completions with GLM-4.7/4.6/4.5 models, (2) Vision/multimodal tasks with GLM-4.6V, (3) Image generation with GLM-Image or CogView-4, (4) Video generation with CogVideoX-3 or Vidu models, (5) Audio transcription with GLM-ASR-2512, (6) Function calling and tool use, (7) Web search integration, (8) Translation, slide/poster generation agents. Triggers: Z.ai, ZhipuAI, GLM, BigModel, Zhipu, CogVideoX, CogView, Vidu.

Install:

```shell
npx skill4agent add jrajasekera/claude-skills z-ai-api
```

# Z.ai API Skill
## Quick Reference

- Base URL: `https://api.z.ai/api/paas/v4`
- Coding Plan URL: `https://api.z.ai/api/coding/paas/v4`
- Auth: `Authorization: Bearer YOUR_API_KEY`

## Core Endpoints
| Endpoint | Purpose |
|---|---|
| `/chat/completions` | Text/vision chat |
| `/images/generations` | Image generation |
| `/videos/generations` | Video generation (async) |
| `/audio/transcriptions` | Speech-to-text |
| `/web_search` | Web search |
| `/async-result/{id}` | Poll async tasks |
| `/agents` | Translation, slides, effects |
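All endpoints share bearer-token auth, so a raw HTTP call needs only the base URL and key. A minimal no-SDK sketch (the `/chat/completions` path assumes the OpenAI-compatible layout used by the SDK examples below; verify against the API reference):

```python
import json
import urllib.request

BASE_URL = "https://api.z.ai/api/paas/v4"

def build_chat_request(api_key, model, messages):
    """Build a urllib Request for POST /chat/completions with bearer auth."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("YOUR_KEY", "glm-4.7",
                         [{"role": "user", "content": "Hello!"}])
print(req.full_url)  # https://api.z.ai/api/paas/v4/chat/completions
# Send with urllib.request.urlopen(req) once a real key is in place.
```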
## Model Selection
Chat (pick by need):
- `glm-4.7` — Latest flagship, best quality, agentic coding
- `glm-4.7-flash` — Fast, high quality
- `glm-4.6` — Reliable general use
- `glm-4.5-flash` — Fastest, lower cost
Vision:
- `glm-4.6v` — Best multimodal (images, video, files)
- `glm-4.6v-flash` — Fast vision
Media:
- `glm-image` — High-quality images (HD, ~20s)
- `cogview-4-250304` — Fast images (~5-10s)
- `cogvideox-3` — Video, up to 4K, 5-10s
- `viduq1-text/image` — Vidu video generation
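The selection guidance above can be encoded as a small lookup table. The task labels here are illustrative (only the model IDs come from the lists above):

```python
# Task label -> model ID, following the selection guidance above.
MODEL_FOR_TASK = {
    "flagship_chat": "glm-4.7",      # best quality, agentic coding
    "fast_chat": "glm-4.7-flash",    # fast, high quality
    "general_chat": "glm-4.6",       # reliable general use
    "cheap_chat": "glm-4.5-flash",   # fastest, lower cost
    "vision": "glm-4.6v",            # best multimodal
    "fast_vision": "glm-4.6v-flash",
    "image_hd": "glm-image",
    "image_fast": "cogview-4-250304",
    "video": "cogvideox-3",
}

def pick_model(task: str) -> str:
    """Return the model for a task label, defaulting to the general chat model."""
    return MODEL_FOR_TASK.get(task, "glm-4.6")

print(pick_model("vision"))  # glm-4.6v
```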
## Implementation Patterns

### Basic Chat

```python
from zai import ZaiClient

client = ZaiClient(api_key="YOUR_KEY")
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
```

### OpenAI SDK Compatibility
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_KEY",
    base_url="https://api.z.ai/api/paas/v4/"
)
# Use exactly like the OpenAI SDK
```

### Streaming
python
response = client.chat.completions.create(
model="glm-4.7",
messages=[...],
stream=True
)
for chunk in response:
print(chunk.choices[0].delta.content, end="")Function Calling
```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)
# Handle tool_calls in response.choices[0].message.tool_calls
```

### Vision (Images/Video/Files)
```python
response = client.chat.completions.create(
    model="glm-4.6v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://..."}},
            {"type": "text", "text": "Describe this image"}
        ]
    }]
)
```

### Image Generation
```python
response = client.images.generate(
    model="glm-image",
    prompt="A serene mountain at sunset",
    size="1280x1280",
    quality="hd"
)
print(response.data[0].url)  # Expires in 30 days
```

### Video Generation (Async)
```python
import time

# Submit
response = client.videos.generate(
    model="cogvideox-3",
    prompt="A cat playing with yarn",
    size="1920x1080",
    duration=5
)
task_id = response.id

# Poll for the result; bail out on failure instead of looping forever
while True:
    result = client.async_result.get(task_id)
    if result.task_status == "SUCCESS":
        print(result.video_result[0].url)
        break
    if result.task_status == "FAIL":
        raise RuntimeError("Video generation failed")
    time.sleep(5)
```

### Web Search Integration
```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Latest AI news?"}],
    tools=[{
        "type": "web_search",
        "web_search": {
            "enable": True,
            "search_result": True
        }
    }]
)
# Access response.web_search for sources
```

### Thinking Mode (Chain-of-Thought)
```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[...],
    thinking={"type": "enabled"},
    stream=True  # Recommended with thinking
)
# Access reasoning_content in the response
```

## Key Parameters
| Parameter | Values | Notes |
|---|---|---|
| `temperature` | 0.0-1.0 | GLM-4.7: 1.0, GLM-4.5: 0.6 default |
| `top_p` | 0.01-1.0 | Default ~0.95 |
| `max_tokens` | varies | GLM-4.7: 128K, GLM-4.5: 96K max |
| `stream` | bool | Enable SSE streaming |
| `response_format` | `{"type": "json_object"}` | Force JSON output |
## Error Handling
- 429: Rate limited — implement exponential backoff
- 401: Bad API key — verify credentials
- sensitive: Content filtered — modify input
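The 429 bullet can be sketched as a small retry wrapper with exponential backoff plus jitter. The string match on `"429"` is a placeholder; in real code, catch the SDK's specific rate-limit exception class instead:

```python
import random
import time

def with_backoff(call, max_retries=5):
    """Retry `call` on rate-limit errors, sleeping 2**attempt + jitter between tries."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            # Placeholder check; match the SDK's rate-limit error type in practice.
            if "429" not in str(exc) or attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt + random.random())

# Usage: result = with_backoff(lambda: client.chat.completions.create(...))
```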
```python
finish_reason = response.choices[0].finish_reason
if finish_reason == "tool_calls":
    ...  # Execute the function and continue the conversation
elif finish_reason == "length":
    ...  # Increase max_tokens or truncate
elif finish_reason == "sensitive":
    ...  # Content was filtered
```

## Reference Files
For detailed API specifications, consult:
- `references/chat-completions.md` — Full chat API, parameters, models
- `references/tools-and-functions.md` — Function calling, web search, retrieval
- `references/media-generation.md` — Image, video, audio APIs
- `references/agents.md` — Translation, slides, effects agents
- `references/error-codes.md` — Error handling, rate limits