Kafka Schema Registry Skill
Scan a project to identify Kafka applications, extract schemas, generate Terraform for Schema Registry registration, and produce a comprehensive analysis report.
When to Use
Invoke this skill when:
- A user asks to analyze a project for Kafka usage in order to add event schemas or integrate Schema Registry
- A user wants to extract schemas from Kafka producers
- A user wants Terraform to register schemas to Schema Registry
- A user wants to audit Kafka producer/consumer configurations
Deliverables
This skill produces 3 outputs in the target project:
- schema-report.md: Full analysis report with findings, risks, and upgrade recommendations
- schemas/: Extracted schema files (Avro, JSON Schema, Protobuf) with PII tagging
- terraform/: Terraform configs using the Confluent provider to register schemas
Optional: Code Migration Assistance
If the user asks for their application code to be updated to integrate Schema Registry, use the Code Migration Reference to update the code with proper Schema Registry integration patterns.
High-Level Workflow
Phase 0: Initialize
- Check for existing and directory manually
- Note any existing schema infrastructure in the report
Phase 1: Project Scan & Kafka Detection
- Find build files: search for each language's build and dependency manifests
- Detect Kafka dependencies: look for Kafka client libraries declared in those files
- Find producers & consumers: grep for producer and consumer client usage
- Extract topic names: from string literals, config properties, YAML files
- Identify serializers: find the configured serializer classes, including custom serializers
- Build app catalog: compile findings (app name, language, role, topics, serializer, category)
Detailed patterns: Detection Patterns Reference
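As a rough illustration of the first two steps, the sketch below walks a project tree, collects build files, and keeps those that mention a Kafka client library. The build-file names and dependency markers are illustrative assumptions, not the skill's actual pattern list; the Detection Patterns Reference remains the source of truth.

```python
import re
from pathlib import Path

# Assumed build-file names and dependency markers (illustrative only).
BUILD_FILES = {"pom.xml", "build.gradle", "requirements.txt", "package.json", "go.mod"}
KAFKA_MARKERS = re.compile(r"kafka-clients|spring-kafka|confluent-kafka|kafkajs", re.IGNORECASE)


def find_kafka_build_files(project_root: str) -> list[Path]:
    """Return build files under project_root that mention a Kafka client library."""
    hits = []
    for path in sorted(Path(project_root).rglob("*")):
        if path.is_file() and path.name in BUILD_FILES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if KAFKA_MARKERS.search(text):
                hits.append(path)
    return hits
```

Each hit's parent directory is a candidate app for the catalog; in a monorepo this naturally yields one entry per service or module.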
App catalog structure:
```yaml
app_name: module name
language: Java | Python | .NET | Go | Node/TS
role: producer | consumer | both
topics: [list of topics]
serializer_class: value.serializer used
custom_serializer: true | false
schema_format: AVRO | JSON | PROTOBUF | UNKNOWN
sr_integrated: true | false
category: A | B | C | D | E  # REQUIRED
```
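If the catalog is held in memory while scanning, one possible representation is a small dataclass that mirrors the fields above and rejects entries with a missing category. The class name `AppCatalogEntry` and its validation rules are hypothetical, chosen only for this sketch.

```python
from dataclasses import dataclass, field

VALID_CATEGORIES = {"A", "B", "C", "D", "E"}
VALID_ROLES = {"producer", "consumer", "both"}


@dataclass
class AppCatalogEntry:
    """Hypothetical in-memory form of one app catalog entry."""
    app_name: str
    language: str
    role: str
    topics: list[str] = field(default_factory=list)
    serializer_class: str = ""
    custom_serializer: bool = False
    schema_format: str = "UNKNOWN"
    sr_integrated: bool = False
    category: str = ""  # REQUIRED: validated below

    def __post_init__(self) -> None:
        if self.role not in VALID_ROLES:
            raise ValueError(f"unknown role: {self.role!r}")
        if self.category not in VALID_CATEGORIES:
            raise ValueError("category is required and must be one of A, B, C, D, E")
```

Failing fast on a missing category keeps later phases (report table, Terraform comments) from silently emitting entries without a "Category X" label.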
Multi-schema topic detection:
- If multiple data models produce to the same topic, create a wrapper schema that combines them in a union
- Generate Terraform with blocks
- Flag prominently in report
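Detecting multi-schema topics amounts to grouping data models by topic and keeping the groups with more than one member. The sketch below does that and, as one assumed form of the wrapper, builds a top-level Avro union, which in Avro is a JSON array of member types. Function names are hypothetical, and the member records would still need to be registered and referenced separately.

```python
from collections import defaultdict


def find_multi_schema_topics(producers: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Return {topic: sorted model names} for topics fed by more than one data model.

    producers is a list of (topic, model_name) pairs.
    """
    models_by_topic: dict[str, set[str]] = defaultdict(set)
    for topic, model in producers:
        models_by_topic[topic].add(model)
    return {t: sorted(m) for t, m in models_by_topic.items() if len(m) > 1}


def avro_union(model_names: list[str]) -> list[str]:
    """Build an Avro union: a JSON array listing each named record type once."""
    return sorted(set(model_names))
```

For example, `find_multi_schema_topics([("orders", "OrderCreated"), ("orders", "OrderCancelled")])` reports `orders` with both models, and `avro_union` of those names gives the array to write into the wrapper schema file.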
Phase 2: Risk Detection
Search for:
- auto.register.schemas=true: Uncontrolled schema evolution (Category C)
- — Eases migration when set
- Custom serializers — Bypass SR entirely (Category E)
Record file path, line number, and affected topics for each occurrence.
Patterns: Detection Patterns Reference
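A minimal sketch of the auto-register search, recording file path and line number for each hit. The regular expression is an assumption that covers properties files, YAML, and quoted map entries; it does not resolve constants or collect affected topics, which would need a second pass over the same file.

```python
import re
from pathlib import Path

# Assumed pattern. Intended to match, for example:
#   auto.register.schemas=true
#   auto.register.schemas: true
#   "auto.register.schemas": True
AUTO_REGISTER = re.compile(r"auto\.register\.schemas[\"']?\s*[=:,]\s*[\"']?true", re.IGNORECASE)


def find_auto_register(project_root: str) -> list[tuple[str, int]]:
    """Return (relative path, 1-based line number) for each auto-register occurrence."""
    root = Path(project_root)
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if AUTO_REGISTER.search(line):
                findings.append((path.relative_to(root).as_posix(), lineno))
    return findings
```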
Phase 3: Schema Inference
For each producer:
- Check for existing schema files: Avro, JSON Schema, and Protobuf definitions already in the project
- Infer from data models: Java classes, Pydantic models, TypeScript interfaces, Go structs
- Infer from inline data: HashMap, dict literals, map[string]any, plain objects, JSON strings
- Convert to schemas: map language types to JSON Schema / Avro / Protobuf
- Tag PII fields: scan field names for PII indicators
PII tagging: Add a tag (such as PII, PRIVATE, or SENSITIVE, the tags defined in tags.tf) to each detected field.
Detailed inference patterns: Schema Inference Reference
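To make the inference step concrete, here is a sketch that derives an Avro record from a Python dataclass with primitive fields and tags fields whose names look like PII. The type mapping, the name hints, the `com.example` namespace, and the use of a `confluent:tags` field property are all assumptions for illustration; the Schema Inference Reference defines the real rules.

```python
import re
from dataclasses import dataclass, fields
from typing import get_type_hints

# Assumed type mapping and PII name hints (illustrative only).
AVRO_TYPES = {str: "string", int: "long", float: "double", bool: "boolean", bytes: "bytes"}
PII_HINT = re.compile(r"email|phone|ssn|address", re.IGNORECASE)


def infer_avro_schema(model: type, namespace: str = "com.example") -> dict:
    """Derive an Avro record schema from a dataclass with primitive fields."""
    hints = get_type_hints(model)
    avro_fields = []
    for f in fields(model):
        avro_field = {"name": f.name, "type": AVRO_TYPES[hints[f.name]]}
        if PII_HINT.search(f.name):
            avro_field["confluent:tags"] = ["PII"]  # assumed tagging convention
        avro_fields.append(avro_field)
    return {"type": "record", "name": model.__name__, "namespace": namespace, "fields": avro_fields}


@dataclass
class UserSignedUp:  # hypothetical data model
    user_id: int
    email: str
    active: bool
```

Calling `infer_avro_schema(UserSignedUp)` yields a record with three fields, of which only `email` carries the tag. Nested types, optionals, and collections are deliberately out of scope here.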
Phase 4: Categorize Producers
Classify each producer:
| Category | Criteria |
|---|---|
| A: Compliant | Confluent serializer + SR + no auto.register |
| A→Header | Already on SR, migrating to headers |
| B: Schema in code, no SR | Data models exist, but no SR integration |
| C: Auto-register | auto.register.schemas=true |
| D: No schema | Raw strings/bytes, no data model |
| E: Custom serializer | Custom or inline serialization without SR |
CRITICAL: Use exact phrase "Category X" in:
- App catalog category field
- Applications Discovered table
- Report section headers
- Terraform comments
- Risk sections
Details: Categorization Reference
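The table can be read as a decision procedure. The sketch below is one possible encoding; the order in which the checks run (auto-register first, then custom serializers) is an assumption, and the A→Header case is left out because it depends on migration intent rather than on anything detectable in code.

```python
def categorize(sr_integrated: bool, auto_register: bool,
               custom_serializer: bool, has_data_model: bool) -> str:
    """Return the exact "Category X" label for a producer.

    Precedence is assumed: C, then E, then A, then B, then D.
    """
    if auto_register:
        return "Category C"
    if custom_serializer and not sr_integrated:
        return "Category E"
    if sr_integrated:
        return "Category A"
    if has_data_model:
        return "Category B"
    return "Category D"
```

Returning the full "Category X" string, rather than a bare letter, makes it harder to drop the required phrase when the value is copied into the report or Terraform comments.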
Phase 5: Create Schema Files
Directory structure:
```
schemas/
├── avro/
│   └── {topic}-value.avsc
├── json/
│   └── {topic}-value.json
└── proto/
    └── {topic}-value.proto
```
File naming: MUST use kebab-case (lowercase with hyphens):
- Value: {topic}-value.{ext}
- Key: {topic}-key.{ext}
- Example: user-notifications-value.json
Validate: Call schema_lint(path: schemas/, fix: true) if available.
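The naming rule can be sketched as a small helper that converts a topic name to kebab-case and places the file in the matching format directory. The helper names and the handling of camelCase, dots, and underscores are assumptions; only the directory layout and the `-value` / `-key` suffixes come from the structure above.

```python
import re

EXTENSIONS = {"AVRO": "avsc", "JSON": "json", "PROTOBUF": "proto"}
SUBDIRS = {"AVRO": "avro", "JSON": "json", "PROTOBUF": "proto"}


def to_kebab_case(name: str) -> str:
    """Lowercase a topic name and join its words with single hyphens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1-\2", name)  # split camelCase boundaries
    return re.sub(r"[^A-Za-z0-9]+", "-", spaced).strip("-").lower()


def schema_path(topic: str, schema_format: str, part: str = "value") -> str:
    """Return the path under schemas/ for a topic's key or value schema."""
    if part not in ("key", "value"):
        raise ValueError("part must be 'key' or 'value'")
    fmt = schema_format.upper()
    return f"schemas/{SUBDIRS[fmt]}/{to_kebab_case(topic)}-{part}.{EXTENSIONS[fmt]}"
```

With these rules, `schema_path("user_notifications", "json")` gives `schemas/json/user-notifications-value.json`. Note that this only affects the file name; the subject registered in Schema Registry should still use the topic's real name.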
Phase 6: Generate Terraform
File structure (MANDATORY separate files):
```
terraform/
├── providers.tf              # Provider config
├── variables.tf              # Variable definitions
├── tags.tf                   # confluent_tag resources (if PII exists)
├── schemas.tf                # Active schemas (A, B, E)
├── flagged-auto-register.tf  # Category C only (commented out)
├── outputs.tf                # Output values
└── import.sh                 # Import script
```
CRITICAL:
- schemas.tf = Categories A, B, E; NOT commented out
- flagged-auto-register.tf = Category C ONLY; MUST be commented out
- tags.tf = MUST exist if ANY schema uses tags (for example, PII)
- Each schema resource MUST have comment block: Topic, App, Source, Category
Templates: Terraform Templates Reference
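As an illustration of the comment-block and commenting-out rules, the sketch below renders one `confluent_schema` resource as text. It is not the skill's template: the function name and resource naming are hypothetical, and the output omits the Schema Registry cluster, endpoint, and credential settings, on the assumption that they are configured at the provider level.

```python
def render_schema_resource(topic: str, app: str, source: str, category: str,
                           schema_file: str, schema_format: str) -> str:
    """Render one confluent_schema resource with the required comment block."""
    resource_name = topic.replace("-", "_").replace(".", "_") + "_value"
    lines = [
        f"# Topic: {topic}",
        f"# App: {app}",
        f"# Source: {source}",
        f"# Category: Category {category}",
        f'resource "confluent_schema" "{resource_name}" {{',
        f'  subject_name = "{topic}-value"',
        f'  format       = "{schema_format}"',
        f'  schema       = file("{schema_file}")',
        "}",
    ]
    if category == "C":
        # Category C resources are emitted fully commented out.
        lines = [line if line.startswith("#") else f"# {line}" for line in lines]
    return "\n".join(lines) + "\n"
```

Output for categories A, B, and E would be appended to schemas.tf, and Category C output to flagged-auto-register.tf.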
Phase 7: Generate Report
- Executive Summary (metrics + category breakdown)
- Applications Discovered table (EXACT format, Category column MANDATORY)
- RISKS (auto-register, custom serializers)
- Producer Upgrade Recommendations (per app, with "Category X" in heading)
- Migration Rollout Ordering (by category)
- PII Fields Detected
- Terraform Resources Generated
- Next Steps checklist
CRITICAL formatting requirements:
- Applications Discovered = markdown table, NOT narrative sections
- Every app section MUST say "Category X" explicitly
- Terraform comment blocks required for every resource
Template: Report Template Reference
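The Applications Discovered requirement (a markdown table with a mandatory Category column) could be generated from the app catalog along these lines. The column set other than Category is an assumption; the Report Template Reference defines the exact format.

```python
def applications_table(apps: list[dict]) -> str:
    """Render catalog entries as a markdown table with an explicit Category column."""
    header = "| App | Language | Role | Topics | Serializer | Category |"
    divider = "|---|---|---|---|---|---|"
    rows = [
        f"| {a['app_name']} | {a['language']} | {a['role']} | "
        f"{', '.join(a['topics'])} | {a['serializer_class']} | Category {a['category']} |"
        for a in apps
    ]
    return "\n".join([header, divider, *rows])
```

Each row spells out "Category X" in full, which satisfies the exact-phrase rule from Phase 4.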
Migration Rollout by Category
- Category B (JSON, no SR): Producers first → consumers
- Category A→Header (already on SR): Verify consumer versions → producers only
- Category C (auto-register): Register via Terraform → disable auto-register → producers fetch latest
- Category E (custom serializers): Consumers first (composite deserializer) → producers
Details: Categorization Reference
Edge Cases
- Monorepos: Treat each service/module with Kafka deps as separate app
- Multi-topic producers: Generate one schema resource per topic
- Shared schemas: One schema file, multiple Terraform resources reference it
- No topic names: If loaded from env vars, use placeholders with TODO
- Test code: Skip test directories unless they contain only schema definitions
- Multiple serializers: Create separate schema files per format
Output Organization
```
{project_root}/
├── schema-report.md              # Analysis report
├── schemas/
│   ├── schema.yaml               # Schema project config
│   ├── avro/
│   │   └── {topic}-value.avsc
│   ├── json/
│   │   └── {topic}-value.json
│   └── proto/
│       └── {topic}-value.proto
└── terraform/
    ├── providers.tf
    ├── variables.tf
    ├── tags.tf                   # PII/PRIVATE/SENSITIVE tags
    ├── schemas.tf                # Active schemas (depends_on tags)
    ├── flagged-auto-register.tf  # Commented-out Category C
    ├── outputs.tf
    └── import.sh                 # Import existing schemas
```
Reference Documentation
- Detection Patterns — Patterns for finding Kafka apps, dependencies, producers, consumers, serializers
- Schema Inference — Extract schemas from data models, inline data, PII tagging
- Categorization — Category definitions, rollout order, client version requirements
- Terraform Templates — File structure, templates, naming conventions
- Report Template — Required sections, formatting rules, validation checklist
- Code Migration — Serializer/deserializer implementation patterns for Python, Java, JavaScript, Go, and .NET
Execution Approach
- Use Glob to find build files and schema files
- Use Grep for pattern detection (dependencies, producers, serializers, risks)
- Use Read to inspect source files and data models
- Use Write to create schema files, Terraform configs, and report
There is no need to use the Agent tool; this skill is self-contained and uses direct tool calls.