kafka-schema-registry

Scan a project to identify Kafka applications, extract schemas from data models, tag PII fields, generate Terraform for Confluent Schema Registry registration, and produce a migration report with rollout ordering. Use this skill when a user asks to analyze a folder or repo for Kafka usage, extract schemas, audit producer/consumer configurations, or generate Terraform for Schema Registry.

Provider: Confluent | the Data Streaming Platform
Path in repo: skills/kafka-schema-registry/SKILL.md

Skill body

Kafka Schema Registry Skill

Scan a project to identify Kafka applications, extract schemas, generate Terraform for Schema Registry registration, and produce a comprehensive analysis report.

When to Use

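Invoke this skill when a user asks to analyze a folder or repo for Kafka usage, extract schemas, audit producer/consumer configurations, or generate Terraform for Schema Registry.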

Deliverables

This skill produces 3 outputs in the target project:

  1. schema-report.md — Full analysis report with findings, risks, and upgrade recommendations
  2. schemas/ — Extracted schema files (Avro, JSON Schema, Protobuf) with PII tagging
  3. terraform/ — Terraform configs using Confluent provider to register schemas

Optional: Code Migration Assistance

If the user asks to have their application code updated to integrate Schema Registry, follow the Code Migration Reference and apply its Schema Registry integration patterns.


High-Level Workflow

Phase 0: Initialize

Phase 1: Project Scan & Kafka Detection

  1. Find build files — Search for pom.xml, build.gradle, requirements.txt, package.json, etc.
  2. Detect Kafka dependencies — Look for spring-kafka, confluent-kafka, kafkajs, etc.
  3. Find producers & consumers — Grep for KafkaTemplate, Producer(, producer.send, etc.
  4. Extract topic names — From string literals, config properties, YAML files
  5. Identify serializers — Find value.serializer, KafkaAvroSerializer, custom serializers
  6. Build app catalog — Compile findings: app name, language, role, topics, serializer, category
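Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `find_kafka_build_files` is hypothetical, and the build-file and dependency lists cover only the examples named in the steps.

```python
from pathlib import Path

# Build-file names and dependency markers taken from the steps above;
# both lists are partial and should be extended for a real project.
BUILD_FILES = {"pom.xml", "build.gradle", "requirements.txt", "package.json"}
KAFKA_MARKERS = ("spring-kafka", "confluent-kafka", "kafkajs")


def find_kafka_build_files(root: Path) -> dict[str, list[str]]:
    """Map each build file under root to the Kafka markers it mentions."""
    hits: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.name not in BUILD_FILES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        found = [marker for marker in KAFKA_MARKERS if marker in text]
        if found:
            # Store paths relative to the project root, with forward slashes.
            hits[path.relative_to(root).as_posix()] = found
    return hits
```

A plain substring match is deliberately loose here. It can over-report (for example, a marker mentioned in a comment), which is acceptable for a first pass that later steps confirm by reading the source.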

Detailed patterns: Detection Patterns Reference

App catalog structure:

app_name: module name
language: Java | Python | .NET | Go | Node/TS
role: producer | consumer | both
topics: [list of topics]
serializer_class: value.serializer used
custom_serializer: true | false
schema_format: AVRO | JSON | PROTOBUF | UNKNOWN
sr_integrated: true | false
category: A | B | C | D | E  # REQUIRED
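One way to hold a catalog entry in memory is a small dataclass that mirrors the fields above and rejects values outside the listed options. The class name `AppCatalogEntry` and the validation behavior are assumptions made for illustration; the skill itself only specifies the field names and allowed values.

```python
from dataclasses import dataclass, field

ROLES = {"producer", "consumer", "both"}
SCHEMA_FORMATS = {"AVRO", "JSON", "PROTOBUF", "UNKNOWN"}
CATEGORIES = {"A", "B", "C", "D", "E"}


@dataclass
class AppCatalogEntry:
    """One row of the app catalog (hypothetical in-memory representation)."""
    app_name: str
    language: str
    role: str
    category: str  # REQUIRED, so it has no default
    topics: list[str] = field(default_factory=list)
    serializer_class: str = ""
    custom_serializer: bool = False
    schema_format: str = "UNKNOWN"
    sr_integrated: bool = False

    def __post_init__(self) -> None:
        # Fail early on values the catalog structure does not allow.
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role!r}")
        if self.schema_format not in SCHEMA_FORMATS:
            raise ValueError(f"unknown schema format: {self.schema_format!r}")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
```

Making `category` a required positional field reflects the REQUIRED marker in the structure: an entry cannot be constructed without it.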

Multi-schema topic detection:

Phase 2: Risk Detection

Search for:

Record file path, line number, and affected topics for each occurrence.
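Recording occurrences could look something like the sketch below. The only pattern shown is `auto.register.schemas=true`, borrowed from the Category C criterion; the full list of risk patterns lives in the Detection Patterns Reference and is not reproduced here. `RiskHit` and `scan_risks` are hypothetical names, and mapping a hit to its affected topics is left out.

```python
import re
from pathlib import Path
from typing import NamedTuple


class RiskHit(NamedTuple):
    path: str   # file path relative to the scanned root
    line: int   # 1-based line number
    text: str   # the matching line, stripped


# Matches forms such as `auto.register.schemas=true` or `auto.register.schemas: true`.
AUTO_REGISTER = re.compile(r"auto\.register\.schemas\W+true", re.IGNORECASE)


def scan_risks(root: Path, pattern: re.Pattern = AUTO_REGISTER) -> list[RiskHit]:
    """Return every line under root that matches the given risk pattern."""
    hits: list[RiskHit] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(lines, start=1):
            if pattern.search(line):
                hits.append(RiskHit(path.relative_to(root).as_posix(), number, line.strip()))
    return hits
```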

Patterns: Detection Patterns Reference

Phase 3: Schema Inference

For each producer:

  1. Check for existing schema files: **/*.avsc, **/*.proto, **/*.schema.json
  2. Infer from data models — Java classes, Pydantic models, TypeScript interfaces, Go structs
  3. Infer from inline data — HashMap, dict literals, map[string]any, plain objects, JSON strings
  4. Convert to schemas — Map language types to JSON Schema / Avro / Protobuf
  5. Tag PII fields — Scan field names for email, ssn, phone, address, etc.
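As an illustration of steps 2 and 4, the sketch below maps a Python dataclass to an Avro record schema. The type mapping is an assumption (for instance, `int` becomes Avro `long`), it handles primitive fields only, and `OrderPlaced`, `dataclass_to_avro`, and the namespace are made-up names; the Schema Inference Reference remains the authority on real mappings.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints

# Assumed primitive mapping; nested, optional, and collection types are not handled.
AVRO_PRIMITIVES = {str: "string", int: "long", float: "double", bool: "boolean", bytes: "bytes"}


def dataclass_to_avro(model: type, namespace: str) -> dict:
    """Build an Avro record schema from a dataclass with primitive fields."""
    hints = get_type_hints(model)
    avro_fields = []
    for f in fields(model):
        py_type = hints[f.name]
        if py_type not in AVRO_PRIMITIVES:
            raise TypeError(f"unsupported type for field {f.name!r}: {py_type!r}")
        avro_fields.append({"name": f.name, "type": AVRO_PRIMITIVES[py_type]})
    return {"type": "record", "name": model.__name__, "namespace": namespace, "fields": avro_fields}


@dataclass
class OrderPlaced:  # hypothetical data model found in a producer
    order_id: str
    quantity: int
    total: float
```

Calling `dataclass_to_avro(OrderPlaced, "com.example.orders")` yields a record with three fields typed `string`, `long`, and `double`. Raising on unsupported types, rather than guessing, keeps an inferred schema from silently misdescribing the data.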

PII tagging: Add confluent:tags (PII, PRIVATE, SENSITIVE, PHI) to detected fields.
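Name-based tagging might be sketched as below for an Avro record held as a Python dict. The keyword list contains only the four examples named in step 5, and every match receives the single tag `PII`; how field names map to PRIVATE, SENSITIVE, or PHI is not specified here, so that choice is an assumption. `tag_pii_fields` is a hypothetical helper name.

```python
import copy

# Keywords from step 5; the list is illustrative, not exhaustive.
PII_KEYWORDS = ("email", "ssn", "phone", "address")


def tag_pii_fields(schema: dict, tag: str = "PII") -> dict:
    """Return a copy of an Avro record schema with confluent:tags on matching fields."""
    tagged = copy.deepcopy(schema)  # leave the caller's schema untouched
    for field in tagged.get("fields", []):
        name = field["name"].lower()
        if any(keyword in name for keyword in PII_KEYWORDS):
            tags = field.setdefault("confluent:tags", [])
            if tag not in tags:
                tags.append(tag)
    return tagged
```

Substring matching catches names like `customer_email` and `billingAddress`, at the cost of false positives such as `ip_address_count`, so tagged fields are worth a manual review.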

Detailed inference patterns: Schema Inference Reference

Phase 4: Categorize Producers

Classify each producer:

Category                    Criteria
A: Compliant                Confluent serializer + SR + no auto.register
A→Header                    Already on SR, migrating to headers
B: Schema in code, no SR    Data models exist, but no SR integration
C: Auto-register            auto.register.schemas=true
D: No schema                Raw strings/bytes, no data model
E: Custom serializer        Custom Serializer<T> or inline serialization without SR
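The criteria can be read as a small decision function. The sketch below is one possible reading: the order in which the checks run is an assumption (the criteria do not say which category wins when several apply), the A→Header variant is omitted, and `categorize` with its boolean parameters is hypothetical. The Categorization Reference is the source of truth.

```python
def categorize(
    sr_integrated: bool,
    auto_register: bool,
    custom_serializer: bool,
    has_data_model: bool,
) -> str:
    """Return the category letter for one producer (assumed precedence: E, C, A, B, D)."""
    if custom_serializer and not sr_integrated:
        return "E"  # custom Serializer<T> or inline serialization without SR
    if auto_register:
        return "C"  # auto.register.schemas=true
    if sr_integrated:
        return "A"  # Confluent serializer + SR + no auto.register
    if has_data_model:
        return "B"  # data models exist, but no SR integration
    return "D"      # raw strings/bytes, no data model
```

For example, a producer with data models, no Schema Registry integration, and no custom serializer comes out as Category B.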

CRITICAL: Use exact phrase “Category X” in:

Details: Categorization Reference

Phase 5: Create Schema Files

Directory structure:

schemas/
├── avro/
│   └── {topic}-value.avsc
├── json/
│   └── {topic}-value.json
└── proto/
    └── {topic}-value.proto

File naming: MUST use kebab-case (lowercase with hyphens).
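A possible way to derive a kebab-case file path from a topic name and schema format is sketched below. The helper names `kebab_case` and `schema_path` are hypothetical, and the handling of camelCase and punctuation is an assumption about what kebab-case should mean for topic names.

```python
import re

# Schema format -> (directory, file extension), following the tree above.
EXTENSIONS = {"AVRO": ("avro", "avsc"), "JSON": ("json", "json"), "PROTOBUF": ("proto", "proto")}


def kebab_case(name: str) -> str:
    """Lowercase a name and join its words with hyphens."""
    # Insert a hyphen at camelCase boundaries, e.g. orderEvents -> order-Events.
    split = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", name)
    # Collapse dots, underscores, and other separators into single hyphens.
    hyphenated = re.sub(r"[^A-Za-z0-9]+", "-", split)
    return hyphenated.strip("-").lower()


def schema_path(topic: str, schema_format: str) -> str:
    """Return the schema file path for a topic's value schema."""
    directory, extension = EXTENSIONS[schema_format]
    return f"schemas/{directory}/{kebab_case(topic)}-value.{extension}"
```

Only the file name is normalized; the topic name itself, and therefore the Schema Registry subject, is left unchanged.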

Initialize: Create schema.yaml.

Validate: Call schema_lint(path: schemas/, fix: true) if available.

Phase 6: Generate Terraform

File structure (MANDATORY separate files):

terraform/
├── providers.tf              # Provider config
├── variables.tf              # Variable definitions
├── tags.tf                   # confluent_tag resources (if PII exists)
├── schemas.tf                # Active schemas (A, B, E)
├── flagged-auto-register.tf  # Category C only (commented out)
├── outputs.tf                # Output values
└── import.sh                 # Import script
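To illustrate how entries for schemas.tf and flagged-auto-register.tf might be produced, the sketch below renders a `confluent_schema` resource as text and comments it out for Category C. It assumes the `{topic}-value` subject naming implied by the schema file names and assumes the Schema Registry endpoint and credentials are configured in providers.tf; `render_schema_resource` is a hypothetical helper, and the Terraform Templates Reference defines the real templates.

```python
import re


def render_schema_resource(topic: str, schema_format: str, schema_file: str, category: str) -> str:
    """Render one confluent_schema resource block, commented out for Category C."""
    # Terraform resource labels allow letters, digits, underscores, and hyphens.
    label = re.sub(r"[^a-z0-9]+", "_", topic.lower()).strip("_") + "_value"
    lines = [
        f"# Topic: {topic} (Category {category})",
        f'resource "confluent_schema" "{label}" {{',
        f'  subject_name = "{topic}-value"',
        f'  format       = "{schema_format}"',
        f'  schema       = file("{schema_file}")',
        "}",
    ]
    if category == "C":
        # Category C resources are emitted but disabled until reviewed.
        lines = [lines[0]] + ["# " + line for line in lines[1:]]
    return "\n".join(lines) + "\n"
```

Keeping the leading comment line on every block, including the disabled ones, preserves the topic and category context next to each resource.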

CRITICAL:

Templates: Terraform Templates Reference

Phase 7: Generate Report

Create schema-report.md with:

CRITICAL formatting requirements:

  1. Applications Discovered = markdown table, NOT narrative sections
  2. Every app section MUST say “Category X” explicitly
  3. Terraform comment blocks required for every resource
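Requirements 1 and 2 could be met with a small renderer like the one below, which emits the Applications Discovered section as a markdown table and writes the exact phrase "Category X" in each row. The column set is an assumption, since the Report Template Reference defines the real layout, and `render_apps_table` is a hypothetical name.

```python
def render_apps_table(apps: list[dict]) -> str:
    """Render catalog entries as a markdown table with explicit category labels."""
    header = "| App | Language | Role | Topics | Category |"
    divider = "| --- | --- | --- | --- | --- |"
    rows = [
        "| {app_name} | {language} | {role} | {topics} | Category {category} |".format(
            app_name=app["app_name"],
            language=app["language"],
            role=app["role"],
            topics=", ".join(app["topics"]),
            category=app["category"],
        )
        for app in apps
    ]
    return "\n".join([header, divider, *rows])
```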

Template: Report Template Reference


Migration Rollout by Category

Details: Categorization Reference


Edge Cases


Output Organization

{project_root}/
├── schema-report.md              # Analysis report
├── schemas/
│   ├── schema.yaml               # Schema project config
│   ├── avro/
│   │   └── {topic}-value.avsc
│   ├── json/
│   │   └── {topic}-value.json
│   └── proto/
│       └── {topic}-value.proto
└── terraform/
    ├── providers.tf
    ├── variables.tf
    ├── tags.tf                    # PII/PRIVATE/SENSITIVE tags
    ├── schemas.tf                 # Active schemas (depends_on tags)
    ├── flagged-auto-register.tf   # Commented-out Category C
    ├── outputs.tf
    └── import.sh                  # Import existing schemas

Reference Documentation


Execution Approach

  1. Use Glob to find build files and schema files
  2. Use Grep for pattern detection (dependencies, producers, serializers, risks)
  3. Use Read to inspect source files and data models
  4. Use Write to create schema files, Terraform configs, and report

The Agent tool is not needed: this skill is self-contained and uses direct tool calls.