Note: Your files never leave your device. We don't upload, transfer, or store your data.
Apache Avro is a row-oriented remote procedure call and data serialization framework developed within the Apache Hadoop project. It uses JSON-based schemas to define data structures and a compact binary encoding for efficient storage and transmission. Avro is the serialization format most commonly paired with Apache Kafka through Confluent Schema Registry, and is widely used in Apache Spark, Apache Flink, and AWS services.
The CSV to Avro Schema Converter on A.Tools reads your CSV file, analyzes the data in each column, and generates a complete Avro schema in JSON format with automatically inferred types.
All processing runs locally in your browser. No data leaves your device.
The tool examines the actual values in each CSV column and maps them to Avro primitive types:
| Data Pattern | Avro Type | Example Values |
|---|---|---|
| Text, mixed characters | string | "Alice", "NYC" |
| Whole numbers (small) | int | 30, -5, 145 |
| Whole numbers (large) | long | 9223372036854775807 |
| Decimal numbers | float / double | 3.14, 99.99 |
| Boolean | boolean | true, false |
| Empty cells | ["null", "type"] (union) | (empty) |
When a column contains empty cells, the tool generates an Avro union type to allow null values:
{
"name": "age",
"type": ["null", "int"]
}
This is essential for real-world data where some fields may be missing.
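The inference rules above can be approximated in a few lines of Python. This is a minimal sketch, not the tool's actual implementation: the function names `infer_value_type` and `infer_column_type` are hypothetical, and the fallback for an all-empty column is an assumption.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1      # Avro int: 32-bit signed
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1    # Avro long: 64-bit signed


def infer_value_type(text):
    """Classify one non-empty CSV cell as an Avro primitive type name."""
    if text.lower() in ("true", "false"):
        return "boolean"
    try:
        number = int(text)
    except ValueError:
        pass
    else:
        if INT_MIN <= number <= INT_MAX:
            return "int"
        # Integers too large for a long are kept as text (assumption).
        return "long" if LONG_MIN <= number <= LONG_MAX else "string"
    try:
        float(text)
        return "double"
    except ValueError:
        return "string"


def infer_column_type(cells):
    """Pick the narrowest type that fits every non-empty cell in a column."""
    non_empty = [cell for cell in cells if cell != ""]
    kinds = {infer_value_type(cell) for cell in non_empty}
    if not kinds:
        base = "string"  # assumption: an all-empty column falls back to string
    elif len(kinds) == 1:
        base = kinds.pop()
    elif kinds <= {"int", "long"}:
        base = "long"
    elif kinds <= {"int", "long", "double"}:
        base = "double"
    else:
        base = "string"
    # Any empty cell makes the field nullable via a union with "null".
    return ["null", base] if len(non_empty) < len(cells) else base
```

Note that Python's `int()` and `float()` accept a few spellings a stricter parser might reject (for example `1_000` or `inf`), so a production version would likely validate cells with an explicit pattern first.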
The output follows the standard Avro schema specification:
{
"type": "record",
"name": "MyRecord",
"namespace": "com.example",
"fields": [
{"name": "id", "type": "int"},
{"name": "name", "type": "string"},
{"name": "price", "type": "double"}
]
}
Edit your data in-browser before converting:
Undo / Redo — Full edit history.
Add / Delete Rows & Columns — Expand or trim the table.
Transpose — Swap rows and columns.
Delete Empty — Remove empty rows and columns.
Deduplicate — Remove duplicate rows.
ABC / abc / Abc — Batch case conversion.
Find & Replace — With regex support.
First Row as Header — Column headers become Avro field names.
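Because column headers become Avro field names, it helps to know that the Avro specification only allows names that start with a letter or underscore and continue with letters, digits, or underscores, and that field names must be unique within a record. The sketch below shows one possible way to clean headers before conversion. Whether the tool applies the same rules is not stated, so treat the helper names `to_avro_name` and `to_avro_names` and the renaming scheme as assumptions.

```python
import re


def to_avro_name(header, position):
    """Turn one CSV header into a name matching [A-Za-z_][A-Za-z0-9_]*."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", header.strip())
    if not name:
        return f"field_{position}"   # hypothetical placeholder for blank headers
    if name[0].isdigit():
        return "_" + name            # names may not start with a digit
    return name


def to_avro_names(headers):
    """Clean every header and suffix repeats so field names stay unique."""
    seen = {}
    result = []
    for position, header in enumerate(headers):
        name = to_avro_name(header, position)
        count = seen.get(name, 0)
        seen[name] = count + 1
        result.append(name if count == 0 else f"{name}_{count + 1}")
    return result
```

For example, the headers `unit price` and `unit-price` both clean to `unit_price`, so the second one is renamed `unit_price_2`. Renaming the columns in the editor before converting avoids the ambiguity altogether.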
All processing runs client-side via the browser File API. Files are never uploaded, transmitted, or stored. Safe for enterprise data models, proprietary schemas, and production field definitions.
Upload a .csv or .tsv file by dragging it onto the upload area, or click to browse. Alternatively, click Enter Data to type or paste data directly.
Use the toolbar to refine your data:
Add, insert, or delete rows and columns.
Transpose the table.
Remove empty rows/columns or duplicate rows.
Change text case.
Find and replace values (supports regex).
Toggle First Row as Header to define field names.
Click Convert. The tool analyzes each column's data values and generates an Avro schema with inferred types. The JSON schema appears in the Output Data panel.
Click Copy to Clipboard and use the schema with:
Confluent Schema Registry — Register the schema for Kafka topics.
Apache Spark — Define the schema for spark.read.format("avro").
Apache Kafka Producers/Consumers — Embed in producer/consumer config.
AWS Glue / Kinesis Data Analytics — Use as table schema definitions.
Input CSV:
event_id,event_type,user_id,amount,timestamp,processed
1001,purchase,U-501,49.99,2026-05-07T10:30:00Z,true
1002,refund,U-502,,2026-05-07T11:15:00Z,false
1003,purchase,U-503,125.00,2026-05-07T12:00:00Z,true
Output Avro Schema:
{
"type": "record",
"name": "CsvRecord",
"fields": [
{"name": "event_id", "type": "int"},
{"name": "event_type", "type": "string"},
{"name": "user_id", "type": "string"},
{"name": "amount", "type": ["null", "double"]},
{"name": "timestamp", "type": "string"},
{"name": "processed", "type": "boolean"}
]
}
Note: amount is a union ["null", "double"] because row 2 has an empty value.
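Once you have the schema, the CSV rows still hold text and need to be cast to the schema's types before an Avro library can serialize them. The following sketch does that with the Python standard library only, using the CSV and schema from this example. The helpers `cast_cell` and `rows_to_records` are hypothetical names, and the code assumes every union has the two-branch form `["null", type]`.

```python
import csv
import io

SCHEMA = {
    "type": "record",
    "name": "CsvRecord",
    "fields": [
        {"name": "event_id", "type": "int"},
        {"name": "event_type", "type": "string"},
        {"name": "user_id", "type": "string"},
        {"name": "amount", "type": ["null", "double"]},
        {"name": "timestamp", "type": "string"},
        {"name": "processed", "type": "boolean"},
    ],
}

CSV_TEXT = """event_id,event_type,user_id,amount,timestamp,processed
1001,purchase,U-501,49.99,2026-05-07T10:30:00Z,true
1002,refund,U-502,,2026-05-07T11:15:00Z,false
1003,purchase,U-503,125.00,2026-05-07T12:00:00Z,true
"""

# Map each Avro primitive name to a function that parses a CSV cell.
CASTS = {
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": lambda text: text.lower() == "true",
    "string": str,
}


def cast_cell(text, avro_type):
    """Convert one CSV cell to the Python value its Avro type expects."""
    if isinstance(avro_type, list):      # union such as ["null", "double"]
        if text == "":
            return None                  # empty cell maps to the null branch
        avro_type = next(t for t in avro_type if t != "null")
    return CASTS[avro_type](text)


def rows_to_records(csv_text, schema):
    """Read CSV text and return one typed dict per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {f["name"]: cast_cell(row[f["name"]], f["type"]) for f in schema["fields"]}
        for row in reader
    ]


records = rows_to_records(CSV_TEXT, SCHEMA)
```

The second record ends up with `amount` set to `None`, which is the value the `"null"` branch of the union represents.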
Input CSV:
id,name,category,price,in_stock,rating
1,Widget A,Hardware,12.99,145,4.5
2,Widget B,Hardware,8.50,0,3.8
3,Gadget X,Electronics,45.00,23,4.9
Output Avro Schema:
{
"type": "record",
"name": "CsvRecord",
"fields": [
{"name": "id", "type": "int"},
{"name": "name", "type": "string"},
{"name": "category", "type": "string"},
{"name": "price", "type": "double"},
{"name": "in_stock", "type": "int"},
{"name": "rating", "type": "double"}
]
}
Input CSV:
sensor_id,temperature,humidity,active,reading_time
S-001,22.5,65.0,true,2026-05-07T08:00:00Z
S-002,,78.3,true,2026-05-07T08:00:01Z
S-003,19.0,,false,
Output Avro Schema:
{
"type": "record",
"name": "CsvRecord",
"fields": [
{"name": "sensor_id", "type": "string"},
{"name": "temperature", "type": ["null", "double"]},
{"name": "humidity", "type": ["null", "double"]},
{"name": "active", "type": "boolean"},
{"name": "reading_time", "type": ["null", "string"]}
]
}
The temperature, humidity, and reading_time columns each contain an empty cell, so those fields use union types. The active column has no empty cells and stays a plain boolean.
Apache Avro is a data serialization system that provides:
Rich data structures — Records, enums, arrays, maps, unions.
Compact binary format — Smaller than JSON or XML.
Schema-based — Every data file includes its schema.
Schema evolution — Add/remove fields without breaking consumers.
Language-agnostic — Bindings for Java, Python, C, C++, C#, Go, Ruby, etc.
Avro is defined by the Apache Avro Specification.
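Schema evolution depends on the reader's schema: a field the reader expects but the writer never wrote is filled from the reader's default, and a field the reader does not declare is ignored. The sketch below imitates just that part of Avro's schema resolution on plain dictionaries. It is a simplified illustration, not a library API: `resolve` is a hypothetical helper, and aliases and type promotion are left out.

```python
READER_SCHEMA = {
    "type": "record",
    "name": "CsvRecord",
    "fields": [
        {"name": "id", "type": "int"},
        # Field added in a newer schema version; the default keeps old data readable.
        {"name": "currency", "type": "string", "default": "USD"},
    ],
}


def resolve(record, reader_schema):
    """Project a decoded record onto the reader's schema."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return resolved


# A record written with an older schema that had a "legacy" field and no "currency".
old_record = {"id": 7, "legacy": "x"}
new_view = resolve(old_record, READER_SCHEMA)
```

This is why adding a field is only safe when the new field declares a default: without one, older data cannot be read with the new schema.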
An Avro schema is a JSON document with this structure:
{
"type": "record",
"name": "RecordName",
"namespace": "com.example.namespace",
"doc": "Description of this record",
"fields": [
{"name": "fieldName", "type": "string", "doc": "Field description"}
]
}
Key elements:
type: "record" — A record is Avro's equivalent of a struct or class.
name — The record type name.
namespace — Java-style package name for uniqueness.
fields — Array of field definitions, each with name and type.
| Type | Description | Size |
|---|---|---|
| null | No value | 0 bytes |
| boolean | True or false | 1 byte |
| int | 32-bit signed integer | variable (zigzag) |
| long | 64-bit signed integer | variable (zigzag) |
| float | IEEE 754 single precision | 4 bytes |
| double | IEEE 754 double precision | 8 bytes |
| bytes | Sequence of 8-bit bytes | variable |
| string | Unicode character sequence | variable |
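The "variable (zigzag)" size for int and long means that values are first zigzag-mapped so that small magnitudes, positive or negative, become small unsigned numbers, and are then written seven bits at a time with a continuation flag in the high bit. A minimal sketch of that encoding in Python follows; the function names `encode_long` and `decode_long` are illustrative, not part of any Avro library.

```python
def encode_long(n):
    """Encode a signed 64-bit integer as Avro zigzag + variable-length bytes."""
    z = (n << 1) ^ (n >> 63)          # zigzag: 0, -1, 1, -2 ... -> 0, 1, 2, 3 ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, high bit set: more bytes follow
        z >>= 7
    out.append(z)                      # final byte has the high bit clear
    return bytes(out)


def decode_long(data):
    """Decode bytes produced by encode_long back to a signed integer."""
    z = 0
    shift = 0
    for byte in data:
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)         # undo the zigzag mapping
```

With this scheme the value 30 from the inference table takes a single byte, while 9223372036854775807 takes ten, which is why the table lists the size as variable rather than a fixed 4 or 8 bytes.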
| Aspect | Avro | JSON | Protobuf |
|---|---|---|---|
| Schema format | JSON | None (self-describing) | .proto (IDL) |
| Encoding | Binary | Text | Binary |
| Schema evolution | Full (add/remove/alias) | N/A | Partial |
| Type safety | Strong | Weak | Strong |
| Used by | Kafka, Hadoop, Spark | REST APIs, web | gRPC, Google services |
| Field ordering | Must match writer schema | N/A | By field number |
In Confluent Platform and Kafka:
Schema Registry stores Avro schemas with versioning.
Producers serialize data using a specific schema ID.
Consumers deserialize using the same schema or a compatible evolved version.
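The schema ID travels with each message. In the Confluent wire format, a serialized message starts with a magic byte of 0, followed by the schema ID as a 4-byte big-endian integer, followed by the Avro binary payload. The sketch below frames and unframes a message that way. `frame` and `unframe` are hypothetical helper names, and the payload is assumed to have been Avro-encoded already by a separate library.

```python
import struct

MAGIC_BYTE = 0          # first byte of every Confluent-framed message
HEADER = ">bI"          # big-endian: 1-byte magic, 4-byte unsigned schema ID
HEADER_SIZE = struct.calcsize(HEADER)


def frame(schema_id, avro_payload):
    """Prefix an Avro-encoded payload with the magic byte and schema ID."""
    return struct.pack(HEADER, MAGIC_BYTE, schema_id) + avro_payload


def unframe(message):
    """Split a framed message into (schema_id, avro_payload)."""
    magic, schema_id = struct.unpack(HEADER, message[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER_SIZE:]
```

A consumer reads the ID from the header, fetches that schema version from Schema Registry, and uses it as the writer schema when decoding the payload.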
The generated schema can be registered directly via the Schema Registry REST API:
POST /subjects/my-topic-value/versions
{ "schema": "<generated schema JSON>" }
No data is sent to a server. All file processing happens entirely in your browser using JavaScript, so your CSV data is never uploaded, transferred, or stored on any server.
Apache Avro is a data serialization framework that uses JSON schemas to define data structures and binary encoding for compact, efficient serialization. It is the format most commonly used with Confluent Schema Registry for Kafka and is widely used in Hadoop, Spark, and Flink ecosystems.
The tool scans the data values in each CSV column. If all non-empty values are whole numbers within int range, it uses int. Larger integers become long. Decimal values become double. true/false becomes boolean. Everything else defaults to string. Columns with empty cells get union types (["null", "type"]).
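Yes, the generated schema works with Confluent Schema Registry. Copy the schema JSON and register it via the Schema Registry REST API: POST /subjects/<topic-name>-value/versions with {"schema": "<your schema>"}, where the schema is passed as a JSON-escaped string.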
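One detail that is easy to miss is that the "schema" value in the request body is a string containing the schema JSON, so the schema is JSON-encoded twice. The sketch below builds such a request with the Python standard library without sending it. The base URL `http://localhost:8081`, the subject `my-topic-value`, and the helper name `build_register_request` are placeholders chosen for illustration.

```python
import json
import urllib.request

SCHEMA = {
    "type": "record",
    "name": "CsvRecord",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
    ],
}


def build_register_request(base_url, subject, schema):
    """Build (but do not send) a Schema Registry registration request."""
    # Inner dumps turns the schema into a string; outer dumps builds the body.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/subjects/{subject}/versions",
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


# Placeholder URL and subject; adjust both for your environment.
request = build_register_request("http://localhost:8081", "my-topic-value", SCHEMA)
```

Passing `request` to `urllib.request.urlopen` would submit it. Authentication, TLS settings, and error handling depend on your deployment and are omitted here.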
The tool generates Avro primitive types: string, int, long, float, double, boolean, and null. Nullable fields use Avro union types (e.g., ["null", "string"]).
The tool accepts .csv (comma-separated values) and .tsv (tab-separated values) files. You can also enter data manually through the built-in table editor.
Yes, the schema can be edited after generation. The output is plain JSON, so you can rename fields, change types, add doc descriptions, change the record name or namespace, or add logical types (e.g., {"type": "long", "logicalType": "timestamp-millis"}).
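Switching a field to the timestamp-millis logical type changes the data as well as the schema: the field must then hold milliseconds since the Unix epoch as a long rather than an ISO 8601 string. A minimal sketch of both steps in Python follows. The helper names `iso_to_millis` and `as_timestamp_field` are hypothetical, and the code assumes each timestamp carries a `Z` suffix or an explicit UTC offset.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_millis(text):
    """Convert an ISO 8601 timestamp with Z or an offset to epoch milliseconds."""
    # Older Python versions do not parse a trailing "Z", so spell out the offset.
    moment = datetime.fromisoformat(text.replace("Z", "+00:00"))
    return (moment - EPOCH) // timedelta(milliseconds=1)


def as_timestamp_field(field):
    """Return a copy of a schema field retyped as timestamp-millis."""
    logical = {"type": "long", "logicalType": "timestamp-millis"}
    nullable = isinstance(field["type"], list) and "null" in field["type"]
    return {**field, "type": ["null", logical] if nullable else logical}


field = as_timestamp_field({"name": "timestamp", "type": "string"})
millis = iso_to_millis("2026-05-07T10:30:00Z")
```

Nullable fields keep their union, so a column that was `["null", "string"]` becomes a union of `"null"` and the logical type.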
A union type is an array of types that allows a field to hold values of different types. The most common use is ["null", "string"] which means the field can be either null or a string. The tool generates unions when a column has empty cells.