Module mz_interchange::avro::schema
Conversion from Avro schemas to Materialize RelationDescs.
A few notes for posterity on how this conversion happens are in order.
If the schema is an Avro record, we flatten it to its fields, which become the columns of the relation.
Each individual field is then converted to its SQL equivalent. For most types, this conversion is the obvious one. The only non-trivial case is Avro unions.
Since Avro types are not nullable by default, the typical way normal (i.e., nullable)
SQL fields are represented in Avro is by a union of the underlying type with the
singleton type { Null }; in Avro schema notation, this is ["null", "TheType"].
We shall call union types following this pattern Nullability-Pattern Unions.
We shall call all other union types (e.g. ["MyType1", "MyType2"] or ["null", "MyType1", "MyType2"]) Essential Unions.
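The distinction between the two kinds of union can be sketched as a small classifier. This is only an illustration of the definitions given here: the `AvroType` and `UnionKind` enums and the `classify_union` function are hypothetical stand-ins, not types or functions from this module, and degenerate unions (empty or single-variant) are treated as Essential simply because the text says "all other union types"; the real implementation may handle them differently.

```rust
/// Hypothetical, heavily simplified stand-in for an Avro schema node.
#[derive(Debug, Clone, PartialEq)]
enum AvroType {
    Null,
    Int,
    Named(String),
}

/// The two kinds of union described in the module notes.
#[derive(Debug, PartialEq)]
enum UnionKind {
    /// Exactly `null` plus one other type; carries that other type.
    NullabilityPattern(AvroType),
    /// Every other union.
    Essential,
}

/// Classify a union by its variants (assumed helper, for illustration only).
fn classify_union(variants: &[AvroType]) -> UnionKind {
    let non_null: Vec<&AvroType> = variants
        .iter()
        .filter(|v| **v != AvroType::Null)
        .collect();
    let has_null = non_null.len() < variants.len();
    match (has_null, non_null.as_slice()) {
        // `null` and exactly one other variant, in either order.
        (true, [only]) => UnionKind::NullabilityPattern((*only).clone()),
        _ => UnionKind::Essential,
    }
}
```

Under these assumptions, `["null", "int"]` classifies as a Nullability-Pattern Union over `int`, while both `["MyType1", "MyType2"]` and `["null", "MyType1", "MyType2"]` classify as Essential Unions.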
Since there is an obvious way to represent Nullability-Pattern Unions, but not Essential Unions, in the SQL type system,
we must handle Essential Unions with a bit of a hack (at least until Materialize supports union or sum types, which may be never).
When an Essential Union appears as one of the fields of a record, we expand
it to n columns in SQL, where n is the number of non-null variants in the union. These
columns are given names created by appending their index to the name of the field.
For example, if an Essential Union in a field named "Foo" has schema [int, bool], it will expand to the columns "Foo1": int, "Foo2": bool.
There is an implicit constraint upheld by the source pipeline that only one such column will be non-null at a time.
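The naming scheme can be illustrated with a short sketch. The `AvroType`, `SqlType`, and `Column` types and the `expand_essential_union` function below are hypothetical and exist only for this example. The sketch assumes that indices are 1-based and count only the non-null variants, and that every resulting column is nullable (since at most one is populated per row).

```rust
/// Hypothetical, simplified Avro variant type for this example.
#[derive(Debug, Clone, PartialEq)]
enum AvroType {
    Null,
    Boolean,
    Int,
}

/// Hypothetical, simplified SQL scalar type for this example.
#[derive(Debug, Clone, PartialEq)]
enum SqlType {
    Bool,
    Int4,
}

/// One output column of the relation (illustrative only).
#[derive(Debug, PartialEq)]
struct Column {
    name: String,
    ty: SqlType,
    nullable: bool,
}

/// Expand an Essential Union found in record field `field_name` into one
/// column per non-null variant, named `<field_name><index>`.
fn expand_essential_union(field_name: &str, variants: &[AvroType]) -> Vec<Column> {
    variants
        .iter()
        // Drop `null` and map each remaining variant to its SQL type.
        .filter_map(|v| match v {
            AvroType::Null => None,
            AvroType::Boolean => Some(SqlType::Bool),
            AvroType::Int => Some(SqlType::Int4),
        })
        // Assumption: numbering starts at 1 and skips the null variant.
        .enumerate()
        .map(|(i, ty)| Column {
            name: format!("{}{}", field_name, i + 1),
            ty,
            // Only one column is non-null per row, so all must be nullable.
            nullable: true,
        })
        .collect()
}
```

For a field named "Foo" with variants `[int, bool]`, this sketch yields the columns "Foo1" (integer) and "Foo2" (boolean).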
When an Essential Union appears elsewhere than as one of the fields of a record, there is nothing we can do, because we expect to be able to turn it into exactly one SQL type, not a series of them. Thus, in these cases, we just bail. For example, it's not possible to ingest an array or map whose element type is an Essential Union.
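The "exactly one SQL type" requirement can be sketched as a recursive conversion that returns an error when it meets an Essential Union. Everything here is a hypothetical simplification: `AvroType`, `SqlType`, and `single_column` are names invented for this example, the error type is a plain `String`, and only a handful of Avro types are modeled.

```rust
/// Hypothetical, simplified Avro schema node for this example.
#[derive(Debug, Clone, PartialEq)]
enum AvroType {
    Null,
    Int,
    String,
    Array(Box<AvroType>),
    Union(Vec<AvroType>),
}

/// Hypothetical, simplified SQL type for this example.
#[derive(Debug, Clone, PartialEq)]
enum SqlType {
    Int4,
    Text,
    List(Box<SqlType>),
}

/// Convert a schema node to exactly one SQL type plus a nullability flag,
/// or fail if the node would need more than one column.
fn single_column(node: &AvroType) -> Result<(SqlType, bool), String> {
    match node {
        AvroType::Null => Err("a bare null type is not modeled in this sketch".to_string()),
        AvroType::Int => Ok((SqlType::Int4, false)),
        AvroType::String => Ok((SqlType::Text, false)),
        AvroType::Array(elem) => {
            // The element must itself map to a single SQL type.
            let (elem_ty, _elem_nullable) = single_column(elem)?;
            Ok((SqlType::List(Box::new(elem_ty)), false))
        }
        AvroType::Union(variants) => {
            let non_null: Vec<&AvroType> = variants
                .iter()
                .filter(|v| **v != AvroType::Null)
                .collect();
            let has_null = non_null.len() < variants.len();
            match (has_null, non_null.as_slice()) {
                // Nullability-Pattern Union: one nullable column.
                (true, [inner]) => {
                    let (ty, _) = single_column(inner)?;
                    Ok((ty, true))
                }
                // Essential Union outside a record field: bail.
                _ => Err("Essential Union cannot be a single SQL column".to_string()),
            }
        }
    }
}
```

In this sketch an array of `["null", "int"]` converts to a list of integers, while an array of `["int", "string"]` produces an error, mirroring the restriction described above.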
Structs
- SchemaCache 🔒
Functions
- get_named_columns 🔒
- get_union_columns 🔒 Get the series of (one or more) SQL columns corresponding to an Avro union. See module comments for details.
- Converts an Apache Avro schema into a list of column names and types.
- validate_schema_1 🔒 Convert an Avro schema to a series of columns and names, flattening the top-level record, if the top node is indeed a record.
- validate_schema_2 🔒 Get the single column corresponding to a schema node. It is an error if this node should correspond to more than one column (because it is an Essential Union in the sense described in the module docs).