Module mz_interchange::avro::schema

Conversion from Avro schemas to Materialize RelationDescs.

A few notes for posterity on how this conversion happens are in order.

If the schema is an Avro record, we flatten it to its fields, which become the columns of the relation.

Each individual field is then converted to its SQL equivalent. For most types, this conversion is the obvious one. The only non-trivial exception is Avro unions.
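The flattening step can be pictured with a minimal sketch. The `AvroType` and `SqlType` enums below are hypothetical, simplified stand-ins rather than the types this module actually uses. The specific scalar mapping (int to int4, long to int8, and so on) and the fallback column name "value" for a non-record schema are assumptions made only for illustration.

```rust
/// Hypothetical, simplified Avro schema node (not the real Avro schema type).
#[derive(Debug, Clone, PartialEq)]
enum AvroType {
    Boolean,
    Int,
    Long,
    Double,
    String,
    /// A record is a list of named fields, in schema order.
    Record(Vec<(String, AvroType)>),
}

/// Hypothetical SQL column types for illustration.
#[derive(Debug, Clone, PartialEq)]
enum SqlType {
    Bool,
    Int4,
    Int8,
    Float8,
    Text,
    /// A nested record is kept as one record-typed column in this sketch.
    Record(Vec<(String, SqlType)>),
}

/// The "obvious" conversion of a single node (assumed mapping).
fn to_sql(ty: &AvroType) -> SqlType {
    match ty {
        AvroType::Boolean => SqlType::Bool,
        AvroType::Int => SqlType::Int4,
        AvroType::Long => SqlType::Int8,
        AvroType::Double => SqlType::Float8,
        AvroType::String => SqlType::Text,
        AvroType::Record(fields) => SqlType::Record(
            fields.iter().map(|(n, t)| (n.clone(), to_sql(t))).collect(),
        ),
    }
}

/// Flatten a top-level record into one column per field. Any other
/// top-level schema becomes a single column (the name "value" is an assumption).
fn to_columns(schema: &AvroType) -> Vec<(String, SqlType)> {
    match schema {
        AvroType::Record(fields) => fields
            .iter()
            .map(|(name, ty)| (name.clone(), to_sql(ty)))
            .collect(),
        other => vec![("value".to_string(), to_sql(other))],
    }
}
```

Only the top-level record is flattened here; a record nested inside a field stays a single column.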

Since Avro types are not nullable by default, the typical way normal (i.e., nullable) SQL fields are represented in Avro is by a union of the underlying type with the singleton type { Null }; in Avro schema notation, this is ["null", "TheType"]. We shall call union types following this pattern Nullability-Pattern Unions. We shall call all other union types (e.g. ["MyType1", "MyType2"] or ["null", "MyType1", "MyType2"]) Essential Unions. Since there is an obvious way to represent Nullability-Pattern Unions, but not Essential Unions, in the SQL type system, we must handle Essential Unions with a bit of a hack (at least until Materialize supports union or sum types, which may be never).
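To make the two definitions concrete, here is a minimal sketch of a classifier. `AvroType`, `UnionKind`, and `classify_union` are hypothetical names introduced for illustration, not part of this module. The sketch assumes that the position of "null" within the union does not matter, and it applies the definitions literally, so a union with a single variant and no "null" counts as Essential.

```rust
/// Hypothetical, simplified Avro type (not the real Avro schema type).
#[derive(Debug, Clone, PartialEq)]
enum AvroType {
    Null,
    Boolean,
    Int,
    String,
}

/// How a union is treated, in the terminology of the notes above.
#[derive(Debug, PartialEq)]
enum UnionKind {
    /// "null" plus exactly one other variant: a nullable column of that type.
    NullabilityPattern(AvroType),
    /// Anything else; holds the non-null variants in schema order.
    Essential(Vec<AvroType>),
}

fn classify_union(variants: &[AvroType]) -> UnionKind {
    let has_null = variants.iter().any(|v| *v == AvroType::Null);
    let non_null: Vec<AvroType> = variants
        .iter()
        .filter(|v| **v != AvroType::Null)
        .cloned()
        .collect();
    if has_null && non_null.len() == 1 {
        UnionKind::NullabilityPattern(non_null[0].clone())
    } else {
        UnionKind::Essential(non_null)
    }
}
```

Under these assumptions, `["null", "int"]` classifies as a Nullability-Pattern Union over int, while `["int", "boolean"]` and `["null", "int", "string"]` both classify as Essential Unions.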

When an Essential Union appears as one of the fields of a record, we expand it to n columns in SQL, where n is the number of non-null variants in the union. These columns will be given names created by pasting their index at the end of the overall name of the field. For example, if an Essential Union in a field named "Foo" has schema [int, bool], it will expand to the columns "Foo1": int, "Foo2": bool. There is an implicit constraint upheld by the source pipeline that only one such column will be non-null at a time.
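The naming scheme can be sketched as follows. `SqlType`, `expand_essential_union`, and `at_most_one_non_null` are hypothetical names, and the sketch assumes that indices are 1-based and assigned in the order the non-null variants appear in the schema. The invariant check is only an illustration of the constraint; the notes above say the source pipeline upholds it implicitly, not that such a check exists.

```rust
/// Hypothetical SQL column types for illustration.
#[derive(Debug, Clone, PartialEq)]
enum SqlType {
    Text,
    Int8,
    Bool,
}

/// Expand the non-null variants of an Essential Union into named columns:
/// the i-th variant (1-based, schema order) becomes "<field><i>".
fn expand_essential_union(field: &str, variants: &[SqlType]) -> Vec<(String, SqlType)> {
    variants
        .iter()
        .enumerate()
        .map(|(i, ty)| (format!("{}{}", field, i + 1), ty.clone()))
        .collect()
}

/// The implicit constraint on a decoded row: at most one of the
/// expanded columns holds a value.
fn at_most_one_non_null<T>(expanded: &[Option<T>]) -> bool {
    expanded.iter().filter(|v| v.is_some()).count() <= 1
}
```

With these assumptions, a field named "payload" whose non-null variants map to text, int8, and bool would yield the columns "payload1", "payload2", and "payload3", and a row that decodes the int8 variant would leave the other two columns null.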

When an Essential Union appears elsewhere than as one of the fields of a record, there is nothing we can do, because we expect to be able to turn it into exactly one SQL type, not a series of them. Thus, in these cases, we just bail. For example, it’s not possible to ingest an array or map whose element type is an Essential Union.
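A minimal sketch of the single-column case shows where the bail-out happens. All types and the function `single_column` are hypothetical stand-ins, not this module's real API; the error is a plain `String` and element nullability inside lists is ignored to keep the example short.

```rust
/// Hypothetical, simplified Avro schema node.
#[derive(Debug, Clone, PartialEq)]
enum AvroType {
    Null,
    Int,
    Boolean,
    Array(Box<AvroType>),
    Union(Vec<AvroType>),
}

/// Hypothetical SQL types for illustration.
#[derive(Debug, Clone, PartialEq)]
enum SqlType {
    Int4,
    Bool,
    List(Box<SqlType>),
}

/// One SQL column type plus its nullability.
#[derive(Debug, PartialEq)]
struct Column {
    ty: SqlType,
    nullable: bool,
}

/// Convert a schema node that must map to exactly one column.
fn single_column(node: &AvroType) -> Result<Column, String> {
    match node {
        AvroType::Null => Err("a bare null has no column type in this sketch".to_string()),
        AvroType::Int => Ok(Column { ty: SqlType::Int4, nullable: false }),
        AvroType::Boolean => Ok(Column { ty: SqlType::Bool, nullable: false }),
        AvroType::Array(elem) => {
            // The element must itself be a single column, so an Essential
            // Union element propagates its error from here.
            let inner = single_column(elem)?;
            Ok(Column { ty: SqlType::List(Box::new(inner.ty)), nullable: false })
        }
        AvroType::Union(variants) => {
            let non_null: Vec<&AvroType> =
                variants.iter().filter(|v| **v != AvroType::Null).collect();
            let has_null = non_null.len() < variants.len();
            match (has_null, non_null.as_slice()) {
                // Nullability-Pattern Union: a nullable column of the inner type.
                (true, [only]) => {
                    let inner = single_column(only)?;
                    Ok(Column { ty: inner.ty, nullable: true })
                }
                // Essential Union: would need several columns, so bail.
                _ => Err("Essential Union cannot be a single column".to_string()),
            }
        }
    }
}
```

In this sketch, `["null", "int"]` converts to a nullable int4 column, while an array whose element type is `["int", "boolean"]` returns an error.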

Structs

Functions

  • Get the series of (one or more) SQL columns corresponding to an Avro union. See module comments for details.
  • Converts an Apache Avro schema into a list of column names and types.
  • Convert an Avro schema to a series of columns and names, flattening the top-level record, if the top node is indeed a record.
  • Get the single column corresponding to a schema node. It is an error if this node should correspond to more than one column (because it is an Essential Union in the sense described in the module docs).