Crate mz_sql

SQL-dataflow translation.

There are two main parts of the SQL–dataflow translation process:

  • Purification eliminates any external state from a SQL AST. It is an asynchronous process that may make network calls to external services. The input and output of purification are both SQL ASTs.

  • Planning converts a purified AST to a Plan, which describes an action that the system should take to effect the results of the query. Planning is a fast, pure function that always produces the same plan for a given input.

§Details

The purification step is, to our knowledge, unique to Materialize. In other SQL databases, there is no concept of purifying a statement before planning it. The reason for this difference is that, in Materialize, SQL statements can depend on external state: local files, Confluent Schema Registries, etc.

Presently only CREATE SOURCE statements can depend on external state, though this could change in the future. Consider, for example:

CREATE SOURCE ... FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://csr:8081'

The shape of the created source is dependent on the Avro schema that is stored in the schema registry running at csr:8081.

This is problematic, because we need planning to be a pure function of its input. Why?

  • Planning locks the catalog while it operates. Therefore it needs to be fast, because only one SQL query can be planned at a time. Depending on external state while holding a lock on the catalog would be seriously detrimental to the latency of other queries running on the system.

  • The catalog persists SQL ASTs across restarts of Materialize. If those ASTs depend on external state, then changes to that external state could corrupt Materialize’s catalog.

Purification is the escape hatch. It is a transformation from SQL AST to SQL AST that “inlines” any external state. For example, we purify the statement above by fetching the schema from the schema registry and inlining it:

CREATE SOURCE ... FORMAT AVRO USING SCHEMA '{"name": "foo", "fields": [...]}'
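
The AST-to-AST shape of purification can be illustrated with a minimal Rust sketch. Everything named here (`AvroSchema`, `CreateSource`, `purify`, `plan`, and the `fetch_schema` callback standing in for the network call) is hypothetical and greatly simplified; these are not the real mz_sql types or signatures.

```rust
/// Hypothetical, simplified stand-in for the Avro clause of CREATE SOURCE.
/// These are NOT the real mz_sql AST types.
#[derive(Debug, Clone, PartialEq)]
enum AvroSchema {
    /// Depends on external state: the schema lives in a registry.
    ConfluentSchemaRegistry { url: String },
    /// Self-contained: the schema text is inlined in the statement.
    Inline { schema: String },
}

/// Hypothetical, simplified CREATE SOURCE statement.
#[derive(Debug, Clone, PartialEq)]
struct CreateSource {
    name: String,
    format: AvroSchema,
}

/// Purification sketch: AST in, AST out. `fetch_schema` is an assumed
/// callback that stands in for the call to the schema registry.
fn purify<F>(stmt: CreateSource, fetch_schema: F) -> Result<CreateSource, String>
where
    F: FnOnce(&str) -> Result<String, String>,
{
    let format = match stmt.format {
        // External reference: fetch the schema and inline it.
        AvroSchema::ConfluentSchemaRegistry { url } => AvroSchema::Inline {
            schema: fetch_schema(url.as_str())?,
        },
        // Already self-contained: nothing to do.
        inline @ AvroSchema::Inline { .. } => inline,
    };
    Ok(CreateSource { name: stmt.name, format })
}

/// Planning sketch: a pure function of its input. It never touches the
/// network and rejects statements that still reference external state.
fn plan(stmt: &CreateSource) -> Result<String, String> {
    match &stmt.format {
        AvroSchema::Inline { schema } => {
            Ok(format!("create source {} with schema {}", stmt.name, schema))
        }
        AvroSchema::ConfluentSchemaRegistry { .. } => {
            Err("statement must be purified before planning".to_string())
        }
    }
}
```

In this sketch, calling `plan` on an unpurified statement returns an error, while calling it twice on the same purified statement returns the same result, mirroring the requirement that planning be a pure function of its input.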

Importantly, purification cannot hold its reference to the catalog across an await point. That means it can run in its own Tokio task so that it does not block any other SQL commands on the server.
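
The locking discipline described above can be sketched with the standard library alone. This example swaps in OS threads and `std::sync::Mutex` as a stand-in for Tokio tasks and the real catalog, so it shows the idea rather than the actual implementation; `Catalog`, its fields, and all function names are hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the catalog; not the real mz_sql catalog API.
#[derive(Debug, Default)]
struct Catalog {
    default_registry_url: String,
    planned: Vec<String>,
}

/// Purification sketch: copy what is needed out of the catalog, release the
/// lock, and only then do the slow external work.
fn purify(catalog: &Mutex<Catalog>, source_name: &str) -> String {
    let url = {
        let guard = catalog.lock().unwrap();
        guard.default_registry_url.clone()
    }; // The guard is dropped here, before the slow call below.

    // Pretend network call to an external service.
    thread::sleep(Duration::from_millis(50));

    format!(
        "CREATE SOURCE {} FORMAT AVRO USING SCHEMA '<fetched from {}>'",
        source_name, url
    )
}

/// Planning sketch: holds the catalog lock, but only for fast, local work.
/// Returns the number of statements planned so far.
fn plan(catalog: &Mutex<Catalog>, purified: &str) -> usize {
    let mut guard = catalog.lock().unwrap();
    guard.planned.push(purified.to_string());
    guard.planned.len()
}

/// Runs purification on its own thread (standing in for its own Tokio task)
/// so that other statements can be planned in the meantime.
fn purify_in_background(
    catalog: Arc<Mutex<Catalog>>,
    source_name: String,
) -> thread::JoinHandle<String> {
    thread::spawn(move || purify(&catalog, &source_name))
}
```

Because `purify` releases the lock before it sleeps, a caller can start `purify_in_background`, plan other statements while the fetch is in flight, and then plan the purified statement once the handle is joined.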

Modules§

  • SQL abstract syntax tree.
  • Catalog abstraction layer.
  • TBD: Currently, sql::func handles matching arguments to their respective built-in functions (for most built-in functions, at least).
  • Provides parsing and convenience functions for working with Kafka from the sql package.
  • Structured name types for SQL objects.
  • SQL normalization routines.
  • Metrics collected by the optimizer.
  • SQL parsing.
  • SQL planning.
  • SQL purification.
  • This module contains the bits of SQL sessions that are required for the SQL layer.

Macros§

Constants§