Module semantic_tokens


LSP semantic tokens handler.

Implements textDocument/semanticTokens/full by lexing the document with mz_sql_lexer::lexer::lex and mapping each token to a standard LSP token type. Comments (discarded by the lexer) are recovered via a separate pre-scan that is aware of strings and quoted identifiers.
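A minimal sketch of such a comment pre-scan follows. It is not the module's implementation: the names collect_comment_spans and skip_quoted are hypothetical, the result is a plain list of (start, end) byte ranges, and the sketch deliberately ignores nested block comments and dollar-quoted bodies.

```rust
/// Hypothetical helper: skip a quoted body whose opening `quote` sits just
/// before index `i`. A doubled quote is treated as an escaped quote. Returns the
/// index just past the closing quote, or `bytes.len()` if unterminated.
fn skip_quoted(bytes: &[u8], mut i: usize, quote: u8) -> usize {
    while i < bytes.len() {
        if bytes[i] == quote {
            if bytes.get(i + 1) == Some(&quote) {
                i += 2; // escaped quote, keep scanning the body
                continue;
            }
            return i + 1;
        }
        i += 1;
    }
    bytes.len()
}

/// Hypothetical pre-scan: byte ranges of `--` and `/* */` comments, skipping
/// over '...' strings and "..." quoted identifiers.
fn collect_comment_spans(text: &str) -> Vec<(usize, usize)> {
    let bytes = text.as_bytes();
    let mut spans = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'\'' => i = skip_quoted(bytes, i + 1, b'\''),
            b'"' => i = skip_quoted(bytes, i + 1, b'"'),
            b'-' if bytes.get(i + 1) == Some(&b'-') => {
                let start = i;
                // A line comment runs up to (not including) the newline.
                while i < bytes.len() && bytes[i] != b'\n' {
                    i += 1;
                }
                spans.push((start, i));
            }
            b'/' if bytes.get(i + 1) == Some(&b'*') => {
                let start = i;
                i += 2;
                // Scan to the closing `*/`; an unterminated comment runs to EOF.
                while i < bytes.len() && !(bytes[i] == b'*' && bytes.get(i + 1) == Some(&b'/')) {
                    i += 1;
                }
                i = (i + 2).min(bytes.len());
                spans.push((start, i));
            }
            _ => i += 1,
        }
    }
    spans
}
```

For example, in SELECT '--x' -- c the first -- sits inside a string literal and is skipped, so only the trailing -- c is reported.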

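The output is delta-encoded per LSP 3.16: tokens are sorted by byte offset, split across line boundaries (LSP tokens are line-local), and serialized as a flat [deltaLine, deltaStartChar, length, tokenType, 0] sequence, where the trailing 0 is an empty token-modifiers bitset. deltaStartChar is relative to the previous token's start only when both tokens are on the same line; otherwise it is the absolute start column.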
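The encoding step can be sketched as below, assuming the tokens are already split per line, sorted by (line, start), and measured in UTF-16 code units. LineTok and encode are hypothetical stand-ins, not the module's LineToken or encode_deltas.

```rust
/// Hypothetical line-local token: absolute line, UTF-16 start column,
/// UTF-16 length, and legend index.
#[derive(Debug, Clone, Copy)]
struct LineTok {
    line: u32,
    start: u32,
    len: u32,
    token_type: u32,
}

/// Delta-encode tokens sorted by (line, start) into the flat LSP array.
/// Unsorted input would underflow the subtractions below.
fn encode(tokens: &[LineTok]) -> Vec<u32> {
    let mut out = Vec::with_capacity(tokens.len() * 5);
    let (mut prev_line, mut prev_start) = (0u32, 0u32);
    for t in tokens {
        let delta_line = t.line - prev_line;
        // The start column is relative only to a previous token on the same line.
        let delta_start = if delta_line == 0 { t.start - prev_start } else { t.start };
        // The final 0 is the token-modifiers bitset (no modifiers).
        out.extend_from_slice(&[delta_line, delta_start, t.len, t.token_type, 0]);
        prev_line = t.line;
        prev_start = t.start;
    }
    out
}
```

For a keyword at line 0, column 0 (length 6), a number at line 0, column 7 (length 1), and a variable at line 2, column 4 (length 3), the second token's start is encoded as 7 relative to the first, while the third token restarts at its absolute column 4 because it is on a new line.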

Legend indices must match the order declared in the server’s SemanticTokensLegend (see legend_token_types).
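To illustrate the invariant, here is a hypothetical pairing of index constants with a legend. The numeric values and their order are assumptions; only the seven standard LSP token-type names correspond to the constants listed below. legend_names is a hypothetical stand-in for legend_token_types.

```rust
// Hypothetical index values; the module's actual order may differ.
const TOKEN_TYPE_COMMENT: u32 = 0;
const TOKEN_TYPE_KEYWORD: u32 = 1;
const TOKEN_TYPE_NUMBER: u32 = 2;
const TOKEN_TYPE_OPERATOR: u32 = 3;
const TOKEN_TYPE_PARAMETER: u32 = 4;
const TOKEN_TYPE_STRING: u32 = 5;
const TOKEN_TYPE_VARIABLE: u32 = 6;

/// Standard LSP token-type names, placed at the slot named by each constant,
/// so entry `i` is always the name for index `i`.
fn legend_names() -> [&'static str; 7] {
    let mut legend = [""; 7];
    legend[TOKEN_TYPE_COMMENT as usize] = "comment";
    legend[TOKEN_TYPE_KEYWORD as usize] = "keyword";
    legend[TOKEN_TYPE_NUMBER as usize] = "number";
    legend[TOKEN_TYPE_OPERATOR as usize] = "operator";
    legend[TOKEN_TYPE_PARAMETER as usize] = "parameter";
    legend[TOKEN_TYPE_STRING as usize] = "string";
    legend[TOKEN_TYPE_VARIABLE as usize] = "variable";
    legend
}
```

Filling the array by index rather than listing names in a separate literal means that renumbering the constants cannot silently desynchronize the legend; a gap would show up as an empty entry.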

Structs

LineToken 🔒: A line-local semantic token, after multi-line splitting.
RawSpan 🔒: Byte-offset span with an associated semantic token type.
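The conversion from a byte-offset RawSpan to line-local LineTokens can be sketched as follows. Span, LineTok, split, and trim_newline are hypothetical names; line_starts, line_for_offset, and utf16_len mirror the descriptions in the function list but are sketches, not the module's code. Spans are assumed to start and end on char boundaries, and plain `as` casts stand in for saturating_u32.

```rust
/// Hypothetical byte-offset span with a legend index (stand-in for RawSpan).
struct Span { start: usize, end: usize, token_type: u32 }

/// Hypothetical line-local token; `start` and `len` are UTF-16 code units.
#[derive(Debug, PartialEq)]
struct LineTok { line: u32, start: u32, len: u32, token_type: u32 }

/// Byte offsets of each line start; line 0 starts at offset 0.
fn line_starts(text: &str) -> Vec<usize> {
    let mut starts = vec![0];
    starts.extend(text.match_indices('\n').map(|(i, _)| i + 1));
    starts
}

/// Binary search: index of the last line start that is <= `offset`.
fn line_for_offset(starts: &[usize], offset: usize) -> usize {
    starts.partition_point(|&s| s <= offset) - 1
}

/// UTF-16 length with an ASCII fast path.
fn utf16_len(s: &str) -> usize {
    if s.is_ascii() { s.len() } else { s.chars().map(char::len_utf16).sum() }
}

/// Drop a trailing "\n" or "\r\n" so a segment excludes the line terminator.
fn trim_newline(seg: &str) -> &str {
    match seg.strip_suffix('\n') {
        Some(rest) => rest.strip_suffix('\r').unwrap_or(rest),
        None => seg,
    }
}

/// Split each span at line boundaries; output stays in input (byte) order.
fn split(text: &str, spans: &[Span]) -> Vec<LineTok> {
    let starts = line_starts(text);
    let mut out = Vec::new();
    for span in spans {
        let mut pos = span.start;
        while pos < span.end {
            let line = line_for_offset(&starts, pos);
            // A segment stops at the span end or at the start of the next line.
            let next_line = starts.get(line + 1).copied().unwrap_or(text.len());
            let seg_end = span.end.min(next_line);
            let seg = trim_newline(&text[pos..seg_end]);
            if !seg.is_empty() {
                out.push(LineTok {
                    line: line as u32,
                    start: utf16_len(&text[starts[line]..pos]) as u32,
                    len: utf16_len(seg) as u32,
                    token_type: span.token_type,
                });
            }
            pos = seg_end;
        }
    }
    out
}
```

A block comment spanning two lines becomes two tokens, and a column after a non-ASCII character such as é is counted in UTF-16 units rather than bytes.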

Constants

TOKEN_TYPE_COMMENT 🔒
TOKEN_TYPE_KEYWORD 🔒
TOKEN_TYPE_NUMBER 🔒
TOKEN_TYPE_OPERATOR 🔒
TOKEN_TYPE_PARAMETER 🔒
TOKEN_TYPE_STRING 🔒
TOKEN_TYPE_VARIABLE 🔒

Functions

collect_comments 🔒: Pre-scan raw text for -- line comments and /* */ block comments. String bodies and quoted-identifier bodies are skipped so that comment markers inside them are not misidentified.
compute_semantic_tokens 🔒: Computes the semantic tokens for a SQL document.
encode_deltas 🔒: Delta-encode line-local tokens per LSP 3.16.
legend_token_types 🔒: Token types in the order required for legend indices.
lex_token_span 🔒: Map a lexer token to its byte span and semantic type.
line_for_offset 🔒: Binary search for the line containing offset.
line_starts 🔒: Byte offsets of the start of each line (including line 0 at offset 0).
saturating_u32 🔒: Convert a usize (a line, column, or length in the document) into the u32 width required by the LSP semantic-token wire format. Values up to u32::MAX pass through unchanged; larger values saturate to u32::MAX. LSP positions are specified as u32, so a document large enough to trigger saturation is already unrepresentable.
scan_dollar_quoted_len 🔒: Length of a $tag$body$tag$ dollar-quoted string. The closing delimiter must repeat the opening tag, which may be empty ($$body$$).
scan_hex_string_token_len 🔒: Length of a hex string token: x'...' or X'...'.
scan_ident_len 🔒
scan_parameter_len 🔒
scan_string_token_len 🔒: Length of a string token, either a plain '...' literal or an extended E'...' / e'...' literal. For the extended form, the token's offset points at the E prefix, so the prefix is counted in the length.
skip_double_quoted 🔒: Skip a "..." quoted-identifier body (with doubled-quote escape).
skip_single_quoted 🔒: Skip a '...' string body (with doubled-quote escape). Returns index just past the closing quote, or bytes.len() if unterminated.
split_across_lines 🔒: Split each raw span across line boundaries and compute UTF-16 column offsets. Produces line-local tokens, still in byte order.
trim_trailing_newline 🔒: Trim a trailing \n or \r\n from a segment so it doesn’t include the line terminator.
utf16_len 🔒: Number of UTF-16 code units in s. ASCII-only input takes a fast path that returns the byte length; otherwise the function walks the chars and sums char::len_utf16.
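As an illustration of the scanning helpers, here is a hedged sketch of measuring a dollar-quoted string. dollar_quoted_len is a hypothetical name, not scan_dollar_quoted_len. Following PostgreSQL's rule that a tag looks like an unquoted identifier, it rejects tags that start with a digit (so $1 stays a parameter); restricting tags to ASCII letters, digits, and _ and letting an unterminated body run to end of input are assumptions of this sketch.

```rust
/// Hypothetical: byte length of a `$tag$ ... $tag$` string at the start of `s`,
/// or `None` if `s` does not begin with a valid opening delimiter.
fn dollar_quoted_len(s: &str) -> Option<usize> {
    let bytes = s.as_bytes();
    if bytes.first() != Some(&b'$') {
        return None;
    }
    // The tag runs up to the next `$` and may be empty (`$$`).
    let tag_end = 1 + bytes[1..].iter().position(|&b| b == b'$')?;
    let tag = &bytes[1..tag_end];
    // Assumed tag rule: ASCII alphanumerics or `_`, not starting with a digit.
    if matches!(tag.first(), Some(b) if b.is_ascii_digit())
        || !tag.iter().all(|b| b.is_ascii_alphanumeric() || *b == b'_')
    {
        return None;
    }
    let delim = &s[..=tag_end]; // e.g. "$fn$" or "$$"
    let body_start = delim.len();
    // The body ends at the first repeat of the full opening delimiter.
    match s[body_start..].find(delim) {
        Some(i) => Some(body_start + i + delim.len()),
        None => Some(s.len()), // unterminated: extend to end of input
    }
}
```

Because the closing search uses the whole delimiter, a body may contain a lone $ or a different tag such as $x$ without ending the string early.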