Interface to the character encoding.
§Raw incremental interface
Methods whose names start with raw_ constitute the raw incremental interface, the lowest-level API available for encoders and decoders.
This interface divides the entire input into four parts:
- Processed bytes do not affect the future result.
- Unprocessed bytes may affect the future result and may become part of a problematic sequence, depending on the future input.
- The problematic byte is the first byte that causes an error condition.
- Remaining bytes have been neither processed nor read, so the caller should feed any remaining bytes again.
The following figure illustrates an example of successive raw_feed calls:
1st raw_feed   :2nd raw_feed   :3rd raw_feed
----------+----:---------------:--+--+---------
          |    :               :  |  |
----------+----:---------------:--+--+---------
processed  unprocessed             |  remaining
                              problematic
Since these parts can span multiple inputs passed to raw_feed, raw_feed returns two offsets (one of them optional) with which the caller can track the problematic sequence.
The first offset (the first usize in the tuple) points to the first unprocessed byte, or is zero when the unprocessed bytes started before the current call. (The first unprocessed byte can also be at offset 0, which makes no difference to the caller.)
The second offset (the upto field in the CodecError struct), if any, points to the first remaining byte.
If the caller needs to recover from the error using the problematic sequence, it has to track the unprocessed bytes across calls:
- When the first offset is less than the input length, it starts saving the unprocessed bytes from that offset.
- While the first offset is zero, it appends any new unprocessed bytes to the saved ones.
- When the first offset becomes non-zero, it discards the saved bytes, and starts saving again if the first offset is less than the input length.
After each call the caller checks for the error condition and can use the saved unprocessed bytes for error recovery.
Alternatively, if the caller only wants to replace the problematic sequence with a fixed string (like U+FFFD), it can simply discard the unprocessed bytes and emit the fixed string on an error. It still has to feed the input bytes starting at the second offset again.
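The replace-with-a-fixed-string strategy can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's implementation: ToyDecoder, ToyError and decode_replacing are hypothetical names, the toy decoder only understands the one- and two-byte UTF-8 forms for U+0000 to U+00FF, and its upto field is a plain usize for simplicity. Only the shape of the return value, a first offset plus an optional error carrying upto, follows the description above.

```rust
/// Hypothetical stand-in for the error value: only `upto` and `cause`.
#[derive(Debug)]
struct ToyError {
    /// Offset of the first remaining byte within the current input.
    upto: usize,
    #[allow(dead_code)]
    cause: &'static str,
}

/// Hypothetical toy decoder for a subset of UTF-8 (U+0000..=U+00FF).
#[derive(Default)]
struct ToyDecoder {
    /// A lead byte that was read but not yet completed (an unprocessed byte).
    lead: Option<u8>,
}

impl ToyDecoder {
    /// Returns (offset of the first unprocessed byte, optional error).
    fn raw_feed(&mut self, input: &[u8], output: &mut String) -> (usize, Option<ToyError>) {
        let mut processed = 0;
        for (i, &b) in input.iter().enumerate() {
            match (self.lead.take(), b) {
                (None, 0x00..=0x7F) => {
                    output.push(b as char);
                    processed = i + 1;
                }
                (None, 0xC2 | 0xC3) => self.lead = Some(b),
                (None, _) => {
                    // This byte is problematic; remaining bytes start after it.
                    return (processed, Some(ToyError { upto: i + 1, cause: "invalid byte" }));
                }
                (Some(lead), 0x80..=0xBF) => {
                    let cp = ((lead as u32 & 0x1F) << 6) | (b as u32 & 0x3F);
                    output.push(char::from_u32(cp).unwrap());
                    processed = i + 1;
                }
                (Some(_), _) => {
                    // The dangling lead byte is problematic; byte `i` must be fed again.
                    return (processed, Some(ToyError { upto: i, cause: "incomplete sequence" }));
                }
            }
        }
        (processed, None)
    }

    /// Reports an error if the input ended in the middle of a sequence.
    fn raw_finish(&mut self, _output: &mut String) -> Option<ToyError> {
        self.lead.take().map(|_| ToyError { upto: 0, cause: "incomplete sequence" })
    }
}

/// Decodes `chunks` in order, replacing each problematic sequence with U+FFFD.
fn decode_replacing(chunks: &[&[u8]]) -> String {
    let mut decoder = ToyDecoder::default();
    let mut out = String::new();
    for chunk in chunks {
        let mut rest: &[u8] = chunk;
        loop {
            // The first offset is ignored: unprocessed bytes are not needed here.
            let (_first_unprocessed, err) = decoder.raw_feed(rest, &mut out);
            match err {
                Some(e) => {
                    out.push('\u{FFFD}');
                    rest = &rest[e.upto..]; // feed the remaining bytes again
                }
                None => break,
            }
        }
    }
    if decoder.raw_finish(&mut out).is_some() {
        out.push('\u{FFFD}');
    }
    out
}
```

For example, decode_replacing(&[&b"x\xC2"[..], &b"y"[..]]) meets the error at offset 0 of the second chunk, emits U+FFFD for the dangling lead byte from the first chunk, and then feeds y again. A caller that needs the problematic bytes themselves would additionally save the unprocessed bytes using the first offset.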
Structs§
- Error information from either encoder or decoder.
Enums§
- Trap, which handles decoder errors.
Traits§
- Byte writer used by encoders. In most cases this will be an owned vector of u8.
- Character encoding.
- Decoder converting a byte sequence into a Unicode string. This is a lower level interface, and normally Encoding::decode should be used instead.
- Encoder converting a Unicode string into a byte sequence. This is a lower level interface, and normally Encoding::encode should be used instead.
- String writer used by decoders. In most cases this will be an owned string.
Functions§
- Determine the encoding by looking for a Byte Order Mark (BOM) and decode a single string in memory. Returns the result and the encoding used.
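The BOM check itself can be sketched as follows. This is an illustrative assumption about how such a check might look, not the function listed above: sniff_bom is a hypothetical helper, the encoding names are plain labels, and only the UTF-8, UTF-16BE and UTF-16LE marks are recognized.

```rust
/// Hypothetical helper: returns the encoding label implied by a leading BOM
/// together with the BOM length in bytes, or `None` when no BOM is present.
fn sniff_bom(input: &[u8]) -> Option<(&'static str, usize)> {
    if input.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some(("utf-8", 3))
    } else if input.starts_with(&[0xFE, 0xFF]) {
        Some(("utf-16be", 2))
    } else if input.starts_with(&[0xFF, 0xFE]) {
        Some(("utf-16le", 2))
    } else {
        None
    }
}
```

A caller would skip the reported number of bytes before decoding, and fall back to an encoding of its own choice when the result is None.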
Type Aliases§
- A type of the bare function in DecoderTrap values.
- A type of the bare function in EncoderTrap values.
- A trait object using dynamic dispatch which is a sendable reference to the encoding, for code where the encoding is not known at compile-time.