pub struct Utf8Error { /* private fields */ }
An error that occurs when UTF-8 decoding fails.
This error occurs when attempting to convert a non-UTF-8 byte string to a Rust string that must be valid UTF-8. For example, to_str is one such method.
§Example
This example shows what happens when a given byte sequence is invalid, but ends with a sequence that is a possible prefix of valid UTF-8.
use bstr::{B, ByteSlice};
let s = B(b"foobar\xF1\x80\x80");
let err = s.to_str().unwrap_err();
assert_eq!(err.valid_up_to(), 6);
assert_eq!(err.error_len(), None);
This example shows what happens when a given byte sequence contains invalid UTF-8.
use bstr::ByteSlice;
let s = b"foobar\xF1\x80\x80quux";
let err = s.to_str().unwrap_err();
assert_eq!(err.valid_up_to(), 6);
// The error length reports the maximum number of bytes that correspond to
// a valid prefix of a UTF-8 encoded codepoint.
assert_eq!(err.error_len(), Some(3));
// In contrast to the above which contains a single invalid prefix,
// consider the case of multiple individual bytes that are never valid
// prefixes. Note how the value of error_len changes!
let s = b"foobar\xFF\xFFquux";
let err = s.to_str().unwrap_err();
assert_eq!(err.valid_up_to(), 6);
assert_eq!(err.error_len(), Some(1));
// The fact that it's an invalid prefix does not change error_len even
// when it immediately precedes the end of the string.
let s = b"foobar\xFF";
let err = s.to_str().unwrap_err();
assert_eq!(err.valid_up_to(), 6);
assert_eq!(err.error_len(), Some(1));
Implementations§
impl Utf8Error
pub fn valid_up_to(&self) -> usize
Returns the byte index of the position immediately following the last valid UTF-8 byte.
§Example
This example shows how valid_up_to can be used to retrieve a possibly empty prefix that is guaranteed to be valid UTF-8:
use bstr::ByteSlice;
let s = b"foobar\xF1\x80\x80quux";
let err = s.to_str().unwrap_err();
// This is guaranteed to never panic.
let string = s[..err.valid_up_to()].to_str().unwrap();
assert_eq!(string, "foobar");
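The same idea can be wrapped in a small helper that splits any byte slice into its valid prefix and the remaining bytes. The following is a minimal sketch, not part of bstr: the name split_valid_prefix is hypothetical, and it uses std::str::from_utf8 (whose std::str::Utf8Error also offers valid_up_to) to stay dependency-free, on the assumption that bstr's error reports the same index for these inputs.

```rust
use std::str;

/// Hypothetical helper: splits `bytes` into the longest valid UTF-8 prefix
/// and whatever bytes follow it (empty when the whole input is valid).
fn split_valid_prefix(bytes: &[u8]) -> (&str, &[u8]) {
    match str::from_utf8(bytes) {
        // Everything is valid, so the remainder is the empty tail.
        Ok(s) => (s, &bytes[bytes.len()..]),
        Err(err) => {
            let (valid, rest) = bytes.split_at(err.valid_up_to());
            // This unwrap cannot fail: `valid` ends exactly where
            // validation stopped succeeding.
            (str::from_utf8(valid).unwrap(), rest)
        }
    }
}
```

The prefix may be empty, for instance when the very first byte is invalid, which is why callers should not assume any progress was made on the text side.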
pub fn error_len(&self) -> Option<usize>
Returns the total number of invalid UTF-8 bytes immediately following the position returned by valid_up_to. This value is always at least 1, but can be up to 3 if the bytes form a valid prefix of some UTF-8 encoded codepoint.
If the end of the original input was found before a valid UTF-8 encoded codepoint could be completed, then this returns None. This is useful when processing streams, where a None value signals that more input might be needed.
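To illustrate how the Some and None cases might drive a streaming decoder, here is a rough sketch. It is an assumption-laden illustration rather than bstr's own machinery: decode_chunk is a hypothetical name, and the code relies on std::str::from_utf8, whose std::str::Utf8Error has the same valid_up_to and error_len accessors, so that it compiles without extra crates. Invalid sequences are replaced with U+FFFD, and an incomplete trailing sequence is left unconsumed so the caller can retry once more input arrives.

```rust
use std::str;

/// Hypothetical helper: decodes as much of `buf` as possible.
/// Returns the decoded text and the number of bytes consumed. Any bytes
/// left unconsumed are an incomplete sequence at the end of `buf`.
fn decode_chunk(buf: &[u8]) -> (String, usize) {
    let mut out = String::new();
    let mut pos = 0;
    while pos < buf.len() {
        match str::from_utf8(&buf[pos..]) {
            Ok(s) => {
                out.push_str(s);
                pos = buf.len();
            }
            Err(err) => {
                let valid = err.valid_up_to();
                // The prefix before the error is known to be valid.
                out.push_str(str::from_utf8(&buf[pos..pos + valid]).unwrap());
                pos += valid;
                match err.error_len() {
                    // Definitely invalid: substitute and skip those bytes.
                    Some(n) => {
                        out.push('\u{FFFD}');
                        pos += n;
                    }
                    // Input ended mid-codepoint: wait for more bytes.
                    None => break,
                }
            }
        }
    }
    (out, pos)
}
```

A caller would keep buf[consumed..] and prepend it to the next chunk. At true end of input, any leftover bytes can be treated as one final invalid sequence.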