pub struct NsReader<R> { /* private fields */ }
A low-level, encoding-agnostic XML event reader that performs namespace resolution.
Consumes a BufRead and streams XML Events.
Implementations§
impl<R> NsReader<R>
Builder methods
pub fn from_reader(reader: R) -> Self
Creates an NsReader that reads from a reader.
pub fn expand_empty_elements(&mut self, val: bool) -> &mut Self
Changes whether empty elements should be split into a Start and an End event.
When set to true, all Empty events produced by a self-closing tag like <tag/> are expanded into a Start event followed by an End event. When set to false (the default), those tags are represented by an Empty event instead.
Note that setting this to true leads to an additional allocation, needed to store the tag name for the End event. However, if check_end_names is also set, only one additional allocation is performed to support both options.
(false by default)
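The expansion described above can be pictured with a small, library-independent sketch. The `SketchEvent` type and the `emit` function are hypothetical names introduced only for illustration; they are not part of quick-xml and do not reflect how the reader is implemented internally.

```rust
/// Hypothetical, simplified event type used only in this sketch;
/// it is not quick-xml's `Event`.
#[derive(Debug, Clone, PartialEq)]
enum SketchEvent {
    Start(String),
    End(String),
    Empty(String),
}

/// Returns the events that would be emitted for one parsed event,
/// depending on the `expand_empty_elements` setting.
fn emit(event: SketchEvent, expand_empty_elements: bool) -> Vec<SketchEvent> {
    match event {
        // The tag name has to be stored a second time so that the
        // synthetic `End` event can carry it. This copy stands in for
        // the additional allocation mentioned in the text.
        SketchEvent::Empty(name) if expand_empty_elements => {
            vec![SketchEvent::Start(name.clone()), SketchEvent::End(name)]
        }
        // Everything else passes through unchanged.
        other => vec![other],
    }
}
```

With the flag set, `emit(SketchEvent::Empty("tag".into()), true)` yields a `Start` followed by an `End`; with the flag cleared, the `Empty` event is returned as is.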
pub fn trim_text(&mut self, val: bool) -> &mut Self
Changes whether whitespace before and after character data should be removed.
When set to true, all Text events are trimmed. If an event is empty after trimming, it is not emitted.
Changing this option automatically changes the trim_text_end option.
(false by default)
WARNING: With this option every text event is trimmed, which is incorrect when text events are delimited by comments, processing instructions or CDATA sections. To trim data correctly, apply BytesText::inplace_trim_start and BytesText::inplace_trim_end manually, and only to the events that need it.
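As a rough illustration of the trimming rule and of the caveat in the warning, here is a minimal sketch that works on plain string slices. The helper names `is_xml_whitespace` and `trim_text_event` are assumptions made for this example and are not quick-xml functions; the two boolean parameters loosely stand in for the start and end trimming that trim_text and trim_text_end control.

```rust
/// XML whitespace characters: space, tab, carriage return and line feed.
fn is_xml_whitespace(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\r' | '\n')
}

/// Hypothetical helper: applies the trimming rule to the raw content of
/// one text event. Returns `None` when trimming leaves nothing, which
/// corresponds to the event not being emitted.
fn trim_text_event(raw: &str, trim_start: bool, trim_end: bool) -> Option<&str> {
    let mut text = raw;
    if trim_start {
        text = text.trim_start_matches(is_xml_whitespace);
    }
    if trim_end {
        text = text.trim_end_matches(is_xml_whitespace);
    }
    // An event that became empty through trimming is dropped.
    if (trim_start || trim_end) && text.is_empty() {
        None
    } else {
        Some(text)
    }
}
```

For input such as `a <!--c--> b`, the two text events `"a "` and `" b"` would be trimmed to `"a"` and `"b"`, so the spaces around the comment are lost. This is the situation the warning describes.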
pub fn trim_text_end(&mut self, val: bool) -> &mut Self
Changes whether whitespace after character data should be removed.
When set to true, trailing whitespace is trimmed in Text events. If an event is empty after trimming, it is not emitted.
(false by default)
WARNING: With this option every text event is trimmed, which is incorrect when text events are delimited by comments, processing instructions or CDATA sections. To trim data correctly, apply BytesText::inplace_trim_start and BytesText::inplace_trim_end manually, and only to the events that need it.
pub fn trim_markup_names_in_closing_tags(&mut self, val: bool) -> &mut Self
Changes whether trailing whitespace after the markup name is trimmed in closing tags such as </a >.
If true, the emitted End event is stripped of trailing whitespace after the markup name.
Note that if this is set to false and check_end_names is true, the comparison of markup names will fail erroneously whenever a closing tag contains trailing whitespace.
(true by default)
pub fn check_end_names(&mut self, val: bool) -> &mut Self
Changes whether mismatched closing tag names should be detected.
Note that start and end tags must match literally; they cannot have different prefixes even if both prefixes resolve to the same namespace. The XML
<outer xmlns="namespace" xmlns:p="namespace">
</p:outer>
is not valid, even though semantically the start tag is the same as the end tag. The reason is that namespaces are an extension of the original XML specification (without namespaces), and the extension has to stay backward compatible.
When set to false, the reader does not check whether a closing tag matches the corresponding opening tag. For example, <mytag></different_tag> will be permitted.
If the XML is known to be sane (already processed, etc.), this saves extra time.
Note that the emitted End event is not modified when this is disabled, i.e. it contains the data of the mismatched end tag.
Note that setting this to true leads to an additional allocation, needed to store the tag name for the End event. However, if expand_empty_elements is also set, only one additional allocation is performed to support both options.
(true by default)
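The literal comparison of tag names, and the way untrimmed closing tags interact with it, can be sketched with a simple stack of open names. This is an illustration under stated assumptions, not quick-xml's implementation: the `Tag` enum and the `check_end_names` function below are hypothetical, and the sketch deliberately ignores elements left unclosed at the end of input.

```rust
/// Hypothetical, simplified markup events used only in this sketch.
enum Tag<'a> {
    Open(&'a str),
    Close(&'a str),
}

/// Checks that every closing tag literally repeats the name of the
/// most recently opened tag. `trim_closing_names` models trimming of
/// trailing whitespace in closing tags such as `</a >`.
fn check_end_names(tags: &[Tag<'_>], trim_closing_names: bool) -> Result<(), String> {
    let mut open: Vec<&str> = Vec::new();
    for tag in tags {
        match *tag {
            Tag::Open(name) => open.push(name),
            Tag::Close(raw) => {
                let name = if trim_closing_names { raw.trim_end() } else { raw };
                match open.pop() {
                    // Names are compared byte for byte; prefixes are not resolved.
                    Some(expected) if expected == name => {}
                    Some(expected) => {
                        return Err(format!("expected </{expected}>, found </{name}>"));
                    }
                    None => return Err(format!("unexpected </{name}>")),
                }
            }
        }
    }
    Ok(())
}
```

In this model `<a></a >` passes only when trimming is enabled, and `<outer></p:outer>` is rejected regardless of which namespace `p` is bound to.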
pub fn check_comments(&mut self, val: bool) -> &mut Self
Changes whether comments should be validated.
When set to true, every Comment event is checked to make sure it does not contain --, which is not allowed in XML comments. Most of the time comments are not wanted at all, so their correctness does not matter; the default value is therefore false, to improve performance.
(false by default)
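The check itself is small. A minimal sketch, assuming the comment content (the bytes between `<!--` and `-->`) has already been extracted as a string, could look like the following; `comment_is_valid` is a hypothetical helper, not a quick-xml function.

```rust
/// Hypothetical helper: `content` is the text between `<!--` and `-->`.
/// Returns `false` when the content contains the forbidden `--` sequence.
fn comment_is_valid(content: &str) -> bool {
    !content.contains("--")
}
```

The scan touches every byte of every comment, which is the cost the default setting avoids.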
impl<R> NsReader<R>
Getters
pub fn into_inner(self) -> R
Consumes the NsReader, returning the underlying reader.
See Reader::into_inner for examples.
pub fn resolve<'n>(&self, name: QName<'n>, attribute: bool) -> (ResolveResult<'_>, LocalName<'n>)
Resolves a potentially qualified element name or attribute name into (namespace name, local name).
Qualified names have the form prefix:local-name, where the prefix is defined on any containing XML element via xmlns:prefix="the:namespace:uri". The namespace prefix can be defined on the same element as the name in question.
The method returns the following results depending on the shape of name, the attribute flag and the presence of the default namespace:
| attribute | xmlns="..." | QName | ResolveResult | LocalName |
|---|---|---|---|---|
| true | Not defined | local-name | Unbound | local-name |
| true | Defined | local-name | Unbound | local-name |
| true | any | prefix:local-name | Bound / Unknown | local-name |
| false | Not defined | local-name | Unbound | local-name |
| false | Defined | local-name | Bound (default) | local-name |
| false | any | prefix:local-name | Bound / Unknown | local-name |
If you want to clearly indicate whether the name you resolve is an element name or an attribute name, you can use the resolve_element() or resolve_attribute() methods.
§Lifetimes
- 'n: lifetime of a name. The returned local name is bound to the same lifetime as the name in question.
- The returned namespace name is bound to the reader itself.
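The resolution rules from the table above can be written down as a short, self-contained function. This is a sketch under stated assumptions rather than the library's code: `Resolved`, `Scope` and the free function `resolve` are hypothetical stand-ins for `ResolveResult`, the reader's namespace state and the method documented here, names are plain `&str` instead of byte slices, and the reserved `xml` and `xmlns` prefixes are not handled.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `ResolveResult`.
#[derive(Debug, PartialEq)]
enum Resolved<'ns> {
    Unbound,
    Bound(&'ns str),
    /// The prefix is not declared in the current scope.
    Unknown(String),
}

/// Hypothetical snapshot of the namespace declarations in scope.
struct Scope<'ns> {
    default_ns: Option<&'ns str>,
    prefixes: HashMap<&'ns str, &'ns str>,
}

/// Mirrors the table: the namespace borrows from the scope (`'ns`),
/// the local name borrows from the input name (`'n`).
fn resolve<'ns, 'n>(
    scope: &Scope<'ns>,
    name: &'n str,
    attribute: bool,
) -> (Resolved<'ns>, &'n str) {
    match name.split_once(':') {
        // prefix:local-name is Bound or Unknown, for elements and attributes alike
        Some((prefix, local)) => match scope.prefixes.get(prefix) {
            Some(&ns) => (Resolved::Bound(ns), local),
            None => (Resolved::Unknown(prefix.to_string()), local),
        },
        // Unprefixed attributes never pick up the default namespace
        None if attribute => (Resolved::Unbound, name),
        // Unprefixed elements inherit the default namespace, if any
        None => match scope.default_ns {
            Some(ns) => (Resolved::Bound(ns), name),
            None => (Resolved::Unbound, name),
        },
    }
}
```

The two lifetime parameters follow the Lifetimes note: the local name lives as long as the input name, while the namespace lives as long as the scope it was looked up in.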
pub fn resolve_element<'n>(&self, name: QName<'n>) -> (ResolveResult<'_>, LocalName<'n>)
Resolves a potentially qualified element name into (namespace name, local name).
Qualified element names have the form prefix:local-name, where the prefix is defined on any containing XML element via xmlns:prefix="the:namespace:uri". The namespace prefix can be defined on the same element as the element in question.
Unqualified elements inherit the current default namespace.
The method returns the following results depending on the shape of name and the presence of the default namespace:
| xmlns="..." | QName | ResolveResult | LocalName |
|---|---|---|---|
| Not defined | local-name | Unbound | local-name |
| Defined | local-name | Bound (default) | local-name |
| any | prefix:local-name | Bound / Unknown | local-name |
§Lifetimes
- 'n: lifetime of an element name. The returned local name is bound to the same lifetime as the name in question.
- The returned namespace name is bound to the reader itself.
§Examples
This example shows how you can resolve a qualified name into a namespace. Note that in code like this you do not need to do that manually, because the namespace resolution result is returned by read_resolved_event().
use quick_xml::events::Event;
use quick_xml::name::{Namespace, QName, ResolveResult::*};
use quick_xml::reader::NsReader;

let mut reader = NsReader::from_str("<tag xmlns='root namespace'/>");

// The self-closing tag is reported as an Empty event
match reader.read_event().unwrap() {
    Event::Empty(e) => assert_eq!(
        reader.resolve_element(e.name()),
        (Bound(Namespace(b"root namespace")), QName(b"tag").into())
    ),
    _ => unreachable!(),
}
pub fn resolve_attribute<'n>(&self, name: QName<'n>) -> (ResolveResult<'_>, LocalName<'n>)
Resolves a potentially qualified attribute name into (namespace name, local name).
Qualified attribute names have the form prefix:local-name, where the prefix is defined on any containing XML element via xmlns:prefix="the:namespace:uri". The namespace prefix can be defined on the same element as the attribute in question.
Unqualified attribute names do not inherit the current default namespace.
The method returns the following results depending on the shape of name and the presence of the default namespace:
| xmlns="..." | QName | ResolveResult | LocalName |
|---|---|---|---|
| Not defined | local-name | Unbound | local-name |
| Defined | local-name | Unbound | local-name |
| any | prefix:local-name | Bound / Unknown | local-name |
§Lifetimes
- 'n: lifetime of an attribute name. The returned local name is bound to the same lifetime as the name in question.
- The returned namespace name is bound to the reader itself.
§Examples
use quick_xml::events::Event;
use quick_xml::events::attributes::Attribute;
use quick_xml::name::{Namespace, QName, ResolveResult::*};
use quick_xml::reader::NsReader;

let mut reader = NsReader::from_str("
    <tag one='1'
         p:two='2'
         xmlns='root namespace'
         xmlns:p='other namespace'/>
");
reader.trim_text(true);

match reader.read_event().unwrap() {
    Event::Empty(e) => {
        let mut iter = e.attributes();

        // Unlike elements, attributes without an explicit namespace
        // are not bound to any namespace
        let one = iter.next().unwrap().unwrap();
        assert_eq!(
            reader.resolve_attribute(one.key),
            (Unbound, QName(b"one").into())
        );

        let two = iter.next().unwrap().unwrap();
        assert_eq!(
            reader.resolve_attribute(two.key),
            (Bound(Namespace(b"other namespace")), QName(b"two").into())
        );
    }
    _ => unreachable!(),
}
impl<R: BufRead> NsReader<R>
pub fn read_event_into<'b>(&mut self, buf: &'b mut Vec<u8>) -> Result<Event<'b>>
Reads the next event into the given buffer.
This method manages namespaces but does not resolve them automatically. Call resolve_element() if you want to get a namespace.
You can also use read_resolved_event_into() instead if you want to resolve the namespace as soon as you get an event.
§Examples
use quick_xml::events::Event;
use quick_xml::name::{Namespace, ResolveResult::*};
use quick_xml::reader::NsReader;

let mut reader = NsReader::from_str(r#"
    <x:tag1 xmlns:x="www.xxxx" xmlns:y="www.yyyy" att1 = "test">
       <y:tag2><!--Test comment-->Test</y:tag2>
       <y:tag2>Test 2</y:tag2>
    </x:tag1>
"#);
reader.trim_text(true);

let mut count = 0;
let mut buf = Vec::new();
let mut txt = Vec::new();
loop {
    match reader.read_event_into(&mut buf).unwrap() {
        Event::Start(e) => {
            count += 1;
            let (ns, local) = reader.resolve_element(e.name());
            match local.as_ref() {
                b"tag1" => assert_eq!(ns, Bound(Namespace(b"www.xxxx"))),
                b"tag2" => assert_eq!(ns, Bound(Namespace(b"www.yyyy"))),
                _ => unreachable!(),
            }
        }
        Event::Text(e) => {
            txt.push(e.unescape().unwrap().into_owned())
        }
        Event::Eof => break,
        _ => (),
    }
    // The buffer holds only one event at a time
    buf.clear();
}
assert_eq!(count, 3);
assert_eq!(txt, vec!["Test".to_string(), "Test 2".to_string()]);
pub fn read_resolved_event_into<'b>(&mut self, buf: &'b mut Vec<u8>) -> Result<(ResolveResult<'_>, Event<'b>)>
Reads the next event into the given buffer and resolves its namespace (if applicable).
The namespace is resolved only for Start, Empty and End events. For all other events the concept of a namespace is not defined, so ResolveResult::Unbound is returned.
If you are not interested in namespaces, you can use read_event_into(), which does not automatically resolve namespaces for you.
§Examples
use quick_xml::events::Event;
use quick_xml::name::{Namespace, QName, ResolveResult::*};
use quick_xml::reader::NsReader;

let mut reader = NsReader::from_str(r#"
    <x:tag1 xmlns:x="www.xxxx" xmlns:y="www.yyyy" att1 = "test">
       <y:tag2><!--Test comment-->Test</y:tag2>
       <y:tag2>Test 2</y:tag2>
    </x:tag1>
"#);
reader.trim_text(true);

let mut count = 0;
let mut buf = Vec::new();
let mut txt = Vec::new();
loop {
    match reader.read_resolved_event_into(&mut buf).unwrap() {
        (Bound(Namespace(b"www.xxxx")), Event::Start(e)) => {
            count += 1;
            assert_eq!(e.local_name(), QName(b"tag1").into());
        }
        (Bound(Namespace(b"www.yyyy")), Event::Start(e)) => {
            count += 1;
            assert_eq!(e.local_name(), QName(b"tag2").into());
        }
        (_, Event::Start(_)) => unreachable!(),

        (_, Event::Text(e)) => {
            txt.push(e.unescape().unwrap().into_owned())
        }
        (_, Event::Eof) => break,
        _ => (),
    }
    // The buffer holds only one event at a time
    buf.clear();
}
assert_eq!(count, 3);
assert_eq!(txt, vec!["Test".to_string(), "Test 2".to_string()]);
pub fn read_to_end_into(&mut self, end: QName<'_>, buf: &mut Vec<u8>) -> Result<Span>
Reads until the end element is found, using the provided buffer as intermediate storage for event content. This function is supposed to be called after you have already read a Start event.
Returns a span that covers the content between the > of the opening tag and the < of the closing tag, or an empty span if expand_empty_elements is set and this method was called after reading an expanded Start event.
Manages nested cases where parent and child elements have literally the same name.
If the corresponding End event is not found, an UnexpectedEof error is returned. In particular, that error is returned if you call this method without consuming the corresponding Start event first.
If your reader was created from a string slice or a byte slice, it is better to use the read_to_end() method, because it does not copy bytes into an intermediate buffer.
The provided buf buffer is filled with the content of only one event at a time. Before each event is read, the buffer is cleared. If you know an appropriate size for each event, you can preallocate the buffer to reduce the number of reallocations.
The end parameter should contain the name of the end element in the reader encoding. It is good practice to always get that parameter using the BytesStart::to_end() method.
§Namespaces
While the NsReader does namespace resolution, namespaces do not change the algorithm for comparing names. Although the names a:name and b:name, where both prefixes a and b resolve to the same namespace, are semantically equivalent, </b:name> cannot close <a:name>, because according to the specification:
The end of every element that begins with a start-tag MUST be marked by an end-tag containing a name that echoes the element’s type as given in the start-tag
§Examples
This example shows how you can skip XML content after you have read the start event.
use quick_xml::events::{BytesStart, Event};
use quick_xml::name::{Namespace, ResolveResult};
use quick_xml::reader::NsReader;

let mut reader = NsReader::from_str(r#"
    <outer xmlns="namespace 1">
        <inner xmlns="namespace 2">
            <outer></outer>
        </inner>
        <inner>
            <inner></inner>
            <inner/>
            <outer></outer>
            <p:outer xmlns:p="ns"></p:outer>
            <outer/>
        </inner>
    </outer>
"#);
reader.trim_text(true);
let mut buf = Vec::new();

let ns = Namespace(b"namespace 1");
let start = BytesStart::from_content(r#"outer xmlns="namespace 1""#, 5);
let end = start.to_end().into_owned();

// First, we read a start event...
assert_eq!(
    reader.read_resolved_event_into(&mut buf).unwrap(),
    (ResolveResult::Bound(ns), Event::Start(start))
);

// ...then, we could skip all events to the corresponding end event.
// This call will correctly handle nested <outer> elements.
// Note, however, that this method does not handle namespaces.
reader.read_to_end_into(end.name(), &mut buf).unwrap();

// At the end we should get an Eof event, because we ate the whole XML
assert_eq!(
    reader.read_resolved_event_into(&mut buf).unwrap(),
    (ResolveResult::Unbound, Event::Eof)
);
impl<'i> NsReader<&'i [u8]>
pub fn read_event(&mut self) -> Result<Event<'i>>
Reads the next event, borrowing its content from the input buffer.
This method manages namespaces but does not resolve them automatically. Call resolve_element() if you want to get a namespace.
You can also use read_resolved_event() instead if you want to resolve the namespace as soon as you get an event.
There is no asynchronous read_event_async() version of this function, because it is not necessary: the contents are already in memory and no IO is needed, so there is no potential for blocking.
§Examples
use quick_xml::events::Event;
use quick_xml::name::{Namespace, ResolveResult::*};
use quick_xml::reader::NsReader;

let mut reader = NsReader::from_str(r#"
    <x:tag1 xmlns:x="www.xxxx" xmlns:y="www.yyyy" att1 = "test">
       <y:tag2><!--Test comment-->Test</y:tag2>
       <y:tag2>Test 2</y:tag2>
    </x:tag1>
"#);
reader.trim_text(true);

let mut count = 0;
let mut txt = Vec::new();
loop {
    match reader.read_event().unwrap() {
        Event::Start(e) => {
            count += 1;
            let (ns, local) = reader.resolve_element(e.name());
            match local.as_ref() {
                b"tag1" => assert_eq!(ns, Bound(Namespace(b"www.xxxx"))),
                b"tag2" => assert_eq!(ns, Bound(Namespace(b"www.yyyy"))),
                _ => unreachable!(),
            }
        }
        Event::Text(e) => {
            txt.push(e.unescape().unwrap().into_owned())
        }
        Event::Eof => break,
        _ => (),
    }
}
assert_eq!(count, 3);
assert_eq!(txt, vec!["Test".to_string(), "Test 2".to_string()]);
pub fn read_resolved_event(&mut self) -> Result<(ResolveResult<'_>, Event<'i>)>
Reads the next event, borrowing its content from the input buffer, and resolves its namespace (if applicable).
The namespace is resolved only for Start, Empty and End events. For all other events the concept of a namespace is not defined, so ResolveResult::Unbound is returned.
If you are not interested in namespaces, you can use read_event(), which does not automatically resolve namespaces for you.
There is no asynchronous read_resolved_event_async() version of this function, because it is not necessary: the contents are already in memory and no IO is needed, so there is no potential for blocking.
§Examples
use quick_xml::events::Event;
use quick_xml::name::{Namespace, QName, ResolveResult::*};
use quick_xml::reader::NsReader;

let mut reader = NsReader::from_str(r#"
    <x:tag1 xmlns:x="www.xxxx" xmlns:y="www.yyyy" att1 = "test">
       <y:tag2><!--Test comment-->Test</y:tag2>
       <y:tag2>Test 2</y:tag2>
    </x:tag1>
"#);
reader.trim_text(true);

let mut count = 0;
let mut txt = Vec::new();
loop {
    match reader.read_resolved_event().unwrap() {
        (Bound(Namespace(b"www.xxxx")), Event::Start(e)) => {
            count += 1;
            assert_eq!(e.local_name(), QName(b"tag1").into());
        }
        (Bound(Namespace(b"www.yyyy")), Event::Start(e)) => {
            count += 1;
            assert_eq!(e.local_name(), QName(b"tag2").into());
        }
        (_, Event::Start(_)) => unreachable!(),

        (_, Event::Text(e)) => {
            txt.push(e.unescape().unwrap().into_owned())
        }
        (_, Event::Eof) => break,
        _ => (),
    }
}
assert_eq!(count, 3);
assert_eq!(txt, vec!["Test".to_string(), "Test 2".to_string()]);
pub fn read_to_end(&mut self, end: QName<'_>) -> Result<Span>
Reads until the end element is found. This function is supposed to be called after you have already read a Start event.
Returns a span that covers the content between the > of the opening tag and the < of the closing tag, or an empty span if expand_empty_elements is set and this method was called after reading an expanded Start event.
Manages nested cases where parent and child elements have literally the same name.
If the corresponding End event is not found, an UnexpectedEof error is returned. In particular, that error is returned if you call this method without consuming the corresponding Start event first.
The end parameter should contain the name of the end element in the reader encoding. It is good practice to always get that parameter using the BytesStart::to_end() method.
There is no asynchronous read_to_end_async() version of this function, because it is not necessary: the contents are already in memory and no IO is needed, so there is no potential for blocking.
§Namespaces
While the NsReader does namespace resolution, namespaces do not change the algorithm for comparing names. Although the names a:name and b:name, where both prefixes a and b resolve to the same namespace, are semantically equivalent, </b:name> cannot close <a:name>, because according to the specification:
The end of every element that begins with a start-tag MUST be marked by an end-tag containing a name that echoes the element’s type as given in the start-tag
§Examples
This example shows how you can skip XML content after you have read the start event.
use quick_xml::events::{BytesStart, Event};
use quick_xml::name::{Namespace, ResolveResult};
use quick_xml::reader::NsReader;

let mut reader = NsReader::from_str(r#"
    <outer xmlns="namespace 1">
        <inner xmlns="namespace 2">
            <outer></outer>
        </inner>
        <inner>
            <inner></inner>
            <inner/>
            <outer></outer>
            <p:outer xmlns:p="ns"></p:outer>
            <outer/>
        </inner>
    </outer>
"#);
reader.trim_text(true);

let ns = Namespace(b"namespace 1");
let start = BytesStart::from_content(r#"outer xmlns="namespace 1""#, 5);
let end = start.to_end().into_owned();

// First, we read a start event...
assert_eq!(
    reader.read_resolved_event().unwrap(),
    (ResolveResult::Bound(ns), Event::Start(start))
);

// ...then, we could skip all events to the corresponding end event.
// This call will correctly handle nested <outer> elements.
// Note, however, that this method does not handle namespaces.
reader.read_to_end(end.name()).unwrap();

// At the end we should get an Eof event, because we ate the whole XML
assert_eq!(
    reader.read_resolved_event().unwrap(),
    (ResolveResult::Unbound, Event::Eof)
);
pub fn read_text(&mut self, end: QName<'_>) -> Result<Cow<'i, str>>
Reads the content between the start and end tags, including any markup. This function is supposed to be called after you have already read a Start event.
Manages nested cases where parent and child elements have literally the same name.
This method does not unescape the data it reads; instead it returns the content of the XML document "as is". This is because it has no idea what text it reads; if, for example, the text contains a CDATA section, an attempt to unescape its content would spoil the data.
Any text is decoded using the reader's current decoder().
In effect, this method performs the following code:
let span = reader.read_to_end(end)?;
let text = reader.decoder().decode(&reader.inner_slice[span]);
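To make the "as is" behaviour concrete, here is a small sketch that works on a plain `&str` instead of a reader. It assumes the input is already valid UTF-8, so decoding is a no-op and the result can borrow from the input; `text_of` is a hypothetical helper, not part of quick-xml.

```rust
use std::borrow::Cow;
use std::ops::Range;

/// Hypothetical helper: returns the raw text covered by `span`.
/// Nothing is unescaped, so entity references and CDATA markers
/// appear in the result exactly as they do in the document.
fn text_of<'i>(input: &'i str, span: Range<usize>) -> Cow<'i, str> {
    Cow::Borrowed(&input[span])
}
```

For the document `<a>x &amp; <![CDATA[<y>]]></a>`, the span between the tags yields `x &amp; <![CDATA[<y>]]>` unchanged; unescaping that string blindly would treat the CDATA content as markup-bearing text.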
§Examples
This example shows how you can read HTML content from your XML document.
use std::borrow::Cow;
use quick_xml::events::{BytesStart, Event};
use quick_xml::reader::NsReader;

let mut reader = NsReader::from_str(r#"
    <html>
        <title>This is a HTML text</title>
        <p>Usual XML rules does not apply inside it
        <p>For example, elements not needed to be "closed"
    </html>
"#);
reader.trim_text(true);

let start = BytesStart::new("html");
let end = start.to_end().into_owned();

// First, we read a start event...
assert_eq!(reader.read_event().unwrap(), Event::Start(start));
// ...and disable checking of end names because we expect HTML further...
reader.check_end_names(false);

// ...then, we could read text content until close tag.
// This call will correctly handle nested <html> elements.
let text = reader.read_text(end.name()).unwrap();
assert_eq!(text, Cow::Borrowed(r#"
        <title>This is a HTML text</title>
        <p>Usual XML rules does not apply inside it
        <p>For example, elements not needed to be "closed"
    "#));

// Now we can enable checks again
reader.check_end_names(true);

// At the end we should get an Eof event, because we ate the whole XML
assert_eq!(reader.read_event().unwrap(), Event::Eof);
Methods from Deref<Target = Reader<R>>§
pub fn buffer_position(&self) -> usize
Gets the current byte position in the input data.
Useful when debugging errors.
pub fn decoder(&self) -> Decoder
Gets the decoder used to decode the bytes read by this reader into strings.
If the encoding feature is enabled, the encoding in use may change after the XML declaration is parsed; otherwise the encoding is fixed to UTF-8.
If the encoding feature is enabled and no encoding is specified in the declaration, it defaults to UTF-8.