arrow_array::array

Type Alias StringViewArray

pub type StringViewArray = GenericByteViewArray<StringViewType>;

A GenericByteViewArray that stores UTF-8 data.

See GenericByteViewArray for format and layout details.

§Example

use arrow_array::StringViewArray;
let array = StringViewArray::from_iter_values(vec!["hello", "world", "lulu", "large payload over 12 bytes"]);
assert_eq!(array.value(0), "hello");
assert_eq!(array.value(3), "large payload over 12 bytes");

Aliased Type§

struct StringViewArray { /* private fields */ }

Implementations§

impl StringViewArray

pub fn to_binary_view(self) -> BinaryViewArray

Converts this array into a BinaryViewArray holding the same bytes.

pub fn is_ascii(&self) -> bool

Returns true if all data within this array is ASCII

impl<T: ByteViewType + ?Sized> GenericByteViewArray<T>

pub fn new( views: ScalarBuffer<u128>, buffers: Vec<Buffer>, nulls: Option<NullBuffer>, ) -> Self

Create a new GenericByteViewArray from the provided parts, panicking on failure

§Panics

Panics if GenericByteViewArray::try_new returns an error

pub fn try_new( views: ScalarBuffer<u128>, buffers: Vec<Buffer>, nulls: Option<NullBuffer>, ) -> Result<Self, ArrowError>

Create a new GenericByteViewArray from the provided parts, returning an error on failure

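§Errors

Returns an error if nulls is provided and its length differs from views.len(), or if the views are not valid for the given buffers (for example, a view that references a missing buffer or an out-of-range byte range, or, for StringViewArray, data that is not valid UTF-8).
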
pub unsafe fn new_unchecked( views: ScalarBuffer<u128>, buffers: Vec<Buffer>, nulls: Option<NullBuffer>, ) -> Self

Create a new GenericByteViewArray from the provided parts, without validation

§Safety

Safe if Self::try_new would not error

pub fn new_null(len: usize) -> Self

Create a new GenericByteViewArray of length len where all values are null

pub fn new_scalar(value: impl AsRef<T::Native>) -> Scalar<Self>

Create a new Scalar from value

pub fn from_iter_values<Ptr, I>(iter: I) -> Self
where Ptr: AsRef<T::Native>, I: IntoIterator<Item = Ptr>,

Creates a GenericByteViewArray based on an iterator of values without nulls

pub fn into_parts(self) -> (ScalarBuffer<u128>, Vec<Buffer>, Option<NullBuffer>)

Deconstruct this array into its constituent parts

pub fn views(&self) -> &ScalarBuffer<u128>

Returns the views buffer

pub fn data_buffers(&self) -> &[Buffer]

Returns the buffers storing string data

pub fn value(&self, i: usize) -> &T::Native

Returns the element at index i

§Panics

Panics if index i is out of bounds.

pub unsafe fn value_unchecked(&self, idx: usize) -> &T::Native

Returns the element at index idx without bounds checking

§Safety

Caller is responsible for ensuring that the index is within the bounds of the array

pub unsafe fn inline_value(view: &u128, len: usize) -> &[u8]

Returns the first len bytes of the inline value of the view.

§Safety
  • The view must be a valid element from Self::views() that adheres to the view layout.
  • The len must be the length of the inlined value. It should never be larger than 12.
pub fn iter(&self) -> ArrayIter<&Self>

Constructs a new iterator for iterating over the values of this array

pub fn bytes_iter(&self) -> impl Iterator<Item = &[u8]>

Returns an iterator over the bytes of this array, including null values

pub fn prefix_bytes_iter( &self, prefix_len: usize, ) -> impl Iterator<Item = &[u8]>

Returns an iterator over the first prefix_len bytes of each array element, including null values.

If prefix_len is larger than the element’s length, the iterator will return an empty slice (&[]).

pub fn suffix_bytes_iter( &self, suffix_len: usize, ) -> impl Iterator<Item = &[u8]>

Returns an iterator over the last suffix_len bytes of each array element, including null values.

Note that for StringViewArray the last bytes may start in the middle of a UTF-8 codepoint, and thus may not be a valid &str.

If suffix_len is larger than the element’s length, the iterator will return an empty slice (&[]).

pub fn slice(&self, offset: usize, length: usize) -> Self

Returns a zero-copy slice of this array with the indicated offset and length.

pub fn gc(&self) -> Self

Returns a “compacted” version of this array

The original array will not be modified

§Garbage Collection

Before GC:

                                       ┌──────┐                 
                                       │......│                 
                                       │......│                 
┌────────────────────┐       ┌ ─ ─ ─ ▶ │Data1 │   Large buffer  
│       View 1       │─ ─ ─ ─          │......│  with data that
├────────────────────┤                 │......│ is not referred
│       View 2       │─ ─ ─ ─ ─ ─ ─ ─▶ │Data2 │ to by View 1 or
└────────────────────┘                 │......│      View 2     
                                       │......│                 
   2 views, refer to                   │......│                 
  small portions of a                  └──────┘                 
     large buffer                                               

After GC:

┌────────────────────┐                 ┌─────┐    After gc, only
│       View 1       │─ ─ ─ ─ ─ ─ ─ ─▶ │Data1│     data that is  
├────────────────────┤       ┌ ─ ─ ─ ▶ │Data2│    pointed to by  
│       View 2       │─ ─ ─ ─          └─────┘     the views is  
└────────────────────┘                                 left      
                                                                  
                                                                  
        2 views                                                  

This method compacts the data buffers by recreating the view array so that it includes only the data pointed to by the views.

Note that it copies the array regardless of whether the original array is already compact. Because this can be an expensive operation, use it only when you are sure that the view array has become significantly smaller than when it was originally created, e.g., after filtering or slicing.

Note: this function does not attempt to canonicalize / deduplicate values. For this feature see GenericByteViewBuilder::with_deduplicate_strings.

pub unsafe fn compare_unchecked( left: &GenericByteViewArray<T>, left_idx: usize, right: &GenericByteViewArray<T>, right_idx: usize, ) -> Ordering

Compares the element at index left_idx of left with the element at index right_idx of right.

Comparing two ByteView values is non-trivial. It takes a bit of patience to understand why we don’t simply compare the two &[u8] slices directly.

ByteView types give us the following two advantages, and we need to be careful not to lose them:

(1) For strings/bytes of 12 bytes or fewer, the entire value is inlined in the view, so reading one array element requires only one memory access (a StringArray requires two: one for the offset buffer and one for the value buffer).

(2) For strings/bytes longer than 12 bytes, we can still be faster than StringArray/ByteArray for certain operations, thanks to the inlined 4-byte prefix. Consider an equality check: if the first four bytes of the two strings differ, we can return false immediately, with just one memory access.

Comparing the two &[u8] slices directly would materialize the entire strings (i.e., make multiple memory accesses), which is often unnecessary:

  • Most of the time (eq, ord), we only need to look at the first 4 bytes to know the answer; e.g., if the inlined 4 bytes differ, we can return "unequal" without looking at the full strings.
§Order check flow

(1) If both strings are 12 bytes or fewer, compare the data inlined in the views directly.
(2) If either string is longer than 12 bytes:
  (2.1) if the inlined 4-byte prefixes differ, return that result immediately;
  (2.2) otherwise, compare the full strings.

§Safety

left_idx must be within the bounds of left, and right_idx must be within the bounds of right.
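The order check flow can be illustrated without Arrow at all. The following self-contained sketch models views as 16-byte arrays backed by a single data buffer (a simplification; ViewColumn, read_u32 and compare are hypothetical names, not part of arrow_array) and applies the same three steps: inline comparison when both values fit in 12 bytes, then the 4-byte prefix, then the full bytes. The zero padding in the prefix of values shorter than 4 bytes does not change the result. A padding byte sorts before any non-zero byte, matching the rule that a proper prefix sorts first. When it ties with a real zero byte, the full comparison decides.

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified model of a byte-view column (not arrow_array):
/// each view is 16 bytes with the length in bytes 0..4 (little-endian u32).
/// Values of at most 12 bytes are inlined in bytes 4..16; longer values store
/// a 4-byte prefix in 4..8, a buffer index in 8..12 and an offset in 12..16.
struct ViewColumn {
    views: Vec<[u8; 16]>,
    buffers: Vec<Vec<u8>>,
}

fn read_u32(bytes: &[u8]) -> usize {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize
}

impl ViewColumn {
    fn from_values(values: &[&str]) -> Self {
        // This simplified model keeps every long value in buffer 0.
        let mut col = ViewColumn { views: Vec::new(), buffers: vec![Vec::new()] };
        for value in values {
            let data = value.as_bytes();
            let mut view = [0u8; 16];
            view[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
            if data.len() <= 12 {
                view[4..4 + data.len()].copy_from_slice(data);
            } else {
                let buffer = &mut col.buffers[0];
                view[4..8].copy_from_slice(&data[..4]);
                // bytes 8..12 stay zero: buffer index 0
                view[12..16].copy_from_slice(&(buffer.len() as u32).to_le_bytes());
                buffer.extend_from_slice(data);
            }
            col.views.push(view);
        }
        col
    }

    fn len_of(&self, i: usize) -> usize {
        read_u32(&self.views[i][0..4])
    }

    /// Full bytes of element i; long values need a second memory access.
    fn full(&self, i: usize) -> &[u8] {
        let view = &self.views[i];
        let len = self.len_of(i);
        if len <= 12 {
            &view[4..4 + len]
        } else {
            let buffer_index = read_u32(&view[8..12]);
            let offset = read_u32(&view[12..16]);
            &self.buffers[buffer_index][offset..offset + len]
        }
    }
}

/// Order check flow: (1) inline fast path, (2.1) prefix, (2.2) full values.
fn compare(left: &ViewColumn, li: usize, right: &ViewColumn, ri: usize) -> Ordering {
    let (l_len, r_len) = (left.len_of(li), right.len_of(ri));
    if l_len <= 12 && r_len <= 12 {
        // (1) both values are fully inlined in their views
        return left.views[li][4..4 + l_len].cmp(&right.views[ri][4..4 + r_len]);
    }
    // (2.1) compare the 4-byte prefixes (zero-padded for values shorter than 4 bytes)
    let prefix_order = left.views[li][4..8].cmp(&right.views[ri][4..8]);
    if prefix_order != Ordering::Equal {
        return prefix_order;
    }
    // (2.2) prefixes tie, so the full values must be compared
    left.full(li).cmp(right.full(ri))
}
```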

Trait Implementations§

impl From<Vec<&str>> for StringViewArray

fn from(v: Vec<&str>) -> Self

Converts to this type from the input type.

impl From<Vec<Option<&str>>> for StringViewArray

fn from(v: Vec<Option<&str>>) -> Self

Converts to this type from the input type.

impl From<Vec<Option<String>>> for StringViewArray

fn from(v: Vec<Option<String>>) -> Self

Converts to this type from the input type.

impl From<Vec<String>> for StringViewArray

fn from(v: Vec<String>) -> Self

Converts to this type from the input type.

impl<'a> StringArrayType<'a> for &'a StringViewArray

fn is_ascii(&self) -> bool

Returns true if all data within this string array is ASCII

fn iter(&self) -> ArrayIter<Self>

Constructs a new iterator

impl<T: ByteViewType + ?Sized> Array for GenericByteViewArray<T>

fn as_any(&self) -> &dyn Any

Returns the array as Any so that it can be downcast to a specific implementation.

fn to_data(&self) -> ArrayData

Returns the underlying data of this array.

fn into_data(self) -> ArrayData

Returns the underlying data of this array.

fn data_type(&self) -> &DataType

Returns a reference to the DataType of this array.

fn slice(&self, offset: usize, length: usize) -> ArrayRef

Returns a zero-copy slice of this array with the indicated offset and length.

fn len(&self) -> usize

Returns the length (i.e., number of elements) of this array.

fn is_empty(&self) -> bool

Returns whether this array is empty.

fn offset(&self) -> usize

Returns the offset into the underlying data used by this array(-slice). Note that the underlying data can be shared by many arrays. This defaults to 0.

fn nulls(&self) -> Option<&NullBuffer>

Returns the null buffer of this array if any.

fn logical_null_count(&self) -> usize

Returns the total number of logical null values in this array.

fn get_buffer_memory_size(&self) -> usize

Returns the total number of bytes of memory pointed to by this array. The buffers store bytes in the Arrow memory format, and include the data as well as the validity map. Note that this does not always correspond to the exact memory usage of an array, since multiple arrays can share the same buffers or slices thereof.

fn get_array_memory_size(&self) -> usize

Returns the total number of bytes of memory occupied physically by this array. This value will always be greater than the value returned by get_buffer_memory_size() and includes the overhead of the data structures that contain the pointers to the various buffers.

fn logical_nulls(&self) -> Option<NullBuffer>

Returns a potentially computed NullBuffer that represents the logical null values of this array, if any.

fn is_null(&self, index: usize) -> bool

Returns whether the element at index is null according to Array::nulls.

fn is_valid(&self, index: usize) -> bool

Returns whether the element at index is not null, the opposite of Self::is_null.

fn null_count(&self) -> usize

Returns the total number of physical null values in this array.

fn is_nullable(&self) -> bool

Returns false if the array is guaranteed to not contain any logical nulls.

impl<T: ByteViewType + ?Sized> Clone for GenericByteViewArray<T>

fn clone(&self) -> Self

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl<T: ByteViewType + ?Sized> Debug for GenericByteViewArray<T>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
impl<FROM, V> From<&GenericByteArray<FROM>> for GenericByteViewArray<V>
where FROM: ByteArrayType, FROM::Offset: OffsetSizeTrait + ToPrimitive, V: ByteViewType<Native = FROM::Native>,

Efficiently convert a GenericByteArray to a GenericByteViewArray

For example this method can convert a StringArray to a StringViewArray.

If the offsets are all less than u32::MAX, the new GenericByteViewArray is built without copying the underlying string data; the views point directly into the existing buffer.

fn from(byte_array: &GenericByteArray<FROM>) -> Self

Converts to this type from the input type.

impl<T: ByteViewType + ?Sized> From<ArrayData> for GenericByteViewArray<T>

fn from(value: ArrayData) -> Self

Converts to this type from the input type.

impl<'a, Ptr, T> FromIterator<&'a Option<Ptr>> for GenericByteViewArray<T>
where Ptr: AsRef<T::Native> + 'a, T: ByteViewType + ?Sized,

fn from_iter<I: IntoIterator<Item = &'a Option<Ptr>>>(iter: I) -> Self

Creates a value from an iterator.

impl<Ptr, T: ByteViewType + ?Sized> FromIterator<Option<Ptr>> for GenericByteViewArray<T>
where Ptr: AsRef<T::Native>,

fn from_iter<I: IntoIterator<Item = Option<Ptr>>>(iter: I) -> Self

Creates a value from an iterator.

impl<T: ByteViewType + ?Sized> PartialEq for GenericByteViewArray<T>

fn eq(&self, other: &Self) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.