Function arrow::compute::kernels::cast::cast_with_options

source ·
pub fn cast_with_options(
    array: &dyn Array,
    to_type: &DataType,
    cast_options: &CastOptions<'_>,
) -> Result<Arc<dyn Array>, ArrowError>
Expand description

Try to cast array to to_type if possible.

Returns a new Array with type to_type if possible.

Accepts CastOptions to specify cast behavior. See also cast().

§Behavior

  • Boolean to Utf8: true => ‘1’, false => 0
  • Utf8 to Boolean: true, yes, on, 1 => true, false, no, off, 0 => false, short variants are accepted, other strings return null or error
  • Utf8 to Numeric: strings that can’t be parsed to numbers return null, float strings in integer casts return null
  • Numeric to Boolean: 0 returns false, any other value returns true
  • List to List: the underlying data type is cast
  • List to FixedSizeList: the underlying data type is cast. If safe is true and a list element has the wrong length it will be replaced with NULL, otherwise an error will be returned
  • Primitive to List: a list array with 1 value per slot is created
  • Date32 and Date64: precision lost when going to higher interval
  • Time32 and Time64`: precision lost when going to higher interval
  • Timestamp and Date{32|64}: precision lost when going to higher interval
  • Temporal to/from backing Primitive: zero-copy with data type change
  • Float32/Float64 to Decimal(precision, scale) rounds to the scale decimals (i.e. casting 6.4999 to Decimal(10, 1) becomes 6.5).

Unsupported Casts (check with can_cast_types before calling):

  • To or from StructArray
  • List to Primitive
  • Interval and Duration

§Timestamps and Timezones

Timestamps are stored with an optional timezone in Arrow.

§Casting timestamps to a timestamp without timezone / UTC

// can use "UTC" if chrono-tz feature is enabled, here use offset based timezone
let data_type = DataType::Timestamp(TimeUnit::Second, None);
let a = Int64Array::from(vec![1_000_000_000, 2_000_000_000, 3_000_000_000]);
let b = cast(&a, &data_type).unwrap();
let b = b.as_primitive::<TimestampSecondType>(); // downcast to result type
assert_eq!(2_000_000_000, b.value(1)); // values are the same as the type has no timezone
// use display to show them (note has no trailing Z)
assert_eq!("2033-05-18T03:33:20", display::array_value_to_string(&b, 1).unwrap());

§Casting timestamps to a timestamp with timezone

Similarly to the previous example, if you cast numeric values to a timestamp with timezone, the cast kernel will not change the underlying values but display and other functions will interpret them as being in the provided timezone.

// can use "Americas/New_York" if chrono-tz feature is enabled, here use offset based timezone
let data_type = DataType::Timestamp(TimeUnit::Second, Some("-05:00".into()));
let a = Int64Array::from(vec![1_000_000_000, 2_000_000_000, 3_000_000_000]);
let b = cast(&a, &data_type).unwrap();
let b = b.as_primitive::<TimestampSecondType>(); // downcast to result type
assert_eq!(2_000_000_000, b.value(1)); // values are still the same
// displayed in the target timezone (note the offset -05:00)
assert_eq!("2033-05-17T22:33:20-05:00", display::array_value_to_string(&b, 1).unwrap());

§Casting timestamps without timezone to timestamps with timezone

When casting from a timestamp without timezone to a timestamp with timezone, the cast kernel interprets the timestamp values as being in the destination timezone and then adjusts the underlying value to UTC as required

However, note that when casting from a timestamp with timezone BACK to a timestamp without timezone the cast kernel does not adjust the values.

Thus round trip casting a timestamp without timezone to a timestamp with timezone and back to a timestamp without timezone results in different values than the starting values.

let data_type  = DataType::Timestamp(TimeUnit::Second, None);
let data_type_tz = DataType::Timestamp(TimeUnit::Second, Some("-05:00".into()));
let a = Int64Array::from(vec![1_000_000_000, 2_000_000_000, 3_000_000_000]);
let b = cast(&a, &data_type).unwrap(); // cast to timestamp without timezone
let b = b.as_primitive::<TimestampSecondType>(); // downcast to result type
assert_eq!(2_000_000_000, b.value(1)); // values are still the same
// displayed without a timezone (note lack of offset or Z)
assert_eq!("2033-05-18T03:33:20", display::array_value_to_string(&b, 1).unwrap());

// Convert timestamps without a timezone to timestamps with a timezone
let c = cast(&b, &data_type_tz).unwrap();
let c = c.as_primitive::<TimestampSecondType>(); // downcast to result type
assert_eq!(2_000_018_000, c.value(1)); // value has been adjusted by offset
// displayed with the target timezone offset (-05:00)
assert_eq!("2033-05-18T03:33:20-05:00", display::array_value_to_string(&c, 1).unwrap());

// Convert from timestamp with timezone back to timestamp without timezone
let d = cast(&c, &data_type).unwrap();
let d = d.as_primitive::<TimestampSecondType>(); // downcast to result type
assert_eq!(2_000_018_000, d.value(1)); // value has not been adjusted
// NOTE: the timestamp is adjusted (08:33:20 instead of 03:33:20 as in previous example)
assert_eq!("2033-05-18T08:33:20", display::array_value_to_string(&d, 1).unwrap());