Crate arc_swap

Making Arc itself atomic

The library provides a type that is somewhat similar to what RwLock<Arc<T>> is or Atomic<Arc<T>> would be if it existed, optimized for read-mostly update-seldom scenarios, with consistent performance characteristics.

Motivation

There are many situations in which one might want to have some data structure that is often read and seldom updated. Some examples might be a configuration of a service, routing tables, snapshot of some data that is renewed every few minutes, etc.

In all these cases one needs to:

  • Read the current value of the data structure, fast.
  • Use the same version of the data structure over a longer period of time ‒ a query should be answered by a consistent version of the data, a packet should be routed either by an old or by a new version of the routing table but not by a combination, etc.
  • Perform an update without disrupting the processing.

The first idea would be to use RwLock<T> and keep a read-lock for the whole time of processing. An update would, however, pause all processing until done.

A better option would be to have RwLock<Arc<T>>. Then one would lock, clone the Arc and unlock. This suffers from CPU-level contention (on the lock and on the reference count of the Arc), which makes it relatively slow. Depending on the implementation, an update may be blocked for an arbitrarily long time by a steady inflow of readers.

// `Lazy` comes from the once_cell crate; `RoutingTable` and `Packet` are
// application-defined types.
use once_cell::sync::Lazy;
use std::sync::{Arc, RwLock};

static ROUTING_TABLE: Lazy<RwLock<Arc<RoutingTable>>> = Lazy::new(|| {
    RwLock::new(Arc::new(RoutingTable))
});

fn process_packet(packet: Packet) {
    let table = Arc::clone(&ROUTING_TABLE.read().unwrap());
    table.route(packet);
}

The ArcSwap can be used instead, which solves the above problems and has better performance characteristics than the RwLock, both in contended and uncontended scenarios.

use arc_swap::ArcSwap;
use once_cell::sync::Lazy;

static ROUTING_TABLE: Lazy<ArcSwap<RoutingTable>> = Lazy::new(|| {
    ArcSwap::from_pointee(RoutingTable)
});

fn process_packet(packet: Packet) {
    let table = ROUTING_TABLE.load();
    table.route(packet);
}

Type aliases

The most interesting types in the crate are the ArcSwap and ArcSwapOption (the latter similar to Atomic<Option<Arc<T>>>). These are the types users will want to use.

Note, however, that these are type aliases of the ArcSwapAny. While that type is the low-level implementation and usually isn’t referred to directly in user code, all the relevant methods (and therefore the documentation) are on it.

Atomic orderings

Each operation on the ArcSwapAny type that is callable concurrently (eg. load, but not into_inner) contains at least one SeqCst atomic read-write operation; therefore, even operations on different instances have a defined global order of operations.

Less usual needs

There are some utilities that make the crate useful in more places than just the basics described above.

The load_signal_safe method can be safely used inside unix signal handlers (it is the only one guaranteed to be safe there).

The Cache allows further speed improvements over simply using load every time. The downside is less comfortable API (the caller needs to keep the cache around). Also, a cache may keep the older version of the value alive even when it is not in active use, until the cache is re-validated.

The access module (and similar traits in the cache module) allows shielding independent parts of application from each other and from the exact structure of the whole configuration. This helps structuring the application and giving it access only to its own parts of the configuration.

Finally, the gen_lock module allows further customization of low-level locking/concurrency details.

Performance characteristics

There are several performance advantages of ArcSwap over RwLock.

Lock-free readers

All the read operations are always lock-free. Most of the time, they are actually wait-free; the notable exception is the first load access in each thread (across all the instances of ArcSwap), as it sets up some thread-local data structures.

Whenever the documentation talks about contention in the context of ArcSwap, it talks about contention on the CPU level ‒ multiple cores having to deal with accessing the same cache line. This slows things down (compared to each one accessing its own cache line), but eventual progress is still guaranteed and the cost is significantly lower than parking threads, as with mutex-style contention.

Unfortunately, writers are not lock-free. A reader stuck (suspended/killed) in a critical section (a few instructions long in case of load) may block a writer from completion. Nevertheless, neither a steady inflow of new readers nor other writers will block the writer.

Speeds

The baseline speed of read operations is similar to using an uncontended Mutex. However, load suffers no contention from any other read operations and only slight contention during updates. The load_full operation is additionally contended only on the reference count of the Arc inside ‒ so, in general, while Mutex rapidly loses its performance when being in active use by multiple threads at once and RwLock is slow to start with, ArcSwap mostly keeps its performance even when read by many threads in parallel.

Write operations are considered expensive. A write operation is more expensive than access to an uncontended Mutex and on some architectures even slower than uncontended RwLock. However, it is faster than either under contention.

There are some (very unscientific) benchmarks within the source code of the library.

The exact numbers are highly dependent on the machine used (both the absolute numbers and the relative ones between different data structures). Not only do architectures have a huge impact (eg. x86 vs ARM), but so do AMD vs. Intel or even two different Intel processors. Therefore, if speed matters more than the wait-free guarantees, you’re advised to do your own measurements.

Further speed improvements may be gained by the use of the Cache.

Consistency

The combination of wait-free guarantees of readers and no contention between concurrent loads provides consistent performance characteristics of the synchronization mechanism. This might be important for soft-realtime applications (the CPU-level contention caused by a recent update/write operation might be problematic for some hard-realtime cases, though).

Choosing the right reading operation

There are several load operations available. While the general go-to one should be load, there may be situations in which the others are a better match.

The load usually only borrows the instance from the shared ArcSwap. This makes it faster, because different threads don’t contend on the reference count. There are two situations when this borrow isn’t possible. If the content gets changed, all existing Guards are promoted to contain an owned instance. The promotion is done by the writer, but the readers still need to decrement the reference counts of the old instance when they no longer use it, contending on the count.

The other situation derives from the internal implementation. The number of borrows each thread can hold at a time (across all Guards) is limited. If this limit is exceeded, an owned instance is created instead.

Therefore, if you intend to hold onto the loaded value for an extended period of time, you may prefer load_full. It loads the pointer instance (Arc) without borrowing, which is slower (because of the possible contention on the reference count), but doesn’t consume one of the borrow slots, which makes it more likely for the following loads to have a slot available. Similarly, if some API needs an owned Arc, load_full is more convenient.

There’s also load_signal_safe. This is the only method guaranteed to be safely usable inside a unix signal handler. It has no advantages outside of signal handlers, which makes it a rather niche method.

Additionally, it is possible to use a Cache to get further speed improvement at the cost of less comfortable API and possibly keeping the older values alive for longer than necessary.

Examples

extern crate arc_swap;
extern crate crossbeam_utils;

use std::sync::Arc;

use arc_swap::ArcSwap;
use crossbeam_utils::thread;

fn main() {
    let config = ArcSwap::from(Arc::new(String::default()));
    thread::scope(|scope| {
        scope.spawn(|_| {
            let new_conf = Arc::new("New configuration".to_owned());
            config.store(new_conf);
        });
        for _ in 0..10 {
            scope.spawn(|_| {
                loop {
                    let cfg = config.load();
                    if !cfg.is_empty() {
                        assert_eq!(**cfg, "New configuration");
                        return;
                    }
                }
            });
        }
    }).unwrap();
}

Features

The weak feature adds the ability to use arc-swap with the Weak pointer too, through the ArcSwapWeak type. The needed std support was stabilized in Rust version 1.45 (in beta at the time of writing).

Internal details

The crate uses a hybrid approach of stripped-down hazard pointers and something close to a sharded spin lock with asymmetric read/write usage (called the generation lock).
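To see why that machinery is needed, here is a deliberately naive sketch using only the standard library. It can swap the stored Arc atomically, but it offers no safe load: a reader could be suspended between reading the raw pointer and incrementing the reference count, which is exactly the gap the stripped-down hazard pointers and the generation lock close. The NaiveArcSwap name is hypothetical and is not part of the crate:

```rust
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Arc;

// A naive atomic Arc storage supporting only swap. The missing piece is a
// safe `load`: between `ptr.load()` and bumping the refcount, a concurrent
// swap could drop the value out from under the reader.
struct NaiveArcSwap<T> {
    ptr: AtomicPtr<T>,
}

impl<T> NaiveArcSwap<T> {
    fn new(val: Arc<T>) -> Self {
        Self {
            ptr: AtomicPtr::new(Arc::into_raw(val) as *mut T),
        }
    }

    // Atomically replace the stored Arc, returning the old one. SeqCst
    // mirrors the ordering guarantee documented above.
    fn swap(&self, new: Arc<T>) -> Arc<T> {
        let old = self.ptr.swap(Arc::into_raw(new) as *mut T, Ordering::SeqCst);
        unsafe { Arc::from_raw(old) }
    }
}

impl<T> Drop for NaiveArcSwap<T> {
    fn drop(&mut self) {
        // Reclaim the currently stored Arc.
        unsafe { drop(Arc::from_raw(self.ptr.load(Ordering::SeqCst))) };
    }
}

fn main() {
    let storage = NaiveArcSwap::new(Arc::new(1));
    let old = storage.swap(Arc::new(2));
    assert_eq!(*old, 1);
}
```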

Further details are described in comments inside the source code and in two blog posts.

Limitations

This currently works only for Sized types. Unsized types have „fat pointers“, which are twice as large as normal ones. The AtomicPtr doesn’t support them. One could use something like AtomicU128 for them; the catch is that it doesn’t exist, and the size difference would make it really hard to implement the debt storage/stripped-down hazard pointers.

A workaround is to use double indirection:

// This doesn't work:
// let data: ArcSwap<[u8]> = ArcSwap::new(Arc::from([1, 2, 3]));

// But this does:
let data: ArcSwap<Box<[u8]>> = ArcSwap::from_pointee(Box::new([1, 2, 3]));

Re-exports

pub use cache::Cache;

Modules

access — Abstracting over accessing parts of the stored value.
cache — Caching handle into the ArcSwapAny.
gen_lock — Customization of where and how the generation lock works.

Structs

ArcSwapAny — An atomic storage for a reference counted smart pointer like Arc or Option<Arc>.
Guard — A temporary storage of the pointer.

Traits

RefCnt — A trait describing smart reference counted pointers.

Type Definitions

ArcSwap — An atomic storage for Arc.
ArcSwapOption — An atomic storage for Option<Arc>.
IndependentArcSwap — An atomic storage that doesn’t share the internal generation locks with others.