Rust smart pointers

Smart pointers are data structures that not only act like a pointer but also have additional metadata and capabilities.

Rust uses the concept of ownership and borrowing, which adds another difference between references and smart pointers. References only borrow data, whereas in many cases smart pointers own the data they point to.

Some common smart pointers in the standard library:

  • Box<T> for allocating values on the heap
  • Rc<T>, a reference counting type that enables multiple ownership
  • Ref<T> and RefMut<T>, accessed through RefCell<T>, a type that enforces the borrowing rules at runtime instead of compile time

Using Box<T> to point to data on the heap

Use Box<T> to store data on the heap rather than the stack. What remains on the stack is the pointer to the heap data.

Boxes are mostly used in these situations:

  • When you have a type whose size can’t be known at compile time and you want to use a value of that type in a context that requires an exact size
  • When you have a large amount of data and you want to transfer ownership but ensure the data won’t be copied when you do so
  • When you want to own a value and you care only that it’s a type that implements a particular trait rather than being of a specific type

// storing an i32 value on the heap using a box
fn main() {
    let b = Box::new(5);
    println!("b = {}", b);
}

When a box goes out of scope, as b does at the end of main, it is deallocated. The deallocation happens both for the box (stored on the stack) and for the data it points to (stored on the heap).
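The third situation above involves owning a value that is known only by a trait it implements. A trait object covers this case. Here is a minimal sketch of that idea. The `Shape` trait, the `Square` and `Rect` types, and `total_area` are hypothetical names made up for this illustration:

```rust
// Hypothetical trait and types, for illustration only.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// The Vec owns each element, but this code only knows that it implements `Shape`.
// Every `Box<dyn Shape>` has the same size, whatever concrete type it points to.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Rect(1.5, 2.0))];
    println!("total area = {}", total_area(&shapes)); // prints "total area = 7"
}
```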

Enabling recursive types with boxes

// definition of `List` that uses `Box<T>` in order to have a known size:
// a `Box<List>` is a pointer, so a `Cons` needs only an i32 plus one pointer
enum List {
    Cons(i32, Box<List>),
    Nil,
}
 
use crate::List::{Cons, Nil};
 
fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
}
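Code that consumes a boxed recursive type usually recurses along the boxes. This sketch adds up the same list. It also checks that a `Box<List>` is pointer-sized, which is why the compiler can size `List` at all. The `sum` function is a hypothetical helper, not part of the original example:

```rust
use std::mem::size_of;

enum List {
    Cons(i32, Box<List>),
    Nil,
}

use List::{Cons, Nil};

// Recursively adds up the list. `tail` is a `&Box<List>`, which deref
// coercion turns into the `&List` that `sum` expects.
fn sum(list: &List) -> i32 {
    match list {
        Cons(value, tail) => value + sum(tail),
        Nil => 0,
    }
}

fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
    println!("sum = {}", sum(&list)); // prints "sum = 6"

    // A Box<List> is one pointer, no matter how long the list behind it is.
    assert_eq!(size_of::<Box<List>>(), size_of::<usize>());
}
```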

Deref trait

Implementing the Deref trait allows you to customize the behavior of the dereference operator, *. If you implement Deref so that a smart pointer can be treated like a regular reference, code written for references also works with smart pointers.
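As a baseline before writing a custom type, here is a small sketch of `*` on a plain reference and on a standard Box<T>. The `deref_both` helper is hypothetical, used only to make the behavior easy to check:

```rust
// Returns what `*` yields through a plain reference and through a Box.
fn deref_both(x: i32) -> (i32, i32) {
    let r = &x;          // a regular reference to `x`
    let b = Box::new(x); // a Box owning a copy of `x` on the heap
    // Box<T> implements Deref, so `*b` reads the heap value just as `*r` reads `x`
    (*r, *b)
}

fn main() {
    assert_eq!(deref_both(5), (5, 5));
    println!("ok");
}
```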

Defining our own smart pointer

use std::ops::Deref;
 
impl<T> Deref for MyBox<T> {
    // defines an associated type for the Deref trait to use
    type Target = T;
 
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}
 
struct MyBox<T>(T);
 
impl<T> MyBox<T> {
    fn new(x: T) -> MyBox<T> {
        MyBox(x)
    }
}
 
fn main() {
    let x = 5;
    let y = MyBox::new(x);
 
    assert_eq!(5, x);
    // behind the scenes, Rust actually runs `*(y.deref())`
    assert_eq!(5, *y);
}

Deref coercion

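Deref coercion is a convenience that Rust performs on arguments to functions and methods. It converts a reference to a type that implements Deref into a reference to another type. For example, &String becomes &str because String implements Deref<Target = str>.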

Deref coercion works only on types that implement the Deref trait.

use std::ops::Deref;
 
impl<T> Deref for MyBox<T> {
    type Target = T;
 
    fn deref(&self) -> &T {
        &self.0
    }
}
 
struct MyBox<T>(T);
 
impl<T> MyBox<T> {
    fn new(x: T) -> MyBox<T> {
        MyBox(x)
    }
}
 
fn hello(name: &str) {
    println!("Hello, {}!", name);
}
 
fn main() {
    let m = MyBox::new(String::from("Rust"));
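    // &m is a reference to a `MyBox<String>` value; Rust calls `deref` to turn
    // `&MyBox<String>` into `&String`, then `String`'s own `deref` into `&str`
    hello(&m);
    // without deref coercion:
    hello(&(*m)[..]);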
}
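The same mechanism applies to standard library types, and Rust chains as many `deref` calls as needed. This sketch uses a variant of `hello` that returns the greeting instead of printing it. It also uses a hypothetical `total` helper that takes a slice:

```rust
// A variant of `hello` that returns the greeting instead of printing it.
fn hello(name: &str) -> String {
    format!("Hello, {}!", name)
}

// Hypothetical helper that takes a slice.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let s = String::from("Rust");
    let boxed = Box::new(String::from("Box"));
    let v = vec![1, 2, 3];

    println!("{}", hello(&s));     // &String -> &str, since String: Deref<Target = str>
    println!("{}", hello(&boxed)); // &Box<String> -> &String -> &str, two deref steps
    println!("{}", total(&v));     // &Vec<i32> -> &[i32], since Vec<T>: Deref<Target = [T]>
}
```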

Deref coercion with mutability

You can use the DerefMut trait to override the * operator on mutable references.

Rust does deref coercion when it finds types and trait implementations in three cases:

  • From &T to &U when T: Deref<Target=U>
  • From &mut T to &mut U when T: DerefMut<Target=U>
  • From &mut T to &U when T: Deref<Target=U>

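/!\ Immutable references will never coerce to mutable references. An immutable reference might not be the only reference to the data, so turning it into a mutable one would break the borrowing rules.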
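Extending the MyBox<T> type from earlier with DerefMut gives the second case above (&mut T to &mut U). This is a minimal sketch. The `push_world` function is a hypothetical helper that takes a &mut String:

```rust
use std::ops::{Deref, DerefMut};

struct MyBox<T>(T);

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

// DerefMut requires Deref and reuses its `Target`, so only `deref_mut` is needed.
impl<T> DerefMut for MyBox<T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.0
    }
}

// Hypothetical helper that needs a mutable String.
fn push_world(s: &mut String) {
    s.push_str(", world");
}

fn main() {
    let mut m = MyBox(String::from("hello"));

    // &mut MyBox<String> -> &mut String, via deref_mut
    push_world(&mut m);
    assert_eq!(*m, "hello, world");

    // Assigning through `*` on a mutable MyBox also calls deref_mut
    *m = String::from("hi");
    assert_eq!(m.len(), 2); // method calls auto-deref to String::len
}
```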

Running code on cleanup with the Drop trait

In Rust, you can specify that a particular bit of code be run whenever a value goes out of scope, and the compiler will insert this code automatically.

struct CustomSmartPointer {
    data: String,
}
 
impl Drop for CustomSmartPointer {
    fn drop(&mut self) {
        println!("Dropping CustomSmartPointer with data `{}`!", self.data);
    }
}
 
// program output:
// $ cargo run
//    Compiling drop-example v0.1.0 (file:///projects/drop-example)
//     Finished dev [unoptimized + debuginfo] target(s) in 0.60s
//      Running `target/debug/drop-example`
// CustomSmartPointers created.
// Dropping CustomSmartPointer with data `other stuff`!
// Dropping CustomSmartPointer with data `my stuff`!
fn main() {
    let c = CustomSmartPointer {
        data: String::from("my stuff"),
    };
    let d = CustomSmartPointer {
        data: String::from("other stuff"),
    };
    println!("CustomSmartPointers created.");
}

/!\ Variables are dropped in the reverse order of their creation!
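To check the reverse order without reading console output, the drops can be recorded in a list. This sketch assumes Rust 1.63 or later, where Mutex::new can initialize a static. The `LOG` static, the `Noisy` type and `create_two` are hypothetical names:

```rust
use std::sync::Mutex;

// Hypothetical global log that records the order in which `drop` runs.
static LOG: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        LOG.lock().unwrap().push(self.0);
    }
}

fn create_two() {
    // `_first` (unlike a bare `_`) keeps the value alive until the end of the scope
    let _first = Noisy("first");
    let _second = Noisy("second");
} // `_second` is dropped here, before `_first`

fn main() {
    create_two();
    println!("{:?}", *LOG.lock().unwrap()); // prints ["second", "first"]
}
```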

Dropping a value early with std::mem::drop

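Occasionally, you might want to clean up a value early. For example, you might release a lock so that other code in the same scope can acquire it.

Rust doesn't let you call the Drop trait's drop method explicitly. Rust would still call drop automatically at the end of the scope, which would cause a double free. Instead, call the std::mem::drop function and pass it the value you want to drop.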

struct CustomSmartPointer {
    data: String,
}
 
impl Drop for CustomSmartPointer {
    fn drop(&mut self) {
        println!("Dropping CustomSmartPointer with data `{}`!", self.data);
    }
}
 
fn main() {
    let c = CustomSmartPointer {
        data: String::from("some data"),
    };
    println!("CustomSmartPointer created.");
    // no need to import std::mem::drop: the function is in the prelude, i.e. the set of
    // items Rust brings into the scope of every program
    drop(c);
    println!("CustomSmartPointer dropped before the end of main.");
}
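The lock case mentioned above can be sketched with std::sync::Mutex. A MutexGuard unlocks in its Drop implementation, so passing it to `drop` frees the mutex before the end of the scope. The `increment_and_release` function is a hypothetical helper:

```rust
use std::sync::Mutex;

// Locks `counter`, increments it, and releases the lock early with `drop`.
// Returns (was the mutex locked while the guard lived?, was it free after drop?).
fn increment_and_release(counter: &Mutex<i32>) -> (bool, bool) {
    let mut guard = counter.lock().unwrap(); // the lock is held while `guard` lives
    *guard += 1;
    // `try_lock` never blocks: it returns Err because the guard is still alive
    let locked_while_held = counter.try_lock().is_err();

    drop(guard); // MutexGuard's Drop implementation unlocks the mutex right here
    let free_after_drop = counter.try_lock().is_ok();

    (locked_while_held, free_after_drop)
}

fn main() {
    let counter = Mutex::new(0);
    println!("{:?}", increment_and_release(&counter)); // prints (true, true)
    println!("counter = {}", *counter.lock().unwrap()); // prints "counter = 1"
}
```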

Rc<T>, the reference counted smart pointer

There are cases when a single value might have multiple owners, e.g. in graph data structures, multiple edges might point to the same node, and that node is conceptually owned by all of the edges that point to it.

To enable multiple ownership, Rust has a type called Rc<T>, short for reference counting. It keeps track of the number of references to a value to determine whether or not the value is still in use. When there are zero references to a value, the value can be cleaned up without leaving any reference dangling.
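The graph case above can be sketched with two edges that share one node. The `Node` and `Edge` types are hypothetical, made up for this illustration:

```rust
use std::rc::Rc;

// Hypothetical graph types, for illustration only.
struct Node {
    name: String,
}

struct Edge {
    to: Rc<Node>,
}

fn main() {
    let node = Rc::new(Node { name: String::from("C") });

    // Two edges point at the same node, and each edge is one of its owners.
    let e1 = Edge { to: Rc::clone(&node) };
    let e2 = Edge { to: Rc::clone(&node) };
    println!("owners = {}", Rc::strong_count(&node)); // prints "owners = 3"
    assert!(Rc::ptr_eq(&e1.to, &e2.to)); // same allocation, the Node was not copied

    // Dropping two owners leaves the node alive for the remaining one.
    drop(node);
    drop(e1);
    println!("{} still has {} owner", e2.to.name, Rc::strong_count(&e2.to)); // prints "C still has 1 owner"
}
```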

enum List {
    Cons(i32, Rc<List>),
    Nil,
}
 
use crate::List::{Cons, Nil};
use std::rc::Rc;
 
fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    // Rust convention: use `Rc::clone(&a)` rather than `a.clone()`. Both only
    // increment the reference count, which doesn't take much time, but
    // `Rc::clone` keeps these clones visually distinct from the deep copies
    // that most types' implementations of `clone` perform
    let b = Cons(3, Rc::clone(&a));
    let c = Cons(4, Rc::clone(&a));
}

enum List {
    Cons(i32, Rc<List>),
    Nil,
}
 
use crate::List::{Cons, Nil};
use std::rc::Rc;
 
// $ cargo run
//    Compiling cons-list v0.1.0 (file:///projects/cons-list)
//     Finished dev [unoptimized + debuginfo] target(s) in 0.45s
//      Running `target/debug/cons-list`
// count after creating a = 1
// count after creating b = 2
// count after creating c = 3
// count after c goes out of scope = 2
fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    println!("count after creating a = {}", Rc::strong_count(&a));
    let b = Cons(3, Rc::clone(&a));
    println!("count after creating b = {}", Rc::strong_count(&a));
    {
        let c = Cons(4, Rc::clone(&a));
        println!("count after creating c = {}", Rc::strong_count(&a));
    }
    println!("count after c goes out of scope = {}", Rc::strong_count(&a));
}

RefCell<T> and the interior mutability pattern

With references and Box<T>, the borrowing rules’ invariants are enforced at compile time. With RefCell<T>, these invariants are enforced at runtime. With references, if you break these rules, you’ll get a compiler error. With RefCell<T>, if you break these rules, your program will panic and exit.

The RefCell<T> type is useful when you’re sure your code follows the borrowing rules but the compiler is unable to understand and guarantee that.

There are situations in which it would be useful for a value to mutate itself in its methods but appear immutable to other code.

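One example is mock objects. A test double may need to record the calls made to it, even though the trait it implements only gives its methods &self.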

Keeping track of borrows at runtime with RefCell<T>

When creating immutable and mutable references, we use the & and &mut syntax, respectively. With RefCell<T>, we use the borrow and borrow_mut methods, which are part of the safe API that belongs to RefCell<T>. The borrow method returns the smart pointer type Ref<T>, and borrow_mut returns the smart pointer type RefMut<T>. Both types implement Deref, so we can treat them like regular references.
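Here is a minimal working mock that uses these methods. It assumes a `Messenger` trait with a single `send(&self, msg: &str)` method, which the fragment below implies but does not show. The `new` constructor and the messages are illustrative:

```rust
use std::cell::RefCell;

// Assumed shape of the `Messenger` trait: `send` only gets `&self`.
pub trait Messenger {
    fn send(&self, msg: &str);
}

struct MockMessenger {
    sent_messages: RefCell<Vec<String>>,
}

impl MockMessenger {
    fn new() -> MockMessenger {
        MockMessenger {
            sent_messages: RefCell::new(vec![]),
        }
    }
}

impl Messenger for MockMessenger {
    fn send(&self, message: &str) {
        // borrow_mut returns a RefMut<Vec<String>>, dropped at the end of this statement
        self.sent_messages.borrow_mut().push(String::from(message));
    }
}

fn main() {
    let mock = MockMessenger::new();
    mock.send("first");
    mock.send("second");

    // borrow returns a Ref<Vec<String>>; Deref lets us call Vec methods on it
    println!("{} messages", mock.sent_messages.borrow().len()); // prints "2 messages"
}
```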

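use std::cell::RefCell;

// Minimal definitions this fragment relies on (as in the limit-tracker example)
pub trait Messenger {
    fn send(&self, msg: &str);
}

struct MockMessenger {
    sent_messages: RefCell<Vec<String>>,
}

impl Messenger for MockMessenger {
    fn send(&self, message: &str) {
        let mut one_borrow = self.sent_messages.borrow_mut();
        // second mutable borrow while `one_borrow` is still alive:
        // RefCell detects this at runtime and panics with `BorrowMutError`
        let mut two_borrow = self.sent_messages.borrow_mut();
 
        one_borrow.push(String::from(message));
        two_borrow.push(String::from(message));
    }
}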
// $ cargo test
//    Compiling limit-tracker v0.1.0 (file:///projects/limit-tracker)
//     Finished test [unoptimized + debuginfo] target(s) in 0.91s
//      Running unittests (target/debug/deps/limit_tracker-e599811fa246dbde)
// 
// running 1 test
// test tests::it_sends_an_over_75_percent_warning_message ... FAILED
// 
// failures:
// 
// ---- tests::it_sends_an_over_75_percent_warning_message stdout ----
// thread 'main' panicked at 'already borrowed: BorrowMutError', src/lib.rs:60:53
// note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
// 
// 
// failures:
//     tests::it_sends_an_over_75_percent_warning_message
// 
// test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
// 
// error: test failed, to rerun pass '--lib'

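The code compiles but panics at runtime. `one_borrow` is still alive when borrow_mut is called a second time, and RefCell<T> allows only one active mutable borrow at a time.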
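RefCell<T> also has a non-panicking variant, `try_borrow_mut`, which returns a Result. This sketch uses it to observe the conflict. It also shows one way to fix `send`: let each RefMut end before taking the next one. The function names are hypothetical:

```rust
use std::cell::RefCell;

// Returns whether a second mutable borrow succeeds while a first one is still alive.
fn second_borrow_allowed(cell: &RefCell<Vec<String>>) -> bool {
    let _first = cell.borrow_mut(); // lives until the end of this function
    cell.try_borrow_mut().is_ok()   // Err(BorrowMutError) instead of a panic
}

// A fixed version of the double push: each temporary RefMut is dropped at the `;`.
fn push_twice(cell: &RefCell<Vec<String>>, message: &str) {
    cell.borrow_mut().push(String::from(message));
    cell.borrow_mut().push(String::from(message));
}

fn main() {
    let cell = RefCell::new(Vec::new());
    println!("{}", second_borrow_allowed(&cell)); // prints "false"
    push_twice(&cell, "hi");
    println!("{:?}", *cell.borrow()); // prints ["hi", "hi"]
}
```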

Having multiple owners of mutable data

A common way to use RefCell<T> is in combination with Rc<T>. An Rc<RefCell<T>> value can have multiple owners, and any of them can mutate it.

#[derive(Debug)]
enum List {
    Cons(Rc<RefCell<i32>>, Rc<List>),
    Nil,
}
 
use crate::List::{Cons, Nil};
use std::cell::RefCell;
use std::rc::Rc;
 
// $ cargo run
//    Compiling cons-list v0.1.0 (file:///projects/cons-list)
//     Finished dev [unoptimized + debuginfo] target(s) in 0.63s
//      Running `target/debug/cons-list`
// a after = Cons(RefCell { value: 15 }, Nil)
// b after = Cons(RefCell { value: 3 }, Cons(RefCell { value: 15 }, Nil))
// c after = Cons(RefCell { value: 4 }, Cons(RefCell { value: 15 }, Nil))
fn main() {
    let value = Rc::new(RefCell::new(5));
 
    let a = Rc::new(Cons(Rc::clone(&value), Rc::new(Nil)));
 
    let b = Cons(Rc::new(RefCell::new(3)), Rc::clone(&a));
    let c = Cons(Rc::new(RefCell::new(4)), Rc::clone(&a));
 
    *value.borrow_mut() += 10;
 
    println!("a after = {:?}", a);
    println!("b after = {:?}", b);
    println!("c after = {:?}", c);
}

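Reference cycles can leak memory

Rust's memory safety guarantees make it difficult, but not impossible, to create memory that is never cleaned up. Suppose Rc<T> values refer to each other in a cycle, here through a RefCell<T> that lets a Cons cell's tail change after creation. Then each value's strong_count never reaches 0, and the values are never dropped.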

use crate::List::{Cons, Nil};
use std::cell::RefCell;
use std::rc::Rc;
 
#[derive(Debug)]
enum List {
    Cons(i32, RefCell<Rc<List>>),
    Nil,
}
 
impl List {
    fn tail(&self) -> Option<&RefCell<Rc<List>>> {
        match self {
            Cons(_, item) => Some(item),
            Nil => None,
        }
    }
}
 
fn main() {
    let a = Rc::new(Cons(5, RefCell::new(Rc::new(Nil))));
 
    println!("a initial rc count = {}", Rc::strong_count(&a));
    println!("a next item = {:?}", a.tail());
 
    let b = Rc::new(Cons(10, RefCell::new(Rc::clone(&a))));
 
    println!("a rc count after b creation = {}", Rc::strong_count(&a));
    println!("b initial rc count = {}", Rc::strong_count(&b));
    println!("b next item = {:?}", b.tail());
 
    if let Some(link) = a.tail() {
        *link.borrow_mut() = Rc::clone(&b);
    }
 
    println!("b rc count after changing a = {}", Rc::strong_count(&b));
    println!("a rc count after changing a = {}", Rc::strong_count(&a));
 
    // Uncomment the next line to see that we have a cycle;
    // it will overflow the stack
    // println!("a next item = {:?}", a.tail());
    // because Rust will try to print this cycle, with `a` pointing
    // to `b` pointing to `a` and so forth, until it overflows the
    // stack.
 
    //                b
    //                |
    //                v
    // a -> [5, ] -> [10, ]
    //         ^         |
    //         |         |
    //         +---------+
}
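The leak itself can be observed by counting how many values actually run their Drop implementation. This sketch builds a two-node cycle and lets both handles go out of scope. The `Node` type, the `DROPS` counter and `make_cycle` are hypothetical names:

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many Node values have actually been dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Builds a -> b -> a and returns both strong counts before `a` and `b` go out of scope.
fn make_cycle() -> (usize, usize) {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b)); // close the cycle
    let counts = (Rc::strong_count(&a), Rc::strong_count(&b));
    counts
} // each count falls from 2 to 1, never 0, so neither Node is dropped

fn main() {
    println!("counts = {:?}", make_cycle()); // prints "counts = (2, 2)"
    println!("drops = {}", DROPS.load(Ordering::SeqCst)); // prints "drops = 0": both nodes leaked
}
```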

Preventing reference cycles: turning an Rc<T> into a Weak<T>

Rc::clone increases the strong_count of an Rc<T> instance. An Rc<T> instance is only cleaned up when its strong_count is 0.

You can create a weak reference to the value within an Rc<T> instance by calling Rc::downgrade and passing a reference to the Rc<T>. Calling Rc::downgrade returns a smart pointer of type Weak<T>. Instead of increasing the strong_count in the Rc<T> instance by 1, it increases the weak_count by 1. Unlike strong_count, weak_count doesn't need to be 0 for the Rc<T> instance to be cleaned up.

Hence, weak references don’t express an ownership relationship.

Because the value that Weak<T> references might have been dropped, you must make sure the value still exists. You can do this by calling the upgrade method on a Weak<T> instance, which will return an Option<Rc<T>>.
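Here is a minimal sketch of the downgrade and upgrade cycle on a plain Rc<i32>, before the tree example. The `read` helper is hypothetical:

```rust
use std::rc::{Rc, Weak};

// Reads through a Weak<i32>: Some(value) if the Rc is still alive, None otherwise.
fn read(weak: &Weak<i32>) -> Option<i32> {
    weak.upgrade().map(|rc| *rc)
}

fn main() {
    let strong = Rc::new(5);
    let weak = Rc::downgrade(&strong); // weak_count becomes 1; strong_count stays 1

    println!("strong = {}, weak = {}", Rc::strong_count(&strong), Rc::weak_count(&strong));
    println!("{:?}", read(&weak)); // prints Some(5)

    drop(strong); // strong_count hits 0, so the value is dropped despite the Weak
    println!("{:?}", read(&weak)); // prints None
}
```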

use std::cell::RefCell;
use std::rc::{Rc, Weak};
 
#[derive(Debug)]
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}
 
// program output:
// leaf parent = None
// leaf parent = Some(Node { value: 5, parent: RefCell { value: (Weak) },
// children: RefCell { value: [Node { value: 3, parent: RefCell { value: (Weak) },
// children: RefCell { value: [] } }] } })
fn main() {
    let leaf = Rc::new(Node {
        value: 3,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });
 
    println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
 
    let branch = Rc::new(Node {
        value: 5,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![Rc::clone(&leaf)]),
    });
 
    *leaf.parent.borrow_mut() = Rc::downgrade(&branch);
 
    println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
}