Rust smart pointers
Smart pointers are data structures that not only act like a pointer but also have additional metadata and capabilities.
In Rust, which uses the concept of ownership and borrowing, an additional difference between references and smart pointers is that references are pointers that only borrow data; in contrast, in many cases, smart pointers own the data they point to.
Some common smart pointers in the standard library:
- Box<T>, for allocating values on the heap
- Rc<T>, a reference counting type that enables multiple ownership
- Ref<T> and RefMut<T>, accessed through RefCell<T>, a type that enforces the borrowing rules at runtime instead of compile time
Using Box<T> to point to data on the heap
Use Box<T> to store data on the heap rather than the stack.
What remains on the stack is the pointer to the heap data.
Boxes are mostly used in these situations:
- When you have a type whose size can’t be known at compile time and you want to use a value of that type in a context that requires an exact size
- When you have a large amount of data and you want to transfer ownership but ensure the data won’t be copied when you do so
- When you want to own a value and you care only that it’s a type that implements a particular trait rather than being of a specific type
// storing an i32 value on the heap using a box
fn main() {
    let b = Box::new(5);
    println!("b = {}", b);
}
When a box goes out of scope, as b does at the end of main, it will be deallocated. The deallocation happens both for the box (stored on the stack) and for the data it points to (stored on the heap).
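The third situation in the list above, owning a value only through the trait it implements, has no example in these notes. A minimal sketch follows; the Shape trait and the Square and Rect types are hypothetical names made up for illustration.

```rust
// Hypothetical trait and types, used only for illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// Sums the areas of shapes whose concrete types differ.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    // Each box owns a value that is known only by the trait it implements.
    let shapes: Vec<Box<dyn Shape>> = vec![
        Box::new(Square(2.0)),
        Box::new(Rect(2.0, 3.0)),
    ];
    println!("total area = {}", total_area(&shapes));
}
```

Because every Box<dyn Shape> has the same size on the stack, values of different concrete types can live in the same Vec.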
Enabling recursive types with boxes
// definition of `List` that uses `Box<T>` in order to have a known size
enum List {
    Cons(i32, Box<List>),
    Nil,
}
use crate::List::{Cons, Nil};
fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
}
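Once the list has a known size it can be walked like any other recursive structure. The sketch below assumes the same List definition; the sum function is a hypothetical helper added for illustration.

```rust
enum List {
    Cons(i32, Box<List>),
    Nil,
}

use List::{Cons, Nil};

// Hypothetical helper: walks the list recursively and adds up the values.
fn sum(list: &List) -> i32 {
    match list {
        // `rest` is a `&Box<List>`; it is coerced to `&List` for the recursive call.
        Cons(value, rest) => value + sum(rest),
        Nil => 0,
    }
}

fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
    println!("sum = {}", sum(&list));
}
```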
Deref trait
Implementing the Deref trait allows you to customize the behavior of the dereference operator, *. By implementing Deref in such a way that a smart pointer can be treated like a regular reference, you can write code that operates on references and use that code with smart pointers too.
Defining our own smart pointer
use std::ops::Deref;
struct MyBox<T>(T);
impl<T> MyBox<T> {
    fn new(x: T) -> MyBox<T> {
        MyBox(x)
    }
}
impl<T> Deref for MyBox<T> {
    // defines an associated type for the Deref trait to use
    type Target = T;
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}
fn main() {
    let x = 5;
    let y = MyBox::new(x);
    assert_eq!(5, x);
    // behind the scenes, Rust actually runs `*(y.deref())`
    assert_eq!(5, *y);
}
Deref coercion
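Deref coercion is a convenience that Rust performs on arguments to functions and methods: it converts a reference to a type that implements Deref into a reference to the type named by its Target. For example, it can convert &String to &str, because String implements Deref with str as its Target.
Deref coercion works only on types that implement the Deref trait.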
use std::ops::Deref;
struct MyBox<T>(T);
impl<T> MyBox<T> {
    fn new(x: T) -> MyBox<T> {
        MyBox(x)
    }
}
impl<T> Deref for MyBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}
fn hello(name: &str) {
    println!("Hello, {}!", name);
}
fn main() {
    let m = MyBox::new(String::from("Rust"));
    // &m is a reference to a `MyBox<String>` value;
    // deref coercion turns it into `&String`, then into `&str`
    hello(&m);
    // without deref coercion, we would have to write:
    hello(&(*m)[..]);
}
Deref coercion with mutability
You can use the DerefMut trait to override the * operator on mutable references.
Rust does deref coercion when it finds types and trait implementations in three cases:
- From &T to &U when T: Deref<Target=U>
- From &mut T to &mut U when T: DerefMut<Target=U>
- From &mut T to &U when T: Deref<Target=U>
/!\ Immutable references will never coerce to mutable references.
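The second and third cases can be sketched with the MyBox<T> type from the earlier sections. The DerefMut implementation and the shout and length helpers are assumptions added for illustration; they are not part of the original notes.

```rust
use std::ops::{Deref, DerefMut};

// Same tuple struct as the MyBox<T> used earlier in this document.
struct MyBox<T>(T);

impl<T> Deref for MyBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

// DerefMut has Deref as a supertrait and reuses its Target type.
impl<T> DerefMut for MyBox<T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.0
    }
}

// Hypothetical helper that needs mutable access to a String.
fn shout(s: &mut String) {
    s.push('!');
}

// Hypothetical helper that only reads a string slice.
fn length(s: &str) -> usize {
    s.len()
}

fn main() {
    let mut m = MyBox(String::from("Rust"));
    // &mut MyBox<String> coerces to &mut String through DerefMut.
    shout(&mut m);
    // &mut MyBox<String> coerces to an immutable &str through Deref.
    let n = length(&mut m);
    println!("{} has {} bytes", *m, n);
}
```

Passing &m to shout would not compile: an immutable reference is never coerced to a mutable one.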
Running code on cleanup with the Drop trait
In Rust, you can specify that a particular bit of code be run whenever a value goes out of scope, and the compiler will insert this code automatically.
struct CustomSmartPointer {
    data: String,
}
impl Drop for CustomSmartPointer {
    fn drop(&mut self) {
        println!("Dropping CustomSmartPointer with data `{}`!", self.data);
    }
}
// program output:
// $ cargo run
//    Compiling drop-example v0.1.0 (file:///projects/drop-example)
//     Finished dev [unoptimized + debuginfo] target(s) in 0.60s
//      Running `target/debug/drop-example`
// CustomSmartPointers created.
// Dropping CustomSmartPointer with data `other stuff`!
// Dropping CustomSmartPointer with data `my stuff`!
fn main() {
    let c = CustomSmartPointer {
        data: String::from("my stuff"),
    };
    let d = CustomSmartPointer {
        data: String::from("other stuff"),
    };
    println!("CustomSmartPointers created.");
}
/!\ Variables are dropped in the reverse order of their creation!
Dropping a value early with std::mem::drop
Occasionally, you might want to clean up a value early, e.g. release a lock.
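Rust does not let us call the Drop trait's drop method explicitly, because the value would then be dropped a second time at the end of its scope. Instead, we call the std::mem::drop function.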
struct CustomSmartPointer {
    data: String,
}
impl Drop for CustomSmartPointer {
    fn drop(&mut self) {
        println!("Dropping CustomSmartPointer with data `{}`!", self.data);
    }
}
fn main() {
    let c = CustomSmartPointer {
        data: String::from("some data"),
    };
    println!("CustomSmartPointer created.");
    // no need to import std::mem::drop: the function is in the prelude,
    // i.e. the set of items brought into the scope of every program
    drop(c);
    println!("CustomSmartPointer dropped before the end of main.");
}
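The lock case mentioned above can be sketched with std::sync::Mutex, whose guard releases the lock when it is dropped. The lock_is_free helper is a hypothetical name introduced for this example.

```rust
use std::sync::Mutex;

// Hypothetical helper: reports whether the mutex can be locked right now.
fn lock_is_free(m: &Mutex<i32>) -> bool {
    m.try_lock().is_ok()
}

fn main() {
    let counter = Mutex::new(0);

    let mut guard = counter.lock().unwrap();
    *guard += 1;
    // While the guard is alive, the mutex stays locked.
    assert!(!lock_is_free(&counter));

    // Dropping the guard early releases the lock before the end of main.
    drop(guard);
    assert!(lock_is_free(&counter));
}
```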
Rc<T>, the reference counted smart pointer
There are cases when a single value might have multiple owners, e.g. in graph data structures, multiple edges might point to the same node, and that node is conceptually owned by all of the edges that point to it.
To enable multiple ownership, Rust has a type called Rc<T>, short for reference counting. It keeps track of the number of references to a value to determine whether or not the value is still in use. If there are zero references to a value, the value can be cleaned up without any references becoming invalid.
use crate::List::{Cons, Nil};
use std::rc::Rc;
enum List {
    Cons(i32, Rc<List>),
    Nil,
}
fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    // Rust convention: write `Rc::clone(&a)` instead of `a.clone()`.
    // Both only increment the reference count, which doesn't take
    // much time, but `Rc::clone` is visually distinct from the deep
    // copies that most types' implementations of `clone` make
    let b = Cons(3, Rc::clone(&a));
    let c = Cons(4, Rc::clone(&a));
}
use crate::List::{Cons, Nil};
use std::rc::Rc;
enum List {
    Cons(i32, Rc<List>),
    Nil,
}
// $ cargo run
//    Compiling cons-list v0.1.0 (file:///projects/cons-list)
//     Finished dev [unoptimized + debuginfo] target(s) in 0.45s
//      Running `target/debug/cons-list`
// count after creating a = 1
// count after creating b = 2
// count after creating c = 3
// count after c goes out of scope = 2
fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    println!("count after creating a = {}", Rc::strong_count(&a));
    let b = Cons(3, Rc::clone(&a));
    println!("count after creating b = {}", Rc::strong_count(&a));
    {
        let c = Cons(4, Rc::clone(&a));
        println!("count after creating c = {}", Rc::strong_count(&a));
    }
    println!("count after c goes out of scope = {}", Rc::strong_count(&a));
}
RefCell<T> and the interior mutability pattern
With references and Box<T>, the borrowing rules’ invariants are enforced at compile time. With RefCell<T>, these invariants are enforced at runtime. With references, if you break these rules, you’ll get a compiler error. With RefCell<T>, if you break these rules, your program will panic and exit.
The RefCell<T> type is useful when you’re sure your code follows the borrowing rules but the compiler is unable to understand and guarantee that.
There are situations in which it would be useful for a value to mutate itself in its methods but appear immutable to other code.
One example is mock objects.
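A minimal sketch of such a mock follows. The names Messenger, MockMessenger and sent_messages are taken from the snippet further down in this document; the trait and struct definitions themselves are assumptions, since the notes do not show them.

```rust
use std::cell::RefCell;

// Assumed trait shape: note that `send` takes `&self`, not `&mut self`.
trait Messenger {
    fn send(&self, message: &str);
}

// Assumed mock: records every message it is asked to send.
struct MockMessenger {
    sent_messages: RefCell<Vec<String>>,
}

impl MockMessenger {
    fn new() -> MockMessenger {
        MockMessenger {
            sent_messages: RefCell::new(vec![]),
        }
    }
}

impl Messenger for MockMessenger {
    fn send(&self, message: &str) {
        // Mutates through a shared reference; the borrow is checked at runtime.
        self.sent_messages.borrow_mut().push(String::from(message));
    }
}

fn main() {
    let mock = MockMessenger::new();
    mock.send("quota at 80%");
    println!("messages recorded: {}", mock.sent_messages.borrow().len());
}
```

To callers the mock looks immutable, yet it can still record what was sent, which is what a test needs to inspect afterwards.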
Keeping track of borrows at runtime with RefCell<T>
When creating immutable and mutable references, we use the & and &mut syntax, respectively. With RefCell<T>, we use the borrow and borrow_mut methods, which are part of the safe API that belongs to RefCell<T>. The borrow method returns the smart pointer type Ref<T>, and borrow_mut returns the smart pointer type RefMut<T>. Both types implement Deref, so we can treat them like regular references.
RefCell<T> keeps track of how many Ref<T> and RefMut<T> smart pointers are currently active. Just like the compile-time rules, it allows many immutable borrows or one mutable borrow at any point in time.
impl Messenger for MockMessenger {
    fn send(&self, message: &str) {
        // two mutable borrows in the same scope:
        // the second `borrow_mut` call panics at runtime
        let mut one_borrow = self.sent_messages.borrow_mut();
        let mut two_borrow = self.sent_messages.borrow_mut();
        one_borrow.push(String::from(message));
        two_borrow.push(String::from(message));
    }
}
// $ cargo test
// Compiling limit-tracker v0.1.0 (file:///projects/limit-tracker)
// Finished test [unoptimized + debuginfo] target(s) in 0.91s
// Running unittests (target/debug/deps/limit_tracker-e599811fa246dbde)
//
// running 1 test
// test tests::it_sends_an_over_75_percent_warning_message ... FAILED
//
// failures:
//
// ---- tests::it_sends_an_over_75_percent_warning_message stdout ----
// thread 'main' panicked at 'already borrowed: BorrowMutError', src/lib.rs:60:53
// note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
//
//
// failures:
// tests::it_sends_an_over_75_percent_warning_message
//
// test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
//
// error: test failed, to rerun pass '--lib'
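The code compiled but panicked at runtime: the second borrow_mut call breaks the rule of one mutable borrow at a time, and RefCell<T> only detects this while the test is running.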
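The same runtime check can be observed without a panic by using try_borrow_mut, which returns an Err when the value is already borrowed. This is a small sketch; the can_borrow_mut helper is a hypothetical name introduced for illustration.

```rust
use std::cell::RefCell;

// Hypothetical helper: reports whether a mutable borrow would succeed right now.
fn can_borrow_mut<T>(cell: &RefCell<T>) -> bool {
    cell.try_borrow_mut().is_ok()
}

fn main() {
    let messages = RefCell::new(vec![String::from("first")]);

    let first = messages.borrow_mut();
    // A second mutable borrow is refused while `first` is alive.
    // `borrow_mut` would panic here; `try_borrow_mut` returns an Err instead.
    assert!(!can_borrow_mut(&messages));

    drop(first);
    // Once the first borrow is released, borrowing mutably works again.
    assert!(can_borrow_mut(&messages));
    messages.borrow_mut().push(String::from("second"));
    assert_eq!(messages.borrow().len(), 2);
}
```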
Having multiple owners of mutable data
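A common way to use RefCell<T> is in combination with Rc<T>. On its own, Rc<T> gives its multiple owners only immutable access to the data. If the Rc<T> holds a RefCell<T>, you get a value that can have multiple owners and that you can still mutate.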
use crate::List::{Cons, Nil};
use std::cell::RefCell;
use std::rc::Rc;
#[derive(Debug)]
enum List {
    Cons(Rc<RefCell<i32>>, Rc<List>),
    Nil,
}
// $ cargo run
//    Compiling cons-list v0.1.0 (file:///projects/cons-list)
//     Finished dev [unoptimized + debuginfo] target(s) in 0.63s
//      Running `target/debug/cons-list`
// a after = Cons(RefCell { value: 15 }, Nil)
// b after = Cons(RefCell { value: 3 }, Cons(RefCell { value: 15 }, Nil))
// c after = Cons(RefCell { value: 4 }, Cons(RefCell { value: 15 }, Nil))
fn main() {
    let value = Rc::new(RefCell::new(5));
    let a = Rc::new(Cons(Rc::clone(&value), Rc::new(Nil)));
    let b = Cons(Rc::new(RefCell::new(3)), Rc::clone(&a));
    let c = Cons(Rc::new(RefCell::new(4)), Rc::clone(&a));
    *value.borrow_mut() += 10;
    println!("a after = {:?}", a);
    println!("b after = {:?}", b);
    println!("c after = {:?}", c);
}
Reference cycles can leak memory
use crate::List::{Cons, Nil};
use std::cell::RefCell;
use std::rc::Rc;
#[derive(Debug)]
enum List {
    Cons(i32, RefCell<Rc<List>>),
    Nil,
}
impl List {
    fn tail(&self) -> Option<&RefCell<Rc<List>>> {
        match self {
            Cons(_, item) => Some(item),
            Nil => None,
        }
    }
}
fn main() {
    let a = Rc::new(Cons(5, RefCell::new(Rc::new(Nil))));
    println!("a initial rc count = {}", Rc::strong_count(&a));
    println!("a next item = {:?}", a.tail());
    let b = Rc::new(Cons(10, RefCell::new(Rc::clone(&a))));
    println!("a rc count after b creation = {}", Rc::strong_count(&a));
    println!("b initial rc count = {}", Rc::strong_count(&b));
    println!("b next item = {:?}", b.tail());
    if let Some(link) = a.tail() {
        *link.borrow_mut() = Rc::clone(&b);
    }
    println!("b rc count after changing a = {}", Rc::strong_count(&b));
    println!("a rc count after changing a = {}", Rc::strong_count(&a));
    // Uncomment the next line to see that we have a cycle;
    // it will overflow the stack
    // println!("a next item = {:?}", a.tail());
    // because Rust will try to print this cycle, with `a` pointing
    // to `b` pointing to `a` and so forth, until it overflows the
    // stack.
    //                b
    //                |
    //                v
    // a -> [5, ] -> [10, ]
    //        ^         |
    //        |         |
    //        +_________+
}
Preventing reference cycles: turning an Rc<T> into a Weak<T>
Rc::clone increases the strong_count of an Rc<T> instance. An Rc<T> instance is only cleaned up if its strong_count is 0.
You can create a weak reference to the value within an Rc<T> instance by calling Rc::downgrade and passing a reference to the Rc<T>. When you call Rc::downgrade, you get a smart pointer of type Weak<T>. Instead of increasing the strong_count in the Rc<T> instance by 1, it will increase the weak_count by 1.
Unlike strong_count, weak_count doesn’t need to be 0 for the Rc<T> instance to be cleaned up.
Hence, weak references don’t express an ownership relationship.
Because the value that Weak<T> references might have been dropped, you must make sure the value still exists before using it.
You can do this by calling the upgrade method on a Weak<T> instance, which will return an Option<Rc<T>>: Some if the value has not been dropped yet, and None if it has.
use std::cell::RefCell;
use std::rc::{Rc, Weak};
#[derive(Debug)]
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}
// leaf parent = Some(Node { value: 5, parent: RefCell { value: (Weak) },
// children: RefCell { value: [Node { value: 3, parent: RefCell { value: (Weak) },
// children: RefCell { value: [] } }] } })
fn main() {
    let leaf = Rc::new(Node {
        value: 3,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });
    println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
    let branch = Rc::new(Node {
        value: 5,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![Rc::clone(&leaf)]),
    });
    *leaf.parent.borrow_mut() = Rc::downgrade(&branch);
    println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
}
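The counting rules and the behaviour of upgrade can also be checked in isolation. The sketch below uses a plain String instead of the Node type; the is_alive helper is a hypothetical name introduced for illustration.

```rust
use std::rc::{Rc, Weak};

// Hypothetical helper: true while the value behind the weak reference exists.
fn is_alive<T>(weak: &Weak<T>) -> bool {
    weak.upgrade().is_some()
}

fn main() {
    let strong = Rc::new(String::from("shared"));
    let weak: Weak<String> = Rc::downgrade(&strong);

    // Downgrading raises weak_count and leaves strong_count alone.
    assert_eq!(Rc::strong_count(&strong), 1);
    assert_eq!(Rc::weak_count(&strong), 1);

    // While the value is alive, upgrade yields Some(Rc<T>).
    assert!(is_alive(&weak));

    // Dropping the last strong reference cleans up the value,
    // even though a weak reference still exists.
    drop(strong);
    assert!(!is_alive(&weak));
}
```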