Intro

Pointers aren’t exclusive to Rust; in fact, they’re most closely associated with the C programming language.

Pointers

Pointers are variables that contain an address in memory rather than a value. That address points at some other data, hence the name.

In chapter 4 we learned about references, denoted by &, which borrow the value they point to.
They simply refer to the value and do nothing else, so there’s no overhead.
// What's "overhead" mean in this context?

Smart pointers, on the other hand, are data structures that act like pointers but also carry metadata and extra capabilities. Several are defined in the standard library. Just like pointers, smart pointers aren’t exclusive to Rust; they originated in C++.

Since Rust is all about ownership and borrowing, there’s a difference between references and smart pointers.
While references only borrow data, in many cases smart pointers own the data they point to.

This chapter will cover the relevant traits, the general design pattern, and the Interior Mutability pattern.

Interior Mutability Pattern

This is a pattern in which an immutable type exposes an API for mutating an interior value.
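As a minimal sketch of the idea (the Counter type and its method names are made up for illustration, they aren’t from the chapter), a RefCell<T> field lets a method that only takes &self still update internal state:

```rust
use std::cell::RefCell;

// Hypothetical type, used only to illustrate the pattern.
struct Counter {
    hits: RefCell<u32>,
}

impl Counter {
    fn new() -> Counter {
        Counter { hits: RefCell::new(0) }
    }

    // Takes `&self`, yet mutates the interior value through the RefCell.
    fn record(&self) {
        *self.hits.borrow_mut() += 1;
    }

    fn total(&self) -> u32 {
        *self.hits.borrow()
    }
}

fn main() {
    let counter = Counter::new(); // note: not declared `mut`
    counter.record();
    counter.record();
    println!("hits = {}", counter.total());
}
```

From the outside Counter looks immutable; the mutation is an implementation detail behind record.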

Structs and Traits

In Rust, smart pointers are usually structs that implement the Deref and Drop traits:

  • The Deref trait allows an instance of the smart pointer struct to behave like a reference so that you can write your code to work with either references or smart pointers.
  • The Drop trait allows you to customize the code that’s run when an instance of the smart pointer goes out of scope.
    More on these traits later.

Common Smart Pointers

Smart pointers follow a general design pattern, and many libraries have their own implementations.
Common smart pointers that this chapter will cover include:

  • Box<T>, for allocating values on the heap // I've been curious about this
  • Rc<T>, a reference counting type that enables multiple ownership
  • Ref<T> and RefMut<T>, accessed through RefCell<T>, a type that enforces the borrowing rules at runtime instead of compile time

Box type

The most straightforward smart pointer is Box<T>. It allows for storing data on the heap instead of the stack, leaving only the pointer to the heap data on the stack.

  • Data on heap
  • Pointer on stack

Boxes have no performance overhead beyond storing their data on the heap instead of the stack, and they have no extra capabilities either.

Common Use Cases

  • When a type’s size can’t be known at compile time and a value of that type is used in a context that requires an exact size
  • When a large amount of data needs to transfer ownership and you want to ensure it isn’t copied in the process
  • When you want to own a value and only care that its type implements a particular trait, rather than it being a specific type
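The third case is the easiest to sketch. In this hedged example (the Shape trait and both structs are invented for illustration), a Vec owns boxed values of different concrete types and only relies on them implementing the trait:

```rust
// Hypothetical trait and types, not from the chapter.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// Sums the areas without knowing the concrete type behind each box.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(Square(2.0)), Box::new(Rect(2.0, 3.0))];
    println!("total area = {}", total_area(&shapes));
}
```

Square and Rect have different sizes, but every Box<dyn Shape> has the same size, which is what lets them share one Vec.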

Storing Data on Heap

To interact with values stored within a Box<T> we use the following syntax:

fn main(){
	let b = Box::new(5);
	println!("b is {}", b);
}

Now the value 5 is allocated on the heap, while b, the box that points to it, lives on the stack. The data in the box can be used like any other owned value. When b goes out of scope at the end of main, both the box (on the stack) and the data it points to (on the heap) are deallocated.

That was purely an example; a single i32 is better off on the stack. Let’s look at more practical uses.

Enabling Recursive Types

A recursive type’s value can have another value of the same type as part of itself. This is an issue since Rust needs to know the space needed at compile time!
Since Boxes have a known size, we can enable recursive types by inserting a box in the type’s definition.

An example of a Recursive Type is the Cons list, which is a common data type in functional languages.

Cons List

A list made of nested pairs, similar to a linked list. It originates with Lisp and its dialects, and is named after the cons function (short for “construct function”).
For example, here’s a pseudocode representation of a cons list containing the list 1, 2, 3 with each pair in parentheses: (1, (2, (3, Nil)))

Each item contains two elements: the value of the current item and the next item. The last item contains only the Nil value, with no next item. The list is built by recursively calling the cons function.

Cons lists aren’t super useful in Rust; a Vec<T> is usually better suited for most purposes. They do demonstrate recursive types well, though.

Let’s consider the following Enum defining a cons list.

enum List {
	Cons(i32, List),
	Nil,
}

Note, Nil isn’t a value or type, it’s just an enum variant.

Implementing the example from earlier, the code would be something like

use crate::List::{Cons, Nil}; // importing our enum's variants
 
fn main(){
	let list = Cons(1, Cons(2, Cons(3, Nil)));
}

Each Cons variant holds another List value, until the last one holds the Nil variant, signaling the end of the list.

Compiling this code results in an error that’ll say something like:

$ cargo run
   Compiling cons-list v0.1.0 (file:///projects/cons-list)
error[E0072]: recursive type `List` has infinite size
 --> src/main.rs:1:1
 
 # expanded error message...

Rust is unable to determine the size of the type and how much to allocate.

Computing Non-Recursive Type’s Size

When the compiler looks at a non-recursive type, an enum for example, it calculates the size of the largest variant. Since only one variant is used at a time, that’s the most space a value of the enum will ever need.
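To see this for a non-recursive enum, here’s a small sketch using std::mem::size_of. The Message enum mirrors the one the Rust book uses for this explanation. Exact sizes depend on the target, so the code only checks that the enum is at least as large as one of its bigger payloads.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
    ChangeColor(i32, i32, i32),
}

fn main() {
    // A Message value must be able to hold any single variant,
    // so it can never be smaller than the String inside Write.
    println!("String:  {} bytes", size_of::<String>());
    println!("Message: {} bytes", size_of::<Message>());
    assert!(size_of::<Message>() >= size_of::<String>());
}
```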

However, when dealing with Recursive types, each type can potentially hold an infinite number of itself. A pretty clear issue.
[Figure: an infinite List made of nested Cons variants. Each Cons holds an i32 and another Cons, repeating forever.]

Getting a Recursive type of Known Size

Since the compiler struggles to allocate for recursively defined types, the error message from before also drops a hint for us.

help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to break the cycle
  |
2 |     Cons(i32, Box<List>),
  |               ++++    +

Indirection in this context means that instead of storing a value directly, we change the data structure to store a pointer to the value. I.e., store the value on the heap and keep only a pointer to it… a Box!

Since a Box is a pointer, Rust always knows how much space it needs: a pointer’s size doesn’t change based on the amount of data it points to. Placing a Box inside the Cons variant lets it point to the next List value, which lives on the heap rather than inside the Cons variant.

Updating the definition

enum List{
	Cons(i32, Box<List>),
	Nil
}

and

enum List {
    Cons(i32, Box<List>),
    Nil,
}
 
use crate::List::{Cons, Nil};
 
fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
}

A Cons variant needs the space for an i32 and a pointer, something the compiler can determine.
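To check this, and to show that the boxed list is usable, here’s a hedged sketch. The sum helper is hypothetical (it isn’t in the chapter), and since the printed sizes vary by platform, the only size it asserts is that a Box<List> is pointer-sized.

```rust
use std::mem::size_of;

enum List {
    Cons(i32, Box<List>),
    Nil,
}

use crate::List::{Cons, Nil};

// Hypothetical helper: adds up the values by following each Box to the next node.
fn sum(list: &List) -> i32 {
    match list {
        Cons(value, rest) => value + sum(rest),
        Nil => 0,
    }
}

fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));

    // A Box<List> is one pointer wide, however long the list behind it is.
    assert_eq!(size_of::<Box<List>>(), size_of::<usize>());
    println!("size of List: {} bytes", size_of::<List>());
    println!("sum = {}", sum(&list));
}
```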

Variant structure

Box Summary

Boxes only provide indirection and heap allocation, allowing us to build around types that have unknown sizes at compile time or are recursively defined. There are no performance overheads and no extra capabilities associated with them.

The Box<T> type is a smart pointer because it implements the Deref trait, which allows Box<T> values to be treated like references. It also implements the Drop trait, so when a box goes out of scope, both the box and the heap data it points to are cleaned up.

Treating Smart Pointers Like Regular References

When you implement the Deref trait for a type, you can customize the behavior of the dereference operator *. // Sounds about right; and don't confuse it with multiplication and glob operators!

It’s possible to implement this trait in a way that smart pointers are treated like regular references, allowing us to write code that works on both smart pointers and regular references.

Following Reference to Value

References are a type of pointer, one that points to a value in memory.
Here’s an example of using the dereference operator to follow a reference to its value.

fn main(){
	let x = 3;
	let y = &x;
	
	assert_eq!(3,x);
	assert_eq!(3, *y);
}

The * tells the compiler to follow the pointer (address) stored at y. // same as C
Both assertions pass. Remove the * and you’ll get a compile error, because an integer can’t be compared with a reference to an integer.

Using Box<T> Like a Reference

We can use Box<T> on the previous example and dereferencing would work the same way.

fn main() {
    let x = 5;
    let y = Box::new(x);
 
    assert_eq!(5, x);
    assert_eq!(5, *y);
}

However, y now points to a copy of x’s value on the heap rather than to x itself. The assertions still pass because both sides hold the same value, even though they aren’t the same instance of that value in memory.
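A small variation makes the copy visible. This sketch (not from the chapter) changes x after boxing it; the boxed value is unaffected because i32 is Copy and Box::new(x) received its own copy.

```rust
fn main() {
    let mut x = 5;
    let y = Box::new(x); // copies the i32 into a new heap allocation

    x += 1; // only the stack variable changes

    assert_eq!(6, x);
    assert_eq!(5, *y);
}
```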

Simple Implementation of Smart Pointer

This is meant to drive the point home about smart pointers by implementing our own version of Box<T>. It won’t store its data on the heap; the focus is only on the Deref trait.

The Box<T> type is ultimately defined as a tuple struct with one element, so that’s what we’ll do.

struct MyBox<T>(T);
 
impl<T> MyBox<T> {
	// constructor: wraps a value in a MyBox
	fn new(x: T) -> MyBox<T>{
		MyBox(x)
	}
}

Now let’s use it.

fn main(){
	let x = 5;
	let y = MyBox::new(x);
	
	assert_eq!(5, x);
	assert_eq!(5, *y);
}

Compiling this results in an error because MyBox<T> can’t be dereferenced yet: we haven’t implemented the Deref trait.

Implementing Deref

This trait is provided by the standard library. It requires implementing one method, deref, which borrows self and returns a reference to the inner data.
To implement the trait, we provide that method with the required signature.

use std::ops::Deref;
 
impl<T> Deref for MyBox<T> {
    type Target = T;
 
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

The type Target = T; syntax defines an associated type for the Deref trait to use, something that’ll be expanded on in Learning Rust - Ch. 20.
Returning &self.0 gives back a reference to the value we want to access with *; .0 accesses the first (and only) element of the tuple struct.

“Without the Deref trait, the compiler can only dereference & references. The deref method gives the compiler the ability to take a value of any type that implements Deref and call the deref method to get a reference that it knows how to dereference.”

Now when dereferencing *y in our main function, the compiler can substitute * with a call to our implementation of deref followed by a plain dereference. Behind the scenes, *y becomes *(y.deref()).
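The two spellings can be compared directly. This sketch repeats the MyBox<T> definition from above so that it compiles on its own:

```rust
use std::ops::Deref;

struct MyBox<T>(T);

impl<T> MyBox<T> {
    fn new(x: T) -> MyBox<T> {
        MyBox(x)
    }
}

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

fn main() {
    let y = MyBox::new(5);

    // `*y` is shorthand for the explicit form on the next line.
    assert_eq!(5, *y);
    assert_eq!(5, *(y.deref()));
}
```

The outer * is still needed in the explicit form because deref returns a reference, not the value itself.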

Using Deref Coercion in Fn and Methods

Deref Coercion: converts a reference to a type that implements the Deref trait into a reference to another type.
For example, it can convert &String to &str, because String implements Deref in a way that returns &str.

Rust does this automatically as a convenience, but it only works on types that implement Deref.
It happens when a reference to a value of a particular type is passed as an argument to a function or method whose parameter type doesn’t match. A sequence of calls to the deref method converts the type provided into the type the parameter needs.

Here’s an example using MyBox<T>.
Take this simple function:

fn hello(name: &str) {
    println!("Hello, {name}!");
}

Now

fn main() {
    let m = MyBox::new(String::from("Rust"));
    hello(&m);
}

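Deref coercion makes this work by calling deref as many times as needed until it gets a type that matches the function definition, in this case &str: &MyBox<String> becomes &String, which becomes &str.

Without deref coercion, we’d have to write hello(&(*m)[..]); instead. As the book puts it: “The (*m) dereferences the MyBox<String> into a String. Then, the & and [..] take a string slice of the String that is equal to the whole string to match the signature of hello.”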
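To check that both spellings really are equivalent, here’s a sketch that puts them side by side. The greeting function is a hypothetical variant of hello that returns the text instead of printing it, so the two results can be compared.

```rust
use std::ops::Deref;

struct MyBox<T>(T);

impl<T> MyBox<T> {
    fn new(x: T) -> MyBox<T> {
        MyBox(x)
    }
}

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

// Hypothetical variant of `hello` that returns the text so it can be compared.
fn greeting(name: &str) -> String {
    format!("Hello, {name}!")
}

fn main() {
    let m = MyBox::new(String::from("Rust"));

    // &MyBox<String> -> &String -> &str, all inserted by the compiler.
    let coerced = greeting(&m);
    // What we would have to write without deref coercion.
    let manual = greeting(&(*m)[..]);

    assert_eq!(coerced, manual);
    println!("{coerced}");
}
```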

With Mutable References

Similar to using Deref trait to override * on immutable references, the DerefMut trait does the same for mutable references.

There are 3 cases where Rust does deref coercion when it finds types and trait implementations:

  1. From &T to &U when T: Deref<Target=U>
  2. From &mut T to &mut U when T: DerefMut<Target=U>
  3. From &mut T to &U when T: Deref<Target=U> // The parameter of the deref method isn't mutable

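In the third case, Rust will also coerce a mutable reference to an immutable one, but NOT the reverse.
The borrowing rules state that a mutable reference must be the only reference to its data. Turning &mut into & can never break that rule, while turning & into &mut would require the immutable reference to be the only one, which the compiler can’t guarantee.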
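All three cases can be exercised with MyBox<T> once it also implements DerefMut. In this sketch the shout and length helpers are hypothetical, and MyBox is built directly with its tuple constructor to keep things short.

```rust
use std::ops::{Deref, DerefMut};

struct MyBox<T>(T);

impl<T> Deref for MyBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

impl<T> DerefMut for MyBox<T> {
    // No `type Target` here: DerefMut reuses the one declared by Deref.
    fn deref_mut(&mut self) -> &mut T {
        &mut self.0
    }
}

// Hypothetical helpers, one per kind of reference.
fn shout(text: &mut String) {
    text.push('!');
}

fn length(text: &str) -> usize {
    text.len()
}

fn main() {
    let mut m = MyBox(String::from("Rust"));

    shout(&mut m); // case 2: &mut MyBox<String> -> &mut String
    assert_eq!(length(&mut m), 5); // case 3: &mut MyBox<String> -> &str
    assert_eq!(length(&m), 5); // case 1: &MyBox<String> -> &str
}
```

Passing &m to shout would not compile, which is the “NOT the reverse” part.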

Running on Cleanup with Drop Trait

The Drop trait allows defining what happens when a value is about to be dropped, i.e., when it goes out of scope.
The implementation can be on any type and the code can be used to free resources such as files or network connections.

This trait is prevalent in context of Smart Pointers and is used to deallocate memory.
Instead of relying on the programmer to manually free, close, and cleanup resources, in Rust you can specify that behavior using the Drop trait and the compiler will run that code when that type goes out of scope.
The body of the drop method is where you would place any logic that you wanted to run when an instance of your type goes out of scope.

As expected, the Drop trait requires implementing the drop method for a type. The method takes a mutable reference to self.

Here’s an example:

struct CustomSmartPointer {
    data: String,
}
 
impl Drop for CustomSmartPointer {
    fn drop(&mut self) {
        println!("Dropping CustomSmartPointer with data `{}`!", self.data);
    }
}
 
fn main() {
    let c = CustomSmartPointer {
        data: String::from("my stuff"),
    };
    let d = CustomSmartPointer {
        data: String::from("other stuff"),
    };
    println!("CustomSmartPointers created");
}

Compiling and running this code results in:

$ cargo run
   Compiling drop-example v0.1.0 (file:///projects/drop-example)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.60s
     Running `target/debug/drop-example`
CustomSmartPointers created
Dropping CustomSmartPointer with data `other stuff`!
Dropping CustomSmartPointer with data `my stuff`!

// Notice the reverse order of the drops? Variables are dropped in the reverse order of their creation, like popping a stack.

When dealing with locks

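Sometimes a value needs to be cleaned up early, for example a smart pointer that manages a lock: you may want to force the drop that releases the lock so other code in the same scope can acquire it. Rust doesn’t let you call the Drop trait’s drop method manually; instead, you have to call the std::mem::drop function provided by the standard library.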
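For instance, the guard returned by std::sync::Mutex::lock is a smart pointer that releases its lock when dropped. In this sketch (the variable names are arbitrary, and the chapter itself doesn’t show a lock example), dropping the guard early lets a second lock attempt succeed before the end of the scope:

```rust
use std::sync::Mutex;

fn main() {
    let shared = Mutex::new(0);

    let mut guard = shared.lock().unwrap();
    *guard += 1;

    // While the guard is alive, the mutex stays locked.
    assert!(shared.try_lock().is_err());

    // Force the guard to be dropped now, which releases the lock.
    drop(guard);

    assert!(shared.try_lock().is_ok());
    println!("value = {}", *shared.lock().unwrap());
}
```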

Manually Dropping

Trying to call drop manually results in:

fn main() {
    let c = CustomSmartPointer {
        data: String::from("some data"),
    };
    println!("CustomSmartPointer created");
    c.drop();
    println!("CustomSmartPointer dropped before the end of main");
}

$ cargo run
   Compiling drop-example v0.1.0 (file:///projects/drop-example)
error[E0040]: explicit use of destructor method

A destructor is the general programming term for a function that cleans up an instance, analogous to a constructor, which creates one. The drop method is Rust’s destructor.

The reason we can’t call it manually is that Rust would still call drop automatically at the end of the scope, and cleaning up the same value twice would be a double free error.

Instead, use the std::mem::drop function, which is different from the drop method in the Drop trait. It’s in the prelude, so we call it as drop(...), passing the value we want to force-drop as an argument.

fn main() {
    let c = CustomSmartPointer {
        data: String::from("some data"),
    };
    println!("CustomSmartPointer created");
    drop(c);
    println!("CustomSmartPointer dropped before the end of main");
}
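To observe the order of drops without reading console output, here’s a hedged sketch in which each value appends its name to a shared log when dropped. The Noisy type and drop_order function are hypothetical, and the log uses RefCell<T>, which is covered later in the chapter.

```rust
use std::cell::RefCell;

// Hypothetical type that records its name in a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _first = Noisy { name: "first", log: &log };
        let second = Noisy { name: "second", log: &log };
        let _third = Noisy { name: "third", log: &log };

        drop(second); // dropped right here, ahead of schedule
    } // `_third` and then `_first` are dropped here, in reverse order
    log.into_inner()
}

fn main() {
    assert_eq!(drop_order(), vec!["second", "third", "first"]);
}
```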

Dropping Summary

Code in a Drop trait implementation can be used in many ways to clean up resources in a safe and simple way.
There’s also no need to worry about accidentally cleaning up a value that’s still in use: the ownership model ensures the drop method gets called only once, when the value is no longer needed.

The Reference-Counted Smart Pointer

In most cases value ownership is clear and simple, but there are cases where a value conceptually has multiple owners.
Example: in a graph data structure, several edges might point to the same node. That node is conceptually owned by all of those edges, and it shouldn’t be cleaned up unless no edges point to it…

To enable multiple ownership explicitly, use the type Rc<T>, short for reference counting. It keeps track of the number of references to a value, which determines whether the value is still in use.
When the count reaches zero, the value can be dropped.

Use case

Use Rc<T> when data needs to be allocated on the heap for multiple parts of a program to read, and it can’t be determined at compile time which part will finish using the data last. If we did know, we could simply make that part the owner and the normal ownership rules would apply…
// Concurrency comes to mind, BUT this is only for single-threaded programs.

Cons List example

We discussed cons lists earlier when talking about the Box type.
This time, two lists (b and c) should share ownership of a third list (a).

Implementing this with the Box<T> definition of the list doesn’t work:

enum List {
    Cons(i32, Box<List>),
    Nil,
}
 
use crate::List::{Cons, Nil};
 
fn main() {
    let a = Cons(5, Box::new(Cons(10, Box::new(Nil))));
    let b = Cons(3, Box::new(a));
    let c = Cons(4, Box::new(a));
}

This results in an error (use of moved value: `a`), because the Cons variants own the data they hold: creating b moves a into b, so a can’t be used again to create c. References could remedy this, but that would involve lifetime parameters.

By using Rc<T> instead of Box<T>, each Cons variant holds a value and an Rc<T> pointing to a List.
Instead of moving a when creating b, we clone the Rc<List> that a is holding, which increments the reference count. After creating b and c there are 3 references to the data in a; Rc<T> keeps that count and won’t clean up the data until it reaches 0.

use crate::List::{Cons, Nil};
use std::rc::Rc;
 
enum List {
    Cons(i32, Rc<List>), // change the type holding the reference
    Nil,
}
 
fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    let b = Cons(3, Rc::clone(&a)); // pass ref of a
    let c = Cons(4, Rc::clone(&a));
}

Rc<T> is provided by the standard library but isn’t in the prelude, so it needs to be brought into scope with a use statement.

Using Rc::clone and not a.clone()

“We could have called a.clone() rather than Rc::clone(&a), but Rust’s convention is to use Rc::clone in this case. The implementation of Rc::clone doesn’t make a deep copy of all the data like most types’ implementations of clone do. The call to Rc::clone only increments the reference count, which doesn’t take much time.”

The Rc::strong_count function returns the reference count, so it’s possible to check how many owners the Rc<T> is keeping track of at a given moment.
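Here’s a sketch along the lines of the book’s example, printing the count at each step. Putting c in an inner scope is what shows the count going back down.

```rust
use std::rc::Rc;

#[allow(dead_code)]
enum List {
    Cons(i32, Rc<List>),
    Nil,
}

use crate::List::{Cons, Nil};

fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    println!("count after creating a = {}", Rc::strong_count(&a));

    let _b = Cons(3, Rc::clone(&a));
    println!("count after creating b = {}", Rc::strong_count(&a));

    {
        let _c = Cons(4, Rc::clone(&a));
        println!("count after creating c = {}", Rc::strong_count(&a));
    } // `_c` is dropped here, which decrements the count

    println!("count after c goes out of scope = {}", Rc::strong_count(&a));
}
```

The counts printed are 1, 2, 3, and then 2 again.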

Mutable References

Via immutable references, Rc<T> allows sharing data between multiple parts of a program, but for reading only. Handing out multiple mutable references to the same place would violate the borrowing rules.
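One way to see the read-only restriction in action is Rc::get_mut, a standard library function that isn’t covered in these notes. It only hands out a mutable reference while there is exactly one owner. A minimal sketch:

```rust
use std::rc::Rc;

fn main() {
    let mut a = Rc::new(5);

    // With a single owner, mutable access is still possible.
    assert!(Rc::get_mut(&mut a).is_some());

    let b = Rc::clone(&a);

    // Once the value is shared, Rc refuses to hand out `&mut`.
    assert!(Rc::get_mut(&mut a).is_none());
    assert_eq!(*a + *b, 10);
}
```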

RefCell<T> and the Interior Mutability Pattern

This stuff’s heavy and I won’t bother trying to summarize it; I read the chapter as is.

However, here’s a recap of the reasons to choose Box<T>, Rc<T>, or RefCell<T>:

  • Rc<T> enables multiple owners of the same data; Box<T> and RefCell<T> have single owners.
  • Box<T> allows immutable or mutable borrows checked at compile time; Rc<T> allows only immutable borrows checked at compile time; RefCell<T> allows immutable or mutable borrows checked at runtime.
  • Because RefCell<T> allows mutable borrows checked at runtime, you can mutate the value inside the RefCell<T> even when the RefCell<T> is immutable.
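As a rough sketch of how these combine (the variable names are arbitrary), Rc<RefCell<T>> gives multiple owners that can each mutate the shared value, with the borrowing rules checked at runtime. try_borrow_mut is used here instead of borrow_mut so that a rule violation shows up as an Err rather than a panic.

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let shared = Rc::new(RefCell::new(vec![1, 2, 3]));
    let other = Rc::clone(&shared);

    // Mutate through one owner, even though neither binding is `mut`.
    other.borrow_mut().push(4);
    assert_eq!(*shared.borrow(), vec![1, 2, 3, 4]);

    // The borrowing rules still apply, just at runtime.
    let reader = shared.borrow();
    assert!(shared.try_borrow_mut().is_err()); // a reader is still active
    drop(reader);
    assert!(shared.try_borrow_mut().is_ok());
}
```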

The example with the Messenger trait is easy to follow.

Memory Leaks from Reference Cycles

This section discusses that, while difficult, it’s not impossible to end up with memory that is never cleaned up in Rust, i.e., a memory leak.

By using Rc<T> and RefCell<T>, it’s possible to create references where items refer to each other in a cycle. This creates memory leaks because the reference count of each item in the cycle will never reach 0, and the values will never be dropped.
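Here’s a minimal sketch of such a cycle. The Node type is hypothetical and simpler than the book’s List-based example; note that this program leaks both nodes on purpose.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical node type with an optional, replaceable link to another node.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });

    // Close the loop: a -> b -> a.
    *a.next.borrow_mut() = Some(Rc::clone(&b));

    assert_eq!(Rc::strong_count(&a), 2);
    assert_eq!(Rc::strong_count(&b), 2);

    // When `a` and `b` go out of scope, each count only drops to 1,
    // so neither Node is ever freed.
}
```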

Reference cycles can be prevented using Weak<T>, a reference type created with Rc::downgrade. It doesn’t increase Rc::strong_count (it’s tracked in a separate weak count), so it doesn’t keep the value alive and can’t be part of a cycle that blocks cleanup.
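A small sketch of how Weak<T> behaves (the String payload is arbitrary): the weak reference can be upgraded to an Rc<T> while the value exists, and yields None once the last strong owner is gone.

```rust
use std::rc::{Rc, Weak};

fn main() {
    let strong = Rc::new(String::from("node"));
    let weak: Weak<String> = Rc::downgrade(&strong);

    // The weak reference is counted separately and doesn't keep the value alive.
    assert_eq!(Rc::strong_count(&strong), 1);
    assert_eq!(Rc::weak_count(&strong), 1);

    // upgrade() returns Some(Rc<T>) while the value still exists...
    assert!(weak.upgrade().is_some());

    // ...and None once the last strong reference has been dropped.
    drop(strong);
    assert!(weak.upgrade().is_none());
}
```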

Summary

This chapter covered a lot of concepts, types, traits, and patterns.
Here are the highlights:

  • The Box<T> type has known size and points to data on the heap
  • The Rc<T> keeps count of references to a value on the heap, allowing multiple owners
  • The RefCell<T> type provides Interior Mutability, allowing us to internally mutate immutable values while maintaining their outward immutability; and enforces borrow rules at runtime
  • The Deref trait allows for implementing dereferencing for custom types where * doesn’t work out of the box. With it, * can be used on these types because Rust calls the deref method behind the scenes
  • The Drop trait can be used to define behavior that happens when a value is no longer in use and is cleaned up

For more on smart pointers and how to build custom ones, see “The Rustonomicon”.