Intro

The standard library includes a number of very useful data structures called collections. A collection contains multiple values, but unlike arrays and tuples, these values are stored on the heap. This means the amount (size) of data doesn't need to be known at compile time, i.e. collections are dynamically sized.

This chapter covers three different collections.

  • Vectors: let you store a variable number of values next to each other.
  • Strings: a collection of characters. They were mentioned before, but this chapter goes into more detail.
  • Hash Maps: let you associate a value with a specific key. A hash map is a particular implementation of the more general data structure called a map.

Vectors

Vectors in Rust, written Vec<T>, let you store multiple values of the same data type. A vector stores its elements next to each other in memory, meaning their memory addresses are sequential.

Creating a Vector

To instantiate a new, empty vector, use Vec::new. Example:

let my_vec: Vec<i32> = Vec::new();

When creating an empty vector, you should annotate the type, since neither the compiler nor readers can tell what values are expected at this point. This is also because vectors are implemented using generics: Vec<T> can hold any type T, and the annotation tells Rust which T this vector holds.

If you create a vector with initial values, the compiler can infer the type. That's why Rust provides the vec! macro, which creates a new vector holding the values you give it.

let another_vec = vec![1,2,3,4];
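To see both creation styles side by side, here is a minimal sketch. The function name creation_examples is hypothetical and only exists to bundle the two vectors together.

```rust
/// Hypothetical helper: builds one empty vector and one pre-filled vector.
fn creation_examples() -> (Vec<i32>, Vec<i32>) {
    // No values yet, so the element type has to be annotated.
    let empty: Vec<i32> = Vec::new();
    // vec! lets the compiler infer the element type from the values.
    let filled = vec![1, 2, 3, 4];
    (empty, filled)
}
```

The annotation on empty is what tells Rust it's a Vec<i32>; filled needs none because its values already say so.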

Adding Values

A variable has to be mutable before you can change it, and vectors are no different. So declare your vector with mut, then use the push method to add a new value to the end of the vector. Ex:

let mut v = Vec::new();
v.push(1);
v.push(2);
 
// v is now [1,2]

Reading Values

There are two ways to get a value from a Vector, regular indexing and the get method.

Indexing is simple: you access the vector with the index of the element you want. Ex:

let v = vec![1,2,3,4];
let snd: &i32 = &v[1];

Notice the &: we're not trying to take ownership, just a reference.

The get method is similar: you pass the index of the element you want, but the return type is Option<&T>, which you can then handle with a match expression. Ex:

let v = vec![1,2,3,4];
let fourth: Option<&i32> = v.get(3);
 
match fourth {
    Some(fourth) => println!("The fourth element is {fourth}"),
    None => println!("ehh, wrong."),
}

Why both methods?

Rust gives you the freedom to choose how the program behaves when you access an out-of-bounds index. Indexing with [] panics and crashes the program, whereas get returns None, which the match handles without panicking.
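The difference can be observed directly in a small sketch. The helper names checked_lookup and indexing_panics are hypothetical; std::panic::catch_unwind is used here only to observe the panic from [] without ending the program (the default panic hook will still print a message to stderr).

```rust
use std::panic;

/// Hypothetical helper: looks up `index` with get, which never panics.
fn checked_lookup(v: &[i32], index: usize) -> Option<i32> {
    // get returns Option<&i32>; copied turns it into Option<i32>.
    v.get(index).copied()
}

/// Hypothetical helper: reports whether plain [] indexing at `index` panics.
fn indexing_panics(v: &[i32], index: usize) -> bool {
    // catch_unwind turns a panic inside the closure into an Err.
    panic::catch_unwind(|| v[index]).is_err()
}
```

With v = vec![1, 2, 3, 4], checked_lookup(&v, 100) is simply None, while indexing_panics(&v, 100) is true.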

Mutable and Immutable Vector References

A vector can't be borrowed mutably while an immutable reference to it is still in use, for the reasons mentioned in Learning Rust - Ch.4 (Ownership, Strings, and Slices). But there's also a reason that comes from how vectors work.

Take this example that doesn’t compile.

    let mut v = vec![1, 2, 3, 4, 5];
    let first = &v[0];
    v.push(6);
    
    println!("The first element is: {first}");

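There are two conflicting borrows here: first is an immutable reference to an element, and push needs a mutable reference to the whole vector, while first is still used afterwards in println!. The rule matters because vectors dynamically allocate heap memory and keep all their values next to each other. If there isn't enough space to add the new element at the end, push allocates a new, larger block of memory and copies all the old values over so the elements stay adjacent. At that point first would point to deallocated memory, and the borrowing rules prevent exactly that.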
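One way to make the example compile, sketched below under the assumption that copying the element is acceptable: since i32 is Copy, read the value out instead of keeping a reference, so no borrow is alive when push runs. The function name first_then_push is hypothetical.

```rust
/// Hypothetical helper: reads the first element, then pushes, without
/// holding a reference into the vector across the push.
fn first_then_push() -> (i32, Vec<i32>) {
    let mut v = vec![1, 2, 3, 4, 5];
    // i32 is Copy, so this copies the value; no reference into v survives.
    let first = v[0];
    // The mutable borrow for push is now allowed, even if v reallocates.
    v.push(6);
    (first, v)
}
```

For non-Copy elements, the same idea applies by finishing all uses of the reference before calling push.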

Iterating Over a Vector

Using a for loop, we can iterate over a vector to get the values and even mutate them if it’s mutable.

Example:

let v = vec![1, 2, 3, 4, 5];
for i in &v {
	println!("{i}");
}

Now to mutate

let mut v = vec![6, 29, 33, 4, 25];
for i in &mut v {
	*i += 5;
}

Notice the * in *i += 5. It's the dereference operator, and it's needed to change the value that the mutable reference i refers to. // More on this will be covered in a later chapter.

Iterating over a vector, whether immutably or mutably, is safe because of the borrow checker's rules: trying to insert or remove items in the for loop body results in a compiler error, since the reference held by the for loop prevents simultaneous modification of the whole vector.
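Both loop styles can be wrapped in small functions, as in the sketch below. The names sum_all and add_to_each are hypothetical; they take slices (&[i32] and &mut [i32]) so a &Vec<i32> or &mut Vec<i32> can be passed in directly.

```rust
/// Hypothetical helper: sums the elements through an immutable borrow.
fn sum_all(v: &[i32]) -> i32 {
    let mut total = 0;
    // Each i is a &i32, so dereference it to get the number.
    for i in v {
        total += *i;
    }
    total
}

/// Hypothetical helper: adds `amount` to every element through a mutable borrow.
fn add_to_each(v: &mut [i32], amount: i32) {
    // Each i is a &mut i32; *i reaches the value stored in the vector.
    for i in v {
        *i += amount;
    }
}
```

Calling add_to_each(&mut v, 5) on vec![6, 29, 33, 4, 25] leaves v as [11, 34, 38, 9, 30].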

Enums with Multiple Values

A vector can only hold values of a single type, which is sometimes limiting. When you need to bundle values of different types together, enums come to the rescue.

Example:

enum SpreadsheetCell {
	Int(i32),
	Float(f64),
	Text(String),
}
 
let row = vec![
	SpreadsheetCell::Int(3),
	SpreadsheetCell::Text(String::from("blue")),
	SpreadsheetCell::Float(10.12),
];

By defining a single enum type, SpreadsheetCell, with a variant for each kind of cell value, we can store those values together in one vector. The vector's type is Vec<SpreadsheetCell>, not the types of the individual cell values… clever.
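Getting the values back out works with match, which forces every variant to be handled. The sketch below repeats the enum so it compiles on its own; describe and describe_row are hypothetical helper names.

```rust
/// Same enum as above, repeated so this block is self-contained.
enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}

/// Hypothetical helper: describes one cell by matching on its variant.
fn describe(cell: &SpreadsheetCell) -> String {
    match cell {
        SpreadsheetCell::Int(i) => format!("int {i}"),
        SpreadsheetCell::Float(f) => format!("float {f}"),
        SpreadsheetCell::Text(s) => format!("text {s}"),
    }
}

/// Hypothetical helper: describes every cell in a row, in order.
fn describe_row(row: &[SpreadsheetCell]) -> Vec<String> {
    row.iter().map(describe).collect()
}
```

For the row above, describe_row(&row) gives ["int 3", "text blue", "float 10.12"].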

Dropping Values

Like a struct, a vector is freed when it goes out of scope. For example:

{
	let v = vec![1, 2, 3, 4];
 
	// do stuff with v
} // <- v goes out of scope and is freed here

When this happens, its contents are dropped too, i.e. removed from the heap. The borrow checker ensures that any references to the vector's contents are only used while the vector is still valid and in scope.

API Docs

To find more Vector methods and dive into their definitions, check out the API documentation.


UTF-8 Text With Strings

The UTF-8 encoding is one of many reasons new Rust programmers have trouble with Strings.

String? What is it?

At its core, the Rust language has only one string type: the string slice str, usually seen in its borrowed form &str. We covered these in Chapter 4, where we described string slices as references to UTF-8 encoded string data stored somewhere else. String literals, for example, are stored in the program's binary and are therefore string slices.

The String type is provided by the standard library rather than being built into the core language. It's a growable, mutable, owned, UTF-8 encoded string type. In other words, it's just another collection provided by the library, akin to a custom data structure.
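A practical consequence of the String/&str split, shown in a small sketch: a function that takes &str accepts both a string literal and a borrowed String, and the caller keeps ownership of the String. The name shout is hypothetical.

```rust
/// Hypothetical helper: takes a &str, so literals and borrowed Strings both work.
fn shout(s: &str) -> String {
    // to_uppercase builds and returns a new, owned String.
    s.to_uppercase()
}
```

Both shout("hi") and shout(&String::from("hi")) return "HI", and the original String is still usable afterwards.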

Creating a String

We already know the String::new function, which creates a new, empty string that we can then load data into. We can also turn string literals into Strings using the to_string method, or with the String::from function, which accepts a &str.

let s = "initial contents".to_string();
let r = String::from("initial contents");

The reason there are many ways to do the same thing is a mix of preferences, niche uses, edge-cases, etc. No matter, they all have their place and time.
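To check that the different routes really end in the same place, here is a minimal sketch; three_ways is a hypothetical helper name, and push_str is covered in more detail under Appending to a String.

```rust
/// Hypothetical helper: three ways to end up with the same owned String.
fn three_ways() -> [String; 3] {
    // Start empty, then load data into it.
    let mut a = String::new();
    a.push_str("initial contents");
    // Convert a string literal (&str) with to_string.
    let b = "initial contents".to_string();
    // Or with String::from.
    let c = String::from("initial contents");
    [a, b, c]
}
```

All three Strings compare equal; which one to use is mostly a matter of style.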

Since Strings are UTF-8 encoded, they can hold text in many languages:

    let hello = String::from("السلام عليكم");
    let hello = String::from("Dobrý den");
    let hello = String::from("Hello");
    let hello = String::from("שלום");
    let hello = String::from("नमस्ते");
    let hello = String::from("こんにちは");
    let hello = String::from("안녕하세요");
    let hello = String::from("你好");
    let hello = String::from("Olá");
    let hello = String::from("Здравствуйте");
    let hello = String::from("Hola");

Updating

Strings can grow in size and their contents can change. We can push more data onto a String, and we can use the + operator or the format! macro to concatenate String values.

Appending to a String

There are two ways to do this, with push_str and push.

Grow a String by using the push_str method to append a string slice. Ex:

    let mut s = String::from("foo");
    s.push_str("bar");

The push_str method takes a string slice because we don’t necessarily want to take ownership of the parameter.

The push method, on the other hand, takes a single character as a parameter and appends it.

let mut s = String::from("lo");
s.push('l');
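Both appending methods in one sketch, using an owned String as the argument to push_str so it's visible that the argument is only borrowed. The name append_examples is hypothetical.

```rust
/// Hypothetical helper: appends with push_str and push.
fn append_examples() -> (String, String, String) {
    let mut s1 = String::from("foo");
    let s2 = String::from("bar");
    // push_str takes a &str, so we pass &s2 and s2 stays usable.
    s1.push_str(&s2);

    let mut s3 = String::from("lo");
    // push appends exactly one char.
    s3.push('l');

    (s1, s2, s3)
}
```

The result is ("foobar", "bar", "lol"): s2 was never moved.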

Concatenation

As mentioned, either the + operator or the format! macro works for this purpose.

let s1 = String::from("Hello, ");
let s2 = String::from("world!");
let s3 = s1 + &s2;

Note that in s1 + &s2, s1 is moved and can no longer be used afterwards, whereas s2 can. The reason is the signature of the method that + calls, the add method:

fn add(self, s: &str) -> String

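In the standard library, add is defined using generics (through the Add trait), which is how the + operator is overloaded; the signature above substitutes concrete types. Also note that &s2 is a &String, not a &str: the compiler coerces it into a &str so it matches add's parameter. To read more on this, check out this section.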
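The ownership behaviour of + can be sketched as a function that returns whatever is still usable afterwards. The name plus_example is hypothetical.

```rust
/// Hypothetical helper: concatenates with +; s1 is moved, s2 is only borrowed.
fn plus_example() -> (String, String) {
    let s1 = String::from("Hello, ");
    let s2 = String::from("world!");
    // &s2 is a &String; the compiler coerces it to the &str that add expects.
    let s3 = s1 + &s2;
    // s1 can't be used from here on, but s2 still can.
    (s3, s2)
}
```

Trying to also return s1 would fail to compile with a "value used after move" error.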

The problem with the + operator for concatenation arises when we have multiple strings to join together.

let s1 = String::from("tic");
let s2 = String::from("tac");
let s3 = String::from("toe");
 
let s = s1 + "-" + &s2 + "-" + &s3;

This is hard to read and easy to get wrong, so it's best to use the macro instead.

The format! macro is similar to an f-string in Python: it lets us lay out the string visually and safely.

let s1 = String::from("tic");
let s2 = String::from("tac");
let s3 = String::from("toe");
 
let formatted = format!("{s1}-{s2}-{s3}");

This macro works like println!, but instead of printing to the console, it returns a new String. It also only borrows its arguments, so s1, s2, and s3 remain usable afterwards.
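A side-by-side sketch of the two approaches, producing the same result. The name join_both_ways is hypothetical; note that the + version has to come last because it moves s1.

```rust
/// Hypothetical helper: joins three strings with format! and with +.
fn join_both_ways() -> (String, String) {
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");
    // format! only borrows s1, s2 and s3, so all three stay usable.
    let formatted = format!("{s1}-{s2}-{s3}");
    // The + chain moves s1, so it must be the last thing that uses s1.
    let added = s1 + "-" + &s2 + "-" + &s3;
    (formatted, added)
}
```

Both strings come out as "tic-tac-toe"; the format! line is simply easier to read.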

Indexing

Unlike in many other programming languages, trying to access a character in a String by index (e.g. s[0]) in Rust results in a compile error. Try it and see the error.

This is because of how Strings are represented internally: a String is a wrapper around a Vec<u8>, a vector of unsigned 8-bit integers (bytes). Ex:

let hello = String::from("Hola");

In this case, hello.len() will be 4, which means the vector storing the string "Hola" is 4 bytes long. Each of these letters takes one byte (8 bits) when encoded in UTF-8.

But look at this one

let hello = String::from("Здравствуйте");

This string has 12 characters, but as far as Rust's concerned its len() is 24, because that's the number of bytes it takes to encode it in UTF-8: each Unicode scalar value in that string takes 2 bytes of storage. So an index into the string's bytes won't always line up with a valid character, and to avoid returning an unexpected value, Rust doesn't allow indexing into a String at all.
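The byte count versus character count can be checked with a couple of small helpers. The names byte_and_char_counts and first_byte are hypothetical; len, chars, and as_bytes are standard str methods.

```rust
/// Hypothetical helper: returns (number of bytes, number of chars) for `s`.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    // len() counts UTF-8 bytes; chars().count() counts Unicode scalar values.
    (s.len(), s.chars().count())
}

/// Hypothetical helper: the first byte of `s`, which is what s[0] would
/// have to return if Rust allowed indexing a string.
fn first_byte(s: &str) -> Option<u8> {
    s.as_bytes().first().copied()
}
```

"Hola" gives (4, 4) and "Здравствуйте" gives (24, 12). The first byte of "Здравствуйте" is 208, which on its own isn't the letter З at all.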

Bytes, Scalar Values, & Grapheme Clusters

This is a super interesting topic about how UTF-8 encodes text in non-Latin scripts, but I won't go into it here. Check out the Rust Book's section on bytes, scalar values, and grapheme clusters for the details.

Slicing Strings

Indexing into a string is often a bad idea, because it's not clear what the return type should be: a byte, a character, a grapheme cluster, or a string slice. If you really need to use indices with strings, use them to take a slice: use [] with a range to create a string slice containing particular bytes, like so:

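let hello = "اهلا";
let s = &hello[0..4];

In this example, s is a &str, not a String, and contains the first 4 bytes of the string (the upper bound is exclusive). Each of these Arabic letters takes 2 bytes in UTF-8, so s is "اه", the first two letters. Be careful: a range that ends in the middle of a character, such as &hello[0..3], compiles but panics at runtime.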
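If the indices come from somewhere you don't fully control, the standard library's str::get takes the same range but returns None instead of panicking when a bound isn't on a character boundary, and is_char_boundary lets you check a single position. A minimal sketch; safe_slice is a hypothetical wrapper name.

```rust
/// Hypothetical wrapper: slices `s` by byte range without panicking.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    // str::get returns None if either bound is out of range or
    // falls inside a multi-byte character.
    s.get(start..end)
}
```

With hello = "اهلا", safe_slice(hello, 0, 4) is Some("اه"), while safe_slice(hello, 0, 3) is None and hello.is_char_boundary(3) is false.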

Iterating Over Strings

For the reasons mentioned, when iterating over strings it's best to be explicit about whether you want characters or bytes.

  • For individual Unicode scalar values, use the .chars method on the string slice.
  • For raw bytes, use the .bytes method.

Examples

for c in "Зд".chars() {
    println!("{c}");
}
 
// З д
for b in "Зд".bytes() {
    println!("{b}");
}
 
// 208 151 208 180
 

Keep in mind that valid Unicode scalar values can be made up of more than one byte!
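To see how many bytes each scalar value needs, char::len_utf8 reports it per character. A short sketch; bytes_per_char is a hypothetical helper name.

```rust
/// Hypothetical helper: how many UTF-8 bytes each char in `s` takes.
fn bytes_per_char(s: &str) -> Vec<usize> {
    s.chars().map(|c| c.len_utf8()).collect()
}
```

"Hola" gives [1, 1, 1, 1], "Зд" gives [2, 2], and "नमस्ते" gives six values of 3 (18 bytes in total). Note that two of those six scalar values are diacritics rather than standalone letters, which is the grapheme cluster issue mentioned above.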

Summary

The Rust Book summarizes the reasoning behind Rust's UTF-8 design choices here.


Hash Maps and Key-Value Pairs

Like hash tables in other languages, Rust's HashMap<K, V> stores values of type V associated with keys of type K, using a hashing function to decide how to store them.

Creating a HashMap

Like most collections, it has a new function: HashMap::new lets you create a new, empty hash map.

Example:

use std::collections::HashMap;
 
let mut mapu = HashMap::new();
 
mapu.insert(String::from("bob"), 10);

In this example we create a new map and see the method used to add values to it, the insert(k,v) method.
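The same steps as a self-contained sketch that returns the finished map. The name build_map and the second entry ("alice") are hypothetical, added only to have more than one key.

```rust
use std::collections::HashMap;

/// Hypothetical helper: builds a small map of names to numbers.
fn build_map() -> HashMap<String, i32> {
    // The map must be mut because insert modifies it.
    let mut mapu = HashMap::new();
    // Key and value types (String, i32) are inferred from these inserts
    // and from the function's return type.
    mapu.insert(String::from("bob"), 10);
    mapu.insert(String::from("alice"), 25);
    mapu
}
```

The returned map has two entries, and looking up "bob" gives Some(&10).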

Reading Values

Reading a value from a hash map is simple. The get method takes a key and returns the value associated with it, but there are some things to consider… Example:

let name = String::from("bob");
let numz = mapu.get(&name).copied().unwrap_or(0);

The get method returns an Option (Option<&i32> in this example), so what if it's None? The unwrap_or method handles that case by supplying a default value to return instead, 0 here.

Calling copied on the Option returned by get turns it into an Option<i32> rather than an Option<&i32>, so unwrap_or can return a plain i32.
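The whole chain fits in one small function. The name score_of is hypothetical; passing a &str as the key works because String keys can be looked up by &str.

```rust
use std::collections::HashMap;

/// Hypothetical helper: looks up a score, falling back to 0 for unknown names.
fn score_of(map: &HashMap<String, i32>, name: &str) -> i32 {
    // get -> Option<&i32>, copied -> Option<i32>, unwrap_or -> i32.
    map.get(name).copied().unwrap_or(0)
}
```

For a map containing only ("bob", 10), score_of(&map, "bob") is 10 and score_of(&map, "nobody") is 0.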

Iterating Over Key-Value Pairs

To iterate over all the key-value pairs in a hash map, a common need, we loop over a reference to the map:

use std::collections::HashMap;
 
let mut scores = HashMap::new();
 
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
 
for (key, value) in &scores {
	println!("{key}: {value}");
}

Note that the order in which the key-value pairs are printed in the previous example is arbitrary.
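When a predictable order matters (for output or tests), one option, sketched here under the assumption that copying the pairs is acceptable, is to collect them into a Vec and sort it. The name sorted_pairs is hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical helper: the map's pairs as a Vec sorted by key.
fn sorted_pairs(map: &HashMap<String, i32>) -> Vec<(String, i32)> {
    // Clone the keys and copy the values out of the map.
    let mut pairs: Vec<(String, i32)> = map
        .iter()
        .map(|(key, value)| (key.clone(), *value))
        .collect();
    // Iteration order is arbitrary, so sort; tuples compare by key first.
    pairs.sort();
    pairs
}
```

For the Blue/Yellow scores, this always yields [("Blue", 10), ("Yellow", 50)], regardless of insertion order.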

Updating Hashmaps

There's no dedicated update method: the insert method overwrites the value associated with a key if the key already exists. This follows from keys in a hash map being unique, so each key can only have one value at a time.
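Overwriting can be observed through insert's return value: it gives back the previous value as Some(old) when the key already existed, and None otherwise. A sketch; overwrite_example is a hypothetical name.

```rust
use std::collections::HashMap;

/// Hypothetical helper: inserts twice under the same key.
fn overwrite_example() -> (Option<i32>, Option<i32>, i32) {
    let mut scores = HashMap::new();
    // First insert: no previous value, so insert returns None.
    let first = scores.insert(String::from("Blue"), 10);
    // Same key again: 10 is replaced by 25 and returned as Some(10).
    let second = scores.insert(String::from("Blue"), 25);
    (first, second, scores["Blue"])
}
```

The result is (None, Some(10), 25): the map ends up holding only the newest value.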

If you only want to insert a value when the key doesn't exist yet, and keep the existing value otherwise, the entry and or_insert methods come in handy.

For example

use std::collections::HashMap;
 
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10); // inserts 10
 
scores.entry(String::from("Yellow")).or_insert(50); // inserts 50
scores.entry(String::from("Blue")).or_insert(50); // doesn't insert because entry exists...
 
println!("{scores:?}");

“The or_insert method on Entry is defined to return a mutable reference to the value for the corresponding Entry key if that key exists, and if not, it inserts the parameter as the new value for this key and returns a mutable reference to the new value. This technique is much cleaner than writing the logic ourselves and, in addition, plays more nicely with the borrow checker.”
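Because or_insert hands back a mutable reference, it's handy for updating a value based on the old one. The sketch below counts words this way, in the spirit of the Rust Book's word-count example; word_counts is a hypothetical name.

```rust
use std::collections::HashMap;

/// Hypothetical helper: counts how often each word appears in `text`.
fn word_counts(text: &str) -> HashMap<&str, i32> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // or_insert returns &mut i32: the existing count or a freshly inserted 0.
        let count = counts.entry(word).or_insert(0);
        // Dereference to bump the value stored inside the map.
        *count += 1;
    }
    counts
}
```

For "hello world wonderful world", "world" maps to 2 and the other two words map to 1.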


Summary

Vectors, Strings, and HashMaps are all collections that provide great functionality and different uses. Each comes with various methods that allow us to do handy things for managing data within them.

Exercises to try

  1. Given a list of integers, use a vector and return the median (when sorted, the value in the middle position) and mode (the value that occurs most often; a hash map will be helpful here) of the list.
  2. Convert strings to pig latin. The first consonant of each word is moved to the end of the word and ay is added, so first becomes irst-fay. Words that start with a vowel have hay added to the end instead (apple becomes apple-hay). Keep in mind the details about UTF-8 encoding!
  3. Using a hash map and vectors, create a text interface to allow a user to add employee names to a department in a company; for example, “Add Sally to Engineering” or “Add Amir to Sales.” Then let the user retrieve a list of all people in a department or all people in the company by department, sorted alphabetically.