Typestate builder pattern in Rust

31 May 2024

Greetings fellow Rustaceans 🦀!

Today, I’ll try to describe how you can leverage the typestate pattern alongside the well-known (and beloved) builder pattern to enforce compile-time checks, ensuring that the builder is used correctly. I’ll try to keep it as straightforward as possible.

The Builder Pattern

If you’ve used Rust for a while, chances are you’ve come accross the builder pattern. It’s a staple in many popular crates such as, for example, reqwest, serde and clap to name a few.

Essentially the builder pattern is a design pattern that provides a flexible and clear way to construct complex objects. This pattern is especially useful when an object must be created with a specific configuration of many possible parameters, some of which may be optional. In the builder pattern, instead of using numerous constructors, the object to be built is constructed using a separate builder object. It’s typical for a builder object to have chaining methods (these methods set up configuration parameters and return the same builder object to allow for method chaning) and a build method (after all the desired parameters are set, this method is called to construct the final object).

Consider this simple example struct that we intend to construct with a dedicated builder. Keep in mind that this is just a concise demonstration; typically, a builder is more suited to constructing larger structs where its benefits become more apparent.

struct ServiceEntry {
    name: String,
    enabled: bool,
    logs_max_size: u128,
}

impl ServiceEntry {
    pub fn report(&self) {
        println!("Service details:");
        println!("name: {}", self.name);
        println!("enabled: {}", self.enabled);
        println!("logs max size: {}", self.logs_max_size);
    }
}

We have a ServiceEntry with a few properties and a single report method that just prints them.

Now let’s define a builder for it.

struct ServiceEntryBuilder {
    name: Option<String>,
    enabled: bool,
    logs_max_size: u128,
}

Typically a builder will have a constructor with some default values, then we have the chaining methods and finally the build method.

impl ServiceEntryBuilder {
    pub fn new() -> Self {
        Self {
            name: None,
            enabled: true,
            logs_max_size: 10 * 1024,
        }
    }

    pub fn name(mut self, name: impl Into<String>) -> Self {
        self.name = Some(name.into());
        self
    }

    pub fn enabled(mut self, enabled: bool) -> Self {
        self.enabled = enabled;
        self
    }

    pub fn logs_max_size(mut self, logs_max_size: u128) -> Self {
        self.logs_max_size = logs_max_size;
        self
    }

    pub fn build(self) -> ServiceEntry {
        ServiceEntry {
            name: self.name.unwrap(),
            enabled: self.enabled,
            logs_max_size: self.logs_max_size,
        }
    }
}

We want to consume (or take ownership of) self in the build method in order to ensure that it cannot be used once the method has been called. Also notice that for name we take “anything that implements Into<String>” (isn’t Rust cool?!).

So far so good - pretty simple.

Here is how our builder is used:

fn main() {
    let service = ServiceEntryBuilder::new()
        .enabled(false)
        .logs_max_size(20 * 1024)
        .name("my-service")
        .build();

    service.report();
}

Prints:

$ cargo run -q
Service details:
name: my-service
enabled: false
logs max size: 20480

Now, if you’re a keen observer you’ve already spotted a pretty substantial issue in the build method. Namely the .unwrap() on name.

impl ServiceEntryBuilder {
    // ...

    pub fn build(self) -> ServiceEntry {
        ServiceEntry {
            name: self.name.unwrap(), // Can panic if it's not set. Not good!
            enabled: self.enabled,
            logs_max_size: self.logs_max_size,
        }
    }
}

If we omit the name and try to build again our application will panic.

fn main() {
    let service = ServiceEntryBuilder::new()
        .enabled(false)
        .logs_max_size(20 * 1024)
        // .name("my-service")
        .build();

    service.report();
}

Prints:

$ cargo run -q
thread 'main' panicked at src/main.rs:50:29:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Wouldn’t it be cool if we can force name to be provided on compile time instead of panicking during runtime?

We can do that using the type system.

The Typestate Pattern

We need a way to prevent build to be called unless a name has been provided. So let’s create a new type to represent this state for us:

struct NoName;

Next we will make our builder to be generic over the type we want to enforce.

struct ServiceEntryBuilder<T> {
    name: T,
    enabled: bool,
    logs_max_size: u128,
}

Next, let’s move the implementation with the methods for enabled and logs to work with any type of name. Essentially we don’t want to change them, so we move them to the most generic implementation of the builder.

impl<T> ServiceEntryBuilder<T> {
    pub fn enabled(mut self, enabled: bool) -> Self {
        self.enabled = enabled;
        self
    }

    pub fn logs_max_size(mut self, logs_max_size: u128) -> Self {
        self.logs_max_size = logs_max_size;
        self
    }
}

Now, we move our constructor to a concrete implementation that has name be of type NoName (think of it as starting with the NoName type “state”).

impl ServiceEntryBuilder<NoName> {
    pub fn new() -> Self {
        Self {
            name: NoName,
            enabled: true,
            logs_max_size: 10 * 1024,
        }
    }
}

We want to express (using our type system), that after a user has provided us with a name we “transition” to a different type “state” (you can think of it as a sort of a state machine) that now has a name. Let’s see how this can be done.

impl ServiceEntryBuilder<NoName> {
    pub fn new() -> Self {
        Self {
            name: NoName,
            enabled: true,
            logs_max_size: 10 * 1024,
        }
    }

    pub fn name(self, name: impl Into<String>) -> ServiceEntryBuilder<String> {
        ServiceEntryBuilder {
            name: name.into(),
            enabled: self.enabled,
            logs_max_size: self.logs_max_size,
        }
    }
}

Notice that we now don’t return Self from name(), but instead return a new concrete implementation of ServiceEntryBuilder over String, which looks like:

impl ServiceEntryBuilder<String> {
    pub fn build(self) -> ServiceEntry {
        ServiceEntry {
            name: self.name,
            enabled: self.enabled,
            logs_max_size: self.logs_max_size,
        }
    }

    pub fn name(mut self, name: impl Into<String>) -> Self {
        self.name = name.into();
        self
    }
}

In this new implementation we can now guarantee that name is set and is of type String, so we got rid of the ugly .unwrap(). Awesome!

On the usage side everything still works as expected:

fn main() {
    let service = ServiceEntryBuilder::new()
        .enabled(false)
        .logs_max_size(256 * 1024)
        .name("my-service")
        .build();

    service.report();
}

Here is the output:

$ cargo run -q
Service details:
name: my-service
enabled: false
logs max size: 262144

However, let’s see the cool part. If we omit name we now get a compiler error:

fn main() {
    let service = ServiceEntryBuilder::new()
        .enabled(false)
        .logs_max_size(256 * 1024)
        // .name("my-service")
        .build();

    service.report();
}

If we run cargo run again we get the following error:

$ cargo run -q
error[E0599]: no method named `build` found for struct `ServiceEntryBuilder<NoName>` in the current scope
  --> src/main.rs:76:10
   |
20 |   struct ServiceEntryBuilder<T> {
   |   ----------------------------- method `build` not found for this struct
...
72 |       let service = ServiceEntryBuilder::new()
   |                     --------------------------
   |                     |
   |  ___________________method `build` is available on `ServiceEntryBuilder<NoName>`
   | |
73 | |         .enabled(false)
   | |          -------------- method `build` is available on `ServiceEntryBuilder<NoName>`
74 | |         .logs_max_size(256 * 1024)
75 | |         // .name("my-service")
76 | |         .build();
   | |         -^^^^^ method not found in `ServiceEntryBuilder<NoName>`
   | |_________|
   |
   |
   = note: the method was found for
           - `ServiceEntryBuilder<String>`

For more information about this error, try `rustc --explain E0599`.
error: could not compile `builder-typestate` (bin "builder-typestate") due to 1 previous error

The Rust compiler has the best error messages, it even suggests that we can find the build method in the implementation of the builder over String. The reason is of course, because, indeed we don’t have a build method on the implementation over NoName, which is exactly what we want. Isn’t that cool?!

Furthermore we didn’t lose the flexibility of our builder, we can still have the chaining methods in any order we want and we can re-use them to our liking.

fn main() {
    let service = ServiceEntryBuilder::new()
        .enabled(false)
        .name("my-service")
        .logs_max_size(20 * 1024)
        .name("me-again")
        .enabled(true)
        .logs_max_size(10 * 1024)
        .build();

    service.report();
}

Works and outputs:

$ cargo run -q
Service details:
name: me-again
enabled: true
logs max size: 10240

Conclusion

To wrap things up, the builder pattern is a real game changer when you’re dealing with complex object setups in Rust. It not only simplifies your code but also keeps it clean and easy to manage. When you throw the typestate pattern into the mix, things get even better. This combo enforces an order to the building process, catching slip-ups during compile time instead of leaving them for runtime surprises.

Happy coding!