Rewriting the CLI in Rust: Was It Worth It?

This is the story of how we did a CLI rewrite for the second time in as many years and brought a new CLI to production powered by Rust. Hope you brought your glasses, because we are going to be reading some code. Let’s get started.


Introduction

If you’re new around here, Railway is an infrastructure company that provides instant application deployments to developers.

Before working here, I was an active member of the Railway community helping others out on the Discord server and contributing to open source repos such as Nixpacks. When I joined full time last month as a Support Engineer, I immediately turned my attention to the CLI experience, which … wasn’t the best.

I’d already worked on the CLI in the past via random ad-hoc PRs — but this was to solve my own personal fires, not company ones.

It turns out the company fires are quite a bit more involved and interesting. Pulling down the previous CLI, I noticed a number of issues:

  • Lack of documentation: The CLI was extremely feature rich but largely undocumented. As a result, working out which routes to port would have been an immense undertaking.
  • Impending API changes: With the release of the Public API, some routes used by the CLI would be deprecated or replaced.
  • Major assumption changes: The current version of the CLI was built before Railway introduced the concept of multiple services, so some workflows no longer mapped 1:1 to how developers were using it. For example, railway up would always prompt you for the service to deploy, which quickly got annoying. In Railway’s early days, the CLI’s main goal was to inject environment variables into your local DB development workflow. That is no longer the case.

With these pain points in mind, I spent some time using git blame to understand the rationale behind prior decisions.

Rewrite or Rehabilitate?

API changes meant that routes from the undocumented API were due for deprecation. So whatever we chose to do, the result had to meet the following requirements:

  • Existing use cases such as GitHub Actions/CircleCI should work exactly the same as before
  • The user experience should be the same or better than the previous CLI
  • All commands should be fully documented
  • All endpoints used should be part of the Public API

We then had two available paths: either I could take on a refactor (which would be the third refactor), or I could rewrite the CLI from the ground up.

We spent some time looking into keeping the codebase but ran into some interesting quirks in our Go setup:

  • In our Go code, GraphQL types were defined separately from the queries/mutations, so if you forgot to update a type, you’d run into issues. Because we were undergoing a major API overhaul, we wanted to make sure that we would keep type safety moving forward.
  • The CLI parser library we were using in Go (Cobra) defined commands dynamically using functions that were called at init. In layman’s terms, this let you ask for an arg that didn’t exist, and you would only find out when the code ran.

One last thing that made it difficult to deal with the existing CLI was that in the GraphQL lib we were using, our queries and mutations were just defined in strings. You can see why that might be a problem.

It became clear to me that a rewrite would be a better use of time.

A Ferrous Proposition

There are a number of reasons a Rust rewrite was appealing:

  • The new Public API places an outsized focus on normalization of types
  • Rust has great hooks into strongly typed GraphQL queries
  • Rust has libraries that provide UX improvements in the CLI out of the box, like tables, loading states, and error handling
  • Rust’s focus on DevEx philosophically fits with Railway’s mindset
  • Rust is fun to write!

Vercel came to some similar conclusions recently with Turborepo:

“For us, it's worth using tools that prioritize up-front correctness.”

As Railway matures, this is proving to be the case for us as well. As one example, Rust makes it possible to generate types and validate queries and mutations against the API’s schema (which can be introspected). Data validation errors begone!
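
To make the idea concrete, here is a toy sketch of the kind of check involved: comparing the fields a query selects against the fields a schema declares. Everything in it is an assumption for illustration. The schema contents and the names toy_schema and unknown_fields are made up, and the check runs at runtime using only the standard library, whereas a real typed GraphQL setup would do this at build time from the introspected schema.

```rust
use std::collections::HashMap;

/// Toy stand-in for an introspected schema: type name -> declared field names.
/// (Hypothetical contents, not Railway's real schema.)
fn toy_schema() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("Project", vec!["id", "name", "services"]),
        ("Service", vec!["id", "name"]),
    ])
}

/// Returns every selected field that the schema does not declare on `type_name`.
/// If the type itself is unknown, every selected field is reported.
fn unknown_fields<'a>(
    schema: &HashMap<&str, Vec<&str>>,
    type_name: &str,
    selected: &[&'a str],
) -> Vec<&'a str> {
    let declared: &[&str] = schema.get(type_name).map(|v| v.as_slice()).unwrap_or(&[]);
    selected
        .iter()
        .copied()
        .filter(|field| !declared.contains(field))
        .collect()
}

fn main() {
    let schema = toy_schema();
    // "nmae" is the kind of typo a string-based query would only reveal
    // once the request actually hits the API.
    let bad = unknown_fields(&schema, "Project", &["id", "nmae"]);
    println!("unknown fields: {:?}", bad);
}
```

With queries kept as plain strings, nothing runs a check like this until the request is sent; with generated types, the equivalent mistake stops the build.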

Migrating the Repo

After getting my RFC stamped, I branched and got to work on CLI v3.

CLI v3 defines commands as structs, and each command and its arguments struct are scoped to a file. This allows for better organization and better isolation. Let’s look at some code.

Go Command Handling

In the old version of the CLI, we had an init.go where the commands were initialized when the CLI started. This was fine at the time, but if you needed to work with command logic, you later found yourself with four open file panes trying to maintain your contextual sanity.

func init() {
	// Initializes all commands
	handler := cmd.New()

	rootCmd.PersistentFlags().BoolP("verbose", "v", false, "Print verbose output")

	loginCmd := addRootCmd(&cobra.Command{
		Use:   "login",
		Short: "Login to your Railway account",
		RunE:  contextualize(handler.Login, handler.Panic),
	})
	loginCmd.Flags().Bool("browserless", false, "--browserless")

	addRootCmd(&cobra.Command{
		Use:   "whoami",
		Short: "Get the current logged in user",
		RunE:  contextualize(handler.Whoami, handler.Panic),
	})
	... 
}

As an example of too much context switching, in CLI v2 you had to set flags the following way:

variablesSetCmd := &cobra.Command{
	Use:     "set key=value",
	Short:   "Create or update the value of a variable",
	RunE:    contextualize(handler.VariablesSet, handler.Panic),
	Args:    cobra.MinimumNArgs(1),
	Example: "  railway variables set NODE_ENV=prod NODE_VERSION=12",
}
variablesCmd.AddCommand(variablesSetCmd)
variablesSetCmd.Flags().StringP("service", "s", "", "Fetch variables accessible to a specific service")
variablesSetCmd.Flags().Bool("skip-redeploy", false, "Skip redeploying the specified service after changing the variables")
variablesSetCmd.Flags().Bool("replace", false, "Fully replace all previous variables instead of updating them")
variablesSetCmd.Flags().Bool("yes", false, "Skip all confirmation dialogs")

Flags are defined after the call that creates the command, and they are not associated with a struct, so there is no type safety.

Many CI users who extended the CLI for their own needs would complain that we didn’t do much to enforce the parameters since you could try to parse an argument which did not exist.

serviceName, err := req.Cmd.Flags().GetString("service")
if err != nil {
	return err
}

This meant that every time you wanted to retrieve a flag or argument, you had to account for an error in case that argument was not defined. I think I counted 120 err != nil guards in all the controllers. Not great.
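
The difference is easier to see side by side. Below is a minimal sketch in Rust, using only the standard library, that mimics both styles. StringFlags, VariablesSetArgs, and describe are hypothetical names for illustration, not types from either CLI: the first looks flags up by string, so a typo surfaces as a runtime error, while the second makes each flag a struct field, so the same typo would not compile.

```rust
use std::collections::HashMap;

/// Mimics string-keyed flag lookup: a misspelled name is only caught at runtime,
/// so every call site has to handle an error.
struct StringFlags(HashMap<String, String>);

impl StringFlags {
    fn get_string(&self, name: &str) -> Result<&str, String> {
        self.0
            .get(name)
            .map(String::as_str)
            .ok_or_else(|| format!("flag accessed but not defined: {name}"))
    }
}

/// Typed alternative: each flag is a field, so `args.servce` fails to compile.
struct VariablesSetArgs {
    service: Option<String>,
    skip_redeploy: bool,
}

/// Reads the flags directly; no error handling is needed to access a field.
fn describe(args: &VariablesSetArgs) -> String {
    let service = args.service.as_deref().unwrap_or("<linked service>");
    format!("service={service} skip_redeploy={}", args.skip_redeploy)
}

fn main() {
    let flags = StringFlags(HashMap::from([("service".to_string(), "api".to_string())]));
    // Typo in the flag name: this compiles and fails only when it runs.
    println!("{:?}", flags.get_string("servce"));

    let args = VariablesSetArgs { service: Some("api".into()), skip_redeploy: false };
    println!("{}", describe(&args));
}
```

The second half is the shape the rest of this post builds on: once arguments live in a struct, the compiler does the bookkeeping those err != nil guards were doing by hand.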

Rust Command Handling

In CLI v3, commands, arguments, and logic are all in the same file. This means that the entry point of the CLI takes a list of exported modules and gives you type-safe access to those commands. (Which, in theory, means you could build an SDK from these modules...)

/// Interact with 🚅 Railway via CLI
#[derive(Parser)]
#[clap(author, version, about, long_about = None)]
#[clap(propagate_version = true)]
pub struct Args {
    #[clap(subcommand)]
    command: Commands,

    /// Output in JSON format
    #[clap(global = true, long)]
    json: bool,
}

// Generates the commands based on the modules in the commands directory
// Specify the modules you want to include in the commands_enum! macro
commands_enum!(
    add,
    completion,
    delete,
    domain,
    ...
);

#[tokio::main]
async fn main() -> Result<()> {
    let cli = Args::parse();

    Commands::exec(cli).await?;

    Ok(())
}

Maybe it’s not the most exciting thing in the world, but now we were getting a bunch of free improvements to DevEx and user experience. Let’s break it down.

To start, we swapped out the old CLI argument parser for clap, a macro-based library for Rust which does a bit of *magic*.

/// Interact with 🚅 Railway via CLI

As seen above, clap picks up help and about text from doc comments. These are basically “fancy” comments indicated by a triple slash. In this case, we set the global CLI help text to “Interact with 🚅 Railway via CLI.”

#[derive(Parser)]
#[clap(author, version, about, long_about = None)]
#[clap(propagate_version = true)]

The Parser derive is what actually constructs our argument parser! It is configured through the #[clap(...)] attribute lines. Here we ask clap to fill in the author, version, and about text so they can be displayed when railway --help is run; with no explicit value, the version is read from Cargo.toml. Setting propagate_version = true makes every subcommand report that same version too.

#[clap(subcommand)]
command: Commands,

Next up are all the subcommands. These are what actually do ... the things. We define them in an enum (fancy typed set of things), which happens next in the commands_enum! macro.

// Generates the commands based on the modules in the commands directory
// Specify the modules you want to include in the commands_enum! macro
commands_enum!(
    add,
    completion,
    delete,
    domain,
    ...
);

The commands_enum! line invokes a Rust macro. Macros can be thought of as little code machines: they take in some input and output Rust code, which is then inserted into the file where the macro was invoked. This macro specifically consumes the subcommand modules (from the src/commands directory) and spits out an enum describing each of their command() functions.

Take a moment to think about how this was handled in Go … not great, right?
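
If you haven’t written a macro before, here is a simplified sketch of the idea using plain macro_rules! and nothing outside the standard library. It is not the macro from the CLI. The inline add and delete modules stand in for files under src/commands, their command() functions just return strings, and the sketch takes explicit Variant => module pairs because deriving a variant name from a module name needs extra tooling that is left out here.

```rust
// Hypothetical stand-ins for files under src/commands; each exposes `command()`.
mod add {
    pub fn command() -> String {
        "add: adding a plugin".to_string()
    }
}
mod delete {
    pub fn command() -> String {
        "delete: deleting the project".to_string()
    }
}

/// Simplified sketch: generates an enum with one variant per module, plus
/// `parse` (name -> variant) and `exec` (variant -> that module's `command()`).
macro_rules! commands_enum {
    ($($variant:ident => $module:ident),* $(,)?) => {
        #[derive(Debug, Clone, Copy, PartialEq)]
        enum Commands {
            $($variant),*
        }

        impl Commands {
            /// Maps a subcommand name (the module name) to its variant.
            fn parse(name: &str) -> Option<Self> {
                $(if name == stringify!($module) {
                    return Some(Commands::$variant);
                })*
                None
            }

            /// Dispatches to the matching module. Adding a module to the macro
            /// call extends both the enum and this match in one place.
            fn exec(self) -> String {
                match self {
                    $(Commands::$variant => $module::command(),)*
                }
            }
        }
    };
}

commands_enum!(Add => add, Delete => delete);

fn main() {
    let cmd = Commands::parse("add").expect("unknown subcommand");
    println!("{}", cmd.exec());
}
```

The payoff is that registering a new subcommand is a one-line change, and the compiler complains if the module or its command() function is missing.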

/// Output in JSON format
#[clap(global = true, long)]
json: bool,

Lastly, we have global argument(s), with more coming soon to a Cloud near you. These apply to every subcommand as well as the global railway command.

…So What?

Yeah, type safety is nice, but as they say, “Users don’t see your code.” Rewriting our CLI in Rust doesn’t mean anything to you, the user, unless you get something from it.

We think we’ve delivered a far better developer experience with the better patterns and libraries we were able to get from Rust.

For example:

Clean tables

We strategically yoinked (scientific terminology) code from Nixpacks, another Rust project we maintain, to allow us to display your environment variables in a fancy and easy-to-read table!

CLI v3 introduces clean and readable tables
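
For a feel of what goes into output like this, here is a minimal standard-library sketch of an aligned key/value table. It is not the code we borrowed from Nixpacks; render_table, the column headers, and the sample variables are all assumptions for illustration.

```rust
/// Renders key/value rows as a plain-text table with aligned columns.
fn render_table(rows: &[(&str, &str)]) -> String {
    let (h_key, h_val) = ("Variable", "Value");

    // Each column is as wide as its longest cell, header included.
    let key_w = rows.iter().map(|(k, _)| k.chars().count()).fold(h_key.len(), usize::max);
    let val_w = rows.iter().map(|(_, v)| v.chars().count()).fold(h_val.len(), usize::max);

    let border = format!("+-{}-+-{}-+", "-".repeat(key_w), "-".repeat(val_w));
    // Left-align each cell and pad it to the column width.
    let line = |k: &str, v: &str| format!("| {k:<key_w$} | {v:<val_w$} |");

    let mut out = vec![border.clone(), line(h_key, h_val), border.clone()];
    out.extend(rows.iter().map(|&(k, v)| line(k, v)));
    out.push(border);
    out.join("\n")
}

fn main() {
    let vars = [("PORT", "3000"), ("DATABASE_URL", "postgres://localhost")];
    println!("{}", render_table(&vars));
}
```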

Menu UI

Rust happens to have a wonderful ecosystem of libraries, and the absolutely stellar inquire library provides nice primitives to build a fuzzy select menu with ~3 lines of code.

CLI v3 has dramatically better menu navigation

Global flags

Since clap makes it trivial to add global arguments, we added a global --json flag to output clean, boring JSON for scripts and the like.
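
As a rough sketch of what a flag like this switches between, the example below renders the same data either for humans or as JSON. It only uses the standard library, so the JSON is assembled by hand; render and json_escape are hypothetical helpers, and a real CLI would more likely lean on a serialization crate.

```rust
/// Escapes the characters that must be escaped inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Formats variables for humans, or as a JSON object when `json` is set.
fn render(vars: &[(&str, &str)], json: bool) -> String {
    if json {
        let fields: Vec<String> = vars
            .iter()
            .map(|&(k, v)| format!("\"{}\":\"{}\"", json_escape(k), json_escape(v)))
            .collect();
        format!("{{{}}}", fields.join(","))
    } else {
        vars.iter()
            .map(|&(k, v)| format!("{k}={v}"))
            .collect::<Vec<_>>()
            .join("\n")
    }
}

fn main() {
    // Stand-in for the parsed global flag: true if `--json` was passed.
    let json = std::env::args().any(|arg| arg == "--json");
    let vars = [("PORT", "3000"), ("NODE_ENV", "prod")];
    println!("{}", render(&vars, json));
}
```

Because the flag is global, every subcommand can make the same decision at the point where it prints.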

Non-interactive mode

Building on the scripting support from earlier, we now detect if you’re running in CI/CD and automatically confirm any Y/n prompts while requiring flags to replace interactive input from the user.
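
The behavior described above can be sketched roughly like this. The detection rule is an assumption (many CI providers set a CI environment variable; the real CLI may check more than that), and is_ci_from, confirm, and require_input are hypothetical helper names.

```rust
use std::env;

/// Treats a set, non-empty, non-"false" `CI` variable as "running in CI".
/// (Assumption for illustration; real detection may look at more signals.)
fn is_ci_from(ci_var: Option<&str>) -> bool {
    matches!(ci_var, Some(v) if !v.is_empty() && v != "false" && v != "0")
}

/// Resolves a Y/n prompt: auto-confirm in CI, otherwise ask the user.
fn confirm(in_ci: bool, ask: impl FnOnce() -> bool) -> bool {
    if in_ci {
        true
    } else {
        ask()
    }
}

/// Resolves a value that would normally come from an interactive menu.
/// In CI there is nobody to ask, so a missing flag becomes an error.
fn require_input(
    in_ci: bool,
    flag_name: &str,
    flag: Option<String>,
    ask: impl FnOnce() -> String,
) -> Result<String, String> {
    match (flag, in_ci) {
        (Some(value), _) => Ok(value),
        (None, true) => Err(format!("--{flag_name} is required in non-interactive mode")),
        (None, false) => Ok(ask()),
    }
}

fn main() {
    let in_ci = is_ci_from(env::var("CI").ok().as_deref());
    // The closures stand in for real prompts that would read from the terminal.
    let proceed = confirm(in_ci, || false);
    let service = require_input(in_ci, "service", None, || "picked-from-menu".to_string());
    println!("in_ci={in_ci} proceed={proceed} service={service:?}");
}
```

Passing the prompt in as a closure keeps the CI decision in one place and makes both paths easy to test without a terminal.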

Was it Worth it?

This rewrite was not without its challenges. It required an organized effort from the whole team to produce a clean, documented, and friendly v3 CLI interface.

One of the most unexpectedly difficult parts of the release was perfecting the CI/CD pipeline. This involved making sure we built for every user’s target (some of you guys are running the CLI on your phones??), and updating the install.sh script to work with Rust binaries.

When we announced the CLI transition and preview, we didn’t realize how many different use cases people were bringing to the CLI. This led to weeks of added testing. We even broke some things along the way, like our CI and, regretfully, some other people’s CI.

In the end, we’re really happy with the initial CLI v3 that we’re shipping to general availability today. It provides a much richer user experience while incorporating the new Public API and is easier to work with because it’s better documented.

We’re also excited for the doors this opens for future development. In particular, we’re excited about building an interactive dashboard that updates in real time using the tui crate. This would mean a live metrics dashboard in your terminal! We’d also like to expand this notion to provide a full bird’s-eye view of your services on Railway.

Additionally, we now support log streaming in the CLI which means we can potentially add support for the CLI as a log forwarding agent to something like Vector. And the close integration with the Public API opens the door to creating a Terraform provider for Railway…

There are so many possibilities open to us now.

Getting Started

If you’d like to try the new CLI v3, read the docs to find out how to upgrade using Brew, NPM, cURL, or Scoop.

And as always, let us know what you think or drop us a line if you run into any issues.