Jake Cooper

Scaling Railway: Roadmap

If you don’t know Railway, we’re an infrastructure company that provides instant application deployments to developers.

The key to our platform is speed. If you want to spin up a database or a job queuing service, you can do that on Railway in a few seconds. Here’s a real-time example of spinning up Postgres.

Creating a new Postgres database in under 10 seconds

Since we started two years ago, our growth has been up and to the right.

We are powering production deployments for 200,000+ people, with thousands more joining every single day.

Railway company user metrics

Reasonable minds might disagree about why we’ve seen such tremendous growth, but to me it’s because we are absolutely hell-bent on making infrastructure — maybe THE shittiest DX of the entire application stack — feel like magic.

We’re two years in and databases continue to be the most popular reason people come to Railway — and for good reason! We don’t think there’s ever been a faster way to provision a database on the internet.

As for the growth, we’re really happy that people are taken with our approach, but we’re now facing what you might call a BIG problem.

We need to take a product that is beloved for a fairly narrow use case and scale it into an essential piece of internet infrastructure.

In a series of blog posts over the next few weeks, we’re going to take you behind the scenes into how we’re scaling Railway to be the infrastructure company we’ve all always wanted: Platform, Infrastructure, Support, Comms, and everything else in between.

This week we’re starting with the Railway Roadmap.

The Railway Roadmap in 2020 was very simple: let users spin up a database in a few seconds. That was basically it. It didn’t need to be complicated, but it had to feel like magic.

Since we founded Railway to make it trivial for engineers to deploy code without thinking about servers, we started with databases. Where else was there a stronger desire to avoid thinking about servers? It was a perfect fit.

Once we created the v0 of Railway, we started getting hammered with questions from users looking to expand their use cases.

  • “Can you make this deploy my Django app?”
  • “How do I deploy a node server onto this?”
  • “When can I host [insert static web framework here] on Railway?”

Good point. Getting a database was just the first step. Next we needed to add a way to wire up a service to your database.

A few months later, in true startup fashion, we bolted a service builder onto the v0.

An early (2020) version of Railway Deployments

There was a lot of jank, but the result was what we wanted. It was now possible to spin up a database with a service alongside it. We were getting somewhere!

This is what it looked like when we first added Redis support. You might notice that we now had the concept of a Plugin, which is essentially a database with some vars.

An early (2020) implementation of adding a Redis caching service to a deployment

Once we added support for a single service, we started getting hammered by our users once again.

  • “This is great, but I’ve got a Celery worker + web server that I can’t deploy”
  • “Now I need to add a caching service to this project, how can I do that?”
  • “Can you help me configure a messaging queue on top of the service I just deployed?”

Our numbers were cranking but we didn’t have a great answer for our users within the product.

For a while we just told people to run multiple processes in the same service, but that was a horrible experience for a variety of reasons:

  • Interspersed logs from multiple processes became unreadable
  • Metrics became useless/blurred together
  • Processes couldn’t scale independently

We’d planned from the beginning to let users add more than one service to Railway and it was now clear that we had to bring this capability out in a big way.

To implement multiple services, we introduced the Railway Canvas at the beginning of 2022.

A current version of a Railway Project

The Canvas allows Railway users to express increasingly complex infrastructure concepts.

You know how before we just magically deployed your database? Now if you give us a repo, we can magically deploy the services within it!

This is the current state of the product, more or less, and we couldn’t be happier with the current level of user engagement.

Though by now it may come as no surprise, we’ve again been getting hammered by users since our latest release:

  • “We need a region in US-East/EU/Asia”
  • “We need to be able to deploy something complicated, like ClickHouse/Temporal/etc”
  • “We need different configurations of infrastructure for each of our teams/members/etc”

If you can spot the trend at this point, it’s that every time we release something, our users start clamoring for more and more and more complexity.

So this is the problem we face. We exist because shipping software is too complicated. We are obsessed with keeping it simple and making infrastructure feel like magic. This approach has brought us hundreds of thousands of users who spend a lot of time asking for more complexity!

Our answer to this, and to many more conundrums, is Railway v2.

For Railway v2, which is what we’re calling the next leap forward in the capabilities of our product, we’re going long on three dimensions to cover all of the use cases our users demand:

  • Service depth - persistent storage, scaling, generalizable config
  • Environment breadth - regions, private networking, variable/config/dependency management
  • Project height - collaboratively evolving applications over time, across environments

First we supported databases, then we supported databases with a single service, then multiple services — and now we’re making the jump to deploy vastly more complicated applications.

These are the big ticket items we’ll be shipping in Q1 and Q2:

  • Private networking
  • Multi-plugin support
  • Horizontal scaling
  • Persistent storage
  • Regions
  • Jobs

We’ll have succeeded with v2 when two things are true:

  1. A user can come to the platform and push a button to deploy ANY service(s) or application(s) of arbitrary complexity, wired seamlessly into their existing infrastructure
  2. Complexity isn’t just “magically hidden” but provided incrementally, as layers, just in time and as the user needs it through intuitive interfaces

In Railway v2, a user needs to be able to give the command “Deploy Temporal/ClickHouse/Supabase/etc” and receive a running-and-wired version of the software alongside the company’s existing infrastructure, in a few seconds.

No VPC peering, no 30-second rainbow wheel. Instantly. And with zero upfront config.

All of the power; none of the pain.

To accomplish this, not only do we need to be able to provision systems of arbitrary complexity, we need to:

  • Synchronize/apply changes across environments, with overrides (prod, staging, dev, etc.)
  • Allow users to tap into additional features such as networking + storage to express more complex applications
  • Scale up to millions, and eventually billions, of users
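To make the first of those points concrete, here is a minimal sketch of what "apply changes across environments, with overrides" could look like. It is not how Railway implements it. The config shape, the service names, and the functions `deep_merge` and `resolve_environments` are all hypothetical, chosen only to illustrate the idea of one shared base config with a small patch per environment.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` layered over `base`, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, or new keys: the override wins outright.
            merged[key] = deepcopy(value)
    return merged


def resolve_environments(base: dict, overrides: dict) -> dict:
    """Apply each environment's overrides to the shared base config."""
    return {env: deep_merge(base, patch) for env, patch in overrides.items()}


# Hypothetical shared config; a change made here reaches every environment.
base_config = {
    "web": {"replicas": 1, "env": {"LOG_LEVEL": "info"}},
    "postgres": {"storage_gb": 1},
}

# Hypothetical per-environment overrides; only the differences are listed.
environment_overrides = {
    "dev": {"web": {"env": {"LOG_LEVEL": "debug"}}},
    "staging": {},
    "prod": {"web": {"replicas": 3}, "postgres": {"storage_gb": 50}},
}

resolved = resolve_environments(base_config, environment_overrides)
```

In this sketch, `resolved["prod"]` keeps the shared `LOG_LEVEL` while scaling `web` to three replicas, and `resolved["staging"]` is identical to the base. Keeping overrides as sparse patches is one possible design choice: a change to the base flows to every environment unless that environment explicitly says otherwise.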

It’s a tall order, but the alternative of settling for the status quo is even more daunting.

When you look back and think of the most rewarding moments of your life, is comparing m5d.2xlarge vs n2-standard-2 one of them? What about spending an hour digging around in the IAM settings panel to find that 1-in-a-million setting you’re looking for? Us neither.

Developers should spend less time on infrastructure, and more time building products.

For the next generation of great companies to be built on the internet, they need a tool that helps them do just that.

We believe Railway will be that tool.

If we do our job correctly and Railway becomes one of the best ways people build and deploy software, it will be because we’ve successfully figured out how to make today’s infrastructure standards look massively cumbersome by comparison.

In 2023, we’ll be shipping the features required to bring our vision into existence, from introducing horizontal scaling to opening our first regions to stacking persistent storage, and a lot more. We’ll be blogging about it as we go, in real time, for the whole world to see.

In the rest of this series we’ll tell you about what we’re doing across the company to power Railway and our next phase of growth. We’ll also tell stories from our users who have been on a tear building applications on Railway.

Thanks for riding along with us. We’re happy to have you on board.

P.S. Sound interesting? We’re hiring.