Jake Cooper

May 28, 2021

How We Work

In a slight deviation from our regular content, and setting the tone for a future direction of the blog, we're going to talk about something a bit different today: how Railway works.

Now, this isn't a blogpost on the engineering challenges behind Railway (there will be many of these in the future, I promise). No, this is more of a meta lookback on how we operate as a company, and why we do the things we do.

I wanted to write this for a few reasons:

I think it's beneficial to document this. For the current team, for new hires, and for the many who come next over the years.
I don't think many companies blog about their operation process, and I've always found it insanely fascinating.

Finally, I think running a company completely remote by default presents a, for lack of a better word, shitload of operational challenges. In the same way you need observability for distributed infrastructure, you need observability for distributed teams.

Side Effect Driven Development

I've always found standups to be remarkably inefficient. You stand in a room with 6-20 people, doze off for a bit, and then get suddenly jarred awake because it's your time to recite whatever you worked on for the previous day.

The goal of a standup is often to keep operational tabs on people. We don't really have that luxury. We're far too small, and if someone requires a watchful eye so to speak, well, they're probably not really a good fit at this time simply because we as a team don't have time to make sure everyone's exactly on task.

This brings us to our first operational point: standups, we don't do 'em.

No Daily Standups

Railway doesn't perform daily standups. Instead, we have a #daily-wraps channel, where you post what you did the previous day and what you're aiming to do tomorrow.

Example

Not only does this mean we remove the need for synchronous meetings (we work across many timezones), but it also means you can break out into a thread based on what someone's said.

This allows us to collaborate, check in across the team, and also have a timeline of what people have been up to.

This isn't a strict, gotta-do-it-by-EOD thing. Quite the contrary: a few members of the team do it at the end of the workday, others prefer the start of the following day.

Minimize Recurring Meetings

When you're working in a distributed fashion, it's frankly a pain in the ass to line up meetings. However, you can't just NOT have meetings. As a result, Railway has 3 meetings per week:

Monday Kickoff/Sprint Planning
Tuesday Moonshot/Minutia Jam Session
Friday Wrap/Sprint Retro

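These meetings alternate from one week of our two-week sprint to the next. The first three below are one week, the last three are the other: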

Sprint Planning: Plan our individual next 2 weeks, as a team
Moonshot Meetings: Discuss large rocks moving forward + galaxy brain ideas
Friday Wrap + Demos: Tweak changelog and see how the sprint is going
Monday Kickoff: Light planning for the week
Minutia Meeting: Iron out current process/flows/UI tweaks/etc
Sprint Retro + Demos: Tweak changelog and wrap the sprint

This provides us with a nice balance of sync and async interactions.
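To make the alternation concrete, here's a minimal sketch of the schedule as a lookup. The function name, the idea of passing in a sprint start date, and the assumption that the Sprint Planning week comes first are all mine for illustration; this isn't something we actually run.

```python
from datetime import date

# Week-by-week agendas, transcribed from the list above.
# Assumption: the sprint opens with the Sprint Planning week.
SPRINT_WEEKS = [
    {"Monday": "Sprint Planning", "Tuesday": "Moonshot Meeting", "Friday": "Friday Wrap + Demos"},
    {"Monday": "Monday Kickoff", "Tuesday": "Minutia Meeting", "Friday": "Sprint Retro + Demos"},
]


def meetings_for(day: date, sprint_start: date) -> dict[str, str]:
    """Return the three meetings for the week containing `day`.

    `sprint_start` is assumed to be the Monday on which some sprint began.
    """
    weeks_elapsed = (day - sprint_start).days // 7
    # Sprints are two weeks long, so the agenda repeats every other week.
    return SPRINT_WEEKS[weeks_elapsed % 2]


if __name__ == "__main__":
    start = date(2021, 5, 24)  # a Monday, chosen only for the example
    print(meetings_for(date(2021, 5, 28), start)["Friday"])
    print(meetings_for(date(2021, 6, 1), start)["Tuesday"])
```

Three slots a week, two flavors per slot, and the calendar never needs more than that.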

Discord As a Central Hub

Sometimes, wraps don't get done that day, or people forget. This is fine because 1) shit happens and 2) we have automated status updates throughout Discord.

A few other automated action channels:

#buzz: Any arbitrary updates. Commits, stars, updates, etc. High bandwidth channel to see what everyone's working on, as a side effect of their actual work.

#social: Whenever Railway is mentioned on social media, something is posted.

#pull-requests: Any new PRs or comments will end up in Pull Requests.
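The routing behind channels like these can be sketched in a few lines. This is a hedged illustration, not our bot: the event kind names, the `route_event` function, and the choice to fall back to #buzz for anything unrecognized are all assumptions, and nothing here talks to Discord itself.

```python
# Hypothetical event kinds mapped to the channels described above.
CHANNEL_FOR_EVENT = {
    "commit": "#buzz",
    "star": "#buzz",
    "social_mention": "#social",
    "pull_request_opened": "#pull-requests",
    "pull_request_comment": "#pull-requests",
}


def route_event(kind: str, summary: str) -> tuple[str, str]:
    """Pick a channel for an event and format the message to post there."""
    # Assumption: unknown events land in #buzz, the catch-all channel.
    channel = CHANNEL_FOR_EVENT.get(kind, "#buzz")
    return channel, f"[{kind}] {summary}"


if __name__ == "__main__":
    print(route_event("pull_request_opened", "Add preview deployments"))
    print(route_event("deploy", "Shipped the dashboard"))
```

The point is that the update is a side effect of the work: nobody writes the message by hand.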

We have a few more channels, but I want to touch on the ethos of WHY we have those channels versus just talking about all of them. It comes down to what we define as metawork vs real work.

Metawork vs Real Work

Work can be split up into two distinct categories:

Real Work: Your actual job, and anything required by you to do your job
Metawork: Byproducts of your job, like updating statuses, planning, etc.

The goal of Railway's process is to Maximize Real Work and Minimize Metawork. Simple, right? In theory, at least. In practice it's a little bit harder.

The way we do this is by ruthlessly optimizing an action/process via our core OS of SaaS tools:

Discord (Synchronous)
Linear (Immediate Async)
Notion (Indefinite Async)

These three tools provide the baseline to operate all of our company, completely distributed, across the spectrum from synchronous to asynchronous.

Synchronous - Discord

We use Discord for a variety of reasons:

It's fun: One of our core ideologies is "Tools with Emotion". We believe that, if you work with shit products, you'll produce shit products. Discord has a fun, emotional element to it that Slack simply lacks.
Bandwidth Augmentation: Text works alright most of the time, but for many conversations, you need a higher bandwidth medium. Discord allows you to jump immediately from text to voice, simply by clicking into a breakout room. There's no Zoom link to set up, so this happens quite often (daily). Built-in screenshare is another huge plus.
One Location To Rule Them All: Our support as well as all our team comms exist in Discord. This lets us work immediately and directly with users without leaving our central command hub; we're talking to users 24/7.

Now, if we could run the whole company on Discord, we would, but it also fails miserably at a few things. The largest one of these: It's completely ephemeral. Anything you say will be immediately lost to the void. This results in:

The same question being asked many times
People missing context in different timezones

Since we're remote first, we need to preserve that context in another medium, one that's a more durable long-term store of value.

For this purpose, we have two async stores:

Immediate - Linear
Indefinite - Notion

Immediate Async - Linear

Linear serves as a personal queue of tasks which have definite acceptance criteria (e.g. "Fix this bug, here's how to reproduce it", "Build this service off this ERD").

Within Linear there should be no "open ended" tasks, given we use Linear for our two week sprints.

If something can be fixed quickly, it should be resolved when it pops up (in Discord).
If it can be slotted in the sprint, it should be added immediately to the Linear cycle.
If it cannot be prioritized, it needs to either be added to the Notion Roadmap, or we drop it.

Note the last item: we try not to use the backlog. Anything consequential enough should either be resolved now, planned inside of Notion using an Engineering Requirement Document (ERD), or dropped until it comes up again and is deemed pressing or prioritizable.
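The three rules above boil down to a small decision function. Here's a minimal sketch of it; the `Destination` enum, the `triage` function, and its three boolean inputs are hypothetical names for illustration, since in reality this is a judgment call and not code.

```python
from enum import Enum


class Destination(Enum):
    FIX_NOW_IN_DISCORD = "resolve it when it pops up, in Discord"
    LINEAR_CYCLE = "add it to the current Linear cycle"
    NOTION_ROADMAP = "add it to the Notion Roadmap"
    DROP = "drop it until it comes up again"


def triage(quick_fix: bool, fits_in_sprint: bool, worth_planning: bool) -> Destination:
    """Apply the three rules in order; there is deliberately no backlog option."""
    if quick_fix:
        return Destination.FIX_NOW_IN_DISCORD
    if fits_in_sprint:
        return Destination.LINEAR_CYCLE
    # Anything that cannot be prioritized is either planned in Notion or dropped.
    return Destination.NOTION_ROADMAP if worth_planning else Destination.DROP


if __name__ == "__main__":
    print(triage(quick_fix=False, fits_in_sprint=True, worth_planning=False).value)
```

Notice what's missing: there is no return value that means "park it somewhere and forget about it".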

This allows us to work exclusively on the things we truly, as individuals, think should get done. If we deem it worthy to complete as a team, it'll end up in Notion.

Indefinite Async - Notion

Notion is our completely indefinite async store for information. It houses:

README: Railway's culture, values, and mission. It's a readme since we expect anybody who joins to edit it and tweak it over time. It should evolve with every hire.
ERDs: Engineering Requirement Documents. Any feature of consequence should have an ERD.
Team: Info about the team and their working hours + timezones. Our bot uses these for various things (pinging on-call, etc.).

When someone joins the team, they're given an onboarding scratchpad. Throughout their first week, I'll have daily check-ins with them where we iron out gripes and change the process. In this way, we're constantly improving our workflows while new hires get up to speed.

Meetings: Every meeting at Railway requires an agenda built by the person who wants the meeting. This does not apply to ad hoc voice meetings.
Service Documentation: Service-level docs for each service, as well as language guidelines/structure (e.g. our usage of onion routing architecture for all microservices).
Roadmap: Long term planning. Roadmap contains "Galaxy Brain Ideas" as well as "Roadmap TODOs". When people think of stuff and agree on Discord, we add them to the roadmap. Then anybody can look at the roadmap, prioritize, and either create an ERD or move it directly to Linear.
Blog: The official Railway Blog, backed by Notion. This post is written in Notion, and rendered onto blog.railway.app. You can see more about this in our post here.
Changelog: The official Railway Changelog, backed by Notion. Whenever we ship something of consequence throughout the week, we add it to the changelog. On Fridays, we sit down, pick 3 things, and highlight them.
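As an illustration of the Team item above, here's one way a bot might use working hours and timezones to decide who is around to ping. This is a sketch under assumptions, not our bot: the `Teammate` record, the `available_now` function, and the fixed UTC offsets are all made up for the example (a real version would more likely use IANA timezone names and account for weekends).

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone, tzinfo


@dataclass(frozen=True)
class Teammate:
    name: str
    tz: tzinfo   # fixed offsets here for simplicity; any tzinfo works
    start: time  # start of working hours, local time
    end: time    # end of working hours, local time (same day)


def available_now(team: list[Teammate], now_utc: datetime) -> list[str]:
    """Return the names of teammates whose local time is inside working hours."""
    names = []
    for mate in team:
        local_time = now_utc.astimezone(mate.tz).time()
        # Assumption: working hours never wrap past midnight.
        if mate.start <= local_time < mate.end:
            names.append(mate.name)
    return names


if __name__ == "__main__":
    team = [
        Teammate("ana", timezone(timedelta(hours=-7)), time(9), time(17)),
        Teammate("bo", timezone(timedelta(hours=2)), time(9), time(17)),
    ]
    print(available_now(team, datetime(2021, 5, 28, 14, 0, tzinfo=timezone.utc)))
```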

Now, most of these line items are pretty self-explanatory, but I'd like to touch on one in particular: ERDs.

Engineering Requirement Documents

At Railway, we write ERDs for anything larger than a small feature. This isn't a strict rule; we trust every person to build quality software, but often we lean towards these ERDs because they're extremely good ways to solicit/collect feedback and answer any open questions.

In general, these ERDs comprise the following parts:

Background: Why is this change important?
Architectural Changes: What changes need to be made and how-ish they'll be done
Open Questions: Questions, either by the owner of the document, or by reviewers
Notes: A misc section on anything that might be helpful

Once an ERD is written, we post it in Discord, where it's reviewed promptly in the same manner as our Pull Requests. Once the ERD is clear and stamped by someone, it can be broken up, moved to Linear, and implemented.
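If you squint, an ERD is just a small record with a gate on it. Here's a hedged sketch of that shape; the `ERD` class and its field names are hypothetical, and treating "clear" as "no open questions left" is my simplification for the example (our ERDs live as Notion pages, not code).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ERD:
    title: str
    background: str             # why is this change important?
    architectural_changes: str  # what changes, and how-ish they'll be done
    open_questions: list[str] = field(default_factory=list)
    notes: str = ""
    stamped_by: Optional[str] = None  # reviewer who approved it, if any

    def ready_for_linear(self) -> bool:
        """True once the ERD is clear (no open questions) and stamped."""
        return not self.open_questions and self.stamped_by is not None


if __name__ == "__main__":
    erd = ERD(
        title="Example feature",
        background="Placeholder background",
        architectural_changes="Placeholder changes",
        open_questions=["Does this need a migration?"],
    )
    print(erd.ready_for_linear())  # still has an open question and no stamp
    erd.open_questions.clear()
    erd.stamped_by = "reviewer"
    print(erd.ready_for_linear())
```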

Summary

So, in summary, the work cycle for Railway can be broken down into a pretty clean feedback control system:

This system works because we define, as a group, the most relevant/pertinent deadlines. It's agile, with a hint of waterfall, all without any of the PM/management BS. Obviously this won't work forever, but for now, and likely until we 10x our current headcount, it should hold.

In future blogposts, I'll talk about our engineering flows. We have Preview deployments using Railway, as Railway is built on Railway. But that's a post for another time. See ya then!