Ray Chen

Support Engineering Is Engineering

We wrote about our journey scaling support at Railway in Scaling Railway: Serving 250k Developers with One Support Engineer. We’ve since doubled our total users to >500K and added another Support Engineer to help us scale.

The article above drew widespread attention and questions from our users and peers across the industry. Our approach to Support Engineering is uncommon, so we’re writing this to give you a deeper look at how we do it at Railway.

Let’s jump into it!

Support Engineering is Engineering

Support Engineers at Railway are full-stack engineers with a passion for increasing the IOPS between human interactions. At its heart, Support Engineering is a networking problem, but instead of shipping packets to computers, we ship conversations to humans.

As Support Engineers, we build interfaces between humans and systems. Broadly, we focus on bringing structure to the lossy, chaotic nature of how humans interact, and we concern ourselves with how information can be transmitted efficiently.

The “OSI Model” of Support Engineering

Example: Community Help Escalation

Our Discord’s #help channel receives many questions daily. Some of these questions need to be flagged to Railway team members, so we built a command for our awesome Community Moderators to bring conversations to our attention:

Flagging community help threads to Railway team members

Our Community Moderators can use a /team command that syncs the Discord thread into our support tool, Plain:

Flagged community help threads in our support tool, Plain.com

Users should never have to reach out for support

Think of support as a try/catch block:

Support as a try/catch block

If your code triggers the catch block frequently, you’re either doing something wrong (in your product) or throwing an error where you shouldn’t (unclear documentation, for example).
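The analogy can be sketched in a few lines of Python, where catch is spelled except. This is only an illustration of the idea, not Railway's tooling: ProductGap, use_product, and with_support are hypothetical names, and the "product" is reduced to a set of known gaps.

```python
from collections import Counter


class ProductGap(Exception):
    """Hypothetical error: the product could not support the user on its own."""

    def __init__(self, issue):
        super().__init__(issue)
        self.issue = issue


# How often each issue fell through to support (the "catch" block).
support_hits = Counter()


def use_product(action, known_gaps):
    """The happy path: the product handles the action without help."""
    if action in known_gaps:
        raise ProductGap(action)
    return f"{action}: ok"


def with_support(action, known_gaps):
    """Try the product first; support only catches what falls through."""
    try:
        return use_product(action, known_gaps)
    except ProductGap as gap:
        # Record the gap so a frequent one can be fixed at the source.
        support_hits[gap.issue] += 1
        return f"{action}: resolved by support"
```

A high count in support_hits for one issue is the signal that the product or its documentation, not the user, needs fixing.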

Support exists to catch users falling through the cracks of our product. If n users reach out asking about issue y, we know they’re not going to be the last ones to ask about it. We take those conversations and turn them into core product improvements where possible, advise users on workarounds where not, and point them to a documentation page if it exists (and create it if it does not).
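As a rough sketch of that flow, the snippet below finds issues that n or more users asked about and lists the follow-ups described above. The Issue fields, the function names, and the default threshold of 3 are assumptions made for illustration; the post does not say how this is tracked in practice.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Issue:
    name: str
    fixable_in_product: bool  # can we remove the problem at the source?
    has_docs_page: bool       # is there already a page to point users to?


def recurring_issues(conversation_tags, n=3):
    """Return issues raised in at least n conversations, most frequent first."""
    counts = Counter(conversation_tags)
    return [issue for issue, count in counts.most_common() if count >= n]


def follow_ups(issue):
    """List the follow-up actions for one issue, in the order the post gives."""
    actions = []
    if issue.fixable_in_product:
        actions.append("ship a product improvement")
    else:
        actions.append("advise a workaround")
    if issue.has_docs_page:
        actions.append("point to the docs page")
    else:
        actions.append("create the docs page")
    return actions
```

For example, follow_ups(Issue("volume resize", False, False)) yields a workaround plus a new docs page.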

By treating every support outreach as an opportunity for us to improve the product, we repair any cracks in the product, and ensure that other users in similar scenarios get a soft landing instead of a hard crash. There should be no reason you need support from us — the product should support you sufficiently.

…but if they do, they must be able to do it easily

Nobody likes filling out a multi-step form full of drop-down selectors. People rarely enjoy conversational AI support chatbots, because they’re neither conversational nor helpful. How many of them are just decision trees masquerading as AI, presented as buttons that make you click through prompts?

Humans desire to talk to each other; we’re communal creatures.

At Railway, engineers talk to users directly. We have a Support Rotation where different team members are responsible for answering support queries, on a weekly rotating basis. Support Engineers only work the support queue when they are on rotation for a week at a time.

Members on rotation include non-technical staff, such as our Head of Operations, as well as product-area experts such as our Product and Platform Engineers. If you’ve ever reached out for support, you’ll notice you could be talking to different people depending on the week. This is why!

The rotating support function frees up our Support Engineers to build lasting product improvements and to improve the rotation experience for everyone.
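A weekly rotation like this is easy to compute. The sketch below assumes a simple round-robin over a fixed roster, counted in whole weeks from an arbitrary Monday; the roster entries and the start date are placeholders, not Railway's actual schedule.

```python
from datetime import date

# Hypothetical roster; the roles mirror the kinds of team members named above.
ROSTER = [
    "support engineer",
    "platform engineer",
    "product engineer",
    "head of operations",
]
ROTATION_START = date(2024, 1, 1)  # an arbitrary Monday used as week zero


def on_rotation(day, roster=ROSTER, start=ROTATION_START):
    """Return who works the support queue during the week containing `day`."""
    weeks_elapsed = (day - start).days // 7
    return roster[weeks_elapsed % len(roster)]
```

Counting weeks from a fixed start date, rather than using the calendar week number, keeps the order stable across year boundaries.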

Example: Documentation

We educate users on how Railway works through our docs, which we also build ourselves:

Our documentation is open source, allowing anyone to contribute to it.

Example: Support Panel

We built the Support Panel in the Railway Dashboard to centralize all available resources for users:

Support Panel in Railway Dashboard

Alongside that, we build forms that let us resolve user issues more quickly, such as refund requests:

Refund form on Railway

A submitted refund request shows up on our end with all the information we need to process it:

Refund request via our support tool, Plain.com

This lets us process most refunds within seconds of receiving them, instead of minutes.

It’s also about supporting the team

Our users are not just Railway end-users. We also support the organization by steamrolling manual work. Any painful manual task or category of issues gets operationalized and automated to some reasonable degree, whether in the form of workflow improvements or custom internal tooling.

Example: Single pane of glass for Incident Handling

We work in Discord, and all our communications happen within our Discord Server in private channels. Tabbing out to our status page during an incident to provide updates is time-consuming and requires context-switching.

So, we built a /incident create|update|resolve command that spins up a Discord thread as the single source of truth for incident communications:

Incident handling in Discord
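To make the command's lifecycle concrete, here is a minimal in-memory sketch of how create, update, and resolve could map onto one incident per thread. It is an assumption-laden illustration: it does not call Discord or any status page API, and Incident, incidents, and handle_incident_command are hypothetical names. A real bot would create the thread itself and presumably mirror each step to the status page.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    title: str
    status: str = "open"
    updates: list = field(default_factory=list)


# One incident per thread: the thread is the single source of truth.
incidents = {}


def handle_incident_command(thread_id, command):
    """Apply '/incident create|update|resolve [text]' to the given thread."""
    parts = command.split(maxsplit=2)
    if len(parts) < 2 or parts[0] != "/incident":
        raise ValueError("expected '/incident create|update|resolve [text]'")
    action = parts[1]
    text = parts[2] if len(parts) > 2 else ""
    if action not in ("create", "update", "resolve"):
        raise ValueError(f"unknown action: {action}")

    if action == "create":
        incidents[thread_id] = Incident(title=text)
        return f"Incident opened: {text}"

    incident = incidents.get(thread_id)
    if incident is None or incident.status == "resolved":
        raise ValueError("no open incident in this thread")
    if action == "update":
        incident.updates.append(text)
        return f"Update posted: {text}"
    incident.status = "resolved"
    return "Incident resolved"
```

Keeping all three actions behind one command means responders never leave the thread to communicate status.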

Example: Automating On-Call Handovers

Handing over an on-call rotation can be a lossy process. Knowing the start or end of a rotation requires you to look at the schedule, and you might also need to share notes about what went on during your rotation for awareness.

We automated the on-call handover notice and discussion to let people know who’s going on-call, and to provide a structured way to share handover notes:

Automated on-call handover
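A handover notice of this kind boils down to a small template. The sketch below only formats the message; the function name, its parameters, and the message layout are assumptions for illustration, and posting it to a channel on a schedule is left out.

```python
from datetime import date


def handover_notice(outgoing, incoming, starts, notes):
    """Build a structured on-call handover message as plain text."""
    lines = [
        f"On-call handover: {outgoing} -> {incoming}",
        f"{incoming} is on-call from {starts.isoformat()}",
        "Handover notes:",
    ]
    if notes:
        lines.extend(f"- {note}" for note in notes)
    else:
        # Make an empty handover explicit instead of silently omitting it.
        lines.append("- (none)")
    return "\n".join(lines)
```

Generating the notice from the schedule removes the need to look up who is on-call, and the fixed notes section gives the outgoing person an obvious place to leave context.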

Our end-game: Community Forum

Every support organization wrangles with having too many tools. Each tool in the repertoire increases the conversation latency and reduces resolution speed by grabbing precious context cycles.

For us, our feedback and conversation loop is scattered across our Community Discord, Email Support, and other platforms such as Slack for our business users. We centralize them into our support tool, Plain, as much as we can.

We think there’s a better way for us to do this, and to dogfood our own product at the same time. This is why we’re investing in building our Community Forum from scratch.

While this is still an early preview, our grand vision for the Railway Community Forum is to have a cohesive user experience at every level of the product. We want this to be the one-stop Central Station for all your Railway product-related needs: unified support, product roadmap, feedback, discussion, and overall help center.

Closing

Our support system is a giant feedback control loop. Support Engineering is about building systems that we can throw into a while(1); loop to compound our ability to build systems that 100x our horsepower.

Our goal is to have 1 Support Engineer for every 1M Total Users, and our bet is that by increasing the IOPS between human conversations and investing in the logistical side of how Railway ships, we can 100x our product and organization at large.
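For a sense of the gap, here is the arithmetic using the figures from this post, treating "more than 500K" as 500,000 and counting the two Support Engineers mentioned at the top. The variable names are ours, and the result is only as precise as those rounded inputs.

```python
total_users = 500_000              # ">500K" total users, rounded down
support_engineers = 2              # the original engineer plus the new hire
target_users_per_engineer = 1_000_000

# Users each Support Engineer covers today.
current_users_per_engineer = total_users / support_engineers

# How much further each engineer's reach has to stretch to hit the goal.
required_multiplier = target_users_per_engineer / current_users_per_engineer

print(f"{current_users_per_engineer:,.0f} users per engineer today")
print(f"{required_multiplier:.0f}x more reach needed per engineer")
```

In other words, each Support Engineer covers roughly 250K users today, so the goal implies about a 4x increase in reach per engineer.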

We build for a few key demographics:

  • Users, companies, & teams: We have ~500K total users on the platform, along with companies and teams that rely on us for critical workloads on a higher support tier. We build communities that allow our users to talk to and support each other, and we support critical companies & teams directly by building the system that gives them a direct chat bridge with us in Slack, Discord, MS Teams, etc.
  • Community members & moderators: We have a new Community Forum and an active Discord server with 20K+ developers, along with 10+ amazing community moderators. We built our Community Forum from scratch, along with Discord bots to help us operationalize day-to-day tasks.
  • Railway team members: We’re a lean team of 20 highly talented individuals, building for ~25K monthly deploying users using ~250 different tech stacks. We love user conversations, and Support Engineers build better tooling to enable efficient interactions with our users.

This comes with many challenges, both software and human, that compound with scale. It’s scary, it’s big, and these are not easy tracks to lay.

If the above excites you, we want to hear from you 🙂