Description
Slack enables people around the world to communicate and collaborate, from the world’s largest public companies to the smallest of startups. We take performance and reliability very seriously. A taste of our scale:
During the week, our users spend over a billion minutes a day active in our product.
At peak usage, a million messages a minute pass through Slack.
Every day we see over 15 million simultaneously connected users.
For millions of people, Slack is their primary communication tool for work and more, and they expect it to be exceptionally reliable and fast year-round.
About the Team
The Platform Orchestration team is responsible for building and operating the foundational systems that manage how services are deployed, operated, and monitored across Slack’s infrastructure. Our focus is on creating a unified, automated, and scalable platform that enables engineering teams to deliver reliable and compliant services with speed and confidence.
We design and maintain the orchestration layers that power:
Fleet management — ensuring services are consistently deployed and run across diverse environments.
Continuous integration and delivery — enabling safe, observable, and metrics-driven software releases.
Incident response workflows — automating incident coordination and improving response times through intelligent tooling.
Reliability measurement — defining and computing metrics that reflect user experience and system health.
Service discovery and ownership — maintaining a centralized catalog that promotes visibility, accountability, and operational clarity.
Compliant data and artifact movement — orchestrating secure workflows that meet evolving privacy and regulatory standards.
Our systems integrate deeply with AWS cloud infrastructure to provide consistency, reliability, and control at scale. Ultimately, we aim to abstract complexity, standardize best practices, and enable teams across Slack to build and operate services that are resilient, efficient, and scalable for the long term.
What You Will Be Doing
Lead end-to-end delivery of software projects that improve how services are deployed, managed, and observed across Slack’s infrastructure.
Design and build foundational orchestration systems that enable reliable, scalable, and compliant service operations.
Develop tools that streamline continuous delivery, automate incident workflows, and track service health and reliability metrics.
Implement platform solutions using infrastructure-as-code, automation frameworks, and AWS-native services to drive consistency and scalability.
Collaborate closely with engineering teams to integrate platform capabilities into their workflows—empowering them to ship faster and operate more reliably.
Contribute to and extend internal frameworks that support deployment workflows, service ownership, and lifecycle management.
Advocate for platform efficiency and cost-conscious engineering through tooling, automation, and cross-team engagement.
Drive adoption of best practices in service orchestration and cloud resource usage—ultimately helping Slack deliver better experiences at lower operational cost.
What You Should Have
Strong software engineering fundamentals—you write clean, maintainable code and design systems that scale. You bring engineering rigor to platform and infrastructure work.
Curiosity and a systems mindset—you're deeply interested in how platform layers and infrastructure orchestration work, and you enjoy sharing that knowledge to uplift those around you.
A strong grasp of reliability, scalability, and operational efficiency, especially in the context of deploying and managing distributed systems.
A collaborative, mentoring mindset—you model engineering best practices around testing, design, documentation, and operational excellence.
Solid experience operating backend systems in production—you can point to tools, workflows, or platforms you’ve helped build or scale.
Hands-on experience with AWS—you’re comfortable navigating complex environments and have depth in at least a few core services.
Experience building infrastructure as code using tools like Terraform, with a preference for reusable and maintainable abstractions.
A passion for creating platform solutions that empower engineering teams to move faster, be more reliable, and operate with confidence.
A pragmatic approach to problem-solving, with a drive to automate and simplify operations at scale.
Bonus Points
Deep knowledge of specific AWS domains such as IAM, EC2, or networking.
AWS certifications (Professional or Specialty-level).
Experience building or maintaining platforms that support high-scale CI/CD, service orchestration, or incident automation.
Demonstrated success in reducing cloud costs or improving platform efficiency through tooling or automation.
For roles in San Francisco and Los Angeles: Pursuant to the San Francisco Fair Chance Ordinance and the Los Angeles Fair Chance Initiative for Hiring, Salesforce will consider for employment qualified applicants with arrest and conviction records.
