Description
Job Title: Senior Member of Technical Staff (SMTS) - Site Reliability Engineer (Cloud Automation)
Location: New York, NY; San Francisco, CA
About the Team
The Cloud Platform Engineering team builds and operates the highly available, active-active, mission-critical infrastructure that powers Salesforce at scale. We treat the internal cloud as a product designed to maximize developer velocity through automation-first thinking and a strict "No Ticket-Ops" philosophy. We are defining the next generation of platform engineering by running a lean, innovative team that uses AI wherever it can help. By integrating AI agents directly into our GitOps workflows and our enterprise WorkOS (Slack), we aim to build a smart, secure platform that our internal developers love.
The Shared Team DNA
While every member of our team has a distinct focus area, we are all "T-shaped" engineers who learn from one another. Regardless of your title, you must share our collective passion for:
Customer Focus: Treating internal developers as our primary customers and prioritizing their velocity and user experience.
Automation: Eradicating manual toil and "ticket-ops" via GitOps and AI-augmented workflows.
Security: Believing that security should be "shifted left" and built into the code, not bolted on as an afterthought.
SRE Mindset: Engineering for failure, prioritizing self-healing systems, and maintaining a 99.999% availability standard.
Observability: Relying on telemetry, centralized logging, and ChatOps to proactively identify and resolve issues.
About the Role
While the Architect determines how our platform should be designed, you are the engineer who actually builds the cloud infrastructure engine. As a Senior SRE focusing on Cloud Automation, you will partner closely with our enterprise CI/CD teams to seamlessly integrate our platform's capabilities into the developer workflow. You are responsible for building the infrastructure vending machines and the enterprise-grade infrastructure-as-code (IaC) modules that abstract away cloud complexity. You will empower internal developers to provision secure, compliant environments in minutes via self-service ChatOps workflows.
Your Impact - Responsibilities
The Vending Machine: Build, maintain, and scale the automated provisioning workflows that orchestrate the creation of new, fully governed multi-account cloud environments.
"Golden Modules": Author, test, and maintain a library of pre-approved Infrastructure-as-Code templates that internal developers will consume. Ensure these modules enforce our strictest standards by default.
Shift-Left Integrations: Partner with the enterprise CI/CD team to plug our platform's automated security scanning, Policy-as-Code evaluations, and cost-estimation checks directly into the developer's Pull Request process.
Resilience & Observability: Implement data-plane-driven automated failover mechanisms and develop integrations that connect our provisioning tools to our enterprise WorkOS (Slack) for real-time operational intelligence.
Minimum Qualifications
Bachelor's degree in Computer Science, Computer Engineering, Software Engineering, or equivalent work experience.
7+ years of software engineering or Site Reliability Engineering experience in large-scale cloud environments.
Expert-level proficiency in Infrastructure-as-Code (strictly Terraform) and managing state in highly distributed architectures.
Strong programming skills in Python, Go, or similar languages used for building automation tooling and API integrations.
Proven experience operating multi-region, active-active cloud environments and implementing automated disaster recovery tests.
Deep understanding of GitOps workflows and integrating infrastructure guardrails into existing enterprise CI/CD pipelines.
For roles in San Francisco and Los Angeles: Pursuant to the San Francisco Fair Chance Ordinance and the Los Angeles Fair Chance Initiative for Hiring, Salesforce will consider for employment qualified applicants with arrest and conviction records.
