Lead Software Engineer (Observability & Telemetry)

Airkit

Software Engineering
Bellevue, WA, USA
Posted on Mar 24, 2026

Description

Join the team responsible for innovating and maintaining the massive-scale, distributed systems that monitor Salesforce’s infrastructure.

This position is located in the Bellevue office and requires onsite presence.

The Network Visibility and Telemetry team is responsible for designing, building, and operating a set of systems and services that deliver metrics, telemetry, and alerting for data center infrastructure (network, storage, etc.). We are part of the Infrastructure Strategy Datacenter Operations organization, a dynamic, global team that delivers and supports technology infrastructure to meet the substantial growth needs of the business.

In this role, you will leverage your experience in building and deploying large-scale systems to automate systems services across all types of infrastructure (storage, network, server), enable the collection of infrastructure telemetry, make the infrastructure visible and accessible, and ensure that alerts are generated where action is needed.

Required Skills:

  • A related technical degree

  • 8+ years of proven experience supporting a codebase for distributed services implemented in Java and/or Python

  • Experience with automation of systems services and processes.

  • Excellent analytical and problem-solving skills

  • A long-standing practice of using source control (e.g., Git) and unit testing

  • Experience in publishing and consuming REST APIs

  • CI/CD experience with Jenkins

  • Knowledge of Linux (RedHat) including configuration, packages, services, daemons, shells, and troubleshooting

  • Experience with configuration automation tools such as Ansible, Puppet, and/or Chef.

  • Experience in fast-paced, technical environments experiencing rapid growth and change

  • Ability to adapt, to be flexible, and to learn quickly in a dynamic environment

  • Excellent organizational skills, including the ability to prioritize tasks efficiently with a high level of attention to detail

  • Ability to work under tight deadlines while coordinating several projects at a time and responding to changing business and technical conditions

Desired skills:

  • Development experience in Clojure.

  • Experience with the monitoring and alerting of network infrastructure (routers, switches, load balancers, etc.) in a high-availability, always-on data center environment

  • Experience with the monitoring and alerting of storage infrastructure (switches, arrays, etc.) in a high-availability, always-on data center environment

  • Experience with containers and container orchestration systems, e.g., Docker and Kubernetes

  • Experience with Terraform, Helm, and Spinnaker.

  • Strong network engineering skills: SNMP, BGP, OSPF or IS-IS, LAN switching technologies, backbone, load balancers, IPv4/IPv6 addressing and subnetting.

  • Experience with application and transport protocols and with troubleshooting them (e.g., HTTP, HTTPS, TCP/UDP)

  • Experience with application databases and document stores, e.g. Elasticsearch, Cassandra

  • Experience in writing systems automation in a high-level language such as Python.

  • Previous experience as a Scrum Master or Product Owner on an Agile development team is a nice-to-have, especially if you enjoyed it.