Technical Duty Officer (Lead Senior Site Reliability Engineer)
Xero
Software Engineering, IT
San Mateo, CA, USA
Posted on Jan 14, 2025
Xero is a beautiful, easy-to-use platform that helps small businesses and their accounting and bookkeeping advisors grow and thrive.
At Xero, our purpose is to make life better for people in small business, their advisors, and communities around the world. This purpose sits at the centre of everything we do. We support our people to do the best work of their lives so that they can help small businesses succeed through better tools, information and connections. Because when they succeed they make a difference, and when millions of small businesses are making a difference, the world is a more beautiful place.
As Xero grows, there is a continued need for a keen focus on reliability to ensure customers receive service that exceeds their expectations. Xero’s Incident and Problem Management team is part of the Site Reliability Engineering (SRE) organization and is responsible for the build, delivery and ongoing maintenance of robust processes and tooling for incident management. The team drives enduring reliability at Xero through robust, consistent and fast response to high severity incidents. It is responsible for building a world class process and ensuring that process matures as the demands of the business grow.
This position requires an experienced SRE professional with a strong technical background, a passion for building and delivering robust processes, and extensive experience leading the technical response to high severity cloud issues. As a seasoned and relentless professional, they will drive best practice across the business and contribute to the ongoing transformation of the Xero SRE culture. As an expert communicator, they will lead technical discussions to identify and track the actions that arise during incidents.
What You'll Do:
- Own the incident management process and ensure it drives enduring reliability across all products and services within Xero.
- Provide expert leadership during critical outages, coordinating multiple teams to ensure streamlined decision-making and quick resolution.
- Lead and advocate for the transformation to a world-leading SRE organization, promoting SRE principles within the Engineering Department.
- Take a customer-focused approach by addressing and mitigating issues in global customer environments, and foster a culture of continuous learning and technical excellence within the SRE team.
- Develop and implement scalable process frameworks and observability strategies to ensure rapid problem diagnosis, response, and service reliability.
- Collaborate with product teams to thoroughly analyze failures and integrate insights to improve service reliability, scalability, and operational efficiency.
- Provide ongoing training across the business to ensure the process is well understood and adhered to. This includes training the engineers who will own Incident Commander actions for lower priority issues.
- Dive into the causes of incidents, proactively examine the potential causes of future incidents, and work with engineering teams to remove the risk of those failure scenarios.
- Build playbooks and automated responses for business continuity and DR situations to ensure response is quick and effective.
What You'll Bring With You:
- 5+ years of experience as a Site Reliability Engineer, with relevant experience in an Operations or Engineering environment.
- Experience troubleshooting AWS-hosted services.
- Networking knowledge and the ability to troubleshoot TCP/IP, SSL/TLS, DNSSEC, IPsec, and BGP issues.
- Coding experience (preferably Python) building tools, scripting, or automation.
- Strong communication (oral & written) skills, including the ability to translate technical issues/concepts into agreed actions.
Why Xero?
Diversity of people brings diversity of thought, and we like that. Our human-first culture of respect, fairness, and inclusion is what helps Xeros thrive at work and beyond. We offer very generous paid leave to use however you’d like (plus statutory holidays!), dedicated paid leave to care for your physical and mental wellbeing, and an Employee Assistance Program to access mental health care for you and your family. We also offer employee resource groups, wellbeing programming and allowances, medical, dental, vision, and disability insurance, fertility and family forming financial support, 401k contribution matching, 26 weeks of paid parental leave for primary caregivers, an Employee Share Plan, beautiful offices with snacks and break areas, flexible working, career development and many other benefits that reflect our human value. You’ll do the best work of your life at Xero.
Research has shown that women and underrepresented groups are less likely to apply to jobs unless they meet every single competency or experience requirement. If you are excited about this role, but your past experience doesn't align perfectly, we encourage you to apply anyway. You could be just the right person for this role and Xero. If you have any support or access requirements, we encourage you to advise us at the time of application and throughout the interview process.