Site Reliability Engineer
Atlassian
We are looking for an engineer who is passionate about scaling cloud services to join our growing SRE team. The SRE team owns the caching infrastructure, tooling, and automation that support Atlassian’s suite of Cloud products.
We'd love it if you had an understanding of modern cloud infrastructure, programming expertise, operational experience and a desire to change the status quo. We're looking for an engineer who can analyze and help improve our services and processes to get us to an even higher level of reliability, performance, scalability, and cost efficiency.
On your first day, we'll expect you to have:
1+ years of experience operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring into your code, tweaking dashboards, defining alerts, writing runbooks, etc.
1+ years of hands-on experience with public cloud offerings (AWS components such as EC2, CloudFormation, RDS / Aurora, caches, and SQS, or their equivalents in GCP / Azure).
Familiarity with Unix / Linux operating systems.
A strong drive to debug, improve code, and automate routine tasks.
Backend engineering experience in one or more prominent languages such as Java, Go or Python.
Strong written and verbal communication skills, and the ability to explain complex technical issues to a range of technical and non-technical audiences (management, peers, clients).
It would be great, but not mandatory, if you had:
Experience implementing caching solutions, strategies, and best practices.
Experience in microservice architecture.
Experience building web services and clients using REST/GraphQL.