Observability Engineer

Campaign Monitor

Other Engineering
United Kingdom · Remote
Posted on Oct 11, 2024

The Company:
Marigold helps brands foster customer relationships through the science and art of connection. Marigold Relationship Marketing is a suite of world-class martech solutions that help marketers create long-term customer love and loyalty. Marigold provides the most comprehensive set of use cases for marketers at any level. Headquartered in Nashville, Tennessee, Marigold has offices globally across the United States, Europe, Australia, New Zealand, South America and Central America, as well as in Japan.

The Role:

The Observability Engineer is a fundamental part of the Site Reliability Engineering team and process. Site Reliability Engineering is accountable for the availability, reliability, and performance of the services and platforms as well as the design and architecture of the systems and services within the supported suite of products. Observability is responsible for monitoring, resolving and/or escalating system environment and application alerts in a highly transactional 24x7 environment.

As an Observability Engineer, you’ll be called upon to think strategically. You’ll work across functions and departments to understand the monitoring needs of the business. You will own the design, implementation, and optimization of comprehensive logging and monitoring solutions for complex applications and infrastructures. Your role involves leveraging advanced expertise in enterprise-level tools, scripting for automation, and applying sound, industry-standard practices for observability. You will also be responsible for developing and maintaining standard observability metrics, including MTTR and uptime. Beyond technical proficiency, you will be required to demonstrate leadership by guiding troubleshooting efforts, collaborating with cross-functional teams, and contributing to strategic logging and monitoring initiatives aligned with organizational objectives. Effective communication, mentoring, and a commitment to staying current with industry advancements are integral to success in this senior-level position.

What You’ll Do:

  • Provide expertise and ownership over various observability and logging tools, ensuring optimal performance and scalability.

  • Apply engineering standard methodologies to analyze and develop observability solutions.

  • Utilize scripting languages to automate and customize logging and monitoring workflows, enhancing efficiency and accuracy.

  • Lead troubleshooting efforts for complex issues, providing guidance to junior team members and collaborating with other technical teams to resolve issues effectively.

  • Collaborate with cross-functional teams, communicating effectively to understand requirements and integrating logging and monitoring solutions into overall strategies.

  • Contribute to the development and execution of strategic logging and monitoring initiatives aligned with organizational goals.

  • Mentor junior team members, fostering skill development and knowledge sharing within the team.

  • Demonstrate leadership qualities by setting standards, encouraging innovation, and promoting a collaborative team environment.

  • Develop, monitor, and analyze key metrics to support company KPIs.

Ideal Qualifications:

  • 5+ years of hands-on experience providing daily support for a company's application infrastructure, including monitoring and troubleshooting to ensure high availability and uptime.

  • Advanced expertise in Grafana as a logging and monitoring tool is a must.

  • Extensive experience in managing comprehensive monitoring solutions for complex IT infrastructures, including cloud, on-prem, hybrid, and SaaS environments.

  • Proficiency with coding/scripting in one or more common languages (e.g., Python, YAML, JSON, Perl, HTML5, PowerShell, XML) for advanced automation and customization of logging and monitoring workflows.

  • Understanding of, and experience supporting, cloud-based applications.

  • Strong experience working with databases such as MSSQL and/or Oracle.

Nice to Have:

  • Extensive understanding of other enterprise logging and monitoring tool sets, such as Site24x7, SolarWinds, BigPanda, or Splunk.

  • Experience with Windows/Linux/Unix production environments, including command-line tools and networking and security concepts.

What We Offer:

  • The competitive salary and benefits you’d expect!

  • Generous time off (we call it Open Time Away) as well as paid holidays and a birthday benefit day off.

  • Retirement contributions.

  • Employee-centric and supportive remote work environment with flexibility.

  • Support for life events including paid parental leave.