Infrastructure Engineer

Qwilt

Other Engineering
Israel
Posted on Wednesday, April 17, 2024

  • Qwilt LTD - Israel

Description

About the team:

Our globally distributed team of senior engineers is dedicated to managing and optimizing multi-cloud production infrastructure. We specialize in infrastructure monitoring, automation, tools management, and reliability across various cloud platforms. As part of our responsibilities, we actively participate in follow-the-sun on-call rotations. We operate in a dynamic, multi-tasking environment that requires constant learning and adaptation. By ensuring the reliability of our systems, we directly impact the success of our business.

Tech Stack:

● Version Control: GitLab

● Continuous Delivery: ArgoCD

● Containerization: Kubernetes, Docker

● Configuration Management: Puppet, Ansible

● Cloud Native Infrastructure Tools: KEDA, Karpenter, External-DNS, External-Secrets, etc.

● Automation: Rundeck

● Monitoring & Alerting: InfluxDB, Prometheus, Thanos, Grafana, Zabbix

○ Agents: Logstash, Telegraf, Fluentd, Fluent Bit, Grafana Agent

● Logging: Coralogix, Loki, Elasticsearch

● Infrastructure as Code: Terraform

● Caching: Memcached

● Scripting: Shell, Python

● Cloud Platforms: AWS, GCP

About the role:

As a senior engineer specializing in infrastructure, you will play a critical role in managing and optimizing our multi-cloud production environments. You will be responsible for infrastructure monitoring, scaling, and reliability, as well as for managing our automation and monitoring tools. This role requires active participation in follow-the-sun on-call rotations to ensure the reliability and availability of our services.

What You Will Do:

● Manage and optimize multi-cloud production infrastructure services.

● Implement and maintain infrastructure monitoring solutions using Prometheus, Thanos, Grafana, and other tools.

● Develop automation scripts in Bash and Python to streamline operational tasks.

● Manage tools such as Puppet, Ansible, Rundeck, Teleport and more.

● Collaborate with cross-functional teams to enhance system reliability and performance.

● Contribute to the architecture and scalability of our systems.

● Participate in the follow-the-sun on-call rotation to respond to incidents and ensure system availability.

● Troubleshoot and resolve infrastructure issues across our cloud environments.

● Drive best practices for reliability, scalability, and observability.

● Mentor and guide other teams in best practices and technologies.

● Contribute to the design and implementation of scalable, reliable, and secure solutions.

Required Experience:

● Minimum of 3 years of hands-on experience with Kubernetes.

● At least 3 years of experience working with cloud environments (AWS, Google Cloud Platform).

● Strong understanding of infrastructure monitoring tools such as Prometheus (Mimir/Thanos/Cortex), including maintenance and architecture.

● Proficiency in Bash and Python scripting for automation tasks.

● Familiarity with SQL and NoSQL databases, such as MySQL, PostgreSQL and MongoDB.

● Familiarity with in-memory key-value stores such as Redis or Memcached.

● Solid understanding of networking and web applications, with emphasis on TCP/IP stack, SSL/TLS, and HTTP protocols.

Additional Skills (Preferred):

● Experience with Terraform for infrastructure as code.

● Knowledge of containerization technologies such as Docker.

● Understanding of CI/CD pipelines.

● Familiarity with logging and monitoring tools such as Coralogix and Grafana.

Why Join Us:

If you are passionate about infrastructure reliability and automation, and you thrive in a fast-paced environment, we would love to hear from you. Join us in delivering the best experience for our customers and ensuring the success of our business. Apply now to be part of our innovative team!

● Opportunity to work with a globally distributed team of senior engineers.

● Dynamic and challenging environment that encourages constant learning and growth.

● Direct impact on the reliability and success of our business.

● Exposure to cutting-edge technologies and cloud platforms.