Site Reliability Engineer (m/f/d)

Worldwide Salaried Open

About the position

As a Site Reliability Engineer in our Platform Squad, you will be a key player in keeping Flip's infrastructure fast, resilient and ready to scale. You'll shape the reliability culture, tooling and practices that allow our engineering teams to ship with confidence - at scale and without compromising availability. This role is perfect for an engineer who is passionate about building high-throughput, highly available systems and who wants to shape how a fast-growing SaaS platform runs in production.

Responsibilities

  • Expand and optimize our cloud infrastructure on Azure and our Kubernetes clusters, which are designed for high throughput and maximum availability, to support Flip's rapid growth across the globe.
  • Design and implement zero-downtime deployments, rollback mechanisms and disaster-recovery strategies that keep our platform available around the clock.
  • Evolve our LGTM stack (Loki, Grafana, Tempo, Mimir) to give every team the visibility they need - and use it to define and optimize our SLOs.
  • Design, develop and optimize infrastructure as code with Pulumi in Go, eliminating toil and making our platform self-service for engineering teams.
  • Promote CI/CD best practices, incident management, post-mortems and developer experience across the entire engineering organization.
  • Collaborate with your squad and engineering leadership to define the platform's direction - from scalable, high-throughput systems and cost optimization to security posture and compliance.

Requirements

  • 1–3 years of hands-on experience as a Site Reliability Engineer (SRE), Platform Engineer, DevOps Engineer, Infrastructure Engineer, Cloud Engineer, or Backend Engineer with a strong infrastructure focus.
  • Experience operating and scaling cloud infrastructures (Azure, GCP, AWS).
  • Deep knowledge of Kubernetes and container orchestration in production environments.
  • Hands-on experience with modern observability stacks (e.g. Prometheus, Mimir, Loki, ELK) and confidence in defining and operating SLOs and error budgets.
  • Solid software development skills in Go (preferred, since our IaC runs on Pulumi in Go), Python or Kotlin.
  • Hands-on experience with infrastructure as code (e.g. Pulumi, OpenTofu, Terraform) and configuration tooling (e.g. Ansible, Chef).
  • A collaborative mindset, strong communication skills and business-fluent English.
  • Willingness to participate in on-call rotations to ensure the reliability of our platform.

Nice-to-haves

  • Experience building and operating high-throughput, highly available systems in production.
  • Experience with Azure Kubernetes Service (AKS) specifically.
  • Experience with Kubernetes Gateway API and Envoy Gateway.
  • Familiarity with GitOps workflows and CI/CD pipeline design.
  • Knowledge of service mesh technologies (e.g. Linkerd, Istio).
  • Experience with Kubernetes Operators (e.g. Strimzi, CNPG).
  • Experience operating high-availability PostgreSQL.

Benefits

  • Flexibility to work from home
  • Occasional team events, workshops, or meetings in our Berlin or Stuttgart offices
  • Costs of your E-Gym-Wellpass membership covered
  • Job bike leasing
  • Regular team events and culture days
  • Opportunity to work abroad in the European Union

