Sr. Site Reliability Engineer

Worldwide Salaried Open

At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away from congested streets, make deliveries available to more people, and benefit local businesses.

The Serve fleet has been delighting merchants, customers, and pedestrians along the way in Los Angeles while doing commercial deliveries. We’re looking for talented individuals who will grow robotic deliveries from surprising novelty to efficient ubiquity.

Who We Are

We are tech industry veterans in software, hardware, and design who are pooling our skills to build the future we want to live in. We are solving real-world problems leveraging robotics, machine learning, and computer vision, among other disciplines, with a mindful eye towards the end-to-end user experience. Our team is agile, diverse, and driven. We believe that the best way to solve complicated dynamic problems is collaboratively and respectfully.

This is a senior-level, individual contributor position. You will balance hands-on responsibilities (building and maintaining critical SRE tooling and processes) with technical leadership (guiding architecture decisions, mentoring others in SRE practices, and steering strategic initiatives to enhance system resiliency and availability). You’ll collaborate across engineering, product, and operations teams to ensure our systems meet strict uptime and performance goals while aligning with overarching business objectives.

Responsibilities

Instrumentation & Monitoring

  • Build and refine monitoring and observability tools (metrics, logs, traces) to validate system availability and performance.

  • Implement best practices for instrumentation using tools such as Grafana or equivalent.

Reliability Engineering

  • Collaborate with development teams to design and implement solutions for higher availability.

  • Lead the definition and management of Service Level Indicators (SLIs) and Service Level Objectives (SLOs), ensuring alignment with business goals.

  • Perform capacity planning, load testing, and performance tuning to ensure systems can handle projected traffic and workloads.

Incident Response & Prevention

  • Own the incident response process, including on-call rotation, alerts, and root cause analysis.

  • Proactively identify reliability risks and propose mitigations to reduce system downtime.

  • Conduct and facilitate postmortems to capture learnings, drive improvements, and prevent recurrence of issues.

Align System Health with Business Metrics

  • Map system availability metrics to direct business value, ensuring stakeholders understand how reliability impacts overall company objectives.

  • Create reporting dashboards that connect reliability data with KPIs and business goals.

Technical Leadership & Mentorship

  • Serve as an in-house SRE expert, advising teams on reliability-oriented designs, coding practices, and testing methodologies.

  • Mentor junior and mid-level engineers, fostering a culture of continuous learning, automation, and operational excellence.

Collaboration & Education

  • Work closely with engineering, product, and operations teams to advocate for SRE best practices.

  • Conduct training sessions and share knowledge to build a culture of reliability throughout the organization.

Qualifications

Experience

  • 5+ years of experience in Site Reliability Engineering, DevOps, or a similar role.

  • Demonstrated experience implementing SRE best practices in high-availability, large-scale systems.

Technical Skills

  • Cloud Platforms: Experience with one or more major cloud providers (e.g., Google Cloud, AWS, Azure); familiarity with managed services and best practices for high availability.

  • Containers & Orchestration: Proficiency in Docker, Kubernetes, or similar containerization/orchestration platforms.

  • Observability Tools: Hands-on experience with logging, metrics, and tracing tools (e.g., Grafana, Splunk).

  • Automation & IaC: Familiarity with Infrastructure-as-Code (Terraform, Ansible, etc.) and scripting (Python, Go, Bash).

  • CI/CD: Comfort with modern CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, etc.).

Soft Skills

  • Leadership: Proven ability to guide teams in adopting SRE principles without direct managerial authority.

  • Collaboration: Excellent communication skills to work across diverse technical and business teams.

  • Problem Solving: Strong analytical skills to navigate complex systems and identify root causes.

  • Adaptability: Comfortable operating in a fast-paced environment with shifting priorities.

Education

  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).

What Makes You Stand Out

  • Chaos Engineering: Hands-on experience running game days or chaos tests to proactively discover system weaknesses.

  • Multi-Region Deployments: Familiarity with designing architectures for geo-distributed systems to maximize availability.

  • Performance Testing & Optimization: History of significantly reducing latency or resource usage through targeted tuning or innovative solutions.

  • Open Source Contributions: Demonstrated initiative in the community, such as contributing to key SRE or DevOps tools.
