Member of Technical Staff (Infrastructure): World Models

Worldwide Salaried Open

What You'll Do

  • Collaborate with researchers and engineers to understand workload requirements and translate them into infrastructure decisions, not just run what you're given.

  • Design and improve scheduling and resource allocation for inference and training coexistence on shared GPU clusters.

  • Build, operate, and scale GPU infrastructure across clusters of thousands of GPUs.

  • Own GPU utilization and cost as first-class metrics.

  • Build automated tooling and observability that reduces friction for the AI team.

  • Participate in on-call rotation and drive reliability improvements.

  • Serve as the primary point of contact for GPU providers, managing relationships and coordinating infrastructure needs.

Over time, you'll take on broader ownership: setting scheduling policy, driving architecture decisions for compute and storage systems, and identifying when infrastructure no longer fits evolving workloads.

What We're Looking For

  • Deep Systems Expertise: Linux-native. Understands how machines work, can debug at the kernel level. Deep understanding of networking and storage stacks.

  • Cluster Engineering: Experience operating and scaling GPU infrastructure (hundreds to thousands of GPUs), Kubernetes, Slurm, and distributed storage systems.

  • Distributed Systems Fundamentals: Experience designing, building, and operating distributed systems at scale.

  • Production Discipline: Track record of running critical infrastructure reliably — monitoring, incident response, and automation that reduces toil.

  • ML Familiarity: Enough understanding of training and inference workloads to collaborate with researchers and make sound infrastructure decisions.

  • Bonus: Resource-Constrained Thinking. Experience in environments where allocation, scheduling, and prioritization of scarce resources was the core problem (e.g. HPC, trading, large-scale ML platforms).

Challenges You'll Tackle

  • Balancing latency-sensitive inference against long-running training workloads

  • Operating under tight GPU constraints with constantly shifting demand

  • Adapting infrastructure to rapidly evolving model architectures and scale

  • Making tradeoffs between cost, utilization, and reliability at scale

Traits of the Ideal Candidate

  • High ownership: Owns problems end-to-end, sets priorities, and escalates early.

  • Challenges systems: Questions what exists and drives improvements when it no longer fits.

  • Learns fast: Quickly builds context to make sound infrastructure decisions.

  • Thinks in systems: Understands dependencies, validates assumptions, and catches issues early.

  • Raises the bar: Shares context, surfaces risks, and helps the team move faster.

What we offer (compensation & benefits)

  • Competitive salary and equity

  • Private health coverage

  • Pension contribution (UK, Canada, US)

  • Unlimited paid vacation

  • Fully-distributed, async-first culture

  • Hardware setup of your choice

  • Stipends for phone, internet, and meals

At Moonvalley, we approach our work with dedication similar to that of Olympic athletes. Anticipate occasional late nights and weekends dedicated to our mission. We understand this level of commitment may not suit everyone, and we openly communicate this expectation.

If you're motivated by deeply technical problems, a seemingly never-ending uphill battle and the opportunity to build (and own) a generational technology company, we can give you what you're looking for.

All business roles at Moonvalley are hybrid positions by default, with some fully remote depending on the job scope. We meet as a company a few times every year, usually in London, UK or North America (LA, Toronto).

If you're excited about the opportunity to work on cutting-edge AI technology and help shape the future of media and entertainment, we encourage you to apply. We look forward to hearing from you!

The statements contained in this job description reflect general details as necessary to describe the principal functions of this job, the level of knowledge and skill typically required, and the scope of responsibility. It should not be considered an all-inclusive listing of work requirements. Individuals may perform other duties as assigned, including work in other functional areas to cover absences, to equalize peak work periods, or to otherwise balance organizational work.

Moonvalley AI is proud to be an equal opportunity employer. We are committed to providing accommodations. If you require accommodation, we will work with you to meet your needs.

Please be assured that we'll treat any information you share with us with the utmost care, only use your information for recruitment purposes, and will never sell it to other companies for marketing purposes. Please review our privacy policy and job applicant privacy policy located here for further information.
