Member of Technical Staff, Training (Bay Area, Remote)

Worldwide Salaried Open

What You’ll Do

Drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack, from data pipelines to GPU kernels

Design, build, and optimize distributed training systems (PyTorch) for multi-node GPU clusters, ensuring scalability, robustness, and high utilization

Implement efficient low-level code (CUDA, cuDNN, Triton, custom kernels) and integrate it seamlessly into high-level training frameworks

Optimize workloads for hardware efficiency: CPU/GPU compute balance, memory management, data throughput, and networking

Develop monitoring and debugging tools for large-scale runs, enabling rapid diagnosis of performance regressions and failures

What You’ll Bring

Deep experience in distributed systems, ML infrastructure, or high-performance computing (8+ years)

Production-grade expertise in Python

Low-level performance mastery: CUDA/cuDNN/Triton, CPU–GPU interactions, data movement, and kernel optimization

Scaling at the frontier: experience with PyTorch and training jobs using data, context, pipeline, and model parallelism

System-level mindset with a track record of tuning hardware–software interactions for maximum utilization
