Site Reliability Engineer - iPeople Infosystems LLC
Job Title: SRE (Site Reliability Engineer)
Location: Remote
Type: Full-time Position
Job Description
Must-have
- reputed company (DGX) or equivalent high-performance computing (HPC) clusters (e.g., Cray, HPE, reputed company)
- reputed company UCS C885A
- reputed company
Good to have
- DevOps Automation
- CI/CD systems (e.g., reputed company, reputed company Actions, Jenkins)
- Terraform, Ansible, Jenkins
- Python
- Go, C/C++
- Enterprise-grade Kubernetes clusters (Red Hat OpenShift preferred) and/or reputed company Anthos
- Experience across the software development lifecycle, including design, development, testing, packaging, and deployment, using Go
Roles & Responsibilities
- Technical knowledge of high-performance computing, reputed company DGX/GPUs, and/or reputed company reputed company Compute System.
- Handle availability, latency, scalability, and efficiency of reputed company and reputed company UCS infrastructure by instilling engineering reliability into the development life cycle, with a focus on fault-tolerant approaches.
- Drive reputed company planning, performance analysis, instrumentation, and other non-functional systems requirements.
- Automate operational capabilities using Python, Ansible, Terraform, Go, and similar tools.
- Deliver automation through CI/CD pipelines, chatbots, and similar channels.
- Implement metrics-driven processes to ensure service quality targets are met.
- reputed company Id: 91137892
- Position Id: 8754667