Lead DevOps Engineer

EPAM Systems, Inc. -
Remote

Job details

Posted 2 days ago

Full job description

We are seeking a Lead DevOps Engineer to design, operate, and continuously improve the AWS platform that powers a custom VDI platform and a cloud playtesting/streaming platform. This is primarily an individual contributor role that requires strong ownership and the ability to work independently while collaborating with one other team member and customer stakeholders. You will be responsible for infrastructure-as-code, container platforms, automation, CI/CD standardization, cost/performance optimization (including GPU instances), and leading troubleshooting during platform-wide degradations.

Responsibilities

Design, build, and maintain AWS infrastructure using Terraform
Manage Terraform workflows and remote state using HashiCorp Cloud Platform (HCP)
Own the infrastructure lifecycle, including provisioning, upgrades, decommissioning and operational hygiene
Operate ECS clusters to deploy and run the microservices supporting the platforms
Operate EKS clusters that host GitHub Actions runners, including required platform customizations
Right-size and tune GPU-enabled EC2 capacity to balance user experience with strict cloud cost controls
Continuously assess scaling behavior, utilization and performance bottlenecks
Implement and maintain AWS Lambda functions for automation such as cleanup tasks, on-demand provisioning and operational workflows
Standardize and optimize GitHub Actions pipelines for Terraform plan/apply workflows, infrastructure releases and container image build/publish/deploy processes
Lead troubleshooting and restoration efforts for platform-wide issues such as VDI session drops, authentication issues and machine/storage failures
Coordinate incident resolution across teams through investigation, mitigation and follow-up actions
Create and maintain runbooks, operational documentation and onboarding materials

Requirements

5+ years of experience in DevOps or platform engineering roles
Expertise in AWS infrastructure design, provisioning and lifecycle management
Proficiency in Terraform and HashiCorp Cloud Platform (HCP)
Skills in container orchestration with ECS and EKS
Knowledge of GPU-enabled EC2 capacity right-sizing, cost management and performance tuning
Competency in AWS Lambda for event-driven automation
Background in CI/CD standardization with GitHub Actions pipelines
Capability to lead reliability engineering, troubleshooting and incident resolution
High ownership and accountability with the ability to work independently and deliver without close supervision
Strong troubleshooting and systems thinking, remaining calm and structured during incidents
Clear communication with both technical and non-technical stakeholders
Practical prioritization in a Kanban environment balancing planned work and urgent interruptions
English proficiency at B2 level or higher

Nice to have

Familiarity with Amazon GameLift Streams
Understanding of streaming and playtesting platform needs
Skills in triaging urgent ad-hoc requests outside the standard Kanban flow

We offer

International projects with top brands
Work with global teams of highly skilled, diverse peers
Healthcare benefits
Employee financial programs
Paid time off and sick leave
Upskilling, reskilling and certification courses
Unlimited access to the LinkedIn Learning library and 22,000+ courses
Global career opportunities
Volunteer and community involvement opportunities
EPAM Employee Groups
Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
