Hey! We're Scale Up, and our client is looking to hire a Site Reliability Engineer for a 12-month infrastructure modernization project.
Type of Employment: Contractor (12-month project)
Work Modality: 100% Remote
Work Schedule: Full-time
Location: LATAM
Project Duration: Through June 30, 2027
Our client is a fast-paced technology company focused on building reliable, scalable digital products with a strong emphasis on quality and user experience. They work in collaborative agile environments where engineering, product, and infrastructure teams partner closely to modernize large-scale systems while maintaining high service reliability.
We're looking for an experienced Site Reliability Engineer to help drive large-scale infrastructure modernization initiatives across legacy and cloud environments.
You'll lead operating system migrations, improve deployment pipelines, modernize monitoring platforms, and provide operational support for mission-critical systems. You'll also help engineering teams transition to modern cloud infrastructure.
Responsibilities:
- Lead operating system modernization projects across approximately 1,700 systems and virtual machines.
- Execute migrations from RHEL 7 to EL8/EL9 and modernize configuration management.
- Build, maintain, and configure RPM packages.
- Develop automated operational runbooks.
- Improve CI/CD pipelines and deployment reliability.
- Support monitoring and logging migrations to modern observability platforms.
- Provide Tier-2 operational support and incident response.
- Partner with engineering teams during cloud migration initiatives.
- Automate repetitive operational tasks and maintain technical documentation.
Qualifications:
- 5+ years of Software Engineering or Site Reliability Engineering experience.
- Strong Python programming skills.
- Experience with Linux system administration and OS migrations.
- Experience with configuration management and software packaging.
- Strong troubleshooting and incident response experience.
- Experience working with CI/CD pipelines and infrastructure automation.
- Experience migrating monitoring platforms to Prometheus, Grafana, or Chronosphere.
- Splunk experience.
- Experience supporting cloud migration initiatives.
- Experience working in large-scale production environments.