Site Reliability Engineer – Infrastructure

Job description

Position: Site Reliability Engineer (SRE) – Infrastructure
Location: Atlanta, GA
Employment Type: Full-Time
Work Arrangement: Onsite Hybrid

Overview

The Site Reliability Engineer (SRE) will ensure the reliability, scalability, and performance of enterprise applications and services across cloud and on-premises environments. This role focuses on automation, monitoring, and incident response to minimize downtime and enhance operational efficiency. The position requires close collaboration with development, quality assurance, and operations teams to deliver secure and resilient systems.

What You Will Do

• Design, build, and maintain secure, compliant infrastructure using Infrastructure as Code tools such as Terraform and Ansible
• Automate provisioning and management of servers, storage, networks, Kubernetes clusters, and related systems across cloud and on-premises environments
• Develop tools and processes for automated deployment, configuration, monitoring, and alerting
• Collaborate with cross-functional teams to implement scalable and reliable cloud and data center solutions
• Participate in incident response, on-call rotations, and post-incident reviews to improve system resilience
• Monitor system performance and availability by tracking service-level indicators (SLIs) against service-level objectives (SLOs) and service-level agreements (SLAs); proactively troubleshoot and resolve reliability, performance, and security issues
• Create and maintain disaster recovery and business continuity plans for critical systems
• Continuously analyze and improve infrastructure efficiency, scalability, and performance
• Stay current with emerging technologies and recommend tools or practices to enhance platform capabilities
• Share technical expertise and mentor team members to strengthen internal capabilities

What We Are Looking For

Required Qualifications

• Bachelor’s degree in Computer Science, Engineering, or related field, or equivalent experience
• Proven experience as a Site Reliability Engineer or Systems Engineer
• Strong proficiency in Terraform and Ansible for infrastructure automation
• Hands-on experience with containers and container orchestration (e.g., Docker, Kubernetes)
• Proficiency in scripting languages such as Python or Bash
• In-depth knowledge of Google Cloud Platform (GCP) services including compute, networking, storage, Kubernetes, and security
• Solid understanding of VMware virtualization and enterprise storage systems (e.g., Pure Storage)
• Experience with networking technologies including VLANs, VPNs, and routing protocols
• Strong grasp of IT infrastructure and operations principles, including systems integration and automation best practices
• Excellent communication and collaboration skills
• Ability to manage multiple priorities under pressure with strong problem-solving skills

Preferred Qualifications

• Terraform Associate certification
• GCP certification (e.g., Cloud Architect)
• Relevant certifications such as ITIL, PMP, or CISSP
• Experience in regulated or enterprise environments

Core Competencies

• Communication and collaboration across technical and business teams
• Problem-solving and analytical thinking
• Ownership and accountability for system reliability
• Adaptability to emerging technologies and changing business needs
• Leadership and mentorship within technical teams

Job details

Job type: Permanent
Location: Atlanta, GA
Reference: JOB-4872

Employment with Tier4 Group and our clients may be contingent upon successfully passing a background check, in compliance with applicable laws.