Job Purpose and Summary
The Senior Site Reliability Engineer (SRE) will lead the transition from legacy mainframe systems to a cloud-based infrastructure. This role involves making key architectural decisions, designing and delivering essential platforms, and ensuring that cloud services operate effectively. The SRE will also investigate production issues, identify root causes, and develop long-term solutions.
Key Responsibilities
- Architectural Leadership: Drive architectural decisions, establish best practices, and deliver technical capabilities based on industry expertise.
- Stakeholder Management: Manage customer expectations, influence technical decisions, and guide architectural principles while leading communities of practice in relevant technologies.
- Cloud Infrastructure Management: Engineer and support all aspects of the PHEAA AWS GovCloud environment, ensuring compliance with NIST SP 800-53, FedRAMP, and Enterprise Architecture standards.
- Debugging Expertise: Take responsibility for debugging cloud-based distributed systems on AWS.
- Infrastructure as Code (IaC): Architect, engineer, and deliver solutions as IaC using Terraform alongside tools like Ansible.
- Performance Oversight: Lead the performance management of the AWS environment, including logging, monitoring, and reporting to meet compliance and audit requirements.
- CI/CD Support: Contribute to the development and support of CI/CD pipelines across various organizations.
- DevOps Culture: Champion an innovative DevOps mindset to guide PHEAA towards a cloud-native future.
- Cost Efficiency: Ensure that the cloud transformation delivers anticipated cost savings through strong architectural guidance and best practices.
- Cross-Functional Learning: Engage with industry experts to enhance capabilities across different areas.
- Technology Configuration: Configure essential cloud technologies (e.g., OpenShift, Kafka, MongoDB) to meet PHEAA’s cloud needs.
- VPC and Security Design: Recommend, design, and implement VPCs, account management, and security protocols within the PHEAA Cloud.
Minimum Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- 10-15 years of experience in infrastructure support, engineering, and architecture, with at least 5 years in cloud architecture and support (AWS preferred).
- Demonstrated experience with FedRAMP and Federal Student Aid.
- Strong background in continuous integration and delivery, with experience in CI/CD pipelines.
- Advanced knowledge of DevOps/DevSecOps practices, including familiarity with SAFe Agile and Atlassian Jira.
- Proficiency in Amazon Web Services (AWS), covering billing, core technologies, performance management, CloudWatch, and VPCs.
- Experience with PostgreSQL and NoSQL databases such as MongoDB.
- Familiarity with application container technologies (OpenShift/Docker/Kubernetes) and IT automation tools (e.g., Ansible).
- Expertise in Infrastructure as Code (IaC) with tools like Terraform, as well as scripting languages (Python, Bash, Perl).
- Strong problem-solving skills and excellent written and verbal communication abilities.
Required Certifications
- AWS Certified Cloud Practitioner
- AWS Certified Solutions Architect – Associate
- AWS Certified Solutions Architect – Professional
- AWS Certified Developer – Associate
Preferred Experience and Certifications
- Familiarity with DevOps CI/CD tools (e.g., Jenkins, GitLab, TeamCity).
- Experience with monitoring and logging tools (e.g., New Relic, ELK, Splunk).
- Background in Agile project management.
- AWS Certified DevOps Engineer – Professional.
- AWS Certified Security – Specialty.