Site Reliability Engineer

eTek IT Services, Inc. • San Mateo, CA, US

Position: Site Reliability Engineer

Location: San Mateo, CA

Required Skills

Must Haves: 3 to 5 years of experience with Kubernetes, Datadog, cloud services, and large-scale systems; AWS and GCP, with minor Azure
GKE, home-grown on-prem clusters, AKS (very small), and EKS
Consistent upgrades across all clusters and clouds
Nice to Have: gaming experience is a bonus

Job Description

Who We Are

Founded in 2005, 2K Games is a global video game company, publishing titles developed by some of the most influential game development studios in the world. The studios responsible for developing 2K’s portfolio of world-class games across multiple platforms include Visual Concepts, Firaxis, Hangar 13, CatDaddy, Cloud Chamber, and HB Studios. Our portfolio of titles is expanding as part of our global strategic plan to build and acquire exciting studios whose content continues to inspire all of us! 2K publishes titles in today’s most popular gaming genres, including sports, shooters, action, role-playing, strategy, casual, and family entertainment.

Our team of engineers, marketers, artists, writers, data scientists, producers, thinkers, and doers is the professional publishing steward of our growing library of critically acclaimed franchises such as NBA 2K, Battleborn, BioShock, Borderlands, The Darkness, Mafia, Sid Meier’s Civilization, WWE 2K, and XCOM.

At 2K, we pride ourselves on creating an inclusive work environment, which means encouraging our teams to Come as You Are and do their best work! We are dedicated to diversity and inclusion, and want our community of candidates to reflect this commitment. We encourage all qualified applicants to explore our global positions.

2K is headquartered in Novato, California and is a wholly owned label of Take-Two Interactive Software, Inc. (NASDAQ: TTWO).

About the Team: Site Reliability Engineering (SRE)

The 2K Site Reliability team is responsible for the operations and infrastructure of all consumer-facing production systems and developer-facing systems at 2K Games, including NBA 2K game services, customer-facing account services, and websites. This team handles systems and services spanning multiple data centers, both physical and cloud-based.

What We Need:

We are looking for an expert engineer who is passionate about building multi-datacenter infrastructure and services. Strong systems and problem-solving skills are required as we develop solutions for game studios and support data centers around the world alongside a group of outstanding engineers. In this role, you will collaborate with network engineers, systems architects, and development staff to support our gamers and the needs of the business.

What You Will Do

Build and operate highly resilient systems in a global multi-datacenter and cloud environment serving game and consumer services
Develop tools for the management and automation of the systems and service infrastructure
Define and implement standards that apply across systems, services, and multiple software environments
Diagnose and resolve technical issues reported by both internal and external customers, and drive improvements that prevent them from recurring
Participate in Site Reliability Engineering’s on-call rotation

Who We Believe Will Be an Outstanding Fit

You are eager to work in a fast-paced environment with other highly skilled engineers who are passionate about service availability and health!

The idea of building data center infrastructure services from greenfield through implementation excites you!

Required Qualifications

6+ years of demonstrated influence across one or more teams for large scale projects that drive impact and improvement across the organization
6+ years of experience in an SRE role for online services in a multi-region, multi-cloud environment, with specific experience in reliability and resiliency
6+ years of experience developing tools to automate processes or extend off-the-shelf tool functionality
6+ years of AWS and/or GCP cloud experience running highly elastic, mission-critical workloads
6+ years of coding experience in at least one of Python, Ruby, Java, or Go, and a good understanding of source code management
6+ years of experience using Infrastructure as Code tools like Terraform, Pulumi, or others
Extensive knowledge of software build, test, and deploy processes using Git, Jenkins, Puppet, Ansible, Docker/containers, and Kubernetes
Experience with system analysis and troubleshooting
Ability to mentor junior engineers and provide technical leadership to the organization

Bonus Points

Prior hands-on experience running large-scale multiplayer video games
Experience designing and crafting software for systems and network automation
Debugging, code optimization, and routine task automation skills
Demonstrated ability to decompose complex problems and to pursue lateral lines of investigation