Staff Software Engineer, Site Reliability Engineering, GCS
Company: Google
Location: Cupertino
Posted on: March 27, 2026
Job Description:
Minimum qualifications:
Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
8 years of experience with software development in one or more programming languages.
3 years of experience leading projects.
3 years of experience designing, analyzing, and troubleshooting distributed systems.

Preferred qualifications:
Master’s degree in Computer Science or Engineering.
3 years of experience working with algorithms, data structures, analysis and software design.
Experience in one or more of the following: C, C++, Java, Python, Go, Perl or Ruby.

About the job
Site Reliability Engineering (SRE)
combines software and systems engineering to build and run
large-scale, massively distributed, fault-tolerant systems. SRE
ensures that Google Cloud’s services, both our internally critical
and our externally-visible systems, have reliability and uptime
appropriate to customers’ needs and a fast rate of improvement.
Additionally, SREs will keep an ever-watchful eye on our systems’
capacity and performance. Much of our software development focuses
on optimizing existing systems, building infrastructure and
eliminating work through automation. On the SRE team, you’ll have
the opportunity to manage the complex challenges of scale which are
unique to Google Cloud, while using your expertise in coding,
algorithms, complexity analysis and large-scale system design.

SRE’s culture of intellectual curiosity, problem solving and openness is
key to its success. Our organization brings together people with a
wide variety of backgrounds, experiences and perspectives. We
encourage them to collaborate, think big and take risks in a
blame-free environment. We promote self-direction to work on
meaningful projects, while we also strive to create an environment
that provides the support and mentorship needed to learn and grow.

As part of the Google Cloud Storage (GCS) SRE team, you will help
scale one of the world’s largest object storage systems. You will
collaborate with a supportive global team and partner with
developers to launch features or drive reliability projects with
our Sydney-based engineers. As a Tier 1 on-call member, you will
leverage operational insights to lead high-impact engineering
projects and mentor talent. Your role will offer a unique
opportunity to support reliability and directly influence the
technical evolution of a critical, global Google service.

Behind everything our users see online is the architecture built by the
Technical Infrastructure team to keep it running. From developing
and maintaining our data centers to building the next generation of
Google platforms, we make Google’s product portfolio possible. We’re
proud to be our engineers’ engineers and love voiding warranties by
taking things apart so we can rebuild them. We keep our networks up
and running, ensuring our users have the best and fastest
experience possible.

The US base salary range for this full-time
position is $207,000-$300,000 + bonus + equity + benefits. Our salary
ranges are determined by role, level, and location. Within the
range, individual pay is determined by work location and additional
factors, including job-related skills, experience, and relevant
education or training. Your recruiter can share more about the
specific salary range for your preferred location during the hiring
process. Please note that the compensation details listed in US
role postings reflect the base salary only, and do not include
bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities
Manage the complete life-cycle of services, from inception and design, through to deployment, operation and refinement.
Enable service launches by conducting system design consultation, developing software platforms and frameworks, planning capacity and performing launch reviews.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
Lead sustainable incident response and blameless postmortems.
Keywords: Google, Antioch, Staff Software Engineer, Site Reliability Engineering, GCS, IT / Software / Systems, Cupertino, California