Site Reliability Manager, Traffic Trust SRE
Company: Google
Location: San Francisco
Posted on: April 1, 2026
Job Description:
Applicants in San Francisco: Qualified
applications with arrest or conviction records will be considered
for employment in accordance with the San Francisco Fair Chance
Ordinance for Employers and the California Fair Chance Act.

Minimum qualifications:
- Bachelor's degree in Computer Science, a related technical field, or equivalent practical experience.
- 8 years of experience in software engineering.
- 3 years of experience managing people or teams.
- 3 years of experience building and developing large-scale infrastructure or distributed systems.

Preferred qualifications:
- Experience leading high-performing Software Engineering or Site Reliability Engineering teams.
- Experience with the design, monitoring, and troubleshooting of distributed systems.
- Experience being part of an on-call rotation and responding to production incidents.

About the job
Site Reliability Engineering
(SRE) combines software and systems engineering to build and run
large-scale, massively distributed, fault-tolerant systems. SRE
ensures that Google Cloud's services, both our internally critical
and our externally visible systems, have reliability and uptime
appropriate to customers' needs and a fast rate of improvement.
Additionally, SREs keep an ever-watchful eye on our systems'
capacity and performance. Much of our software development focuses
on optimizing existing systems, building infrastructure and
eliminating work through automation. On the SRE team, you’ll have
the opportunity to manage the complex challenges of scale which are
unique to Google Cloud, while using your expertise in coding,
algorithms, complexity analysis and large-scale system design.
SRE's culture of intellectual curiosity, problem solving and
openness is key to its success. Our organization brings together
people with a wide variety of backgrounds, experiences and
perspectives. We encourage them to collaborate, think big and take
risks in a blame-free environment. We promote self-direction to
work on meaningful projects, while we also strive to create an
environment that provides the support and mentorship needed to
learn and grow.

Traffic Trust Site Reliability Engineering (SRE)
looks after services that protect our systems from Denial of
Service (DoS) attacks, do the crypto magic needed to keep our
users' traffic private, and shield our users' Internet Protocol
(IP) addresses from Ad Tech companies. Our DoS protection works
mostly silently to protect Google and cloud customer services from
malicious overload, but when novel or huge attacks occur, our
infrastructure also provides Google and its customers the tools to
diagnose and mitigate those attacks.

Behind everything our users
see online is the architecture built by the Technical
Infrastructure team to keep it running. From developing and
maintaining our data centers to building the next generation of
Google platforms, we make Google's product portfolio possible.
We're proud to be our engineers' engineers and love voiding
warranties by taking things apart so we can rebuild them. We keep
our networks up and running, ensuring our users have the best and
fastest experience possible.

The US base salary range for this
full-time position is $207,000-$300,000 + bonus + equity + benefits. Our
salary ranges are determined by role, level, and location. Within
the range, individual pay is determined by work location and
additional factors, including job-related skills, experience, and
relevant education or training. Your recruiter can share more about
the specific salary range for your preferred location during the
hiring process. Please note that the compensation details listed in
US role postings reflect the base salary only, and do not include
bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities
- Lead and develop a team of Site Reliability Engineers (SREs) who are critical to Google Production and Google Cloud reliability, as well as the security and protection of our customers and services from attack.
- Design and implement changes to our services, collaborating with other SREs and partner development teams to deliver projects that improve service reliability and security or enable the fast, safe delivery of new features and capacity.
- Provide consulting services to product-aligned teams on DoS, Crypto, and Privacy Proxy, focusing on general security, capacity, and reliability topics.
- Be part of a Tier 1 on-call rotation covering 12 of every 24 hours (10:00 AM – 10:00 PM), with partner teams in London and Zurich covering the remaining 12 hours.
- Manage incidents across the full lifecycle, from detection through to post-mortem and beyond.

Keywords: Google, Antioch, Site Reliability Manager, Traffic Trust SRE, IT / Software / Systems, San Francisco, California