โ† Back to jobs

Sr Site Reliability Engineer (Advanced Threat Protection)

Palo Alto Networks
Full-time · Remote · CA, United States · Posted: 2026-05-11 · Until: 2026-07-10
Job Description
Our Mission
At Palo Alto Networks, we're united by a shared mission: to protect our digital way of life. We thrive at the intersection of innovation and impact, solving real-world problems with cutting-edge technology and bold thinking. Here, everyone has a voice, and every idea counts. If you're ready to do the most meaningful work of your career alongside people who are just as passionate as you are, you're in the right place.

Who We Are
To be the cybersecurity partner of choice, we must trailblaze the path and shape the future of our industry. This is something our employees work at each day, and it is defined by our values: Disruption, Collaboration, Execution, Integrity, and Inclusion. We weave AI into the fabric of everything we do and use it to augment the impact every individual can have. If you are passionate about solving real-world problems and ideating beside the best and the brightest, we invite you to join us!

We believe collaboration thrives in person. That's why most of our teams work from the office full time, with flexibility when it's needed. This model supports real-time problem-solving, stronger relationships, and the kind of precision that drives great outcomes.

Job Summary

Your Career
Palo Alto Networks is at the forefront of cloud-native infrastructure, where reliability, scale, and intelligent automation define the future of operations. As a Senior Site Reliability Engineer, you will design and operate the platforms that power our applications across Google Cloud Platform, AWS, and global data centers, and you'll push the boundary of what's possible by leveraging AI and machine learning to transform how we approach SRE.

This isn't just about keeping the lights on. You'll build intelligent systems that predict incidents before they happen, automate root cause analysis, and continuously optimize our infrastructure.
You'll be a critical bridge between engineering and our Infrastructure Platform, combining deep SRE expertise with AI-driven automation to deliver unprecedented levels of reliability and operational efficiency. If you're excited about applying AI to real-world infrastructure challenges, and you thrive in an environment where automation isn't just a nice-to-have but a core philosophy, this is your next career move.

Your Impact
- Design, build, and operate cloud infrastructure that enables reliable, rapid deployment of microservices with resilient operations and effective monitoring
- Leverage AI/ML to automate incident detection, root cause analysis, and remediation, reducing toil and accelerating mean time to resolution
- Build and integrate AI-powered tools (e.g., LLM-based agents, AIOps platforms) into SRE workflows for intelligent alerting, log analysis, and capacity planning
- Write automation code for provisioning and operating infrastructure at massive scale
- Develop self-healing systems that can automatically detect anomalies, diagnose issues, and take corrective action with minimal human intervention
- Work with development teams to ensure applications are production-ready, scalable, and reliable from the ground up
- Identify and drive opportunities to improve automation for code deployment, management, and observability of application services
- Establish end-to-end monitoring and alerting on all critical components, incorporating AI-driven anomaly detection and predictive analytics
- Participate in the on-call rotation supporting the platform and production applications
- Lead root cause analysis of critical business and production issues, building runbooks and automation to prevent recurrence
- Mentor other SREs on best practices in infrastructure orchestration, production troubleshooting, and AI-augmented operations
- Represent SRE in design reviews and work cross-functionally with engineering teams on operational readiness

Qualifications

Your Experience
- 5+ years of experience in DevOps, site reliability, or infrastructure engineering
- Expertise in multi-cloud environments: strong hands-on experience with Google Cloud Platform and AWS, and familiarity with OCI (Oracle Cloud Infrastructure)
- Experience designing and operating infrastructure across multiple cloud providers, including networking, identity management, and cross-cloud connectivity
- Expertise in Infrastructure as Code with tools such as Terraform and Ansible
- Strong proficiency in Python and shell scripting for automation
- Strong experience with Linux and distributed systems handling high-volume transactions
- Familiarity with CI/CD pipelines, GitLab, and Artifactory
- Strong fundamentals in HTTP, web servers, and networking
- BS or MS in Computer Science, a relat