VP, Platform Engineering & Reliability

June 1, 2026

About the position

This role is responsible for defining and executing the strategy for the company’s Platform Engineering, DevOps, and Site Reliability Engineering (SRE) functions. This leader owns the reliability, scalability, security, operational governance, and engineering enablement capabilities of the platform ecosystem while enabling high-velocity, high-quality software delivery across engineering teams. The role operates at the intersection of architecture, operations, developer productivity, and platform governance, establishing technical direction, engineering standards, and operational maturity across the enterprise. This is a hands-on technical leadership role. While not expected to operate as an individual contributor on a daily basis, the leader must possess deep implementation credibility and be capable of reviewing, challenging, and guiding complex platform engineering decisions in depth. The successful candidate will have prior experience designing and operationalizing enterprise-grade CI/CD, observability, reliability, and platform governance capabilities in production environments.

Responsibilities

Establishing scalable and reliable platform engineering capabilities that improve developer velocity without sacrificing governance or operational integrity.
Standardizing CI/CD, observability, deployment traceability, and operational controls across engineering organizations.
Improving production reliability, incident response effectiveness, and operational transparency.
Reducing deployment risk through automation, validation, and platform-enforced engineering standards.
Building internal platforms that are consumable, scalable, secure, and treated as products.
Creating clear operational accountability through measurable reliability objectives, telemetry standards, and runtime visibility.
Enabling engineering teams to move faster while maintaining consistent enterprise standards.
Define and execute the long-term strategy for platform engineering, DevOps, and reliability.
Design and evolve scalable, secure, resilient, and operable platform architectures.
Establish internal platforms as productized capabilities with defined roadmaps, SLAs, adoption metrics, and operational standards.
Define enterprise-wide standards for deployment automation, observability, runtime traceability, and infrastructure governance.
Drive consistency across cloud, containers, and deployment ecosystems while enabling appropriate team autonomy.
Lead CI/CD strategy, automation, and engineering enablement initiatives across the organization.
Establish best practices for infrastructure-as-code, configuration management, release governance, and environment consistency.
Implement platform guardrails and automated controls that reduce operational variance without unnecessarily slowing engineering delivery.
Ensure deployment pipelines support automated quality enforcement, artifact traceability and immutability, release validation, reproducible deployments, and operational auditability.
Drive standardization across heterogeneous tooling ecosystems while remaining technology-flexible where appropriate.
Lead the maturity and adoption of SRE principles and operational excellence practices.
Define and operationalize SLIs, SLOs, error budgets, and incident management standards.
Improve availability, scalability, performance, and resiliency across production systems.
Establish standardized telemetry and distributed tracing practices across services and infrastructure.
Ensure operational visibility supports rapid root-cause analysis and production recovery.
Drive post-incident learning through blameless postmortems and systemic corrective actions.
Ensure platform capabilities meet enterprise security, compliance, and regulatory expectations.
Build reliability, operational governance, and security into platform design by default.
Partner closely with security, compliance, architecture, and engineering leadership teams.
Establish software supply chain governance practices, including deployment traceability, artifact lineage, and auditability.
Improve operational transparency through standardized runtime metadata and observability practices.
Lead, mentor, and scale high-performing platform engineering, DevOps, and SRE teams.
Establish a culture of operational ownership, engineering accountability, and continuous improvement.
Balance developer productivity with platform governance and enterprise reliability requirements.
Influence engineering strategy and technical direction across organizational boundaries.
Partner with executive leadership to align platform investments with business objectives, delivery goals, and operational risk management.

Requirements

10+ years of engineering experience with deep expertise in platform engineering, cloud infrastructure, DevOps, SRE, or distributed systems
Proven experience leading platform engineering, DevOps, infrastructure, or reliability organizations at scale
Strong understanding of modern cloud and infrastructure ecosystems (AWS, Azure, GCP, or equivalent)
Deep knowledge of distributed systems, operational engineering, automation, and reliability practices
Demonstrated ability to operate effectively at both strategic and deeply technical levels.
Strong executive communication, organizational leadership, and cross-functional influence skills
Experience driving enterprise engineering standardization and operational maturity initiatives preferred.
Candidates should demonstrate prior ownership and implementation experience in several of the following areas within production-scale environments:

CI/CD & Release Engineering:
Designing and governing enterprise CI/CD platforms and deployment automation strategies
Implementing automated quality gates and release validation processes using modern CI/CD tooling
Establishing end-to-end deployment traceability across source control, build, release, and runtime environments
Managing artifact integrity, versioning, and immutable deployment practices
Supporting complex or heterogeneous tooling ecosystems across engineering organizations

Observability & Distributed Systems:
Implementing enterprise observability strategies across distributed systems
Establishing standardized telemetry practices using OpenTelemetry or equivalent technologies
Implementing logs, metrics, traces, and correlation strategies that support effective production diagnostics.
Integrating deployment and runtime metadata into observability workflows for operational analysis

Platform & Container Engineering:
Designing standardized container and runtime strategies at organizational scale
Establishing secure and maintainable image lifecycle practices
Implementing environment consistency and operational standards across SDLC stages
Designing reusable platform capabilities that reduce duplication and operational inconsistency.

Production Reliability & Operational Engineering:
Diagnosing and resolving complex production failures across application, infrastructure, and dependency layers
Implementing resiliency and failure-handling patterns within distributed systems
Improving operational recovery, incident response maturity, and service reliability
Driving systemic operational improvements based on production learnings and reliability metrics.

Nice-to-haves

Signals This Role May Not Be the Right Fit

This role is likely not a fit for candidates whose experience is primarily limited to:
Operating tools without ownership of platform standards, governance, or engineering enforcement models
Managing teams without direct involvement in architecture, implementation guidance, or operational decision-making
Reliance on manual operational controls in place of automated engineering governance
Limited exposure to distributed systems reliability, observability, or production-scale operational troubleshooting
Observability practices focused only on logging without broader telemetry and correlation strategies.
Limited experience balancing engineering autonomy with enterprise operational standards

Benefits

Competitive compensation package based on experience, skillset, and overall fit for #TeamloanDepot.
Inclusive, diverse, and collaborative culture where people from all backgrounds can thrive
Work with other passionate, purposeful, and customer-centric people
Extensive internal growth and professional development opportunities including tuition reimbursement
Comprehensive benefits package including Medical/Dental/Vision
Wellness program to support both mental and physical health
Generous paid time off for both exempt and non-exempt positions
