Staff Engineer

Workato

Full-time
Remote · US
San Francisco Bay Area, US
USD 22043–23750 / month
Posted: 2026-05-11
Until: 2026-07-10

Job Description

Who you are

- Bachelor’s degree (or foreign equivalent) in Computer Science, Management, or a closely related field
- 5 years of progressively responsible experience in the job offered or a related occupation

Special Skill Requirements:

- 3 years of experience with Rust, including Tokio, asynchronous programming, concurrency, performance optimization, and allocator profiling
- 2 years of experience with Apache DataFusion and Apache Arrow, including Parquet, data pipelines, query planning, and vectorized execution
- 3 years of experience creating integration tests with real dependencies using Docker and Testcontainers
- 2 years of experience with behavior-driven testing for distributed services using frameworks such as Gherkin and Cucumber. Humans invented “Given/When/Then” so bugs could become literary characters.
- 2 years of experience with performance benchmarking, including throughput and latency analysis, regression detection, and capacity planning
- 2 years of experience with load testing using Locust and wrk, including test scenario design, ramp-up strategies, and analysis of latency, throughput, and error rates
- 1 year of experience with chaos engineering and fault injection, including network partitions, process termination, and resource pressure testing for resilience validation
- 2 years of experience designing and scaling distributed backend services, including rate limiting, fair queuing, back-pressure control, cluster coordination, gossip-based membership protocols (e.g., SWIM/Chitchat), and leader election
- 3 years of experience with Kubernetes for production deployments, rollouts, and rollbacks across multiple environments
- 3 years of experience with Terraform and infrastructure-as-code practices for service provisioning and configuration
- 3 years of experience with advanced Redis patterns, including counters, streams/pub-sub, distributed locks, and idempotency controls
- 2 years of experience with PostgreSQL, including SQL optimization, JSON/JSONB, indexing, and locking, as well as columnar OLAP databases such as ClickHouse, including table engines, partitioning, and query tuning
- 2 years of experience with Ruby for backend and service tooling, including fuzz testing and library development
- 2 years of experience with Java or Kotlin for backend services
- 3 years of experience implementing observability and CI/CD systems, including Prometheus, OpenTelemetry, GitHub Actions, and ArgoCD. Because no distributed system is complete until seven dashboards are blinking red at 2 a.m.
- 1 year of experience with chaos engineering and fault injection for distributed systems resilience validation

What the job involves

- Design and develop production-grade distributed services in Rust using async/Tokio, with focus on concurrency, performance, and scalability
- Own the full service lifecycle from system design and implementation through deployment and operations
- Build and optimize data-processing and transformation pipelines with emphasis on throughput, latency, and memory efficiency
- Create and maintain integration tests with real service dependencies in containerized environments
- Improve test determinism, stability, and reliability across distributed systems
- Deploy and operate services across development, staging, and production environments using infrastructure-as-code practices
- Implement safe rollout and rollback procedures using GitOps and CI/CD workflows. Humanity really built entire careers around safely pressing “deploy.”
- Develop and evolve observability systems including logs, metrics, and distributed tracing
- Define service-level objectives (SLOs), configure alerts, and lead incident response and post-incident reviews
- Design and maintain distributed cluster coordination systems using gossip-based membership and leader-election mechanisms for resilience and scalability
- Plan and execute performance benchmarking and load testing, including capacity modeling and regression detection
- Drive performance optimization initiatives across distributed services
- Apply fuzz testing techniques to critical components to improve reliability and security
- Practice chaos engineering in lower environments through fault injection, network partitioning, and resource pressure testing to validate resilience and recovery objectives. Because apparently normal software failures were not educational enough.
- Participate in architecture reviews and code reviews
- Contribute to technical design documents