Senior/Staff Site Reliability Engineer (Remote) - US/Canada
Narvar
Job Location
Job Type: Full Time
Visa Sponsorship: Available
Hires remotely
Relocation: Allowed
Hiring contact: Amit Sharma
The Role
Narvar is growing! We are hiring Site Reliability Engineers to lead cloud ops and data infrastructure for all Narvar products. You will lead the reliability, scalability, and availability of our overall infrastructure with an eye towards automation, optimizing for lower MTTR and operational cost. You are self-motivated, scrappy, and willing to learn and take action.
Our SRE team is responsible for:
- Reliability & Availability
- Developer Experience
- Operational Experience
Day-to-day:
- Provide expert technical guidance and ongoing engineering design review to teams (10-15 engineers in total, e.g., 3 teams of 4) planning and implementing large migrations, broad architectural shifts, and capacity growth
- Build a metrics-driven operational culture by standardizing our practices for SLO definition and review, logging, monitoring, alerting, and on-call
- Make iterative improvements to blameless incident management processes, root cause analyses, outage prevention, and service recovery strategies
- Partner closely with Security, Quality, and Product teams to achieve high-priority security, privacy, compliance, reliability, and business-continuity objectives across the overall product roadmap
- Support the entire software supply chain within Narvar
What we are looking for:
- Proven hands-on technical project leadership experience demonstrating business impact
- 5+ years of software engineering and systems engineering experience (open source preferred)
- Strong hands-on production experience (at least 5 years) with cloud platforms such as AWS or GCP
- Strong systems knowledge (operating systems, networking, etc.)
- Docker, Kubernetes, Jenkins, Envoy, Istio, Nginx
- Observability tools like: Prometheus, ELK, Grafana, Datadog
- Data intensive systems like: Cassandra, Yugabyte, Redis, MongoDB, Postgres, Kafka, Pulsar, Elasticsearch, etc.
- Service-oriented architecture
- Programming languages like: Python, Java, Golang, Rust, TypeScript, etc.
- Reliability engineering (SLOs, SLIs, chaos engineering, performance and load testing, rate limiting, etc.)
- Experience helping the business establish metrics and Service Level Objectives (SLOs), e.g., availability and reliability targets
- Experience with capacity planning and demand forecasting
Bonus Points:
- CS Degree
- Prior speaking engagements in related areas
- Publications on related topics
Why Narvar?
We're on a mission to simplify the everyday lives of consumers. We believe post-purchase is a critical phase of the customer journey. That's why we created Narvar - a platform focused on driving customer loyalty through seamless post-purchase experiences that allow retailers to retain, engage, and delight customers. If you've ever bought something online, there's a good chance you've used our platform!
From the hottest new direct-to-consumer companies to retail’s most renowned brands, Narvar works with Patagonia, GameStop, Neiman Marcus, Sonos, Nike and 850+ other brands. With offices in San Francisco, London, Paris, and Bangalore, we've served over 125 million consumers worldwide across 8 billion interactions, 38 countries, and 55 languages.
Pioneering the post-purchase movement means navigating into the unknown. Our team thrives on this sense of adventure while nurturing a mindset of innovation. We're a home for big hearts and we leave our egos at the door. We work hard but we always make time to celebrate professional wins, baby showers, birthday parties, and everything in between.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.