Senior Site Reliability Engineer, Engineering Operations

Chainlink

Software Engineering, Operations
Remote · United States
Posted on Friday, January 27, 2023
All roles with Chainlink Labs are global and remote-based. Unless otherwise stated, we ask that you try to overlap some working hours with Eastern Standard Time (EST). We encourage you to apply regardless of your location.
About Us
Chainlink is the industry-standard Web3 services platform that enables developers to build feature-rich Web3 applications with seamless access to real-world data and off-chain computation.
• Chainlink has helped enable $7T+ in transaction value since the start of 2022.
• Over 1,700 Web3 projects have integrated Chainlink services.
• Chainlink is live on 15+ blockchains with many having joined the Chainlink SCALE program.
• Chainlink is relied upon by industry-leading protocols like Aave, Compound, Paxos, Synthetix, and ENS.
• Chainlink has delivered 7.4B+ data points on-chain and onboarded 900+ decentralized oracle networks.
• Chainlink has established collaborations with Associated Press, Accuweather, AWS, Google Cloud, Meta, and Twilio.
• The world-class Chainlink Labs research team has won various awards for its work on distributed systems, security, and more.
Who we’re looking for:
• You’re focused on what matters most and ignore unimportant industry distractions.
• You take extreme ownership and deliver outstanding results.
• You have a growth mindset, seek out feedback and engage in constructive dialogue with others to help them grow.
• You move fast and evolve with rapidly advancing technologies.
• You want to be part of a team that excels and is committed to building the Chainlink Network and growing the Web3 ecosystem over the long term.
• You are welcoming toward a diverse network of participants joining an open, global standard.
• You’re excited about the future of Web3 and building a world powered by cryptographic truth.
At Chainlink Labs, our engineering team pushes the scale and capabilities of decentralized applications across the industry. The Chainlink Network holds >70% market share in the oracle space, solving real-world problems by enabling smart contracts to securely interact with off-chain data/computation.
We value talented and driven craftsmen who work collaboratively to tackle complex challenges, deliver product impact, and grow as builders. Join us and shape the future of blockchain technology and decentralized finance.
The Engineering Operations team supports product teams by providing specialized expertise required to keep products in a safe and reliable functioning condition while maximizing delivery velocity.
Engineering Operations functions as a security, release, and operational layer ensuring optimal balance between feature delivery velocity and product stability. We focus on minimizing operational risk, improving tempo, and automating manual processes that prevent software delivery from being repeatable and stable.
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. Much of our software development focuses on building infrastructure and eliminating work through automation.
We are distributed across time zones and continents, and we embrace remote work. In the EngOps team, we follow the infrastructure-as-code approach and practice GitOps. Our on-call rotation uses the follow-the-sun pattern.
We all have different backgrounds and are determined to help you succeed no matter where you are or who you are. If you think you would do a great job at Chainlink, we are looking forward to speaking with you, even if you don't match 100% of the job requirements: those describe people we've usually had a great time working with, but they're not a tick-box exercise.

As a Site Reliability Engineer for the Engineering Operations team you will:

  • Maintain all on-chain and job orchestration configurations
  • Automate and reduce complexities around product operations
  • Evangelize and enact best practices as experts to guide high-quality Site Reliability Engineering
  • Make tooling user-friendly and accessible to create self-sufficient operational experts across the company and our network of Node Operators
  • Continue delivering operational tasks within agreed SLAs to expand scalability and reliability
  • Deliver high product velocity while protecting reliability and operability
  • Support production systems by being on-call

Your Impact

  • Deploy and maintain various externally-facing services
  • Improve the reliability and observability of Chainlink services
  • Provide our engineers with reliable automations and empower them to deploy and maintain Chainlink services in a repeatable and stable manner
  • Support monitoring services that watch over the entire Chainlink network
  • Support Incident Response by shortening the duration of incidents while keeping an active feedback loop that ensures the operation and reliability of our systems improve over time
  • Support services before they go live through activities such as system design consulting, capacity planning and launch reviews
  • Engage in and improve the whole lifecycle of services—from inception and design, through to deployment, operation and refinement
  • Manage execution of project priorities, deadlines, and deliverables
  • Provide technical leadership for the local team and work closely with partner team technical leads

Skills and Qualifications

  • Excellent communication skills and a sense of ownership
  • 4+ years of relevant professional experience. You have a software engineering background and/or an operations background and have worked in an SRE or related role before
  • Experience architecting, developing, and troubleshooting distributed systems
  • Fluency in design patterns for building performant, resilient, and highly available systems
  • Proficiency as a software developer. You can not only read and write code, but also identify opportunities and implement sound solutions to automate routine tasks and eliminate toil
  • Experience with system architecture. You can create a design document for a performant and highly available application, involving multiple types of storage, cross-region load-balancing, caching layers and messaging infrastructure
  • Excitement for blockchain and Web3
  • Willingness to go on-call. Reliability is our most important feature, and because on-call is an essential component of a reliable system, we take it very seriously

Preferred Qualifications

  • Professional experience with Golang, TypeScript, or both
  • Experience running a blockchain full node as an operator is a big plus
  • Experience with Chainlink as a developer or a node operator is a big plus
  • Comfort working with network protocols, proxies, and load balancers
  • Experience with CI/CD pipelines. You've worked on both software delivery and cloud-based services deployment
  • Experience with information security and DevSecOps
  • Experience working remotely in a distributed team
  • Experience with container orchestration

Our Stack

Some of the tools and services we use daily or almost daily are:

  • AWS; Terraform/Terragrunt; Kubernetes, Calico, and ArgoCD; Prometheus and Grafana; GitHub Actions; Packer

We expect you to be comfortable with most of those tools and very proficient in several of them.
Privacy Policy and Equal Opportunity Employer:
Chainlink Labs is an Equal Opportunity Employer. To request an accommodation in our recruitment process, please contact us at people@smartcontract.com.
Please see our Privacy Policy for more information about how we collect and use your application information.