SRE Tech Lead, Americas

Edge & Node

North America · South America · Remote
Posted on Monday, May 13, 2024

Edge & Node is at the forefront of web3 innovation. Our mission is to establish The Graph as the unbreakable foundation of open data. Our pioneering subgraphs set the industry standard and solidify The Graph as the premier solution for organizing and accessing blockchain data.

At Edge & Node, we champion a decentralized future based on shared values. Dedicated to decentralizing power and resisting censorship, we aim for a robust, permissionless information era free from central control, thus eliminating the traditional vulnerabilities associated with misplaced trust.

The Site Reliability team works closely with Engineering teams across Edge & Node to ensure the services we operate are reliable, performant, and predictable. We focus on a mix of software development, operational automation and collaboration with other teams to help take our service delivery to the next level.

We are looking for a skilled SRE Tech Lead who can lead and oversee the development and automation of the services E&N operates as part of The Graph ecosystem, while also contributing as a hands-on member of the SRE team. In this role, you will drive availability and reliability across multiple engineering teams and work closely with them to ensure that the operational aspects of managing services are automated and observable. You will partner closely with the team’s manager.

What You’ll Be Doing

  • Working as a trusted partner with engineering management to identify and prioritize team epics/OKRs, cost-cutting and optimization initiatives, and cross-team projects through the Shape Up process (contributing to product roadmaps by representing SRE activities and resource planning)
  • Collaborating with management to prioritize the needs of internal customers
  • Performing SRE duties as a regular part of your role
  • Developing end-to-end documentation and instrumentation of our systems to ensure visibility, automation, self-recovery, and resiliency throughout all areas of the stack
  • Owning the postmortem process and ensuring closure of corrective actions
  • Driving automation and monitoring/alerting strategies
  • Architecting, building, and supporting core services that are critical to our business
  • Influencing infrastructure and architecture decisions with a focus on reliability, security, and scalability
  • Coaching teams across The Graph ecosystem on best practices for deployment, observability, and scalability
  • Collaborating with other SREs and engineers to ensure our architecture and operations are world-class
  • Building and maintaining relationships with external vendor teams, as needed
  • Learning on-call processes and systems, owning triage resolution/coordination across teams, and training team members to react to production issues
  • Participating in on-call rotation

What We Expect

  • Previous experience as an SRE and/or tech lead in the SRE/DevOps space
  • Ability to work with engineering management to set priorities and objectives for the SRE team while also working alongside SREs to accomplish priority tasks
  • Experience building and delivering large-scale software systems
  • Previous experience working with both bare metal infrastructure and cloud infrastructure (ideally GCP)
  • Experience operating as an SRE with hands-on experience implementing processes that drive reliability and performance
  • History of working across organizations to codify and implement best practices for both operation and optimization of software systems; knowledge of CI/CD best practices, and the ability to implement them, is a plus
  • Deep working knowledge of Kubernetes (or other container orchestration systems) and associated technologies
  • Clear communication skills (written and verbal) to document processes and architectures

About The Graph

The Graph is the indexing and query layer of web3. The Graph Network’s self-service experience for developers launched in July 2021. Developers build and publish open APIs, called subgraphs, that applications can query using GraphQL. The Graph supports indexing data from multiple networks, including Ethereum, NEAR, Arbitrum, Optimism, Polygon, Avalanche, Celo, Fantom, Moonbeam, IPFS, and PoA, with more networks coming soon. To date, tens of thousands of subgraphs have been deployed on the hosted service, and subgraphs can now be deployed directly on the network. Over 28,000 developers have built subgraphs for applications such as Uniswap, Synthetix, KnownOrigin, Art Blocks, Balancer, Livepeer, DAOstack, Audius, Decentraland, and many others.

If you are a developer building a web3 application, you can use subgraphs to index and query data from blockchains. The Graph allows applications to present data in a UI efficiently and allows other developers to use your subgraph too! You can deploy a subgraph to the network using the newly launched Subgraph Studio or query existing subgraphs in the Graph Explorer. The Graph would love to welcome you as an Indexer, Curator, and/or Delegator on The Graph’s mainnet. Join The Graph community by introducing yourself in The Graph Discord for technical discussions, joining The Graph’s Telegram chat, and following The Graph on Twitter, LinkedIn, Instagram, Facebook, Reddit, and Medium! The Graph’s developers and members of the community are always eager to chat with you, and The Graph ecosystem has a growing community of developers who support each other.

The Graph Foundation oversees The Graph Network. The Graph Foundation is overseen by the Technical Council. Edge & Node, StreamingFast, Messari, Semiotic and The Guild are five of the many organizations within The Graph ecosystem.