Overview

About Scratch 

Scratch is a creative programming language and the world’s largest online coding community for children and teens. Children around the world use Scratch and ScratchJr to create their own interactive games, stories, and animations – and share their creations with one another. In the process, they learn to think creatively, reason systematically, and work collaboratively. In 2022, more than 33 million young people around the world created projects with Scratch.

The Scratch Foundation

Since Scratch was created at the MIT Media Lab in 2007, its use has grown dramatically. More than 120 million people from every country in the world have created more than half a billion Scratch projects. In 2024, we’re responding to this growth by focusing on four strategic priorities: diversifying revenue sources; maintaining a high-quality experience for our existing users; re-engineering the platform; and implementing our programmatic and research work.

Position Overview

At this exciting stage in our growth, the Scratch team is hiring a Senior Site Reliability Engineer who will identify and deliver software improvements using their expertise in software development, complexity analysis, and scalable system design. Strong collaboration skills will be required to work closely with other engineering teams to ensure services/systems are highly stable and performant, meeting the expectations of our wide range of users and stakeholders.  

Senior Site Reliability Engineer Responsibilities:

  • Lead and participate in the design and deployment of major shared infrastructure components to improve the availability and scalability of our services.
  • Partner with app engineering teams to develop and ensure adherence to our product SLOs by responding to and resolving production issues.
  • Lead and scale our incident management, post-mortem processes, and on-call rotations.
  • Build tooling and automation to support and increase accessibility of our platform with the goal of increasing the velocity of our product engineering teams.
  • Support services from inception to delivery by bringing an eye towards: System Design, Scaling, Automation, Capacity Planning, Observability, and Reliability.
  • Educate and train engineers on developer tooling and standards around reliability.
  • Ideal technology experience in or around:
      • Cloud: Amazon Web Services, Fastly, Cloudflare
      • Orchestration: Kubernetes
      • Metrics: Prometheus, Elasticsearch
  • Other duties as assigned.

Qualifications 

  • 5+ years of Site Reliability Engineering experience 
  • BA in Computer Science or related field 
  • Current AWS Certification is a plus but not required. 
  • Familiarity with database concepts
  • Experience with observability, monitoring, and dashboards (New Relic, Sentry, Datadog, Pingdom, Job Schedulers)
  • Experience in incident, problem, change, and release management practices
  • Ability to communicate strategically to a range of stakeholders and facilitate thoughtful meetings.
  • Ability to adapt to shifting project needs.

This is a hybrid position based in New York City that pays between $145,000 and $165,000. To apply, please visit https://tinyurl.com/scratchcareers.

The Scratch Foundation is an equal opportunity employer. Scratch welcomes people of all ages, races, ethnicities, religions, abilities, sexual orientations, and gender identities. We especially encourage people with historically marginalized identities to apply.