Senior Site Reliability Engineer
Mesh
About Mesh
Founded in 2020, Mesh aims to build an open, connected, and secure financial ecosystem. As a modern financial operating system, Mesh enables deposits from exchanges and wallets, digital asset transfers, crypto payments, account aggregation, and trading within a unified platform. With over 300 integrations, Mesh is pioneering the creation of an embedded financial ecosystem that benefits both businesses and users. Mesh has raised over $40M in funding and is backed by notable investors including Money Forward, Galaxy, Samsung Next, PayPal Ventures, and more. For more information, visit https://www.meshconnect.com/.
Job Description
Are you passionate about building resilient and scalable systems? Mesh is seeking a Senior Site Reliability Engineer (SRE) to join our infrastructure team. As a crucial team member, you will play a key role in designing, implementing, and maintaining the infrastructure that powers our financial solutions. This is a hybrid position based out of our San Francisco office. Ideally, you will work from the office at least once per week.
Responsibilities
- Design and implement highly available and scalable infrastructure to support Mesh’s financial platform.
- Collaborate with engineering teams to define and implement reliability standards and best practices.
- Monitor system performance, identifying and addressing potential issues to ensure uptime and optimal performance.
- Automate infrastructure provisioning, configuration, and deployment processes to streamline operations.
- Lead incident management efforts, including overseeing incident response processes, coordinating with cross-functional teams, and conducting post-incident reviews to identify root causes and implement preventative measures.
- Design and implement incident response playbooks and escalation procedures to ensure timely and effective resolution of incidents.
- Collaborate with software engineering teams to improve system reliability through automated testing, fault tolerance, and disaster recovery planning.
- Mentor junior team members and contribute to the continuous learning and development of the SRE team.
Qualifications
- 5+ years of experience as a Site Reliability Engineer or in a similar role, preferably in the fintech or crypto industry.
- Proficiency in programming languages such as Python, Go, Java, or C#, with an emphasis on automation and scripting.
- Deep understanding of cloud platforms such as AWS, Azure, or Google Cloud Platform.
- Experience with containerization technologies such as Docker and container orchestration tools such as Kubernetes.
- Strong knowledge of Infrastructure as Code (IaC) principles using tools like Terraform.
- Expertise with monitoring and logging tools such as Datadog, New Relic, Prometheus, Grafana, or similar.
- Excellent troubleshooting skills and the ability to diagnose and resolve complex technical issues.
If you are passionate about building simple solutions to complex problems and are obsessed with creating and building value, we would love to hear from you. Join our dynamic team and contribute to the development of cutting-edge fintech solutions!