Platform Reliability Engineer - Digitization and Workflow Engineering

Location(s): US-TX-Dallas
Job ID: 2020-67530
Schedule Type: Full Time
Level: Vice President
Function(s): Software Engineer
Region: Americas
Division: Engineering
Business Unit: Developer Experience
Employment Type: Employee

MORE ABOUT THIS JOB

Drive the convergence of workflow platforms across the firm to promote process consistency and allow for the gathering and analysis of metrics.

The Workflow Engineering team builds world-class technology solutions that automate a wide range of critical business processes across Goldman Sachs. Our platforms manage millions of tasks and business decisions and run tens of thousands of workflows daily to ensure that vital business operations run on time.

RESPONSIBILITIES AND QUALIFICATIONS

HOW YOU WILL FULFILL YOUR POTENTIAL

  • Own the runtime environment of a large-scale, globally distributed platform (1,800+ machines)
  • Develop the forward strategy for migrating to a hybrid cloud runtime
  • Balance feature development velocity against reliability using well-defined SLOs
  • Exercise autonomy to prioritize and escalate work in order to achieve stated site reliability outcomes
  • Create sustainable systems, services and development practices to keep the estate scalable, resilient and available
  • Proactively engage and guide development teams to improve the lifecycle of developing and managing highly available systems by advocating SRE principles
  • Be passionate about managing operational risk and debugging intricate problems across a distributed stack

SKILLS AND EXPERIENCE WE ARE LOOKING FOR

  • 7+ years of experience developing distributed services deployed across a small to medium runtime estate (200+ machines)
  • BS/MS degree in Computer Science or a related technical field involving coding and/or systems engineering
  • Proficiency in one or more of the following: Java, C++, Python
  • Deep understanding of Java threading models and JVM performance tuning
  • Hands-on experience with Apache Geode or other distributed, highly available data caching technologies
  • Hands-on experience developing, debugging and optimizing code, as well as with automation
  • Advanced troubleshooting and debugging skills using JVM thread dumps, heap dumps, etc.

PREFERRED QUALIFICATIONS

  • Prior experience in SRE role
  • Understanding of distributed databases such as MongoDB, Cassandra or Elasticsearch
  • Understanding of containers and container orchestration (e.g., Docker, Kubernetes)
  • Experience with open-source messaging systems such as Kafka or RabbitMQ
  • Understanding of Linux kernel subsystems
  • Working knowledge of AWS services and the AWS control plane
  • Strong interpersonal skills, drive and ownership
  • Ability to solve novel problems from first principles
  • Experience with UI frameworks like Angular

ABOUT GOLDMAN SACHS

The Goldman Sachs Group, Inc. is a leading global investment banking, securities and investment management firm that provides a wide range of financial services to a substantial and diversified client base that includes corporations, financial institutions, governments and individuals. Founded in 1869, the firm is headquartered in New York and maintains offices in all major financial centers around the world.

© The Goldman Sachs Group, Inc., 2021. All rights reserved. Goldman Sachs is an equal employment/affirmative action employer Female/Minority/Disability/Vet.