Microsoft Azure delivers a broad selection of cutting-edge, highly available and secure services to a diverse and rapidly growing customer base running their businesses on the cloud. The Microsoft Azure Site Reliability Engineering (SRE) team represents a deep investment in improving the availability, reliability, security and release velocity our customers expect. We are looking for founding engineers to seed SRE teams as part of our global talent expansion in Sydney. You would be responsible for infrastructure and operability engineering that will be leveraged to build world-class services that are easy to operate, reliable, and used as building blocks for other services.
As a member of the Crisis Management Engineering (CME) team, you will work with an elite team of engineers to identify, design and develop solutions to difficult, novel problems to prevent and improve response to outages affecting customers on Microsoft Azure's massive infrastructure. Your work in this role will use cutting edge technologies and industry concepts to directly prevent millions of minutes of downtime for customers worldwide. As an engineer on the team, you will participate in design and development of solutions and contribute ideas to influence the direction of the team. You will also participate in our global on-call rotation accountable for remediating the most critical outages impacting Azure services and customers.
Additional responsibilities:
- Proactively identify and resolve people, process and technology issues to reduce incident mitigation time, reduce toil and increase scalability
- Collaborate with your direct team and teams across Azure/Microsoft to research, architect, develop, and deliver solutions to problems in an agile development environment
- Lead the response to and resolve Azure service outages affecting customers; write software to prevent problem recurrence
Qualifications:
- Bachelor's degree or at least 4 years in a software engineering environment
- A minimum of 2 years of experience in software engineering using object-oriented programming (Java, Go, C#, or similar)
- Strong design, coding, problem solving and debugging skills
- Strong collaboration skills; working across teams and organizations is necessary to be successful
- Must be able to participate in a global on-call rotation
- Knowledge of Microsoft Azure, AWS or similar cloud computing platforms
- Expertise building extensible, high scale service platforms
- Expertise in debugging and remediating issues in large-scale distributed systems
- Experience with machine learning algorithms or statistical analysis
- Experience as a crisis manager leading response to service outages
You will be required to pass Microsoft background checks prior to the start of employment and periodically thereafter. Further details regarding this process will be provided in follow up correspondence.
The ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request to email@example.com.