Service Reliability Engineering Architect
Oracle, the world leader in Enterprise
Cloud, is hiring the best and brightest technologists in the industry as
we continue to add customer-centric, world-class, leading-edge, secure,
hyper-scale solutions throughout all levels of the cloud stack.
Oracle's cloud ecosystem is the only complete business cloud platform
on the planet, with market-leading and business-transforming solutions
spanning SaaS, DaaS, PaaS and IaaS. Oracle's Cloud applications, such as
Enterprise Resource Management, Customer Relationship Management, Human
Capital Management, and Supply Chain Management are used by thousands
of customers across the globe and are the broadest, most innovative in
the industry, providing businesses with adaptive intelligence,
standardized business processes and competitive advantage at low cost.
Part of the market-leading ERP Cloud, Oracle ERP Cloud offers a broad suite
of modules and capabilities designed to empower modern finance and
deliver customer success with streamlined processes, increased
productivity, and improved business decisions.
Cloud Operations is looking for passionate, innovative, high-caliber,
team-oriented superstars who want to be a major part of a
transformative revolution in the development of modern business
cloud-based applications. We are seeking highly capable, best-in-the-world
developers, architects and technical leaders at the very top of the
industry in terms of skills, capabilities and proven delivery; who seek
out and implement imaginative and strategic, yet practical, solutions;
people who calmly take measured and necessary risks while putting
Key Tasks and Responsibilities
- Provide technical leadership and help drive cross-team software
engineering efforts to build systems and services that improve
operational efficiency, increase velocity of product delivery, and drive
reliability, scale and performance.
- Participate in feature design reviews, new market planning and other
cloud expansion forums to ensure Monitoring, Telemetry, Reliability,
Automation, and Runtime Debuggability are represented as a first class,
design time, and testing
- Lead and drive focus on proactive improvements in the reliability and
availability of the Oracle SaaS/ERP services to reduce customer outage
time and the cost of outage remediation.
- Participate in the creation and improvement of processes, patterns and
practices for incident response, postmortem/root cause analysis,
end-to-end repair item definition, and fixes in production. This includes
both customer and internal communication throughout the entire process.
- Lead critical architectural discussions across the organization in
order to provide guidance on software engineering patterns and practices
that increase the reliability and resilience of Oracle SaaS/ERP services.
- Define implementation standards and guidelines for Oracle SaaS/ERP
based on industry-leading patterns and practices in the creation of
high-scale cloud services.
- Guide teams in core architectural decision making for some of the
largest-scale distributed systems in the world, ensuring that customer,
business, and technical needs are met throughout the decision process.
- Partner with Operations SRE to define architectures, instrumentation
frameworks, and systems for Telemetry, Deployment Automation,
Configuration Management as well as Monitoring, Detection, and
Remediation of errors, faults and failures to ensure that what is done
manually today is done by machines tomorrow.
- Provide technical leadership and help drive cross-functional team
activities designing, enhancing, implementing and scaling the underlying
full stack, e.g. operating platforms, technology layers and frameworks.
- Participate in industry conferences, meetups, blogs, and other forms of
communication to keep Oracle current with industry best practice, and
drive improvements in state-of-the-art architectures and solutions for
Telemetry, Deployment Automation, Configuration Management as well as
Monitoring, Detection, and Remediation of errors, faults and failures.
Skills and Qualifications
- Minimum of 10 years of software development experience, with
demonstrated knowledge of professional software engineering best
practices for the full software development life cycle, including coding
standards, code reviews, source control, build and release processes,
continuous deployment, and test suite development and maintenance.
- Practical experience running large-scale online systems built on Cloud
platforms.
- At least 5 years of experience designing and implementing solutions for
platform and application layer telemetry and monitoring.
- Experience coordinating resources across diverse teams to restore
service and maintain SLAs; ITIL certification is preferred.
- Minimum of 7 years of experience with Java, C#/C and SQL dialects, as
experience is a plus.
- Automation experience (test, integration, build/release, etc.) in a
distributed environment.
- Troubleshooting skills across network, application, caching, queuing,
load-balancing, storage and distributed services layers.
- Ability to analyze network and performance monitor traces, application
performance problems, and Windows application and crash-dump debugging.
- Ability to conceptualize a distributed service, its dependencies and the
transactional flow when troubleshooting.
- Experience providing technical leadership and architectural guidance to
teams working on complex software projects.
- Excellent written and verbal communication skills, including the
ability to communicate technical content to both technical and
non-technical peers, customers, and at times, executive leadership.
- Self-driven to keep moving things forward even in the face of ambiguity
and imperfect knowledge (resilient to hazards of analysis paralysis).
Detailed Description and Job Requirements
Analyze, design, develop, troubleshoot and debug software programs for commercial or end user applications. Write code, complete programming, and perform testing and debugging of applications.
As a member of the software engineering division, you will specify, design and implement major changes to existing software architecture. Create new architecture for a moderate size product or a portion of a major product. Build and execute unit tests and unit test plans. Review integration and regression test plans created by QA. Communicate with QA and porting engineering to ensure consistency, testability and portability across products in general.
Provide leadership and expertise in the development of new products/services/processes, frequently operating at the leading edge of technology. Recommend and justify major changes to existing products/services/processes. BS or MS degree or equivalent experience relevant to functional area. 8 or more years of software engineering or related experience.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans status or any other characteristic protected by law.
Job: Product Development
Location: US-CA,California-Redwood City
Job Type: Regular Employee Hire