Reference #: 19-09048
Title: Senior HPC Cluster and Linux System Engineer
Location: Marina del Rey, CA
Position Type: Contract
Start Date / End Date: 08/12/2019 / 12/13/2019
H-1s, subcontractors, and OPTs will not be considered.
This position is located at the Medical - Healthcare Institution's research facility (Information Sciences Institute) in Marina del Rey, CA.
Work ranges from theoretical basic research, such as core engineering and computer science discovery, to applied research and development, such as design and modeling of innovative prototypes and devices.
ISI is seeking an experienced Senior HPC Cluster and Linux System Engineer to support core institute infrastructure and HPC services.
As a Senior Engineer at ISI, you will have the opportunity to work with other highly skilled technology experts (IT members and researchers) on complex systems and interesting technical challenges.
The candidate for the position of Senior HPC Cluster and Linux System Engineer must meet the following qualifications:
1. Ten (10) years of experience in the following fields: information technology, system administration, and high-performance computing cluster support and management.
2. Five (5) years of experience in high-performance computing cluster support and management.
3. Bachelor's degree in a relevant field such as computer science, computer information systems, etc., or equivalent combined education, training, and experience.
4. Expertise with multi-vendor management, security, and network/Internet protocols.
5. Expertise with administrating, monitoring, and maintaining secure Linux/UNIX operating systems (CentOS/RHEL, Ubuntu).
6. Working knowledge of machine learning algorithms and software frameworks (TensorFlow, PyTorch, Keras, CUDA, cuDNN, Caffe, Theano, etc.).
7. Proficient fundamental programming skills (Bash, Python, or similar languages).
8. Expertise with HPC system software, cluster management tools, and job schedulers (Slurm).
9. Expertise with configuration management tools (Salt, Ansible, Puppet, etc.).
10. Proficiency with low-latency/high-bandwidth interconnected infrastructure such as 10GigE.
11. Knowledge of HPC storage (FC, SAS) principles, file systems (ZFS, etc.), and compute node storage (NFS).
12. Ability to identify, troubleshoot, and resolve problems and manage system performance.
13. Demonstrated expertise in HPC cluster and scheduler planning, design, and implementation, involving both CPU and GPU resources.
14. Ability to drive technical leadership and management of complex large-scale computing system projects.
15. Experience establishing processes for maintaining system performance and managing best-in-class standards.
16. Excellent organization and communication skills.
The ideal candidate for the position of Senior HPC Cluster and Linux System Engineer has the following qualifications:
1. Master's degree in a relevant field, such as computer science, computer information systems, etc.
2. More than five (5) years of experience in high-performance computing cluster support and management.
3. Experience with virtualization infrastructures (VMware).
4. Experience with container technologies (Docker, Singularity).
5. Experience with binding Active Directory to systems.
6. Experience with cloud computing (AWS, Azure).
THE WORK YOU WILL DO
The Senior HPC Cluster and Linux System Engineer collaborates with technical leadership in the design, development, installation, and maintenance of software for the Linux and HPC cluster systems. The Senior Engineer is responsible for managing the planning, implementation, availability, performance, security, maintenance, and repair of general and cluster infrastructure.
The Senior Engineer:
1. Drives the day-to-day operations for the Linux and HPC cluster systems by monitoring computing resource performance, managing configurations, and addressing security administration. Applies revisions to system firmware and software. Engages and collaborates with vendors to assist support activities as required.
2. Leads the development of new HPC software deployment plans, custom scripts, and testing procedures to ensure operational reliability for university researchers. Trains technical staff in the use of new software and hardware, either developed or acquired.
3. Oversees the maintenance and management of HPC researcher accounts for staff and ISI research groups. Leads the installation, modification, and maintenance of various research software applications for access on HPC clusters. Acts as a trusted technical advisor for researcher support and documentation on software applications and programs.
4. Designs, installs, configures, and performs document management for cluster infrastructure, including operating systems, job schedulers, resource managers, provisioning managers, configuration managers, SAN devices, network devices, and other components.
5. Investigates, debugs, and addresses researcher inquiries and requests efficiently through a customer issue ticketing system. Implements customer-focused resolutions efficiently. Communicates complex technical concepts in a simple, straightforward manner to address a broad range of stakeholders.
6. Creates opportunities to explore emerging technologies and technical developments to address expanding analytical requirements. Identifies new services and develops corresponding implementation plans. Advocates for best practices in the HPC field. Champions collaborative relationships with peer HPC research organizations when necessary.
7. Contributes to an inclusive environment that values differences by building and maintaining collaborative relationships with team members, peers, and organizational leaders. Actively embodies values and behaviors such as accountability, ethics, and best-in-class customer service. Contributes to a culture of trust and transparency by sharing information broadly, openly, and deliberately.
8. Supports the vision for the IT department and the Institute. Works closely with team members and management to implement and support effective solutions for HPC. Maintains currency with technology, standards, and best practices. Supports process improvement efforts within the team and across the organization.
Applies electrical and computer engineering principles in supporting research and development activities. Solves most engineering problems with minimal supervision. Coordinates with hardware engineers and software developers to select appropriate solutions. Monitors design concepts. Makes presentations at project meetings and internal design reviews. Creates and assembles engineering documentation for technical reports. May write sections of technical reports. Conducts tests as assigned. Documents test results. Stays informed of new developments and technologies.
Minimum Education: Bachelor's degree
Minimum Field of Expertise: Basic knowledge of computer system hardware, software, laboratory test equipment, and modern system design methodologies.
inSync will consider for employment qualified applicants with criminal histories in a manner consistent with the City of Los Angeles Fair Chance Initiative for Hiring Ordinance.