There are currently no job openings.

HPC Cloud Administrator

Job Description

HPC4Health is a Compute Canada and Compute Ontario project dedicated to providing High Performance Computing (HPC), bioinformatics, and software development support to health institutions. The project is led by the SickKids Hospital and the Princess Margaret Cancer Centre, part of UHN. The HPC4Health infrastructure is located in the SickKids’ PGCRL data centre (686 Bay Street) and is maintained by HPC4Health’s centralized support team (SickKids staff). HPC4Health uses cloud technologies to provide HPC services to participating partners while maintaining the highly demanding security standards required in biomedical research. This position involves working on the operations and development of HPC4Health’s HPC cloud environment.

Employment Type:

Temporary, Full-Time (one year contract)

Posted:

Until Filled

Responsibilities

  • Support HPC4Health’s HPC cloud system, which comprises more than 13,000 compute threads, high-performance networks (InfiniBand and 10 GigE), and 4 PB of high-performance computing storage.
  • Develop and integrate OpenStack modules within the HPC cloud.
  • Provide guidance and support in all aspects of high-end computing research to a large community composed of researchers and clinicians from different hospitals and health institutions in Ontario.
  • Manage hardware and interact with vendor support teams.

Desired Skills and Experience

Required Skills

    The successful candidate is required to have:

  • University degree in Computer Science or Engineering.
  • Demonstrated understanding of OpenStack modules (nova, neutron, cinder, etc.), their function and configuration in a cloud environment.
  • Minimum of 5-7 years of experience supporting HPC systems in a multi-user environment.
  • Experience supporting large storage devices (NAS) and a good understanding of file systems such as OneFS and XFS.
  • Experience configuring and managing HPC workload management and scheduling software suites.
  • Experience with and understanding of user support best practices and help desk ticketing systems.
  • Must possess excellent verbal communication skills and the ability to interact with scientific and technical audiences.

Additional Assets

    The qualifications and experience below are considered assets:

  • Good understanding of Red Hat RDO-Manager and other OpenStack deployment and management methods.
  • Good understanding of Ethernet and InfiniBand concepts (TCP/IP, VLANs, partitions, IPoIB, SR-IOV, etc.).
  • Experience with Linux HA technologies (Corosync, Pacemaker).
  • Proficiency in Linux (Red Hat, CentOS).
  • Good understanding of protocols like NFS, CIFS, LDAP, DHCP, DNS and NTP.
  • Working knowledge of scripting languages such as Bash, Perl and Python.
  • Experience installing, configuring and maintaining application tools and databases: Bright Computing, MySQL, PostgreSQL, RabbitMQ, Apache, Tomcat.