
Innovate with HPC on Cloud

Provider | Compute Instance                                       | Interconnect                                 | Job Scheduler
Azure    | H-series, HB-series, HBv2-series, HC-series, N-series  | InfiniBand, RDMA                             | Azure Batch
GCP      | M2, C2                                                 | High-speed SDN network on Google's backbone  | Slurm
AWS      | C5n, P3dn                                              | Elastic Fabric Adapter (EFA)                 | Slurm

High-performance computing has historically been an expensive technology, developed and maintained by only a handful of large corporations such as Shell and AstraZeneca, and by government organizations working in areas like meteorology and space exploration. A renewed emphasis on innovation in the globalized industrial landscape is now bringing HPC even to small startups working in domains such as machine learning and AI. HPC has always been at the pinnacle of research and innovation on problems of massive scale. By one estimate, an average internet user generates 1.5 GB of data per day, a self-driving car generates over 3,000 GB per day, and a connected factory generates 1 TB every day. Enormous computing power is required to process this data. By leveraging HPC, datasets of terabytes or petabytes can be processed in minutes or hours instead of the weeks or months required by conventional computing systems. Consequently, HPC has become a key technology for driving innovation out of big data. The cloud, in turn, has emerged as a disruptive alternative to many traditional technologies, including HPC: organizations that use the cloud gain many benefits over building and maintaining on-premise infrastructure.
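The "weeks or months versus minutes or hours" gap can be made concrete with a rough back-of-the-envelope calculation. The throughput figures below are illustrative assumptions, not measured benchmarks:

```python
# Rough estimate: time to process a 1 PB dataset once, at different
# aggregate throughputs (figures are illustrative assumptions).

PETABYTE = 10**15  # bytes

def processing_hours(dataset_bytes, throughput_bytes_per_sec):
    """Hours needed to stream the dataset once at the given throughput."""
    return dataset_bytes / throughput_bytes_per_sec / 3600

# A single workstation sustaining ~200 MB/s end to end (assumed):
single_node = processing_hours(PETABYTE, 200 * 10**6)
# An HPC cluster sustaining ~500 GB/s aggregate (assumed):
hpc_cluster = processing_hours(PETABYTE, 500 * 10**9)

print(f"single node: {single_node:,.0f} h (~{single_node / (24 * 7):.0f} weeks)")
print(f"HPC cluster: {hpc_cluster:.1f} h")
```

Under these assumptions a single machine needs over eight weeks while the cluster finishes in under an hour, which is the scale gap the paragraph describes.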

HPC on Public Cloud

HPC workloads typically have pronounced peaks and valleys: at times the underlying hardware runs at full capacity, and at other times it sits idle. Since HPC jobs can take days or weeks to deliver results, it is a huge financial loss when petaflops of compute capacity are underutilized. In fact, accounting for idle time is a major factor in the design of an HPC facility. TOP500, the standard ranking of HPC supercomputers, has even introduced the Green500 list to rank systems by cost efficiency and environmental friendliness.

The cloud has proven to be a disruptive technology across many domains, including HPC. Organizations around the world are choosing the cloud both to expand their on-premise capabilities and to apply HPC to innovation. The cloud offers powerful systems for workloads such as visual simulation, image rendering, and data analytics with R. Microsoft Azure offers fast F-series CPUs suited to rapid simulation and analytics, while its G-series and M-series instances can hold huge datasets in memory for data processing and big data workloads. Google offers C2 and M2 VMs for compute-intensive and memory-intensive workloads, which it states deliver up to 40% higher performance than its existing general-purpose VMs. Amazon currently relies on C5n instances for HPC compute and plans to add more compute engines dedicated to HPC workloads; its P3dn EC2 instances target the fastest machine learning training in the cloud. Amazon has also developed the Elastic Fabric Adapter (EFA) to give HPC workloads low-latency inter-process communication. Azure's H-series VM instances are designed primarily for running HPC workloads on the cloud, and Microsoft has partnered with Nvidia to offer systems powered by Tesla V100 PCI Express and V100 SXM GPUs.
Nvidia has developed parallel-processing libraries that make it possible to use its GPUs alongside CPUs for faster execution cycles. Azure's N-series VMs are well suited to dynamic, rich video production and faster game development.
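How much a GPU offload of this kind can speed up a whole job is bounded by the fraction of the job that actually parallelizes; Amdahl's law gives the upper bound. A minimal sketch, where the fraction and per-kernel speedup are assumed for illustration:

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speedup when only `parallel_fraction` of the runtime benefits
    from a `parallel_speedup`x faster execution path (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)

# If 95% of a simulation maps onto the GPU and runs 50x faster there,
# the job as a whole speeds up only ~14.5x -- the serial 5% dominates.
print(f"{amdahl_speedup(0.95, 50):.1f}x")
```

This is why HPC codes work to keep the CPU-only portion of a pipeline as small as possible before adding GPUs.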

Azure is currently the only provider offering InfiniBand and RDMA on the public cloud; by adding InfiniBand to its offering, Microsoft moved ahead of other providers in the HPC domain. InfiniBand can deliver speeds of up to 10 GB/s between nodes, an added advantage over commodity network interconnects. The cloud also offers pre-emptible VMs and spot instances for creating and scaling cost-effective HPC clusters: IT teams can decide dynamically how many compute machines they want for a workload, and thousands of machines can be added to an existing cluster to get results faster. The cloud has thus empowered IT groups to access HPC resources from anywhere in the world. Finally, organizations that want a supercomputer for their jobs can even get a Cray supercomputer in Microsoft Azure.

Innovate faster on HPC

HPC is becoming increasingly important for key areas such as machine learning, AI, and the life sciences, among many others. Take machine learning as an example: model training depends on large datasets, and systems with teraflops or petaflops of computing capacity can train a model far faster than general-purpose computers, helping ML and AI researchers become faster and more productive. Deep learning has become a primary research method in financial markets, drug discovery, speech recognition, computer networking, and more. It improves model-training performance by exploiting specialized low-level instructions on GPUs alongside CPUs, freeing professionals to move on to new ideas earlier and accelerating the innovation cycle. HPC can cut execution time from weeks or months down to a few hours. It has proven useful for advancing technology regardless of domain: fields ranging from geology and physics to network design and cybersecurity have used the power of HPC to drive innovation and break through bottlenecks in existing processes.
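The "weeks or months down to hours" claim for model training follows from simple throughput scaling when the workload is data-parallel. A rough sketch, where the dataset size, per-GPU throughput, and scaling efficiency are all assumed figures:

```python
def training_hours(samples, epochs, samples_per_sec_per_gpu, gpus,
                   efficiency=0.9):
    """Estimated wall-clock hours for data-parallel training, assuming
    near-linear scaling degraded by a fixed `efficiency` factor."""
    total_samples = samples * epochs
    throughput = samples_per_sec_per_gpu * gpus * efficiency
    return total_samples / throughput / 3600

# 100M samples, 90 epochs, 500 samples/s per GPU (all assumed figures):
one_gpu = training_hours(100_000_000, 90, 500, gpus=1, efficiency=1.0)
cluster = training_hours(100_000_000, 90, 500, gpus=512)

print(f"1 GPU: {one_gpu / 24:.0f} days, 512-GPU cluster: {cluster:.1f} h")
```

With these assumptions, a single GPU needs roughly 200 days while a 512-GPU cluster finishes in about 11 hours; real jobs deviate from this, since communication overhead erodes efficiency as the cluster grows.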

Article by Umesh Upadhyay
