NSF's New Initiative To Bring The Cloud Era To Academic Big Data Research

Earlier this month, the US National Science Foundation (NSF) announced a new collaboration with three major cloud vendors to provide computing credits for academic research. Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure each committed up to $3 million over three years in computing time on their platforms for academic research as part of the new NSF initiative, making some of the world's most powerful "big data" platforms far more readily available to power the next generation of research.

Researchers like me have pushed NSF almost since the dawn of the modern commercial cloud to embrace these commercial computing platforms as part of its academic HPC program. Indeed, in my own research trajectory, I evolved from an avid “friendly user” of NSF-funded HPC systems for more than a decade to eventually turning completely to the commercial cloud for my work, due to the historic mismatch between the academic world’s batch-oriented, CPU-intensive, IO-saturated model and the realtime scaling, effectively infinite storage and IO-first design of the commercial cloud.

If you’re running a traditional scientific simulation code that requires tight coupling between cores, can tolerate waiting days for results, and has relatively minimal IO and storage needs, the traditional academic computing world is tailor-made for you, having been built from the ground up for precisely those kinds of applications. On the other hand, if your work is highly data parallelizable, has intense IO and/or storage demands, or requires purpose-built hardware for neural network research, the academic world simply cannot compete with the commercial cloud.

Over the last few years, Amazon, Google and Microsoft have all launched special academic research and education programs that offer free or greatly reduced-cost computing time on their clouds, and even entire mini-clouds dedicated just to research. This new NSF program greatly expands this focus on enabling cutting-edge academic research.

According to an NSF spokesperson, researchers applying for NSF grants have always been permitted to include the cost of commercial cloud computing time in their funding requests, with proposals dating back at least to 2011 doing so, but since those were direct costs, they necessarily displaced other expenditures. For example, instead of hiring an additional grad student, funds might have to go towards computing time. In this way, cloud computing time was considerably more expensive to researchers than time on NSF computing platforms, which were provided at no monetary charge through a separate allocation process. For certain classes of “big data” research, this frequently led to a mismatch in which a cloud platform could achieve the same result as an NSF system in a fraction of the time and with a fraction of the resources, but NSF’s computational allocations only covered its own HPC platforms.

Thus, this new program represents a fundamental transformation in NSF’s approach to computational allocations, finally placing “big data” projects on the same footing as the more traditional scientific workloads that have largely been the focus of the NSF-funded systems since the founding of the supercomputing centers three decades ago.

NSF emphasized that this new cloud initiative augments, rather than replaces, its traditional HPC program and that NSF will continue to fund the purchase of more traditional on-premises HPC resources to be allocated for academic research. Thus, more traditional scientific and engineering applications, such as simulation codes, can continue to rely on stable access to the latest generations of tightly coupled HPC systems, while the growing realm of “big data” applications (and increasingly neural network research) that are far better aligned with the commercial cloud and its unique data-first architectures can now fully leverage those platforms as first-class citizens.

NSF’s new program will also have significant benefits for education: today’s science and engineering students are highly likely to use the commercial cloud in their jobs after graduation, so gaining familiarity with its architectures and workflows will better prepare them for the real world.

Moreover, as companies like Google push the boundaries of AI, their software and hardware infrastructures are becoming the standard tools for deep learning research, meaning these new cloud offerings for academic research will have significant impacts on neural network research, greatly expanding access to the specialized hardware required to accelerate and scale leading edge approaches.

Putting this all together, NSF’s new collaboration with Amazon, Google and Microsoft represents a powerful step forward in broadening access to the unique capabilities of the commercial cloud while still maintaining NSF’s support for the unique needs of more traditional science and engineering computational workloads. In this way, academic researchers can now leverage the best of both worlds, and it will be exciting to see how this blending of worlds evolves, especially whether it eventually leads to a more hybrid cloud approach, with codes and data moving seamlessly between the commercial cloud and on-premises academic HPC systems. In the end, this new collaboration is a tremendous win for academic research, and at the same time it stands as testament to just how pioneering the data-first realtime architectures of the commercial cloud truly are.