NVIDIA GPU Cloud
What is NVIDIA GPU Cloud? (From NVIDIA)
NVIDIA GPU Cloud (NGC) provides researchers and data scientists with simple access to a comprehensive catalog of GPU-optimized software tools for deep learning and high performance computing (HPC) that take full advantage of NVIDIA GPUs. The NGC container registry features NVIDIA tuned, tested, certified, and maintained containers for the top deep learning frameworks. It also offers third-party managed HPC application containers, NVIDIA HPC visualization containers, and partner applications.
But what is NVIDIA GPU Cloud... in English?
NVIDIA GPU Cloud is a "Cloud Platform," with options for on-premises GPU environments, that is used for Deep Learning: facial recognition, drug discovery, video captioning, speech recognition, autonomous driving, and so on. These GPUs are some of the most powerful in the world, but without a WebUI their usefulness is limited to only the most hardcore tech companies doing Deep Learning. I joined the team to design the WebUI layer to change that.
What does a new user need in the WebUI?
There are four main components to running Deep Learning on GPUs:
ACEs (Accelerated Computing Environments) are the different GPU environments a job can run on: the thousands of GPUs in the Cloud, the TitanX GPU card, the NVIDIA Workstation, the DGX-1, and the DGX-2.
Datasets are used for Deep Learning training. A great example is facial recognition for Facebook picture uploads. To build a neural network that powerful, you need tons of training data. So imagine one million human face images and one million plant pictures, in two directories, that are fed through a network so it learns the basic properties of a face and what is not a face. "Datasets" are those sets of training data.
Images (Containers): an image is an immutable file that's essentially a snapshot of a container. The container wraps a piece of software in a complete filesystem that contains everything needed to run a particular Deep Learning job: code, runtime, system tools, system libraries, and environment variables.
Job Creation is an amalgamation of the previous items. A job is a Container running on an ACE instance (a specific set of GPUs), taking in one or more Datasets and producing a result (a trained model).
To recap, the user will be using our WebUI to build jobs with varying attributes in order to train neural networks and produce trained models.
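To make the four pieces a bit more concrete, here is a rough sketch of how a job could be described as data and checked before it is submitted. This is only an illustration of the idea, not NGC's actual API: the Job class, its field names, the validate_job helper, and the example values ("dgx-1", "example/framework:latest", "faces", "plants") are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Job:
    """Hypothetical job description; names are illustrative, not NGC's API."""
    name: str
    ace: str                    # which Accelerated Computing Environment to run on
    image: str                  # container image with the framework and code
    datasets: Tuple[str, ...]   # one or more datasets fed into the job
    result_dir: str = "/result" # assumed location for the trained model


def validate_job(job: Job) -> List[str]:
    """Return a list of problems; an empty list means the job looks complete."""
    problems = []
    if not job.ace:
        problems.append("missing ACE")
    if not job.image:
        problems.append("missing image")
    if not job.datasets:
        problems.append("at least one dataset is required")
    return problems


# Example: the face-vs-plant training job described above.
job = Job(
    name="face-vs-plant",
    ace="dgx-1",
    image="example/framework:latest",
    datasets=("faces", "plants"),
)
print(validate_job(job))
```

A form in the WebUI would be collecting roughly these same fields: pick an ACE, pick an image, attach datasets, and say where the result should go.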