My last article was on frameworks and included information about framework optimizations for specific platforms and accelerators. After reading that material you might look at this title and wonder, don’t we have enough complexity to deal with already? The answer is clearly yes, but I have been following this topic for a while and wanted to introduce you to a recent development in just-in-time (run-time) compilers for machine learning that I hope you’ll also find interesting. To be clear, this information is most useful to software developers, but I think it will also help data scientists understand more about how to do large-scale model training and inferencing, and I promise to keep it short.
To understand the types of problems that a run-time compiler for data science could solve, let’s examine some of the details of how TensorFlow is constructed. TensorFlow supports both CPU and GPU device types and is distributed in two separate versions. One of the most common operations used by deep learning frameworks like TensorFlow is matrix multiplication (matmul). TensorFlow has both CPU and GPU implementations of matmul that are optimized for their respective target devices. On a system with one or more GPUs, the GPU with the lowest ID is selected to run matmul by default.
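To make that concrete, here is a minimal sketch using the TensorFlow 1.x session API that was current when nGraph was released. It builds a single matmul node and logs which device TensorFlow assigns to it; the matrix values are purely illustrative.

```python
import tensorflow as tf

# Build a small computational graph with a single matmul node.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name="a")
b = tf.constant([[5.0, 6.0], [7.0, 8.0]], name="b")
product = tf.matmul(a, b, name="matmul")

# log_device_placement=True prints the device (the CPU, or the lowest-ID GPU
# if one is available) that TensorFlow assigns to each operation in the graph.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(product))
```

Running this on a multi-GPU machine shows the default placement behavior described above; you can override it by wrapping the ops in a `tf.device()` context.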
Every framework developer must consider which devices to support and which operations to optimize for those devices. Without the ability to do optimization at run time, each framework developer must specifically code optimizations for each back-end device or accelerator they choose to support. With the rapidly expanding number of both frameworks (m) and hardware devices (n), this results in a large number of optimization coding efforts (m * n) for complete coverage; for example, five frameworks targeting four devices would require twenty separate optimization efforts.
To address these challenges, Intel recently announced the open source release of nGraph, a C++ library, compiler, and run-time suite for running deep neural networks on a variety of frameworks and devices. The picture above shows where the nGraph core sits in the deep learning hardware and software stack. Since all frameworks can be optimized for all hardware platforms through a run-time compiler like nGraph, the engineering effort is reduced from m * n to m + n; with the same five frameworks and four devices, that is nine efforts instead of twenty. The current version of nGraph is available from the GitHub repository NervanaSystems/ngraph.
How does nGraph work? Every deep learning framework implements a framework-specific symbolic representation of its computations, called a computational graph. The nGraph community has developed a framework bridge for each supported framework. Developers install the nGraph library and compile a framework against it in order to specify nGraph as that framework’s back end. The appropriate framework bridge then converts the framework-specific computation definitions into a framework-independent intermediate representation (IR), which can be compiled into a form that executes on any supported back-end device. The IR layer handles all the device-abstraction details and lets developers focus on their data science, algorithms, and models rather than on machine code.
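As a rough sketch of what this looks like from the TensorFlow side: the model code stays the same, and importing the bridge is what routes supported subgraphs to nGraph. The module name `ngraph_bridge` below is an assumption based on the project’s published examples, not something guaranteed by this article; consult the NervanaSystems/ngraph documentation for the exact package name and build steps.

```python
import tensorflow as tf
# Importing the bridge module registers nGraph with TensorFlow so that
# supported subgraphs are handed off to the nGraph compiler and run time.
# NOTE: the module name ngraph_bridge is an assumption taken from the
# project's examples; verify it against the repository's documentation.
import ngraph_bridge

# The graph definition is ordinary TensorFlow code and is unchanged.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
product = tf.matmul(a, b)

# At run time the bridge converts the graph to the framework-independent IR
# and compiles it for the selected nGraph back end.
with tf.Session() as sess:
    print(sess.run(product))
```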
Community contributions to nGraph are always welcome. If you have an idea for how to improve the library, share your proposal via a GitHub issue.
Resources for this article:
- Intel® nGraph™ – An Intermediate Representation, Compiler, and Executor for Deep Learning
- nGraph: A New Open Source Compiler for Deep Learning Systems
- High-performance TensorFlow* on Intel® Xeon® Using nGraph™
- Intel Nervana Graph: A universal deep learning compiler
- Accelerating Deep Learning Training and Inference with System Level Optimizations
- GPU Accelerated Analytics
- Pushing the Boundaries Of AI
Thanks for reading,
Phil Hummel