Those of us in the Boston area watched our Red Sox end their record-setting season by celebrating another World Series title.
Naturally, there has been much buzz about the team, as well as first-year manager Alex Cora and the clubhouse culture he built. However, it takes more than culture change to win, and Cora and the Red Sox front office recognize that. We live in a data-driven world, and that includes the world of baseball.
A recent Boston Globe story featured a good example of how the Red Sox use data-driven insights to make in-game decisions. It isn’t luck when outfielder Mookie Betts snags a fly ball that most observers would think he had no chance to catch. What places Betts in the ideal position is data, retrieved through AI and analytics that capture and analyze the batter’s historical and projected trends. The output of that learning is placed on an index card that Betts keeps in his back pocket, so he can move to the optimal position in the field before each at-bat.
Perhaps a Red Sox legend like Ted Williams would dismiss today’s approach to analyzing the other team. However, the data has always been there. Today’s difference is the availability of intelligent analysis through AI and machine learning: modern tools that give managers a modern twist on strategy.
Whether in baseball or the business world, organizations are collecting vast amounts of new data and racing to unlock its value to make faster and better decisions.
At Dell Technologies, we help our customers deliver new outcomes through AI and ML. At the same time, we as a company are doing what our customers are doing – leveraging AI and ML to help us make better decisions and improve customer experiences and outcomes.
Before proudly sharing a few examples, I invite you to check out Dell Technologies’ “Unlock the Power of Data,” which was streamed as a virtual event for customers and partners on November 14. During the broadcast, trends in AI were discussed, use cases and examples were outlined, and Dell Technologies’ AI capabilities were demonstrated.
Delivering Targeted Healthcare Insights with AI and Machine Learning
The medical industry is well-positioned to be a top beneficiary of the AI/ML evolution, which enables providers to better evaluate patients and personalize treatment options.
In this case, a regional healthcare provider partnered with Dell EMC Consulting to develop and implement a robust analytics research platform that would enable an extensive community of researchers and innovators to work more efficiently with faster and expanded access to critical data.
One such example is a recent collaboration between the healthcare provider’s data scientists and data scientists from Dell EMC Consulting. The teams together delivered new research targeted at the alarmingly high number of seizures that occur in hospitals, most of which are only detectable by brain monitoring with an electroencephalogram (EEG). Delayed diagnosis of such “subclinical seizures” leads to brain damage, lengthens hospitalization, and heightens the risk of in-hospital death or long-term disability.
The learnings from past EEGs would go a long way towards helping hospital physicians provide better diagnosis and treatment. However, there are two key challenges that make curating and mining the information difficult. First, patients’ EEG reports and the corresponding waveform data files are often stored separately and not clearly linked. Second, it is equally difficult to quickly extract useful information from the reports that describe clinically important neurophysiological events.
Using the new research platform and applying advanced AI and machine learning techniques, the joint team developed a highly accurate classifier for pairing the report files with the corresponding data. They also discovered several analysis techniques that are highly accurate in extracting the relevant information needed from the reports. With these two foundations, the team has established a highly effective and efficient data pipeline for clinical operations, quality improvement, and neurophysiological research.
Using Machine Learning to Automate the Offline
Dell’s eCommerce platform is the front door for the full range of customer inquiries, from simple browsing to real-time support. But did you know that Dell also manages more than four million offline orders that arrive via fax and email each year? Our global Order Management and Support organization has traditionally executed those orders manually, and a new solution was needed to improve order accuracy and cycle time.
Leveraging machine learning and the latest in Optical Character Recognition (OCR), Dell Digital developed Robotix — a scalable solution for digitizing offline purchase orders. Robotix improves the customer experience by processing orders faster and reducing pain points, while automating offline quality checks and customizing order entry instructions.
Robotix, currently patent-pending, is already live in North America and expected to automate the majority of global offline orders in its first full year of implementation.
Proactively Avoiding System Failure with SupportAssist
The millions of customer systems connected to Dell EMC around the globe can run trillions of variations of hardware and software configurations. These variations may be further influenced by factors such as geographic location and climate.
Given such a vast scope and size, the ability to predict and validate potential faults may seem like an impossible task. However, through the power of AI and ML, and the capacity of today’s Graphics Processing Units, our internal data scientists have built solutions that implement Deep Learning models to open a world of even more possibilities.
Today, SupportAssist, our automated proactive and predictive technology, is run on almost 50 million customer systems. Through this connected technology, Dell EMC can save customers from the potentially disastrous impact of downtime or data loss by alerting and remediating a potential hard drive failure on average 50 days before the failure occurs. And as our services technology continues to get smarter, customers will be empowered to make faster, better decisions about their IT, and address immediate issues while they plan for what’s next.
These are just a few of the many AI/ML use cases deployed either internally by Dell Technologies or externally by our customers. Yet, while implementations vary, there is a common thread tying the successful ones together. It is the right people and processes, combined with these powerful technologies, that enable us to define and execute a vision that brings data and insights to life and makes transformation real.
Deploying trained neural network models for inference on different platforms is a challenging task. The inference environment is usually different from the training environment, which is typically a data center or a server farm. The inference platform may be power constrained and limited from a software perspective. The model might be trained using one of the many available deep learning frameworks such as TensorFlow, PyTorch, Keras, Caffe, MXNet, etc. Intel® OpenVINO™ provides tools to convert trained models into a framework-agnostic representation, including tools to reduce the memory footprint of the model using quantization and graph optimization. It also provides dedicated inference APIs that are optimized for specific hardware platforms, such as Intel® Programmable Acceleration Cards and Intel® Movidius™ Vision Processing Units.
1. Model Optimizer
The Model Optimizer is a cross-platform command-line tool that facilitates the transition between the training and deployment environment, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices. It is a Python script which takes as input a trained Tensorflow/Caffe model and produces an Intermediate Representation (IR) which consists of a .xml file containing the model definition and a .bin file containing the model weights.
2. Inference Engine
The Inference Engine is a C++ library with a set of C++ classes to infer input data (images) and get a result. The C++ library provides an API to read the Intermediate Representation, set the input and output formats, and execute the model on devices. Each supported target device has a plugin, which is a DLL/shared library. The engine also supports heterogeneous execution to distribute the workload across devices, and it supports implementing custom layers on a CPU while executing the rest of the model on an accelerator device.
- Using the Model Optimizer, convert a trained model to produce an optimized Intermediate Representation (IR) of the model based on the trained network topology, weights, and bias values.
- Test the model in the Intermediate Representation format using the Inference Engine in the target environment with the validation application or the sample applications.
- Integrate the Inference Engine into your application to deploy the model in the target environment.
Using the Model Optimizer to convert a Keras model to IR
The Model Optimizer doesn’t natively support Keras model files. However, because Keras uses TensorFlow as its backend, a Keras model can be saved as a TensorFlow checkpoint, which can then be loaded into the Model Optimizer. A Keras model can be converted to an IR using the following steps:
1. Save the Keras model as a TensorFlow checkpoint. Make sure the learning phase is set to 0. Get the name of the output node.
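import tensorflow as tf
from keras.applications import ResNet50
from keras import backend as K
from keras.models import Model
K.set_learning_phase(0)  # Set the learning phase to 0 (inference mode)
# The ImageNet weights with the default classifier head expect 224x224 RGB input
model = ResNet50(weights='imagenet', input_shape=(224, 224, 3))
config = model.get_config()
weights = model.get_weights()
model = Model.from_config(config)  # ResNet50 is a functional model, not a Sequential one
model.set_weights(weights)  # Restore the trained weights in the rebuilt model
output_node = model.output.name.split(':')[0]  # We need this in the next step
graph_file = "resnet50_graph.pb"
ckpt_file = "resnet50.ckpt"
sess = K.get_session()  # The TensorFlow session that holds the Keras graph
saver = tf.train.Saver(sharded=True)
saver.save(sess, ckpt_file)  # Write the checkpoint
tf.train.write_graph(sess.graph_def, '', graph_file)  # Write the graph definition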
2. Run the TensorFlow freeze_graph program to generate a frozen graph from the saved checkpoint. The value passed to --output_node_names is Softmax in this example; use the value of output_node from step 1 if it differs.
tensorflow/bazel-bin/tensorflow/python/tools/freeze_graph --input_graph=./resnet50_graph.pb --input_checkpoint=./resnet50.ckpt --output_node_names=Softmax --output_graph=resnet50_frozen.pb
3. Use the mo.py script and the frozen graph to generate the IR. The model weights can be quantized to FP16.
python mo.py --input_model=resnet50_frozen.pb --output_dir=./ --input_shape=[1,224,224,3] --data_type=FP16
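To get a feel for what quantizing weights to FP16 means for the memory footprint, here is a minimal sketch that uses only the Python standard library. It is not how the Model Optimizer works internally; the helper names and the sample weight values are assumptions made up for illustration.

```python
import struct

def to_fp32_bytes(weights):
    """Pack floats as little-endian IEEE 754 single-precision values (4 bytes each)."""
    return struct.pack("<%df" % len(weights), *weights)

def to_fp16_bytes(weights):
    """Pack floats as little-endian IEEE 754 half-precision values (2 bytes each)."""
    return struct.pack("<%de" % len(weights), *weights)

def from_fp16_bytes(blob):
    """Unpack half-precision bytes back into Python floats."""
    return list(struct.unpack("<%de" % (len(blob) // 2), blob))

# Hypothetical weights, chosen only for illustration.
weights = [0.1, -0.25, 0.333, 1.5, -0.0078125]

fp32_blob = to_fp32_bytes(weights)
fp16_blob = to_fp16_bytes(weights)
restored = from_fp16_bytes(fp16_blob)

# Largest absolute difference introduced by the lower precision.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("FP32 bytes:", len(fp32_blob))
print("FP16 bytes:", len(fp16_blob))
print("max rounding error:", max_error)
```

The half-precision copy takes half the space, at the cost of a small rounding error on values that are not exactly representable in 16 bits.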
The C++ library provides utilities to read an IR, select a plugin depending on the target device, and run the model.
- Read the Intermediate Representation – Using the InferenceEngine::CNNNetReader class, read an Intermediate Representation file into a CNNNetwork class. This class represents the network in host memory.
- Prepare inputs and outputs format – After loading the network, specify the input and output precision and the layout on the network. For these specifications, use CNNNetwork::getInputInfo() and CNNNetwork::getOutputInfo().
- Select Plugin – Select the plugin on which to load your network. Create the plugin with the InferenceEngine::PluginDispatcher load helper class. Pass per device loading configurations specific to this device and register extensions to this device.
- Compile and Load – Use the plugin interface wrapper class InferenceEngine::InferencePlugin to call the LoadNetwork() API to compile and load the network on the device. Pass in the per-target load configuration for this compilation and load operation.
- Set input data – With the network loaded, you have an ExecutableNetwork object. Use this object to create an InferRequest in which you specify the buffers to use for input and output. You can either use device-allocated memory and copy your data into it directly, or tell the device to use your application memory to save a copy.
- Execute – With the input and output memory now defined, choose your execution mode:
- Synchronously – Infer() method. Blocks until inference finishes.
- Asynchronously – StartAsync() method. Check status with the Wait() method (0 timeout), wait, or specify a completion callback.
- Get the output – After inference is completed, get the output memory or read the memory you provided earlier. Do this with the InferRequest GetBlob API.
The classification_sample and classification_sample_async programs perform inference using the steps mentioned above. We use these samples in the next section to perform inference on an Intel® FPGA.
Using the Intel® Programmable Acceleration Card with Intel® Arria® 10GX FPGA for inference
The OpenVINO toolkit supports using the PAC as a target device for running low power inference. The steps for setting up the card are detailed here. Pre-processing and post-processing are performed on the host, while the model itself is executed on the card. The toolkit contains bitstreams for different topologies.
1. Program the bitstream.
aocl program <device_id> <open_vino_install_directory>/a10_dcp_bitstreams/2-0-1_RC_FP16_ResNet50-101.aocx
2. The Hetero plugin can be used with the CPU as the fallback device for layers that are not supported by the FPGA. Adding the -pc flag prints performance details for each layer.
./classification_sample_async -d HETERO:FPGA,CPU -i <path/to/input/image.png> -m <path/to/ir>/resnet50_frozen.xml
Intel® OpenVINO™ toolkit is a great way to quickly integrate trained models into applications and deploy them in different production environments. The complete documentation for the toolkit can be found at https://software.intel.com/en-us/openvino-toolkit/documentation/featured.
Can anybody point me to information (a learning walkthrough) on machine learning sandboxing systems? I would like to understand how machine learning content analytics can scan for malware and isolate suspicious traffic in a sandbox, so that its next-level behavior can be identified and exploitation prevented.
What are the components of machine learning?
How would machine learning work for content analysis and sandboxing?
Explaining the relationship between machine learning and artificial intelligence is one of the most challenging tasks I encounter when talking to people new to these topics. I don’t pretend to have the definitive answer, but I have developed a story that seems to get enough affirmative nods that I want to share it here.
The diagram above has appeared in many introductory books and articles that I’ve seen. I have reproduced it here to highlight the challenge of talking about “subsets” of abstract concepts, none of which have widely accepted definitions. So, what does this graphic mean or imply? How is deep learning a subset of artificial intelligence? These are the questions I’m going to try to answer during the rest of this article by telling you a story I use for briefings on artificial intelligence.
Since so many people have read about and studied examples of using deep learning for image classification, that is my starting point. I am not, however, going to talk about cats and dogs, so please hang with me for a bit longer. I’m going to use an example of facial recognition. My scenario is that there is a secure area in a building that only 4 people (Angela, Chris, Lucy and Marie) are permitted to enter. We want to use facial recognition to determine if someone attempting to gain access should be allowed in. You and I can easily look at a picture and say whether it is someone we know. But how does a deep learning model do that, and how could we use the result of the model to create an artificial intelligence application?
I frequently use the picture below to discuss the use of deep neural networks for model training in supervised classification. When looking at the network, consider that the goal of all machine learning and deep learning is to transform input data into some meaningful output. For facial recognition, the input data is a representation of the pixel intensity and color or grey scale value from a picture, and the output is the probability that the picture is Angela, Chris, Lucy or Marie. That means we are going to have to train the network using recent photos of these four people.
A highly stylized neural network representation
The picture above is a crude simplification of how a modern convolutional neural network (ConvNet) used for image recognition would be constructed. However, it is useful for highlighting many of the important elements of what we mean by transforming raw data into meaningful outputs. For example, each line or edge drawn between the neurons of each layer represents a weight (parameter) that must be calculated during training. These weights are the primary mechanism used to transform the input data into something useful. Because this picture only includes 5 layers with fewer than 10 nodes per layer, it is easy to visualize how fully connected layers can quickly increase the number of weights that must be computed. The ConvNets in widespread use today typically have from 16 to 200 or more layers, although not all fully connected for the deeper designs, and can have tens of millions to hundreds of millions of weights or more.
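To make the growth in weights concrete, here is a rough sketch that counts the parameters of a fully connected network from its layer sizes. The layer sizes below are hypothetical: the first roughly matches the small picture (5 layers, fewer than 10 nodes each, 4 outputs), and the second is an invented network that takes a 224×224 color image as a flat input.

```python
def count_dense_parameters(layer_sizes, include_bias=True):
    """Count weights (and optionally biases) in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out      # one weight per edge between adjacent layers
        if include_bias:
            total += n_out         # one bias per neuron in the receiving layer
    return total

# Hypothetical layer sizes, chosen only for illustration.
small_network = [8, 9, 9, 9, 4]
image_network = [224 * 224 * 3, 4096, 4096, 4]

small_weights = count_dense_parameters(small_network, include_bias=False)
small_total = count_dense_parameters(small_network)
image_weights = count_dense_parameters(image_network, include_bias=False)

print("small network weights:", small_weights)
print("small network weights + biases:", small_total)
print("image network weights:", image_weights)
```

Even with only two hidden layers, the second network needs hundreds of millions of weights, which is one reason convolutional layers are used instead of connecting every pixel to every neuron.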
We need that many weights to “meaningfully” transform the input data since the image is broken down into many small regions of pixels (typically 3×3 or 5×5) before being ingested by the input layer. The numerical representation of the pixel values is then transformed by the weights so that the output of the transformation indicates whether this region of pixels adds to the evidence that this is a picture of Angela or reduces the likelihood that this is Angela. If Angela has black hair and the network does not detect many regions of solid black color, then there will not be much evidence that this picture is Angela.
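The idea of breaking an image into small regions and transforming each one with weights can be sketched in a few lines. This is a simplified illustration and not a real ConvNet layer: the tiny 4×4 "image" and the all-ones weights are assumptions picked so the arithmetic is easy to follow.

```python
def extract_patches(image, size=3):
    """Return every size x size region of a 2D image (list of rows)."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            patches.append([image[r + i][c:c + size] for i in range(size)])
    return patches

def apply_weights(patch, weights):
    """Weighted sum of one patch: the core operation of a convolution."""
    return sum(
        pixel * weight
        for patch_row, weight_row in zip(patch, weights)
        for pixel, weight in zip(patch_row, weight_row)
    )

# Hypothetical 4x4 grey scale image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 3x3 weights; training would normally learn these values.
weights = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

patches = extract_patches(image, size=3)
responses = [apply_weights(patch, weights) for patch in patches]
print(responses)
```

Regions that overlap more of the bright side produce a larger response, which is the kind of per-region evidence the text describes.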
Finally, I want to tie everything discussed so far to an explanation of the output layer. In the picture above, there are 4 neurons in the output layer, and that is why I set up my facial recognition story to have 4 people that I am trying to recognize. During training I have a set of pictures that have been labeled with the correct name. One way to look at how I might do that is like this:
Table 1 – Representation of labeled training data
The goal during training is to come up with a single set of weights that will transform the data from every picture in the training data set into a set of four values (a vector) for each picture, where the values match the labels assigned above as closely as possible. For Picture1 the first value is 1 and the other three are zeros, and for Picture2 the first 3 values are zero and the fourth value is 1. With the label for Picture1, we are telling the model that we are 100% sure (probability = 1) that this is a picture of Angela and certain that it is not Chris, Lucy, or Marie (probability = 0). The training process tries to find a set of weights that will transform the pixel data for Picture1 into the vector (1,0,0,0) and Picture2 into the vector (0,0,0,1), and so on for the entire data set.
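The label vectors described above are usually called one-hot encodings. A minimal sketch of how they could be built is shown here; it assumes the four positions follow the order Angela, Chris, Lucy, Marie, so that the Picture2 label (0,0,0,1) corresponds to Marie. That ordering is an assumption based on how the names are listed in the text.

```python
# Assumed class order; position i of every label vector refers to PEOPLE[i].
PEOPLE = ["Angela", "Chris", "Lucy", "Marie"]

def one_hot(name, classes=PEOPLE):
    """Return a vector with 1 at the position of `name` and 0 elsewhere."""
    if name not in classes:
        raise ValueError("unknown person: %s" % name)
    return [1 if person == name else 0 for person in classes]

# Labels for the two pictures mentioned in the text.
training_labels = {
    "Picture1": one_hot("Angela"),
    "Picture2": one_hot("Marie"),
}

for picture, label in training_labels.items():
    print(picture, label)
```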
Of course, no deep learning model training algorithm can do that exactly because of variations in the data, so we try to get as close as possible for each input image. The process of testing a model with known data or processing new unlabeled images is called inferencing. When we pass in unlabeled data we get back a list of four probabilities that reflect the evidence in the data that the image is one of the four known people. For example, we might get something back like (.5, .25, .15, .1). For most classification algorithms the set of probabilities will add to 1. What does this result tell us?
Our model says we are most confident that the unlabeled picture is Angela since that is the outcome with the highest probability, but it also tells us that we can only be 50% sure that it is not one of the other three people. What does it mean if we get an inference result back that is (.25, .25, .25, .25)? This result tells us the model can’t do better than a random process like picking a number between 1 and 4. This picture could be any one of our known people or it could be a picture of a truck. The model provides us no information. How intelligent is that? This is where the connection with artificial intelligence gets interesting.
What we would like to achieve is getting back inference predictions where one value is very close to 1 and all the others are very close to zero. Then we have high confidence that the person requesting access to a restricted area is one of our authorized employees. That is rarely the case, so we must deal with uncertainty in the applications that use our trained machine learning models. If the area that we are securing is the executive dining room, then perhaps we want to open the door even if we are only 50% sure that the person requesting access is one of our known people. If the application is securing access to sensitive computer and communication equipment, then perhaps we want to set a threshold of 90% certainty before we unlock the door. The important point is that machine learning alone is usually not sufficient to build an intelligent application. The feared day when machines get smarter than people, and are therefore able to make “better” decisions, is still a long way off, maybe a very long way…
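The application logic wrapped around the model can be sketched as follows. This is only an illustration of the thresholding idea: the softmax function is a common way to turn raw network scores into probabilities that add to 1, and the function names and the order of the people are assumptions, not part of any particular product.

```python
import math

# Assumed class order for the four output neurons.
PEOPLE = ["Angela", "Chris", "Lucy", "Marie"]

def softmax(scores):
    """Convert raw output scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]  # shift for stability
    total = sum(exps)
    return [value / total for value in exps]

def access_decision(probabilities, threshold):
    """Return the recognized name if the top probability meets the threshold, else None."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best] >= threshold:
        return PEOPLE[best]
    return None

inference_result = [0.5, 0.25, 0.15, 0.1]   # the example from the text
uniform_result = softmax([0.0, 0.0, 0.0, 0.0])

dining_room = access_decision(inference_result, threshold=0.5)
server_room = access_decision(inference_result, threshold=0.9)
no_information = access_decision(uniform_result, threshold=0.5)

print("dining room (50% threshold):", dining_room)
print("server room (90% threshold):", server_room)
print("uniform output:", uniform_result, "->", no_information)
```

The same model output opens the dining room door but not the equipment room, which is the point of the paragraph above: the decision policy lives in the application, not in the model.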
The process of training a deep neural network is akin to finding the minimum of a function in a very high-dimensional space. Deep neural networks are usually trained using stochastic gradient descent (or one of its variants). A small batch (usually 16-512), randomly sampled from the training set, is used to approximate the gradients of the loss function (the optimization objective) with respect to the weights. The computed gradient is essentially an average of the gradients for each data-point in the batch. The natural way to parallelize the training across multiple nodes/workers is to increase the batch size and have each node compute the gradients on a different chunk of the batch. Distributed deep learning differs from traditional HPC workloads where scaling out only affects how the computation is distributed but not the outcome.
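The data-parallel scheme described above can be illustrated with a toy model. The sketch below assumes a one-parameter linear model with squared error and equally sized chunks per worker; under those assumptions, averaging the per-worker gradients gives the same result as computing the gradient over the whole batch. The data values are made up.

```python
import math

def sample_gradient(w, x, y):
    """Gradient of the squared error (w*x - y)^2 with respect to w."""
    return 2.0 * (w * x - y) * x

def batch_gradient(w, batch):
    """Average gradient over all data points in a batch."""
    return sum(sample_gradient(w, x, y) for x, y in batch) / len(batch)

def distributed_gradient(w, batch, num_workers):
    """Split the batch into equal chunks, compute one gradient per worker, then average."""
    chunk = len(batch) // num_workers
    if chunk * num_workers != len(batch):
        raise ValueError("batch size must be divisible by the number of workers")
    worker_gradients = [
        batch_gradient(w, batch[i * chunk:(i + 1) * chunk])
        for i in range(num_workers)
    ]
    return sum(worker_gradients) / num_workers

# Hypothetical (x, y) pairs and weight, chosen only for illustration.
batch = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 1.5

single_node = batch_gradient(w, batch)
two_workers = distributed_gradient(w, batch, num_workers=2)
print(single_node, two_workers)
print(math.isclose(single_node, two_workers))
```

The computation is the same either way. What changes the training outcome is that adding workers usually means a larger total batch, which is the subject of the next section.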
Challenges of large-batch training
It has been consistently observed that the use of large batches leads to poor generalization performance, meaning that models trained with large batches perform poorly on test data. One of the primary reasons for this is that large batches tend to converge to sharp minima of the training function, which tend to generalize less well. Small batches tend to favor flat minima that result in better generalization. The stochasticity afforded by small batches encourages the weights to escape the basins of attraction of sharp minima. Also, models trained with small batches are shown to converge farther away from the starting point. Large batches tend to be attracted to the minimum closest to the starting point and lack the explorative properties of small batches.
The number of gradient updates per pass of the data is reduced when using large batches. This is sometimes compensated for by scaling the learning rate with the batch size, but simply using a higher learning rate can destabilize the training. Another approach is to just train the model longer, but this can lead to overfitting. Thus, there’s much more to distributed training than just scaling out to multiple nodes.
An illustration showing how sharp minima lead to poor generalization. The sharp minimum of the training function corresponds to a maximum of the testing function which hurts the model’s performance on test data. 
How can we make large batches work?
There has been a lot of interesting research recently in making large-batch training more feasible. The training time for ImageNet has now been reduced from weeks to minutes by using batches as large as 32K without sacrificing accuracy. The following methods are known to alleviate some of the problems described above:
- Scaling the learning rate 
The learning rate is multiplied by k when the batch size is multiplied by k. However, this rule does not hold in the first few epochs of the training since the weights are changing rapidly. This can be alleviated by using a warm-up phase. The idea is to start with a small value of the learning rate and gradually ramp up to the linearly scaled value.
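A minimal sketch of linear scaling with warm-up might look like the following. The choice to start the warm-up at the unscaled base rate, to ramp linearly, and to use 5 warm-up epochs are assumptions made for this example, as are the parameter names and the sample values.

```python
import math

def scaled_learning_rate(base_lr, base_batch, batch, epoch, warmup_epochs=5):
    """Linearly scaled learning rate with a linear warm-up phase.

    base_lr is the rate that works for base_batch; the target rate is
    base_lr * (batch / base_batch).
    """
    k = batch / base_batch
    target_lr = base_lr * k
    if epoch < warmup_epochs:
        # Ramp from the small base rate up to the scaled rate.
        return base_lr + (target_lr - base_lr) * epoch / warmup_epochs
    return target_lr

# Hypothetical settings: a rate of 0.1 tuned for batch size 256, scaled to 8192.
schedule = [scaled_learning_rate(0.1, 256, 8192, epoch) for epoch in range(8)]
for epoch, lr in enumerate(schedule):
    print(epoch, round(lr, 3))
```

Here k is 32, so the rate climbs from 0.1 to 3.2 over the first five epochs and then stays at the scaled value.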
- Layer-wise adaptive rate scaling 
A different learning rate is used for each layer. A global learning rate is chosen and it is scaled for each layer by the ratio of the Euclidean norm of the weights to the Euclidean norm of the gradients for that layer.
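The per-layer scaling rule stated above can be written out directly. This sketch implements only that ratio; the published method includes further terms that are omitted here, and the fallback for a zero gradient norm, the function names, and the sample values are assumptions.

```python
import math

def l2_norm(values):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(value * value for value in values))

def layer_learning_rate(global_lr, weights, gradients):
    """Scale the global rate by ||weights|| / ||gradients|| for one layer."""
    gradient_norm = l2_norm(gradients)
    if gradient_norm == 0.0:
        return global_lr  # assumption: nothing to scale when the gradient is zero
    return global_lr * l2_norm(weights) / gradient_norm

global_lr = 0.01
# Hypothetical layers: (weights, gradients), flattened to lists.
layers = {
    "layer_a": ([3.0, 4.0], [0.6, 0.8]),   # large weights, small gradients
    "layer_b": ([0.3, 0.4], [6.0, 8.0]),   # small weights, large gradients
}

rates = {
    name: layer_learning_rate(global_lr, weights, gradients)
    for name, (weights, gradients) in layers.items()
}
for name, rate in rates.items():
    print(name, rate)
```

A layer whose gradients are small relative to its weights gets a larger step, and a layer with relatively large gradients gets a smaller one.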
- Using regular SGD with momentum rather than Adam
Adam is known to make convergence faster and more stable. It is usually the default optimizer choice when training deep models. However, Adam seems to settle into less optimal minima, especially when using large batches. Using regular SGD with momentum, although noisier than Adam, has shown improved generalization.
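For readers who have only used Adam through a framework default, a sketch of the plain SGD with momentum update may help. This uses one common formulation (accumulate the gradient into a velocity, then step along the velocity); frameworks differ in the details, and the toy objective and hyperparameters are assumptions.

```python
import math

def sgd_momentum_step(weight, velocity, gradient, lr=0.1, momentum=0.9):
    """One SGD-with-momentum update for a single scalar weight."""
    velocity = momentum * velocity + gradient   # running accumulation of gradients
    weight = weight - lr * velocity             # step along the accumulated direction
    return weight, velocity

def gradient_of_square(w):
    """Gradient of the toy objective f(w) = w^2."""
    return 2.0 * w

weight, velocity = 1.0, 0.0
history = []
for _ in range(2):
    weight, velocity = sgd_momentum_step(weight, velocity, gradient_of_square(weight))
    history.append(weight)

print(history)
```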
- Topologies also make a difference
In a previous blog post, my colleague Luke showed how using VGG16 instead of DenseNet121 considerably sped up the training for a model that identified thoracic pathologies from chest x-rays while improving area under ROC in multiple categories. Shallow models are usually easier to train, especially when using large batches.
Large-batch distributed training can significantly reduce training time but it comes with its own challenges. Improving generalization when using large batches is an active area of research, and as new methods are developed, the time to train a model will keep going down.
- On large-batch training for deep learning: Generalization gap and sharp minima. Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. 2016. arXiv preprint arXiv:1609.04836.
- Accurate, large minibatch SGD: Training ImageNet in 1 hour. Priya Goyal, Piotr Dollar, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. 2017. arXiv preprint arXiv:1706.02677.
- Large batch training of convolutional networks. Yang You, Igor Gitman, and Boris Ginsburg. 2017. arXiv preprint arXiv:1708.03888.
I’m looking to generate the Advanced Machine Learning report referenced here: https://support.symantec.com/en_US/article.HOWTO125816.html
I’ve followed the steps of Scheduled Reports > Add > Computer Status, but there is no option for Advanced Machine Learning (Static) Content Distribution.
We’re currently on 14.0.1 as suggested. And I confirmed all of the required AML settings are enabled in our environment. Am I missing something or has this report option been removed? Any help would be appreciated.
Analytics – A journey to AI
Artificial intelligence (AI) has been around in concept since the 1950s, when Arthur L. Samuel created a learning algorithm that allowed a machine to beat the local state checkers champion. Yet, it took the largest supercomputer at that time and all its compute power to run that single algorithm to teach the machine how to play. The barrier to entry was so high that it was out of reach for most business and research facilities, and thus it never took off. Fast forward to today: the game has changed; the cost of compute …
Deep learning has exploded across both the popular and business media landscapes. Current and upcoming technology capable of powering the calculations required by deep learning algorithms has enabled a rapid transition from new theories to new applications. One of the supporting technologies that is expanding at an increasing rate is faster and more use-case-specific hardware accelerators for deep learning, such as GPUs with tensor cores and FPGAs hosted inside of servers. Another foundational deep learning technology that has advanced very rapidly is the software that enables implementations of complex deep learning networks. New frameworks, tools and applications are entering the landscape quickly to accomplish this, some compatible with existing infrastructure and others that require workflow overhauls.
As organizations begin to develop more complex strategies for incorporating deep learning, they are likely to start leveraging multiple frameworks and application stacks for specific use cases and to compare performance and accuracy. But training models is time consuming and ties up expensive compute resources. In addition, adjustments and tuning can vary between frameworks, creating a large number of framework knobs and levers to learn how to operate. What if there was a framework that could just consume these models right out of the box?
BigDL is a distributed deep learning framework with native Spark integration, allowing it to leverage Spark during model training, prediction, and tuning. One of the things that I really like about Intel BigDL is how easy it is to work with models built and/or trained in Tensorflow, Caffe and Torch. This rich interop support for deep learning models allows BigDL applications to leverage the plethora of models that currently exist with little or no additional effort. Here are just a few ways this might be used in your applications:
- Efficient Scale Out – Using BigDL you can scale out a model that was trained on a single node or workstation and leverage it at scale with Apache Spark. This can be useful for training on a large distributed dataset that already exists in your HDFS environment or for performing inferencing such as prediction and classification on a very large and often changing dataset.
- Transfer Learning – Load a pretrained model with weights and then freeze some layers, append new layers and train / retrain layers. Transfer learning can improve accuracy or reduce training time by allowing you to start with a model that is used to do one thing, such as classify different objects, and use it to accelerate development of a model to classify something else, such as specific car models.
- High Performance on CPU – GPUs get all of the hype when it comes to deep learning. By leveraging Intel MKL and multi-threaded Spark tasks, you can achieve better CPU-driven performance with BigDL than you would see with TensorFlow, Caffe or Torch when using Xeon processors.
- Dataset Access – Designed to run in Hadoop, BigDL can compute where your data already exists. This can save time and effort since data does not need to be transferred to a separate GPU environment to be used with the deep learning model. This means that your entire pipeline, from ingest to model training and inference, can happen in one environment: Hadoop.
Real Data + Real Problem
Recently I had a chance to take advantage of the model portability feature of BigDL. After learning of an internal project here at Dell EMC, leveraging deep learning and telemetry data to predict component failures, my team decided we wanted to take our Ready Solution for AI – Machine Learning with Hadoop and see how it did with the problem.
The team conducting the project for our support organization was using Tensorflow with GPU accelerators to train an LSTM model. The dataset was sensor readings from internal components collected at 15 minute intervals showing all kinds of metrics like temperature, fan speeds, runtimes, faults etc.
Initially my team wanted to focus on testing out two use cases for BigDL:
- Using BigDL model portability to perform inference using the existing tensorflow model
- Implement an LSTM model in BigDL and train it with this dataset
As always, there were some preprocessing and data cleaning steps that had happened before we could get to modeling and inference. Luckily for us though we received the clean output of those steps from our support team to get started quickly. We received the data in the form of multiple csv files, already balanced with records of devices that did fail and those that did not. We got over 200,000 rows of data that looked something like this:
Converting the data to the tfrecord format used by TensorFlow was being done with Python and pandas dataframes. Moving this process into Spark is another area we knew we wanted to dig into, but to start we wanted to focus on the goals mentioned above. When we started, the pipeline looked like this:
From TensorFlow to BigDL
For BigDL, instead of creating tfrecords we needed to end up with an RDD of Samples. Each Sample is one record of your dataset in the form of a feature and a label. The feature and the label each consist of one or more tensors, and we create the Sample from ndarrays. Looking at the existing pipeline, we were able to simply take the arrays created just before the tfrecord write and pass them to a new function that forms our RDD of Samples for BigDL.
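from bigdl.util.common import Sample

def convert_to(x, y):
    # Pair each feature array with its label (sc is the active SparkContext).
    record = list(zip(x, y))
    record_rdd = sc.parallelize(record)
    # Build one BigDL Sample per (feature, label) pair.
    sample_rdd = record_rdd.map(lambda r: Sample.from_ndarray(r[0], r[1]))
    return sample_rdd

train = convert_to(x_train, y_train)
val = convert_to(x_val, y_val)
test = convert_to(x_test, y_test)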
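To make the feature and label pairing concrete without a Spark cluster, here is a minimal plain-Python sketch of the same idea. The Record type is a hypothetical stand-in for a BigDL Sample, and the toy readings and the label encoding (2 for failed, 1 for healthy) are assumptions made up for illustration, not values from our dataset.

```python
from collections import namedtuple

# Hypothetical stand-in for a BigDL Sample: one feature and one label.
Record = namedtuple("Record", ["feature", "label"])


def pair_records(sequences, labels):
    """Pair each feature sequence with its label, one record per sequence."""
    if len(sequences) != len(labels):
        raise ValueError("sequences and labels must have the same length")
    return [Record(feature=seq, label=lab) for seq, lab in zip(sequences, labels)]


# Two toy sequences of three time steps, each step holding two sensor readings.
x_demo = [
    [[41.0, 3200.0], [42.5, 3250.0], [47.0, 3900.0]],
    [[39.0, 3100.0], [39.5, 3100.0], [39.0, 3150.0]],
]
y_demo = [2, 1]  # assumed encoding: 2 = failed, 1 = healthy

records = pair_records(x_demo, y_demo)
for rec in records:
    print(len(rec.feature), "time steps, label", rec.label)
```

The point of the sketch is that each record carries its own feature and its own label, which is why the mapping step has to read both elements of the pair rather than reuse one of them.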
After that, we took the pb and bin files representing the pretrained model's definition and weights and loaded them using the BigDL Model.load_tensorflow function. It requires knowing the input and output names for the model, but the TensorFlow graph summary tool can help with that. It also specifically requires a pb and a bin file; if what you have is a ckpt file from TensorFlow, it can be converted with tools provided by BigDL.
from bigdl.nn.layer import Model

model_def = "tf_model/model.pb"
model_variable = "tf_model/model.bin"
inputs = ["Placeholder"]
outputs = ["prediction/Softmax"]
trained_tf_model = Model.load_tensorflow(model_def, inputs, outputs,
                                         byte_order="little_endian",
                                         bigdl_type="float",
                                         bin_file=model_variable)
Now, with our data already in the correct format, we can run inference against our test dataset. BigDL provides Model.evaluate, and we can pass it our RDD, a batch size and the validation method to use, in this case Top1Accuracy.
results = trained_tf_model.evaluate(test, 128, [Top1Accuracy()])
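For readers who have not met the metric before, the following is a small plain-Python sketch of what top-1 accuracy measures. It is not BigDL's implementation. The function name, the toy probabilities and the 1-based label encoding are assumptions for illustration; as far as I understand, BigDL follows the Torch convention of class labels starting at 1, but check this against your own label encoding.

```python
def top1_accuracy(probabilities, labels):
    """Fraction of records whose highest-scoring class equals the true label.

    Labels are assumed to be 1-based (class 1, class 2, ...).
    """
    if not labels or len(probabilities) != len(labels):
        raise ValueError("need one non-empty row of scores per label")
    correct = 0
    for row, label in zip(probabilities, labels):
        # Index of the largest score, shifted to the assumed 1-based classes.
        predicted = max(range(len(row)), key=row.__getitem__) + 1
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy softmax outputs for four records and two classes.
demo_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
demo_labels = [1, 2, 2, 2]

accuracy = top1_accuracy(demo_probs, demo_labels)
print(accuracy)  # 3 of 4 predictions match, so 0.75
```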
Defining a Model with BigDL
After testing out loading the pretrained TensorFlow model, the next experiment we wanted to conduct was to train an LSTM model defined with BigDL. BigDL provides a Sequential API and a Functional API for defining models. The Sequential API is for simpler models, while the Functional API, which describes the model as a graph, is better for complex models. Since our LSTM model is a simple stack of layers, we will use the Sequential API.
Defining an LSTM model is as simple as:
def build_model(input_size, hidden_size, output_size):
    model = Sequential()
    recurrent = Recurrent()
    recurrent.add(LSTM(input_size, hidden_size))
    model.add(InferReshape([-1, input_size], True))
    model.add(recurrent)
    model.add(Select(2, -1))
    model.add(Linear(hidden_size, output_size))
    return model

lstm_model = build_model(n_input, n_hidden, n_classes)
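It can help to trace how the shape of a batch changes as it flows through these layers. The sketch below does only shape arithmetic in plain Python; it is not BigDL code. It reflects my understanding of the layers: InferReshape in batch mode keeps the batch dimension and infers the -1, Recurrent with LSTM returns one hidden vector per time step, Select(2, -1) keeps the last time step using 1-based dimensions, and Linear maps the hidden size to the number of classes. The helper names and the example sizes (128 records, 24 time steps, 6 sensors, 32 hidden units, 2 classes) are hypothetical.

```python
def infer_reshape(shape, size, batch_mode=True):
    """Resolve a single -1 in `size`, keeping the batch dimension in batch mode."""
    batch = shape[:1] if batch_mode else ()
    rest = shape[1:] if batch_mode else shape
    total = 1
    for dim in rest:
        total *= dim
    known = 1
    for dim in size:
        if dim != -1:
            known *= dim
    if size.count(-1) > 1:
        raise ValueError("only one dimension may be inferred")
    if -1 in size:
        if total % known != 0:
            raise ValueError("input does not divide evenly into the target size")
    elif known != total:
        raise ValueError("target size does not match the input")
    return tuple(batch) + tuple(total // known if dim == -1 else dim for dim in size)


def recurrent_lstm(shape, input_size, hidden_size):
    """An LSTM unrolled over time: (batch, time, input) -> (batch, time, hidden)."""
    batch, steps, features = shape
    if features != input_size:
        raise ValueError("last dimension must equal input_size")
    return (batch, steps, hidden_size)


def select(shape, dim, index):
    """Pick one index along a 1-based dimension; negative indexes count from the end."""
    length = shape[dim - 1]
    position = index if index > 0 else length + index + 1
    if not 1 <= position <= length:
        raise ValueError("index out of range")
    return shape[:dim - 1] + shape[dim:]


def linear(shape, in_features, out_features):
    """A fully connected layer applied to the last dimension."""
    if shape[-1] != in_features:
        raise ValueError("last dimension must equal in_features")
    return shape[:-1] + (out_features,)


def trace_shapes(batch, flat_len, input_size, hidden_size, output_size):
    """Return the shape after each layer of the model defined above."""
    shapes = [(batch, flat_len)]
    shapes.append(infer_reshape(shapes[-1], [-1, input_size], True))
    shapes.append(recurrent_lstm(shapes[-1], input_size, hidden_size))
    shapes.append(select(shapes[-1], 2, -1))
    shapes.append(linear(shapes[-1], hidden_size, output_size))
    return shapes


# Hypothetical sizes: 24 time steps of 6 sensor readings, flattened to 144 values.
for step in trace_shapes(batch=128, flat_len=144, input_size=6,
                         hidden_size=32, output_size=2):
    print(step)
```

Reading the trace from top to bottom shows why Select is needed: the LSTM produces an output for every time step, but the classifier only wants the final one.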
After creating our model the next step is the optimizer and validation logic that our model will use to train and learn.
Create the optimizer:
optimizer = Optimizer(
    model=lstm_model,
    training_rdd=train,
    criterion=CrossEntropyCriterion(),
    optim_method=Adam(),
    end_trigger=MaxEpoch(50),
    batch_size=batch_size)
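As background on the criterion, here is a plain-Python sketch of cross-entropy computed from raw class scores, which is what a final Linear layer produces. It follows the usual Torch-style definition (log-softmax followed by negative log-likelihood) and assumes 1-based labels. I have not checked it line by line against BigDL's own implementation, so treat it as an illustration rather than a reference; the function names and toy scores are made up.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw class scores."""
    peak = max(scores)
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_sum for s in scores]


def cross_entropy(scores, label):
    """Negative log-probability of the true class; `label` is assumed 1-based."""
    if not 1 <= label <= len(scores):
        raise ValueError("label must be between 1 and the number of classes")
    return -log_softmax(scores)[label - 1]


def mean_cross_entropy(batch_scores, labels):
    """Average the per-record loss over a mini-batch."""
    if not labels or len(batch_scores) != len(labels):
        raise ValueError("need one non-empty row of scores per label")
    losses = [cross_entropy(s, l) for s, l in zip(batch_scores, labels)]
    return sum(losses) / len(losses)


# Toy scores for two classes: a confident correct guess and a confident wrong one.
good = cross_entropy([4.0, -1.0], 1)
bad = cross_entropy([4.0, -1.0], 2)
print(good < bad)  # the correct, confident prediction is penalized far less
```

The optimizer's job during training is to adjust the model weights so that this average loss over each mini-batch goes down.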
Set the validation logic:
optimizer.set_validation(
    batch_size=batch_size,
    val_rdd=val,
    trigger=EveryEpoch(),
    val_method=[Top1Accuracy()])
Now we can call trained_model = optimizer.optimize() to train our model, in this case for 50 epochs. We also set a TrainSummary folder so that the training data was logged, which let us view visualizations in TensorBoard, something that BigDL supports.
At this point we had completed the two initial tasks we set out to do: load a pretrained TensorFlow model using BigDL and train a new model with BigDL. Hopefully you found some of this process interesting and got an idea of how easy BigDL is for this use case. The ability to run deep learning models inside Hadoop with no specialized hardware such as InfiniBand or GPU accelerators provides a great tool that is sure to change the way you view your existing analytics.