What’s the Connection Between Big Data and AI?

When people talk about big data, are they simply referring to numbers and metrics?

Yes.

And no.

Technically, big data is simply bits and bytes—literally, a massive amount (petabytes or more) of data. But to dismiss big data as mere ones and zeroes misses the point. Big data may physically be a collection of numbers, but when placed against proper context, those numbers take on a life of their own.

This is particularly true in the realm of artificial intelligence (AI). AI and big data are intrinsically connected; without big data, AI simply couldn’t learn. The team in charge of Oracle’s Cloud Business Group (CBG) Product Marketing likens big data to the human experience: on Oracle’s Practical Path To AI podcast episode Connecting the Dots Between Big Data and AI, team members compare the AI learning process to the way humans learn from experience.

The short version: the human brain ingests countless experiences every moment. Everything taken in by the senses is technically a piece of information or data—a note of music, a word in a book, a drop of rain, and so on. Infant brains learn from the moment they start taking in sensory information, and the more they encounter, the more they are able to assimilate, process, and respond to in new and informed ways.

AI works similarly. The more data an AI model encounters, the more intelligent it can become. Over time, as more and more data flows through the AI model, the model becomes increasingly capable. In that sense, AI models are trained by big data, just as human brains are trained by the data accumulated through multiple experiences.

And while this may all seem scary at first, there’s a definite public shift toward trusting AI-driven software. This is discussed further by Oracle’s CBG team on the podcast episode, and it all goes back to the idea of human experiences. In the digital realm, people now have the ability to document, review, rank, and track these experiences. That knowledge becomes data points within big data, which are then fed into AI models that begin validating or invalidating those experiences. With a large enough sample size, determinations can be made based on “a power of collective knowledge” that grows and reinforces the network.

However, that doesn’t mean that AI is the authority on everything, even with all the data in the world.

To hear more about this topic—and why human judgment is still a very real and very necessary part of, well, everything—listen to the entire podcast episode Connecting the Dots Between Big Data and AI and be sure to visit Oracle’s Big Data site to stay on top of the latest developments in the field of big data.

Guest author Michael Chen is a senior manager, product marketing with Oracle Analytics.

Related:

Taking the Fear Factor Out of AI

For decades, films like 2001: A Space Odyssey, WarGames, The Terminator, and The Matrix have depicted the future and what it would be like if artificial intelligence (AI) took over the world. Fast forward to 2019, and AI is quickly becoming a reality. The things we used to see only in the movies are improving our daily lives, and we often don’t realize it. We’ve been living with AI assistance for quite some time. We use Waze and Google Maps to help us predict traffic patterns and find the shortest driving routes. We let Roomba navigate our homes … READ MORE

Related:

New ‘Experience Zones’ Offer a Fast Route to AI Expertise

New Dell EMC AI Experience Zones showcase the business benefits of artificial intelligence and provide ready access to the latest Dell EMC AI solutions. Organizations around the world now recognize the opportunity to put artificial intelligence to work to solve pressing business problems. In one sign of this growing AI momentum, a recent IDC report predicts that worldwide spending on AI systems will jump by 44 percent this year, to more than $35 billion.[1] This push into the brave new world of AI isn’t confined to just certain industries. It’s across the board, according to IDC. … READ MORE

Related:

Where Were You When Artificial Intelligence Transformed the Enterprise?

Where were you when artificial intelligence (AI) came online? Remember that science fiction movie where AI takes over in a near-dystopian future? The plot revolves around a crazy scientist who accidentally puts AI online, only to realize the mistake too late. Soon the machines become humanity’s overlords. While these science fiction scenarios are entertaining, they really just stoke fear and add to the confusion around AI. What enterprises should be worried about regarding AI is understanding how their competition is embracing it to get a leg up. Where were you when your competition put … READ MORE

Related:

Neural Networks in Deep Learning

Neural networks are algorithms that are loosely modeled on the way brains work. They are of great interest right now because they can learn how to recognize patterns. A famous example involves a neural network algorithm that learns to recognize whether or not an image contains a cat. In this article, I’m providing an introduction to neural networks. We’ll explore what neural networks are, how they work, and how they’re used in today’s rapidly developing machine-learning world.

Before we look at different types of neural networks, we need to start with the basic building blocks. And these aren’t hard. There are just five things you need to figure out:

  • Neurons
  • Inputs
  • Outputs (called activation functions)
  • Weights
  • Biases

I’ll summarize these terms below, or you can take a look at this Oracle blog post on machine learning for a more detailed explanation.

How Neural Networks Work

Neurons are the decision makers. Each neuron has one or more inputs and a single output called an activation function. This output can be used as an input to one or more neurons or as an output for the network as a whole. Some inputs are more important than others and so are weighted accordingly. Neurons themselves will “fire,” or change their outputs, based on these weighted inputs. Whether and how readily they fire depends on their bias. Here’s a simple diagram covering these five elements.

Diagram of Neurons, Inputs, Outputs, Weights, Biases

I haven’t represented the weight and bias in this diagram, but you can think of them as floating point numbers, typically in the range of 0-1. The output or activation function of a neuron doesn’t have to be a simple on/off (though that is the first option) but can take different shapes. In some cases, such as the third and fifth examples above, the output value can go lower than zero. And that’s it!
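To make those five elements concrete, here is a minimal Python sketch of a single neuron with a sigmoid activation. The specific input values, weights, and bias below are just illustrative numbers, not anything taken from the diagram above.

import numpy as np

def sigmoid(z):
    # Squashes any real number into the range 0-1, a simple "how strongly to fire" output
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(inputs, weights, bias):
    # Weighted sum of the inputs plus the neuron's bias, passed through the activation function
    return sigmoid(np.dot(weights, inputs) + bias)

# Three inputs, each with its own importance (weight)
inputs  = np.array([0.5, 0.1, 0.9])
weights = np.array([0.8, 0.2, 0.4])
bias    = -0.3
print(neuron_output(inputs, weights, bias))  # a value between 0 and 1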

Examples of How Neural Networks Work

So now that you have the building blocks, let’s put them together to form a simple neural network. Here’s a network that is used to recognize handwritten digits. I took it from this neural network site, which I’d recommend as a great resource if you want to read further on this topic.

A simple neural network example

Here you can see a simple diagram with inputs on the left. Only eight are shown but there would need to be 784 in total, one neuron mapping to each of the 784 pixels in the 28×28 pixel scanned images of handwritten digits that the network processes. On the right-hand side, you see the outputs. We would want one and only one of those neurons to fire each time a new image is processed. And in the middle, we have a hidden layer, so-called because you don’t see it directly. A network like this can be trained to deliver very high accuracy recognizing scanned images of handwritten digits (like the example below, adjusted to cover 28×28 pixels).

Scanned image of handwritten digits for neural network

But a network like the one shown above would not be considered by most to be deep learning. It’s too simple, with only one hidden layer. The cutoff point is considered to be at least two hidden layers, like the one shown below:

Neural network with two hidden layers
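As a rough sketch of what a network like this looks like in code, here is a hedged Keras example with 784 inputs, two hidden layers, and 10 output neurons (one per digit). The hidden-layer sizes and the ReLU/softmax activations are my own illustrative choices, not details taken from the diagrams above.

import tensorflow as tf

# 784 inputs (one per pixel of a 28x28 image), two hidden layers, and 10 outputs (one per digit)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(784,)),  # hidden layer 1
    tf.keras.layers.Dense(64, activation="relu"),                      # hidden layer 2
    tf.keras.layers.Dense(10, activation="softmax"),                   # output layer
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])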

I glossed over what the hidden layer is actually doing, so let’s look at it here. The input layer has neurons that map to an individual pixel, while the output neurons effectively map to the whole image. Those hidden layers map to components of the image. Perhaps they recognize a curve or a diagonal line or a closed loop.

But importantly, those components in the hidden layers map to specific locations in the original image. They have to. There are hard links from the individual pixels on the left. So a network like the one above would not be able to answer a simple question on the image like the one below: how many horses do you see?

Horses and neural networks

I could show images that had horses anywhere on the picture and you would have no problem determining how many there were. You’d do so by recognizing the elements that make up a horse, no matter where in the picture they occurred. And that’s a very good thing, because the world we live in requires us to recognize objects that are in front of us, or off to the side, fully visible, or partially obscured. To solve problems like this, we need a different kind of network like the one you see below: a convolutional neural network.

Let’s imagine we’re working with images that are 28×28 pixels again, but this time we can’t rely on the object of interest being fixed in the center. Look at the logic of that first hidden layer. All of those neurons are now linked to specific, overlapping areas of the input image (in this case a 5×5 pixel area). Starting with this basic structure, and adding some additional processing, it’s possible to build a neural network that can identify items in a position-independent way. Incidentally, neurons in the visual cortex of animals, including humans, work in a similar way. There are neurons that only trigger on certain parts of the field of view.
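Here is a minimal, hedged Keras sketch of that idea: a convolutional layer whose neurons each look at a 5×5 patch of a 28×28 input image. The number of filters and the pooling layer are illustrative choices, not details from the network described above.

import tensorflow as tf

# Each filter slides a 5x5 window over the 28x28 image, so a feature
# (a curve, an edge) is detected wherever in the image it appears.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(5, 5), activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),  # summarize neighbouring activations
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])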

Convolutional networks are the workhorses of image recognition. But when it comes to natural language processing, they are not so good. Understanding the written or spoken word is quite different from processing independent images. Language is highly contextual, by which I mean that individual words have to be processed in the context of the words around them. (Note that I am not a linguist and apologize for any imprecise usage of terms).

When it comes to processing a sentence, there are at least three different things you have to understand: the first two are the meaning of the individual words, and the syntax or grammar of the sentence (the rules about word order, structure, and so on). If you’ve gotten this far, then you have those things nailed. But consider the sentence below.

I’m baking a rainbow cake and want to add different _________ to the batter.

What’s the missing term? You can only figure something like that out by looking at the earlier part of the sentence. A rainbow has many different colors, so you would need to add different food dyes to the batter (which would also have to be portioned out in some way).

Recurrent Neural Networks

Working out that answer required taking earlier words in the sentence as input to the next word. I’m describing a feedback loop, which is not something you saw earlier. Networks with feedback loops are called recurrent neural networks, and in its simplest form a feedback loop looks like this.

Note how the output feeds back to the inputs. If you “unroll” this diagram, you get the structure below.

You can see how this kind of structure would enable you to process a sequence of elements (like the words in a sentence) with each one providing input (context if you like) for subsequent elements.
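As a rough sketch (not any particular library’s API), here is what that feedback loop looks like in plain Python: the state produced for one element is fed back in when processing the next element.

import numpy as np

def simple_rnn(sequence, W_in, W_back, bias):
    # The feedback loop: the previous state is an extra input at every step
    state = np.zeros(W_back.shape[0])
    for x in sequence:
        state = np.tanh(W_in @ x + W_back @ state + bias)
    return state  # a summary of the whole sequence, in order

# Illustrative example: four 3-dimensional "word" vectors, a 5-unit state
rng = np.random.RandomState(0)
sequence = [rng.rand(3) for _ in range(4)]
print(simple_rnn(sequence, rng.rand(5, 3), rng.rand(5, 5), np.zeros(5)))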

Of course, this simple structure is not powerful enough to process language, but more complex networks with feedback loops can. And a common kind of recurrent neural network contains elements called LSTM units, which are really good at remembering things (like a key word earlier in a sentence), as well as forgetting them when needed. Below is one example of an LSTM unit.

You can see the similarity with the simple diagram above, but there’s much more going on here. I’m not going to explain it all (there’s a great explanation on this Github page) but I’ll point out a couple of things.

Look inside the main rectangular box. The shaded rectangular boxes are entire layers, the symbol inside representing the shape of the activation function (output). The shaded circles with X and + represent multiplication and addition operations respectively. Look at the first combination (a layer with a sigmoid output leading to a multiplication with the output of the previous term). If that layer outputs a value of zero, then the multiplication will effectively zero out that previous term. Put another way, this first combination is the “forgetting” circuitry. For the rest, check out this blog.
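To make the “forgetting” circuitry concrete, here is a small NumPy sketch of just that first combination: a sigmoid layer whose output multiplies the previous cell state elementwise. The names and dimensions are mine, chosen for illustration.

import numpy as np

def forget_gate(prev_cell_state, prev_output, new_input, W_f, b_f):
    # The sigmoid layer looks at the previous output and the new input and
    # produces a value between 0 and 1 for each element of the cell state.
    z = W_f @ np.concatenate([prev_output, new_input]) + b_f
    gate = 1.0 / (1.0 + np.exp(-z))
    # Multiplying by ~0 erases ("forgets") that part of the state; ~1 keeps it.
    return gate * prev_cell_state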

There’s more to processing language than syntax and understanding individual words. In that earlier example, how did you know a rainbow has many different colors? You’ve seen rainbows before and know what they look like. And that general knowledge of the implications of those words is the third element of processing natural language (and the hardest for a computer). To illustrate this, see the example below from the game of cricket. Those of you who don’t know the game will still be able to process the syntax of the sentences. You will still know what the individual words mean. But lacking the “common sense” of the context for those words, you will have no clue what is going on.

Coming around the wicket, the leg spinner bowled a “wrong-un.” The batsman swept it firmly against the spin to deep backward point. Is the batsman left-handed, right-handed, or can’t tell from this information?

Conclusion

This is just an overview. There are many other approaches to neural networks that have different strengths and weaknesses or are used to solve different types of problems. But the concepts here still apply. Neural networks are all built on the same basic elements: neurons (with biases), inputs (with weights), and outputs (activation functions with specific profiles). These elements are used to construct different or specialized layers and elements (like the LSTM unit above). All of these things are combined with feedback loops and other connections to form a network.

In an upcoming blog post, we will look at how neural networks learn and are trained, because until that happens, they are not ready for work. In the meantime, discover more about Oracle’s data management platforms and how Oracle supports neural networks in Oracle Database.

Related:


Proxysg integrate with DLP

I need a solution

Hi everyone,

I am a new ProxySG and DLP user. Please tell me how to solve the question below.

I have a ProxySG (version 6.7) and DLP (15.5). When I integrate the two and then run a health check, the ProxySG shows “health check has failed.” Can anyone tell me what I should do for a successful integration? Thanks.


Related:

Simple, Scalable, Containerized Deep Learning using Nauta

Deep learning is hard. Between organizing, cleaning, and labeling data, selecting the right neural network topology, picking the right hyperparameters, and then waiting – hoping – that the model produced is accurate enough to put into production, it can seem like an impossible puzzle for your data science team to solve. But the IT aspect of the puzzle is no less complicated, especially when the environment needs to be multi-user and support distributed model training. From choosing an operating system, to installing libraries, frameworks, dependencies, and development platforms, building the infrastructure to support your company’s deep … READ MORE

Related:

Block specific google document on google doc

I need a solution

Hi team,

I am just wondering if we could block one specific Google document URL: https://drive.google.com/file/d/xxxxxxxxxxxxxxxxxxxxxxx/view?usp=drive_web

I do not want to block the whole domain, only that Google document URL.

I have tried:

1. https://drive.google.com/file/d/xxxxxxxxxxxxxxxxxxxxxxx/view?usp=drive_web

2. drive.google.com:443/file/d/xxxxxxxxxxxxxxxxxxxxxxx/view?usp=drive_web

Those didn’t work.


Related:

Scaling NMT with Intel® Xeon® Scalable Processors

With continuing research interest in machine translation, and with efficient neural network architecture designs improving translation quality, there is a growing need to improve time to solution. Training a well-performing Neural Machine Translation (NMT) model still takes days to weeks, depending on the hardware, the size of the training corpus, and the model architecture.



Intel® Xeon® Scalable processors provide an incredible leap in scalability, and over 90% of the Top500 supercomputers run on Intel. In this article we show some of the training considerations and the effectiveness of scaling an NMT model using Intel® Xeon® Scalable processors.



An NMT model reads a source sentence in one language and passes it to an encoder, which builds an intermediate representation; a decoder then processes that intermediate representation to produce the translated target sentence in another language.

Figure 1: Encoder-decoder architecture



The figure above shows an encoder-decoder architecture. The English source sentence, “Hello! How are you?”, is read and processed by the architecture to produce a translated German sentence, “Hallo! Wie geht es Ihnen?”. Traditionally, Recurrent Neural Networks (RNNs) were used in the encoder and decoder, but other neural network architectures such as Convolutional Neural Networks (CNNs) and attention-mechanism-based models are also used.



Architecture and environment

The Transformer model is one of the most interesting architectures in the field of NMT. It is built with variants of the attention mechanism in the encoder-decoder portion, thereby replacing the traditional RNNs in the architecture, and it was able to achieve state-of-the-art results on English-German and English-French translation tasks.




Figure 2: Multi-head attention block



The above figure shows the multi-head attention block used in the Transformer model. At a high level, scaled dot-product attention can be thought of as finding the relevant information, the values (V), based on a query (Q) and keys (K); multi-head attention can be thought of as several attention layers running in parallel to capture distinct aspects of the input.
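Here is a minimal NumPy sketch of scaled dot-product attention as described above; it is an illustration of the idea, not the exact code used in the TensorFlow implementation.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Scores measure how relevant each key is to each query; scaling by
    # sqrt(d_k) keeps the softmax from saturating for large dimensions.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V  # weighted combination of the values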



We use TensorFlow’s official implementation of the Transformer model, and we’ve also added Horovod to perform distributed training. The WMT English-German parallel corpus with 4.5M sentences was used to train the model.



The tests described in this article were performed in house on the Zenith supercomputer in the Dell EMC HPC and AI Innovation Lab. Zenith is a Dell EMC PowerEdge C6420-based cluster, consisting of 388 dual-socket nodes powered by Intel® Xeon® Scalable Gold 6148 processors and interconnected with an Intel® Omni-Path fabric.



System Information

CPU Model: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Operating System: Red Hat Enterprise Linux Server release 7.4 (Maipo)
TensorFlow Version: 1.10.1 with Intel® MKL
Horovod Version: 0.15.0
MPI: Open MPI 3.1.2

Note: We used a specific Horovod branch to handle sparse gradients, which is now part of the main branch in their GitHub repository.



Weak scaling, environment variables and TF configurations

When training on CPUs, weak scaling, environment variables, and TF configurations play a vital role in improving the throughput of a deep learning model. Setting them optimally can yield additional performance gains.



Below are our suggestions based on empirical tests running 4 processes per node for the Transformer (big) model on 50 Zenith nodes. We found that setting these variables in all of our experiments seemed to improve throughput, with OMP_NUM_THREADS modified based on the number of processes per node.



Environment Variables:

export OMP_NUM_THREADS=10

export KMP_BLOCKTIME=0

export KMP_AFFINITY=granularity=fine,verbose,compact,1,0



TF Configurations:

intra_op_parallelism_threads=$OMP_NUM_THREADS

inter_op_parallelism_threads=1
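As a hedged sketch of how these settings come together, the following assumes the environment variables above have been exported and mirrors them in a TensorFlow 1.x session configuration; the variable names are illustrative.

import os
import tensorflow as tf

# Read the value exported above and mirror it in the session configuration
omp_threads = int(os.environ.get("OMP_NUM_THREADS", "10"))

config = tf.ConfigProto(
    intra_op_parallelism_threads=omp_threads,  # threads available to a single op
    inter_op_parallelism_threads=1)            # ops executed concurrently

session = tf.Session(config=config)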



Experimenting with weak scaling options lets you find the optimal number of processes to run per node, such that the model fits in memory and performance doesn’t deteriorate. For some reason TensorFlow creates an extra thread, so to avoid oversubscription it’s better to set OMP_NUM_THREADS to 9, 19, or 39 when training with 4, 2, or 1 processes per node respectively. Although we didn’t see this affect throughput in our experiments, it may affect performance in a very large-scale setup.



Performance can be improved by threading. This is done by setting OMP_NUM_THREADS such that the product of its value and the number of MPI ranks per node equals the number of available CPU cores per node. The KMP_AFFINITY environment variable provides a way to control how OpenMP threads are bound to physical processing units, and KMP_BLOCKTIME sets the time in milliseconds that a thread should wait after completing a parallel execution before sleeping. TF configurations such as intra_op_parallelism_threads and inter_op_parallelism_threads are used to adjust the thread pools, thereby optimizing CPU performance.




Figure 3: Effect of environment variables



The above results show that there’s a 1.67x improvement when environment variables are set correctly.



Faster distributed training

Training a large neural network architecture can be time consuming, even for rapid prototyping or hyperparameter tuning. Thankfully, distributed training and open-source frameworks like Horovod allow a model to be trained using multiple workers. In a previous blog we showed the effectiveness of training an AI radiologist with distributed deep learning on Intel® Xeon® Scalable processors. Here, we show how distributed training improves the performance of the machine translation task.
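For reference, here is a minimal sketch of what adding Horovod to a TensorFlow 1.x training script looks like; the optimizer and learning rate shown are illustrative, not the exact settings used in these experiments.

import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()  # one process per worker, typically launched with mpirun

# Scale the learning rate by the number of workers (a common heuristic)
optimizer = tf.train.AdamOptimizer(0.001 * hvd.size())

# Horovod averages gradients across all workers before each update
optimizer = hvd.DistributedOptimizer(optimizer)

# Start every worker from the same initial weights
hooks = [hvd.BroadcastGlobalVariablesHook(0)]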






Figure 4: Scaling Performance



The above chart shows the throughput of the Transformer (big) model when trained using 1 to 100 Zenith nodes. We get near-linear performance when scaling up the number of nodes. Based on our tests, which include setting the correct environment variables and the optimal number of processes, we see a 79x improvement on 100 Zenith nodes with 2 processes per node compared to the throughput on a single node with 4 processes.



Translation Quality

An NMT model’s translation quality is measured in terms of its BLEU (Bi-Lingual Evaluation Understudy) score, a measure of how closely the machine-translated output matches a human reference translation.
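As an illustration only, BLEU can be computed with off-the-shelf tools such as NLTK; the scores reported later in this post come from the standard newstest2014 evaluation, not from this snippet.

from nltk.translate.bleu_score import corpus_bleu

# One tokenized machine translation and its human reference
references = [[["hallo", "wie", "geht", "es", "ihnen"]]]
hypotheses = [["hallo", "wie", "geht", "es", "dir"]]

print(corpus_bleu(references, hypotheses))  # between 0 and 1; often reported on a 0-100 scale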



In a previous blog post we explained some of the challenges of large-batch training of deep learning models. Here, we experimented with a large global batch size of 402k tokens to determine the model’s performance on the English-to-German task. Most of the hyperparameters were set the same as for the Transformer (big) model; the model was trained using 50 Zenith nodes with 4 processes per node and a local batch size of 2010. The learning rate grows linearly to 0.001 over 4,000 steps and then follows an inverse square root decay.
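A small sketch of that learning-rate schedule (linear warmup to 0.001 over 4,000 steps, then inverse square root decay) looks roughly like this; the exact implementation in the official model may differ in detail.

def learning_rate(step, peak_lr=0.001, warmup_steps=4000):
    # Grow linearly to peak_lr during warmup, then decay proportionally to 1/sqrt(step)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (warmup_steps / step) ** 0.5

print(learning_rate(2000))    # mid-warmup
print(learning_rate(16000))   # well into the decay phase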



                                        Case-Insensitive BLEU   Case-Sensitive BLEU
TensorFlow Official Benchmark Results   28.9
Our results                             29.15                   28.56

Note: The case-sensitive score is not reported in the TensorFlow Official Benchmark.



The above table shows our results on the test set (newstest2014) after training the model for around 2.7 days (26,000 steps). We can also see a clear improvement in translation quality compared to the results posted in the TensorFlow Official Benchmarks.



Conclusion

In this post we showed how to effectively train an NMT system using Intel® Xeon® Scalable processors. We also showed some best practices for setting the environment variables, along with the corresponding scaling performance. Based on our experiments, and by following other research work on NMT to understand some of the important aspects of scaling an NMT system, we were able to obtain better translation quality and speed up the training process. With growing research interest in the field of neural machine translation, we expect to see many interesting and improved NMT models in the future.

Related: