There is a lot of Big Data software available now. One of them that you should definitely know about is the H2O AI Machine Learning solution.

With this open-source application you can implement algorithms from the fields of statistics, data mining and machine learning. The H2O AI Engine is based on the distributed file system Hadoop and is therefore more performant than other analysis tools. Your machine learning methods can thus be used as
parallelized methods.

Software Stack

They can program their algorithms in R, Python and Java and thus in the most important mathematical programming languages. H2O provides a REST interface to Python, R, JSON and Excel. Additionally, you can access H2O directly with Hadoop and Apache Spark. This makes integration into your data science workflow much easier. You already get approximate results while running the algorithms. A graphical web browser UI helps you to better analyze the processes and perform targeted optimizations.

How Clients Interacts with H2O AI

You can interact with H2O via clients using various interfaces. It is important for you to know that the data is usually not held in memory. They are localized in a H2O cluster and you only get a pointer to the data when you make a request.

How Clients Interacts with H2O AI
H2O Interaction flow

H2O Frame

The basic unit of data storage accessible to you is the H2O Frame. This corresponds to a two-dimensional, resizable and potentially heterogeneous data point. This tabular data structure also contains labeled axes.

H2O Cluster

Your H2O cluster consists of one or more nodes. A node corresponds to a JVM process and this process consists of three layers.

H2O Machine Learning Software Structure
H2O Software Stack

H2O Machine Learning Components

Language Layer

The R evaluation layer is a slave to the REST client front-end and in the Scala layer you can write native programs and algorithms. You can then use these with H2O Machine learning.

Algorithms Layer

This layer is where your algorithms are applied. You can run statistical methods, data import and machine learning here.

Core Layer

In this layer you handle the resource management. You can manage both the memory and the CPU processing capacity.