EXPERT KNOWLEDGE AT A GLANCE

Month: December 2020

When to choose NoSQL over SQL?

When to use NoSQL vs SQL – In this article we explain the important differences.
With the right choice of storage medium, you can build fundamentally more performant architectures in the age of Big Data. Streaming platforms can now process huge streams of data in real time. But this technology is not a panacea. The database, for example, still occupies an important place in today’s data handling.
It is crucial, however, that you choose the system that fits both your data and your overall infrastructure.

When to use NoSQL vs SQL – Spoiled for choice

Database vendors abound. Here is just a small selection of popular databases.

Popular SQL and NoSQL Databases

But before you compare individual databases, you should understand the basic differences between the two kinds of systems.

SQL is relational

Structured Query Language (SQL) databases are built on a fixed, predefined schema. A schema contains tables with columns. Each table row (tuple) represents one record, and each row consists of a set of attributes (characteristics), one per column.
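The fixed schema described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table name customer and its columns are assumptions made for the example, not part of any particular product.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")

# The schema is fixed up front: every row must fit these columns.
conn.execute(
    "CREATE TABLE customer ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " city TEXT NOT NULL)"
)

# Each inserted row (tuple) is one record with the same attributes.
conn.executemany(
    "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ada", "Berlin"), (2, "Linus", "Helsinki")],
)

rows = conn.execute("SELECT id, name, city FROM customer ORDER BY id").fetchall()
print(rows)

# A record that does not fit the schema (missing city) is rejected.
try:
    conn.execute("INSERT INTO customer (id, name) VALUES (3, 'Grace')")
    schema_enforced = False
except sqlite3.IntegrityError:
    schema_enforced = True
print(schema_enforced)
```

The second insert fails because the schema demands a value for every NOT NULL column, which is the kind of rigidity a fixed schema brings with it.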

You can use the query language to retrieve and manipulate the data in the tables. You can also define and control the relationships between these structured tables. Tables in a database can be linked to one another.
These relationships can take several forms. A row can be related to exactly one row in another table (one-to-one) or to many rows (one-to-many).
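A one-to-many relationship of this kind could look as follows, again as a sketch using sqlite3. The tables customer and purchase and the foreign key customer_id are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE purchase ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " item TEXT NOT NULL)"
)

# One customer row is linked to many purchase rows.
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO purchase (id, customer_id, item) VALUES (?, ?, ?)",
    [(10, 1, "keyboard"), (11, 1, "monitor")],
)

# The JOIN follows the relationship from purchase back to customer.
joined = conn.execute(
    "SELECT customer.name, purchase.item"
    " FROM customer JOIN purchase ON purchase.customer_id = customer.id"
    " ORDER BY purchase.id"
).fetchall()
print(joined)
```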

SQL table cells relationships

NoSQL is not relational

Not only SQL (NoSQL) databases allow you to store and retrieve unstructured data using a dynamic schema. In a document store, for example, your data is stored in n collections, each containing m documents. Other forms are key-value stores and graph databases. As a result, there is no single standardized query language here; each system brings its own query interface.
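To make the idea of a dynamic schema more concrete, here is a small sketch that models one collection as a plain Python list of documents. No real NoSQL database is involved, and all field names are assumptions made for the illustration.

```python
import json

# A "collection" modelled as a list of documents (dicts).
# Documents in the same collection may have different fields.
customers = [
    {"_id": 1, "name": "Ada", "city": "Berlin"},
    {"_id": 2, "name": "Linus", "tags": ["kernel", "git"]},       # no "city", extra "tags"
    {"_id": 3, "name": "Grace", "address": {"city": "Arlington"}},  # nested structure
]

# Queries have to tolerate missing fields, because no schema guarantees them.
with_city = [doc["name"] for doc in customers if "city" in doc]
print(with_city)

# Documents are typically exchanged as JSON-like structures.
serialized = json.dumps(customers[1])
print(serialized)
```

The flexibility comes at a price: the check for a missing field moves from the database schema into your application code.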

When to use NoSQL vs SQL – Both in direct comparison


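The term NoSQL has been around since 1998, which makes these databases relatively young compared to SQL, which was developed back in the 1970s. Besides their actual structure, databases of the two categories differ in how they scale. NoSQL databases are typically designed to scale horizontally across many servers, whereas SQL databases are traditionally scaled vertically, that is, by moving to a more powerful machine.
Furthermore, it is important for you to know that an SQL database coordinates parallel reads and writes through transactions, so a read may have to wait until a competing write has completed. Many NoSQL databases relax this guarantee: you read whatever data is available at that moment, even if it is not yet the latest version.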
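Horizontal scaling usually means spreading records over several servers. One common way to do this is hash-based partitioning, sketched below in plain Python. The node names and the helper shard_for are hypothetical, and real databases use more elaborate schemes (for example consistent hashing with replication).

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical server names


def shard_for(key: str, nodes: list[str] = NODES) -> str:
    """Pick the node responsible for a key via a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


# Each node holds only its share of the records.
storage: dict[str, dict[str, str]] = {node: {} for node in NODES}
for key, value in [("user:1", "Ada"), ("user:2", "Linus"), ("user:3", "Grace")]:
    storage[shard_for(key)][key] = value

# A read goes straight to the node that owns the key.
print(storage[shard_for("user:2")]["user:2"])
```

Because every key maps to exactly one node, capacity can grow by adding servers instead of buying a bigger one.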

SQL vs NoSQL

When to use NoSQL vs SQL

Which one suits me?


As you might have guessed, the answer here is: it depends! The differences are real and can have a significant impact on the performance of your services, so the choice always depends on the purpose of the application. For Big Data use cases in particular, a NoSQL database is often the better choice, because you don’t have to wait for a transaction to complete. Where you need high flexibility due to frequently changing data structures, or real-time processing, you should also consider NoSQL databases. However, if you need ACID guarantees (atomicity, consistency, isolation, durability), an SQL solution is usually the safer choice. It is important for you to understand that both systems coexist and complement each other rather than replace each other.
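What an ACID guarantee buys you can be illustrated with a failed money transfer. The sketch below uses sqlite3; the account table and the CHECK constraint are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

try:
    # The connection as context manager commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE account SET balance = balance + 150 WHERE name = 'bob'")
        conn.execute("UPDATE account SET balance = balance - 150 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    pass  # the CHECK constraint rejected the overdraft

# Both updates were undone together, so no money appeared out of nowhere.
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

The first update succeeded on its own, but because the second one failed, the whole transaction was rolled back. That all-or-nothing behavior is the atomicity part of ACID.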

If you want to know how to partition a database, check out this article.

H2O AI – That’s why it’s so great

There is a lot of Big Data software available now. One tool that you should definitely know about is the H2O AI machine learning solution.

With this open-source application you can implement algorithms from the fields of statistics, data mining and machine learning. The H2O AI engine can run on top of the distributed Hadoop platform and processes data in parallel across the cluster, which can make it more performant than single-machine analysis tools. Your machine learning methods can thus be run as parallelized methods.

Software Stack

You can program your algorithms in R, Python and Java, and thus in the most important mathematical programming languages. H2O provides a REST interface that can be used from Python, R, JSON and Excel. Additionally, you can access H2O directly with Hadoop and Apache Spark. This makes integration into your data science workflow much easier. You already get approximate results while the algorithms are still running. A graphical web browser UI helps you to better analyze the processes and perform targeted optimizations.

How Clients Interact with H2O AI

You can interact with H2O via clients using various interfaces. It is important for you to know that the data is usually not held in the client’s memory. It resides in the H2O cluster, and you only get a pointer to the data when you make a request.
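The pointer idea can be illustrated with a toy model in plain Python. Note that this is not the H2O API: the classes ToyCluster and FrameHandle and all their methods are invented for this sketch and only mimic the principle that the client holds an identifier while the data stays on the cluster side.

```python
# Toy stand-in for the handle idea; this is NOT the H2O API.
class ToyCluster:
    """Holds the actual data, like the remote cluster would."""

    def __init__(self):
        self._frames = {}

    def upload(self, frame_id, rows):
        self._frames[frame_id] = rows
        return FrameHandle(self, frame_id)

    def row_count(self, frame_id):
        return len(self._frames[frame_id])

    def fetch(self, frame_id):
        return list(self._frames[frame_id])


class FrameHandle:
    """Client-side object: stores only an id, not the rows."""

    def __init__(self, cluster, frame_id):
        self._cluster = cluster
        self.frame_id = frame_id

    def nrows(self):
        # The work happens on the cluster; only the result comes back.
        return self._cluster.row_count(self.frame_id)

    def download(self):
        # Explicit request to copy the data to the client.
        return self._cluster.fetch(self.frame_id)


cluster = ToyCluster()
handle = cluster.upload("sales", [("Berlin", 3), ("Helsinki", 5)])
print(handle.frame_id, handle.nrows())
```

In this model, summary questions such as the row count are answered on the cluster side, and the rows only travel to the client when you explicitly download them.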

H2O Interaction flow

H2O Frame

The basic unit of data storage accessible to you is the H2O Frame. It is a two-dimensional, resizable and potentially heterogeneous tabular data structure with labeled axes.

H2O Cluster

Your H2O cluster consists of one or more nodes. A node corresponds to a JVM process and this process consists of three layers.

H2O Software Stack

H2O Machine Learning Components

Language Layer

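The language layer contains an R evaluation layer and a Scala layer. The R evaluation layer is driven by the REST client front-end and only executes what the client requests. In the Scala layer, by contrast, you can write native programs and algorithms yourself and then use them with H2O machine learning.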

Algorithms Layer

This layer is where your algorithms are applied. You can run data import, statistical methods and machine learning here.

Core Layer

In this layer you handle the resource management. You can manage both the memory and the CPU processing capacity.