EXPERT KNOWLEDGE AT A GLANCE

Month: October 2020

That is why Liquid State Machines (LSM) are great

– Recently developed computational model

– does not require information to be stored in some stable state of the system

→ the inherent dynamics of the system are used by a memory less readout function to compute the output

→ can be used for complex Tasks (pattern classification, function approximation, object tracking, …)

LSMs take the temporal aspect of the input into account

Concept

The figure shows a typical structure of a liquid State Machine.
Liquid State Machine

Reservoir/ Liquid

– large accumulation of recurrent interacting nodes
→ is stimulated by the input layer
– Liquid itself is not trained, but randomly constructed with the help of heuristics
– Loops cause a short-term memory effect
– preferably a Spiking Neural Network (SNNs)
→ are closer to biological neural networks than the multilayer Perceptron
→ can be any type of network that has sufficient internal dynamics

Running State

→ will be extracted by the readout function

– depend on the input streams they’ve been presented

Readout Function

– converts the high-dimensional state into the output

– since the readout function is separated from the liquid, several readout functions can be used with the same liquid

→ so different tasks can be performed with the same input

lsm readout fcts
different types of readout functions

AutoEncoder – What Is It? And What Is It Used For?

AutoEncoder – In data science, we often encounter multidimensional data relationships. Understanding and representing these is often not straightforward. But how do you effectively reduce the dimension without reducing the information content?

Unsupervised dimension reduction

One possibility is offered by unsupervised machine learning algorithms, which aim to code high-dimensional data as effectively as possible in a low-dimensional way.
If you don’t know the difference between unsupervised, supervised and reinforcement learning, check out this article we wrote on the topic.

What is an AutoEncoder?

The AutoEncoder is an artificial neural network that is used to unsupervised reduce the data dimensions.
The network usually consists of three or more layers. The gradient calculation is usually done with a backpropagation algorithm. The network thus corresponds to a feedforward network that is fully interconnected layer by layer.

Types

AutoEncoder types are many. The following table lists the most common variations.

The figure shows all common AutoEncoder types
AutoEncoder types

However, the basic structure of all variations is the same for all types.

Basic Structure

Each AutoEncoder is characterized by an encoding and a decoding side, which are connected by a bottleneck, a much smaller hidden layer.

The following figure shows the basic network structure.

The figure shows the basic AutoEncoder structure.
AutoEncoder model architecture


During encoding, the dimension of the input information is reduced. The average value of the information is passed on and the information is compressed in such a way.
In the decoding part, the compressed information is to be used to reconstruct the original data. For this purpose, the weights are then adjusted via backpropagation.
In the output layer, each neuron then has the same meaning as the corresponding neuron in the input layer.

Autoencoder vs Restricted Boltzmann Machine (RBM)

Restricted Boltzmann Machines are also based on a similar idea. These are undirected graphical models useful for dimensionality reduction, classification, regression, collaborative filtering, and feature learning. However, these take a stochastic approach. Thus, stochastic units with a particular distribution are used instead of the deterministic distribution.


RBMs are designed to find the connections between visible and hidden random variables. How does the training work?
The hidden biases generate the activations during forward traversal and the visible layer biases generate learning of the reconstruction during backward traversal.

Pretraining

Since the random initialization of weights in neural networks at the beginning of training is not always optimal, it makes sense to pre-train. The task of training is to minimize an error or a reconstruction in order to find the most efficient compact representation for input data.

The figure shows the pretraining procedure of an autoencoder according to Hinton.
Training Stacked Autoencoder


The method was developed by Geoffrey Hinton and is primarily for training complex autoencoders. Here, the neighboring layers are treated as a Restricted Boltzmann Machine. Thus, a good approximation is achieved and fine-tuning is done with a backpropagation.

scikit-learn – Machine learning, Data Mining and Data Analysis in Python for free

In almost no scientific discipline you can get around the programming language Python nowadays.
With it, powerful algorithms can be applied to large amounts of data in a performant way.
Open source libraries and frameworks enable the simple implementation of mathematical methods and data transports.

What is scikit-learn?

One of the most popular Python libraries is scikit-learn. It can be used to implement both supervised and unsupervised machine learning algorithms. scikit-learn primarily offers ready-made solutions for data mining, preprocessing and data analysis.
The library is based on the SciPy Toolkit (SciKit) and makes extensive use of NumPy for high performance linear algebra and array operations. If you don’t know what NumPy is, check out our article on the popular Python library.
The library was first released in 2007 and since then it is constantly extended and optimized by a very active community.
The library was written primarily in Python and is based on Cython only for some high-level operations.
This makes the library easy to integrate into Python applications.

scikit-learn Features

Easily implement many machine learning algorithms with scikit-learn. Both supervised and unsupervised machine learning are supported. If you don’t know what the difference is between the two machine learning categories, check out this article from us on the topic.
The figure below lists all the algorithms provided by the library.

The figure  lists all the upervised and unsupervised machine learning algorithms provided by scikit-learn..
machine learning algorithms provided by scikit-learn..

scikit-learn thus offers rich capabilities to recognize patterns and data relationships in a dataset. Thus, high dimensions can be reduced to visualize the relationships without sacrificing much information.
Features can be extracted and data clustering algorithms can be easily created.

Dependencies

scikit-learn is powerful and versatile. However, the library does not exist completely solitary. Besides the obvious dependency on Python, the library requires the import of other libraries for special operations.

NumPy allows easy handling of vectors, matrices or generally large multidimensional arrays. SciPy complements these functions with useful features like minimization, regression or the Fourier transform. With joblib Python functions can be built as lightweighted pipeline jobs and with threadpoolctl methods can be coordinated as threads to save resources.

SciPy turns Python into an ingenious free MATLAB alternative

Python vs MATLAB

== Open source Python library
– a collection of mathematical algorithms and convenience functions

– is mainly used by scientists, analysts and engineers for scientific computing, visualization and related activities

– Initial Realease: 2006; Stable Release: 2020
– depends on the NumPy module
→ basic data structure used by SciPy is a N-dimensional array provided by NumPy

Benefits

scipy benefits

Features

– SciPy library provides many user-friendly and efficient numerical routines:

scipy subpackages

Available sub-packages

SciPy ecosystem

– scienitific computing in Python builds upon a small core of open-source software for mathematics, science and engineering

scipy ecosystem
SciPy Core Software

More relevant Packages

– the SciPy ecosystem includes, based on the core properties, other specialized tools

scipy eco sidepackages

The product and further information can be found here:

https://www.scipy.org/

Apache Mahout – A Powerful Open Source Machine Learning Project

Apache Mahout is a powerful machine learning tool that comes with a seamless compatibility to the strong big data management frameworks from the Apache universe. In this article, we will explain the functionalities and show you the possibilities that the Apache environment offers.

What is Machine Learning?

Machine learning algorithms provide lots of tools for analyzing large unknown data sets.
The art of data science is to extract the maximum amount of information depending on the data set by using the right method. Are there patterns in the high-dimensional data relationships, and how can they be represented in a low-dimensional way without much loss of information?

scikitLearn ml
Fields of machine learning


There is often a similar amount of information in the failure as when an algorithm was able to successfully create groupings.
It is important to understand the mathematical approaches behind the tools in order to draw conclusions about why an algorithm did not work.
If you don’t know the basic machine learning categories, it’s best to read our article on the subject first.

Machine Learning and Linear Algebra

Most machine learning methods are based on linear algebra.
This mathematical subfield deals with linear transformations, vector spaces and linear mappings between them.
The knowledge of the regularities is the key to the correct understanding of machine learning algorithms.

What is Apache Mahout

Apache Mahout is an open source machine learning project that builds implementations of scalable machine learning algorithms with a focus on linear algebra. If you’re not sure what Apache is, check out this article. Here we introduce you to the project and its main projects once.


Mahout was already released in 2009 and since then it is constantly extended and kept up-to-date by a very active community.
Originally, it contained scalable algorithms closely related to Apache Hadoop and MapReduce.
However, Mahout has since evolved into a backend independent environment. That is, it operates on non-Hadoop clusters or single nodes.

Features

The math library is based on Scala and provides an R-like Domain Specific Language (DSL). Mahout is usable for Big Data applications and statistical computing. The figure below lists all machine learning algorithms currently offered by Mahout.

The figure below lists all machine learning algorithms currently offered by Apache Mahout.
Implemented mathematical functions and algorithms

The algorithms are scalable and cover both supervised and unsupervised machine learning methods, such as clustering algorithms.

Apache Mahout covers a large part of the usual machine learning tools. This means that data can be analyzed without having to change frameworks. This is a big plus for maintaining compatibility in the application.

Apache Ecosystem

The framework integrates seamlessly into the Apache Ecosystem. This means that an application can access the entire power of the data processing platforms and build very high-performance big data pipelines. The following figure shows the Apache data management ecosystem.

Apache Mahout ecosystem
Apache Mahout ecosystem

Through connectivity to Apache Flink, stream data analysis pipelines can be built, or with Hive data from relational databases can be automatically converted into MapReduce or Tez or Spark jobs.

What role does xml play in Industry 4.0?

What role does xml play in Industry 4.0? – XML is one of the most popular and widely used data formats. Its widespread use is also its most important advantage. XML is interpretable by both humans and machines and is therefore widely used to import and export application data. XML stands for Extensible Markup Language and is a markup language for representing hierarchically structured data in text file format. It was already published in 1998 and is primarily a meta language.

That means that on its basis application-specific languages are defined by structural and content restrictions. For example RSS, MathML, GraphML, but also the Scalable Vector Graphics (SVG). All web browsers are able to visualize XML documents using the built-in XML parser.

What is XML Document Structure

An XML document can always be described as the interaction of its main components. In addition to the data itself, these are the layout, i.e. the description of the relationships between individual containers, and the structure.

What is XML-  This figure shows how an XML format document structure is determined.
What role does xml play in Industry 4.0? – Document Components

An XML structure can be interpreted as a tree. Thus, each XML document has a root element and texts or attributes as sub-elements.

An XML document can have an optional header in addition to the actual data. XML declarations, i.e. references to an external document type definition (DTD), or internal DTD, or document type declarations can be placed here. Examples for these declarations are the XML version or the encoding.

Classification of the XML format

The XML format can be further classified. Which class comes into question when is determined by the use case. Mainly we decide between document-centered and data-centered. The document-centric XML format is based on a text document and is difficult to process by machine due to its weak structure. In data-centric, the schema describes entities of a data model and their relationships. This format is optimized for efficient processing by machines. The Semistructured format represents a hybrid of both.

Processing

The XML format allows both sequential and optional accesses. This can be done either by a “push”, where the program flow is controlled by the parser, or by a “pull”, where the flow is implemented in the code that calls the parser.
Management of the tree structure can be hierarchical as well as nested.

XML-Schema vs. Database-Schema

Besides XML, JSON is also a very popular markup language. In this article we have recorded the most important information about this format.

Another large field of computer languages, i.e. formal languages developed for interaction between humans and computers, is occupied by database languages. They describe the structure of a database. Here, too, the data is organized as a plan.

If you want to know more about database language, read our article on SQL and NoSQL. Here we explain the most important differences.


But how does this schema differ from an XML schema?

This figure shows the differences between an XML schema and a database schema.
What role does xml play in Industry 4.0?- Differences between an XML schema and a database schema.

XML contains nested elements with an unlimited nesting depth. To transfer this nesting to a database schema, the nested elements must be decomposed and linked by foreign key relationships.
In XML format, the elements within an element can be repeated as often as desired. Elements of a given type do not always have to contain the same child elements. However, the order of elements is an integral part of the document structure.
In a database schema, each column is always present only once and contains simple values. Therefore, if multiple elements are to be stored, another table must then be created. The order in which the values are stored is not important, unlike in XML.