Tag: Machine learning (Page 2 of 2)

Perceptrons – These Artificial Neurons are the Fundamentals of Neural Networks

October 28, 2020 / RainerGewalt / 1 Comment

A perceptron is a simple binary classification algorithm modeled on the biological neuron, which makes it one of the simplest learning machines. Its output is determined by the weights applied to its inputs and by a threshold.
Perceptrons are used in machine learning and in artificial intelligence (AI) applications. If you don’t know the difference between AI, neural networks and machine learning, you should read our article on the subject.

What does the learning process look like?

A set of input signals is mapped to a binary output decision, i.e. zero or one.
After training on known input patterns, the perceptron can recognize similar patterns in a data set to be analyzed.
The following figure shows this learning process schematically.

Figure: Perceptron learning process (schematic)

The perceptron computes the weighted sum of all its inputs. If this sum exceeds a set threshold, the neuron outputs one; otherwise it outputs zero.
When a perceptron is trained with given data patterns, the weights of the inputs are adjusted.
By adjusting its weights, the perceptron is thus able to learn and to solve classification problems.

However, valid results can only be obtained if the data is linearly separable, i.e. if the two classes can be divided by a straight line (or, in higher dimensions, by a hyperplane).
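The learning process described above can be sketched in a few lines of plain Python. This is only a minimal illustration of the classic perceptron learning rule, not code from any particular library: the function names, the learning rate of 0.1 and the logical AND training set are all assumptions chosen for the example, and the threshold is expressed as a bias term (the negative threshold).

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, learning_rate=0.1, max_epochs=100):
    """Adjust weights and bias with the perceptron learning rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0  # plays the role of the negative threshold
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(samples, labels):
            delta = target - predict(weights, bias, x)
            if delta != 0:
                errors += 1
                # Shift each weight towards the correct answer.
                weights = [w + learning_rate * delta * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * delta
        if errors == 0:  # every training pattern is classified correctly
            break
    return weights, bias


# Logical AND is linearly separable, so the training loop converges.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]
weights, bias = train_perceptron(inputs, and_labels)
and_predictions = [predict(weights, bias, x) for x in inputs]
print(and_predictions)
```

A data set such as logical XOR, which is not linearly separable, would never reach an error-free epoch with this rule, which is why the loop is capped by `max_epochs`.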

What are Multilayer Perceptrons (MLP)?

A multilayer perceptron (MLP) is what is commonly called a neural network. Perceptrons form its basic building blocks, the neurons, which are interconnected across several layers.

The figure below shows a simple three-layer MLP. Each line represents a connection that carries the output of one neuron to a neuron of the next layer.

Figure: Simple three-layer MLP

Neurons of the same layer, however, have no connections to each other.
Each connection has its own weight, and the outputs of the neurons in one layer form the input vector for the neurons of the next layer.
The variety of classification problems that can be solved increases with the number of layers.
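To illustrate why additional layers matter, here is a small sketch of a three-layer network (input, hidden, output) that computes logical XOR, a problem a single perceptron cannot solve because it is not linearly separable. The weights and thresholds are set by hand for the illustration rather than learned, and all names are hypothetical.

```python
def step(value):
    """Threshold activation: fire (1) only for positive input."""
    return 1 if value > 0 else 0


def neuron(inputs, weights, bias):
    """A single perceptron: weighted sum plus bias, then threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_mlp(x1, x2):
    # Hidden layer: two neurons that both see the full input vector.
    h_or = neuron((x1, x2), (1.0, 1.0), -0.5)   # fires if x1 OR x2
    h_and = neuron((x1, x2), (1.0, 1.0), -1.5)  # fires if x1 AND x2
    # Output layer: its input vector is the hidden layer's output.
    return neuron((h_or, h_and), (1.0, -1.0), -0.5)  # OR but not AND


xor_outputs = [xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(xor_outputs)  # [0, 1, 1, 0]
```

Note that the two hidden neurons are not connected to each other; each one only passes its output on to the next layer.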

Recurrent Neural Networks vs Feed-Forward Networks

Neural networks fall into two basic categories: recurrent networks and feed-forward networks.

Recurrent Neural Networks

In a recurrent neural network, neurons are connected to neurons of the same or a preceding layer.
A basic distinction is made between three types of feedback. With direct feedback, a neuron’s own output is used as an additional input to itself. With indirect feedback, on the other hand, the output of a neuron is connected to a neuron of a preceding layer.
With the third type, lateral feedback, the output of a neuron is connected to another neuron of the same layer.

Feed-Forward Networks

In feed-forward networks, on the other hand, outputs are connected only to the inputs of subsequent layers. Such a network can be fully connected, in which case each neuron of a layer is connected to all neurons of the directly following layer.
Alternatively, the network can be only partially connected or contain shortcut connections that skip a layer. Some neurons are then not connected to all neurons of the next layer.
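The connection types described in the two sections above can be told apart purely by comparing layer positions. The following sketch assumes a hypothetical representation in which every neuron is a `(layer, index)` pair and a connection is a `(source, target)` pair; it is an illustration of the definitions, not part of any library.

```python
def connection_type(source, target):
    """Classify a connection between two neurons given as (layer, index) pairs."""
    src_layer, _ = source
    dst_layer, _ = target
    if source == target:
        return "direct feedback"    # a neuron feeds its own input
    if dst_layer == src_layer:
        return "lateral feedback"   # another neuron of the same layer
    if dst_layer < src_layer:
        return "indirect feedback"  # a neuron of a preceding layer
    return "feed-forward"           # a neuron of a subsequent layer


def is_feed_forward(connections):
    """A network is feed-forward if it contains no feedback connection at all."""
    return all(connection_type(s, t) == "feed-forward" for s, t in connections)


forward_net = [((0, 0), (1, 0)), ((0, 1), (1, 0)), ((1, 0), (2, 0))]
recurrent_net = forward_net + [((1, 0), (1, 0)), ((2, 0), (1, 0))]

print(is_feed_forward(forward_net))    # True
print(is_feed_forward(recurrent_net))  # False
```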

scikit-learn – Machine learning, Data Mining and Data Analysis in Python for free

October 26, 2020 / RainerGewalt / 0 Comments

Hardly any scientific discipline can get around the Python programming language nowadays.
With it, powerful algorithms can be applied efficiently to large amounts of data.
Open source libraries and frameworks make it simple to implement mathematical methods and data pipelines.

What is scikit-learn?

One of the most popular Python libraries is scikit-learn. It can be used to implement both supervised and unsupervised machine learning algorithms. scikit-learn primarily offers ready-made solutions for data mining, preprocessing and data analysis.
The library is a SciPy Toolkit (SciKit): it builds on SciPy and makes extensive use of NumPy for high-performance linear algebra and array operations. If you don’t know what NumPy is, check out our article on the popular Python library.
The project was started in 2007 and has since been continuously extended and optimized by a very active community.
The library is written primarily in Python and relies on Cython only for some performance-critical operations.
This makes the library easy to integrate into Python applications.

scikit-learn Features

With scikit-learn, many machine learning algorithms can be implemented easily. Both supervised and unsupervised machine learning are supported. If you don’t know the difference between the two machine learning categories, check out our article on the topic.
The figure below lists all the algorithms provided by the library.

Figure: Supervised and unsupervised machine learning algorithms provided by scikit-learn

scikit-learn thus offers rich capabilities for recognizing patterns and relationships in a dataset. For example, high-dimensional data can be reduced to a few dimensions in order to visualize relationships without sacrificing much information.
Features can also be extracted, and clustering algorithms can be applied with little effort.
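As a rough sketch of what this looks like in practice, the following example reduces the four-dimensional Iris dataset that ships with scikit-learn to two dimensions with PCA and then groups the result with k-means. The choice of dataset, of two components and of three clusters is an assumption made for the illustration; other estimators follow the same fit/transform pattern.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# 150 flower samples with 4 measured features each.
X, _ = load_iris(return_X_y=True)

# Dimensionality reduction: project the data onto 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
retained = pca.explained_variance_ratio_.sum()

# Unsupervised clustering on the reduced data.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_2d)

print(f"Shape after reduction: {X_2d.shape}")
print(f"Variance retained by two components: {retained:.1%}")
```

The retained variance indicates how much information survives the reduction, which is the trade-off mentioned above.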

Dependencies

scikit-learn is powerful and versatile. However, the library does not stand entirely on its own. Besides the obvious dependency on Python, it relies on other libraries for specific operations.

NumPy allows easy handling of vectors, matrices and large multidimensional arrays in general. SciPy complements these functions with useful features such as minimization, regression and the Fourier transform. With joblib, Python functions can be run as lightweight pipeline jobs, and threadpoolctl limits the number of threads used by the underlying native libraries in order to save resources.
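To check which of these dependencies are present in a given environment, a small helper based on the standard library module `importlib.metadata` (Python 3.8+) is enough. The helper name and the package list are assumptions for this sketch; the list simply mirrors the four libraries named above.

```python
from importlib.metadata import PackageNotFoundError, version

# The four dependencies named in the text above.
DEPENDENCIES = ("numpy", "scipy", "joblib", "threadpoolctl")


def installed_versions(packages=DEPENDENCIES):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```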

Apache Mahout – A Powerful Open Source Machine Learning Project

October 18, 2020 / RainerGewalt / 0 Comments

Apache Mahout is a powerful machine learning tool that is seamlessly compatible with the big data management frameworks of the Apache universe. In this article, we explain its functionality and show you the possibilities that the Apache environment offers.

What is Machine Learning?

Machine learning algorithms provide many tools for analyzing large, unknown data sets.
The art of data science is to extract the maximum amount of information from a given data set by choosing the right method. Are there patterns in the high-dimensional data relationships, and how can they be represented in a low-dimensional way without much loss of information?

Figure: Fields of machine learning

A failed attempt often carries as much information as a case in which an algorithm successfully created groupings.
It is therefore important to understand the mathematical approaches behind the tools in order to draw conclusions about why an algorithm did not work.
If you don’t know the basic machine learning categories, it’s best to read our article on the subject first.

Machine Learning and Linear Algebra

Most machine learning methods are based on linear algebra.
This mathematical subfield deals with vectors, vector spaces and the linear mappings between them.
Knowing its rules is the key to a correct understanding of machine learning algorithms.
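What makes a mapping linear can be shown with a tiny worked example. The sketch below applies a matrix to a vector in plain Python and checks the defining property f(a·x + b·y) = a·f(x) + b·f(y). The projection matrix and the sample vectors are arbitrary values chosen for the illustration.

```python
def apply_linear_map(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# A linear mapping from 3 dimensions to 2: keep x and y, drop z.
projection = [
    [1, 0, 0],
    [0, 1, 0],
]

x = [1, 2, 3]
y = [4, 5, 6]
a, b = 2, -1

# Left side: map the linear combination a*x + b*y.
combined = [a * xi + b * yi for xi, yi in zip(x, y)]
left = apply_linear_map(projection, combined)

# Right side: combine the individually mapped vectors.
fx = apply_linear_map(projection, x)
fy = apply_linear_map(projection, y)
right = [a * p + b * q for p, q in zip(fx, fy)]

print(left, right)  # both are [-2, -1]
```

The same projection also hints at how high-dimensional data can be represented in fewer dimensions, at the cost of the information in the dropped coordinate.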

What is Apache Mahout?

Apache Mahout is an open source machine learning project that builds implementations of scalable machine learning algorithms with a focus on linear algebra. If you’re not sure what Apache is, check out this article, in which we introduce you to Apache and its main projects.

Mahout was first released in 2009 and has since been continuously extended and kept up to date by a very active community.
Originally, it contained scalable algorithms closely tied to Apache Hadoop and MapReduce.
However, Mahout has since evolved into a backend-independent environment. That is, it also runs on non-Hadoop clusters or on single nodes.

Features

The math library is based on Scala and provides an R-like Domain Specific Language (DSL). Mahout is usable for Big Data applications and statistical computing. The figure below lists all machine learning algorithms currently offered by Mahout.

Figure: Mathematical functions and algorithms implemented in Apache Mahout

The algorithms are scalable and cover both supervised and unsupervised machine learning methods, such as clustering algorithms.

Apache Mahout covers a large part of the usual machine learning tools. This means that data can be analyzed without having to change frameworks. This is a big plus for maintaining compatibility in the application.

Apache Ecosystem

The framework integrates seamlessly into the Apache Ecosystem. This means that an application can access the entire power of the data processing platforms and build very high-performance big data pipelines. The following figure shows the Apache data management ecosystem.

Through the connection to Apache Flink, pipelines for analyzing streaming data can be built, and with Hive, SQL-like queries on stored data can be converted automatically into MapReduce, Tez or Spark jobs.

PyGraph – A Great Open Source Graph Manipulation Library in Python

September 12, 2020 / RainerGewalt / 0 Comments

In times of Big Data, the graph has become a popular data structure thanks to its flexible and clear relationship-based structure. Even entire database systems are now designed according to the graph principle. For more on this, read our article on NoSQL databases. Libraries such as PyGraph allow you to perform fast queries and optimized graph manipulations. Implemented entirely in Python, PyGraph offers you a user-friendly and powerful tool.

What is a graph?

In a graph, objects are represented according to their relationships with each other. The objects are called vertices (or nodes) and the relations are called edges of the graph. An edge always connects exactly two vertices.
Graphs are often used to represent traffic networks, entity-relationship diagrams, syntax trees for programming languages, finite automata and proof or decision trees.

Figure: Schematic representation of a graph structure and its components

PyGraph supports different graph types

Graphs are divided into two basic categories: directed and undirected.
If a graph is directed, each edge may only be used in one direction. Such edges are also called directed edges. If the graph is undirected, there are no directional constraints, so each edge connects an unordered pair of vertices. The following figure contrasts the two categories.

Figure: Comparison of undirected and directed graphs

You can use PyGraph regardless of these properties, because both types are supported.
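The difference between the two graph types is easy to see in an adjacency structure. The sketch below deliberately uses plain Python dictionaries rather than PyGraph’s own classes, whose exact API is not shown in this article; the function name and the sample edges are assumptions for the illustration.

```python
def build_adjacency(edges, directed):
    """Build a mapping from each vertex to the set of vertices reachable via one edge."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set())  # make sure the target vertex exists
        if not directed:
            # An undirected edge can be used in both directions.
            adjacency[v].add(u)
    return adjacency


edges = [("A", "B"), ("B", "C"), ("A", "C")]
directed_graph = build_adjacency(edges, directed=True)
undirected_graph = build_adjacency(edges, directed=False)

print(directed_graph["C"])    # set(): no edge leaves C
print(undirected_graph["C"])  # C reaches both A and B
```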

PyGraph supports several algorithms

PyGraph supports many well-known graph operations. For example, searching or traversing a graph, where all nodes of the graph must be visited, can be done in different ways. Depth-First Search (DFS), for example, follows a path as deep as possible: the successors of a successor of the current node are visited first, and only then the remaining neighbors of the current node.

The depth of the search can also be limited. Breadth-First Search (BFS), on the other hand, first visits all neighbors of the current node and only then the successors of those neighbors.
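The two traversal orders can be compared with a short, library-independent sketch. The graph below, the function names and the `max_depth` parameter are assumptions for the illustration and do not reflect PyGraph’s actual API.

```python
from collections import deque

# A small tree-shaped graph as adjacency lists.
GRAPH = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [],
    "E": [],
    "F": [],
}


def depth_first(graph, start, max_depth=None):
    """Visit nodes depth first, optionally stopping at max_depth edges from start."""
    visited = []

    def visit(node, depth):
        visited.append(node)
        if max_depth is not None and depth >= max_depth:
            return
        for neighbor in graph[node]:
            if neighbor not in visited:
                visit(neighbor, depth + 1)

    visit(start, 0)
    return visited


def breadth_first(graph, start):
    """Visit all direct neighbors first, then their successors."""
    visited = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append(neighbor)
    return visited


print(depth_first(GRAPH, "A"))               # ['A', 'B', 'D', 'E', 'C', 'F']
print(breadth_first(GRAPH, "A"))             # ['A', 'B', 'C', 'D', 'E', 'F']
print(depth_first(GRAPH, "A", max_depth=1))  # ['A', 'B', 'C']
```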

In addition to the algorithm-based search of a graph, other operations can be performed with PyGraph, such as the calculation of minimum spanning trees. A minimum spanning tree is the set of edges that connects all nodes of a weighted graph with the lowest possible total edge weight. The following figure shows all currently supported algorithms.
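One common way to compute a minimum spanning tree is Kruskal’s algorithm, sketched below in plain Python. Whether PyGraph uses this particular algorithm internally is not stated here, so treat the code as an independent illustration; the function name and the sample graph are assumptions.

```python
def minimum_spanning_tree(vertices, weighted_edges):
    """Kruskal's algorithm: add the cheapest edges that do not close a cycle.

    weighted_edges is a list of (weight, u, v) tuples of an undirected graph.
    """
    parent = {v: v for v in vertices}  # union-find forest

    def find(v):
        # Follow parent links to the representative, compressing the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # u and v are not yet connected
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


vertices = ["A", "B", "C", "D"]
edges = [(1, "A", "B"), (4, "A", "C"), (2, "B", "C"), (5, "C", "D"), (3, "B", "D")]

mst = minimum_spanning_tree(vertices, edges)
total_weight = sum(weight for _, _, weight in mst)
print(mst)           # [('A', 'B', 1), ('B', 'C', 2), ('B', 'D', 3)]
print(total_weight)  # 6
```

A spanning tree over n connected nodes always has n - 1 edges, which is why the result for four nodes contains three.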