EXPERT KNOWLEDGE AT A GLANCE

Tag: python

PyTorch vs TensorFlow – Facebook vs Google – Understanding the Most Popular Deep Learning Frameworks

In recent years, the field of data science has gained access to increasingly powerful analysis methods thanks to increasingly high-performance hardware. Google's TensorFlow has long been the benchmark for building machine learning and deep learning models. It still offers the widest range of options today, but that breadth often creates a high barrier to entry.

PyTorch vs TensorFlow – With PyTorch, an open source Python package released two years after TensorFlow, Facebook now wants to knock TensorFlow off its throne. PyTorch has been steadily gaining popularity for years thanks to its simplicity and feature set.
In this article, we will clarify what the package offers and whether it can really compete with TensorFlow.

What is PyTorch?

PyTorch is one of the most popular open source Python packages for scientific computing and for developing and training neural networks.
It was developed by Facebook in 2016 and is based on the Torch library, which is written in Lua. At its core it is a NumPy-like tensor library with rich GPU support that accelerates the training of neural networks. The name PyTorch also refers to the core library of the same name; more about this in the section "PyTorch Libraries".
Tensors are the elementary data structure of PyTorch, just as they are in TensorFlow.

PyTorch vs TensorFlow – Tensors form the basis for both!

The mathematical term tensor refers to a generalization of vectors and matrices. It is thus an elementary data structure for representing and processing data. In PyTorch, tensors are implemented as multidimensional arrays. A vector therefore corresponds to a one-dimensional tensor.

The figure schematically shows the principle behind tensors.
PyTorch vs TensorFlow – Tensor Principle

A tensor can in principle have any number of dimensions. Typical examples are three-dimensional tensors for time series, four-dimensional tensors for batches of images, and five-dimensional tensors for videos.
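As a minimal sketch (the values and shapes are made up purely for illustration), this is how tensors of different dimensionality are created in PyTorch:

```python
import torch

# A vector corresponds to a one-dimensional tensor
vector = torch.tensor([1.0, 2.0, 3.0])
print(vector.ndim, vector.shape)   # 1 torch.Size([3])

# A matrix is a two-dimensional tensor
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(matrix.ndim, matrix.shape)   # 2 torch.Size([2, 2])

# A batch of RGB images is typically a four-dimensional tensor:
# (batch, channels, height, width)
images = torch.zeros(8, 3, 224, 224)
print(images.ndim, images.shape)   # 4 torch.Size([8, 3, 224, 224])
```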

The figure shows the role of tensors in the training of neural networks in PyTorch.
PyTorch vs TensorFlow – Tensors and neural networks

PyTorch provides methods that manipulate tensors with linear algebra operations. These computations can run at high performance by moving the tensor objects into the memory of the graphics card.
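A short sketch of how tensors are moved into GPU memory for accelerated linear algebra; it assumes a CUDA-capable graphics card is available and falls back to the CPU otherwise:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(1024, 1024, device=device)  # created directly on the GPU if available
b = torch.randn(1024, 1024).to(device)      # or moved there after creation

c = a @ b          # the matrix multiplication runs on the selected device
print(c.device)
```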

PyTorch Libraries

PyTorch makes it possible to include task-specific libraries. This keeps a program lean, since it only references the code it actually needs.
The PyTorch library itself is an optimized tensor library for deep learning on both GPUs and CPUs.
By including another library, PyTorch can also compute on TPUs.


Depending on the data type, different libraries can be loaded that provide optimized methods and pre-built model prototypes for analysis. Torchaudio offers, in addition to the usual audio transformation methods, ready-made datasets for training. With torchtext, large language corpora can be accessed, and with torchvision, images can be analyzed.
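As an illustration of how such a domain library is pulled in only when needed, the following sketch loads a pre-trained image model and a typical preprocessing pipeline from torchvision; the chosen model and normalization values are common defaults, not something prescribed here:

```python
import torch
from torchvision import models, transforms

# Pre-built prototype: a ResNet-18 pre-trained on ImageNet
model = models.resnet18(pretrained=True)
model.eval()

# Typical preprocessing pipeline for image analysis
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```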

The figure shows all PyTorch Libraries.
PyTorch Libraries

With TorchElastic, training jobs can be managed and elastically distributed, for example, across shared capacities.

PyTorch features

By offloading tensor computations to GPUs, PyTorch achieves high flexibility and high speed in deep learning algorithms. Beyond this, its Python foundation gives it seamless compatibility with powerful Python libraries such as NumPy and SciPy, as well as with the Cython programming language. Here we have collected the most important open source Python libraries for data management and analysis.


Reverse-mode automatic differentiation allows developers to modify network behavior at will, with virtually no delay or overhead, which significantly accelerates research iterations.
8-bit quantization enables efficient deployment of models on servers and edge devices, and PyTorch Mobile can be used to develop for Android and iOS environments.
Other features include named tensors, pruning of artificial neural networks, and parallel training of models via remote procedure calls (RPC).
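A minimal sketch of reverse-mode automatic differentiation with autograd: the operations are recorded as they run, and gradients are obtained by propagating backwards through them, which is what makes it possible to change network behavior between iterations without extra overhead.

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)

# An arbitrary computation; the operations are recorded on the fly
y = (x ** 2).sum() + 3 * x[0]

# Reverse-mode differentiation: propagate gradients back to x
y.backward()
print(x.grad)   # tensor([7., 6.])  ->  dy/dx = 2*x + [3, 0]
```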

PyTorch can use TorchServe, an open source model server developed by Facebook together with AWS, and is fully compatible with the cloud provider Amazon Web Services (AWS). If you don’t know what AWS is, read our article on the subject.

As an additional feature, PyTorch offers a hybrid frontend that lets you choose between two modes: eager mode and graph mode. Eager mode primarily offers usability and flexibility, while graph mode offers better speed, optimization, and functionality in a C++ runtime environment. The hybrid frontend also allows conversion between the two, so models can be developed in eager mode and then transferred to graph mode for production.
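A hedged sketch of moving from eager mode to graph mode with TorchScript: the model is developed and debugged as ordinary Python code and then converted into a serialized graph representation that can be executed in a C++ runtime (the model and file name here are made up for illustration).

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.linear(x))

model = TinyNet()                    # developed and tested in eager mode
scripted = torch.jit.script(model)   # converted to graph mode (TorchScript)
scripted.save("tiny_net.pt")         # can later be loaded from C++ without Python
```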

PyTorch supports exporting models to ONNX (Open Neural Network Exchange) and thus has access to ONNX-compatible platforms. ONNX is an open source project jointly developed by Microsoft, Amazon, and Facebook, among others, that enables the exchange of AI models between different tools.
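To illustrate the exchange format, a small sketch exporting a PyTorch model to ONNX; the model architecture, dummy input shape, and file name are arbitrary choices for this example:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))
dummy_input = torch.randn(1, 10)   # example input defining the expected shape

# Export to the Open Neural Network Exchange format
torch.onnx.export(model, dummy_input, "model.onnx")
```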

PyTorch vs Tensorflow

Duel of the Giants

Just like Facebook's solution, TensorFlow works with the tensor data type. PyTorch scores with its simplicity and efficient memory usage, while TensorFlow is considerably more scalable and therefore better suited for production models. An essential difference was originally that PyTorch defines the computation graph dynamically during execution, whereas TensorFlow first defines the graph and then executes it. TensorFlow has since followed suit with its own eager mode, although it is not yet fully mature.

PyTorch vs TensorFlow - The figure shows the main differences between Google's Tensorflow and Facebook's PyTorch
Tensorflow vs PyTorch

PyTorch vs Tensorflow – Who is ahead now?

It remains an exciting head-to-head race. Despite its shorter development history, PyTorch has already made up a lot of ground and is attractive in a business context precisely because of its user-friendliness. As is often the case, however, it is less a question of which solution will come out on top and more a matter of the principle that competition stimulates business. In the end, competitive pressure leads to great innovations and exciting new tools.

scikit-learn – Machine learning, Data Mining and Data Analysis in Python for free

Hardly any scientific discipline today can get by without the programming language Python.
With it, powerful algorithms can be applied to large amounts of data in a performant way.
Open source libraries and frameworks make it easy to implement mathematical methods and data transfers.

What is scikit-learn?

One of the most popular Python libraries is scikit-learn. It can be used to implement both supervised and unsupervised machine learning algorithms. scikit-learn primarily offers ready-made solutions for data mining, preprocessing and data analysis.
The library is based on the SciPy Toolkit (SciKit) and makes extensive use of NumPy for high performance linear algebra and array operations. If you don’t know what NumPy is, check out our article on the popular Python library.
The library was first released in 2007 and has since been continuously extended and optimized by a very active community.
It is written primarily in Python and relies on Cython only for some performance-critical operations.
This makes the library easy to integrate into Python applications.

scikit-learn Features

Many machine learning algorithms can be implemented easily with scikit-learn. Both supervised and unsupervised machine learning are supported. If you don’t know the difference between the two machine learning categories, check out this article from us on the topic.
The figure below lists all the algorithms provided by the library.

The figure lists the supervised and unsupervised machine learning algorithms provided by scikit-learn.
Machine learning algorithms provided by scikit-learn

scikit-learn thus offers rich capabilities for recognizing patterns and data relationships in a dataset. High-dimensional data can be reduced to visualize these relationships without sacrificing much information.
Features can be extracted, and data clustering algorithms can be set up with little effort.
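As a small sketch of how quickly such algorithms can be applied (the dataset and model choice here are just illustrative), the following trains a supervised classifier and then reduces the feature space with PCA for visualization:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Supervised learning: fit a classifier and evaluate it
clf = RandomForestClassifier().fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))

# Unsupervised dimensionality reduction: project to 2 dimensions for visualization
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)   # (150, 2)
```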

Dependencies

scikit-learn is powerful and versatile. However, the library does not stand entirely on its own. Besides the obvious dependency on Python, it requires other libraries to be imported for special operations.

NumPy allows easy handling of vectors, matrices, and large multidimensional arrays in general. SciPy complements these functions with useful features such as minimization, regression, and the Fourier transform. With joblib, Python functions can be run as lightweight pipeline jobs, and with threadpoolctl, methods can be coordinated as threads to save resources.
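A minimal sketch of the joblib dependency in action: an ordinary Python function is executed as lightweight parallel jobs (the toy function is made up for illustration):

```python
from joblib import Parallel, delayed

def square(n):
    return n * n

# Run the function as lightweight parallel jobs on 2 worker processes
results = Parallel(n_jobs=2)(delayed(square)(i) for i in range(10))
print(results)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```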

SciPy turns Python into an ingenious free MATLAB alternative

Python vs MATLAB

== Open source Python library
– a collection of mathematical algorithms and convenience functions

– is mainly used by scientists, analysts and engineers for scientific computing, visualization and related activities

– Initial Release: 2006; Stable Release: 2020
– depends on the NumPy module
→ the basic data structure used by SciPy is an N-dimensional array provided by NumPy

Benefits

SciPy benefits

Features

– the SciPy library provides many user-friendly and efficient numerical routines:

Available SciPy sub-packages
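A short sketch of the kind of numerical routine these sub-packages provide, here minimizing a simple function with scipy.optimize (the objective function is a toy example):

```python
import numpy as np
from scipy import optimize

# Toy objective: a shifted parabola with its minimum at x = 3
def f(x):
    return (x - 3.0) ** 2 + 1.0

result = optimize.minimize(f, x0=np.array([0.0]))
print(result.x)    # approximately [3.]
print(result.fun)  # approximately 1.0
```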

SciPy ecosystem

– scientific computing in Python builds upon a small core of open-source software for mathematics, science and engineering

SciPy Core Software

More relevant Packages

– building on this core, the SciPy ecosystem includes further specialized tools

Side packages of the SciPy ecosystem

The product and further information can be found here:

https://www.scipy.org/

Seaborn – High-level interface for the visualization of statistical data in Python

Overview

== Python visualization library based on Matplotlib (Python’s core 2D plotting library)
– provides a high-level interface for the visualization of statistical data
– does not have its own graphics library, but uses the functionalities and data structures of Matplotlib internally

Dependencies

– Python 3.6
– NumPy
– SciPy
– Pandas
– Matplotlib

Matplotlib vs. Seaborn

Matplotlib weaknesses:

– poor default options for plot size and color
– low-level interface by today’s standards, requiring very specialized code to generate appealing plots
– not designed to work with Pandas DataFrames

Features

– Built-in themes for styling Matplotlib graphics
– Dataset-oriented API for determining the relationship between variables
– Visualization of univariate and bivariate data
– Automatic estimation and display of linear regression models
– Plotting of statistical time series data
– works well with NumPy and Pandas data structures
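A minimal sketch of the dataset-oriented API with a built-in theme and an automatic linear regression estimate; the small DataFrame is made up purely for illustration:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Apply one of the built-in themes to style Matplotlib graphics
sns.set_theme(style="darkgrid")

# A small toy DataFrame; Seaborn works directly on Pandas data structures
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6, 7, 8],
    "exam_score":    [52, 55, 61, 64, 70, 74, 79, 85],
})

# Dataset-oriented API with an automatically estimated and plotted linear regression
sns.lmplot(data=df, x="hours_studied", y="exam_score")
plt.show()
```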

Overview of Seaborn plotting functions

The product and further information can be found here:

https://seaborn.pydata.org/

PyTorch BigGraph (PBG) – Facebook’s open source library for process embedding on large graphs for free

PyTorch BigGraph – A graph is a data structure that can be used to clearly represent relationships between data objects as nodes and edges.
These structures can contain billions of nodes and edges in an industrial context.

Typical graph structure

So how can these multidimensional data relationships be accessed in a meaningful way?
Graph embedding offers one possibility for dimensionality reduction.
It comprises a family of algorithms whose goal is to map the graph’s property relations into vector spaces. These embedding methods usually run unsupervised.
If two nodes have very similar properties, the corresponding points should also lie close to each other in the vector space.

The reduced feature information can then be further processed with additional machine learning algorithms.
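To make this idea concrete, here is a small sketch (with made-up embedding vectors) of how the closeness of embedded nodes can be measured in the vector space, for example with cosine similarity:

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up 4-dimensional embeddings for three nodes
node_a = np.array([0.9, 0.1, 0.3, 0.0])
node_b = np.array([0.8, 0.2, 0.4, 0.1])   # similar properties -> close to node_a
node_c = np.array([0.0, 0.9, 0.0, 0.8])   # dissimilar properties -> far away

print(cosine_similarity(node_a, node_b))  # close to 1
print(cosine_similarity(node_a, node_c))  # much smaller
```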

What is PyTorch BigGraph?

Facebook offers PyTorch BigGraph, an open source library that can be used to create very performant graph embeddings for extremely large graphs.

The figure shows the main principle of PyTorch BigGraph graph embedding.
PyTorch BigGraph Principle

It is a distributed system that can learn graph embeddings unsupervised for graphs with billions of nodes and trillions of edges. It was launched in 2019 and is written entirely in Python, which ensures full compatibility with common Python data processing libraries such as NumPy, Pandas, and scikit-learn.
All calculations are performed on the CPU, which should play a decisive role in hardware selection: a lot of memory is mandatory. It should also be noted that while PBG can process large graphs very efficiently, it is not optimized for small graphs, i.e. structures with fewer than 100,000 nodes.

Facebook extends the ecosystem of its popular Python scientific computing package PyTorch with a very performant Big Graph solution. If you want to know more about PyTorch, you should read this article from us. Here we will show you the most important features and compare it with the industry’s top performer Google Tensorflow.

Fundamental building blocks

PBG provides some basic building blocks to handle the complexity of the graph. Graph partitioning splits the graph into parts of equal size that can be processed in parallel. PBG also supports multithreaded computation: a process is divided into several threads that run independently but can access the same memory. In addition to distributing tasks across threads, PyTorch BigGraph can also distribute execution across several machines to make intelligent use of the available hardware resources.

PyTorch BigGraph- How does the training work?

The PBG graph processing algorithms can work on the graph in parallel using the fundamental building blocks described above. This allows the training to run in a distributed manner and thus with high performance.

Once the nodes and edges are partitioned, the training can be performed for one bucket at a time.

The figure shows schematically the parallel training of PyTorch BigGraph which is enabled by graph partitioning.
PyTorch BigGraph – Parallel Training through Graph Partitioning

The training runs unsupervised on an input graph by reading its edge list.
A feature vector is then output for each entity. Entities that are adjacent in the graph are placed close to each other in the vector space, while unconnected entities are pushed apart. In this way, the dimensions are iteratively reduced. The calculation can also be configured and optimized using parameters learned during training.
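As a rough sketch of what such a training setup can look like, the following is modeled on the configuration files used in the examples of the PyTorch BigGraph repository; the exact keys, paths, and values are assumptions based on those examples and may differ between versions:

```python
# Hypothetical PBG-style configuration, modeled on the examples in the
# PyTorch BigGraph repository; keys and values are illustrative assumptions.
def get_torchbiggraph_config():
    return dict(
        # Where the preprocessed entities and partitioned edge lists are expected
        entity_path="data/example_graph",
        edge_paths=["data/example_graph/train_partitioned"],
        checkpoint_path="model/example_graph",

        # Graph structure: one entity type, split into 2 partitions
        entities={"all": {"num_partitions": 2}},
        relations=[{"name": "all_edges", "lhs": "all", "rhs": "all", "operator": "none"}],

        # Embedding training parameters
        dimension=128,        # size of the output feature vector per entity
        num_epochs=10,
        num_uniform_negs=1000,
    )
```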

PBG and Machine Learning

The graph is a very information-rich but so far unfortunately much-neglected data structure. With tools like PBG, the sheer size of these structures is increasingly offset by a high degree of parallelism.

A very interesting concept is the use of PBG for machine learning on large graph structures. Here, graphs with their nodes, edges, and properties could be used for semantic queries to represent and store data, and could replace a labeled data structure. Certain relations can be derived from the connections between the nodes. With PBG, the graph can be processed in a massively parallel way, which would allow individual machines to train a model in parallel on different buckets, coordinated by a lock server.