Data Warehousing – In today’s flood of data, it is becoming increasingly difficult to maintain a clear data management system. More and more data sources are recorded via different software systems.
A unified, centralized system can facilitate analysis and ensure that only one data truth exists in an organization.
Table of Contents
What is a Data Warehouse System?
Data warehouse systems are built by integrating data from multiple heterogeneous sources and, in addition to centralization, performs the task of structuring data, supporting analytical reporting and structuring decision-making.
The system can perform data cleansing as well as data integration and data consolidation and does not require transaction processing or recovery.
It is thus a powerful Big Data information system that can centrally handle everything related to data processing.
What does a Data Warehouse structure look like?
The term data warehouse is used to describe various architectures and systems. However, multi-layer architectures are typical. In this article, we will introduce you to the most commonly used three-tier architecture.
If you are interested in the different types, you should read this article from us on the topic. Here we present the individual types in detail.
This article is primarily about what the advantages of the system actually are and how the data communication works.
Data Warehousing Features
Data warehousing offers several features. Such an information system is subject oriented. It does not focus on the current operation, as these data are separated. This means that frequent changes in the operational database are not reflected in the data warehouse. Thus, the focus is on modeling and analysis of data.
The system is Time variant, which means that the collected data are identified with a certain period of time and previous data are not deleted when new data are added.
However, some terms that often come up in connection with this system need to be clarified.
When metadata is mentioned, a kind of roadmap to the data warehouse is meant. Here the warehouse objects are defined and it acts as a directory. This means that the decision support system finds the contents via the metadata.
The metadata is stored in the metadata repository. An integral directory that manages both the business metadata, i.e. data ownership information, business definition and change policies, and the operational metadata. Operational metadata refers to the timeliness of the data is it active, archived or cleansed, and data lineage, which is the history of the data. This includes the data used to map the operational environment, source databases and their contents, data extraction, data partitioning, cleansing, transformation rules, data refreshing and cleansing rules, but also the algorithms for summation, dimensional algorithms, data for granularity, aggregation, summation, etc.
The so-called data cube represents data in multiple dimensions and the data mart contains only the data specific to a certain group.
Load Data into Warehouse
In addition to the different components and architectures, data can also be transmitted to the information system in different ways.
As shown in the figure, a basic distinction is made between two elementary processes.
What is ELT?
Extract, Load, and Transform, or ELT for short, is about extracting aggregate information from the source system and loading it into the target method.
The following figure shows such an example system. In this case, the Hadoop framework handles the central data management, while applications and analysis tools access the untransformed data.
What is ETL?
In Extract, Transform and Load, or ETL for short, the data set is first extracted from the sources into a staging area, then transformed or reformatted with business manipulation performed on it, and only then loaded into the target or destination database or data warehouse.