Overview

– Open source stream processor framework developed by the Apache Software Foundation (2016)
– Data streams with high data volume can be processed and analyzed with low delay and high speed

flink analytics — Flink provides various tools for efficient real-time processing of continuous data streams and batch data

Core functions

– diverse, specialized APIs:
→ DataStream API (Stream Processing)
→ ProcessFunctions (control of states and time; event states can be saved and timers can be added for future calculations)
→ Table API
→ SQL API
→ provides a rich set of connectors to various storage systems such as Kafka, Kinesis, Kubernetes, YARN, HDFS, Elasticsearch, and JDBC database systems
→ REST API

Stream Processing

pexels pixabay 2438 — How to handle this flood of data?

== Data is processed continuously with a short delay
→ without intermediate storage of the data in separate databases
– several data streams can be processed in parallel
– Each stream can be used to derive own follow-up actions and analyses

Architecture

Data can be processed as unbounded or bounded streams:

Unbounded stream
- have a start but no defined end
- must be continuously processed
Bounded stream
- have a defined start and end
- can be processed by ingesting all data before performing any computations(== batch processing)

– Flink automatically identifies the required resources based on the application’s configured parallelism and requests them from the resource manager.

– In case of a failure, Flink replaces the failed container by requesting new resources.

– Stateful Flink applications are optimized for local state access

Category: query

Apache Flink

Overview

Core functions

Stream Processing

Architecture

Support us…..

Recent Posts

Archives

Categories

Meta