Overview
– Open source stream processor framework developed by the Apache Software Foundation (2016)
– Data streams with high data volume can be processed and analyzed with low delay and high speed
Core functions
– diverse, specialized APIs:
→ DataStream API (Stream Processing)
→ ProcessFunctions (control of states and time; event states can be saved and timers can be added for future calculations)
→ Table API
→ SQL API
→ provides a rich set of connectors to various storage systems such as Kafka, Kinesis, Kubernetes, YARN, HDFS, Elasticsearch, and JDBC database systems
→ REST API
Stream Processing
== Data is processed continuously with a short delay
→ without intermediate storage of the data in separate databases
– several data streams can be processed in parallel
– Each stream can be used to derive own follow-up actions and analyses
Architecture
Data can be processed as unbounded or bounded streams:
-
Unbounded stream
-
have a start but no defined end
-
must be continuously processed
-
-
Bounded stream
-
have a defined start and end
-
can be processed by ingesting all data before performing any computations(== batch processing)
-
– Flink automatically identifies the required resources based on the application’s configured parallelism and requests them from the resource manager.
– In case of a failure, Flink replaces the failed container by requesting new resources.
– Stateful Flink applications are optimized for local state access