Apache Flink is a distributed stream processor with intuitive and expressive APIs for implementing stateful stream processing applications. Stream processing is becoming more popular because it provides superior solutions for many established use cases such as ETL and transactional pipelines.
Data and data processing have been omnipresent in businesses for many decades. The typical architecture most businesses implement distinguishes two types of data processing: transactional and analytical.
Companies use all kinds of applications for day-to-day business activities, such as enterprise resource planning (ERP) systems, customer relationship management (CRM) software and web-based applications. These applications are typically designed with separate tiers for data processing and storage.

This application design can cause problems when applications need to evolve or scale: changing the schema of a table or scaling a database system requires careful planning and a lot of effort. A more recent approach to this problem is the microservices design pattern. Microservices are designed as small, self-contained and independent applications. More complex applications are built by connecting several microservices that communicate with each other only over well-defined interfaces.

The data stored in a company's various transactional database systems can provide valuable insights into its business operations. However, transactional data is often distributed across several disconnected database systems and is more valuable when it can be jointly analysed. This is where data warehouses come in, with ETL (extract, transform, load) processes used to transform the data into a common representation.

Virtually all data is created as a continuous stream of events. Stateful stream processing is an application design pattern for processing unbounded streams of events and is applicable to many different use cases in the IT infrastructure of a company.
Any application that processes a stream of events and does more than trivial record-at-a-time transformations needs to be stateful, with the ability to store and access intermediate data. Apache Flink stores the application state locally, in memory or in an embedded database. Flink guarantees fault tolerance by periodically writing a consistent checkpoint of the application state to remote, durable storage.
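To make the idea of local state plus periodic checkpoints concrete, here is a minimal sketch in plain Python. It does not use Flink's API, and every name in it (`CountingOperator`, `run`, `durable_store`, `checkpoint_every`, `fail_at`) is hypothetical and chosen only for illustration. A dictionary stands in for the remote, durable storage, and the input is a finite list that can be replayed from a stored offset, which is an assumption about the source.

```python
import copy
from collections import defaultdict


class CountingOperator:
    """Toy stateful operator: keeps a running count of events per key."""

    def __init__(self):
        self.counts = defaultdict(int)  # local application state
        self.offset = 0                 # number of input events consumed so far

    def process(self, event):
        key, _payload = event
        self.counts[key] += 1           # read and update intermediate state
        self.offset += 1
        return key, self.counts[key]

    def snapshot(self):
        # State and input position are captured together so they stay consistent.
        return {"counts": dict(self.counts), "offset": self.offset}

    def restore(self, checkpoint):
        self.counts = defaultdict(int, checkpoint["counts"])
        self.offset = checkpoint["offset"]


def run(events, operator, durable_store, checkpoint_every=3, fail_at=None):
    """Process events starting at operator.offset, checkpointing periodically."""
    while operator.offset < len(events):
        if fail_at is not None and operator.offset == fail_at:
            raise RuntimeError("simulated failure")
        operator.process(events[operator.offset])
        if operator.offset % checkpoint_every == 0:
            # The dict stands in for remote, durable storage (an assumption).
            durable_store["latest"] = copy.deepcopy(operator.snapshot())


events = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]
store = {}

first = CountingOperator()
try:
    run(events, first, store, fail_at=4)   # fails after four events
except RuntimeError:
    pass

recovered = CountingOperator()
recovered.restore(store["latest"])         # roll back to the last checkpoint
run(events, recovered, store)              # replay input from the stored offset
print(dict(recovered.counts))
```

Because the checkpoint records the input position together with the counts, the recovered operator replays only the events after the checkpoint, and its final counts match those of a run without a failure. A real stream processor has to solve the same problem across many parallel tasks, which this single-process sketch does not attempt.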
