Kinesis KPL vs API
The Kinesis Producer Library (KPL) and the Kinesis API can both be used to send data to Kinesis Data Streams. The advantage of the KPL is that it provides added features out of the box, such as built-in handling of failed transmissions. If you use the Kinesis API you have to code these features yourself. The advantage of the Kinesis API is that records can be sent without delay, whereas the KPL buffers records before sending them. Also, the Kinesis API is accessed via the AWS SDKs, which are available in many programming languages including the Data Scientist’s favorite, Python. The KPL is available as a Java library only.
Last updated: 5 June 2021
|Feature|Kinesis Producer Library (KPL)|Kinesis API|
|---|---|---|
|Delay|Possible, due to the buffer feature|Instant, no delay|
|Languages|Java only|Many, including Python|
|Failed transmission handling|Built in|Must be coded|
|Available features|Restricted to features relevant to Producers|API provides complete access to all Kinesis features|
Video: Amazon Kinesis Consumers Explained
8.08 minutes video by Stephane Maarek
Kinesis Producer Library (Kinesis KPL)
The Kinesis Producer Library is used to transmit records to Kinesis Data Streams. It acts as an intermediary between the producer application code and Kinesis Data Stream API actions. The KPL sits on the Kinesis API and provides a subset of functions specific to Producers.
Video: How can I put records into an Amazon Kinesis data stream using the KPL?
9.44 minutes video from AWS
Advantages of the KPL:
- Performance benefits
- Consumer-side ease of use via the Kinesis Client Library
- Producer monitoring via CloudWatch
- Asynchronous architecture: the KPL has a buffer that stores records whilst they are processed
Disadvantages of the KPL:
- Buffering can delay feeding data to Kinesis
- The KPL is a Java library
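The buffering trade-off listed above can be illustrated with a toy buffer in Python. This is only a sketch of the general idea, not the KPL's actual implementation: the class name `BufferedProducer` and the parameters `max_records` and `max_buffered_seconds` are made up for this example, and `send_batch` stands in for whatever actually transmits the records.

```python
import time


class BufferedProducer:
    """Toy record buffer (illustration only, NOT the real KPL)."""

    def __init__(self, send_batch, max_records=500,
                 max_buffered_seconds=0.1, clock=time.monotonic):
        self._send_batch = send_batch            # hypothetical callable taking a list of records
        self._max_records = max_records          # flush when this many records are buffered
        self._max_buffered_seconds = max_buffered_seconds  # or when the oldest record is this old
        self._clock = clock
        self._buffer = []
        self._oldest = None                      # arrival time of the oldest buffered record

    def add(self, record):
        """Buffer a record, flushing if the buffer is full or too old."""
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(record)
        if (len(self._buffer) >= self._max_records
                or self._clock() - self._oldest >= self._max_buffered_seconds):
            self.flush()

    def flush(self):
        """Send everything currently buffered as one batch."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send_batch(batch)


# Usage: with max_records=3, seven records leave in batches of 3, 3 and 1.
sent = []
producer = BufferedProducer(sent.append, max_records=3, max_buffered_seconds=60)
for i in range(7):
    producer.add(f"record-{i}")
producer.flush()  # the last record only leaves the buffer on this explicit flush
```

The seventh record waits in the buffer until something flushes it, which is the delay the list refers to. This sketch only checks the age limit when a new record arrives; a real producer would more likely use a background timer.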
Kinesis API
The Kinesis API is used by the AWS Software Development Kit (SDK) to add records to Kinesis Data Streams. There are SDKs for many languages including Python. The API exposes the full range of capabilities of Kinesis Data Streams, not just those concerned with data producers.
Data is transmitted using two API operations:
- PutRecord – this sends records one at a time
- PutRecords – this sends batches of records, up to 500 per request, for higher throughput
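As a rough sketch of how these two operations might be called from Python, assume `client` is a boto3 Kinesis client (for example one created with `boto3.client("kinesis")`), which exposes them as `put_record` and `put_records`. The helper names `chunk`, `send_one` and `send_many` are invented for this example, and the sketch only enforces the 500-record limit, not any limit on request size.

```python
def chunk(records, size=500):
    """Yield successive lists of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def send_one(client, stream_name, data, partition_key):
    """Send a single record with PutRecord.

    `client` is assumed to be a boto3 Kinesis client.
    """
    return client.put_record(
        StreamName=stream_name,
        Data=data,
        PartitionKey=partition_key,
    )


def send_many(client, stream_name, records):
    """Send records with PutRecords, at most 500 per request.

    `records` is a list of dicts shaped like
    {"Data": b"...", "PartitionKey": "..."}.
    Returns one response per request; failures are NOT retried here.
    """
    return [
        client.put_records(StreamName=stream_name, Records=batch)
        for batch in chunk(records)
    ]
```

With 1,200 records, `send_many` would issue three requests of 500, 500 and 200 records.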
When using the Kinesis API, the programmer has to handle records that fail to get sent.
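One way to handle failed records by hand is sketched below, again assuming `client` is a boto3 Kinesis client. A PutRecords response reports `FailedRecordCount` and one result entry per submitted record, in request order, where failed entries carry an `ErrorCode`. The function name `put_records_with_retry`, the retry count and the backoff values are illustrative choices, not recommendations from AWS.

```python
import time


def put_records_with_retry(client, stream_name, records,
                           max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Send one batch (at most 500 records) and resend only the failures.

    Returns the records that are still unsent after `max_attempts`.
    """
    pending = list(records)
    for attempt in range(max_attempts):
        if not pending:
            break
        response = client.put_records(StreamName=stream_name, Records=pending)
        if response["FailedRecordCount"] == 0:
            return []
        # Result entries line up with the request; failed ones have an ErrorCode.
        pending = [
            record
            for record, result in zip(pending, response["Records"])
            if "ErrorCode" in result
        ]
        if attempt < max_attempts - 1:
            # Exponential backoff before the next attempt (assumed policy).
            sleep(base_delay * 2 ** attempt)
    return pending
```

Anything returned by this function has failed every attempt, so the caller still has to decide what to do with it, for example log it or write it to a dead-letter location.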
Kinesis Client Library (KCL)
The Kinesis Client Library (KCL) sits on the Kinesis API. When it retrieves a Kinesis Data Streams record that the KPL has aggregated from several user records, it calls the Kinesis Producer Library to extract the individual user records. The KCL is a Java library, although it may be used by other languages via the MultiLangDaemon.
The KCL uses checkpointing to keep track of which records have been processed, so that none are missed after a failure or restart. To do this it stores checkpointing data in DynamoDB. Note that if the DynamoDB table is under-provisioned, throttling may be experienced, so the throughput of the DynamoDB table must be in balance with the provisioned throughput of the Kinesis data stream. KCL provides connectors to Amazon DynamoDB, Amazon Redshift, Amazon S3, and Elasticsearch.
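The idea behind checkpointing can be sketched in a few lines of Python. This is a simplified model, not the KCL's real API: `CheckpointStore` is an in-memory stand-in for the DynamoDB table, `process_shard` is a made-up helper, and sequence numbers are plain integers here for readability.

```python
class CheckpointStore:
    """In-memory stand-in for the checkpoint table the KCL keeps in DynamoDB."""

    def __init__(self):
        self._last_processed = {}   # shard id -> last processed sequence number

    def save(self, shard_id, sequence_number):
        self._last_processed[shard_id] = sequence_number

    def load(self, shard_id):
        return self._last_processed.get(shard_id)


def process_shard(shard_id, records, store, handle):
    """Process records from one shard, resuming after the saved checkpoint.

    `records` is a list of (sequence_number, payload) pairs in ascending
    order; `handle` is the application's own processing function.
    """
    checkpoint = store.load(shard_id)
    for sequence_number, payload in records:
        if checkpoint is not None and sequence_number <= checkpoint:
            continue                            # processed before the restart
        handle(payload)
        store.save(shard_id, sequence_number)   # one write per record in this sketch
```

If `handle` raises partway through, calling `process_shard` again with the same store picks up at the first record after the last saved checkpoint. Every `save` here models a write to the checkpoint table, so checkpointing after every record, as this sketch does, is the kind of pattern that could contribute to the throttling mentioned above; a consumer might checkpoint less often instead.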
The Kinesis Connector library is a legacy, deprecated library.
Kinesis Data Streams is served by two methods to ingest data and one to extract it. The Kinesis Producer Library and Kinesis API are used to feed data in and the Kinesis Client Library is used to extract it.
This study guide is part of subdomain 1.2, Identify and implement a data-ingestion solution. This is part of the Data Engineering domain.