Sidero Ltd

Enabling Powerful Search and Analytics on DynamoDB content with Elasticsearch

Sidero cloud center of excellence analysed and implemented the steps of integrating DynamoDB with Elasticsearch to provide powerful search and analytics capabilities for data stored in DynamoDB, which also allows customers to analyse transactions for audit purposes.




In this blog we discuss the integration of DynamoDB with Elasticsearch to provide powerful search and analytics capabilities for data stored in DynamoDB. We start by looking at the strengths of DynamoDB and Elasticsearch, which helps to clarify which aspects of an application each tool handles:

AWS DynamoDB:

  • Serverless Key-Value and Document database

  • Highly Durable Storage

  • Low Latency

  • High Throughput

  • Highly Scalable (nearly infinite)

Elasticsearch:

  • Apache Lucene based search engine with RESTful interface

  • Supports indexing a large number of attributes

  • Supports many search capabilities such as phrase matching, spelling correction, synonym handling, stemming, bigram matching, SQL-like search and full-text search

  • Supports search within nested JSON documents

  • Supports relevance/score, sort

  • Supports geospatial data

  • Supports Aggregation

  • Dashboard using Kibana

DynamoDB secondary indexes can be used to avoid costly scans, but they support only a limited range of query types. For many search and analytics use cases it is therefore more cost effective to export the data from DynamoDB into a separate, purpose-built system. For applications that use DynamoDB as the primary data source and demand a powerful, modern search experience, Elasticsearch can be a good fit. Combining the two makes it possible to use DynamoDB as the single source of truth (SSOT), with Elasticsearch as a secondary data store optimised for the desired search and analytics experience.


Amazon Elasticsearch Service is a fully managed service that makes it easy to deploy, secure, and run Elasticsearch cost effectively at scale. The service provides support for open source Elasticsearch APIs, managed Kibana, integration with Logstash and other AWS services, and built-in alerting and SQL querying.

Solution:

The following diagram shows the components and workflow for pushing DynamoDB table changes to the Elasticsearch service.


This solution makes use of DynamoDB Streams and an AWS Lambda function. The workflow for this setup is as follows:

  1. The application makes changes (create / update / delete item) to the DynamoDB table

  2. A new DynamoDB stream record is written to reflect the changes made to the table

  3. The Lambda service polls the DynamoDB stream and invokes the AWS Lambda function with the new records

  4. The Lambda function creates, updates or deletes the corresponding document in Elasticsearch. It can apply any necessary transformation to the document content before writing it to Elasticsearch

  5. The application / users perform search / analytics on the Elasticsearch index

  6. A CloudWatch alarm on the IteratorAge metric can be used to monitor for function errors, as an increasing trend in this metric indicates that the function is falling behind the stream, typically because of errors.
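Step 6 can be sketched with boto3's CloudWatch client. This is a minimal sketch under stated assumptions: the function name, SNS topic ARN, threshold and evaluation window are illustrative values that do not come from the original setup. IteratorAge is reported in milliseconds.

```python
def iterator_age_alarm(function_name, sns_topic_arn, threshold_ms=60_000):
    """Build put_metric_alarm arguments for a function's IteratorAge metric.

    threshold_ms, Period and EvaluationPeriods are illustrative assumptions.
    """
    return {
        "AlarmName": f"{function_name}-iterator-age",
        "Namespace": "AWS/Lambda",
        "MetricName": "IteratorAge",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Maximum",
        "Period": 60,                # seconds per datapoint
        "EvaluationPeriods": 5,      # alarm after 5 consecutive breaches
        "Threshold": threshold_ms,   # IteratorAge is measured in milliseconds
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(function_name, sns_topic_arn):
    """Create the alarm; needs AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the builder above has no dependencies

    boto3.client("cloudwatch").put_metric_alarm(
        **iterator_age_alarm(function_name, sns_topic_arn)
    )
```

A sensible threshold depends on how stale the search index is allowed to become for the use case.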

Below are a few key integration points in the setup:

  • DynamoDB Stream: Enable a DynamoDB stream on the desired table.
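The same setting can also be applied programmatically. The following is a minimal sketch using boto3's `update_table`; the table name is supplied by the caller, and the choice of `NEW_AND_OLD_IMAGES` as the view type is an assumption (any view type that includes the new image would let the function see the full item).

```python
VIEW_TYPES = {"KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES"}


def stream_spec(table_name, view_type="NEW_AND_OLD_IMAGES"):
    """Build update_table arguments that switch on a stream for the table."""
    if view_type not in VIEW_TYPES:
        raise ValueError(f"unknown stream view type: {view_type}")
    return {
        "TableName": table_name,
        "StreamSpecification": {
            "StreamEnabled": True,
            "StreamViewType": view_type,
        },
    }


def enable_stream(table_name):
    """Enable the stream; needs AWS credentials, so it is not called here."""
    import boto3  # imported lazily so stream_spec has no dependencies

    response = boto3.client("dynamodb").update_table(**stream_spec(table_name))
    # The stream ARN is what the Lambda trigger is later pointed at.
    return response["TableDescription"].get("LatestStreamArn")
```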


  • Lambda: The AWS Lambda function is triggered by the event source (DynamoDB Streams) and publishes the associated changes to Elasticsearch. Below are a few key configurations for Lambda:

    • Policy: The Lambda function needs an execution role with sufficient permissions to read from the DynamoDB stream, write to and delete from Elasticsearch, and write logs to CloudWatch. The role must also have a trust relationship that allows the Lambda service to assume it.

For least privilege, permission can be given only to the required resources.
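As an illustration of a least-privilege role, the sketch below builds a permissions policy and a trust policy as Python dictionaries. The ARNs are passed in by the caller, and the exact action list is an assumption based on what the function does in this setup (read the stream, call the Elasticsearch HTTP API, write logs); adjust it to the actual function, for example by adding permission to send to a failed-event destination.

```python
import json


def execution_policy(stream_arn, es_domain_arn, log_group_arn):
    """Permissions policy scoped to one stream, one ES domain, one log group."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # read change records from the DynamoDB stream
                "Effect": "Allow",
                "Action": [
                    "dynamodb:DescribeStream",
                    "dynamodb:GetRecords",
                    "dynamodb:GetShardIterator",
                    "dynamodb:ListStreams",
                ],
                "Resource": stream_arn,
            },
            {   # write and delete documents through the ES HTTP API
                "Effect": "Allow",
                "Action": ["es:ESHttpPut", "es:ESHttpPost", "es:ESHttpDelete"],
                "Resource": f"{es_domain_arn}/*",
            },
            {   # write function logs to CloudWatch Logs
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": f"{log_group_arn}:*",
            },
        ],
    }


# Trust relationship: lets the Lambda service assume the execution role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def as_json(policy):
    """Serialise a policy document for the IAM console or API."""
    return json.dumps(policy, indent=2)
```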


  • Trigger: Add the required DynamoDB table to the function's trigger configuration.

A reasonable number of retries and a maximum record age, chosen according to the use case, can be set in the trigger configuration. Lambda discards a batch once the retry limit or maximum age is reached. To retain a record of discarded batches, configure a failed-event destination (SNS or SQS); Lambda then sends a document with details about the batch to the destination queue or topic.
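The trigger settings described above map onto boto3's `create_event_source_mapping`. This is a minimal sketch: the batch size, retry count, maximum record age and starting position are illustrative assumptions, and the ARNs are supplied by the caller.

```python
def trigger_config(function_name, stream_arn, failure_destination_arn,
                   max_retries=3, max_age_seconds=3600):
    """Build create_event_source_mapping arguments for a DynamoDB stream."""
    return {
        "FunctionName": function_name,
        "EventSourceArn": stream_arn,
        "StartingPosition": "LATEST",   # only process changes made from now on
        "BatchSize": 100,               # illustrative value
        "MaximumRetryAttempts": max_retries,
        "MaximumRecordAgeInSeconds": max_age_seconds,
        # SNS topic or SQS queue that receives details of discarded batches
        "DestinationConfig": {
            "OnFailure": {"Destination": failure_destination_arn}
        },
    }


def create_trigger(function_name, stream_arn, failure_destination_arn):
    """Create the mapping; needs AWS credentials, so it is not called here."""
    import boto3  # imported lazily so trigger_config has no dependencies

    return boto3.client("lambda").create_event_source_mapping(
        **trigger_config(function_name, stream_arn, failure_destination_arn)
    )
```

Without a retry limit or maximum age, a record that always fails would keep being retried until it expires from the stream, blocking the records behind it in that shard.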


  • Code: The sample code for updating the ES index uses boto3 to obtain credentials. To create a record in ES, it needs the ES host, index name and type.

The sample code does not transform the DynamoDB document before writing it to ES. The ES index is not created in advance, so it is created when the first record is inserted into ES. For production use, the index mapping needs to be fine-tuned to requirements.
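A handler along these lines could look like the sketch below. It is not the original sample code: the host, index name and region are hypothetical, the table is assumed to have a string partition key named `id`, and the `requests` and `requests_aws4auth` packages are assumed to be bundled with the function for signing the HTTP calls. As in the description above, the new image is written unchanged, so documents keep their DynamoDB-typed form (for example `{"title": {"S": "..."}}`).

```python
ES_HOST = "https://search-example.eu-west-1.es.amazonaws.com"  # hypothetical
ES_INDEX = "dynamodb-items"                                    # hypothetical
ES_TYPE = "_doc"
REGION = "eu-west-1"                                           # hypothetical


def plan_requests(event, key_attr="id"):
    """Translate stream records into (method, url, body) tuples."""
    planned = []
    for record in event.get("Records", []):
        doc_id = record["dynamodb"]["Keys"][key_attr]["S"]
        url = f"{ES_HOST}/{ES_INDEX}/{ES_TYPE}/{doc_id}"
        if record["eventName"] == "REMOVE":
            planned.append(("DELETE", url, None))
        else:
            # INSERT and MODIFY both overwrite the document with the new
            # image (requires a stream view type that includes it).
            planned.append(("PUT", url, record["dynamodb"]["NewImage"]))
    return planned


def handler(event, context):
    """Lambda entry point: sign each planned request and send it to ES."""
    import boto3
    import requests
    from requests_aws4auth import AWS4Auth

    creds = boto3.Session().get_credentials()
    auth = AWS4Auth(creds.access_key, creds.secret_key, REGION, "es",
                    session_token=creds.token)
    planned = plan_requests(event)
    for method, url, body in planned:
        response = requests.request(method, url, auth=auth, json=body)
        if method == "DELETE" and response.status_code == 404:
            continue  # document already absent, nothing to do
        # Raising makes Lambda retry the batch under the trigger settings.
        response.raise_for_status()
    return f"{len(planned)} records processed"
```

Keeping the record-to-request mapping in a separate function makes it easy to unit test without AWS access, and gives a single place to add document transformation later.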


  • Elasticsearch: Once a record has been saved to ES, the index and its details can be viewed in Elasticsearch.


Advantages of this solution:

  1. Enables indexing and analytics on DynamoDB content

  2. Using Lambda functions for data transformation provides quick, near-real-time sync between the DynamoDB table and Amazon Elasticsearch Service

  3. Using alarms and SNS allows us to keep track of Lambda events for monitoring and audit purposes

Things to Consider:

  1. If the table already contains data, an initial load of the table data into Elasticsearch is required.

  2. For Elasticsearch indexing, a given field must have the same data type in every document.

  3. Limit on the number of simultaneous readers of a shard in DynamoDB Streams

  4. Limits on maximum write capacity for a table with a stream enabled

  5. Maximum execution time of Lambda
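For the initial load in point 1, one possible approach is to scan the table once and send the items to Elasticsearch's bulk API. The sketch below is an assumption-laden illustration: it uses the low-level DynamoDB client so that items keep the same DynamoDB-typed shape as stream images (which keeps field types consistent, as point 2 requires), it assumes a string partition key named `id`, and it assumes an Elasticsearch version that accepts bulk actions without a document type.

```python
import json


def build_bulk_body(items, index, key_attr="id"):
    """Render items as newline-delimited JSON for the ES _bulk endpoint."""
    lines = []
    for item in items:
        action = {"index": {"_index": index, "_id": item[key_attr]["S"]}}
        lines.append(json.dumps(action))  # action line
        lines.append(json.dumps(item))    # document line, written unchanged
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


def scan_pages(client, table_name):
    """Yield each page of items from a full table scan."""
    kwargs = {"TableName": table_name}
    while True:
        page = client.scan(**kwargs)
        yield page.get("Items", [])
        if "LastEvaluatedKey" not in page:
            break  # no more pages
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```

Each non-empty page would then be POSTed to the domain's `/_bulk` endpoint with the `Content-Type: application/x-ndjson` header, signed in the same way as the function's other requests. A full scan consumes read capacity, so it is best run once, at a quiet time, and for large tables outside Lambda given its maximum execution time.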

Use Cases:

Below are a few use cases where DynamoDB can be used as the primary data source and ES as a secondary data source for search / analysis:

  1. Search / Analysis across Product catalog

  2. Search / Analysis for transactions, audits and logs

  3. Monitoring time series data like events and metrics

  4. Data visualisation using analytics platform/tools


