
Storing data at scale using Amazon DynamoDB

Posted on October 25, 2022

Categories: AWS

Our Ocado Smart Platform (OSP) is a one-of-a-kind end-to-end eCommerce, fulfillment, and logistics platform. It was designed to give some of the world’s most prominent retailers access to the finest customer experiences and unparalleled financial returns made possible by technology.

Its design uses hundreds of microservices to support the online grocery stores run by our partners.

Since the demand for online grocery services is growing rapidly, and the amount of data we manage is growing with it, we must develop and employ performant and scalable solutions.

Amazon DynamoDB is a crucial tool for us when it comes to data storage, and it’s frequently recommended as an alternative to Amazon RDS. As a fully managed NoSQL database, it can handle virtually any volume of data and offers scalable, single-digit-millisecond performance.

At Ocado Technology, we are frequently tasked with developing original and cutting-edge solutions to complex problems. We have successfully used DynamoDB in several mission-critical areas, including the microservices responsible for shopping baskets, applying promotions, placing orders, and processing refunds.

It can be challenging to shift your perspective on data modeling if, like many of our engineers, you come from a relational database background. To help us understand DynamoDB’s concepts and make the transition, I’ve compiled several best practices and design patterns.

I use some well-known DynamoDB terms throughout this article. Before continuing, make sure you are familiar with the following:

  • Partition Key (PK)
  • Sort Key (SK)
  • Local Secondary Index (LSI)
  • Global Secondary Index (GSI)

Please be aware that the examples below are only intended for demonstration and do not reflect actual OSP implementations.

First things first: data access patterns in DynamoDB

The relational paradigm’s main emphasis is on data, or rather, relations. You model your entities into several tables and follow the normalization rules to organize them in a memory-efficient way.

The NoSQL approach, in contrast, optimizes for CPU cycles rather than memory footprint. You want to denormalize your data and store it in shapes close to what the business actually needs. For this reason, you must thoroughly analyze your data access patterns before you begin data modeling.
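To make the contrast concrete, here is a minimal sketch of a denormalized item in Python; the key formats and attribute names are illustrative assumptions, not an actual OSP schema.

    # In a normalized relational model, the customer name would live only in a
    # customers table, and rendering an order summary would require a join.
    # In DynamoDB you denormalize: the order item carries what the read needs.
    order_item = {
        "PK": "CUSTOMER#42",           # hypothetical key format
        "SK": "ORDER#2022-10-25#001",
        "entityType": "order",
        "customerName": "Jane Doe",    # duplicated from the customer record
        "totalPence": 2350,
    }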

What precisely are data access patterns?

First, there are the read patterns your application must support. Consider the many ways users search for data, the filters they apply, the attributes that are often fetched together, and so on.

Read access pattern examples.

However, reads are not the only side of access patterns. How your application writes data may shape your access patterns too. Your team needs to determine whether your app is write-heavy, read-heavy, or balanced, because that should significantly influence your data model.

Consider a shopping basket. The most common user actions are adding or removing a single item, adding more items, or checking the entire basket’s contents. You might consider modeling it for fast reads, but that would mean storing the basket’s entire contents as a single item (1). Since any modification to the basket would write its whole contents back to the database, this looks wasteful and could hurt performance at scale. Instead, you could optimize for the write access pattern and store each item individually (2), as shown in the sketch after the captions below. Only the first item would require two writes, one for the item information and one for the basket metadata; items added later need just a single write with a small payload.

  1. Shopping basket items stored in a single attribute
  2. Shopping basket items stored separately
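Here is a minimal boto3 sketch of option (2); the table name, key formats, and attributes are assumptions made for illustration, not real OSP code.

    import boto3

    table = boto3.resource("dynamodb").Table("Baskets")  # hypothetical table

    def add_to_basket(basket_id: str, sku: str, quantity: int,
                      first_item: bool = False) -> None:
        """Store each basket line as its own item (option 2 above)."""
        if first_item:
            # Only the very first item also creates the basket metadata record.
            table.put_item(Item={
                "PK": f"BASKET#{basket_id}",
                "SK": "METADATA",
                "entityType": "basket",
            })
        # Every add is a single small write, however large the basket grows.
        table.put_item(Item={
            "PK": f"BASKET#{basket_id}",
            "SK": f"ITEM#{sku}",
            "entityType": "basketItem",
            "quantity": quantity,
        })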

What makes access patterns so crucial?

When approaching data modeling for DynamoDB, it’s crucial to carefully consider your data access patterns: write them down, review them, and iterate. Simply put, it’s too easy to overlook specific use cases or attributes. When that happens, reading the data you need from your DynamoDB table becomes difficult or even impossible.

Keep in mind that optimizing the data model for performance means giving up some flexibility.

1. One table is (usually) sufficient.

It might be tempting to keep your business entities in separate, specialized DynamoDB tables. At first glance, this makes the data model easier to understand, and it resembles the tried-and-true relational database management system (RDBMS) approach, which helps people feel at ease.

However, it is not ideal. Any use case that requires combining data from several DynamoDB tables will need multiple round trips to your database and perhaps some extra logic in the application. This can hurt your service’s performance, especially at scale.

Using a single-table design with index overloading techniques, on the other hand, lets you store related objects next to one another. Rather than ad hoc joins of data across several tables, you concentrate on the well-chosen, pre-modeled joins that are crucial to your application logic, as the sketch below illustrates.
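As a sketch of the payoff, assuming the generic PK/SK layout described in the next section, a single Query can fetch a customer record together with all of that customer’s orders in one round trip:

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("CustomerOrders")  # hypothetical

    # Related entities share a partition key, so one request returns them all:
    # the customer record plus every order stored in the same partition.
    response = table.query(KeyConditionExpression=Key("PK").eq("CUSTOMER#42"))
    for item in response["Items"]:
        print(item["SK"])  # e.g. CUSTOMER#42, then ORDER#... entries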

2. Partition overloading

It can take some time to get used to the idea of storing many object types in a single DynamoDB table. A great trick that stops you from thinking in entity-centric terms is to make your partition key and sort key attribute names as generic as possible; in our experience, PK and SK work just fine.

By using generic PK and SK attribute names, you can split your table into various namespaces, or data containers, without encoding their types into the table schema. The PK attribute name does not reveal whether it holds an orderId, itemId, customerId, or any other entity. Overloading the table, in other words adding a new data container to it, is a simple matter of specifying a new format for the PK. Additionally, if you use composite keys differently for each entity type, you can represent several complex data relationships in a single table.

Let’s look at a customer orders table example that has two distinct data containers:

  • Customer details, where both PK and SK hold the customer id
  • Orders placed by customers, where PK = customer id and SK = order id

  3. A single table with orders and customer details
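Here is a minimal sketch of how the two containers might be written; the key formats and attributes are hypothetical:

    import boto3

    table = boto3.resource("dynamodb").Table("CustomerOrders")  # hypothetical

    # Container 1: customer details, with the customer id in both PK and SK.
    table.put_item(Item={
        "PK": "CUSTOMER#42",
        "SK": "CUSTOMER#42",
        "entityType": "customer",   # see the advice below
        "name": "Jane Doe",
    })

    # Container 2: an order for the same customer lands in the same partition,
    # with the order id encoded in the sort key.
    table.put_item(Item={
        "PK": "CUSTOMER#42",
        "SK": "ORDER#2022-10-25#001",
        "entityType": "order",
        "totalPence": 2350,
    })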

When employing partition overloading, it is advisable to add an entityType attribute with values like order, item, or customer. This makes your entities easier to identify, for both human readers and programs, without having to decode the format of the PK or SK. One more thing: don’t forget to document your overloaded partitions and key formats.

3. Index overloading

DynamoDB’s secondary indexes let you create new views of your data. They are often a key component needed to support all of your application’s data access patterns.

A local secondary index uses the same partition key as the base table but a different sort key, letting you specify another way of sorting your primary data. It is, in essence, an alternative sort key.

A global secondary index’s partition key and sort key may both differ from those of the base table. It essentially gives you a new, virtual table with the same data but a new primary key.
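To illustrate both index types side by side, here is a create_table sketch; the table name, index names, and attributes are hypothetical, chosen only for demonstration:

    import boto3

    client = boto3.client("dynamodb")

    client.create_table(
        TableName="CustomerOrders",
        AttributeDefinitions=[
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
            {"AttributeName": "orderDate", "AttributeType": "S"},
            {"AttributeName": "GSI1PK", "AttributeType": "S"},
            {"AttributeName": "GSI1SK", "AttributeType": "S"},
        ],
        KeySchema=[
            {"AttributeName": "PK", "KeyType": "HASH"},
            {"AttributeName": "SK", "KeyType": "RANGE"},
        ],
        # LSI: same partition key, alternative sort key.
        LocalSecondaryIndexes=[{
            "IndexName": "ByOrderDate",
            "KeySchema": [
                {"AttributeName": "PK", "KeyType": "HASH"},
                {"AttributeName": "orderDate", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }],
        # GSI: a brand-new primary key over the same data.
        GlobalSecondaryIndexes=[{
            "IndexName": "GSI1",
            "KeySchema": [
                {"AttributeName": "GSI1PK", "KeyType": "HASH"},
                {"AttributeName": "GSI1SK", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }],
        BillingMode="PAY_PER_REQUEST",
    )

Note that a local secondary index can only be created together with the table itself, whereas a global secondary index can also be added to an existing table later.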