At Trek10, we frequently face the question, “What is the most cost-effective and efficient AWS platform service for ingesting data into a new system?” It comes up whenever we build serverless AWS systems. While it’s easy to simply spin up EC2 instances and load data into them, we at Trek10 strive to design every new system as “serverless-ly” as possible by making extensive use of AWS platform services like Lambda, DynamoDB, and S3.
This problem affects a wide range of applications. The Internet of Things (IoT) is the most obvious one (getting data from “things” into the cloud), but there are many others: remote offices sending data to a central system, a slow or “lazy load” migration out of your data center, or an always-on integration between legacy environments and a new AWS environment.
So, back to the question. Several AWS services are designed specifically for data ingestion, and it turns out that any one of them might be the most efficient and appropriate choice, depending on the situation. This post attempts to summarize the trade-offs for you.
Two quick disclaimers: this is not meant to be exhaustive, since there are plenty of other options, though we believe these are the best; and all prices are for the US East (N. Virginia) region.
Kinesis Streams
A real-time data stream queueing service. Producer apps put data into a stream, and Kinesis Streams consumer apps pull it out for processing. AWS Lambda functions can act as consumers, so there is no need to maintain a server to process and store the data from Kinesis Streams.
- Pros: Handles high volume; very flexible.
- Cons: Producer and consumer applications are not trivial to build; you pay to run your consumer app or Lambda functions; maximum PUT size of 1 MB.
- Cost: $10.80 per month per shard (1 MB/s ingress, 2 MB/s egress), plus $0.014 per million PUT payload units.
- A PUT payload unit is each 25 KB chunk of a record.
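The Streams pricing turns into a rough monthly estimator. This is a minimal sketch, not an official calculator: it assumes decimal units (1 KB = 1,000 bytes), a 720-hour month (which is how $0.015 per shard-hour becomes $10.80 per month), a steady PUT rate, and the per-shard write limit of 1,000 records per second, which the post does not mention. The function name is hypothetical.

```python
import math

# Prices quoted in the post (us-east-1, at time of writing)
SHARD_PER_MONTH = 10.80        # $ per shard per month
PER_MILLION_PUT_UNITS = 0.014  # $ per 1M PUT payload units
HOURS_PER_MONTH = 720          # $10.80 = $0.015/shard-hour x 720 hours

# Assumption: decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes)
PUT_UNIT_BYTES = 25_000                # each 25 KB chunk of a record, rounded up
SHARD_INGRESS_BYTES_PER_S = 1_000_000  # 1 MB/s ingress per shard
SHARD_INGRESS_RECORDS_PER_S = 1_000    # per-shard write limit (not in the post)
MAX_RECORD_BYTES = 1_000_000           # 1 MB maximum PUT size


def kinesis_streams_monthly_cost(puts_per_hour: int, record_bytes: int) -> float:
    """Rough monthly cost of a stream sized for a steady PUT rate (hypothetical helper)."""
    if record_bytes > MAX_RECORD_BYTES:
        raise ValueError("record exceeds the 1 MB Kinesis PUT limit")
    records_per_s = puts_per_hour / 3600
    # Enough shards to absorb both the byte rate and the record rate.
    shards = max(
        math.ceil(records_per_s * record_bytes / SHARD_INGRESS_BYTES_PER_S),
        math.ceil(records_per_s / SHARD_INGRESS_RECORDS_PER_S),
        1,
    )
    # Every record is billed in whole 25 KB payload units.
    units_per_record = math.ceil(record_bytes / PUT_UNIT_BYTES)
    units_per_month = puts_per_hour * HOURS_PER_MONTH * units_per_record
    return shards * SHARD_PER_MONTH + units_per_month / 1e6 * PER_MILLION_PUT_UNITS


# 10 million PUTs per hour at 5 KB each
print(round(kinesis_streams_monthly_cost(10_000_000, 5_000), 2))
```

For 10 million 5 KB PUTs per hour this works out to about $252 per month (14 shards plus 7.2 billion payload units), in line with the roughly $255 figure given in the takeaways.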
Kinesis Firehose
Firehose simplifies the Streams user experience: it automatically delivers your data to Amazon S3, Amazon Redshift, or Elasticsearch.
- Pros: A very simple way to load data into S3, Redshift, or Elasticsearch.
- Cons: Each item may be at most 1 MB. More complex processing or disaggregation of the data delivered to S3 requires additional services.
- Cost: $0.035 per GB ingested, plus S3 fees (though because of buffering and compression, the S3 portion is usually a small fraction of the total).
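Because Firehose is billed by volume rather than by shard, its cost scales linearly with data ingested. A minimal sketch, assuming the $0.035/GB figure above, 1 GB = 10^9 bytes, a 720-hour month, and no per-record rounding in AWS's metering (so the result is best read as a lower bound); downstream S3 charges are excluded. The function name is hypothetical.

```python
FIREHOSE_PER_GB = 0.035        # $ per GB ingested, as quoted in the post
HOURS_PER_MONTH = 720          # same month length used for the Kinesis shard price
MAX_RECORD_BYTES = 1_000_000   # 1 MB maximum item size (decimal units assumed)


def firehose_monthly_cost(puts_per_hour: int, record_bytes: int) -> float:
    """Rough monthly Firehose ingestion cost, excluding S3 delivery charges.

    Treats 1 GB as 10**9 bytes and ignores any per-record rounding AWS may
    apply when metering, so the result is a lower-bound estimate.
    """
    if record_bytes > MAX_RECORD_BYTES:
        raise ValueError("record exceeds the 1 MB Firehose item limit")
    gb_per_month = puts_per_hour * HOURS_PER_MONTH * record_bytes / 1e9
    return gb_per_month * FIREHOSE_PER_GB


# 10,000 PUTs/hour at 50 KB is about 360 GB of ingestion per month.
print(round(firehose_monthly_cost(10_000, 50_000), 2))
```

The linear model also makes the 10 TB/month point in the takeaways easy to check: 10,000 GB at $0.035/GB is about $350 per month for ingestion alone.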
S3
The original and best AWS service: massively scalable object storage. Read-after-write consistency means S3 can be part of your ingestion pipeline, not just a final storage destination. We have written about one such pattern before.
- Pros: 5 TB maximum object size; a very simple interface.
- Cons: Any processing requires other AWS services.
Everyone knows S3’s storage pricing, and we’ll disregard it here because all of these services may end up archiving the data in S3 anyway. The critical question is how direct S3 PUTs stack up against the alternatives.
- Cost: $0.005 per 1,000 PUTs
- The maximum size of a single PUT is 5 GB (multipart uploads let you push a total object of up to 5 TB)
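For direct S3 ingestion the relevant charge is per request, not per byte. The sketch below assumes $0.005 per 1,000 PUT requests (the rate implied by the $3,600/month figure in the takeaways), one PUT per object, a 720-hour month, and decimal units for the 5 GB and 5 TB limits. Both function names are hypothetical.

```python
S3_PER_1000_PUTS = 0.005          # $ per 1,000 PUT requests (assumed, see above)
HOURS_PER_MONTH = 720
MAX_SINGLE_PUT_BYTES = 5 * 10**9  # 5 GB single-PUT limit (decimal units assumed)
MAX_OBJECT_BYTES = 5 * 10**12     # 5 TB maximum object size


def s3_put_monthly_cost(puts_per_hour: int) -> float:
    """Monthly request charge for direct S3 PUTs; storage is ignored, as in the post.

    Assumes one PUT request per object. Multipart uploads issue several
    requests per object and would cost more.
    """
    return puts_per_hour * HOURS_PER_MONTH / 1000 * S3_PER_1000_PUTS


def needs_multipart(object_bytes: int) -> bool:
    """True when an object is too large for a single PUT and must use multipart upload."""
    if object_bytes > MAX_OBJECT_BYTES:
        raise ValueError("object exceeds the 5 TB S3 object size limit")
    return object_bytes > MAX_SINGLE_PUT_BYTES


print(round(s3_put_monthly_cost(1_000_000), 2))
print(needs_multipart(50 * 10**9))
```

Since the charge does not depend on payload size, S3 gets relatively cheaper as objects grow, which is why it only becomes competitive with Streams near the 1 MB mark.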
AWS IoT Platform
A newer AWS service that facilitates the development of IoT applications. Its MQTT broker and rules engine let you publish, process, and store data.
- Pros: Well suited to “constrained” (low-power, low-compute) edge devices that send small amounts of data; both MQTT and the AWS IoT SDK were designed specifically for this use case.
- Cons: Requires additional services for processing and storage; a maximum message size of 128 KB.
- Cost: $5 per million messages.
- Messages are metered in 512-byte increments.
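With AWS IoT, larger payloads are billed as multiple messages, which is why it gets expensive quickly for anything but small payloads. A minimal sketch, assuming pricing of $5 per million messages metered in 512-byte increments (our reading of the figures above), a 128 KB cap interpreted as 131,072 bytes, and a 720-hour month. The function name is hypothetical.

```python
import math

IOT_PER_MILLION_MESSAGES = 5.00  # $ per million billed messages
BILLING_INCREMENT_BYTES = 512    # messages metered in 512-byte increments
MAX_MESSAGE_BYTES = 128 * 1024   # 128 KB limit, read as 131,072 bytes (assumption)
HOURS_PER_MONTH = 720


def iot_monthly_cost(messages_per_hour: int, message_bytes: int) -> float:
    """Rough monthly AWS IoT messaging cost under the pricing assumed above."""
    if message_bytes > MAX_MESSAGE_BYTES:
        raise ValueError("message exceeds the 128 KB AWS IoT limit")
    # A message larger than 512 bytes is billed as several messages.
    billed_per_message = max(1, math.ceil(message_bytes / BILLING_INCREMENT_BYTES))
    billed_per_month = messages_per_hour * HOURS_PER_MONTH * billed_per_message
    return billed_per_month / 1e6 * IOT_PER_MILLION_MESSAGES


print(round(iot_monthly_cost(100_000, 512), 2))
print(round(iot_monthly_cost(1_000, 50_000), 2))
```

The two IoT scenarios in the takeaways (100,000 messages/hour at 512 bytes and 1,000 messages/hour at 50 KB) come out nearly identical here, at about $360 and $353 per month, because a 50 KB message is billed as roughly 100 of the 512-byte units.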
General Takeaways
We calculated the cost of each of these options under a range of data ingestion scenarios. Here are a few high-level conclusions to help you make sense of it all:
If your data producers are constrained devices with limited computing resources, you will likely need AWS IoT. To reduce ingestion costs, consider AWS Greengrass for buffering and processing at the edge.
Streams, Firehose, and S3 all cost less than $100 per month at up to 10,000 PUTs per hour of 50 KB each, so at that scale cost need not be a primary design concern. Choose whichever service fits best with your current setup.
With AWS IoT, that “don’t worry about cost” threshold is closer to 100,000 PUTs per hour at 512 bytes, or 1,000 PUTs per hour at 50 KB.
Kinesis Streams beats the competition when the number of PUT requests is high but payloads are small (from a few hundred bytes up to tens of KB). At 10 million PUTs per hour, Streams costs only about $255 per month with a 5 KB payload, and only about $1,700 per month with a 50 KB payload.
Firehose is competitive with Streams up to around 10 TB per month; beyond that volume, its cost begins to pull away from Streams. If Firehose’s processing model and ease of use are what you want, and you’re not ingesting tens of terabytes each month, go with Firehose.
S3 is not a good option for a high volume of small PUTs (tens of KB). However, as payloads approach Kinesis’s 1 MB limit, S3’s pricing becomes more competitive: at 1 million PUTs per hour of 1 MB each, Streams costs about $3,400/month and S3 about $3,600/month. So as your payloads grow, consider S3 if its ease of use meets your requirements. Above 1 MB, S3 is the only option among these services.
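To see how these takeaways fall out of the list prices, here is a compact, self-contained comparison of Streams, Firehose, and direct S3 PUTs for the scenarios mentioned above. It is a sketch under simplifying assumptions: decimal units, a 720-hour month, a steady PUT rate, $0.005 per 1,000 S3 PUTs, the 1,000 records/s per-shard write limit for Streams, and Firehose ingestion charges only (no S3 delivery costs). Function names are hypothetical.

```python
import math

HOURS = 720  # hours per month, consistent with $10.80 per shard-month


def streams(puts_per_hour, record_bytes):
    """Kinesis Streams: shard-hours plus 25 KB PUT payload units."""
    rps = puts_per_hour / 3600
    shards = max(math.ceil(rps * record_bytes / 1e6), math.ceil(rps / 1000), 1)
    units = puts_per_hour * HOURS * math.ceil(record_bytes / 25_000)
    return shards * 10.80 + units / 1e6 * 0.014


def firehose(puts_per_hour, record_bytes):
    """Kinesis Firehose: $0.035 per GB ingested."""
    return puts_per_hour * HOURS * record_bytes / 1e9 * 0.035


def s3(puts_per_hour, record_bytes):
    """Direct S3 PUTs: request charge only; payload size does not matter."""
    return puts_per_hour * HOURS / 1000 * 0.005


SCENARIOS = [
    (10_000, 50_000),        # 10K PUTs/hr at 50 KB
    (10_000_000, 5_000),     # 10M PUTs/hr at 5 KB
    (10_000_000, 50_000),    # 10M PUTs/hr at 50 KB
    (1_000_000, 1_000_000),  # 1M PUTs/hr at 1 MB
]

for puts, size in SCENARIOS:
    costs = {f.__name__: round(f(puts, size), 2) for f in (streams, firehose, s3)}
    print(f"{puts:>12,} PUTs/hr x {size:>9,} B: {costs}")
```

Under these assumptions the four scenarios come out at roughly $11 / $13 / $36, $252 / $1,260 / $36,000, $1,703 / $12,600 / $36,000, and $3,406 / $25,200 / $3,600 per month (Streams / Firehose / S3), which matches the pattern described above: everything is cheap at low volume, Streams wins at high PUT counts, and S3 closes the gap only near 1 MB payloads.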
We hope this helps. Reach out to us on Twitter at @trek10inc with any questions or suggestions about AWS data ingestion options.
While you’re at it, take a look at how AWS Lambda’s pricing stacks up against EC2.