How to Use Amazon Elasticsearch for Log Management

Posted on October 28, 2022

Categories: AWS

Elasticsearch-based products have seen a number of interesting new releases in the past several months, with Amazon's AWS-hosted Elasticsearch being the most recent. The news clearly reflects the growing popularity of the ELK software stack, of which Elasticsearch is a component, among businesses worldwide.

So, for individuals wishing to establish an Elasticsearch or ELK cluster, I wanted to recommend some approaches based on your specific use case:

The search engine use case: using Elasticsearch as a powerful search engine embedded within an application stack. This is most frequently seen in search on e-commerce websites or in similar circumstances.

The log analytics use case: using Elasticsearch as the indexing and querying layer of a centralised log management solution, which is the focus of the rest of this post.

AWS-hosted Elasticsearch makes a lot of sense for the first use case. It gives you direct access to Elasticsearch and enables you to customise and configure it specifically to meet the demands of your application. (The one negative is that Amazon Web Services still offers only an older version of Elasticsearch; I hope that this will change in the future.)

However, Elasticsearch alone is not a log management solution. It is just one of a number of elements required to build one.

We at Logz.io have spent a lot of time assisting clients who have attempted to use AWS-hosted Elasticsearch for log management. Based on that experience, we've put together a list of suggestions and add-ons that will enhance the log analysis capabilities of Elasticsearch.

Here are our fourteen suggestions for people seeking a log analytics solution and intending to use the Elasticsearch service hosted by AWS:

1. Message Queuing on AWS

Install a queuing system such as Kafka, Redis, or RabbitMQ. Any ELK reference design must include this, because Logstash can overload Elasticsearch and slow it down until its small internal queue overflows and data is lost. In addition, without a queuing system there is no way to buffer data during critical cluster upgrades, which makes upgrading the Elasticsearch cluster next to impossible.

As an in-house alternative to the message queues mentioned above, AWS provides Amazon Kinesis, a managed streaming service comparable to Apache Kafka. In this article, we explain how to send AWS Kinesis logs to Logz.io. You may look at our general Redis vs. Kafka comparison if you prefer the alternatives.
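
To make the buffering idea concrete, here is a minimal Python sketch, using boto3, of pushing a single log event into a Kinesis stream before anything downstream touches Elasticsearch. The stream name, region, and event fields are placeholders rather than part of any particular setup:

    # Push one log event into a Kinesis stream so a Logstash (or Lambda)
    # consumer can read it later; a slow Elasticsearch cluster then causes
    # back-pressure in the stream instead of data loss.
    import json
    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")  # placeholder region

    log_event = {
        "timestamp": "2022-10-28T12:00:00Z",
        "level": "ERROR",
        "message": "payment service timed out",
    }

    kinesis.put_record(
        StreamName="app-logs",                       # placeholder stream name
        Data=json.dumps(log_event).encode("utf-8"),
        PartitionKey=log_event["level"],
    )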

2. Beats and Logstash

The ELK Stack includes Logstash (the "L" in ELK), which reads data from the queuing system, parses the logs, and produces JSON documents for Elasticsearch to index. Running Logstash reliably and at scale is not easy. You must connect at least one Logstash server to read data from the queuing system and ship logs to AWS Elasticsearch. You must also tune your grok patterns so the data is parsed correctly and make sure Logstash is given an appropriate amount of RAM. (For additional information about Logstash and problems to watch out for, see our tutorial.) See the section on log shipping below for more on Logstash's role as a log shipper.
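
To illustrate the work described above, here is a rough Python sketch of what Logstash does for each event: parse a raw line (a regex stands in for a grok pattern) and index the resulting JSON document into Elasticsearch. It assumes the official elasticsearch Python client in its 8.x keyword style, ignores authentication and request signing for brevity, and uses placeholder endpoint and index names:

    import re
    from elasticsearch import Elasticsearch

    # Placeholder endpoint; a real AWS Elasticsearch domain also needs signed
    # requests or some other form of authentication.
    es = Elasticsearch("https://my-domain.us-east-1.es.amazonaws.com:443")

    # Stands in for a grok pattern such as
    # "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"
    LINE = re.compile(r"^(?P<timestamp>\S+) (?P<level>\w+) (?P<message>.*)$")

    raw = "2022-10-28T12:00:00Z ERROR payment service timed out"
    match = LINE.match(raw)
    if match:
        # Index the parsed fields as a JSON document, as Logstash's
        # elasticsearch output would.
        es.index(index="logs-2022.10.28", document=match.groupdict())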

Over time, Elastic has developed the Beats family of lightweight shippers to address Logstash's burdensome setup. The most important Beats are Filebeat and Metricbeat, but the entire collection is helpful for a comprehensive monitoring solution.

3. Scalability of Elasticsearch

Using a queuing system in front of Logstash helps with scalability, but it is only part of the solution.

Scaling Elasticsearch itself is easier said than done. The AWS solution is static and does not scale out automatically. Naive scaling (with Auto Scaling groups) will not work here, because adding another node considerably increases the load while Elasticsearch relocates shards to the new node, and this frequently results in cluster failure.

Unfortunately, there isn't a simple solution to this. Years of developer work at Logz.io have been dedicated to solving this issue. You'll likely need to allocate more resources, keep a close eye on your clusters, and manually increase capacity when you feel that 'winter is coming.'

Clustering and sharding the incoming log data is a crucial component of scalable infrastructure. For more information on this, see our tutorial on Elasticsearch clusters.
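
As one small example of planning for scale, the sketch below (again assuming the elasticsearch Python client in its 8.x keyword style, with placeholder endpoint and index names) sets the shard and replica counts explicitly when a daily index is created instead of relying on the defaults; the right numbers depend on your log volume and node count:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("https://my-domain.us-east-1.es.amazonaws.com:443")  # placeholder

    # Create today's index with explicit shard and replica counts.
    es.indices.create(
        index="logs-2022.10.28",
        settings={
            "number_of_shards": 5,    # spread the daily index across data nodes
            "number_of_replicas": 1,  # one copy of each shard for resilience
        },
    )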

4. Elastic Load Balancing and High Availability

If you use AWS, you are undoubtedly aware that EC2 instances can occasionally just stop working. We see it practically daily. If you want a solution you can rely on in production, make sure the ELK implementation is highly available by configuring the following (a provisioning sketch follows the list):

  • a highly available queuing system with full replication across two AZs
  • Logstash servers that read from the queue
  • an Elasticsearch cluster with three dedicated master nodes spread across AZs
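
As a provisioning sketch for the last bullet, the snippet below uses boto3's legacy Amazon Elasticsearch Service client to create a domain with three dedicated master nodes and zone awareness enabled. The domain name, instance types, counts, and version are placeholders to adapt to your own workload:

    import boto3

    es_service = boto3.client("es", region_name="us-east-1")  # placeholder region

    es_service.create_elasticsearch_domain(
        DomainName="production-logs",                     # placeholder name
        ElasticsearchVersion="7.10",
        ElasticsearchClusterConfig={
            "InstanceType": "r5.large.elasticsearch",
            "InstanceCount": 4,
            "DedicatedMasterEnabled": True,
            "DedicatedMasterType": "c5.large.elasticsearch",
            "DedicatedMasterCount": 3,              # three masters avoid split-brain
            "ZoneAwarenessEnabled": True,           # spread data nodes across AZs
            "ZoneAwarenessConfig": {"AvailabilityZoneCount": 2},
        },
        EBSOptions={"EBSEnabled": True, "VolumeType": "gp2", "VolumeSize": 100},
    )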

The Elastic Load Balancer (ELB) is intended to address many of these issues. There are three types: the Network Load Balancer, the Application Load Balancer, and the Classic Load Balancer. More information on using the AWS ELB for monitoring may be found here.