Datadog: A Brief Overview of the Monitoring Platform

By PlaysDev
Published: Nov 21, 2023

The trend towards increasing complexity of systems and applications is driving the adoption of more effective monitoring tools. In this context, Datadog stands out as a powerful and versatile tool.

What are Datadog’s main features, and why do companies choose a paid SaaS monitoring product in a world of open-source and free software? Let’s figure it out.

The main factors we focused on are:

  • Price
  • Scalability
  • UI and Ease of Use
  • Functionality
  • Customization Capability
  • Predictable Usage
  • Deployment Simplicity

We will try to elaborate on each of them. Let’s start with the features that have kept Datadog among the top monitoring tools for several years, and briefly compare it with the Prometheus-Loki-Grafana (PLG) stack.

Easy Initial Setup (Low Entry Threshold)

To start monitoring, it is enough to install an agent on the host and/or add a library to the application. The user-friendly interface and ready-made dashboards with key metrics save a significant amount of time at the initial stage. Datadog also offers a simplified integration with Microsoft Azure, which makes it more attractive than Prometheus-Loki-Grafana: the latter requires initial configuration to make its components work together, plus additional setup of exporters.
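To give a feel for what "adding a library to the application" amounts to, here is a minimal sketch that reports a custom metric to a locally running agent over the DogStatsD text protocol. It uses only the Python standard library; in practice you would normally use Datadog's official client library instead. The metric name, tags, and the default address 127.0.0.1:8125 are assumptions chosen for illustration.

```python
import socket


def format_dogstatsd(name, value, metric_type="g", tags=None):
    """Build one DogStatsD datagram: <name>:<value>|<type>|#<tag1>,<tag2>.

    metric_type is "g" for a gauge or "c" for a counter.
    """
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Send the datagram over UDP (fire-and-forget, no delivery guarantee).

    The host and port are assumed defaults for a local agent.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric and tags, for illustration only.
    line = format_dogstatsd(
        "checkout.queue_depth", 42, "g", ["env:staging", "service:checkout"]
    )
    print(line)
```

Calling send_metric(line) hands the value to the agent, which forwards it to Datadog. Because the transport is UDP, the application is not blocked if the agent is down.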

Dashboard Creation for Applications, Environments, and Custom Metrics in One Panel

Drag-and-drop widgets allow the creation of custom views without the need for coding. The visualization toolset allows data to be viewed in various formats and enables report creation.

A significant advantage over Prometheus is the ease of query creation. In Datadog, queries are configured quickly through the UI. Datadog often suggests potential queries, or metrics that already provide the required information, such as cpu_usage, a metric that returns the percentage of CPU usage.
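Behind the UI, each graph is backed by a short text query of the form aggregator:metric{scope} by {group}. The sketch below assembles such a string so you can see what the editor generates for you. The helper name build_metric_query is hypothetical, and the metric system.cpu.user and the tags are assumptions used only as an example.

```python
def build_metric_query(metric, aggregator="avg", scope=None, group_by=None):
    """Assemble a metric query string: <aggregator>:<metric>{<scope>} by {<group>}.

    scope and group_by are lists of tags or tag keys; an empty scope
    becomes "*", meaning "all sources".
    """
    scope_part = ",".join(scope) if scope else "*"
    query = f"{aggregator}:{metric}{{{scope_part}}}"
    if group_by:
        query += " by {" + ",".join(group_by) + "}"
    return query


if __name__ == "__main__":
    # Average user CPU time for production hosts, one line per host.
    print(build_metric_query("system.cpu.user", scope=["env:prod"], group_by=["host"]))
```

The same string can be pasted into a dashboard widget or a monitor definition, so a query built once in the UI can be reused elsewhere.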

Database Monitoring

Datadog Database Monitoring supports both self-hosted and cloud versions of Postgres, MySQL, Oracle, and SQL Server. The “Query Metrics” dashboard displays the historical performance of normalized queries. It allows visualizing performance trends using infrastructure or custom tags, such as data center availability zones, and provides alerts for anomalies.

Datadog also supports functionalities like:

  • Identifying queries that take the most time.
  • Tracking database-level metrics not captured by APM, such as updated/returned rows.
  • Filtering and grouping queries based on arbitrary parameters like command, user, cluster, and host.
  • Detecting unusually slow but rare queries not captured by metrics.
  • Attributing the execution of a specific query to a user, application, or client node.

A key feature is Datadog’s ability to monitor cloud versions of databases, a capability absent in the PLG stack. With PLG, separate services such as AWS CloudWatch or Azure Monitor would be necessary.

Serverless Stack Monitoring

Datadog provides solutions for monitoring AWS Lambda, Azure App Service, Azure Container Apps, and Google Cloud Run with features including:

  • Real-time alerts on memory, timeout, and concurrency metrics to prevent degradation of service quality for end users.
  • Tracking microservices calls for end-to-end visibility into client requests.
  • Visualization of distributed microservices on a service map, categorized by tags such as function, client, version, etc.
  • Receiving and analyzing 100% of traces over the last 15 minutes.
  • Isolating individual client requests and transitioning to related logs and metrics for a complete history.
  • Monitoring anomalies, outliers, and forecasting based on machine learning.

Kubernetes (K8s) Monitoring

Datadog integrates with Kubernetes, Docker, containerd, and Istio, allowing:

  • Collection of metrics, events, and logs from cluster components, pods, and other Kubernetes objects.
  • Gathering container-level metrics for detailed resource breakdowns (at the Docker and containerd levels).
  • Automatic tracking of Kubernetes cluster nodes by the Datadog agent.
  • Automatic monitoring of the technologies you deploy, through Datadog’s automatic detection system and over 650 built-in integrations.
  • Transaction-level insight into applications running in Kubernetes clusters, through APM and distributed tracing.

Thanks to the installed agent, Datadog provides monitoring at several levels of the infrastructure. PLG offers nothing comparable out of the box: it requires deploying Helm charts, installing kubernetes-event-exporter, and configuring all of it.

Unified Monitoring

Datadog offers robust monitoring capabilities for real-time tracking of various components’ performance. This includes server status monitoring, network activity, application response times, etc.

  • Log collection: Datadog allows organizations to centralize and analyze log data, aiding in troubleshooting and pattern identification.
  • Real-time monitoring: Datadog provides instant updates on performance and system status.
  • API monitoring: Datadog enables tracking of APIs to ensure their availability and responsiveness.
  • Response time tracking: The platform reports application response times, which helps optimize the user experience.

Synthetic Monitoring

Synthetic tests let you observe how systems and applications behave by running simulated requests and actions from locations around the world. Datadog monitors the performance of web pages and APIs from the backend down to various network layers (HTTP, SSL, DNS, WebSocket, TCP, UDP, ICMP, and gRPC) in a controlled and stable manner, and alerts you about malfunctions.

  • Calculating SLOs on key endpoints and user journeys makes it easier to meet application performance targets and, ultimately, to keep service quality stable for customers.
  • Synthetic tests can be created within the Datadog application, using the API, or with Terraform.
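To make the SLO point concrete, here is a minimal sketch of the arithmetic behind an availability SLO computed from synthetic check results. It is plain Python and does not call Datadog. The function names, the sample results, and the 99.5% target are assumptions for illustration.

```python
def availability(results):
    """Fraction of synthetic checks that passed.

    results is a non-empty list of booleans, True meaning the check passed.
    """
    if not results:
        raise ValueError("at least one check result is required")
    return sum(results) / len(results)


def error_budget_remaining(results, target):
    """Share of the error budget still unspent, for a target such as 0.995.

    1.0 means no failures so far, 0.0 means the budget is exactly used up,
    and a negative value means the SLO has been breached.
    """
    allowed_failures = (1 - target) * len(results)
    actual_failures = len(results) - sum(results)
    if allowed_failures == 0:
        return 1.0 if actual_failures == 0 else float("-inf")
    return 1 - actual_failures / allowed_failures


if __name__ == "__main__":
    # Hypothetical window: 1000 checks, 2 of which failed.
    checks = [True] * 998 + [False] * 2
    print(f"availability: {availability(checks):.3%}")
    print(f"budget left:  {error_budget_remaining(checks, 0.995):.0%}")
```

With a 99.5% target, 1000 checks allow 5 failures, so 2 failures leave roughly 60% of the budget. Tracking the remaining budget rather than raw availability shows how much room is left before the target is missed.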

Despite all the capabilities of Datadog, there are aspects that should not be overlooked when choosing and implementing this solution.

Complex Log Ingestion, Indexing, and Storage Process

The log analysis process in Datadog is more complex than it should be. Sending logs to Datadog is easy, but analyzing them is harder: logs must be indexed and stored before they can be analyzed, and ingestion and retention are priced separately. Because of this complexity and cost structure, some organizations store fewer logs than they need or want. This leads to challenges in troubleshooting and root cause analysis, especially for persistent issues that outlast the retention period.

To analyze logs that are no longer indexed, you have to pull them back from cloud object storage (e.g., Amazon S3) and index them again. This process can take several hours and requires someone to manage it. Given the constant shortage of specialists and the workload already carried by DevOps and site reliability teams, many organizations cannot handle this level of complexity.

Expensive Log Analysis Workflow

For logs, Datadog charges $0.10 USD per GB ingested, and from $1.06 USD (3-day retention) to $2.50 USD (30-day retention) per million indexed log events. To retain logs for longer, you need to contact Datadog and negotiate individual prices. While Datadog is useful for monitoring and detection, these costs can quickly spiral out of control once you rely on it for root cause analysis and issue resolution, and they grow as your company scales.
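A quick back-of-the-envelope calculation shows how these prices add up. The sketch below uses the figures quoted above. The units (per GB ingested, per million indexed events) are an assumption based on Datadog's public price list, and the traffic volumes are hypothetical.

```python
# Prices quoted in the article, in USD. The units are an assumption:
# ingestion is taken as per GB, indexing as per million log events.
INGEST_USD_PER_GB = 0.10
INDEX_USD_PER_MILLION_EVENTS = {3: 1.06, 30: 2.50}  # keyed by retention days


def monthly_log_cost(ingested_gb, indexed_million_events, retention_days):
    """Estimate the monthly log bill as ingestion plus indexing."""
    if retention_days not in INDEX_USD_PER_MILLION_EVENTS:
        raise ValueError("only the 3-day and 30-day prices are known here")
    ingest = ingested_gb * INGEST_USD_PER_GB
    index = indexed_million_events * INDEX_USD_PER_MILLION_EVENTS[retention_days]
    return round(ingest + index, 2)


if __name__ == "__main__":
    # Hypothetical workload: 500 GB and 1 billion log events per month.
    for days in (3, 30):
        cost = monthly_log_cost(500, 1000, days)
        print(f"{days:>2}-day retention: ${cost:,.2f} per month")
```

For this workload, ingestion is only $50 of the bill, while indexing costs $1,060 with 3-day retention and $2,500 with 30-day retention. Indexing volume and retention length, not raw ingestion, drive the cost.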

Scaling Issues

Reducing log retention periods can be a significant compromise: you lose visibility into more complex issues, from prolonged performance problems with applications and infrastructure to persistent security threats. Datadog has made a name for itself as a monitoring tool for startups thanks to its quick and easy setup. However, many startups that begin with Datadog discover that, as they scale, it becomes more challenging to use and they end up spending disproportionate amounts of money on log retention.

Nevertheless, the ability to monitor serverless applications, Kubernetes clusters, and databases, together with the use of AI in analyzing metrics and logs, makes Datadog one of the most modern and popular SaaS monitoring platforms.
