Michael Kopp

Production Monitoring is about ensuring the stability and health of our system, and that includes the application. We often encounter production systems that concentrate on System Monitoring, under the assumption that a stable system leads to stable and healthy applications. So let’s see what System Monitoring can tell us about our application. Let’s take a very simple two-tier web application. [Figure: A simple two-tier web application] This is a simple multi-tier eCommerce solution. Users are concerned about bad performance when they do a search. Let's see what we can find out about it if performance is not satisfactory. We start by looking at a couple of simple metrics. CPU Utilization: The best-known operating system metric is CPU utilization, but it is also the most misunderstood. This metric tells us how much time the CPU spent executing code in the last interv...

Application Performance Monitoring in Production

Setting up Application Performance Monitoring is a big task, but like everything else it can be broken down into simple steps. You have to know what you want to achieve and subsequently where to start. So let’s start at the beginning and take a top-down approach. Know What You Want: The first thing to do is to be clear about what we want when monitoring the application. Let’s face it: we do not really want to ensure that CPU utilization stays below 90 percent or that network latency stays under one millisecond. We are also not really interested in garbage collection activity or whether the database ...

Don’t Let Load Balancers Ruin Your Holiday Business

An eCommerce site that crashes seven times during the Christmas season, and is down for up to five hours each time, is a site that loses a lot of money and reputation. It happened to one of our customers, who told this story at our annual performance conference earlier this month. Among the several reasons that led to these crashes, I want to share more details on one that I often see with other websites as well. Load balancers set to round-robin instead of least-busy can easily lead to app server crashes caused by heap memory exhaustion. Let's dig into some deta...
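To illustrate the round-robin point, here is a minimal sketch under stated assumptions. It is not the customer's setup: the workload (every other request is slow), the two-server pool, and all function names are hypothetical, and the number of in-flight requests per server is used only as a rough proxy for heap pressure.

```python
def simulate(durations, n_servers, pick):
    """Return the highest number of in-flight requests seen on any one server.

    One request arrives per tick; durations[t] is how many ticks it runs.
    """
    # active[s] holds the finish times of requests still running on server s
    active = [[] for _ in range(n_servers)]
    peak = 0
    for t, duration in enumerate(durations):
        # Drop requests that have finished by tick t
        for s in range(n_servers):
            active[s] = [finish for finish in active[s] if finish > t]
        s = pick(t, active)
        active[s].append(t + duration)
        peak = max(peak, max(len(a) for a in active))
    return peak


def round_robin(t, active):
    # Ignores server state: request t always goes to server t mod n
    return t % len(active)


def least_busy(t, active):
    # Picks the server with the fewest in-flight requests (ties: lowest index)
    return min(range(len(active)), key=lambda s: len(active[s]))


# Hypothetical workload: slow (20 ticks) and fast (1 tick) requests alternate
durations = [20, 1] * 20

rr_peak = simulate(durations, 2, round_robin)
lb_peak = simulate(durations, 2, least_busy)
print("round-robin peak in-flight:", rr_peak)
print("least-busy peak in-flight: ", lb_peak)
```

With this deliberately unlucky pattern, round-robin sends every slow request to the same server, so that server accumulates in-flight work while the other sits nearly idle. Least-busy spreads the slow requests across both servers and keeps the per-server peak lower.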

Why Averages Are Inadequate, and Percentiles Are Great

Anyone who has ever monitored or analyzed an application uses or has used averages. They are simple to understand and calculate. We tend to ignore just how wrong the picture is that averages paint of the world. To emphasize the point, let me give you a real-world example outside of the performance space that I read recently in a newspaper. The article was explaining that the average salary in a certain region in Europe was 1,900 euros (to be clear, this would be quite good in that region!). However, when looking closer they found out that the majority, namely 9 out of 10 people, only ea...
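The effect described in this excerpt can be sketched in a few lines. The salary figures below are hypothetical, chosen only so that the mean comes out at 1,900; they are not the numbers from the newspaper article. The `percentile` helper is an illustrative nearest-rank implementation, not a function from any monitoring product.

```python
import statistics


def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(values)
    # Ceiling of p * n / 100 using integer arithmetic, at least rank 1
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


# Hypothetical data: nine people earn 1,000 and one person earns 10,000
salaries = [1000] * 9 + [10000]

mean = statistics.mean(salaries)
median = statistics.median(salaries)

print("mean:           ", mean)                       # 1900
print("median:         ", median)                     # 1000.0
print("90th percentile:", percentile(salaries, 90))   # 1000
print("max:            ", percentile(salaries, 100))  # 10000
```

A single outlier pulls the mean to almost twice what nine out of ten people in this made-up sample earn, while the median and the 90th percentile describe the majority accurately. The same reasoning applies to response times, where a few very slow requests can distort the average.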

A Discussion on Top Performance Problems for Hadoop and Cassandra

In the last couple of weeks my colleagues and I attended the Hadoop and Cassandra Summits in the San Francisco Bay Area. It was rewarding to talk to so many experienced Big Data technologists in such a short time frame - thanks to our partners DataStax and Hortonworks for hosting these great events. It was also great to see that performance is becoming an important topic in the community at large. We got a lot of feedback on typical Big Data performance issues and were surprised by the performance-related challenges that were discussed. The practitioners here were definitely no n...