Phil Windley's Technometria | Internet Application Performance

Internet Application Performance

One of the hardest things to get right in building an Internet application designed for use by lots (hundreds or thousands) of simultaneous users is what is called "performance." The trouble is that lots of terms are thrown around, and it's sometimes difficult to get a handle on what they mean and what you ought to do about them. I started this page as a way-point for information on Internet application performance. I expect to add to it from time to time.

Terminology

Martin Fowler's book Patterns of Enterprise Application Architecture has a section in the first chapter entitled "Thinking About Performance" that defines some performance-related terms that people bandy about when talking about Internet applications.

Response time - the amount of time it takes a system to process an external request.
Responsiveness - how quickly the application acknowledges the user's request.
Latency - the minimum time required to get any kind of response, even one that requires no processing time.
Throughput - the number of transactions the system can complete in a given period of time.
Load - the amount of stress a system is under. The typical measurement for this is the number of concurrent users being served.
Load sensitivity - a measure of how response time varies with load. This is also sometimes referred to as degradation.
Efficiency - performance divided by resources.
Capacity - the maximum effective throughput of a system. Usually based on a threshold beyond which performance is unacceptable.
Scalability - a measure of how adding resources (hardware, network connectivity, etc.) affects performance. Scalable systems gain additional throughput or faster response time when more resources are available.
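To make a few of these definitions concrete, here is a minimal sketch that computes capacity, load sensitivity, and efficiency from load-test observations. The Sample structure, the function names, and the idea of summarizing a test run as (users, response time, throughput) triples are my own assumptions for illustration; they don't come from Fowler's book.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One load-test observation (hypothetical structure for illustration)."""
    concurrent_users: int    # load
    response_time_s: float   # mean response time at that load, in seconds
    throughput_tps: float    # completed transactions per second at that load


def capacity(samples, max_response_time_s):
    """Highest observed throughput while response time stays acceptable."""
    ok = [s.throughput_tps for s in samples
          if s.response_time_s <= max_response_time_s]
    return max(ok) if ok else 0.0


def load_sensitivity(samples):
    """Added seconds of response time per added user, lightest to heaviest load."""
    lightest = min(samples, key=lambda s: s.concurrent_users)
    heaviest = max(samples, key=lambda s: s.concurrent_users)
    extra_users = heaviest.concurrent_users - lightest.concurrent_users
    if extra_users == 0:
        raise ValueError("need observations at two different load levels")
    return (heaviest.response_time_s - lightest.response_time_s) / extra_users


def efficiency(throughput_tps, servers):
    """Performance divided by resources, here transactions per second per server."""
    return throughput_tps / servers
```

A single slope is a crude summary of load sensitivity, since real systems rarely degrade linearly, but it's enough to compare two configurations tested at the same load levels.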

Discussion

Performance is either response time or throughput. Unfortunately, techniques that improve one often degrade the other. Consequently, systems architects must specify which takes priority as well as minimum thresholds for both.

Responsiveness is often more important than response time. If users think something is happening, they're much more patient. Designing little progress bars that can be displayed during long operations may be less expensive and more important to user satisfaction than investments in other areas.

Responsiveness and response time are from the user's point of view. The type of equipment and the speed of the network make a big difference and these are often out of your control. Keynote and others make a living letting you test response time and responsiveness from various points on the network.

Throughput measurement can be difficult because it depends on the type of transactions considered. Throughput should be measured using a basket of common transactions to be effective. Real historical data can be a big plus here.
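Here's a rough sketch of what a "basket" calculation might look like. It assumes each worker handles one transaction at a time with no queuing overhead, and the transaction names and timings in the usage note are made up; in practice the mix and the service times would come from your own historical data.

```python
def basket_throughput(mix, seconds_per_txn, workers=1):
    """Estimated transactions per second for a weighted mix of transactions.

    mix:             transaction name -> fraction of traffic (must sum to 1)
    seconds_per_txn: transaction name -> mean service time in seconds
    workers:         number of independent workers (assumed one txn at a time)
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic fractions must sum to 1")
    # Mean service time of a "typical" transaction drawn from the basket.
    mean_seconds = sum(mix[name] * seconds_per_txn[name] for name in mix)
    return workers / mean_seconds
```

For example, with a hypothetical mix of {"browse": 0.5, "search": 0.25, "checkout": 0.25} and service times of 0.1, 0.2, and 0.6 seconds respectively, the mean transaction takes about a quarter of a second, so one worker completes roughly four transactions per second. Measuring with "browse" alone would have suggested ten.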

Response time is sensitive to load, so it's common to state response time in seconds for a given user load (3s with 5 users, etc.).
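A minimal sketch of measuring response time at a stated load, using threads from the Python standard library as simulated users. The fake_request function is a stand-in I invented: it holds a lock while sleeping to imitate a shared resource that only one request can use at a time. A real test would call your application instead.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

_shared_resource = threading.Lock()  # hypothetical contended resource


def fake_request():
    """Stand-in for a real request: 10 ms of work on a shared resource."""
    with _shared_resource:
        time.sleep(0.01)


def timed_call(request):
    """Return how long one call to request() takes, in seconds."""
    start = time.perf_counter()
    request()
    return time.perf_counter() - start


def response_time_at_load(request, users, requests_per_user=5):
    """Mean response time in seconds with `users` concurrent callers."""
    def one_user(_user_id):
        return [timed_call(request) for _ in range(requests_per_user)]

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    return mean(t for times in per_user for t in times)
```

Calling response_time_at_load(fake_request, users=1) and then again with users=5 should show the mean response time climbing as callers queue for the lock, which is the kind of pair of numbers the "3s with 5 users" convention is meant to capture.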

I like to say that an engineer is someone who can do with a dollar what any fool could do with two. That's good to keep in mind when thinking about efficiency. Designing systems so that they scale efficiently means better returns on investment.

Performance is a measurement, not a design criterion. By that, I mean that performance is a property of a specific system on a certain configuration. Changes that can affect performance include hardware type, OS version, Web server version, driver selection, virtual machine upgrades, and so on. Even small changes on many systems can affect performance non-linearly. Thus, while performance needs to be considered in the design and implementation stage, you won't know the results until you have an artifact you can test.

Design the system to be tested for performance and for performance to be monitored continuously once it's in operation. Test even small configuration changes unless you have significant experience with that particular change. Even so, you'll get burned sometimes. The only way to get performance efficiency is to tune the system. That requires building instrumentation into the code and, when the system is designed, having a plan for how to use it.
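As one illustration of building instrumentation into the code, here is a small timing helper. The names timed, summary, and TIMINGS are my own, and keeping durations in an in-memory dictionary is a simplifying assumption; a production system would more likely ship these numbers to whatever monitoring tool it already uses.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Operation name -> list of observed durations in seconds (in-memory only).
TIMINGS = defaultdict(list)


@contextmanager
def timed(name, sink=TIMINGS):
    """Record how long the enclosed block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[name].append(time.perf_counter() - start)


def summary(sink=TIMINGS):
    """Count, mean, and worst-case duration for each instrumented operation."""
    return {
        name: {
            "count": len(durations),
            "mean_s": sum(durations) / len(durations),
            "max_s": max(durations),
        }
        for name, durations in sink.items()
    }
```

Wrapping a suspect section in "with timed('render_page'):" and periodically logging summary() gives you before-and-after numbers for any configuration change, which is the point of planning the instrumentation up front.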

Scalability

Scaling your system is how you get better throughput. As I mentioned above, throughput is different from response time, although both get called "performance" and they're often confused. UPS presents a good example. If UPS needs to deliver twice as many packages in a given period (i.e., scale), they can buy more trucks and planes. That won't get packages to the customer any faster (i.e., it won't improve response time). Note that if we could fly the planes twice as fast, not only would we be more performant, but we'd also scale. So performance can affect scalability, but there are other ways to achieve it.

Another way that UPS could scale is to buy bigger planes and trucks. Rather than make more trips, they'd carry more on each trip. This is called "scaling up," whereas buying more of the same size vehicles is called "scaling out."
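The UPS arithmetic can be sketched in a few lines. This is a toy model with invented numbers: it assumes every truck is always full and every trip takes the same time, which no real delivery network does.

```python
def delivery_stats(trucks, packages_per_trip, hours_per_trip):
    """Return (response time in hours, throughput in packages per hour)."""
    response_time_h = hours_per_trip  # how long one package waits to arrive
    throughput = trucks * packages_per_trip / hours_per_trip
    return response_time_h, throughput


baseline = delivery_stats(trucks=10, packages_per_trip=100, hours_per_trip=8)
scale_out = delivery_stats(trucks=20, packages_per_trip=100, hours_per_trip=8)
scale_up = delivery_stats(trucks=10, packages_per_trip=200, hours_per_trip=8)
faster = delivery_stats(trucks=10, packages_per_trip=100, hours_per_trip=4)

print(baseline)   # (8, 125.0)
print(scale_out)  # (8, 250.0)  twice the throughput, same delivery time
print(scale_up)   # (8, 250.0)  twice the throughput, same delivery time
print(faster)     # (4, 250.0)  twice the throughput and half the delivery time
```

Scaling out and scaling up both double throughput without touching response time, while the faster vehicles improve both.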

In a Web application, the question is whether you buy lots of small machines (scale out) or buy bigger boxes (scale up). Scaling up can be more expensive than scaling out because big iron generally costs more per cycle than small boxes. Large, fast memories, fancy high-speed buses, and the latest CPUs are generally bought at a premium.

The problem is that some applications, like databases, are difficult to scale out because they require a large shared memory space, have many dependent threads, and have a tightly-coupled internal architecture.

Web servers are just the opposite: they use a small non-shared memory space, have many independent threads, and have a loosely-coupled external interconnect (e.g. HTTP).

Scalability is heavily affected by software architecture decisions. Scaling out requires loose coupling and process independence. Achieving those objectives isn't easy for Web applications that manage large amounts of user state. Spending time on the data architecture can be beneficial. Also, carefully consider user state and try to minimize it.