My development Cookbook: Mastering the performance of my apps

Sylvain Wilbert
17 min read · Sep 29, 2021
Photo by Thomas Kelley on Unsplash

Mastering the behavior of an application in production requires being able to understand how the application will react under increasing pressure, whether from a higher workload coming from external systems or from a higher number of end-user requests. To achieve this goal, it is crucial to define how the system should perform, so that any deviation can be measured, and to master the end-to-end vision of the solution, so that bottlenecks are accurately located (network, middleware, application…).

A performance assessment campaign: What it is, and what it isn’t

To limit the breadth of the campaign, a scope is established that defines what is included and what is excluded. Of course, this could be seen as dogmatic; the sole purpose is to limit the perimeter of the subject.

What it is

A performance assessment campaign is a process that helps developers and administrators assess the behavior of a specific solution in terms of responsiveness and stability under a workload.

It is also a means to investigate, measure, validate or verify other quality attributes of the system, such as scalability, reliability and resource usage.

What it isn’t (here)

Though you may want to address all the different aspects, it is highly recommended to assess the performance ‘within the border of your datacenter’. As such, an E2E test campaign to assess network latency is not considered in this document, nor are E2E tests integrating several external components. External components are treated as highly performing ‘black boxes’ that do not affect the overall performance of the system under assessment. A simple way to achieve this is to rely on mock-ups (a.k.a. functional stubs).
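As an illustration, here is a minimal sketch of such a functional stub, assuming the external component is reachable as a JSON-over-HTTP service; the port, path and payload are purely illustrative, not taken from any real system:

```python
# Minimal functional stub (mock-up) standing in for an external component.
# The port and payload are illustrative assumptions.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer instantly with a canned payload so the external dependency
        # behaves as a highly performing "black box".
        body = json.dumps({"status": "OK"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Silence per-request logging so it does not pollute the injection logs.
        pass

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), StubHandler).serve_forever()
```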

It all started with some definitions…

The following definitions will be used in this document.

Test typologies

Different tests can be run during a performance assessment campaign depending on the objectives pursued.

Dry-run test

A unitary validation test to ensure that, on a platform considered optimal (first level of tuning), the scripts and scenarios, once variabilized, are functionally consistent and that the measurement means are operational. This test shall be done with the ‘most atomic’ approach:

  • Testing a single user in a UI
  • Testing a single Mobility connection
  • Testing the interface of a Machine2Machine communication

What’s in it for me?

While making this test, you can define the best response time expected on the platform (a single user, a single app, using the application). Making this first injection gives you the first ‘optimal’ abacus of the application, against which you can measure the deviation of the response times while injecting load. One will easily understand that:

  • This test should be done first, and with extreme care, as it is the foundation of all the conclusions that will be drawn during the campaign. Hint:
  • If you do not fully master the numbers you measure at this stage, or if you think that the platform can already be optimized… fix, amend, and replay the test.
  • Magic rarely happens. Meaning, the times measured during the dry run will not get better under load, even on supercharged servers. Also, it is assumed at this stage that all the response times measured meet the objectives expected by the business sponsors of the application.

Load Tests

This test is run to determine and optimize the behavior of the platform under a nominal load during the peak hour.

What’s in it for me?

This is the test that should reveal whether the application has met its performance objectives. Remark: the Non-Functional Requirements that the application should meet must be formalized as a load to support during a peak hour.

This is the core of a performance campaign.

Stress Tests

Associated with a load test, this test determines the behavior of the platform under an abnormal (but realistic) load (X% on top of the 1-hour peak).

What’s in it for me?

This gives a vision of how the hosting could perform under (reasonably) exceptional conditions in a nominal mode (platform fully operational). It helps to know how, and to what extent, the servers will be able to absorb a very high peak during a short moment. Though it is not the “BAU” (Business As Usual) that the team has committed to reach, it is helpful to know that you can support an overhead of load, for example after a platform outage, when the backlog of load to absorb has abnormally increased.

Endurance Test

Endurance testing consists in playing the load test over a very long duration (hours). The major interest of this test is to help detect memory leaks and to ensure that the platform keeps performing well over the long run.

What’s in it for me?

This is the test you will love to run once load and stress tests are done, and it is time to go back home. Running this test overnight will help you discover the behavior of the platform after hours of peak load. In particular, you will ensure:

  • that there is no leak on the platform (for example a small amount of memory continuously growing over time)
  • that all recycling processes are correctly handled (such as automatic logout, garbage collection of objects, etc.)

Destructive tests

The destructive test consists in injecting an ever-increasing workload to determine the breaking point of the platform/application.

What’s in it for me?

Low gain compared to the effort required to set it up. Though every teenager wants to break the system, servers are usually made to… serve. They don’t break if they are correctly set up, which is the basic assumption before running the test campaign. You will probably waste time trying to implement a huge load (which will probably generate congestion on hard disks) and an enormous data pool if you want to be truly representative, and it will not help you understand the behavior of the platform any better. It will not be covered in this document.

Remark: Iterative “stress” tests at different percentages of growth would be far more valuable to characterize the behavior of the platform under high load.

Robustness tests

This kind of test consists in assessing the behavior of the platform under specific conditions, such as a partially available platform (to assess the High Availability capability by testing the failover mechanism).

What’s in it for me?

This is not a test to assess the behavior of a platform under load. It will not be covered in this document.

Abacus

Formalizing abacuses is a major outcome of a performance campaign. An abacus is a powerful tool to understand how your system will evolve under load. Of course, most of the time the load will not follow a mathematical formula; what matters is the impression given by the curve you can draw. To draw a curve, you must perform several injections under the same server configuration at different loads. Once the curve can be drawn, you can iterate with a new server configuration to determine whether, under the same load, the application performs better or not. Of course, a minimal number of injections (3) is required to draw a significant curve (the figure below shows dozens of points to ease readability, but in reality 3 to 5 injections are more usual).

Abscissa: the load you inject (for example the number of users)
Ordinate: the system responsiveness (for example the page opening time)

Based on experience, different envelopes can be generated

The system seems to have a linear deviation. This is good news; it means that the system is performing in a ‘secure zone’ and is probably capable of ingesting additional load with a low risk of an unmetered/unpredicted drop in performance. (In fact, you are probably in the situation below, but the load does not impact the overall responsiveness of the system enough to be visible.)

The system seems to have a quadratic (or worse… exponential) deviation. This happens when you have reached the inflection point of the system, where load can have a strong impact on its responsiveness. It is important here to position the load corresponding to the nominal mode (load under the 1-hour peak) against this inflection point.

The system seems to demonstrate an erratic behavior. Well, not much can be concluded from any injection, except… that the system seems to be uncontrollable. From experience, 2 options can be raised:

  • You are focusing on too small a scale, so that the deviations you measure are in fact contained within a frame with min/max values. In that case, it is important to check whether the high value is within the limits of the NFR of the application.
  • The system is totally erratic. In that case, you must optimize each subsystem before replaying the injections from the very beginning (dry run to make a reference…).
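To make the reading of an abacus more concrete, here is a minimal sketch that fits a straight line and a parabola over a handful of injection points and compares the residuals; the load and response-time values are purely illustrative, not real measurements:

```python
# Rough sketch: characterize the deviation of an abacus from a few
# injection points (load, average response time). Values are illustrative.
import numpy as np

loads = np.array([100, 200, 300, 400, 500])     # injected virtual users
resp_ms = np.array([210, 230, 270, 340, 450])   # average page opening time (ms)

lin = np.polyfit(loads, resp_ms, 1)             # straight-line fit
quad = np.polyfit(loads, resp_ms, 2)            # parabola fit
lin_err = np.sum((np.polyval(lin, loads) - resp_ms) ** 2)
quad_err = np.sum((np.polyval(quad, loads) - resp_ms) ** 2)

if quad_err < 0.5 * lin_err:
    print("Deviation looks quadratic: the inflection point is probably reached.")
else:
    print("Deviation looks roughly linear: the system is still in a secure zone.")
```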

Real vs. Representative

The quest pursued in a performance campaign is to have outcomes that can explain/mimic a real situation (and/or even predict a near future).

Does it mean that the tests to set up shall be a reflection of reality? No; being representative of the BAU (Business As Usual) is already a big step. To be representative, you must define scripts, containing steps, that you will arrange into business scenarios. These scenarios, once validated by the business sponsors, will be the “reality” that you inspect.

Burden: An injection is performed with the sole intention of assessing the performance of one application. But in production, an application can run alongside other applications. A burden shall be generated to reproduce this state.

External Factors/Application mock-ups: A performance test is run relying on application mock-ups to avoid external side effects. Nevertheless, a closer look at external factors must be taken upon completion of the test campaign to provide a global vision. In particular, the impact of network latency must be taken into consideration for worldwide communications.

Script

A script consists in a single/atomic use case. It involves a single type of users. It is organized as a succession of steps.

Step

A step is a unitary action in a script.

Scenario

A (business or functional) scenario is an assembly of different scripts, each of them having a specific percentage of representation.

Injection

An injection corresponds to the assessment of a given scenario, under a predefined load, to test specific objectives/parameters/configurations of the application.

An example:

Script #1: supervisor — Dashboard follow-up

Description: As a supervisor, I want to be able to connect to the platform and follow up the dashboard for 4 hours.

Remark: To simplify the model, only the ‘happy path’ of a script shall be considered. Exception management shall be addressed as a specific script with a very low percentage of representation in a scenario (for example: the end user has no right to use the application).

Steps:

  1. Open the homepage as an anonymous user.
  2. Open the login form and fill in credentials (role: Supervisor).
  3. Upon success, open the dashboard page.
  4. Wait 4 hours.
  5. Log out.

Scenario #1: BAU usage, composed of the following scripts (a short sketch of how such a weighted mix can be driven follows this list):

  • Script #1: Representativeness: 40%
  • Script #2: Representativeness: 30% (not described in this document)
  • Script #3: Representativeness: 10% (not described in this document)
  • Script #4: Representativeness: 20% (not described in this document)
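A minimal sketch, assuming the injection tool lets you drive script selection from code, of how the representativeness percentages above could be turned into a weighted draw performed by each virtual user (the script names are placeholders):

```python
# Weighted scenario composition: each virtual user draws its next script
# according to the representativeness agreed with the business sponsors.
import random

SCENARIO_1 = {
    "script_1_supervisor_dashboard": 40,   # %
    "script_2": 30,
    "script_3": 10,
    "script_4": 20,
}

def pick_script(scenario: dict) -> str:
    names, weights = zip(*scenario.items())
    return random.choices(names, weights=weights, k=1)[0]

# Called before every iteration, so the mix converges to the agreed percentages.
print(pick_script(SCENARIO_1))
```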

Cold start vs. Hot start

Cold start: The platform is totally reset between 2 injections. All data are reset; all servers are recycled.

Hot start: The platform is not reset between 2 injections. The data accumulated from previous injections are considered as noise that increases the representativeness of the process.

Phases of an Injection

Ramp-up: Users progressively access the platform.
Steady: All users have accessed the platform at least once. This is the observation phase.
Ramp-down (optional): Users are progressively removed from the platform.

Virtual users & real users

During an injection, virtual users are used to process a script. They are called “virtual” as they will take the identity of different users and a different set of data each time they play an instance of a script. The virtual users will play their scripts in parallel but with a small drift in time to generate randomness.

Think times and sleep times

When injecting a scenario, two notions can influence the page rate:

Think time: the time waited between two steps of a script. It corresponds to the time the user spends thinking before clicking on a screen.

Sleep time: During an injection, each script is processed several times for a single ‘virtual’ user. The time waited between 2 runs of a script for a given virtual user is called “Sleep time”

Remark: Reducing think times and sleep times will have an impact on the injection, as ‘more pressure’ will be applied on the servers (since performing a step requires CPU computation while think times and sleep times don’t, and the scripts are played in an infinite loop during a scenario).
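As a rough sketch of where these two pauses sit in a virtual user’s life cycle (the step names and the run_step helper are hypothetical placeholders for whatever the injection tool actually executes):

```python
# One virtual user replaying a script several times, with think time between
# steps and sleep time between two runs of the script.
import random
import time

def run_step(step_name: str) -> None:
    # Placeholder: in a real injection this sends the HTTP request(s) behind
    # the step and records the response time.
    pass

def virtual_user(steps, think_time_s=3.0, sleep_time_s=10.0, iterations=5):
    for _ in range(iterations):
        for step in steps:
            run_step(step)
            # Think time: pause between two steps, like a user reading the page.
            time.sleep(random.uniform(0.5, 1.5) * think_time_s)
        # Sleep time: pause between two runs of the script for this virtual user.
        time.sleep(sleep_time_s)

virtual_user(["open_home", "login", "open_dashboard", "logout"])
```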

Cache management

Caching is a mechanism for the temporary storage of web elements, such as HTML code and static resources (images, CSS, JavaScript files), to reduce bandwidth usage, server load, and perceived lag.

Cache can be enabled at several different levels:

  • In the client’s browser
  • On a dedicated network appliance
  • On a dedicated middleware solution, such as an ‘Edge’ server
  • On dedicated stores or on the WAS server
  • In specific stores managed by custom code

The injections done during the test campaign will be done under the production configuration; therefore, all caching mechanisms on servers and networks will probably be enabled.

If static resources are part of the KPIs, they should be downloaded; otherwise, the injection tool shall allow skipping such downloads if possible.

Data/User pools

Parameterizing the scenarios with data/user pools ensures a wide range of data to use.

Variability

At the core of an injection is “variabilization”. It is one key to a successful script recording. It consists in detecting, in a transaction, the parameters that should change for each iteration.

As an example, the login/password is a variable. Its value, taken from the user pool, shall replace the corresponding placeholder in the transaction.
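A minimal sketch of this substitution, assuming a recorded login transaction with ${LOGIN}/${PASSWORD} placeholders and a small user pool (both are illustrative, not the actual recording format):

```python
# Variabilization: placeholders in the recorded transaction are replaced,
# at each iteration, by values taken from the user pool.
import itertools

USER_POOL = [
    {"login": "user001", "password": "secret001"},
    {"login": "user002", "password": "secret002"},
]
_next_user = itertools.cycle(USER_POOL)

RECORDED_LOGIN_BODY = "login=${LOGIN}&password=${PASSWORD}"

def variabilize(template: str) -> str:
    user = next(_next_user)   # a fresh entry from the pool at every iteration
    return (template
            .replace("${LOGIN}", user["login"])
            .replace("${PASSWORD}", user["password"]))

print(variabilize(RECORDED_LOGIN_BODY))   # login=user001&password=secret001
```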

Correlation

Mastering correlations is the second key to script recording. It consists in mastering the data flow between the requests and responses of the different transactions.

A simple example of correlation.

The user session ID is generated by the server after the login (and as such is not a variable directly taken from a data pool). But each time a new request is sent, it must be propagated. So, each time, it is not a reference to a pool variable: it is the value contained in the previous response that is re-sent with the next request.
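Here is a minimal sketch of that correlation, assuming the server returns the session ID in a JSESSIONID cookie; the endpoint paths and cookie name are assumptions for illustration:

```python
# Correlation: extract the session ID produced by the server at login and
# propagate it in every subsequent request of the script.
import re
import urllib.request

def login(base_url: str, body: bytes) -> str:
    resp = urllib.request.urlopen(urllib.request.Request(base_url + "/login", data=body))
    match = re.search(r"JSESSIONID=([^;]+)", resp.headers.get("Set-Cookie", ""))
    if not match:
        raise RuntimeError("Correlation failed: no session ID in the login response")
    return match.group(1)                      # value generated by the server

def open_dashboard(base_url: str, session_id: str) -> int:
    req = urllib.request.Request(base_url + "/dashboard",
                                 headers={"Cookie": f"JSESSIONID={session_id}"})
    return urllib.request.urlopen(req).status  # the correlated value is propagated
```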

Principles

Some golden rules can be established that should be applied:

1. Keep control of the injection: Injecting load is not only a contest to put pressure on servers. It is an E2E process that should be controlled at every stage (data, pressure put on the machines) to ensure the relevance of the observations.

2. Having good scripts will make great scenarios: Mastering the scenario is the cornerstone of a successful campaign.

You must master at least 3 dimensions:

  • The correspondence between the pages seen/interactions performed and the transactions observed (especially if you have asynchronous AJAX loads).
  • The ‘variabilization’ of a page. The content of a page falls into three types:

1: Some content might be static (a JavaScript file, an image…),

2: Some content will be the content of the page itself,

3: Some will be technical elements hidden from the end user (jSessionID…).

Type 2 shall lead to the definition of a variable linked to the data pool of the injection.

Type 3 shall lead to the definition of a variable linked to a technical variable generated by the platform.

  • The correlation of variables among transactions. Different transactions will share some variables. The consistency of the injection relies on the correct correspondence/correlation of these parameters.

3. Monitor the system under test, relying on dedicated tools: Tools that perform the load injections often enable monitoring of the system under load. An even better option is to rely, as much as possible, on the standard monitoring tools available in the production environments to measure these elements. This will allow you to detect how the E2E system will perform in real conditions.

4. Can a machine provide the “end user perception”? The best/easiest option today is to get the end-user perception from an end user. Meaning that, during the injection, a real user can perform the scenario to perceive the response times on the loaded platform.

5. Prepare, Inject, Assess, Update: Any injection shall be performed with a clear objective (making a reference, validating an assumption, assessing the impact of a parameter) that is formalized before the injection. The system is assessed just after the injection (and prior to any other) to evaluate whether the objective is reached or not and, as a conclusion, new recommendations shall be made to enhance the platform, to be tested in the next injection.

6. One step at a time: A single parameter shall be modified between 2 injections to be able to validate its impact.

7. Clearly define what a “performing application” is: It shall rely on a definition established before the injection and agreed with the business sponsor. It shall not be based on impressions or wishes. In particular, you can only commit to what is in your hands (for example, external tiers and network latency are out of your direct control).

Assessing the performance within a project

Defining the objectives

  • Formalize the functional performance requirements
  • Identify Performance Acceptance Criteria
  • Develop Test strategy with business sponsors (Data pools, Scripts and Scenarios, Think Time, Sleep Time)

Elaborating the scenarios

  • Record the scripts
  • Variabilize, correlate the scripts
  • Once scripts are validated, assemble the scenarios
  • Define the representative dataset
  • Implement tools: Data set generation, Application Mockups
  • Validate the scenarios: Perform the dry run test
  • Establish the Reference response time

Governing the test process

Tests shall be followed carefully using the following minimum set of metadata:

  • Test #: xx from 01 to NN, for example

DTxx: for Dry Test

LTxx: for Load Test

STxx: for Stress Test

ETxx: for Endurance Test

  • Application Version
  • Scenario Name: Specify the set of scenarios executed in this test; it can, for any reason, be a subset of all the scenarios defined.
  • Start Date
  • Start Time
  • End Time
  • Test results: For example

— How was the stability of the test (Bad / Good)

— High level comment on test results
— If the test is interrupted/aborted describe the reason
— Describe enhancements performed, if any
— Describe any issue encountered

  • Test confidence (Low / Medium / High)

Determine, for each test typology executed, which test must be taken as the reference: the reference test will be described in full detail in the test report.
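As an illustration, the metadata above could be kept as one simple record per injection; this is only a sketch, the field names and example values being assumptions:

```python
# One record per injection, following the DTxx/LTxx/STxx/ETxx naming convention.
from dataclasses import dataclass, field

@dataclass
class TestRun:
    test_id: str                 # e.g. "LT03" for the third load test
    application_version: str
    scenario_name: str           # set (or subset) of scenarios executed
    start_date: str
    start_time: str
    end_time: str
    stability: str               # "Bad" / "Good"
    confidence: str              # "Low" / "Medium" / "High"
    comments: list = field(default_factory=list)
    is_reference: bool = False   # reference test, described in full in the report

run = TestRun("LT03", "1.4.2", "Scenario #1 - BAU", "2021-09-29",
              "14:00", "15:10", "Good", "High")
```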

Performing a dry-run test

  • Validate the scenarios: Perform the dry run test
  • Establish the Reference response time
  • At this stage, if the reference times do not meet the Performance Acceptance Criteria, the application shall be reworked or the criteria shall be revised with the business sponsor (a small sketch of this gate follows this list).
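A minimal sketch of that gate, assuming per-step acceptance thresholds were agreed with the business sponsor (the step names and millisecond values are illustrative):

```python
# Dry-run gate: compare the single-user reference times against the
# Performance Acceptance Criteria before starting the load tests.
ACCEPTANCE_CRITERIA_MS = {"open_home": 500, "login": 800, "open_dashboard": 1500}
reference_times_ms = {"open_home": 320, "login": 650, "open_dashboard": 1900}

failing = {step: t for step, t in reference_times_ms.items()
           if t > ACCEPTANCE_CRITERIA_MS[step]}
if failing:
    print("Rework the application or revise the criteria for:", failing)
else:
    print("Dry-run reference is compatible with the acceptance criteria.")
```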

Performing a load test iteration

  • Inject a scenario
  • Record the metrics
  • Analyze the outcomes
  • Formalize conclusions
  • Make recommendations
  • Reset the environment, if required
  • Apply recommendation, if required
  • Iterate a new load test, if required
  • Remark: Domains of recommendations for performance improvement
  • Network improvement
  • Caching mechanisms
  • Application Code changes
  • Other Configuration tuning

Performing a stress test iteration

Once load tests have been finalized, stress tests shall be run for comparison.

  • Inject the same scenario as the load tests
  • Record the metrics
  • Analyze the outcomes
  • Formalize conclusions
  • Make recommendations
  • Reset the environment, if required
  • Apply recommendation, if required
  • Iterate a new stress test, if required

Performing an endurance test

Once stress tests have been finalized, endurance test shall be run

  • Inject the same scenario as the load tests
  • Record the metrics
  • Analyze the outcomes
  • Formalize conclusions
  • Make recommendations

Usually a single Endurance test is required if load and stress tests have been successfully run.

Performing a destructive test

Out of scope of this document (See Test typologies).

Performing a robustness test

Out of scope of this document (See Test typologies).

Finalizing the campaign

  • A final validation test based on the load test can be run to validate the different recommendations. This can be used as the reference test for the report
  • Formalize the performance assessment campaign report

KPI to monitor

Different metrics can be collected to monitor a system under load. The minimal set is listed below; a short sketch of how the application-level subset can be computed follows the lists.

Application Level

  • Average Response times
  • Max and min response times
  • Standard deviation
  • Throughputs (Page rate)
  • Specific metrics depending on the application

Middleware

  • Threads (HTTP Server)
  • JVM Heap Analysis (WebSphere application servers)
  • Database connections pools
  • SOAP connectors state
  • Active thread count (WAS session)
  • Garbage collector analysis
  • DB health KPI

Hardware

  • Processes
  • CPU (User, System, Wait)
  • RAM
  • Disk space (disks I/O rates, transfers, read/write ratios and disk cache usage)
  • Network data throughput
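For the application-level metrics, here is a minimal sketch of how they can be derived from the raw response times collected during the steady phase (the sample values and window length are illustrative):

```python
# Application-level KPIs computed from raw response times of the steady phase.
import statistics

response_times_ms = [180, 210, 195, 230, 510, 205, 190, 220]   # sample values
duration_s = 60.0                       # length of the observation window

kpis = {
    "avg_ms": statistics.mean(response_times_ms),
    "min_ms": min(response_times_ms),
    "max_ms": max(response_times_ms),
    "stdev_ms": statistics.stdev(response_times_ms),
    "throughput_pages_per_s": len(response_times_ms) / duration_s,
}
for name, value in kpis.items():
    print(f"{name}: {value:.2f}")
```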

Performance assessment of an application during the Prototype

A prototype aims at demonstrating different functional/technical concepts in a short timeframe. Shaping the industrialization of the platform, and especially putting in place the first elements to adopt a Performance First! strategy, is at stake.

Obviously, exhaustively assessing the performance requires a campaign involving mature code… which is not applicable to this phase of the project.

Nevertheless, in order to integrate the performance dimension within the coding factory, an assessment of specific URLs (user URLs and/or REST APIs) can be performed in order to establish the first reference times and benchmark the evolution of the different endpoints during the coding phases.

Tools

Different tools can be used to assess performance:

  • jMeter
  • SOAPUI/LoadUI
  • Gatling
  • Custom implementation

For the prototype, only a custom implementation will be considered, to test a static URL and gather the response time as well as the HTTP return code.
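A possible minimal version of such a custom implementation could look like the sketch below; the URL is a placeholder to adapt to the prototype’s endpoints:

```python
# Probe a static URL and report the HTTP return code and the elapsed time.
import time
import urllib.request

def probe(url: str, timeout_s: float = 10.0):
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        resp.read()                          # download the body like a browser would
        status = resp.status
    return status, (time.perf_counter() - start) * 1000.0

status, elapsed_ms = probe("http://localhost:8080/index.html")
print(f"HTTP {status} in {elapsed_ms:.1f} ms")
```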

Best and ‘hmm hmm’ practices

Best Practices

Hot start or cold start? To be comparable, each injection shall occur on a platform in a similar context. It is a good idea to perform all injections with a cold start approach, but it is a very complex process to set up (restore of backups on several platforms…). Hot start is most of the time far easier and keeps the overall integrity of the test. That said, to reset the user sessions on the servers, it can be a good solution to restart them instead of waiting for the sessions to be flushed (especially if sessions are kept for several hours).

Where is the smoking gun? Each time a bad behavior is encountered, a systematic process shall be followed. The response time shall be split among the different components involved. For the most impacted components, an “Application > Middleware > Hardware > Network” analysis shall be run to link the findings to a specific technical bottleneck.

Antipatterns or ‘hmm hmm’ practices

Hurray, I have a bunch of numbers; I will be able to process them later, while writing the report… Hmm hmm, no, this never happens. You just get numbers that you won’t ever be able to process.

I am sure that modifying these 10 parameters will optimize the response time. Hmm hmm, maybe, but let’s do it one step at a time. The best way to ensure that no conclusion can be drawn from an injection is to modify several parameters at once.

RACI


Sylvain Wilbert

I am an IBM Distinguished Engineer with 20+ years of experience. My field of exploration is the cloud, whether advising, migrating, modernizing or building apps.