16. Purpose and scope of testing

The test plan presented herein describes tasks related to testing the CDOC2 capsule server (hereinafter also simply ‘server’). The test plan does not cover the client-side components of CDOC2; it is solely focused on the server side of the system.

Within the scope of the test plan presented here, server testing serves two main purposes:

Functional testing, to verify that the implemented functionality of the server is in line with what is prescribed in the documentation and the server can fulfil the established goals.
Load testing, to establish the operation and behaviour of the server under different usage loads and patterns.

16.1 Functional testing

The purpose of functional testing is to verify that the implemented functionality of the server is in line with what is prescribed in the documentation and the server can fulfil the established goals.

Functional testing uses unit tests and automated API tests written by the developers. Together, these tests cover all scenarios described in the server use cases, as well as potential API error situations.

16.2 Load testing

Although no explicit requirements have been set out for server capability, it is important to be aware of the capacity limits of the developed software on a specific existing platform. This information is provided by load tests which seek answers to the following questions:

What is the system’s response time to users under different system loads?
How many users can the system serve simultaneously without exceeding the defined response time?
How will the system perform under extreme load conditions (e.g. very large number of simultaneous users or key shares in the database)?
How will the system recover after peak load?
How will the system perform under conditions of long-term moderate or higher load?

17. Test procedure and results

Test planning is a continuous process, regularly repeated throughout the project after the addition of significant new information. The result of this process is the test plan (the present document).

In the case of the capsule server, functional testing is covered by unit tests and API tests run against the server API.

Unit tests are created by the developers and can be used by the development team, as well as in a continuous integration environment. The output of the tests is a human- and machine-readable test report. The source code for the unit tests is maintained along with the server’s source code, following the same principles as the application source code.

Like the unit tests, the capsule server API tests are designed to be implemented as automated tests, which keeps them simple to run and easy to repeat. The tool used generates a test report on the tests run.

Tools for developing the API tests are chosen so that the tests are simple to maintain and their source code can be kept alongside the application source code, following the same principles.

Tests used for establishing server capability are also implemented as automatic tests, utilizing a suitable testing tool. The best tool for this purpose will be determined during test development, considering the following requirements:

Test development should not be unreasonably complex for the developer.
Adjusting the desired load should not require software development skills.
The resulting test report should be easily available and understandable.

Note that the scope of load test development only includes the capability of stressing the system and the load tests do not include solutions for monitoring the system during load testing. Tools created for system monitoring must be used for this purpose.

The result of load testing is a test report that addresses the questions listed in the load testing section of the test plan. The report provides information on the loads applied to the server, as well as the usage of system resources (memory usage, processor load, storage usage) required for servicing queries.

18. Designed test scenarios

Test ideas and scenarios are normally not explicitly covered in a test plan. However, as the capsule server is an application with limited functionality, there is little need for a separate document for test scenario management. The present section also describes the scenarios designed to be used in load testing.

18.1 Capsule server API functionality tests

Server functionality is tested by emulating the end-user client application utilizing the server API interfaces. These tests are run for both ECC and RSA keys.

Positive scenarios:

Sender successfully transmits a capsule to the server ([ECC|RSA]-PUT_CAPSULE-POS-01-ONCE)
Sender has already transmitted a capsule and is retransmitting the capsule to the server ([ECC|RSA]-PUT_CAPSULE-POS-02-REPEATEDLY)
Sender transmits a random byte array not exceeding the defined length to the server as RSA key material (RSA-PUT-CAPSULE-POS-03-RANDOM_CONTENT)
Recipient successfully requests a capsule ([ECC|RSA]-GET_CAPSULE-POS-01-CORRECT_REQUEST)
Successful transmission of a capsule in a multi-arm system: the capsule is received by one arm and issued by another arm of the system.

Negative scenarios:

Sender transmits an RSA capsule containing overlength key material (RSA-PUT_CAPSULE-NEG-01-CAPSULE_TOO_BIG)
Recipient requests a capsule with a random transaction ID (GET_CAPSULE-NEG-02-RANDOM_UUID_TRANSACTION_ID)
Recipient requests a capsule with an underlength transaction ID (GET_CAPSULE-NEG-03-TOO_SHORT_TRANSACTION_ID)
Recipient requests a capsule with an empty transaction ID (GET_CAPSULE-NEG-04-EMPTY_STRING_TRANSACTION_ID)
Recipient requests a capsule with an overlength transaction ID (GET_CAPSULE-NEG-05-TOO_LONG_RANDOM_STRING_TRANSACTION_ID)
Recipient requests a capsule with a valid transaction ID but the recipient’s public key does not match the ID ([ECC|RSA]-GET_CAPSULE-NEG-06-PUBLIC_KEY_NOT_MATCHING)
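The transaction ID scenarios above (NEG-02 to NEG-05) lend themselves to a table-driven input generator. The following is a minimal sketch only: the function names and the `min_len`/`max_len` parameters are hypothetical, and the real length limits must be taken from the server API specification, which is not reproduced here.

```python
import secrets
import string
import uuid


def random_string(length: int) -> str:
    """Return a random alphanumeric string of exactly `length` characters."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def negative_transaction_ids(min_len: int, max_len: int) -> dict[str, str]:
    """Map each negative GET_CAPSULE scenario to an invalid transaction ID.

    min_len and max_len are assumed bounds on a valid transaction ID;
    they are placeholders, not values taken from the server specification.
    """
    if min_len < 2 or max_len < min_len:
        raise ValueError("expected 2 <= min_len <= max_len")
    return {
        # Random UUID that the server never issued.
        "GET_CAPSULE-NEG-02-RANDOM_UUID_TRANSACTION_ID": str(uuid.uuid4()),
        # One character shorter than the shortest valid ID.
        "GET_CAPSULE-NEG-03-TOO_SHORT_TRANSACTION_ID": random_string(min_len - 1),
        "GET_CAPSULE-NEG-04-EMPTY_STRING_TRANSACTION_ID": "",
        # One character longer than the longest valid ID.
        "GET_CAPSULE-NEG-05-TOO_LONG_RANDOM_STRING_TRANSACTION_ID": random_string(max_len + 1),
    }


if __name__ == "__main__":
    # Illustrative bounds only.
    for scenario, tx_id in negative_transaction_ids(min_len=10, max_len=50).items():
        print(f"{scenario}: {tx_id!r}")
```

Keeping the inputs in one mapping keyed by scenario ID makes it straightforward to trace a failing request back to the scenario listed in the test plan.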

18.2 Server load tests

To obtain information about the server’s behaviour under stress, the server must be overloaded with queries that resemble the behavioural patterns of real-life users as closely as possible.

Based on the design and functionality of the capsule server, queries made to the server can be divided into two main groups: transmission of capsules to the server, and capsule requests that require user authentication. As two different levels of authentication are used on the server, it can be said to essentially comprise two independent web servers with different configurations, sharing a common database for storing key shares.

Putting a load on the server requires the use of queries for transmitting key shares to the server and requesting key shares from the server.

In the case of capsule transmission queries, the queries must be functionally successful and result in capsules being saved to the server database.
In the case of capsule request queries, the capsule to be used and the user to be authenticated are selected randomly and the reply returned by the server may be either positive (i.e. contain a capsule) or negative (i.e. contain an error code).

The contents of the query are irrelevant for load testing, as the same internal queries and comparisons are required for both positive and negative results.
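The random pairing of capsule and user described above can be sketched as follows. This is an illustration under stated assumptions, not the project's test code: `pick_request`, `expected_outcome` and the `recipients` mapping are hypothetical names, and the identifiers in the usage example are made up.

```python
import random


def pick_request(capsule_ids, user_ids, rng):
    """Choose the capsule and the authenticating user independently at random."""
    return rng.choice(capsule_ids), rng.choice(user_ids)


def expected_outcome(capsule_id, user_id, recipients):
    """Return 'capsule' if the user is the capsule's recipient, else 'error'.

    recipients maps a capsule ID to the user it was created for
    (a hypothetical stand-in for the server's own matching logic).
    """
    return "capsule" if recipients.get(capsule_id) == user_id else "error"


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a run can be repeated
    recipients = {"capsule-a": "user-1", "capsule-b": "user-2", "capsule-c": "user-3"}
    capsules = list(recipients)
    users = ["user-1", "user-2", "user-3"]
    for _ in range(5):
        capsule_id, user_id = pick_request(capsules, users, rng)
        print(capsule_id, user_id, expected_outcome(capsule_id, user_id, recipients))
```

Because the capsule and the user are drawn independently, a run naturally produces a mix of positive and negative replies, which is acceptable for load generation.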

18.2.1 Load generation

Both the functional tests and the load tests utilize the Gatling test framework where the desired load on the tested software can be adjusted using the following parameters:

start-users-per-second: Number of active users (queries per second) immediately applied at the start of the test
increment-users-per-second: Number of active users added per each following test cycle
increment-cycles: Number of test cycles used
cycle-duration-seconds: Duration of a single test cycle (in seconds)

The duration of a load test depends on the number of test cycles and the cycle duration. The number of queries made to the tested software depends on the initial number of users and the number of users added in each subsequent test cycle. For example, to generate a steadily increasing load, the initial number of users can be set relatively low and a larger number of users added in each test cycle.

Using the settings below, the duration of the test will be 600 seconds (10 minutes), the initial query rate at the start of the test is 10 queries per second, and the query rate in the last test cycle is 110 queries per second:

start-users-per-second = 10
increment-users-per-second = 10
increment-cycles = 10
cycle-duration-seconds = 60

To generate a steady load, the number of users applied at the start of the test must be adjusted to the desired query rate and the number of users added per test cycle kept minimal.

Using the settings below, the duration of the test will be 600 seconds (10 minutes), the initial query rate at the start of the test is 75 queries per second, and the query rate in the last test cycle is 85 queries per second:

start-users-per-second = 75
increment-users-per-second = 1
increment-cycles = 10
cycle-duration-seconds = 60
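The arithmetic behind the two examples can be sketched as follows. The `LoadSettings` class is hypothetical, and the formula for the final rate (start rate plus increment times number of cycles) is inferred from the figures quoted in the text rather than taken from the Gatling configuration itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadSettings:
    """The four load parameters described above (hypothetical container)."""

    start_users_per_second: int
    increment_users_per_second: int
    increment_cycles: int
    cycle_duration_seconds: int

    def total_duration_seconds(self) -> int:
        # Test duration = number of cycles x duration of one cycle.
        return self.increment_cycles * self.cycle_duration_seconds

    def final_rate(self) -> int:
        # Assumption inferred from the examples in the text:
        # the increment is applied once per cycle.
        return (
            self.start_users_per_second
            + self.increment_users_per_second * self.increment_cycles
        )


if __name__ == "__main__":
    increasing = LoadSettings(10, 10, 10, 60)
    steady = LoadSettings(75, 1, 10, 60)
    print(increasing.total_duration_seconds(), increasing.final_rate())  # 600 110
    print(steady.total_duration_seconds(), steady.final_rate())  # 600 85
```

Computing these two figures before a run is a quick way to check that a set of parameters produces the intended test length and peak query rate.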

18.3 Implementation of scenarios in load tests

What is the system’s response time to users under different system loads?
How many users can the system serve simultaneously without exceeding the defined response time?

Since system operability is directly dependent on the system’s operational environment, answering these questions requires repeatedly running load tests using a variety of loads. To start off, gradually increasing loads can be used to determine potential capacity limits.
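Once an increasing-load run has produced a response time for each load level, the capacity limit can be read off the results. The sketch below assumes that the measurements are available as (query rate, response time) pairs. The function name is hypothetical, and the sample figures and the 100 ms threshold are invented for illustration; they are not test results.

```python
def highest_acceptable_rate(levels, max_response_ms):
    """Return the highest query rate served within the response-time limit.

    levels: iterable of (queries_per_second, response_time_ms) pairs,
            one per load level of an increasing-load test.
    Levels are scanned in ascending order of rate and the scan stops at the
    first level that exceeds the limit, so a later level that happens to be
    fast again does not count. Returns None if even the lowest level fails.
    """
    best = None
    for rate, response_ms in sorted(levels):
        if response_ms > max_response_ms:
            break
        best = rate
    return best


if __name__ == "__main__":
    # Invented sample measurements, for illustration only.
    measured = [(10, 40.0), (20, 55.0), (30, 120.0), (40, 90.0)]
    print(highest_acceptable_rate(measured, max_response_ms=100.0))  # 20
```

Repeating the calculation across several runs gives a range rather than a single figure, which reflects the dependence on the operational environment noted above.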

How will the system perform under extreme load conditions (e.g. very large number of simultaneous users or key shares in the database)?

The goal of this test is to take the system to or above maximum load and monitor the system’s performance under such conditions. Probing for extreme loads should be carried out using a steady load over a long period of time.

How will the system recover after peak load?

The goal of this test is to gather information on the ability to recover from peak load. Multiple simultaneous load generators can be used here, one performing a longer, steady-load test and the other a shorter, increasing-load test.

How will the system perform under conditions of long-term moderate or higher load?

The system is run for a long period of time under a steady moderate load to detect anomalies or errors that could occur in the long-term operation of the system (small memory leaks etc.). A steady-load test consisting of a large number of long test cycles can be used to answer this question.
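A slow memory leak shows up as a steady upward trend in memory usage over the run. Assuming that memory samples are exported by the separate monitoring tools (the load tests themselves do not collect them), the trend can be estimated with a least-squares fit. The function name and the sample values below are illustrative only.

```python
def memory_trend_mb_per_hour(samples):
    """Least-squares slope of memory usage, in MB per hour.

    samples: list of (elapsed_seconds, memory_mb) pairs taken during a
             steady-load run. A slope close to zero suggests stable usage;
             a persistently positive slope is a hint of a leak.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    covariance = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("samples must span more than one point in time")
    return covariance / variance * 3600  # per second -> per hour


if __name__ == "__main__":
    # Invented samples: memory grows by 1 MB every 30 minutes.
    samples = [(0, 500.0), (1800, 501.0), (3600, 502.0), (5400, 503.0)]
    print(round(memory_trend_mb_per_hour(samples), 3))  # 2.0
```

A fitted slope is less sensitive to short-lived spikes than a comparison of the first and last sample, which matters when garbage collection makes individual readings noisy.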