Latest revision as of 01:21, 2 August 2020
Adding a Test Case to a Thorn
This page describes how to add a test case to a Cactus thorn. It guides you step by step from the ideas behind test cases to some of the finer details that make a test case easy to use.
A test case is...
Cactus test cases are primarily regression tests, i.e. they are supposed to catch unintended changes in the behaviour of thorns. Test cases should be small, so that one can quickly run many test cases, and see whether anything changed unexpectedly. Test cases are not supposed to perform a convergence test, or test any other kind of physics correctness. It is important to test these as well, but a regression test case is not really a good way to do so.
In Cactus, test cases are parameter files that are run by Cactus, and which are then compared to the expected output. To set up a test case, you have to (a) write a parameter file, and (b) run it to create this "expected output". These are stored in the "test" directory of a thorn. The output directory must have the same name as the parameter file (without the ".par" suffix).
Cactus compares the new and the expected output by comparing real numbers in ASCII files, allowing a certain "fuzz", i.e. a difference of about 1.0e-12.
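The idea of a "fuzzy" comparison can be sketched in a few lines of Python. This is only an illustration, not the code Cactus itself uses: the function names are made up here, and treating the fuzz as a purely absolute tolerance of 1.0e-12 is an assumption.

```python
def tokens_match(new_tok, exp_tok, abstol):
    """Compare two whitespace-separated tokens.

    Numbers are compared with an absolute tolerance; anything that is
    not a number (e.g. a comment word) must match exactly.
    """
    try:
        return abs(float(new_tok) - float(exp_tok)) <= abstol
    except ValueError:
        return new_tok == exp_tok


def outputs_match(new_text, expected_text, abstol=1.0e-12):
    """Return True if two ASCII outputs agree line by line within abstol."""
    new_lines = new_text.splitlines()
    exp_lines = expected_text.splitlines()
    if len(new_lines) != len(exp_lines):
        return False
    for new_line, exp_line in zip(new_lines, exp_lines):
        new_toks = new_line.split()
        exp_toks = exp_line.split()
        if len(new_toks) != len(exp_toks):
            return False
        if not all(tokens_match(n, e, abstol)
                   for n, e in zip(new_toks, exp_toks)):
            return False
    return True
```

With this sketch, a value that differs by 1.0e-13 still counts as unchanged, while a difference of 1.0e-5 is reported as a mismatch.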
How to design a test case
Test cases should be
- simple
- small
- quick
They should also
- use as few thorns as possible (simple!)
- output only a few variables (small!)
- output only norms and 1D ASCII quantities (small!)
- finish in under one minute (quick, there are 300 tests to run!)
Preferably, each major feature in a thorn should be covered by a test case. If necessary, one has to introduce helper thorns that use the tested features.
By default, test cases are run on two processes. The test case should not output anything that changes from run to run, such as the current date. It also makes no sense to output the parameter file itself.
Test cases should also not output quantities that are significantly influenced by floating-point round-off error. For example, the "sum" reduction operator on N grid points can have an absolute error of about N times the floating-point round-off, which is unsuitably large. The "average" reduction operator divides this by N and thus has a smaller absolute error.
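The following Python sketch illustrates why a sum is fragile. It mimics a "sum" reduction that is evaluated either in one piece or as two partial sums that are then added, which is roughly what happens when the number of processes changes. The helper names are invented for this example, and an explicit loop is used so that the additions happen in a fixed, simple order.

```python
def naive_sum(values):
    """Add the values one after another, left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def split_sum(values, split):
    """Sum two contiguous pieces separately, then add the partial sums."""
    return naive_sum(values[:split]) + naive_sum(values[split:])


data = [0.1, 0.2, 0.3]

one_piece = naive_sum(data)       # (0.1 + 0.2) + 0.3 -> 0.6000000000000001
two_pieces = split_sum(data, 1)   # 0.1 + (0.2 + 0.3) -> 0.6

print(one_piece, two_pieces, one_piece - two_pieces)
```

The two results differ in the last digit although the input is identical. For three values the difference is far below the fuzz, but the possible discrepancy grows with the number of values added.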
Norms are quite insensitive to changes that occur at a single grid point, and can thus miss changes in behaviour. Similarly, 1D ASCII output will miss changes that occur off-axis. In general, a combination of norms and 1D output is best.
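A small Python experiment makes this concrete. It is a sketch under assumptions: the grid is a 10^3 block of ones, the perturbation of 1.0e-10 and the fuzz of 1.0e-12 are chosen for illustration, and the helper functions are not part of Cactus.

```python
import math

N = 10            # grid points per dimension (assumed small test grid)
FUZZ = 1.0e-12    # comparison tolerance quoted in the text


def make_grid(perturb_at=None, delta=0.0):
    """Return an N^3 grid of ones, optionally perturbed at one point."""
    grid = {(i, j, k): 1.0
            for i in range(N) for j in range(N) for k in range(N)}
    if perturb_at is not None:
        grid[perturb_at] += delta
    return grid


def norm2(grid):
    """Root mean square over all grid points."""
    return math.sqrt(sum(v * v for v in grid.values()) / len(grid))


def norm_inf(grid):
    """Largest absolute value on the grid."""
    return max(abs(v) for v in grid.values())


def x_line(grid):
    """1D output along the x axis (j = k = 0)."""
    return [grid[(i, 0, 0)] for i in range(N)]


base = make_grid()
off_axis = make_grid(perturb_at=(5, 5, 5), delta=1.0e-10)
on_axis = make_grid(perturb_at=(5, 0, 0), delta=1.0e-10)

print(abs(norm2(off_axis) - norm2(base)) < FUZZ)        # change hidden in norm2
print(abs(norm_inf(off_axis) - norm_inf(base)) > FUZZ)  # change visible in norm_inf
print(x_line(off_axis) == x_line(base))                 # 1D output misses it
print(x_line(on_axis) != x_line(base))                  # 1D output sees on-axis change
```

In this setup the off-axis change is diluted below the fuzz in norm2 and is invisible in the 1D output, while norm_inf picks it up; a change on the axis shows up in the 1D output directly.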
Checkpoint files, binary I/O, reading initial data from files, web servers etc. cannot easily be tested, and should be avoided in test cases if possible. They are certainly an advanced topic that we will skip here.
It is a common mistake (read: a mistake that I made) to output too many variables in test cases. This makes test cases too large and thus inconvenient to handle. If there are more than a dozen test cases per thorn, the size of the expected output can significantly increase the download time and disk space required for this thorn. These additional files will probably not help in detecting whether the thorn's behaviour has changed; they may only make it easier to track down the reason, and this is not what a test case is for.
How to create a test case
It is probably best to start from a working parameter file that uses a particular feature. Then one removes all those features and thorns that are not necessary. To reduce the size further, one reduces the number of time steps and chooses a much smaller domain and a much coarser grid. It may make sense to introduce symmetries or periodic boundaries.
Note that a test case does not need to provide physically interesting output; it is only supposed to check whether the code still provides the same output. Thus, as long as a coarser grid still executes the same routines, it is fine. Of course, the grid must not be so coarse that it generates NaNs.
At this stage, it may also make sense to avoid mesh refinement or multi-block systems, or to reduce the order of accuracy.
In the end, the parameter file should run on a single core, should not need more than a few hundred MByte of memory, and should finish in under one minute. Typically, it suffices to run for a few time steps on a domain of 10^3 grid points.
At this stage, the physics output of the parameter file has probably changed a lot. This is fine; all it needs to do is to check that the feature that is to be tested will return the same result in the future as it does now.
How to finish a test case
After these steps, you should make a few more modifications to the parameter file.
Output directory: Choose the I/O options
    IO::out_dir = $parfile
    IO::out_fileinfo = "axis labels"
    IO::parfile_write = "no"
to ensure that the output directory has the right name, and that no additional information is written into the output files.
TODO: add compact Carpet output format and omission of ghost zones here. See ML_BSSN_Test for how to do this so that the tests can then run on all numbers of processes.
Output quantities: Certain reduction operations (such as "sum") should be avoided, since they change too much due to floating-point round-off. The following reductions (and similar ones) are good; others should be avoided:
- count minimum maximum average norm1 norm2 norm_inf
Do not use
- product sum sum_abs sum_squared sum_abs_squared
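If you want to check a reduction list mechanically, a few lines of Python suffice. This is a hypothetical helper, not an existing Cactus tool; it assumes the reductions are given as a space-separated string, as in the lists above.

```python
# Reductions that the text above advises against.
DISCOURAGED = {"product", "sum", "sum_abs", "sum_squared", "sum_abs_squared"}


def discouraged_reductions(reductions):
    """Return the discouraged names found in a space-separated list."""
    return sorted(set(reductions.split()) & DISCOURAGED)


print(discouraged_reductions("minimum maximum sum norm2"))
```

An empty result means that none of the listed reductions is on the "do not use" list.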
Screen output (info output) is ignored in test cases. You should leave a bit of info output enabled, since this helps debugging, but it won't be checked when the test case is run.
You will also want to beautify the parameter file at this point, adding comments, and maybe adding your own name for vanity at the top. You should also explain what the test case is testing.
Then you run the parameter file on two processes. After a short while (under one minute!), you should see an output directory. Have a look into the output directory and check:
- Is there a ".par" file? (There should not be.)
- Are there binary files? (There should not be.)
- Are there large files? (There should not be.)
- Are there many files? (There should not be.)
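These checks can be automated with a short script. The following Python sketch is an assumption-laden example, not part of Cactus: the limits on file count and file size are arbitrary illustrative values, and a file is treated as binary if its first 4096 bytes contain a NUL byte, which is only a heuristic.

```python
import os


def check_output_dir(path, max_files=20, max_bytes=100_000):
    """Return a list of warnings about a test output directory.

    max_files and max_bytes are illustrative limits, not Cactus rules.
    """
    warnings = []
    count = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            count += 1
            full = os.path.join(root, name)
            if name.endswith(".par"):
                warnings.append(f"parameter file found: {full}")
            if os.path.getsize(full) > max_bytes:
                warnings.append(f"large file: {full}")
            # Heuristic: text output should not contain NUL bytes.
            with open(full, "rb") as f:
                if b"\0" in f.read(4096):
                    warnings.append(f"binary file: {full}")
    if count > max_files:
        warnings.append(f"too many files: {count}")
    return warnings
```

Calling `check_output_dir("mytest")` on a clean output directory (the name is a placeholder) should return an empty list.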
Then move the parameter file as well as this output directory into the "test" directory of the thorn you are testing.
Next, run this test case, following the Cactus instructions for doing so. The test case should pass.
TODO: advise that the test should also be tested on 1 process, and higher numbers, and should still pass.
Don't forget to commit the test case, or to attach it to a patch you are proposing.
How to track down why a test case fails
(INCOMPLETE)
- Check out old code versions, find version that succeeds
- Output additional quantities, since the test case probably doesn't output enough data for this
- When doing so, keep several versions of the code/executable around, since you will be re-running the test cases many times
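Finding the version that introduced a failure can be done by bisection rather than by trying every version in turn. The Python sketch below shows the idea only; `passes` is a hypothetical function that you would have to supply (build the given version and run the test case), and the search assumes that the oldest version passes, the newest fails, and there is a single point where the behaviour changed.

```python
def first_failing(versions, passes):
    """Return the first version for which the test fails.

    versions must be ordered from oldest to newest; versions[0] is
    assumed to pass and versions[-1] is assumed to fail.
    """
    lo, hi = 0, len(versions) - 1  # versions[lo] passes, versions[hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[hi]
```

With 100 candidate versions this needs about seven builds instead of up to a hundred, which is why keeping several executables around pays off.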
Making the test output independent of the number of processes
You may find that the output data depends on the number of processes used. Setting the following parameters may make the output the same on 1 and 2 processes, which are the numbers of processes used in the automated tests.
    CarpetIOASCII::compact_format = yes
    CarpetIOASCII::output_ghost_points = no