Automated testing
Cactus has a test suite mechanism designed to detect regressions, i.e. commits that break existing functionality. The mechanism consists of a set of tests, each of which either passes or fails. We would like a system that runs these tests regularly, together with an interface that shows when tests that previously passed have started to fail.
Possible starting points
- Thomas Radke implemented such a system at the AEI in the past. We should find out about it and see whether it can be used now.
- There are freely available frameworks for performing code testing like this. See, for example, http://trac.buildbot.net/.
- Ian Hinder put together a simple system for running the tests on a workstation, which could possibly be adapted. See http://damiana2.aei.mpg.de/~ianhin/testsuites/einsteintoolkit/. Almost everything on the page is a hyperlink to information about the test that might help you find out why it failed. The table cannot be scrolled conveniently, and old test data is not removed from it. There is also an unresolved problem that has prevented the tests from running for some time.
- It is intended that SimFactory should eventually be able to run the Cactus test suites. This would simplify distributing the code to each remote machine and managing the queued jobs. There is a Trac ticket for this: https://trac.einsteintoolkit.org/ticket/113.
Requirements
Essential:
- Run the Cactus test suites regularly (every night)
- Test on both one process and two processes, with several OpenMP settings
- Parse the output and identify which tests passed and which tests failed
- Present the results in a form which is very easy to interpret
- It should be possible to see easily which tests are newly failing
- It should be possible to see what has changed in the code since the last passing test
- Run on all the major production clusters (this means using their queue systems)
Optional:
- Allow people to implement the testing mechanism themselves on their own machines for their own private codes
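As an illustration of two of the essential requirements (testing several process/OpenMP combinations, and seeing which tests are newly failing), here is a minimal Python sketch. It is not an existing Cactus or SimFactory interface. The OpenMP thread counts, the test names such as `ThornA/test_one`, and the `"pass"`/`"fail"` strings are assumptions chosen for the example.

```python
from itertools import product

# Process counts come from the requirements; the thread counts are an
# assumption, since the requirements only say "several OpenMP settings".
PROCESS_COUNTS = (1, 2)
OMP_THREADS = (1, 2, 4)


def configurations():
    """Return every (processes, threads) pair that should be tested."""
    return list(product(PROCESS_COUNTS, OMP_THREADS))


def newly_failing(previous, current):
    """Return the keys that passed in `previous` but fail in `current`.

    Both arguments map a key (test name, processes, threads) to the
    string "pass" or "fail". Tests that were already failing, or that
    have no earlier result, are not reported as newly failing.
    """
    return sorted(
        key
        for key, status in current.items()
        if status == "fail" and previous.get(key) == "pass"
    )


# Hypothetical results from two consecutive nightly runs.
previous = {
    ("ThornA/test_one", 1, 1): "pass",
    ("ThornA/test_one", 2, 1): "pass",
    ("ThornB/test_one", 1, 2): "fail",
}
current = {
    ("ThornA/test_one", 1, 1): "pass",
    ("ThornA/test_one", 2, 1): "fail",
    ("ThornB/test_one", 1, 2): "fail",
}

print(configurations())
print(newly_failing(previous, current))
```

Keying the results on the full configuration, rather than on the test name alone, lets the interface distinguish a test that breaks only on two processes from one that breaks everywhere.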
Additional notes
- The Wikipedia page, http://en.wikipedia.org/wiki/Continuous_integration, has a list of available software for performing tests like this and contains what looks like useful information.
- The ET Release info page has some useful information about running tests on different systems.
- Erik set up a nightly build and test system using NMI, but it was found to be unsuitable: it could not run tests on our production systems and was in general not suited to our uses.
- The Cactus test system is very old and not particularly user-friendly. For example, it was designed only to run interactively, and the method for running it from a script is awkward. Providing an easy-to-script interface (in Perl) for running the tests would be a nice idea. The output could also be arranged in an easy-to-parse format, rather than the human-readable format that is currently used. This belongs in Cactus, not in SimFactory.
- The entire Einstein Toolkit thornlist test suite runs in approximately two hours, so it is certainly possible to run it overnight. However, if we rely on the queue systems on a cluster, we might have to wait a long time for the results. The system would need to handle this gracefully (i.e. have a test state of "didn't run yet").
- What allocation would we use for these tests? Would the computing centres be happy for us to use service units (SUs) for this?
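The "didn't run yet" state mentioned above could be handled along the following lines. This is only a sketch under stated assumptions: the `ResultState` enum, the `merge_results` function, and the test names are hypothetical, and results are assumed to arrive as a mapping from test name to a boolean (true meaning the test passed).

```python
from enum import Enum


class ResultState(Enum):
    """Possible states of a test in the results table (hypothetical)."""
    PASSED = "passed"
    FAILED = "failed"
    NOT_RUN = "didn't run yet"


def merge_results(expected, reported):
    """Map every expected test to a state.

    `expected` is the list of tests that should run; `reported` maps the
    tests that have finished to True (passed) or False (failed). Tests
    with no report yet, for example because the job is still waiting in
    a cluster queue, are marked NOT_RUN rather than FAILED.
    """
    states = {}
    for name in expected:
        if name not in reported:
            states[name] = ResultState.NOT_RUN
        elif reported[name]:
            states[name] = ResultState.PASSED
        else:
            states[name] = ResultState.FAILED
    return states


# Hypothetical example: one job has not produced a result yet.
expected = ["ThornA/test_one", "ThornA/test_two", "ThornB/test_one"]
reported = {"ThornA/test_one": True, "ThornA/test_two": False}

states = merge_results(expected, reported)
for name, state in states.items():
    print(f"{name}: {state.value}")
```

Treating a missing result as its own state keeps a slow queue from being reported as a regression.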