Fixing examples

From Einstein Toolkit Documentation

Latest revision as of 05:40, 14 August 2015

Project to fix the Einstein Toolkit example parameter files

See #641.

13-Aug-2015

(Ian)

I have submitted all arrangements/*/*/par/*.par files to run on 12 cores on datura, to get a first estimate of which ones run and which fail.
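
As a rough illustration of how the set of example parameter files could be enumerated before submission, here is a minimal Python sketch. The function name `find_parfiles` and the `cactus_root` argument are hypothetical; only the pattern arrangements/*/*/par/*.par comes from the note above, and the actual job submission on datura is not shown.

```python
import glob
import os


def find_parfiles(cactus_root):
    """Return the sorted paths matching arrangements/*/*/par/*.par.

    cactus_root is assumed to be the top of a Cactus checkout, i.e. the
    directory that contains the "arrangements" directory.
    """
    pattern = os.path.join(cactus_root, "arrangements", "*", "*", "par", "*.par")
    return sorted(glob.glob(pattern))
```

Calling `find_parfiles(".")` from the top of the checkout would give the list of files to submit; `len(find_parfiles("."))` gives the number of examples found.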

14-Aug-2015

The following tests take a very long time:

  • jpeg_amr
  • jpeg
  • http: this one is supposed to take a long time. It starts an (empty) simulation and pauses it for user interaction at the web interface.

The current status of the examples is listed at Einstein Toolkit Examples Status (http://damiana2.aei.mpg.de/~ianhin/ETExamples/examples.xml). For now, this page is regenerated manually.

Key:

{| class="wikitable" style="text-align:left;"
! Status indicator
! Description
! No. of examples
|-
| Done
| Run ended with "Done." (TODO: eventually check that these are actually "good" somehow)
| 81
|-
| ExternalThorns
| Failed due to depending on non-ET thorns
| 88
|-
| Assertion
| Assertion failure
| 5
|-
| CCTK_Abort
| Warning level 0
| 43
|-
| FailedParamCheck
| Failed Cactus parameter check
| 3
|-
| Signal11
| Segmentation fault
| 4
|-
| Signal9
| SIGKILL; probably out of memory, need to run on more nodes?
| 1
|-
| Unknown
| Not categorised, but also includes runs which ran for a long time and were terminated manually for now
| 1
|}
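
The status indicators in the key could be assigned automatically along the following lines. This is only a sketch under stated assumptions, not the script that produced the status page: the function `classify`, the table `HYPOTHETICAL_PATTERNS` and all log substrings in it are made up for illustration and would need to be replaced by text taken from real run logs. The only parts grounded in the key are the status names and the "Done." line; the signal handling assumes the return code follows the POSIX convention used by Python's `subprocess` module, where -N means "killed by signal N".

```python
from collections import Counter

# POSIX convention from Python's subprocess module: return code -N means the
# process was terminated by signal N (11 = SIGSEGV, 9 = SIGKILL).
SIGNAL_STATUS = {-11: "Signal11", -9: "Signal9"}

# ASSUMPTION: these substrings are placeholders, not verified Cactus messages.
HYPOTHETICAL_PATTERNS = [
    ("thorn not found", "ExternalThorns"),
    ("Assertion", "Assertion"),
    ("parameter check failed", "FailedParamCheck"),
    ("WARNING level 0", "CCTK_Abort"),
]


def classify(returncode, log_text, patterns=HYPOTHETICAL_PATTERNS):
    """Map one run (return code plus log text) to a status indicator."""
    # A run counts as "Done" if its output contains the line "Done."
    if any(line.strip() == "Done." for line in log_text.splitlines()):
        return "Done"
    if returncode in SIGNAL_STATUS:
        return SIGNAL_STATUS[returncode]
    # First matching pattern wins, so more specific patterns go first.
    for needle, status in patterns:
        if needle in log_text:
            return status
    return "Unknown"


def tally(runs):
    """Count statuses for an iterable of (returncode, log_text) pairs."""
    return Counter(classify(rc, log) for rc, log in runs)
```

Checking for "Done." before anything else means a run that printed warnings but still finished is counted as Done, which matches the TODO in the key: such runs still need a separate check that their results are actually good.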

Proposals:

  • Make sure that the examples all run in a reasonable amount of time, e.g. less than one hour. Most of them are much quicker than this. For the long-running ones, check whether they really need to take so long. Ideally they should run in a few seconds, unless there is a good reason.
  • It would be good to have an automatically generated list on the web page of the examples which can be run with just the ET, so that people can use it as the master list. We could also add a comment to the parfiles indicating whether they can be run using just the ET.
  • Fix the errors
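
One possible way to generate the list of ET-only examples is to compare the thorns each parfile activates with the set of thorns shipped in the ET. The sketch below assumes that thorns are activated through `ActiveThorns = "..."` assignments, that `#` starts a comment, and that thorn names can be compared case-insensitively. The helper names and the `et_thorns` argument (for example, names read from the ET thornlist) are hypothetical. The comment stripping is naive and would mis-handle a `#` inside a quoted string.

```python
import re

# Matches ActiveThorns = "..." ; the quoted list may span several lines.
ACTIVE_THORNS_RE = re.compile(r'ActiveThorns\s*=\s*"([^"]*)"', re.IGNORECASE)


def active_thorns(parfile_text):
    """Return the lower-cased thorn names activated in a parfile."""
    # Naive comment removal: drop everything from '#' to the end of the line.
    no_comments = re.sub(r"#.*", "", parfile_text)
    thorns = set()
    for match in ACTIVE_THORNS_RE.finditer(no_comments):
        thorns.update(name.lower() for name in match.group(1).split())
    return thorns


def missing_thorns(parfile_text, et_thorns):
    """Return activated thorns that are not in et_thorns (empty set: ET-only)."""
    available = {name.lower() for name in et_thorns}
    return active_thorns(parfile_text) - available
```

A parfile for which `missing_thorns` returns an empty set is a candidate for the master list; the same result could be used to write the proposed comment into the parfile.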