# Adding requirements to the Cactus scheduler

## Working Group

• Samuel Cupp
• Steven R. Brandt
• Erik Schnetter

If you want to contribute, please add your name to the list above and email Steven R. Brandt at sbrandt@cct.lsu.edu.

## Problem Outline

One of the most complex aspects of programming with Cactus at present is writing schedule.ccl files for new routines, in particular if mesh refinement is used. The basic problem is that it is very difficult to ensure that routines are executed in the correct order, i.e. that all grid variables which a routine requires are actually calculated beforehand. It is also difficult to ensure that boundary conditions (and synchronisation and symmetry boundaries) are applied when needed, in particular after regridding.

The Cactus schedule consists of several independent "parts": there are schedule bins defined by the flesh, there are schedule groups defined by infrastructure thorns (e.g. MoL or HydroBase), and there is the recursive Berger-Oliger algorithm, implemented in Carpet, that traverses the bins. It is difficult for the end user to see which groups are executed when, on which refinement level, and in which order.

The Cactus schedule offers "before" and "after" clauses to ensure a partial ordering between routines. Unfortunately, this ordering applies only to routines within the same schedule group and the same schedule bin and refinement level. It is not possible to ensure a particular order between routines in different schedule groups or schedule bins, and it is very complex to ensure that a routine is executed e.g. after another routine has been executed on all refinement levels.

There is one example setup that illustrates this problem. When setting up initial conditions for a hydrodynamics evolution, one may e.g. want to first set up a neutron star, then calculate its maximum density, and then set up the atmosphere to a value depending on this maximum density. Making this possible in Cactus required introducing a new schedule bin "postpostinitial" to the flesh, and requires careful arrangement of schedule groups defined by ADMBase and HydroBase. Even now that this is possible, it is probably not possible to ensure at run time that these actions occur in a correct order. (Note: Get a more precise definition of this use case so we can implement and test the solution)

## Suggested Solution

To resolve this issue, and to generally simplify the way in which schedule.ccl files are designed and written, the following was suggested:

• Each scheduled routine declares which grid variables it reads and which grid variables it writes
• Since most routines write only parts of grid variables, the routine would also specify which part it reads/writes, e.g. the interior, outer boundary, symmetry boundary, etc.
• In addition, a new method for implementing boundary routines should be identified so that physical boundaries are populated automatically at the same time synchronization is carried out. Note that this might involve applying a boundary condition repeatedly in some cases. (Note: Someone please elaborate the multi-patch use case here)
• This allows the Cactus scheduler in a first step to validate the schedule and detect cases where a required variable has not been defined, or where a variable is calculated multiple times or synchronized multiple times
• In a second step this will also allow the Cactus scheduler to completely derive the schedule from these declarations. This may even make it possible to execute routines in parallel if they are independent. Even SYNC statements can be automatically derived, and schedule groups would not be necessary any more.
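
The first step (validation) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Routine` class, the `validate` function and the routine and variable names are hypothetical, grid variables are treated as indivisible (no interior/boundary distinction), and the schedule is a flat list in execution order.

```python
from dataclasses import dataclass, field


@dataclass
class Routine:
    """A scheduled routine with its declared data accesses (hypothetical model)."""
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def validate(schedule):
    """Walk the routines in execution order and collect problems.

    Reports reads of variables that nothing has written yet, and
    variables that are written by more than one routine.
    """
    problems = []
    writer_of = {}  # variable name -> routine that first wrote it
    for routine in schedule:
        for var in sorted(routine.reads):
            if var not in writer_of:
                problems.append(f"{routine.name} reads {var} before it is written")
        for var in sorted(routine.writes):
            if var in writer_of:
                problems.append(
                    f"{var} written by both {writer_of[var]} and {routine.name}")
            else:
                writer_of[var] = routine.name
    return problems


# Hypothetical version of the neutron star example with a wrong order:
# the atmosphere is set before the maximum density is known.
schedule = [
    Routine("set_star", writes={"rho"}),
    Routine("set_atmosphere", reads={"rho", "rho_max"}, writes={"rho_atmo"}),
    Routine("find_max_density", reads={"rho"}, writes={"rho_max"}),
]
for message in validate(schedule):
    print(message)  # set_atmosphere reads rho_max before it is written
```

A routine that reads and writes the same variable is flagged by this simple check unless an earlier routine has written the variable, which is exactly the ambiguity discussed next.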

One particular issue arises with routines which modify a variable, e.g. imposing the constraint that $\tilde A^i_i=0$. These routines read and write the same variable, and it is thus not immediately clear why they should be executed or in which order they should be executed.

One possibility to resolve this would be to add a tag to variables, declaring that this routine reads "Aij:original" and writes "Aij:constraints-enforced". Every other routine accessing this variable would then also need to declare whether it reads or writes the original Aij or the Aij with constraints enforced. This is also the drawback of this mechanism: it would create unwanted thorn dependencies.

Another possibility would be to use the existing BEFORE/AFTER mechanism for those cases where it is generally not possible to define proper data dependencies through variables alone. This would make it very easy to add a function which modifies e.g. Aij at any place, without making other thorns depend on the presence of this inserted routine. ICH: As a result, we think the BEFORE/AFTER mechanism will be necessary, and it is not worth developing the tagging/aliasing mechanism at this point.

Another issue arises with loops in the schedule. This is currently mostly used by MoL for the sub-timesteps. There is currently no good idea for handling this. Logically those loops can be seen as a nested schedule tree: it should be possible to do the same as for the complete tree.

## Notes based on a Telecon dated 2/21/2013

The solution we think we've converged on is to put reads and writes declarations in the schedule tree, and then generate function-specific macros, e.g.

void funwave_dispersion(CCTK_ARGUMENTS)
{
DECLARE_CCTK_ARGUMENTS_funwave_dispersion;
...
}


This solution appears to be the easiest way to get the benefits of compile-time checking (e.g. const and intent(in) declarations, etc.) and to keep the code and schedule in sync with each other. The schedule.ccl should also be updated to support multiple "at" or "in" declarations for a single function.
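
As a rough illustration of how a function-specific macro could be generated from the reads and writes declarations, the Python sketch below emits the macro text. The variable names `eta`, `u` and `u_rhs`, the helper `GET_VAR_PTR` and the generator function itself are hypothetical; the macro actually produced by the CST may look quite different.

```python
def declare_arguments(function, reads, writes):
    """Return the text of a per-function declaration macro (sketch).

    Variables that are only read get a pointer-to-const, variables that
    are written get a plain pointer. Variables that are not declared in
    the schedule are not declared in C either, so using one fails to compile.
    GET_VAR_PTR is a hypothetical placeholder for the pointer lookup.
    """
    written = set(writes)
    lines = [f"#define DECLARE_CCTK_ARGUMENTS_{function}"]
    for var in sorted(set(reads) | written):
        qualifier = "" if var in written else "const "
        lines.append(f"  {qualifier}CCTK_REAL *{var} = GET_VAR_PTR({var});")
    # Join with C line continuations so the result is a single macro.
    return " \\\n".join(lines)


print(declare_arguments("funwave_dispersion",
                        reads=["eta", "u"], writes=["u_rhs"]))
```

With this scheme an accidental assignment to `eta` or `u` inside `funwave_dispersion` would be rejected by the C compiler, which is the compile-time check described above.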

We also agreed, however, that we need a dynamic mechanism for special kinds of functions (I/O routines, for example).

With regard to fine-tuning which parts of a grid function get read or written, we want to only worry about interior or boundary/ghost zones at the moment, i.e. something simple. There is the possibility of calling a user function on only a subsection of the grid. This is a subject of ongoing discussion.

We want to plan a video workshop on this topic for mid-March (2 hours) with some warmups. If we can get things working properly, this sort of workshop can occur more regularly.

## Current State

Current effort at LSU: https://github.com/stevenrbrandt/PresyncWave.git

This effort does only a part of the task, automating the synchronization, application of boundary conditions, and ensuring that the read and write directives are present and correct. Currently, LSU is working on getting the system to function for the static_tov.par example.

A branch of the flesh allows adding REQUIRES and PROVIDES clauses to the schedule block for every routine. They are stored in the schedule database of the flesh and are ignored by default. There is a suggestion to rename these clauses to READS and WRITES; this has not yet been done.

The component list of the project can be found at the following URL, which includes the branch of the flesh, examples and other project files:

https://svn.cactuscode.org/projects/NewSchedule/NewSchedule/NewSchedule.th


Carpet has a file Requirements.cc that detects the presence of these clauses and performs rudimentary checks. These checks are probably useless in their current form.

UPDATE: The CCL parser currently parses reads/writes directives of the form:

IMPLEMENTATION_NAME::GF_NAME([Interior|Boundary|Everywhere])


Directives which omit the domain (i.e. the trailing ([Interior|Boundary|Everywhere]) part) are assumed to mean "Everywhere".
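
The directive form and its default can be illustrated with a small Python parser. This is a sketch of the syntax as described here, not the CCL parser itself; the function name `parse_directive` and the decision to reject unknown regions are assumptions.

```python
import re

# IMPLEMENTATION_NAME::GF_NAME followed by an optional (Region) suffix.
DIRECTIVE = re.compile(
    r"^(?P<impl>\w+)::(?P<gf>\w+)"
    r"(?:\((?P<region>Interior|Boundary|Everywhere)\))?$"
)


def parse_directive(text):
    """Split one reads/writes directive into (implementation, gf, region).

    A directive without a region is taken to mean "Everywhere".
    """
    match = DIRECTIVE.match(text.strip())
    if match is None:
        raise ValueError(f"malformed directive: {text!r}")
    region = match.group("region") or "Everywhere"
    return match.group("impl"), match.group("gf"), region


print(parse_directive("example::a(Interior)"))  # ('example', 'a', 'Interior')
print(parse_directive("example::b"))            # ('example', 'b', 'Everywhere')
```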

Steven Brandt has developed a scheduler modification which uses reads and writes clauses to generate a partial order within a group. If there are multiple writes to the same variable or a dependency cycle exists it is a fatal error, e.g.

 schedule foo1 in MoL_CalcRHS
{
LANG: C
WRITES: example::a
}
schedule foo2 in MoL_CalcRHS
{
LANG: C
WRITES: example::b
}


This situation can be disambiguated with a version string, e.g.

 schedule foo1 in MoL_CalcRHS
{
LANG: C
WRITES: example::a$v2
READS: example::b
}
schedule foo2 in MoL_CalcRHS
{
LANG: C
WRITES: example::b
READS: example::a
}

Although the names example::a$v2 and example::a refer to the same grid function, the scheduler does not see an ordering relationship between them. Thus, in the above example, foo2 executes first because foo1 depends on it as a result of example::b.

Ordering of schedule items is done per group, as before. However, groups inherit the reads/writes clauses of their sub-items.
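
The ordering rule within one group can be sketched with Python's standard `graphlib` module (Python 3.9 or later). This is not the actual scheduler modification; the function `order_group` and the dictionary layout are hypothetical, and names are compared as plain strings, which is how a version suffix such as `$v2` removes the ordering relationship.

```python
from graphlib import CycleError, TopologicalSorter


def order_group(items):
    """Return an execution order for the items of one schedule group.

    `items` maps a routine name to a (writes, reads) pair of name lists.
    Multiple writers of one name and dependency cycles are fatal errors.
    """
    writer_of = {}
    for routine, (writes, _reads) in items.items():
        for name in writes:
            if name in writer_of:
                raise ValueError(
                    f"{name} is written by both {writer_of[name]} and {routine}")
            writer_of[name] = routine

    sorter = TopologicalSorter()
    for routine, (_writes, reads) in items.items():
        # A routine depends on whichever routine writes a name it reads.
        deps = {writer_of[name] for name in reads if name in writer_of}
        deps.discard(routine)
        sorter.add(routine, *sorted(deps))
    try:
        return list(sorter.static_order())
    except CycleError as error:
        raise ValueError(f"dependency cycle: {error.args[1]}") from error


# Hypothetical pair of routines that read each other's output: a cycle.
cyclic = {
    "foo1": (["example::a"], ["example::b"]),
    "foo2": (["example::b"], ["example::a"]),
}
# The same pair with a version string on foo1's output.
versioned = {
    "foo1": (["example::a$v2"], ["example::b"]),
    "foo2": (["example::b"], ["example::a"]),
}
print(order_group(versioned))  # ['foo2', 'foo1']
```

Calling `order_group(cyclic)` raises `ValueError`, mirroring the fatal error described above.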

There are clear deficiencies in this scheme. It does not address the problems related to AMR outlined in the statement of the problem. The solution does, however, allow the reads and writes clauses generated by Kranc to have an effect. Is this a step in the right direction?

## Next Steps

To bring this project further, we need to define what the "reads" and "writes" clauses should look like. As mentioned above, it is insufficient to list only grid variables there, since most routines access only parts of grid variables. Ian Hinder volunteered to come up with an initial plan for what kinds of "parts" there should be (e.g. interior, outer boundary, symmetry boundary, ghost zone, etc.). Those parts are driver-dependent, which means we have to come up with a way to tell the flesh about those parts and their connections (what is part of what).

A simple example would look like the following (syntax arbitrary):

INTERIOR ⊆ DOMAIN
BOUNDARY ⊆ DOMAIN
INTERIOR ∩ BOUNDARY = ∅
INTERIOR ∪ BOUNDARY = DOMAIN
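
These relations can be checked mechanically once parts are modelled as sets of grid points. The sketch below does so for a one-dimensional component; the grid size, the boundary width and the helper `is_partition` are illustrative assumptions only.

```python
# One-dimensional component with 8 points and a boundary width of 1
# (both numbers are illustrative assumptions).
npoints, width = 8, 1

DOMAIN = set(range(npoints))
BOUNDARY = {i for i in DOMAIN if i < width or i >= npoints - width}
INTERIOR = DOMAIN - BOUNDARY


def is_partition(whole, *parts):
    """True if the parts are pairwise disjoint subsets whose union is `whole`."""
    union = set()
    for part in parts:
        if not part <= whole or union & part:
            return False
        union |= part
    return union == whole


print(is_partition(DOMAIN, INTERIOR, BOUNDARY))  # True
```

A driver that introduces further parts (ghost zones, symmetry boundaries) would register them in the same way, and the flesh could run the same check on the extended list.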


## Defining parts of grid functions

Application thorns typically write to either the interior of the grid (for example, those points which can be updated using finite differencing) or to the physical outer boundary (for applying user-supplied boundary conditions). Other types of points are those on symmetry boundaries, interprocessor boundaries and mesh refinement boundaries, which an application thorn should never need to write to. Symmetry thorns would write to symmetry boundaries, and the driver would write to interprocessor and mesh refinement boundaries.

Consider a single local grid component. It is a cuboidal set of points. According to Cactus, each of the 6 faces of the component is either an interprocessor boundary (including refinement boundaries) or a symmetry boundary, or a physical boundary. Each face can be only one of these. Each face has a boundary width. Points on edges and corners are associated with multiple faces, and are considered physical boundary points if they are not part of a symmetry or interprocessor boundary. Hence, physical boundary points are only those which absolutely have to be updated, as they are not updated by any other mechanism.

A typical application thorn only needs to be concerned with interior and physical boundary points. We can divide the points in a component into the following non-overlapping (by our definition) categories:

• Interior;
• PhysicalBoundary;
• SymmetryBoundary;
• InterprocessorBoundary;
• RefinementBoundary;
• Ghost "sources".

We also have:

• Everywhere

to include every point on the grid. We might want to say that functions can fill regions "partially" and only when all such functions have been called do we check.
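
The idea of partial fills can be sketched as simple bookkeeping: each write marks some categories of a variable as valid, and a check reports which categories are still missing. The class `ValidRegions`, the spelling `GhostSources` and the variable name `phi` are hypothetical.

```python
# The non-overlapping categories listed above ("GhostSources" is an
# assumed spelling for the ghost "sources" category).
PARTS = frozenset({
    "Interior", "PhysicalBoundary", "SymmetryBoundary",
    "InterprocessorBoundary", "RefinementBoundary", "GhostSources",
})


def expand(part):
    """Translate "Everywhere" into the full set of categories."""
    return PARTS if part == "Everywhere" else frozenset({part})


class ValidRegions:
    """Track which categories of each variable have been written."""

    def __init__(self):
        self._valid = {}  # variable name -> set of valid categories

    def write(self, var, part="Everywhere"):
        self._valid.setdefault(var, set()).update(expand(part))

    def missing(self, var, part="Everywhere"):
        """Categories of `part` that no routine has written so far."""
        return expand(part) - self._valid.get(var, set())


state = ValidRegions()
state.write("phi", "Interior")
state.write("phi", "PhysicalBoundary")
print(sorted(state.missing("phi", "Interior")))  # []
print(sorted(state.missing("phi")))
# ['GhostSources', 'InterprocessorBoundary', 'RefinementBoundary', 'SymmetryBoundary']
```

Deferring the `missing` check until all writers of a variable have run corresponds to the "partial" filling mentioned above.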

Most scheduled application functions need to read their variables from everywhere on the grid, and some write variables to everywhere on the grid. We can use READS and WRITES lines in a schedule block to specify the variables and locations that each scheduled function reads from and writes to. Each line would be a space-separated (we should think of a mechanism to allow new-lines) list of variables or groups (qualified with an implementation name if outside the current implementation). To specify which part of the grid was being read or written, we could have "part" keywords in curly brackets after the grid function or group name. If omitted, the default would be Everywhere. (FrankL: Shouldn't we make the default for reading everywhere, but for writing only the interior? This is what most thorns do. IanH: I agree that most thorns do this, but we have to weigh that against the confusion of having two different defaults.)

## Examples

For example,


SCHEDULE TwoPunctures AT Initial
{
LANG: C
} "Create puncture black hole initial data"

{
LANG: C
WRITES: ML_log_confac ML_metric ML_trace_curv ML_curv ML_shift

{
LANG: C
WRITES: ML_Gamma{Interior}

schedule ML_BSSN_RHS1 in MoL_CalcRHS
{
LANG: C
WRITES: ML_log_confac_rhs{Interior} ML_metric_rhs{Interior} ML_trace_curv_rhs{Interior} ML_curv_rhs{Interior} ML_Gamma_rhs{Interior} ADMBase::dtlapse{Interior} ML_shift_rhs{Interior}
} "ML_BSSN_RHS1"

{
LANG: C
WRITES: ML_log_confac_rhs{PhysicalBoundary} ML_metric_rhs{PhysicalBoundary} ML_trace_curv_rhs{PhysicalBoundary} ML_curv_rhs{PhysicalBoundary} ML_Gamma_rhs{PhysicalBoundary} ADMBase::dtlapse{PhysicalBoundary} ML_shift_rhs{PhysicalBoundary}
} "ML_BSSN_RHS1"

schedule ML_BSSN_enforce in MoL_PostStep
{
LANG: C
WRITES: ML_curv
} "ML_BSSN_enforce"

schedule psis_calc_4th AT Analysis
{
LANG: C
WRITES: Psi4r{Interior} Psi4i{Interior}
} "psis_calc_4th"

schedule Multipole_Calc AT Analysis
{
LANG: C
} "psis_calc_4th"



It might be useful to modify the syntax to say that variables are all read and all written from and to the same parts of the grid, as that will be the usual case.

## Interaction with MoL

MoL is the time integrator that takes grid functions on the previous time level as input and produces new values for the grid functions on the current time level as output. It requires routines that calculate the RHS and/or apply boundary conditions to the evolved grid functions.

Integrating MoL with the mechanism provided above faces several difficulties:

• The set of evolved grid functions is not defined in the schedule.ccl; it is instead defined via function calls at run time. One approach would be to define call-back functions that MoL has to provide, so that the scheduler can access this information. ICH: We will need dynamic registration of READS and WRITES for thorns like Dissipation and Noise anyway. We can have a registration function which thorns call to say what variables they read and write. A thorn can determine this at roughly parameter check time (or MoL registration time). Thought: the symmetry boundary conditions will also need to declare READS and WRITES dynamically.
• It is a priori not clear whether MoL evolves only the interior or also the boundary of grid functions. This can even be different for different grid functions. We can probably safely assume that MoL does not evolve ghost zones or symmetry zones (although this is technically also not defined). ICH: MoL does not know about CoordBase, CartGrid3D, or even the nature of the 1D array of CCTK_REAL variables that it is integrating. I don't want to have to couple MoL to these details. One idea would be that MoL detects which parts of the RHS variables have been written (via poison) and only evolves those points. This means that MoL doesn't need to be passed a mask function to decide what to evolve, and doesn't need to know anything about the nature of the points. MoL would register that it WRITES only those parts of the evolved variable that the RHS function claims to write. And in fact it would do this, by using poison. So the overall MoL READS and WRITES statements are inferred (by MoL) from those of the scheduled RHS functions.
• MoL integrates in time in a WHILE loop implemented in the scheduler. The WHILE condition depends on the particular time integrator that is chosen.

To simplify things, I suggest that we leave MoL unmodified and treat it as a black box. MoL needs to specify (e.g. via callback functions) what variables are integrated in time, and which region of these variables is integrated. The input to MoL is then the past time level of these variables, and the output of MoL is the current time level of these variables. ICH: and this information can be derived from what the RHS functions say they do, and enforced using poison.
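
The poison idea can be illustrated in a few lines of Python. This is a sketch only: NaN stands in for whatever poison value the driver uses, a forward Euler step stands in for MoL's integrators, and all function names are hypothetical.

```python
import math

POISON = math.nan  # assumption: NaN marks "never written"


def poisoned(n):
    """A fresh RHS array in which every point is poisoned."""
    return [POISON] * n


def written_points(rhs):
    """Indices whose poison value has been overwritten."""
    return [i for i, value in enumerate(rhs) if not math.isnan(value)]


def euler_step(state, rhs, dt):
    """Advance only those points whose RHS was actually written."""
    new_state = list(state)
    for i in written_points(rhs):
        new_state[i] = state[i] + dt * rhs[i]
    return new_state


state = [1.0, 2.0, 3.0, 4.0, 5.0]
rhs = poisoned(len(state))
# A hypothetical RHS routine that writes the interior only.
for i in range(1, len(state) - 1):
    rhs[i] = -state[i]

print(written_points(rhs))          # [1, 2, 3]
print(euler_step(state, rhs, 0.5))  # [1.0, 1.0, 1.5, 2.0, 5.0]
```

The integrator needs no mask and no knowledge of what kind of point it is updating. One limitation of using NaN as poison is that a NaN legitimately produced by an RHS routine is indistinguishable from an unwritten point.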

Further, there is one special bin (or group) very similar to the existing MoL_RHS. In this bin, initially the current time level of these variables is defined (MoL needs to ensure this). At the end of this bin, the RHS grid functions need to be defined (MoL requires this). This is equivalent to a WRITES and READS statement.

Since it is now known which regions of which variables MoL accesses (reads/writes), the scheduler can do the remainder and can schedule all other required routines, such as e.g. boundary conditions. For example, if MoL provides ("writes") in the beginning of the RHS bin the interior of the state vector, and there is a routine which reads the whole domain of the state vector and writes the interior of the RHS, then the scheduler can easily deduce that the corresponding boundary condition routine must be called.
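
The deduction described here amounts to a set difference followed by a lookup. The sketch below shows it for a single variable; the function `routines_to_schedule`, the routine names and the reduction of "whole domain" to two parts are simplifying assumptions.

```python
def routines_to_schedule(available, needed, providers):
    """Pick the routines that supply the parts still missing.

    available, needed: sets of part names for one variable.
    providers: maps a routine name to the set of parts it writes.
    Raises LookupError if some part cannot be supplied by any routine.
    """
    missing = set(needed) - set(available)
    chosen = []
    for routine, parts in providers.items():
        if missing & parts:
            chosen.append(routine)
            missing -= parts
    if missing:
        raise LookupError(f"no routine provides {sorted(missing)}")
    return chosen


# MoL provides the interior of the state vector at the start of the RHS bin;
# the RHS routine reads the whole domain (here: interior plus physical boundary).
available = {"Interior"}
needed = {"Interior", "PhysicalBoundary"}
providers = {"apply_boundary_condition": {"PhysicalBoundary"}}

print(routines_to_schedule(available, needed, providers))
# ['apply_boundary_condition']
```

If no registered routine writes the missing part, the scheduler can report the gap instead of silently running the RHS routine on undefined data.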

### Example

MoL provides a call-back function that specifies the READS and WRITES declarations for MoL altogether and for MoL_RHS: (ICH: why a callback? why can MoL not just register READS and WRITES with the flesh or whatever dependency system?)

• MoL WRITES ML_BSSN{Interior, current-timelevel}
• MoL_RHS WRITES ML_BSSN{Interior}

The declarations for MoL_RHS are understood as describing what is present in the beginning and what is required at the end of this bin.

Of course, the programmer could also decide that certain evolved variables are integrated all over the domain, not just the interior.

The application would then provide (at least) the following routines:

We can easily extend this example to include conversion to ADMBase if e.g. another RHS routine requires them.

Synchronisation and symmetry boundaries would also be applied automatically. (There is a slight complication regarding whether "Boundary" includes ghost zones or not: grid points on the edge or in the corner of grid functions can be both an outer boundary and a ghost zone, and one needs to be clear whether these are included or not. However, this is a detail that can be solved later.)

### Simple Test Case

Since current schedules, even for WaveToy, are already very complex, we have a test code with a very simple schedule. This is implemented in the WaveToySimple thorn (https://svn.cactuscode.org/projects/NewSchedule/WaveToySimple/trunk/). To get the test code working, check out Cactus using this thornlist: https://svn.cactuscode.org/projects/NewSchedule/NewSchedule/NewSchedule.th. Simple parameter files are provided in arrangements/NewSchedule/WaveToySimple/par.

The requirements part of the schedule looks as follows:

• WaveToy_InitialData
PROVIDES: scalarevolve scalarevolve_p
• WaveToy_Evolution
REQUIRES: scalarevolve_p scalarevolve_p_p[Interior]
PROVIDES: scalarevolve[Interior]
• WaveToy_Boundaries
PROVIDES: scalarevolve[PhysicalBoundary]
• WaveToy_Analysis
REQUIRES: scalarevolve
PROVIDES: scalaranalysis

There are some issues encountered with this schedule:

• Curly brackets do not work for specifying parts of the grid as they confuse the parser. Square brackets were used instead.
• It's not clear how to refer to past time levels. The _p syntax was used, but that isn't accepted by the schedule checker.
• WaveToy_Analysis requires scalarevolve, but the schedule checker does not recognize it as being provided (because the provides are in a separate schedule bin?).
• READS and WRITES seem more appropriate than REQUIRES and PROVIDES.
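
One possible reading of the clauses used in this test case is sketched below: the square-bracket part is optional and defaults to Everywhere, and each trailing `_p` counts as one step back in time. This interpretation, the function `parse_clause` and the returned tuple layout are assumptions, not what the schedule checker currently does.

```python
import re

# name, then zero or more "_p" suffixes, then an optional [Part].
CLAUSE = re.compile(r"^(?P<name>\w+?)(?P<past>(?:_p)*)(?:\[(?P<part>\w+)\])?$")


def parse_clause(text):
    """Return (variable, time levels back, part) for one clause entry."""
    match = CLAUSE.match(text)
    if match is None:
        raise ValueError(f"malformed clause: {text!r}")
    levels_back = len(match.group("past")) // 2  # each "_p" is two characters
    return match.group("name"), levels_back, match.group("part") or "Everywhere"


print(parse_clause("scalarevolve_p_p[Interior]"))  # ('scalarevolve', 2, 'Interior')
print(parse_clause("scalarevolve_p"))              # ('scalarevolve', 1, 'Everywhere')
print(parse_clause("scalarevolve[PhysicalBoundary]"))
# ('scalarevolve', 0, 'PhysicalBoundary')
```

The sketch also shows why the `_p` convention is fragile: a variable whose real name ends in `_p` would be misread as a past time level, so an explicit time level qualifier may be preferable.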

## Notes/Issues

• How do we deal with thorns whose dependencies are not known at compile-time, such as Dissipation? This thorn reads and writes variables named by a parameter. We could add a special case in the schedule.ccl to say "READS: <some syntax to mean all variables listed in this parameter>"
• Investigate workflow systems such as DAGMan, GWES, Triana, or ProC.
• If it is known at CST-time which variables are only read and not written by a scheduled function, the CST could generate const pointers for those variables to ensure that the data is not written to.