Version control

From Einstein Toolkit Documentation
Revision as of 06:47, 19 August 2013 by Hinder

Version Control for the Einstein Toolkit

The Current System

Most thorns are stored in SVN, some in Git. No other VCS is used for any component of the ET.

Reasons to Change

We won’t argue for Git vs SVN here; search the internet for why one or the other is preferred. In the end it is a matter of taste and of what the tool is used for (no tool is best for everything). When the proposal to switch to Git for some components was raised on the ET telecon, nobody objected, and many people voiced support. There was also no objection on the mailing list. We conclude that, of the active developers who care about the issue, most prefer Git and support a transition.

Considerations

  • For the majority of the toolkit, it has always been possible to check out individual thorns, because for much of the toolkit there is a 1:1 correspondence between thorns and repositories. This decision was made to allow people to check out just what they need.
  • The Cactus framework is designed to be modular, with the unit of modularity being the thorn. Thorns can be assembled and combined in a thornlist from which an executable is built.
  • Thorns are grouped in arrangements for convenience, but Cactus actually does not use the concept of arrangement for any purpose besides organising the directory structure.
  • Some thorns can reasonably be treated as independent of the other thorns in their arrangement, but others are intimately linked. One example is Carpet and CarpetLib. Currently, one is not usable without the other. Note, however, that the decision to split them into separate thorns (a high-level Cactus driver and a low-level mesh-refinement library) implies that CarpetLib could be used as a library independent of Carpet by some other code. This aside, developers often treat sets of thorns in an arrangement as parts of a more cohesive whole, and want to think of specific versions of the arrangement rather than of individual thorns. Combining multiple thorns into a single repository makes it much harder to mix different versions of those thorns. Thorns should be in separate repositories if people may want to check them out individually, or if they are sufficiently logically separated from other thorns that one may want to use a different version of one of them.

Options

Assuming that we wish to transition parts of the toolkit to Git, we have the following options:

  • Require that all components of the toolkit are hosted in Git repositories
  • Allow a mixture of Git and SVN repositories
  • For all thorns hosted by the ET, use Git. For any thorns from external sources not hosted by the ET, provide ET-managed Git-SVN mirrors.

Managing the ET becomes much easier if everything is in one version control system (Git provides mechanisms to treat collections of repositories together, but Subversion has such mechanisms as well), so allowing a mixture of Git and SVN repositories adds some work, both on the management side and on the user side. On the other hand, we cannot and should not enforce Git for all repositories, so we will always have a mix.
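For the third option, an ET-managed Git mirror of an externally hosted SVN thorn could be created and kept up to date with the standard `git svn` subcommands. The sketch below only builds the command line; the helper name `mirror_commands`, the `initialized` flag and the example URL are hypothetical.

```python
def mirror_commands(svn_url: str, mirror_dir: str, initialized: bool) -> list[str]:
    """Command that creates or refreshes a Git mirror of an external SVN repository.

    Hypothetical helper: `initialized` says whether `mirror_dir` already holds
    the git-svn clone.
    """
    if not initialized:
        # First run: import the SVN history into a new Git repository.
        return ["git", "svn", "clone", svn_url, mirror_dir]
    # Later runs: `git svn rebase` fetches new SVN revisions and moves the
    # current branch onto them (a fast-forward for a mirror without local commits).
    return ["git", "-C", mirror_dir, "svn", "rebase"]
```

A caller could set `initialized` by checking whether `mirror_dir` already contains a `.git` directory. For SVN repositories with a trunk/branches/tags layout, `git svn clone` would probably also need `--stdlayout`, and pushing the mirror to the ET Git host is not shown.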

Repository rearrangements

  • Require each thorn to be an independent repository
  • Require each arrangement to be an independent repository
  • Decide whether a given arrangement should be split into subrepositories on a case-by-case basis

The third option is the most pragmatic.

It is strongly desirable to be able to manage the entire toolkit (or, more generally, an entire Cactus tree) as a single entity. One can then talk about a specific version of the whole code, globally revert to a previous version, see the history of all thorns together, update the tree, etc. Here one of the deficiencies of Git shows: it is not possible to check out only a subtree of a Git repository, while this is not a problem at all for Subversion. However, with either system, and with thorns spread across the globe and controlled by different groups, this is difficult or impossible to achieve.
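One way to name "a specific version of the whole code" when the tree is spread over many repositories is to record the revision of each component in a manifest and derive one identifier from it, much as a Git superproject records the commit of each submodule. This is a sketch under that assumption; the manifest format and `tree_version` are hypothetical, not an existing ET tool.

```python
import hashlib


def tree_version(manifest: dict[str, str]) -> str:
    """Derive one identifier for a whole tree from per-component revisions.

    `manifest` maps a component name (an arrangement or thorn path) to the
    revision checked out for it: an SVN revision number or a Git commit hash.
    """
    # Sort so that the identifier does not depend on dictionary order.
    lines = [f"{name} {rev}\n" for name, rev in sorted(manifest.items())]
    return hashlib.sha1("".join(lines).encode("utf-8")).hexdigest()
```

Any change to any component's revision changes the identifier; keeping the manifest itself (not only its hash) is what makes it possible to revert the whole tree later.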

GetComponents and Heterogeneous Repositories

There appears to be no established method for treating multiple subcomponents as a single repository if those subcomponents use different VCSs. We have developed GetComponents for the purpose of checking out such a heterogeneous collection, and support has been added to it for performing some basic operations on the whole collection, but this is a “home-grown” solution that we would need to develop and support ourselves.
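At its core, checking out a heterogeneous collection means dispatching each component to the right VCS client. The sketch below is not how GetComponents is implemented; the `Component` record and `checkout_command` are hypothetical and only show the dispatch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One checkout in a heterogeneous tree (hypothetical record type)."""
    name: str  # checkout directory, e.g. "arrangements/CactusBase"
    vcs: str   # "git" or "svn"
    url: str


def checkout_command(c: Component) -> list[str]:
    """Return the command line that creates the working copy for `c`."""
    if c.vcs == "git":
        return ["git", "clone", c.url, c.name]
    if c.vcs == "svn":
        return ["svn", "checkout", c.url, c.name]
    raise ValueError(f"unsupported VCS {c.vcs!r} for {c.name}")
```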

Possible arrangement/thorn repository split

The following split is based on logical relationships between the thorns, and ignores the issue of the potential convenience of having a number of thorns in a single repository in the absence of infrastructure for working with sets of repositories.


Arrangement repositories

  • CactusBase (nearly all of these thorns are used nearly all of the time)
  • CactusPUGH:
    • These thorns are probably quite tightly-coupled
  • CactusTest:
    • Mostly provides test cases for the flesh. Use a single repository. [IMHO, these could also be merged into a single thorn, CactusBase/CactusTest.]
  • PITTNullCode:
    • A logical grouping which makes up a larger code; these thorns are probably always used together, and might well be updated together. The spherical harmonic thorns might be usable without the rest; move out?
  • EinsteinExact:
    • Logically independent thorns, but all automatically generated from sources in a single repository. Probably makes sense to keep this as a single repository.
  • KrancNumericalTools:
    • Contains a single thorn (which may go away in future, to be replaced by code copied into the generated thorns). Part of a Git repository already.
  • McLachlan:
    • A set of thorns, logically understood as a single code, or code family, many of which share generation-scripts. Treat as a single repository.
  • Carpet:
    • Thorns are components of a larger code. Arguably some of these thorns (LoopControl, CycleClock) should not be in the Carpet arrangement, but somewhere else. Probably makes sense to keep the current repository.
  • CarpetExtra: (see above)


One repository per thorn

  • CactusNumerical:
    • The arrangement does not form a “larger code” for which it would make sense to treat the thorns as a unit. The thorns do not really depend on each other, and there is no intrinsic value to treating CactusNumerical as a single repository.
  • CactusPUGHIO:
    • We only actually use 2 of them in the ET thornlist and they seem to be independent of each other.
  • CactusUtils:
    • All essentially independent from each other
  • CactusWave:
    • Independent thorns; usually you only want one
  • EinsteinAnalysis:
    • Independent thorns. Note that these are not even all stored at the same host currently.
  • EinsteinBase:
    • The situation here is more fuzzy; you often want a lot of these, and there are dependencies, but you may well want a vacuum-only tree with no EOS, Tmunu or Hydro thorns.
  • EinsteinEOS:
    • These seem to be alternative options, not linked to each other
  • EinsteinEvolve:
    • These are not part of a larger code, and you might want any one of them without having the rest.
  • EinsteinInitialData:
    • Individual independent thorns.
  • EinsteinUtils
  • ExternalLibraries:
    • Very much individual thorns (but the discussion of what to do with these libraries is beyond the scope of this document)
  • TAT, AEIThorns, LSUThorns:
    • Groupings of thorns hosted by institutions; no real logical relationship between them, and we probably don’t include all the thorns in these repository groupings in the ET anyway. The thorns in each arrangement are not related to each other.
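With a split like the one proposed in the two lists above, a tool has to map each `Arrangement/Thorn` entry of a thornlist to the repository that holds it: the arrangement repository for the first group, a per-thorn repository for the second. The policy sets and repository names below are illustrative assumptions covering only a few arrangements, and only plain thorn lines (with `#` comments) are handled.

```python
# Hypothetical policy tables, following a subset of the split proposed above.
ARRANGEMENT_REPOS = {"CactusBase", "CactusPUGH", "CactusTest", "Carpet", "McLachlan"}
PER_THORN_REPOS = {"CactusNumerical", "CactusUtils", "EinsteinEvolve"}


def repositories_for(thornlist: list[str]) -> list[str]:
    """Map "Arrangement/Thorn" entries to the repositories that must be checked out."""
    repos: list[str] = []
    for entry in thornlist:
        entry = entry.split("#", 1)[0].strip()  # drop comments and blank lines
        if not entry:
            continue
        arrangement, thorn = entry.split("/", 1)
        if arrangement in ARRANGEMENT_REPOS:
            repo = arrangement                  # one repository per arrangement
        elif arrangement in PER_THORN_REPOS:
            repo = f"{arrangement}/{thorn}"     # one repository per thorn
        else:
            raise KeyError(f"no repository policy for arrangement {arrangement}")
        if repo not in repos:
            repos.append(repo)                  # keep first-seen order, no duplicates
    return repos
```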

Miscellaneous repositories

  • Cactus flesh
  • SimFactory

Undecided:

  • CactusConnect
  • CactusElliptic
  • CactusIO

The Cactus flesh, CactusBase, CactusUtils, CactusPUGH, etc. could conceivably all be in a single Cactus repository, with commit rights controlled by subdirectory, although CactusPUGH doesn't quite fit in and might be kept separate. All Cactus thorns have the same license, so licensing would not be an obstacle.

Arguments for having lots of thorns in a single repository

If your tools can only “see” a single repository at a time, it is hard to see all the changes that exist in your current working tree, so you don’t know what might need to be committed or discarded. Having many thorns per repository means fewer repositories to check manually. Counter-argument: checking more than one repository is something that should be handled by a tool anyway, whether this is based on a native SVN/Git solution (svn:externals, or Git submodules or subtrees) or on functionality added to GetComponents.

Checkout time is reduced because fewer connections to the server are required. Counter-argument: with parallel checkouts, this might not be an important consideration.
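The parallel-checkout counter-argument can be made concrete with a thread pool that runs independent checkout commands at the same time; threads suffice because each worker mostly waits on a subprocess and the network. `checkout_all`, `run_command` and the default of eight jobs are hypothetical choices; `run` is injectable so the logic can be tested without network access.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

Command = Sequence[str]


def run_command(cmd: Command) -> int:
    """Run one checkout command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def checkout_all(commands: list[Command],
                 run: Callable[[Command], int] = run_command,
                 jobs: int = 8) -> dict[tuple[str, ...], int]:
    """Run independent checkout commands concurrently; map each command to its exit status."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        statuses = list(pool.map(run, commands))  # results come back in input order
    return {tuple(cmd): status for cmd, status in zip(commands, statuses)}
```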

Another argument for having multiple thorns per repository is that one can then ensure consistency between different thorns, e.g. when reacting to interface changes. Changes to scheduling or new aliased functions, for example, fall into this category.

The main objection raised on the ET call to having one thorn per repository in general was that this leads to a large number of repositories, which overwhelms the user interface. SourceTree shows submodules (if we go that route) in a hierarchical structure and separately lists those with uncommitted changes, but SourceTree is Mac (and Windows) only, so it is not a solution for everyone. “git status” lists the submodules with uncommitted changes, as “svn status” does as well. If people don't want to use submodules or subtrees, we would need to develop tools to manage multiple repositories together. Such tools are needed anyway unless there is a single repository for the whole toolkit, which isn't going to happen.
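A minimal multi-repository status tool can rely on each client's machine-readable output: `git status --porcelain` prints nothing for a clean working tree, and `svn status` prints one line per changed item. The sketch below, with hypothetical function names, reports only the checkouts with uncommitted changes and, as an assumption, ignores untracked and unversioned files (`--untracked-files=no` and `-q`).

```python
import subprocess
from typing import Callable


def status_command(path: str, vcs: str) -> list[str]:
    """Machine-readable status command that ignores untracked/unversioned files."""
    if vcs == "git":
        return ["git", "-C", path, "status", "--porcelain", "--untracked-files=no"]
    if vcs == "svn":
        return ["svn", "status", "-q", path]
    raise ValueError(f"unsupported VCS {vcs!r}")


def capture(cmd: list[str]) -> str:
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def dirty_checkouts(checkouts: dict[str, str],
                    run: Callable[[list[str]], str] = capture) -> list[str]:
    """Return the checkouts (path -> vcs) whose status output is non-empty."""
    return [path for path, vcs in sorted(checkouts.items())
            if run(status_command(path, vcs)).strip()]
```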

Arguments against having lots of thorns in a single repository

Once a certain size is reached, and assuming that the code is actually being developed, the number of commits to the repository increases and it becomes very difficult for a user or developer to check out a "stable" copy of the repository. At any given revision, one thorn will be undergoing rapid changes, and with a single repository it is not possible to hold on to an older version of that thorn (which, e.g., might offer all the functionality one needs) while updating another thorn to get a crucial bugfix or new feature. (Note: this is not true with Subversion. The problem really stems from Git's limitations: it cannot have different subdirectories of a checkout at different revisions, nor can it check out only part of a repository.) This happened to some degree in Zelmani when a new thorn was added for postprocessing production results: the new revision also updated the evolution thorns to their development state, which is often unstable. Larger repositories have this problem to an even larger degree (i.e. the code is constantly changing "under one's feet"), which makes it hard to isolate the causes of bugs in (presumably fairly isolated) subcomponents.
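The following sketch illustrates the distinction drawn above. With one repository per thorn (or with SVN, which can update a single subdirectory of a working copy to another revision), keeping an old version of one thorn while taking a bugfix in another amounts to changing one entry in a list of pinned revisions. The pin manifest, helper names and paths are hypothetical.

```python
def update_pins(pins: dict[str, tuple[str, str]],
                updates: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Move only the listed checkouts to new revisions; all others stay pinned.

    `pins` maps a checkout path to (vcs, revision); the input is not modified.
    """
    unknown = set(updates) - set(pins)
    if unknown:
        raise KeyError(f"not in manifest: {sorted(unknown)}")
    return {path: (vcs, updates.get(path, rev)) for path, (vcs, rev) in pins.items()}


def pin_commands(pins: dict[str, tuple[str, str]]) -> list[list[str]]:
    """Commands that bring each checkout to its pinned revision."""
    cmds = []
    for path, (vcs, rev) in sorted(pins.items()):
        if vcs == "git":
            # Git pins whole repositories: `path` must be a repository root.
            cmds.append(["git", "-C", path, "checkout", rev])
        elif vcs == "svn":
            # SVN can update any subdirectory on its own (mixed-revision working copy).
            cmds.append(["svn", "update", "-r", rev, path])
        else:
            raise ValueError(f"unsupported VCS {vcs!r}")
    return cmds
```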

Philosophically, the ET seems to be a loose collection of independently developed modules. Combining unrelated thorns in shared repositories negates this to some degree. Note that, in this light, having AEIThorns be a single repository would be OK, since they all come from one institution (and, e.g., Zelmani works this way mostly without problems).

Finally, there seem to be ways to combine individual sources into a super-repository with Git (submodules and subcomponents, which can be used even with SVN), but it is much harder to split up a repository, since one essentially has to keep several checkouts around for the different revisions one wants to access (e.g. I keep a git-svn copy of EinsteinEvolve/GRHydro for the sole purpose of preparing the GRHydro updates from Zelmani before submitting them to the SVN server).

Organise repositories by project

Another option is to have a single repository for the Einstein thorns, one for Cactus, and one for each of the additional independent projects. The list of repositories would then be:

  • Cactus
  • Einstein
  • Kranc
  • EinsteinExact
  • McLachlan
  • Carpet
  • Something about external libraries, to be decided
  • SimFactory

One concern which has been raised with this approach is that it makes it harder to mix and match different versions of thorns, e.g. within the Einstein repository. The argument against this is that you should not need to do this, as the current trunk should be fairly stable (aggressive development can happen on branches, if people are making drastic changes). A further advantage is that if people are developing one particular thorn, they are more likely to also be working with the latest versions of all the other thorns.