Difference between revisions of "GitSuperRepo"

From Einstein Toolkit Documentation
(Documents)
 
(15 intermediate revisions by 2 users not shown)
Line 1: Line 1:
(draft)
+
[[Category:GitSuperRepo]]
  
Background:
+
The current method for checking out and updating components of the Einstein Toolkit has several shortcomings.  We propose an additional layer of management around a Cactus tree achieved by storing it in a Git repository.  This provides a uniform version control interface to the code with all the version control information from the source repositories available.  It also allows the entire Cactus tree to be treated as a single repository, and allows "versions" of the code to be identified by a single Git revision.
  
* The Einstein Toolkit is built from many different components living in their own repositories
+
This project is still in the early stages.  We (Barry Wardell and Ian Hinder) are developing a set of techniques and tools which enable what we consider to be better workflows for using Cactus.  As we use these ourselves, we will learn about what works and what doesn't, and hopefully will be able to provide recommended workflows for end-users.
  
* End user must check out each component and compile them together into an executable which is then run to produce output
+
==Documents==
  
* End user is often also a developer of some of the components (public or private)
+
* [[GitSuperRepoRationale|Rationale]]
 +
* [[GitSuperRepoUsersGuide|User's guide]]
 +
* [[GitSuperRepoAdminGuide|Server administrator's guide]]
 +
* [[GitSuperRepoUnsorted|Unsorted information]]
  
* GetComponents (URL) is a tool to simplify this process by collecting component repository information into a single "CRL" file (CRL = Component Retrieval Language).
+
==Progress==
  
* GetComponents allows you to check out the latest versions from a CRL file, or to update an existing set of checkouts to the latest version
+
* We have set up a repository which implements most of what is described in the [[GitSuperRepoRationale|Rationale]].
  
Problems:
+
* The repository is updated from the source repositories every hour
  
* Upstream projects use different version control systems (SVN, Git, Mercurial, ...) leading to a nonuniform experience for the end user/developer.  Multiple tools must be learned for merging/branching/committing etc.
+
==Planned work==
  
* It is not easy to see at a glance exactly what version of the code is in use.  One could iterate over all the different repositories, of different types, and print the revision information, and any local differences.  This could be added to GetComponents, but this has not been done yet and we argue that this is not the best solution to the problem.
+
* Work out scripts or Git aliases that make working with the submodules more straightforward.
  
* Knowing what version of the code has been used to produce a given scientific result is essential for the scientific process, where results must be repeatable.  The current best solution to this problem is the Formaline thorn, which stores a complete copy of the source code of all thorns in the simulation output directory.  We argue that this is only a partial solution to the problem.  While all the source code is present, the version control metadata has been entirely stripped.  When comparing different simulations, at best one obtains a large diff of all the source changes between them, without information about why they were made or who made them.  There is also no convenient method for using the Formaline output for a new simulation.
+
* Set up a public "einsteintoolkit" repository
  
* Updating a Cactus source tree is currently an irreversible and dangerous process.  There is no guarantee that the "current" trunk branch of all the components will function correctly, and there is no way, short of a manual backup beforehand, of reverting to the previous state if they don't.
+
* Integrate with build-and-test system so that there is always a branch which passes build-and-test
  
* It is desirable for different members of a scientific research group to be using the same version of the code for production simulations, or at least for this to be possible/easy.  In the current setup, each user is responsible for managing their own Cactus tree and will likely have completely different versions of the code, depending on when they last updated.  It is not even guaranteed that the code can be described by a single "checkout date", since different components could be checked out at different times.  Users may also have applied patches or altered behaviour, fixing bugs or adding features, to any of the components.
+
* Implement a technique for storing version control information (revision plus differences) into simulation output
 +
 
 +
* Update the group repository based on commit emails or an RSS feed rather than hourly

Latest revision as of 15:51, 27 June 2011


The current method for checking out and updating components of the Einstein Toolkit has several shortcomings. We propose an additional layer of management around a Cactus tree achieved by storing it in a Git repository. This provides a uniform version control interface to the code with all the version control information from the source repositories available. It also allows the entire Cactus tree to be treated as a single repository, and allows "versions" of the code to be identified by a single Git revision.

This project is still in the early stages. We (Barry Wardell and Ian Hinder) are developing a set of techniques and tools which enable what we consider to be better workflows for using Cactus. As we use these ourselves, we will learn about what works and what doesn't, and hopefully will be able to provide recommended workflows for end-users.

Documents

  • Rationale
  • User's guide
  • Server administrator's guide
  • Unsorted information

Progress

  • We have set up a repository which implements most of what is described in the Rationale.
  • The repository is updated from the source repositories every hour

Planned work

  • Work out scripts or Git aliases that make working with the submodules more straightforward.
  • Set up a public "einsteintoolkit" repository
  • Integrate with build-and-test system so that there is always a branch which passes build-and-test
  • Implement a technique for storing version control information (revision plus differences) into simulation output
  • Update the group repository based on commit emails or an RSS feed rather than hourly
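
The planned item on storing version control information (revision plus differences) in simulation output could look roughly like the following. This is only a minimal sketch, not the technique the project adopted: the function names, the output file names, and the idea of writing three separate text files are all assumptions made for illustration. It relies only on the standard Git commands `git rev-parse HEAD`, `git submodule status --recursive` and `git diff HEAD`, run against the super-repository.

```python
import subprocess
from pathlib import Path


def git_output(repo_dir, *args):
    """Run a git command inside repo_dir and return its standard output as text."""
    result = subprocess.run(
        ["git", "-C", str(repo_dir), *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def record_provenance(repo_dir, output_dir, run=git_output):
    """Write revision, submodule state and local differences into output_dir.

    The file names are hypothetical. The 'run' parameter exists only so the
    git calls can be replaced, for example in a test without a repository.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    records = {
        # Single Git revision identifying the whole Cactus tree.
        "revision.txt": run(repo_dir, "rev-parse", "HEAD"),
        # Commit recorded for each submodule (component repository).
        "submodules.txt": run(repo_dir, "submodule", "status", "--recursive"),
        # Uncommitted changes in the super-repository relative to HEAD.
        # Changes inside a submodule show up here only as a changed or
        # "-dirty" subproject commit, not as a full source diff.
        "local-changes.diff": run(repo_dir, "diff", "HEAD"),
    }
    for name, text in records.items():
        (out / name).write_text(text)
    return sorted(records)
```

A run script might call `record_provenance("Cactus", "simulations/run01/provenance")` before starting the executable; both paths are placeholders. Unlike a full copy of the source, the recorded revision can be checked out again from the repository with its history intact.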