
GIT: A small guide to GIT and CodeReview

This information is for Scilab developers and code contributors. See GIT for a table of contents.

If you are a user of Scilab, you probably don't need to look at this.

What are GIT and CodeReview

Scilab is an open-source project. Because there are contributors all over the world, the Scilab code base needs to be maintained in a distributed way so that users modifying the sources can publish their changes. The purpose of GIT is to let users keep track of the changes made to the Scilab code base.

Unlike older version control systems such as CVS or SVN, GIT works on a local copy of the source code. It has some powerful features that make it easy to work locally while keeping track of changes made on the reference repository. Before delving further into the subject of GIT you might want to read a little about it here.

To maintain the quality of the Scilab code base, users cannot push their changes to the reference repository without a proper review. This is the goal of CodeReview. Users can see incoming changes, comment on them, rate them and eventually validate or reject them. Scilab's CodeReview is built on Gerrit, which greatly simplifies its integration with GIT.

GIT Basics and Vocabulary

Local and remote repositories

GIT can be a little tricky to master because it has a very specific vocabulary. The first notion to understand is that GIT works locally. If you work on a source tree under GIT, the files you modify and the changes you commit are all local: you are the only one to know about those changes and nobody is bothered by your modifications. There are therefore no locks on files or directories while you work on them. This helps understand the following terms:

local repository
Your repository. Changes you make here are known only to you, until you publish them on the remote repository.
remote repository
The reference repository. This is the centralized source code where published changes will appear.

The first step when working on the source code of Scilab is to get a local copy of the remote repository. This is done by cloning it:

git clone
Command issued to copy a remote repository onto your local machine.

To clone the Scilab remote repository you can follow the instructions given in GIT step by Step.

GIT represents the modifications to the source code as a continuous series of revisions. Picture a magnetic tape that records snapshots of your code base and a read/write head that can move between snapshots. The definitions given below refer to this metaphorical view of GIT.

commit or revision

Bundle of changes made to the repository in one snapshot. The commit is identified by a commit message (human readable text that says what changed) and a unique identifier called a commit-id, in the form of a SHA-1 hash. GIT uses the term revision for any way of identifying a commit: usually the commit-id, but it can also be a tag or a pointer such as HEAD.

HEAD

The position of the snapshot currently being looked at. HEAD is a pointer to a particular commit. HEAD~ is a pointer to the previous commit, HEAD~5 is a pointer to the fifth commit before HEAD, and HEAD@{10 minutes ago} is the commit your HEAD pointed to 10 minutes ago.

tag
Human readable identifier for a commit, usually in the form of a version number.
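
To make the vocabulary above concrete, here is a minimal sketch that you can run in a throwaway directory. The file name notes.txt, the tag name v0.1 and the contributor identity are made up for the example, and nothing here touches the Scilab repository.

```shell
set -eu
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called master
git config user.name "Example Contributor"             # hypothetical identity
git config user.email "contributor@example.invalid"

# Record three snapshots on the tape.
for n in 1 2 3; do
    echo "snapshot $n" > notes.txt
    git add notes.txt
    git commit -q -m "Snapshot $n"
done

# Put a human readable tag on the first snapshot.
git tag v0.1 HEAD~2

head_msg="$(git log -1 --format=%s HEAD)"    # commit currently under the HEAD
prev_msg="$(git log -1 --format=%s HEAD~)"   # the commit just before it
tag_msg="$(git log -1 --format=%s v0.1)"     # the commit the tag points to

# A tag and a relative revision can identify the same commit-id.
same_commit="no"
if [ "$(git rev-parse v0.1)" = "$(git rev-parse HEAD~2)" ]; then
    same_commit="yes"
fi

echo "HEAD  -> $head_msg"
echo "HEAD~ -> $prev_msg"
echo "v0.1  -> $tag_msg (same commit as HEAD~2: $same_commit)"
```

git rev-parse is used here only to turn a revision into its commit-id so that two revisions can be compared.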

Branches

Branches are derivations from the main source code. Consider a branch as a separate tape that stems from one of the snapshots. Branches are good for working on a specific topic without interfering with the main series of commits. GIT is very good at managing branches. We'll see that branches are the main feature used in the CodeReview.

Local branches are branches that exist on your local repository. Remote branches are branches available on the reference repository.

Creating a branch is easy in GIT: you simply move to the commit you want your branch to stem from and create the branch there. This is done as follows:

    git checkout <rev> # place the HEAD at the commit identified by <rev>
    git branch <branch_name> # create the branch stemming from the commit identified by <rev>

Run without arguments, the git branch command gives you the list of local branches you have in your repository.

I mentioned that GIT is very good at managing branches. This means that it moves between branches effortlessly. A user does not need to create several directories, each holding a branch. GIT keeps a record of what changed between branches, so moving from one branch to another can be done in the same directory. The following commands let you move from branch to branch. Note that master is the name of the reference branch, i.e. the main series of commits.

    # After this command your code base will change
    git checkout <branch_name> # move to the latest commit of <branch_name>

    # After this command you will move back to the main branch
    git checkout master

This means that the command git checkout is used both to move the HEAD to a specific commit in the current branch and to move between branches. In both scenarios the HEAD reads the content of the commit it points to and sets your code base to this snapshot.
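
As a small illustration of moving between branches inside a single directory, the sketch below uses a throwaway repository. The branch name my_topic, the file name file.txt and the identity are assumptions made for the example.

```shell
set -eu
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called master
git config user.name "Example Contributor"             # hypothetical identity
git config user.email "contributor@example.invalid"

echo "line from master" > file.txt
git add file.txt
git commit -q -m "Base commit on master"

# Create a branch at the current HEAD, then move onto it.
git branch my_topic
git checkout -q my_topic
echo "line from my_topic" >> file.txt
git add file.txt
git commit -q -m "Work on my_topic"
topic_lines="$(grep -c "" file.txt)"       # number of lines seen on my_topic

# Move back to master: same directory, but the file is back to its master state.
git checkout -q master
master_lines="$(grep -c "" file.txt)"      # number of lines seen on master
current_branch="$(git rev-parse --abbrev-ref HEAD)"

git --no-pager branch                      # lists master and my_topic
echo "on $current_branch: $master_lines line(s); on my_topic: $topic_lines line(s)"
```

The same file.txt shows two lines on my_topic and one line on master, without any second copy of the source tree.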

Creating and sharing commits

When contributing to Scilab you will have to either create new files or modify existing files in the source tree. GIT keeps track of the files that are present on the code base. This means files will have different status depending on what was done during your development session. The following command is maybe the most important to know what is going on with GIT:

    git status # tells you about your position and the status of files

Modifications you make are not recorded in a commit until you specifically tell GIT to record them. To do so, you need to stage the changes before writing the commit. To continue with our metaphor of the magnetic tape, staging is similar to buffering the changes you want your HEAD to write before actually sending the instruction to write the snapshot.

staging area
Set of modifications ready to be committed

If you modify a file, remove a file, or create a file, GIT will tell you about the modification, but it will not be in the staging area. The following commands will stage the modifications:

    git add <filename> # places a modified or new file in the staging area
    git rm <filename> # removes the file and stages that removal
    git mv <filename1> <filename2> # renames the file <filename1> to <filename2> and stages the modification
    git status # shows you the staged and unstaged files, whether they are modified, untracked, deleted or renamed

If you want to see the actual modifications made to files, the command git diff is there to help you:

    git diff # shows the changes in all files that are not staged yet
    git diff <filename> # shows the unstaged changes for file <filename>
    git diff --staged # shows the changes that are in the staging area
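
The difference between staged and unstaged modifications can be checked with a small sketch like the one below, run in a throwaway repository. The three file names and the identity are hypothetical.

```shell
set -eu
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Contributor"             # hypothetical identity
git config user.email "contributor@example.invalid"

printf 'one\n' > staged.txt
printf 'one\n' > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "Base commit"

# Modify both tracked files and create a brand new one.
printf 'two\n' >> staged.txt
printf 'two\n' >> unstaged.txt
printf 'new\n' > untracked.txt

# Stage only one of the modifications.
git add staged.txt

in_staging_area="$(git diff --staged --name-only)"   # what the next commit would record
not_staged="$(git diff --name-only)"                 # modified but not staged
short_status="$(git status --short)"                 # compact view of all three files

echo "staged:     $in_staging_area"
echo "not staged: $not_staged"
echo "$short_status"
```

Only staged.txt would be written by the next git commit. The other two files stay as they are until you git add them.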

Once you are satisfied with the modifications of the code in the staging area you can create the commit with:

    git commit -m "Your commit message" # creates a new commit with a commit message

After the previous command your HEAD is at your new commit, unstaged modifications are still unstaged, and previously staged modifications have been written into the history. If you make new modifications, stage them and commit them, you will create an entirely new commit. If you want to modify the freshly created commit instead, you can do it with:

    git commit --amend # folds your staged changes into the current commit and lets you edit its message

Do not hesitate to check the history of commits using:

    git log -5 # shows the last 5 commits in the history of your current branch
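
Here is a minimal sketch of the commit, amend and log cycle in a throwaway repository. The file name feature.txt, the commit messages and the identity are made up for the example.

```shell
set -eu
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Contributor"             # hypothetical identity
git config user.email "contributor@example.invalid"

echo "first line" > feature.txt
git add feature.txt
git commit -q -m "Add feature file"
commits_before="$(git rev-list --count HEAD)"
id_before="$(git rev-parse HEAD)"

# A line was forgotten: stage it and fold it into the same commit.
echo "forgotten line" >> feature.txt
git add feature.txt
git commit -q --amend -m "Add feature file, with the forgotten line"
commits_after="$(git rev-list --count HEAD)"
id_after="$(git rev-parse HEAD)"
last_message="$(git log -1 --format=%s)"

# Still a single commit in the history, but with a new commit-id.
git --no-pager log -5 --oneline
```

Amending does not add a commit to the history. It replaces the last one, which is why the commit-id changes.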

Do not forget that all modifications you do on the code and history are local until you synchronise with the remote repository. This means you can delete commits you have created. For instance if you have created 3 commits locally and you want to rewrite them as only one commit you can do it with the following:

    git reset HEAD~3 # moves the HEAD back 3 commits; the changes from those commits are kept but unstaged
    git add <filename> # stage the changes you want in the new commit
    git commit -m "my new commit message" # commits everything that is staged

    # Alternatively you can use
    git reset --soft HEAD~3 # the changes from those commits are kept and staged
    git reset --hard HEAD~3 # CAREFUL: all changes are discarded, you start from a clean state

Used with a file name, git reset is the exact opposite of git add: you can use it to unstage files that are in the staging area.

    git reset <filename> # sets file <filename> as unstaged
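
The following sketch rewrites three local commits as a single one and then unstages a file, all in a throwaway repository. It uses the --soft variant so that the changes stay staged. File names, messages and identity are assumptions made for the example.

```shell
set -eu
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Contributor"             # hypothetical identity
git config user.email "contributor@example.invalid"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"

# Three small work-in-progress commits.
for n in 1 2 3; do
    echo "step $n" >> work.txt
    git add work.txt
    git commit -q -m "WIP $n"
done
commits_before="$(git rev-list --count HEAD)"

# Move the HEAD back 3 commits, keep their changes staged, commit them as one.
git reset -q --soft HEAD~3
git commit -q -m "Add work.txt in a single commit"
commits_after="$(git rev-list --count HEAD)"
work_lines="$(grep -c "" work.txt)"        # the content of the 3 commits is still there

# Stage a new change, then change your mind and unstage it.
echo "step 4" >> work.txt
git add work.txt
git reset -q -- work.txt
staged_after_reset="$(git diff --staged --name-only)"
unstaged_after_reset="$(git diff --name-only)"

echo "commits: $commits_before -> $commits_after, work.txt lines: $work_lines"
```

Unstaging does not lose the edit: work.txt still contains the new line, it is simply no longer part of the next commit.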

Working with the remote repository

Fetching

Fetching lets GIT know what has happened in the remote repository without applying the changes to your code. git fetch downloads the new commits and updates your local record of the remote branches. Fetching does not modify your working files or your local branches. It is useful, for instance, to find out whether new commits were merged into the remote while you were working on your own fix.
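
To see that fetching leaves your code alone, here is a sketch in which a second local repository stands in for the remote. The directory names reference and mine, the file name and the identity are made up for the example. A real remote would be reached over the network instead.

```shell
set -eu
sandbox="$(mktemp -d)"
cd "$sandbox"

# "reference" plays the role of the remote repository.
git init -q reference
cd reference
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Contributor"             # hypothetical identity
git config user.email "contributor@example.invalid"
echo "version 1" > file.txt
git add file.txt
git commit -q -m "First published commit"
cd ..

# "mine" is my local repository, cloned from the reference.
git clone -q reference mine

# Meanwhile someone publishes a new commit on the reference.
cd reference
echo "version 2" > file.txt
git add file.txt
git commit -q -m "Second published commit"
cd ..

# Fetch: learn about the new commit without touching my files.
cd mine
git fetch -q
behind="$(git rev-list --count HEAD..origin/master)"   # commits I do not have yet
my_file="$(cat file.txt)"                              # still the old content

echo "master is $behind commit(s) behind origin/master, file.txt says: $my_file"
```

origin/master is the name under which my repository records the state of the remote master branch.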

Pulling

Contributors will publish changes to the code base, thus modifying the history of commits on the remote repository. You will want to obtain those changes. To do so use git pull. If your repository has commits that are not present in the remote, GIT cannot simply move the HEAD to the latest commit it has retrieved. Instead it warns you that your repository and the remote have diverged and tells you by how many commits they differ.

If you have local commits, use the following command:

    git pull --rebase # retrieves all changes from the remote AND moves your commits on top of the last commit of the remote

This command is in fact a fetch and a rebase done together.

rebase
Rebasing moves a series of commits from one branch to another. It is similar to cutting the tape at the branch and reattaching it to another commit.
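
Here is a sketch of git pull --rebase when both sides have new commits. As before, a second local repository stands in for the remote, and all names (reference, mine, the file names, the identity) are hypothetical.

```shell
set -eu
sandbox="$(mktemp -d)"
cd "$sandbox"

# "reference" plays the role of the remote repository.
git init -q reference
cd reference
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Contributor"             # hypothetical identity
git config user.email "contributor@example.invalid"
echo "shared" > shared.txt
git add shared.txt
git commit -q -m "Initial commit"
cd ..

git clone -q reference mine

# I commit something locally...
cd mine
git config user.name "Example Contributor"
git config user.email "contributor@example.invalid"
echo "mine" > mine.txt
git add mine.txt
git commit -q -m "My local commit"
cd ..

# ...while someone else publishes a commit on the reference.
cd reference
echo "theirs" > theirs.txt
git add theirs.txt
git commit -q -m "Their published commit"
cd ..

# Retrieve their commit and replay mine on top of it.
cd mine
git pull -q --rebase
top="$(git log -1 --format=%s HEAD)"
below="$(git log -1 --format=%s HEAD~)"
total="$(git rev-list --count HEAD)"

echo "HEAD: $top / HEAD~: $below / $total commits in total"
```

After the pull my commit sits on top of theirs, so the history stays a single straight line with no merge commit.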

Rebasing is one of the actions that can lead to a conflict. GIT is quite good at managing differences between files automatically. If you have a modified file in one of your local commits that has also been modified in commits on the remote, GIT will try to replay the changes as best it can without bothering you. The exception is when the modified lines are at the same position in both your local and the remote repository: GIT will warn you that it cannot choose between the two modifications and will not finish its current operation. You will have to solve the conflicts manually before continuing.

The three operations that can lead to conflicts are rebasing, merging and cherry-picking.

I will detail these operations later. What is important now is that all of them behave the same way when a conflict arises, and the conflict is solved in the same manner.

In case of a conflict, the first thing to do is check the status of files. GIT will warn you that it is in the middle of an operation. Files in the staging area have no conflicts: GIT managed to resolve the changes without your intervention, great! Files that GIT could not resolve are marked as in conflict and are left unstaged. You will need to edit them manually to solve the conflict.

The short scenario below illustrates a conflict. I have a file called myfile.txt on the master branch. Here is the file:

List of things to buy
* an apple
* an orange
* flour
* sugar

I have modified it in my local repository to look like this (and committed the changes). My branch is called new_shopping and here is my file:

List of things to buy
* two apples
* two oranges
* 3 cups of flour
* 3 cups of sugar
* eggs

Someone else has modified myfile.txt, and when I pull I have a conflict. Here is the modification made by the other person:

List of things to buy
* 10 apples
* 10 oranges
* flour
* sugar

    C:\Users\agnel\test>git status
    On branch master
    You have unmerged paths.
      (fix conflicts and run "git commit")

    Unmerged paths:
      (use "git add <file>..." to mark resolution)

            both modified:   myfile.txt

    no changes added to commit (use "git add" and/or "git commit -a")

Here is the content of the conflicted file myfile.txt:

List of things to buy
<<<<<<< HEAD
* 10 apples
* 10 oranges
* flour
* sugar
=======
* two apples
* two oranges
* 3 cups of flour
* 3 cups of sugar
* eggs
>>>>>>> new_shopping

GIT warns you of the conflict by placing the revision made by others between <<<<<<< rev1 and =======, and my changes between ======= and >>>>>>> rev2. That's where you need to work: edit the file, stage it and continue the operation. GIT will tell you what to do to finish: git commit for a merge, git cherry-pick --continue for a cherry-pick, git rebase --continue for a rebase.
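
The shopping list scenario can be replayed end to end in a throwaway repository. In this sketch the conflict is triggered by merging new_shopping into master, and the resolved list at the end is only one possible choice. The identity and the commit messages are made up for the example.

```shell
set -eu
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Contributor"             # hypothetical identity
git config user.email "contributor@example.invalid"

printf '%s\n' "List of things to buy" "* an apple" "* an orange" "* flour" "* sugar" > myfile.txt
git add myfile.txt
git commit -q -m "Initial shopping list"

# My changes, on my own branch.
git checkout -q -b new_shopping
printf '%s\n' "List of things to buy" "* two apples" "* two oranges" "* 3 cups of flour" "* 3 cups of sugar" "* eggs" > myfile.txt
git add myfile.txt
git commit -q -m "My shopping list"

# The changes made by other people, on master.
git checkout -q master
printf '%s\n' "List of things to buy" "* 10 apples" "* 10 oranges" "* flour" "* sugar" > myfile.txt
git add myfile.txt
git commit -q -m "Their shopping list"

# The merge stops because the same lines were modified on both sides.
merge_stopped="no"
git merge new_shopping > /dev/null 2>&1 || merge_stopped="yes"
conflicted="$(git diff --name-only --diff-filter=U)"   # files marked as in conflict
markers="$(grep -c '^<<<<<<<' myfile.txt)"             # conflict markers written by GIT

# Solve the conflict by hand (here: by rewriting the file), stage it, finish the merge.
printf '%s\n' "List of things to buy" "* 10 apples" "* 10 oranges" "* 3 cups of flour" "* 3 cups of sugar" "* eggs" > myfile.txt
git add myfile.txt
git commit -q -m "Merge new_shopping into master"
remaining="$(grep -c '^<<<<<<<' myfile.txt || true)"   # no marker left in the file

echo "merge stopped: $merge_stopped, conflicted file: $conflicted"
```

In real life you would open the file in an editor and pick lines from both sides. The printf line only stands in for that manual edit.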

Pushing

Pushing sends the changes you have made to the remote repository. Scilab CodeReview does not let users push code directly to the remote reference repository. Instead, when you push your modifications they are added to the CodeReview repository. I will not go into details here; just know for now that the command for pushing your modifications is simply git push. If your GIT is properly configured with the hooks, the command will be modified to include where to push, and your commit messages will be modified to add a specific Change-ID so that the CodeReview knows about your patch set.

Branching, Rebasing, Cherry-picking and Merging

I did mention that branches are the bread and butter of GIT. I'll tell you why: branches let you work on one aspect of the code without being bothered by changes occurring on the main development branch. If you want to contribute to Scilab you will probably have ideas for a bug fix or for a new feature. The best way to work without interference is to create a local branch for each fix or feature. You can then work on it however you want, commit changes even if everything is not finished, move to another branch because you have a new idea, or get back to the main branch and pull the changes that were added while you were working on something else. All of this happens without interference and without multiplying copies of the full Scilab source code!

There, I said it. Working with branches will increase your productivity. The CodeReview is based on GIT branches automated for the review. Remote branches are also a way to work on new features with other users. Branches are everywhere and branches are good.

As a quick reminder

    git branch # will list your local branches
    git branch <branch_name> # will create branch <branch_name> at your current HEAD
    git branch -d <branch_name> # will delete branch <branch_name>
    git checkout <branch_name> # will place your HEAD at the tip of <branch_name>
    git checkout -b <branch_name> # will create <branch_name> and place HEAD on it

But why is branching so good? Because GIT knows how to manipulate branches and the commits on them almost effortlessly.

Here is one scenario. As a contributor, I want to add a new feature to Scilab. This will involve adding some .sci files in one of the modules, adding the help pages (.xml files), modifying other help pages to reference my new feature, modifying the CHANGES file that keeps track of bug fixes and new features, adding some .tst and .dia.ref files to test this new feature, etc.

The first thing I do is create my local branch for the feature. I'll create it from the tip of the master branch:

    git checkout master # to place myself on the master branch
    git pull # to update my master with the latest changes
    git checkout -b my_awesome_feature # to create my branch from the tip of the master

After this I can edit the files directly in the source tree. I am after all on my local branch, so changes here will not affect the master branch. I do not have to finish my feature now: I can commit my changes after staging the few files I have edited.

    git commit -m "WIP: added the .sci file and en_US help file"

This commit is yours; you can go on working on your feature and commit again. You might want to test your commit by compiling Scilab with the changes. If it's only macros and documentation, you can run make macros and make doc to do the checks.

Rebasing

Now suppose that while you were working on your feature, some changes have been added to the reference tree, and you want your commit to use some of these changes. Because you created your branch in the past, the code base on your branch diverges from the reference code base. To check this you can go back to the master branch and pull the changes. Do this once you have committed all your pending changes.

    git checkout master # go back on master branch
    git pull # get the latest changes from the remote

You test the newest features that were merged and decide to see if your commit can benefit from them. What you want is your branch to be moved from the former code base to the newest commit. For this you rebase.

rebase
moves a branch stemming from one commit so that it stems from another

To perform the rebase use:

    git checkout <branch_name> # moves to the branch
    git rebase master # will rebase the current branch on the tip of master

    # Alternatively you can combine the two operations with
    git rebase master <branch_name>

Do not hesitate to look at the GIT documentation for rebase for more information.

Because you are moving things you might have conflicts. Make sure you resolve them before continuing.

    git rebase --continue # to continue the rebase once conflicts are solved
    git rebase --abort # to abort the process of rebasing

Once done you have your new branch stemming from the latest master with all the features available. You can go back to working on your branch.
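
The rebase described in this section can be tried out with the sketch below. It runs in a throwaway repository in which master is updated by a local commit instead of a git pull. The file names and the identity are assumptions made for the example.

```shell
set -eu
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Contributor"             # hypothetical identity
git config user.email "contributor@example.invalid"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"

# Start the feature branch from the tip of master and commit some work.
git checkout -q -b my_awesome_feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "WIP: my feature"

# Meanwhile master moves on (this commit stands in for what git pull would bring).
git checkout -q master
echo "news" > news.txt
git add news.txt
git commit -q -m "Change merged on master meanwhile"

# Reattach the feature branch to the new tip of master.
git checkout -q my_awesome_feature
git rebase -q master

top="$(git log -1 --format=%s HEAD)"       # my commit, replayed
parent="$(git log -1 --format=%s HEAD~)"   # it now stems from the latest master
has_news="no"
if [ -f news.txt ]; then has_news="yes"; fi

echo "HEAD: $top / HEAD~: $parent / news.txt available on my branch: $has_news"
```

The feature commit keeps its message and content but gets a new commit-id, because its parent has changed.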

Merging

Once you have finished the work on your branch, you will want to integrate those changes into the master branch and publish them by pushing. This is done by merging.

merge
Creates a commit on top of a destination branch by accumulating all changes from the source branch

The process of merging will create a new commit on top of the destination branch. If you merge on master this commit will be the one you will publish! To perform the merge:

    git checkout master # moves to the master branch
    git merge <branch_name> # will merge commits from <branch_name> into a commit on top of master

GIT will tell you if any conflicts occur; you will have to resolve them. Once done, you will be able to edit the commit message. If you have configured your hooks, the commit message will contain changes from the file CHANGES. By default, if you had conflicts, the commit message will list the files that conflicted. Remove that list if you want to push afterwards, because the CodeReview will not accept commits with the list of conflicting files (it searches for the Change-ID in the last lines of your commit message).

    git commit # will commit the changes made after solving the conflicts
    git reset --hard HEAD # will abort the merge and discard all uncommitted changes
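
For a merge that goes through without conflict, a sketch in a throwaway repository could look like this. The branch name my_feature, the file names and the identity are made up. The --no-edit option is used only so that the example does not open an editor for the commit message.

```shell
set -eu
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Contributor"             # hypothetical identity
git config user.email "contributor@example.invalid"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"

# Work done on the feature branch.
git checkout -q -b my_feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add my feature"

# Unrelated work that landed on master in the meantime.
git checkout -q master
echo "other" > other.txt
git add other.txt
git commit -q -m "Other work on master"

# Merge the feature branch: GIT creates a new commit on top of master.
git merge -q --no-edit my_feature
parents="$(git log -1 --format=%P | wc -w | tr -d ' ')"   # a merge commit has 2 parents
has_feature="no"
if [ -f feature.txt ] && [ -f other.txt ]; then has_feature="yes"; fi

echo "parents of the new commit: $parents, both changes present: $has_feature"
```

If master had not moved since the branch was created, GIT would simply have moved master forward to the tip of my_feature (a fast-forward) instead of creating a merge commit.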

Cherry-picking

Cherry-picking is a very simple way to apply a single commit to a branch. This is exactly what you want to do if you want to review a commit someone else has pushed and test it on your working branch. To reuse my metaphor of the magnetic tape, cherry-picking consists in copying a snapshot and pasting it where you are.

If you want to try or review commits on the CodeReview, the line to cherry-pick the commit is available in the download submenu.

To cherry-pick use the following:

    git cherry-pick <commitID> # applies changes in <commitID> to your working branch

As with rebasing and merging, cherry-picking might result in conflicts. You must resolve them before finishing the cherry-pick process:

    git cherry-pick --continue # to continue your cherry-pick after resolving conflicts
    git cherry-pick --abort # to abort the cherry-pick

The process of CodeReview lets users modify a cherry-picked commit, amend it and push it, provided the Change-ID is not modified. This ID is in the commit message of the cherry-picked commit, so if you simply git commit --amend, the ID will still be present.

If you want to remove the cherry-picked commit, you can simply reset your working branch to the previous commit with git reset --hard HEAD~.
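
To close, here is a sketch of cherry-picking a commit and then removing it again, in a throwaway repository. The branch someone_elses_fix stands in for a commit you would normally get from the CodeReview. All names and the identity are hypothetical.

```shell
set -eu
repo="$(mktemp -d)"                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Contributor"             # hypothetical identity
git config user.email "contributor@example.invalid"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"

# A commit made elsewhere, that I would like to try.
git checkout -q -b someone_elses_fix
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix typo in help page"
picked_id="$(git rev-parse HEAD)"

# My working branch has moved on in the meantime.
git checkout -q master
echo "other" > other.txt
git add other.txt
git commit -q -m "Other work on master"

# Apply the commit on top of my working branch.
git cherry-pick "$picked_id" > /dev/null
top_after_pick="$(git log -1 --format=%s)"
new_id="$(git rev-parse HEAD)"             # a copy of the commit, with its own commit-id
has_fix="no"
if [ -f fix.txt ]; then has_fix="yes"; fi

# Done testing: remove the cherry-picked commit again.
git reset -q --hard HEAD~
top_after_reset="$(git log -1 --format=%s)"
fix_after_reset="no"
if [ -f fix.txt ]; then fix_after_reset="yes"; fi

echo "after pick: $top_after_pick / after reset: $top_after_reset"
```

The cherry-picked commit keeps the original commit message, which is why a Change-ID contained in it survives the operation.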


2022-09-08 09:27