Lecture 13: Version control and git

Version control

Git

Basic idea:

Some nice features:

Two important concepts

Commits and branches:

Getting started

Two ways to make a git repository:

1. Create a new repository in the current directory:

git init

There will now be a .git folder in the project directory. This is where the history of the project is stored.

2. Copy an existing repository, for example:

git clone https://github.com/tidyverse/ggplot2

This makes a directory called ggplot2 and copies the ggplot2 repository into it, so you have the current state of the ggplot2 project as well as access to all of its previous versions.

Staging and creating commits

Once you have a repository, you will usually want to change things and record those changes. Recording a set of changes is called creating a commit.

Our first problem is that you might want to record just a subset of the changes that you made.

This problem is solved with the “staging area.”

When you make changes to files, Git sees that they are different from the last version recorded.

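Files can be in four states:

- untracked: the file was not in the last commit and has not been staged, so git is not following it,
- unmodified: the file is tracked and matches the last commit,
- modified: the file has changed since the last commit, but the change has not been staged,
- staged: the change has been marked to go into the next commit.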

Checking the status of your files

The command git status will describe the state of the files in your repository.

If nothing has changed since the last commit, the output will look something like this:

On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

If there are changes, each changed file is listed as untracked, modified but not staged, or staged.
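The classification above can be sketched as a toy model in Python. The sketch assumes three snapshots, each mapping a file name to its contents: the last commit (HEAD), the staging area (index), and the working directory. The function `classify` and the sample file names are hypothetical. The sketch ignores deleted files, and git itself does not keep whole file contents in its index the way this model does.

```python
def classify(head, index, work):
    """Return the status label(s) of every file in the working directory.

    head, index, work: dicts mapping file name -> contents for the last
    commit, the staging area, and the working directory respectively.
    """
    report = {}
    for name, content in work.items():
        labels = []
        if name not in index:
            labels.append("untracked")       # never staged or committed
        else:
            if index[name] != head.get(name):
                labels.append("staged")      # index differs from last commit
            if content != index[name]:
                labels.append("modified")    # working copy differs from index
            if not labels:
                labels.append("unmodified")  # same in all three snapshots
        report[name] = labels
    return report


# Hypothetical example: analysis.R edited, README staged, notes.txt new.
head = {"analysis.R": "x <- 1\n"}
index = {"analysis.R": "x <- 1\n", "README": "hello\n"}
work = {"analysis.R": "x <- 2\n", "README": "hello\n", "notes.txt": "todo\n"}
for name, labels in classify(head, index, work).items():
    print(f"{name}: {', '.join(labels)}")
```

One file can be both staged and modified. This happens if you run git add and then edit the file again; the next commit would then contain the staged version.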

Tracking new files

To start tracking a file that wasn’t part of the last commit, use git add <filename>.

For example, if you’ve made a new file called README:

$ git add README
$ git status

On branch master
Your branch is up to date with 'origin/master'.
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)

    new file:   README

Staging modified files

If we change a file that was tracked in the most recent commit, it will be marked as modified.

If we then use git add <filename>, the file status will go from modified to staged.

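Viewing your staged and unstaged changes

The command git diff shows changes in the working directory that have not been staged yet. To see the changes that are staged and will go into the next commit, use git diff --staged.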
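As a rough illustration, Python's standard difflib module can print a unified diff, a format close to the one git diff uses. This is not how git computes its diffs internally. The helper `show_diff` and the three file versions below are hypothetical. Comparing the staged version with the working copy mirrors unstaged changes, and comparing the last commit with the staged version mirrors staged changes.

```python
import difflib


def show_diff(old: str, new: str, name: str) -> str:
    """Return a unified diff between two versions of a file's contents."""
    lines = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=f"a/{name}", tofile=f"b/{name}", lineterm="",
    )
    return "\n".join(lines)


committed = "line one\nline two\n"            # version in the last commit
staged = "line one\nline 2\n"                 # version added with git add
working = "line one\nline 2\nline three\n"    # current file on disk

print("Unstaged changes (compare with `git diff`):")
print(show_diff(staged, working, "notes.txt"))
print("Staged changes (compare with `git diff --staged`):")
print(show_diff(committed, staged, "notes.txt"))
```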

Committing your changes

The command git commit creates a new commit from the staged changes. Unstaged changes are left out of the new commit.
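Here is a minimal sketch of the staging workflow. It assumes the working directory, the index, and each commit are plain dictionaries from file names to contents; the class `ToyRepo` is hypothetical. `add` copies the current working version into the index, and `commit` saves a copy of the index, so anything that was not staged is left out.

```python
class ToyRepo:
    """Hypothetical model: working directory, index, and a list of commits."""

    def __init__(self):
        self.work = {}     # file name -> contents on disk
        self.index = {}    # the staging area
        self.commits = []  # each commit is a saved copy of the index

    def add(self, name):
        # Like `git add`: copy the current working version into the index.
        self.index[name] = self.work[name]

    def commit(self):
        # Like `git commit`: snapshot the index, not the working directory.
        snapshot = dict(self.index)
        self.commits.append(snapshot)
        return snapshot


repo = ToyRepo()
repo.work["README"] = "first draft\n"
repo.work["analysis.R"] = "x <- 1\n"
repo.add("README")                      # stage README only
repo.work["README"] = "second draft\n"  # edit again after staging
print(repo.commit())                    # holds only the staged "first draft" README
```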

Viewing the commit history

The command git log shows the commit history.
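Each commit records its parent, so git log can walk backwards from the current commit. The sketch below assumes a hypothetical history stored as a dictionary from made-up commit ids to (parent, message) pairs. It follows only one parent per commit, whereas the real git log also handles merge commits, which have several parents.

```python
from typing import Optional

# Hypothetical history: commit id -> (parent id or None, message).
COMMITS = {
    "a1": (None, "Initial commit"),
    "b2": ("a1", "Add README"),
    "c3": ("b2", "Fix typo in README"),
}


def log(commits: dict, start: str) -> list:
    """Follow parent pointers from `start`, newest commit first."""
    lines = []
    current: Optional[str] = start
    while current is not None:
        parent, message = commits[current]
        lines.append(f"{current} {message}")
        current = parent
    return lines


for line in log(COMMITS, "c3"):
    print(line)
```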

Some useful flags:

Branches

Problem: We have made a lot of commits. If we want to switch between them, how do we do that?

Start with three commits, two branches

Add a commit:

Switch branches with checkout:
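The branch diagrams above can be modeled in a few lines. The model assumes that a branch is just a name pointing at a commit, and that HEAD records which branch is checked out. The class `BranchModel` and the commit ids `c0`, `c1`, and so on are hypothetical. The sketch tracks only the commit graph; a real checkout also updates the files in the working directory.

```python
class BranchModel:
    """Hypothetical model: branches are names pointing at commit ids."""

    def __init__(self):
        self.parents = {"c0": None}       # commit id -> parent id
        self.branches = {"master": "c0"}  # branch name -> commit id
        self.head = "master"              # name of the checked-out branch
        self._next = 1

    def commit(self):
        # The new commit's parent is the tip of the current branch,
        new_id = f"c{self._next}"
        self._next += 1
        self.parents[new_id] = self.branches[self.head]
        # and only the current branch moves forward to the new commit.
        self.branches[self.head] = new_id
        return new_id

    def branch(self, name):
        # Like `git branch <name>`: a new pointer to the current commit.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        # Like `git checkout <name>`: HEAD moves; no commit is changed.
        self.head = name


repo = BranchModel()
repo.commit()
repo.commit()               # master: c0 <- c1 <- c2 (three commits)
repo.branch("testing")      # two branches, both pointing at c2
repo.checkout("testing")
repo.commit()               # testing moves to c3; master stays at c2
repo.checkout("master")
print(repo.head, repo.branches)
```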

The repository in more detail

How all the data is stored

Three basic types of objects: blobs (file contents), trees (directories), and commits (snapshots of the whole project).

All of these objects are stored and looked up by their hash.
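Git computes an object's hash (SHA-1 by default) from a short header plus the object's contents. The header is the object type and its size in bytes, followed by a NUL byte. Below is a minimal sketch for blobs using Python's hashlib. The in-memory `store` dictionary is an illustrative stand-in for the .git/objects directory, where git keeps each object compressed.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Hash a file's contents the way git names a blob object."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


store = {}  # illustrative stand-in for .git/objects: hash -> contents


def put(content: bytes) -> str:
    # Content-addressed storage: the key is computed from the data itself.
    oid = blob_id(content)
    store[oid] = content
    return oid


print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty file)
first = put(b"x <- 1\n")
second = put(b"x <- 1\n")  # same bytes, same hash, no new entry
print(first == second, len(store))
```

Because the key is derived from the content, storing the same bytes twice produces a single entry. This is why identical files in different commits can share one blob.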

Hashes

Blobs

Trees

To see the tree that the latest commit on the master branch points to, you can use:

git cat-file -p master^{tree}

And the output might look like this:

100644 blob 01b480b010b7fe66e312e1271dd24e128f3a0290    .gitmodules
100644 blob 1d17afb2a980076fc389f3d2747b0bfefd4df839    Dockerfile
100644 blob 716007c1456163b933cb086acae151fc6a24ca6d    README.md
100644 blob 9af5513cf53dfbdedbc69ec43865dec054de0ccd    SConstruct
040000 tree 100d47915afe22615ff111d390170c7265900b7a    analysis
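Each line of this listing has four fields:

- the mode, for example 100644 for a regular file or 040000 for a directory,
- the object type,
- the object's hash,
- the name.

The small parser below makes that structure explicit. The names `TreeEntry` and `parse_tree_listing` are hypothetical.

```python
from typing import NamedTuple


class TreeEntry(NamedTuple):
    mode: str  # 100644 = regular file, 040000 = directory
    kind: str  # "blob" for file contents, "tree" for a subdirectory
    oid: str   # hash of the object this entry points to
    name: str


def parse_tree_listing(text: str) -> list:
    """Split `git cat-file -p <tree>` output into structured entries."""
    entries = []
    for line in text.strip().splitlines():
        # git puts a tab before the name; split() handles tabs or spaces
        mode, kind, oid, name = line.split(maxsplit=3)
        entries.append(TreeEntry(mode, kind, oid, name))
    return entries


listing = """
100644 blob 716007c1456163b933cb086acae151fc6a24ca6d    README.md
100644 blob 9af5513cf53dfbdedbc69ec43865dec054de0ccd    SConstruct
040000 tree 100d47915afe22615ff111d390170c7265900b7a    analysis
"""
for entry in parse_tree_listing(listing):
    print(entry.kind, entry.name)
```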

Conceptually, if we have a directory with one subdirectory and three files in total, Git would store a snapshot of the directory as three blobs (one per file) and two trees (one per directory):
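The sketch below shows how such a snapshot could be written as objects. It assumes the directory is given as a nested dictionary: each file name maps to bytes, and each subdirectory name maps to another dictionary. The file names are hypothetical. The code aims to follow git's raw formats for blob and tree objects. A tree entry is the mode, a space, the name, a NUL byte, and the 20-byte binary hash. Unlike git, the sketch keeps objects in a dictionary instead of compressing them to disk.

```python
import hashlib


def hash_object(kind: str, data: bytes, objects: dict) -> str:
    """Store an object; its id is the SHA-1 of a '<kind> <size>' header,
    a NUL byte, and the data."""
    raw = f"{kind} {len(data)}\0".encode() + data
    oid = hashlib.sha1(raw).hexdigest()
    objects[oid] = raw
    return oid


def write_tree(directory: dict, objects: dict) -> str:
    """Store a nested dict (name -> bytes or dict) as blobs and trees."""
    entries = []
    for name, value in directory.items():
        if isinstance(value, dict):  # subdirectory -> tree object
            entries.append((name + "/", b"40000", name, write_tree(value, objects)))
        else:                        # file contents -> blob object
            entries.append((name, b"100644", name, hash_object("blob", value, objects)))
    # git sorts entries by name, comparing a directory as if it ended in "/"
    entries.sort()
    data = b"".join(
        mode + b" " + name.encode() + b"\0" + bytes.fromhex(oid)
        for _, mode, name, oid in entries
    )
    return hash_object("tree", data, objects)


# Hypothetical directory: one file at the top level, two in "analysis".
snapshot = {
    "README.md": b"# Project\n",
    "analysis": {"clean.R": b"x <- 1\n", "plot.R": b"plot(x)\n"},
}
objects = {}
root = write_tree(snapshot, objects)
kinds = [raw.split(b" ", 1)[0] for raw in objects.values()]
print(kinds.count(b"blob"), "blobs,", kinds.count(b"tree"), "trees")  # 3 blobs, 2 trees
```

Changing one file in analysis would produce a new blob, a new analysis tree, and a new root tree. The unchanged README.md blob would be reused.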

Commits

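Remember that a commit records the state of the repository at a particular point in time. It contains:

- the hash of the tree for the top-level directory,
- the hash(es) of its parent commit(s), except for the very first commit,
- the author and the committer, each with a timestamp,
- a commit message.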

Commits are also referred to by their hashes, and you should think of git as storing a set of commits.
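A commit object is a short text record, and it is hashed like the other objects. The sketch below builds one under several assumptions: a single author who is also the committer, a fixed +0000 time zone, and at most one parent (merge commits list several `parent` lines). The author, timestamps, and messages are hypothetical. The tree id used is the hash of an empty tree.

```python
import hashlib
from typing import Optional

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of an empty tree


def make_commit(tree: str, parent: Optional[str], person: str,
                timestamp: int, message: str) -> tuple:
    """Build a commit object's text and return (commit id, text)."""
    lines = [f"tree {tree}"]
    if parent is not None:  # the first commit has no parent
        lines.append(f"parent {parent}")
    lines.append(f"author {person} {timestamp} +0000")
    lines.append(f"committer {person} {timestamp} +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest(), body


# Hypothetical author and timestamps.
who = "Ada Example <ada@example.com>"
first, text = make_commit(EMPTY_TREE, None, who, 1700000000, "Initial commit")
second, _ = make_commit(EMPTY_TREE, first, who, 1700000100, "Second commit")
print(text.decode())
print(first, second)
```

The parent's hash is part of the text being hashed. As a result, changing any earlier commit would change the hashes of every commit after it.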

Putting everything together, we get a graph that describes the files that were present at different commits:
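The graph can be followed in code: a commit points to a tree, and the tree points to blobs. The toy sketch below uses made-up ids and flat (single-directory) trees. It lists the files at each commit and shows how an unchanged file is shared between snapshots.

```python
# Hypothetical objects with made-up short ids (real ids are hashes).
BLOBS = {"b1": "first README", "b2": "edited README", "b3": "x <- 1"}
TREES = {
    "t1": {"README": "b1"},
    "t2": {"README": "b1", "analysis.R": "b3"},  # README unchanged: same blob
    "t3": {"README": "b2", "analysis.R": "b3"},
}
COMMITS = {
    "c1": {"tree": "t1", "parent": None},
    "c2": {"tree": "t2", "parent": "c1"},
    "c3": {"tree": "t3", "parent": "c2"},
}


def files_at(commit):
    """Follow commit -> tree -> blobs to recover that snapshot's files."""
    tree = TREES[COMMITS[commit]["tree"]]
    return {name: BLOBS[blob] for name, blob in tree.items()}


# Walk from the newest commit back through the parents.
commit = "c3"
while commit is not None:
    print(commit, files_at(commit))
    commit = COMMITS[commit]["parent"]
```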

Overall