Jay Taylor's notes
Git Internals - Learn by Building Your Own Git
ugit: DIY Git in Python
Welcome aboard! We're going to implement Git in Python to learn more about how Git works on the inside.
This tutorial is different from most Git internals tutorials because we're not going to talk about Git only with words but also with code! We're going to write in Python as we go.
This is not a tutorial on using Git! To follow along, I advise that you have a working knowledge of Git. If you're new to Git, this tutorial is probably not the best place to start your Git journey. I suggest coming back here after you've used Git a bit and are comfortable with making commits, branching, merging, pushing and pulling.
Why learn Git internals?
For most tools that we use daily, we don't really care about their internals. We can use Firefox or Vim without understanding their inner workings.
At first you shouldn't care about Git internals either. You can use Git as a set of CLI commands that track code history. Run git add, git commit and git push all day long and you'll do fine, as long as you're a sole developer who just commits to one branch.
But once you start collaborating with multiple people on multiple branches, and things like rebasing or force pushing get involved, it's easy to get lost if you don't have a good mental model of Git internals.
From my experience using Git myself and teaching it to others, a better way to improve your effectiveness with Git is to understand how it works behind the scenes, not to learn more "advanced" Git commands. This understanding is what will allow you to solve the kinds of problems that multi-user collaborative coding sometimes produces.
Introducing: μgit
μgit (ugit) is a small implementation of a Git-like version control system (VCS). Its top goal is simplicity and educational value. ugit is implemented in small incremental steps, each explained in detail. Hopefully you will be able to read the small steps (explanation and code) and gradually build a complete picture of the internals.
ugit is not exactly Git, but it shares Git's important ideas. ugit is much shorter and leaves out features that aren't relevant for learning. For example, to reduce complexity, ugit doesn't compress objects, doesn't save file modes and doesn't save commit times. But the important ideas, like commits, branches, the index, merges and remotes, are all present and very similar to Git. If you know ugit well, you will be able to recognize the same ideas in Git.
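To make "ugit doesn't compress objects" concrete, here is a minimal sketch of how Git itself forms a loose blob object in a default SHA-1 repository: it prepends a "blob <size>\0" header to the content, hashes the result with SHA-1 to get the object ID, and zlib-compresses it before writing it under .git/objects. The function name git_blob_object is hypothetical and only illustrates Git's format; it is not part of ugit. The zlib step is the part ugit skips.

```python
from __future__ import annotations

import hashlib
import zlib


def git_blob_object(data: bytes) -> tuple[str, bytes]:
    """Hypothetical helper: return (object id, compressed bytes) for a Git blob."""
    # Git stores "blob <size in bytes>\0" followed by the raw content.
    store = b"blob " + str(len(data)).encode() + b"\x00" + data
    # The object ID is the SHA-1 of header + content (before compression).
    oid = hashlib.sha1(store).hexdigest()
    # Git then zlib-compresses header + content; the file lives at
    # .git/objects/<first 2 hex chars of oid>/<remaining 38 chars>.
    return oid, zlib.compress(store)


if __name__ == "__main__":
    oid, compressed = git_blob_object(b"")
    print(oid)  # the well-known ID of the empty blob
    print(f".git/objects/{oid[:2]}/{oid[2:]}")
```

The empty blob's ID, e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, is what git hash-object reports for an empty file, so it is a handy sanity check.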
This tutorial is organized as a series of code changes; each change comes with an explanation and the diff of the change. For example, you're now reading the first change, and the code we've added in this change is shown as a diff under "Diff" below. The code is a minimal Python application that prints "Hello, World!".
In more detail, we created a setup.py file that describes the ugit executable. Once invoked, the executable calls the main() function in ugit/cli.py.
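The console_scripts entry 'ugit = ugit.cli:main' in setup.py is what connects the ugit command to main(). Roughly, the generated wrapper script imports ugit.cli, looks up main, and calls it. The sketch below mimics that resolution with importlib; parse_entry_point and load_entry_point are hypothetical helpers for illustration, not the actual setuptools code.

```python
from __future__ import annotations

import importlib
from typing import Any


def parse_entry_point(spec: str) -> tuple[str, str, str]:
    """Split 'name = module:attr' into (name, module, attr)."""
    name, eq, target = spec.partition("=")
    module, colon, attr = target.partition(":")
    if not eq or not colon:
        raise ValueError(f"expected 'name = module:attr', got {spec!r}")
    return name.strip(), module.strip(), attr.strip()


def load_entry_point(spec: str) -> Any:
    """Import the module and return the object the script would call."""
    _, module_name, attr = parse_entry_point(spec)
    obj: Any = importlib.import_module(module_name)
    # attr may be dotted (e.g. 'Class.method'), so walk it part by part.
    for part in attr.split("."):
        obj = getattr(obj, part)
    return obj


if __name__ == "__main__":
    print(parse_entry_point("ugit = ugit.cli:main"))
    # With ugit installed, this would behave like running `ugit`:
    # load_entry_point("ugit = ugit.cli:main")()
```

In real code you would not parse these strings by hand: the standard library's importlib.metadata exposes installed entry points as EntryPoint objects with a load() method.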
I also recommend downloading the source (or typing it yourself) so that you can follow along and try the ideas in practice. The source for ugit is hosted in a Git repository, and the command to download it can be found in the other window. If you want to run the code, I recommend installing ugit in development mode by running the following command in the root directory of the project:
$ python3 setup.py develop --user
Installing in development mode creates a link to our source code instead of copying it to the installation directory, so we can still edit the source and run it immediately.
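If you want to confirm that the development install really points at your working copy, a quick check like the one below may help. It is a sketch that uses the standard library's importlib.util.find_spec; the helper name package_dirs is hypothetical, and it assumes you run it with the same Python interpreter you installed ugit into. It also works when the package has no __init__.py, as is the case for ugit so far.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path


def package_dirs(name: str) -> list[Path]:
    """Hypothetical helper: directories Python would import package `name` from."""
    spec = importlib.util.find_spec(name)
    # find_spec returns None for unknown names; plain modules have no
    # submodule_search_locations, so treat both as "no such package".
    if spec is None or spec.submodule_search_locations is None:
        raise ModuleNotFoundError(f"{name!r} was not found as a package")
    # Works for regular and namespace packages alike.
    return [Path(p).resolve() for p in spec.submodule_search_locations]


if __name__ == "__main__":
    # After a development install, this should list the ugit/ directory
    # inside your project checkout rather than a copy in site-packages.
    print(package_dirs("ugit"))
```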
Now we can run ugit and see "Hello, World!" printed out.
Why learn Git using code?
As I mentioned earlier, in this tutorial we will actually implement Git in Python. I believe that for programmers, seeing concepts implemented in code crystallizes understanding. It's nice to see Git explained in a diagram, but seeing the same concepts in live code that you can fully understand and actually run leads to a deeper understanding. That's because working code can't omit any details, unlike an explanation in words.
Why not learn Git by reading the real Git code?
The real Git code is too complicated to be useful for learning the basic concepts with ease. It is production-quality code, written in C and optimized for speed. It implements many advanced features and deals with a lot of edge cases that we don't care about for learning. In this tutorial we will focus on the bare minimum needed to get the point across.
About Me
Hi, I'm Nikita and this is a tutorial I've been working on for a long time. If you have any questions or suggestions, please leave a comment on any of the relevant sections.
Diff
.gitignore
@@ -0,0 +1,2 @@
+__pycache__
+*.egg-info/

setup.py
@@ -0,0 +1,12 @@
+#!/usr/bin/env python3
+
+from setuptools import setup
+
+setup (name = 'ugit',
+       version = '1.0',
+       packages = ['ugit'],
+       entry_points = {
+           'console_scripts' : [
+               'ugit = ugit.cli:main'
+           ]
+       })

ugit/cli.py
@@ -0,0 +1,2 @@
+def main ():
+    print ('Hello, World!')