Snapshots, Not Differences.
Todo: Explain some background on version control tools.
Target: Should understand why Git exists and why you should use it.
What is “version control”? (aka Version Control System, VCS)
Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. It also lets you compare changes over time, see who last modified something that might be causing a problem, and find out who introduced an issue and when.
So using a VCS also generally means that if you screw things up or lose files, you can easily recover.
Local VCSs use a simple database that keeps all the changes to files under revision control.
One popular example, RCS, works by keeping patch sets (that is, the differences between files) in a special format on disk; it can then re-create what any file looked like at any point in time by adding up all the patches.
Centralized VCSs (such as CVS and Subversion) have a single server that contains all the versioned files, and a number of clients that check out files from that central place.
The most obvious downside is the single point of failure: whenever you have the entire history of the project in a single place, you risk losing everything.
In a DVCS (such as Git), clients don’t just check out the latest snapshot of the files; rather, they fully mirror the repository, including its full history.
Thus, if any server dies, any of the client repositories can be copied back up to the server to restore it. Every clone is really a full backup of all the data.
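Here is a minimal sketch of this property. It assumes a POSIX shell with Git installed. The script builds a throwaway repository with two commits, clones it locally, and shows that the clone holds the full history rather than only the latest files. The directory names (origin-repo, backup-repo) and the identity values are hypothetical placeholders.

```shell
set -e
# Throwaway working area; all names below are hypothetical placeholders.
work="$(mktemp -d)"
cd "$work"

# A stand-in for the "server" repository, with two commits of history.
git init -q origin-repo
cd origin-repo
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "First version"
echo "v2" > notes.txt
git commit -q -am "Second version"
cd "$work"

# Cloning copies every commit, not just the latest snapshot of the files.
git clone -q origin-repo backup-repo
git -C backup-repo log --oneline           # both commits are listed
# If origin-repo were lost, backup-repo could be copied back to restore it.
```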
Some of the goals of Git were as follows:
- Simple design
- Strong support for non-linear development (thousands of parallel branches)
- Fully distributed
- Able to handle large projects like the Linux kernel efficiently (speed and data size)
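Since its birth in 2005, Git has kept these initial qualities while becoming easier to use:
- It is amazingly fast
- It is very efficient with large projects
- It has an incredible branching system for non-linear development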
Git stores and thinks about information very differently from other VCSs, and understanding these differences will help you avoid becoming confused while using it.
Conceptually, most other systems store information as a list of file-based changes: a set of files plus the changes made to each file over time (this is commonly described as delta-based version control).
Instead, Git thinks of its data more like a series of snapshots of a miniature filesystem.
With Git, every time you commit, Git basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot.
To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored.
Git thinks about its data more like a stream of snapshots.
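The sketch below shows the "link to the previous identical file" idea in practice. It assumes a POSIX shell with Git installed, and the file names and identity are hypothetical. It commits two snapshots in which only b.txt changes, then asks Git which stored object each snapshot uses for each file.

```shell
set -e
repo="$(mktemp -d)"                        # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Snapshot 1: two files.
printf 'unchanged\n' > a.txt
printf 'first\n' > b.txt
git add a.txt b.txt
git commit -q -m "Snapshot 1"

# Snapshot 2: only b.txt changes.
printf 'second\n' > b.txt
git commit -q -am "Snapshot 2"

# Both snapshots refer to the same stored object for the unchanged a.txt...
git rev-parse HEAD~1:a.txt HEAD:a.txt
# ...while b.txt gets a new object in the second snapshot.
git rev-parse HEAD~1:b.txt HEAD:b.txt
```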
We’ll explore some of the benefits you gain by thinking of your data this way when we cover branching in the Git Branching chapter.
This may not seem like a huge deal, but you may be surprised what a big difference it can make.
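Everything in Git is checksummed before it is stored and is then referred to by that checksum, so it is impossible to change the contents of any file or directory without Git knowing about it. The mechanism Git uses for this checksumming is called a SHA-1 hash: a 40-character string composed of hexadecimal characters (0–9 and a–f), calculated from the contents of a file or directory structure in Git.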
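You can watch the checksum at work with the commands below. This assumes a repository using Git’s default SHA-1 object format and a system that provides sha1sum (macOS ships shasum instead). For a file's content, Git hashes a short "blob <size>" header, a NUL byte, and then the content itself, so the same ID can be reproduced by hand. The sample content is the one Pro Git uses in its Git Internals chapter.

```shell
set -e
cd "$(mktemp -d)" && git init -q           # scratch repository (hypothetical location)

# The object ID Git assigns to this content (nothing is written without -w).
printf 'test content\n' | git hash-object --stdin
# d670460b4b4aece5915caf5c68d12f560a9fe3e4

# The same ID by hand: SHA-1 over "blob <size>", a NUL byte, then the content.
# "test content\n" is 13 bytes; \000 is printf's octal escape for NUL.
printf 'blob 13\000test content\n' | sha1sum | cut -d' ' -f1
# d670460b4b4aece5915caf5c68d12f560a9fe3e4
```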
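When you do actions in Git, nearly all of them only add data to the Git database, so it is hard to get the system to do anything that cannot be undone or to make it erase data.
You can still lose changes you haven’t committed yet, but once a snapshot is committed it is very difficult to lose. This makes using Git a joy because we know we can experiment without the danger of severely screwing things up.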
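The sketch below illustrates "only adding data". It assumes a POSIX shell with Git installed, and the file name and identity are hypothetical. Even git commit --amend, which looks like it rewrites a commit, actually creates a new commit object. The old object stays in the database and remains reachable through the reflog until Git eventually cleans it up.

```shell
set -e
repo="$(mktemp -d)"                        # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "draft" > notes.txt
git add notes.txt
git commit -q -m "Frist draft"             # deliberate typo in the message
old="$(git rev-parse HEAD)"

# "Fixing" the commit adds a brand-new commit object...
git commit -q --amend -m "First draft"
new="$(git rev-parse HEAD)"

# ...while the old commit object is still stored and listed in the reflog.
git cat-file -t "$old"                     # prints: commit
git reflog
```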
Git has three main states that your files can reside in: modified, staged and committed:
- Modified (working tree) means that you have changed the file but have not committed it to your database yet.
- Staged (staging area) means that you have marked a modified file in its current version to go into your next commit snapshot.
- Committed (.git directory) means that the data is safely stored in your local database.
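You can watch the three states with git status --short. The sketch below assumes a POSIX shell with Git installed, and the file name and identity are hypothetical. The comments show the two-column short-status codes: the left column describes the staging area and the right column describes the working tree.

```shell
set -e
repo="$(mktemp -d)"                        # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "Add file.txt"
git status --short                         # (no output): committed

echo "two" >> file.txt
git status --short                         # " M file.txt": modified, not staged

git add file.txt
git status --short                         # "M  file.txt": staged for the next commit

git commit -q -m "Update file.txt"
git status --short                         # (no output): committed again
```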
The working tree is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.
The staging area is a file, generally contained in your Git directory, that stores information about what will go into your next commit. Its technical name in Git parlance is the “index”, but the phrase “staging area” works just as well.
The Git directory (.git) is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.
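The following sketch lets you look at these pieces directly. It assumes a POSIX shell with Git installed, and the file name is hypothetical. It creates an empty repository, lists the .git directory, and then stages one file. The index file appears only once something has been staged. git ls-files --stage prints each index entry as its mode, object ID, stage number, and path.

```shell
set -e
repo="$(mktemp -d)"                        # throwaway repository (hypothetical location)
cd "$repo"
git init -q
ls .git                                    # typically HEAD, config, objects, refs, ... (varies by version)

echo "hello" > hello.txt
git add hello.txt                          # creates .git/index and stores a blob object
git ls-files --stage
# 100644 ce013625030ba8dba906f756967f9e9ca394464a 0	hello.txt
```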
The basic Git workflow goes something like this:
- You modify files in your working tree.
- You selectively stage just those changes you want to be part of your next commit, which adds only those changes to the staging area.
- You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.
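The three steps above can be sketched as follows. This assumes a POSIX shell with Git installed, and the file names and identity are hypothetical. Both files are modified, but only a.txt is staged, so only that change ends up in the new commit.

```shell
set -e
repo="$(mktemp -d)"                        # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "a" > a.txt
echo "b" > b.txt
git add a.txt b.txt
git commit -q -m "Initial snapshot"

# 1. Modify files in the working tree.
echo "a2" >> a.txt
echo "b2" >> b.txt

# 2. Stage only the change that belongs in the next commit.
git add a.txt

# 3. Commit: the snapshot is taken from the staging area, so b.txt's edit stays out.
git commit -q -m "Update a.txt only"
git show --stat --format=%s HEAD           # lists only a.txt as changed
git status --short                         # " M b.txt": still modified, not committed
```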
While your choice of graphical client is a matter of personal taste, all users will have the command-line tools installed and available.
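The first thing you should do after installing Git is to set your user name, because every Git commit uses this information:
git config --global user.name "RayJune"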
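Pro Git recommends setting an email address alongside the user name, since both are recorded in every commit. In this minimal sketch, the email address is a hypothetical placeholder. The first lines point HOME at a throwaway directory so you can experiment without touching your real ~/.gitconfig; drop them when configuring your own machine.

```shell
set -e
# For experimenting only: a throwaway HOME keeps the real ~/.gitconfig untouched.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME
cd "$HOME"

# Identity recorded in every commit (the email address is a hypothetical placeholder).
git config --global user.name "RayJune"
git config --global user.email "rayjune@example.com"

# Read one value back, or list every setting together with the file it came from.
git config user.name
git config --list --show-origin
```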
There are two equivalent ways to get the comprehensive manual page (manpage) help for any of the Git commands:
git help <verb>
git <verb> --help
In addition, if you don’t need the full-blown manpage help, but just need a quick refresher on the available options for a Git command, you can ask for the more concise “help” output with the -h option.
git add -h
If you can read only one chapter to get going with Git, this is it.
- Todo: This chapter covers every basic command you need to do the vast majority of the things you’ll eventually spend your time doing with Git.
- Target: Should be able to configure and initialize a repository, begin and stop tracking files, and stage and commit changes.
Remember that each file in your working directory can be in one of two states: tracked or untracked.
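- Tracked files are files that Git knows about: files that were in the last snapshot, plus any newly staged files. They can be unmodified, modified, or staged.
- Untracked files are everything else: any files in your working directory that were not in your last snapshot and are not in your staging area.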
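The sketch below shows a file moving between untracked and tracked. It assumes a POSIX shell with Git installed and uses a hypothetical file name. git add begins tracking the file, and git rm --cached stops tracking it again while leaving it on disk.

```shell
set -e
repo="$(mktemp -d)"                        # throwaway repository (hypothetical location)
cd "$repo"
git init -q

echo "notes" > new.txt
git status --short                         # "?? new.txt": untracked, Git does not know it yet

git add new.txt                            # begin tracking (staged as a new file)
git status --short                         # "A  new.txt"

git rm -q --cached new.txt                 # stop tracking; the file stays on disk
git status --short                         # "?? new.txt" again
```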