Metadata-Version: 2.1
Name: grokmirror
Version: 1.2.0
Summary: Smartly mirror git repositories that use grokmirror
Home-page: https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git
Author: Konstantin Ryabitsev
Author-email: konstantin@linuxfoundation.org
License: GPLv3+
Download-URL: https://www.kernel.org/pub/software/network/grokmirror/grokmirror-1.2.0.tar.xz
Project-URL: Source, https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git
Project-URL: Tracker, https://github.com/mricon/grokmirror/issues
Description: GROKMIRROR
        ==========
        --------------------------------------------
        Framework to smartly mirror git repositories
        --------------------------------------------
        
        :Author:    konstantin@linuxfoundation.org
        :Date:      2019-02-14
        :Copyright: The Linux Foundation and contributors
        :License:   GPLv3+
        :Version:   1.2.0
        
        DESCRIPTION
        -----------
        Grokmirror was written to make mirroring large git repository
        collections more efficient. Grokmirror uses the manifest file published
        by the master mirror in order to figure out which repositories to
        clone, and to track which repositories require updating. The process is
        extremely lightweight and efficient both for the master and for the
        mirrors.
        
        CONCEPTS
        --------
        Grokmirror master publishes a json-formatted manifest file containing
        information about all git repositories that it carries. The format of
        the manifest file is as follows::
        
            {
              "/path/to/bare/repository.git": {
                "description": "Repository description",
                "reference":   "/path/to/reference/repository.git",
                "modified":    timestamp,
                "fingerprint": sha1sum(git show-ref),
                "symlinks": [
                    "/location/to/symlink",
                    ...
                ]
              },
              ...
            }
        
        The manifest file is usually gzip-compressed to save bandwidth.
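        
        The ``fingerprint`` field in the format above is described as the SHA-1
        of the ``git show-ref`` output. A minimal sketch of that calculation,
        assuming the digest is taken over the raw ref listing exactly as git
        prints it (the helper name ``refs_fingerprint`` is hypothetical and not
        part of grokmirror):
        
        ```shell
        # Hash whatever arrives on stdin and print only the hex digest.
        # Assumption: the fingerprint is the plain SHA-1 of the ref listing.
        refs_fingerprint() {
            sha1sum | cut -d ' ' -f 1
        }
        
        # Typical use against a bare repository (the path is a placeholder):
        #   git --git-dir=/path/to/bare/repository.git show-ref | refs_fingerprint
        ```
        
        Two repositories with identical refs produce the same digest, which is
        what makes the value usable as a cheap "did anything change" check.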
        
        Each time a commit is pushed to one of the git repositories, a git hook
        automatically updates the manifest file, so manifest.js always contains
        up-to-date information about the repositories provided by the git
        server and their last-modified dates.
        
        The mirroring clients periodically poll the manifest.js file and
        download the updated manifest if it is newer than the locally stored
        copy (using the ``Last-Modified`` and ``If-Modified-Since`` HTTP
        headers). After downloading the updated manifest.js file, the mirrors
        parse it to find out which repositories have been updated and which new
        repositories have been added.
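        
        The conditional download can be illustrated with curl. This is only a
        sketch of the HTTP exchange, not how grok-pull is implemented; the
        function name ``fetch_manifest`` and both of its arguments are
        placeholders:
        
        ```shell
        # Download the manifest only if the server copy is newer than ours.
        # $1 = manifest URL (placeholder), $2 = local file path (placeholder).
        fetch_manifest() {
            if [ -e "$2" ]; then
                # -z FILE sends If-Modified-Since based on FILE's mtime;
                # -R stamps the download with the server's Last-Modified time.
                curl -sS -f -R -z "$2" -o "$2" "$1"
            else
                # No local copy yet: fetch unconditionally.
                curl -sS -f -R -o "$2" "$1"
            fi
        }
        ```
        
        When the server answers ``304 Not Modified``, no body is transferred and
        the local copy keeps its timestamp for the next comparison.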
        
        For all newly-added repositories, the clients will do::
        
            git clone --mirror git://server/path/to/repository.git \
                /local/path/to/repository.git
        
        For all updated repositories, the clients will do::
        
            GIT_DIR=/local/path/to/repository.git git remote update
        
        When run with ``--purge``, the clients will also purge any repositories
        no longer present in the manifest file received from the server.
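        
        The choice between the two commands above can be sketched as a dry-run
        helper that only prints what would be executed. It is a simplification:
        it distinguishes new from existing repositories by checking the local
        directory and ignores the manifest timestamps. The name ``plan_sync``
        and its arguments are hypothetical:
        
        ```shell
        # Print the git command a mirror would run for one manifest entry.
        # $1 = remote base URL, $2 = local toplevel, $3 = path from the manifest.
        plan_sync() {
            if [ -d "$2$3" ]; then
                # Already mirrored locally: fetch updates.
                echo "GIT_DIR=$2$3 git remote update"
            else
                # New in the manifest: create a mirror clone.
                echo "git clone --mirror $1$3 $2$3"
            fi
        }
        
        # Example (all values are placeholders):
        #   plan_sync git://server /local/path /to/repository.git
        ```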
        
        Shared repositories
        ~~~~~~~~~~~~~~~~~~~
        Grokmirror will automatically recognize when repositories share objects
        via alternates. For example, if repositoryB is a shared clone of
        repositoryA (that is, it was cloned using ``git clone -s
        repositoryA``), the manifest will record repositoryA as the reference
        for repositoryB, so grokmirror will mirror repositoryA first, and then
        mirror repositoryB with the ``--reference`` flag. This greatly reduces
        bandwidth and disk use for large repositories.
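        
        Git itself records alternates in the ``objects/info/alternates`` file
        inside the repository. A small sketch for inspecting that file by hand
        follows; the helper name ``repo_alternates`` is hypothetical, and this
        is not necessarily how grokmirror performs its detection:
        
        ```shell
        # Print the object directories a repository borrows from, one per line.
        # Prints nothing for a repository that has no alternates.
        repo_alternates() {
            if [ -f "$1/objects/info/alternates" ]; then
                cat "$1/objects/info/alternates"
            fi
        }
        
        # Example (the path is a placeholder):
        #   repo_alternates /path/to/repositoryB.git
        ```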
        
        See man git-clone_ for more info.
        
        .. _git-clone: https://www.kernel.org/pub/software/scm/git/docs/git-clone.html
        
        SERVER SETUP
        ------------
        Install grokmirror on the server using your preferred method.
        
        **IMPORTANT: Currently, only bare git repositories are supported.**
        
        You will need to add a hook to each of your repositories that updates
        the manifest whenever the repository is modified. This can be either a
        post-receive hook or a post-update hook. The hook must call the
        following command::
        
            /usr/bin/grok-manifest -m /repos/manifest.js.gz -t /repos -n `pwd`
        
        The **-m** flag is the path to the manifest.js file. The git process must
        be able to write to it and to the directory the file is in, because
        grok-manifest first creates a manifest.js.randomstring file and then
        moves it in place of the old one for atomicity.
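        
        The write-then-rename pattern described above can be sketched in shell
        as follows. It illustrates the general technique only; the function name
        ``atomic_write`` is hypothetical and grok-manifest's own code may differ
        in its details:
        
        ```shell
        # Write stdin to $1 atomically: fill a temporary file in the same
        # directory, then rename it over the target so that readers never see
        # a partially written file.
        atomic_write() {
            aw_tmp=$(mktemp "$1.XXXXXX") || return 1
            if cat > "$aw_tmp"; then
                mv -f "$aw_tmp" "$1"
            else
                rm -f "$aw_tmp"
                return 1
            fi
        }
        
        # Example (the path is a placeholder):
        #   generate_manifest | atomic_write /repos/manifest.js
        ```
        
        The temporary file has to live in the same directory as the target so
        that the rename stays on one filesystem, which is why write access to
        the directory is required. Note that ``mktemp`` creates the file
        readable by its owner only, so a real implementation would adjust the
        permissions before the rename.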
        
        The **-t** flag is to help grokmirror trim the irrelevant toplevel disk
        path. E.g. if your repository is in /var/lib/git/repository.git, but it
        is exported as git://server/repository.git, then you specify ``-t
        /var/lib/git``.
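        
        The effect of trimming can be reproduced with plain shell parameter
        expansion. This sketch only demonstrates the resulting path; the helper
        name ``trim_toplevel`` is hypothetical:
        
        ```shell
        # Strip the toplevel prefix ($1) from an on-disk repository path ($2).
        trim_toplevel() {
            printf '%s\n' "${2#"$1"}"
        }
        
        # Example:
        #   trim_toplevel /var/lib/git /var/lib/git/repository.git
        ```
        
        With the values from the paragraph above, the result is
        ``/repository.git``, which matches the exported path.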
        
        The **-n** flag tells grokmirror to use the current timestamp instead of the
        exact timestamp of the commit (much faster this way).
        
        Before enabling the hook, you will need to generate a manifest.js
        covering all your git repositories. To do that, run the same command,
        but omit the -n and the \`pwd\` arguments. E.g.::
        
            /usr/bin/grok-manifest -m /repos/manifest.js.gz -t /repos
        
        The last component you need to set up is automatic purging of deleted
        repositories from the manifest. As this can't be added to a git hook,
        you can either run grok-manifest with the ``--purge`` flag from cron::
        
            /usr/bin/grok-manifest -m /repos/manifest.js.gz -t /repos -p
        
        Or add it to your gitolite's ``D`` command using the ``--remove`` flag::
        
            /usr/bin/grok-manifest -m /repos/manifest.js.gz -t /repos -x $repo.git
        
        If you would like grok-manifest to honor the ``git-daemon-export-ok``
        magic file and only add to the manifest those repositories specifically
        marked as exportable, pass the ``--check-export-ok`` flag. See
        ``git-daemon(1)`` for more information on the ``git-daemon-export-ok``
        file.
        
        MIRROR SETUP
        ------------
        Install grokmirror on the mirror using your preferred method.
        
        Locate repos.conf and modify it to reflect your needs. The default
        configuration file is heavily commented.
        
        Add a cronjob to run as frequently as you like. For example, add the
        following to ``/etc/cron.d/grokmirror.cron``::
        
            # Run grok-pull every minute as user "mirror"
            * * * * * mirror /usr/bin/grok-pull -p -c /etc/grokmirror/repos.conf
        
        Make sure the user "mirror" (or whichever user you specified) is able to
        write to the toplevel and log locations specified in repos.conf.
        
        If you already have a bunch of repositories in the hierarchy that
        matches the upstream mirror and you'd like to reuse them instead of
        re-downloading everything from the master, you can pass the ``-r`` flag
        to tell grok-pull that it's okay to reuse existing repos. This will
        delete any existing remotes defined in the repository and set the new
        origin to match what is configured in the repos.conf.
        
        GROK-FSCK
        ---------
        Git repositories can get corrupted whether they are frequently updated
        or not, which is why it is useful to routinely check them using "git
        fsck". Grokmirror ships with a "grok-fsck" utility that will run "git
        fsck" on all mirrored git repositories. It is supposed to be run
        nightly from cron, and will do its best to randomly stagger the checks
        so only a subset of repositories is checked each night. Any errors will
        be sent to the user set in MAILTO.
        
        To enable grok-fsck, first locate the fsck.conf file and edit it to
        match your setup -- e.g., it must know where you keep your local
        manifest. Then, add the following to ``/etc/cron.d/grok-fsck.cron``::
        
            # Make sure MAILTO is set, for error reports
            MAILTO=root
            # Run nightly repacks (Monday through Saturday) to optimize the repos
            0 2 * * 1-6 mirror /usr/bin/grok-fsck -c /etc/grokmirror/fsck.conf --repack-only
            # Run weekly fsck checks on Sunday
            0 2 * * 0 mirror /usr/bin/grok-fsck -c /etc/grokmirror/fsck.conf
        
        You can force a full run using the ``-f`` flag, but unless you only have
        a few smallish git repositories, it's not recommended, as it may take
        several hours to complete. See the man page for other flags grok-fsck
        supports.
        
        Before it runs, grok-fsck places an advisory lock on the git directory
        being checked (.repository.git.lock). Grok-pull recognizes the lock and
        postpones any incoming updates to that repository until the lock is
        released.
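        
        The idea of a per-repository advisory lock can be sketched as follows.
        This is an illustration only and will not interoperate with grok-fsck:
        it assumes the lock file sits beside the repository and uses the mere
        existence of the file as the lock, whereas the real tools may rely on
        a different locking mechanism. All function names are hypothetical:
        
        ```shell
        # Derive the lock file path for a repository: a dot-file next to it.
        lock_path_for() {
            printf '%s/.%s.lock\n' "$(dirname "$1")" "$(basename "$1")"
        }
        
        # Try to take the lock without waiting; succeeds only if nobody holds it.
        # noclobber (set -C) makes the redirection fail if the file exists.
        try_lock() {
            ( set -C; : > "$(lock_path_for "$1")" ) 2>/dev/null
        }
        
        # Release the lock taken by try_lock.
        release_lock() {
            rm -f "$(lock_path_for "$1")"
        }
        ```
        
        A puller following this scheme would call ``try_lock`` before updating a
        repository and skip that repository for the current run if the call
        fails.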
        
        FAQ
        ---
        Why is it called "grok mirror"?
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Because it's developed at kernel.org and "grok" is a mirror of "korg".
        Also, because it groks git mirroring.
        
        Why not just use rsync?
        ~~~~~~~~~~~~~~~~~~~~~~~
        Rsync is extremely inefficient for mirroring git trees, which mostly
        consist of a large number of small files that very rarely change.
        Since rsync must examine every file during each run, it mostly results
        in a lot of disk thrashing.
        
        Additionally, if several repositories share objects with each other,
        rsync will produce broken git repositories unless the disk paths are
        exactly the same on both the remote and the local mirror.
        
        It is also a bit silly, considering git provides its own extremely
        efficient mechanism for specifying what changed between revision X and
        revision Y.
        
        Why not just run "git pull" from cron every minute?
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        This is not a complete mirroring strategy, as it won't notify you when
        the remote mirror adds new repositories. It is also not very nice to the
        remote server, especially one that carries hundreds of repositories.
        
        Additionally, this will not automatically take care of shared
        repositories for you. See "Shared repositories" under "CONCEPTS".
        
Keywords: git,mirroring,repositories
Platform: UNKNOWN
Description-Content-Type: text/x-rst
