Child git repository as subset of a main repository

Question

I'm looking for a way to set up git repositories that include subsets of files from a larger repository, and inherit the history from that main repository. My primary motivation is to be able to share subsets of the code via GitHub.

I currently manage my research-related (mostly Matlab) code via a single git repository. The code itself is loosely organized into a handful of folders, with code dependencies that often cross over folders. I don't want to upload a remote copy of the whole repository, because it includes a lot of mixed projects that no one else would want in its entirety.

My mental picture of this involves a separate repository for each project that tracks only the relevant files for that project, but inherits all the commits from the main repository. Ideally, I'd like to be able to tag versions within these sub-repositories separate from the main one, but that's not a necessity. I've looked into git submodules, subtrees, and gitslave, but all of these seem to assume that the subprojects are isolated collections of files, while in my case many subprojects share files with other subprojects. I also attempted to create a project-specific branch, git rm-ing irrelevant files, but that fell apart as soon as I needed to merge changes from the main branch into the project branch (a mess of conflicts due to changes in project-deleted files).

Statistics:

  • 8096 files in main repository
  • 14 subprojects I want to share
  • 394 total files in those subprojects
  • 276 files belong to only 1 project, 57 to 2, 60 to 3, and 1 to 6.

I currently share code by simply copying the relevant files to a new folder periodically for each project. But this means that the new copies have no commit history attached. Is there a more robust method of sharing these various subsets of code, and keeping them up to date with changes I make?

Recommended answer

As I understand your problem:

  • you have one big repo containing multiple subprojects
  • you want to extract and share each subproject as its own repository, still containing the history/commits for (only) that subproject
  • the subprojects share some files, so the files used by one subproject are not strictly contained in a single subdirectory (one file may be used by multiple subprojects); this is why you can't simply use git subtree or git submodules

One way to extract the history of just a subset of the files into a dedicated branch (which you then can push into a dedicated repository) is using git filter-branch:

# regex to match the files included in this subproject, used below
file_list_regex='^subproject1/|^shared_file1$|^lib/shared_lib2$'

git checkout -b subproject1 # create new branch from current HEAD

git filter-branch --prune-empty \
  --index-filter "git ls-files --cached | grep -v -E '$file_list_regex' | xargs -r git rm --cached" \
  HEAD

This will:

  • first create a new branch subproject1 based on the current HEAD (git checkout -b subproject1)
  • traverse its whole history (git filter-branch [...] HEAD)
  • remove all files (xargs -r git rm --cached) that are not part of the subproject (git ls-files --cached | grep -v -E '$file_list_regex')
  • All commits that did not touch one of the subproject files will be dropped from that branch (--prune-empty).
  • This operation does not check out each revision but operates only on the index (--index-filter/--cached).
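
Before rewriting any history, it can help to check which paths the regular expression would remove. The following is a small sketch of that check; the helper name list_removed_paths and the sample paths are made up for illustration and are not part of the original answer.

```shell
# Same pattern as above: keep everything under subproject1/ plus two shared files.
file_list_regex='^subproject1/|^shared_file1$|^lib/shared_lib2$'

# Read paths on stdin and print those that do NOT match the pattern,
# i.e. the paths the index filter would pass on to "git rm --cached".
list_removed_paths() {
  grep -v -E "$file_list_regex"
}

# Hypothetical sample paths, only to show the effect of the anchors.
printf '%s\n' subproject1/main.m shared_file1 lib/shared_lib2 \
  lib/other_lib.m shared_file10 projects/subproject1/x.m |
  list_removed_paths
# prints:
#   lib/other_lib.m
#   shared_file10
#   projects/subproject1/x.m
```

Inside a real repository, `git ls-files | list_removed_paths` gives the same preview for the current commit. The `^` and `$` anchors matter: without them, shared_file10 and projects/subproject1/x.m would be kept as well.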

This is a one-time operation, though, and as I understand your question you want to continuously update the extracted subproject repositories/branches with new commits. The good news is that you can simply repeat this command, since git filter-branch will always produce the same commits/history for your subproject branches, provided that you don't manually alter them or rewrite your master branch.
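
To share the result, the filtered branch can be pushed to a repository of its own. A minimal sketch, assuming a hypothetical helper name publish_subproject and a remote URL of your choosing (neither appears in the original answer):

```shell
# Push a local subproject branch to a dedicated remote repository.
#   $1: local branch name (e.g. subproject1)
#   $2: URL or path of the dedicated remote repository (assumption: it already exists)
#   $3: branch name to use on the remote (defaults to "main")
publish_subproject() {
  branch="$1"
  remote_url="$2"
  remote_branch="${3:-main}"
  # No --force on purpose: if the filtered history is reproduced identically,
  # new commits are a fast-forward and a plain push succeeds.
  git push "$remote_url" "refs/heads/${branch}:refs/heads/${remote_branch}"
}
```

For example, `publish_subproject subproject1 <url-of-the-subproject-repo>`. A rejected (non-fast-forward) push is then a useful signal that the filtered history no longer matches what was published earlier.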

The drawback of simply repeating the command is that it filters the complete history again each time, for every subproject. If you only want to add the last 5 commits of the master branch to the tip of your existing subproject1 branch, you could adapt the commands like this:

# get the full commit ids for the commits we consider
# to be equivalent in master and subproject1 branch
# (on a linear history, master~5..master contains exactly the last 5 commits)
common_base_commit="$(git rev-parse master~5)"
subproject_tip="$(git rev-parse subproject1)"

# checkout a detached HEAD so we don't change the master branch
git checkout --detach master

git filter-branch --prune-empty \
  --index-filter "git ls-files --cached | grep -v -E '$file_list_regex' | xargs -r git rm --cached" \
  --parent-filter "sed s/${common_base_commit}/${subproject_tip}/g" \
  ${common_base_commit}..HEAD

# force reset subproject1 branch to current HEAD
git branch -f subproject1

Explanation:

  • This will only rewrite the last 5 commits of master (git filter-branch [...] ${common_base_commit}..HEAD), i.e. everything after master~5, which we consider to be the commit equivalent to subproject1's current tip.
  • For the first of those commits it will rewrite the parent from master~5 to subproject1 (--parent-filter "sed s/${common_base_commit}/${subproject_tip}/g"), effectively rebasing the 5 rewritten commits on top of subproject1.
  • Finally, we only need to update subproject1 to include the new commits on top of it.
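
The base commit above is picked by hand. One possible way to look it up automatically is sketched below. It relies on the assumption that the filtered commits keep the author date and subject line of their originals (which is the case when no --env-filter or --msg-filter is used) and that this pair is unique enough in your history. The function name find_equivalent_commit is hypothetical.

```shell
# Print the id of the newest commit on <main_branch> whose author timestamp
# and subject line equal those of the tip of <subproject_branch>.
# Prints nothing if no such commit is found.
#   $1: main branch (e.g. master)
#   $2: subproject branch (e.g. subproject1)
find_equivalent_commit() {
  main_branch="$1"
  subproject_branch="$2"
  # %at = author date as Unix timestamp, %s = subject line
  tip_key="$(git log -1 --format='%at %s' "$subproject_branch")" || return 1
  git log --format='%H %at %s' "$main_branch" |
    while read -r commit_id key; do
      if [ "$key" = "$tip_key" ]; then
        printf '%s\n' "$commit_id"
        break
      fi
    done
}
```

With that, `common_base_commit="$(find_equivalent_commit master subproject1)"` could replace the hard-coded revision, after checking that the result is not empty.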

Further optimization/automation:

  • implement better logic for listing the files you want to include ($file_list_regex), or rather to exclude (git ls-files --cached | grep -v -E '$file_list_regex'), for a given subproject
  • make the list of files to include depend on the current commit ($GIT_COMMIT), or check the list in to the repository itself, in case the files to include per subproject change over time
  • find an automated way to determine the 'equivalent' commit of a subproject branch's tip in the current master
  • combine all of it in a nice git alias so you can simply use git update-project subproject1
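
As an illustration of the first two points, the pattern could be generated from a plain list file instead of being written by hand. This is only a sketch: the list format (one path per line, a trailing slash meaning "everything under this directory", lines starting with # ignored) and the function name build_file_list_regex are assumptions, not something the answer prescribes.

```shell
# Turn a list file into an anchored extended regex usable as $file_list_regex.
#   $1: path to a text file with one repository path per line
build_file_list_regex() {
  sed -e '/^[[:space:]]*$/d' \
      -e '/^#/d' \
      -e 's/[][\.*^$+?(){}|]/\\&/g' \
      -e 's/^/^/' \
      -e '/\/$/!s/$/$/' \
      "$1" | paste -s -d '|' -
}
# sed steps, in order: drop blank lines, drop comment lines, escape regex
# metacharacters (so "util.m" only matches a literal dot), anchor the start,
# and anchor the end unless the entry is a directory prefix ending in "/".
# paste then joins all entries with "|".
```

For example, a list containing subproject1/, shared_file1 and lib/util.m yields `^subproject1/|^shared_file1$|^lib/util\.m$`, which can be assigned with `file_list_regex="$(build_file_list_regex subproject1.files)"` (the file name is again hypothetical).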
