Fetch/Pull Part of Very Large Repository?


Problem description



This is probably obvious and has been asked many times in different ways before, but I have not been able to find the answer after searching for some time.

Assume the following:

  • I have, say, a 500GB disk at the local end;
  • I have a 100 terabyte remote repository; therefore, the cost of cloning the entire repository is simply not feasible;
  • the working directory used to create the remote repository was composed of 1000 top level directories DIR001, DIR002, ... DIR00N, each containing multiple subdirectories with files only under the leaf subdirectories (e.g. DIR001/subdir1/fileA1 ... DIR001/subdir1/fileAN and DIR001/subdir2/fileB1 ... DIR001/subdir2/fileBN, ...);
  • I did NOT explicitly tag or branch directories DIR001, DIR002, ... DIR00N or anything else for that matter
  • I init a brand new local git repository

How do I efficiently pull or fetch the last committed versions of, say, DIR001/subdir2/fileB1 ... DIR001/subdir2/fileBN from the remote repository and nothing else?

AND

How do I fetch just the last committed version of a single file from DIR001/subdir2/fileB1 ... DIR001/subdir2/fileBN from the remote repository and nothing else?

AND

How do I efficiently pull or fetch a previously committed version of a subset of said files and nothing else?

Maybe fetch/pull is not the correct command for this.

Solution

The answer to "Partial cloning" can help you start experimenting with shallow clones.
But it will be limited:

  • to a certain depth, and/or to certain branches,
  • but not to certain files or directories (you can check out a single file or directory through sparse checkout, but when this answer was written you still had to get the full repo first; later Git releases added partial clone via git clone --filter, which needs server support),
  • or even to a certain commit.
    (Git 2.5 (Q2 2015) supports fetching a single commit! See "Pull a specific commit from a remote git repository").
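
The options in the list above can be tried out safely against a throwaway repository. The following is a minimal sketch, not a recipe for a real 100 TB remote: it builds a tiny local "remote" (the temporary paths, the demo identity and the file contents are all made up for illustration), then performs a shallow clone, a partial clone combined with sparse checkout, and a single-commit fetch. The partial clone step relies on features added after this answer was written (git clone --filter and --sparse, Git 2.25 or later), and both it and the fetch-by-hash step depend on server-side settings that a hosted remote may or may not enable.

```shell
#!/bin/sh
set -eu

# Throwaway "remote" so the sketch runs offline; names mirror the question.
work=$(mktemp -d)
git init -q "$work/remote"
cd "$work/remote"
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir -p DIR001/subdir2 DIR002/subdir1
for i in 1 2 3; do
    echo "version $i" > DIR001/subdir2/fileB1
    echo "version $i" > DIR002/subdir1/fileA1
    git add .
    git commit -q -m "commit $i"
done
# Server-side settings: allow object filters and requests for arbitrary commits.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
old_commit=$(git rev-parse HEAD~1)
url="file://$work/remote"

# 1) Shallow clone: history limited to the latest commit.
git clone -q --depth 1 "$url" "$work/shallow"
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "commits in shallow clone: $shallow_count"

# 2) Partial clone plus sparse checkout (Git 2.25+, needs server support):
#    file contents are downloaded lazily and only DIR001/subdir2 is checked out.
git clone -q --depth 1 --filter=blob:none --sparse "$url" "$work/partial"
git -C "$work/partial" sparse-checkout set DIR001/subdir2

# 3) Fetch one specific older commit into a brand new repository (Git 2.5+),
#    then check out only the wanted subdirectory from it.
git init -q "$work/single"
git -C "$work/single" fetch -q --depth 1 "$url" "$old_commit"
fetched=$(git -C "$work/single" rev-parse FETCH_HEAD)
git -C "$work/single" checkout -q FETCH_HEAD -- DIR001/subdir2
echo "fetched commit: $fetched"
```

Steps 1 and 3 still transfer every file of the fetched commit, so only step 2 actually avoids downloading the directories you do not need.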

The real solution, though, would be to split the huge remote repo into submodules.
See "What are Git limits" or "Git style backup of binary files" for illustrations of this kind of situation.
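
As a rough illustration of the submodule layout, here is a minimal sketch in which each top-level directory lives in its own repository and a small superproject only records which commit of each part it uses. The repository names, paths and file contents are hypothetical, and the protocol.file.allow override is needed only because the demo uses local file:// URLs.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Demo helper: create a small repository containing one committed file.
make_repo() {
    git init -q "$1"
    git -C "$1" config user.email "demo@example.com"
    git -C "$1" config user.name "Demo"
    echo "$2" > "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" commit -q -m "initial commit"
}

make_repo "$work/DIR001-src" "content of DIR001"
make_repo "$work/DIR002-src" "content of DIR002"

# Superproject: stores only references to the parts, so it stays small.
git init -q "$work/super"
cd "$work/super"
git config user.email "demo@example.com"
git config user.name "Demo"
git -c protocol.file.allow=always submodule --quiet add "file://$work/DIR001-src" DIR001
git -c protocol.file.allow=always submodule --quiet add "file://$work/DIR002-src" DIR002
git commit -q -m "add parts as submodules"

# A consumer clones the superproject, then downloads just the part it needs.
git clone -q "file://$work/super" "$work/consumer"
cd "$work/consumer"
git -c protocol.file.allow=always submodule --quiet update --init DIR001
ls DIR001
```

After the last step DIR001 is populated while DIR002 remains an empty placeholder, which is the per-directory granularity the question asks for.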


Update April 2015:

Git Large File Storage (LFS), announced by GitHub in April 2015, would make pull/fetch much more efficient.

The project is git-lfs (see git-lfs.github.com), and it is tested with a server that supports it, lfs-test-server.
You store only metadata in the Git repo, and the large files elsewhere.
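
To make the "metadata only" point concrete, here is a small sketch that assumes the git-lfs extension is installed locally (it skips itself otherwise). The *.bin pattern, the file name and the 1 MiB dummy payload are illustrative choices, not something the original answer prescribes.

```shell
#!/bin/sh
set -eu

lfs_demo=skipped
if git lfs version >/dev/null 2>&1; then
    repo=$(mktemp -d)
    git init -q "$repo"
    cd "$repo"
    git config user.email "demo@example.com"
    git config user.name "Demo"
    git lfs install --local >/dev/null     # enable the LFS filters for this repo only
    git lfs track "*.bin" >/dev/null       # writes a rule to .gitattributes
    head -c 1048576 /dev/zero > big.bin    # stand-in for a large binary file
    git add .gitattributes big.bin
    git commit -q -m "track binaries with LFS"
    # The blob stored in Git is a short text pointer, not the 1 MiB payload.
    git cat-file -p HEAD:big.bin
    lfs_demo=done
else
    echo "git-lfs is not installed; skipping the demo"
fi
```

The pointer printed at the end holds a version line, a SHA-256 object id and the file size; the payload itself is kept in the LFS store and uploaded to the LFS server on push.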
