What is the storage mechanism behind Git Large File Storage?


Question

      GitHub recently introduced an extension to Git for storing large files in a different way. What exactly do they mean by "the extension replaces large files with text pointers inside Git"?

      Solution

      You can see in the git-lfs sources how a "text pointer" is defined:

      type Pointer struct {
          Version string
          Oid     string
          Size    int64
          OidType string
      } 
      

      The smudge and clean sources show that git-lfs relies on a content filter driver in order to:

      • download the actual files on checkout
      • store them in their external source on commit.

      See the pointer specs:

      The core Git LFS idea is that instead of writing large blobs to a Git repository, only a pointer file is written.

      version https://git-lfs.github.com/spec/v1
      oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
      size 12345
      (ending \n)
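
As an illustration of the format above, here is a minimal Go sketch that parses and re-encodes a pointer file. It is not the git-lfs implementation: `ParsePointer` and `Encode` are hypothetical helper names, the parser assumes plain `key value` lines exactly as in the sample, and `strings.Cut` assumes Go 1.18 or later.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Pointer mirrors the struct quoted from the git-lfs sources.
type Pointer struct {
	Version string
	Oid     string
	Size    int64
	OidType string
}

// Encode renders the pointer as "key value" lines, each ending in \n.
func (p Pointer) Encode() string {
	return fmt.Sprintf("version %s\noid %s:%s\nsize %d\n",
		p.Version, p.OidType, p.Oid, p.Size)
}

// ParsePointer (hypothetical helper) reads "key value" lines into a Pointer.
func ParsePointer(text string) (Pointer, error) {
	var p Pointer
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		key, value, ok := strings.Cut(sc.Text(), " ")
		if !ok {
			return p, fmt.Errorf("malformed line %q", sc.Text())
		}
		switch key {
		case "version":
			p.Version = value
		case "oid":
			// The value has the form "<type>:<hash>", e.g. "sha256:4d7a...".
			typ, hash, found := strings.Cut(value, ":")
			if !found {
				return p, fmt.Errorf("malformed oid %q", value)
			}
			p.OidType, p.Oid = typ, hash
		case "size":
			n, err := strconv.ParseInt(value, 10, 64)
			if err != nil {
				return p, err
			}
			p.Size = n
		}
	}
	if p.Version == "" || p.Oid == "" {
		return p, fmt.Errorf("not a pointer file")
	}
	return p, nil
}

func main() {
	text := "version https://git-lfs.github.com/spec/v1\n" +
		"oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393\n" +
		"size 12345\n"
	p, err := ParsePointer(text)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s object %s..., %d bytes\n", p.OidType, p.Oid[:8], p.Size)
	fmt.Println("round-trips unchanged:", p.Encode() == text)
}
```

The point of the sketch is that Git itself only ever stores these few lines of text; the `oid` is the key under which the real content is kept elsewhere.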
      

      Git LFS needs a URL endpoint to talk to a remote server.
      A Git repository can have different Git LFS endpoints for different remotes.

      The actual file is uploaded to or downloaded from a server that implements the Git LFS API.

      This is confirmed by the git-lfs man page, which mentions:

      The actual file gets pushed to a Git LFS API

      You need a Git server that implements that API in order to support uploading and downloading binary content.


      Regarding the content filter driver (which has existed in Git for a long time, well before lfs, and is used here by lfs to add this "large file management" feature), this is where the bulk of the work happens:

      The smudge filter runs as files are being checked out from the Git repository to the working directory.
      Git sends the content of the Git blob as STDIN, and expects the content to write to the working directory as STDOUT.

      • Read 100 bytes.
      • If the content is ASCII and matches the pointer file format:
        • Look for the file in .git/lfs/objects/{OID}.
        • If it's not there, download it from the server.
        • Read its contents to STDOUT.
      • Otherwise, simply pass the STDIN out through STDOUT.

      The clean filter runs as files are added to repositories.
      Git sends the content of the file being added as STDIN, and expects the content to write to Git as STDOUT.

      • Stream binary content from STDIN to a temp file, while calculating its SHA-256 signature.
      • Check for the file at .git/lfs/objects/{OID}.
      • If it does not exist:
        • Queue the OID to be uploaded.
        • Move the temp file to .git/lfs/objects/{OID}.
      • Delete the temp file.
      • Write the pointer file to STDOUT.


      Git 2.11 (Nov. 2016) has a commit that explains in more detail how this works: commit edcc858, signed off by Lars Schneider, with help from Martin-Louis Bright.

      convert: add filter.<driver>.process option

      Git's clean/smudge mechanism invokes an external filter process for every single blob that is affected by a filter. If Git filters a lot of blobs then the startup time of the external filter processes can become a significant part of the overall Git execution time.

      In a preliminary performance test this developer used a clean/smudge filter written in golang to filter 12,000 files. This process took 364s with the existing filter mechanism and 5s with the new mechanism. See details here: git-lfs/git-lfs#1382

      This patch adds the filter.<driver>.process string option which, if used, keeps the external filter process running and processes all blobs with the packet format (pkt-line) based protocol over standard input and standard output.
      The full protocol is explained in detail in Documentation/gitattributes.txt.

      A few key decisions:

      • The long running filter process is referred to as filter protocol version 2 because the existing single shot filter invocation is considered version 1.
      • Git sends a welcome message and expects a response right after the external filter process has started. This ensures that Git will not hang if a version 1 filter is incorrectly used with the filter.<driver>.process option for version 2 filters. In addition, Git can detect this kind of error and warn the user.
      • The status of a filter operation (e.g. "success" or "error") is set before the actual response and (if necessary!) re-set after the response. The advantage of this two step status response is that if the filter detects an error early, then the filter can communicate this and Git does not even need to create structures to read the response.
      • All status responses are pkt-line lists terminated with a flush packet. This allows us to send other status fields with the same protocol in the future.

      As a consequence, a warning was added to the documentation in Git 2.12 (Q1 2017).

      See commit 7eeda8b (18 Dec 2016), and commit c6b0831 (03 Dec 2016) by Lars Schneider (larsxschneider).
      (Merged by Junio C Hamano -- gitster -- in commit 08721a0, 27 Dec 2016)

      docs: warn about possible '=' in clean/smudge filter process values

      A pathname value in a clean/smudge filter process "key=value" pair can contain the '=' character (introduced in edcc858).
      Make the user aware of this issue in the docs, add a corresponding test case, and fix the issue in filter process value parser of the example implementation in contrib.
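
The pitfall described in that commit is easy to reproduce. The following Go sketch, with a hypothetical `parseKeyValue` helper and a made-up pathname, splits on the first '=' only, so that a value containing '=' stays intact; `strings.Cut` assumes Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKeyValue splits a "key=value" packet on the first '=' only.
func parseKeyValue(line string) (key, value string, ok bool) {
	return strings.Cut(strings.TrimSuffix(line, "\n"), "=")
}

func main() {
	line := "pathname=docs/a=b.txt\n" // made-up pathname that contains '='

	key, value, ok := parseKeyValue(line)
	fmt.Printf("first '=' only: key=%q value=%q ok=%v\n", key, value, ok)

	// A naive split on every '=' would cut the pathname into pieces.
	fmt.Printf("naive split:    %q\n", strings.Split(strings.TrimSuffix(line, "\n"), "="))
}
```

A parser that splits on every '=' and keeps only the second field would silently truncate such a pathname, which is the kind of bug the commit fixed in the example filter in contrib.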
