Large Files in Source Control (TFS)

Question

Recently at the office we have been talking about placing large files into our TFS repository. The files themselves are XML, usually 100-200MB in size, and sometimes as large as 1GB. We use them as data for automated testing and they are mostly static (one gets a minor tweak every year or so). Anyway, there is a notion that putting files like this into the repository is a no-no because they are "big" and that will make things "slow" (outside of the original check-in/out) but we don't really have any evidence to back this up.

So my question is: what are the pros, cons, and implications of putting large static files into a source code repository like TFS (or SVN, Git, etc., for that matter)? Is it OK? Will it "fill up the server" or have some other dire consequence?

Recommended Answer

tl;dr: TFS is designed to handle large files gracefully. The largest hurdle you'll have to face is network bandwidth to upload/download the files. The second issue is that of storage space on the server. Assuming you've considered these two issues, you shouldn't have any other problems.

Network bandwidth: There is very little overhead in checking in or getting files; it should be as fast as a typical HTTP upload or download. If your clients are remote from the server, network-wise, they may benefit from having a TFS source control proxy on their local network to speed up downloads.

Note that unlike some version control systems, TFS does not compute and transmit deltas when uploading or downloading new content. That is to say, if a client had revision 4 of a large text file, and revision 5 had added a few lines at the end, some version control tools optimize this experience to only send the changed lines. TFS does not do this optimization, so if your files change frequently, clients will need to download the entirety of the file each time.
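Because every get transfers the whole file, the bandwidth cost is simple arithmetic. The following is a rough sketch, not a measurement of TFS itself: the function names and the example figures (link speed, client count) are hypothetical, and protocol overhead and any compression are ignored.

```python
def transfer_seconds(file_size_mb: float, link_mbps: float) -> float:
    """Seconds to move one whole file over a link of the given speed.

    Assumes the full file is sent every time (no delta transfer) and
    ignores protocol overhead and compression.
    """
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    megabits = file_size_mb * 8  # 1 byte = 8 bits
    return megabits / link_mbps


def daily_download_mb(file_size_mb: float, revisions_per_day: int, clients: int) -> float:
    """Total MB pulled per day if every client gets every new revision in full."""
    return file_size_mb * revisions_per_day * clients


if __name__ == "__main__":
    # Hypothetical numbers: a 1 GB file over a 100 Mbit/s link.
    print(transfer_seconds(1000, 100))
    # Hypothetical numbers: a 200 MB file, 3 revisions a day, 10 clients.
    print(daily_download_mb(200, 3, 10))
```

For files that change about once a year, as in the question, the second figure is effectively a one-time cost per client.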

Server storage: Disk space on the server is fairly straightforward: you'll need enough space to hold the files, and there's little overhead beyond that. TFS will not slow down just because your repository contains large files.

If these files get modified frequently, you will also need to account for the disk space used by the revisions. TFS stores "deltas" between file revisions, that is, a binary difference between two versions. So if the file's contents change minimally between revisions, as in the typical use case with text files, the storage cost should be low. However, if the entirety of the contents changes, as would be typical with binary files like images or DLLs, then you'll need enough disk space to store each revision. (Of course, you can destroy previous revisions in order to regain that space.)

One note on deltas in TFS: to reduce overhead at check-in time, the deltas between revisions are not computed immediately. Instead, a background "deltafication" job runs nightly to compute the deltas and trim space. Until that point, each revision is stored in its entirety in the database. So if you have a very large text file with a lot of revisions happening daily, your disk space requirements will need to take this into account.
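To get a feel for the numbers, here is a back-of-the-envelope sketch of that storage pattern. It is a simplified model, not TFS's actual accounting: the function names are hypothetical, and `delta_fraction` (the size of a stored delta relative to the full file) is an assumption you would need to estimate for your own files.

```python
def storage_after_nightly_job_mb(file_size_mb: float, revisions_per_day: int,
                                 delta_fraction: float, days: int) -> float:
    """Estimated storage once all revisions so far have been turned into deltas.

    Model: one full copy of the file plus one delta per revision, where each
    delta is assumed to cost `delta_fraction` of the full file size.
    """
    delta_mb = file_size_mb * delta_fraction
    return file_size_mb + days * revisions_per_day * delta_mb


def peak_storage_mb(file_size_mb: float, revisions_per_day: int,
                    delta_fraction: float, days: int) -> float:
    """Estimated storage at the end of day `days`, just before the nightly job.

    Earlier days are already stored as deltas; the current day's revisions
    are still stored in full.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    settled = storage_after_nightly_job_mb(
        file_size_mb, revisions_per_day, delta_fraction, days - 1)
    return settled + revisions_per_day * file_size_mb


if __name__ == "__main__":
    # Hypothetical numbers: 200 MB file, 5 check-ins a day, 1% deltas, 30 days.
    print(storage_after_nightly_job_mb(200, 5, 0.01, 30))
    print(peak_storage_mb(200, 5, 0.01, 30))
```

In this model the gap between the two figures is the temporary cost of the day's not-yet-deltafied revisions, which is the headroom the paragraph above asks you to plan for.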

Client storage: Clients will also need enough disk space to hold these files (although only at the revision that they've downloaded). This can be mitigated in your workspace mappings by cloaking the large files (or otherwise excluding them from your workspace) when they're not needed.

Caveat: Getting historic versions: If you find yourself requesting historical versions of large files frequently (for example: I want the ISO image from seven changesets ago), then you're going to make the server apply the delta chain to get back to that revision. If you have multiple clients doing this concurrently, this could tax the server's memory.
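To illustrate why walking a delta chain costs more the further back you go, here is a toy model. It is only a sketch of the general technique: TFS's real delta format and chain direction are not described in this answer, so the line-based deltas, the "latest stored in full, older versions reached through reverse deltas" layout, and all function names are assumptions made for the example.

```python
from difflib import SequenceMatcher


def make_delta(source, target):
    """Build ops that rebuild `target` (list of lines) from `source`.

    Ops are ("copy", start, end) to reuse source lines, or ("data", lines)
    to insert new lines. Deleted source lines simply produce no op.
    """
    ops = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert"
            ops.append(("data", target[j1:j2]))
    return ops


def apply_delta(source, ops):
    """Rebuild a version by replaying ops against `source`."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(source[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


def get_revision(latest, reverse_deltas, steps_back):
    """Walk back `steps_back` revisions from `latest`.

    reverse_deltas[0] turns the latest version into the one before it,
    reverse_deltas[1] goes one step further back, and so on. Every step
    holds a full copy of the file in memory while it is rebuilt.
    """
    content = latest
    for ops in reverse_deltas[:steps_back]:
        content = apply_delta(content, ops)
    return content


if __name__ == "__main__":
    revisions = [["a"], ["a", "b"], ["a", "b", "c"]]
    latest = revisions[-1]
    reverse_deltas = [make_delta(revisions[i], revisions[i - 1])
                      for i in range(len(revisions) - 1, 0, -1)]
    print(get_revision(latest, reverse_deltas, 2))
```

In this model, fetching the latest revision applies no deltas at all, while a request for something seven revisions back replays seven of them, each against a full in-memory copy of the file.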
