Are there any good workarounds to the GitHub 100MB file size limit for text files?


Problem description



I have a 190 MB plain text file that I want to track on github.

The text file is a pronunciation lexicon file for our text-to-speech engine. We regularly add and modify lines in it, and the diffs are fairly small, so it's perfect for git in that sense.

However, GitHub has a strict 100 MB file size limit in place. I have tried the GitHub Large File Storage service, but that uploads a new version of the entire 190 MB file every time it changes - so that would quickly grow to many gigabytes if I go down that path.

I would like to keep the file as one file instead of splitting it, because that's how our workflow currently works, and it would require some coding to allow multiple text files as input/output in our tools (and we don't have many development resources).

One idea I've had is that maybe it's possible to set up pre- and post-commit hooks that split and concatenate the big file automatically? Would that be possible?
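To make the idea concrete, here is a rough sketch of what I imagine the hook pair would do (lexicon.txt, the 50m chunk size, and the part naming are placeholders, not our real setup):

    #!/bin/sh
    # .git/hooks/pre-commit (sketch): split the big file into chunks of
    # complete lines, each below GitHub's limit, and stage the chunks
    # instead of the original file.
    split -C 50m lexicon.txt lexicon.part.
    git add lexicon.part.*

    #!/bin/sh
    # .git/hooks/post-checkout (sketch): rebuild the working file from
    # the committed chunks (glob order is alphabetical, matching split's
    # output order).
    cat lexicon.part.* > lexicon.txt

(split -C keeps lines intact, unlike split -b, so the chunks would stay diffable as text.)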

Other ideas?

Edit: I am aware of the 100 MB file size limitation described in the similar questions here on StackOverflow, but I don't consider my question a duplicate because I'm asking for the specific case where the diffs are small and frequent (I'm not trying to upload a big ZIP file or anything). However, my understanding is that git-lfs is only appropriate for files that rarely change, and that normal git would be the perfect fit for the kind of file I'm describing; except that GitHub has a file size restriction.

Update: I spent yesterday experimenting with creating a small cross-platform program that splits and joins files into smaller files using git hooks. It kind of works, but it's not really satisfactory. Your big text file needs to be excluded by .gitignore, which makes git unaware of whether or not it has changed. The split files are not initially detected by git status or git commit, which leads to the same issue as described in this SO question, and that is quite annoying: Pre-commit script creates mysqldump file, but "nothing to commit (working directory clean)"? Setting up a cron job (Linux) or a scheduled task (Windows) to automatically regenerate the split files regularly might fix that, but it's not easy to set up automatically, it might cause performance issues on the user's computer, and it's just not a very elegant solution. Some hacky additions, like dynamically modifying .gitignore, might also be needed, and in no case would you get a diff of the actual text file, only of the split files (although that might be acceptable, since they would be very similar).

So, having slept on it, today I think the git hook approach is not a good option after all, as it has too many quirks. As suggested by @PyRulez, I think I'll have to look at services other than GitHub (unfortunately, since I love GitHub). A hosted solution would be preferable, to avoid having to manage our own server. I'd also like the repository to be publicly available...

Update 2: I've looked at some alternatives to GitHub, and currently I'm leaning towards GitLab. I've contacted GitHub support about the possibility of raising the 100 MB limit, but if they won't do that, I'll just switch to GitLab for this particular project.

Solution

Clean and Smudge

You can use clean and smudge filters to compress your file. Normally this isn't necessary, since git compresses objects internally, but since GitHub is acting weird here, it may help. The main commands would look like this:

    git config filter.compress.clean "gzip -n"
    git config filter.compress.smudge "gzip -d"

(The smudge value has to be quoted so that git stores "gzip -d" as a single command. gzip's -n flag omits the timestamp from the compressed header, so identical input always compresses to identical output and git doesn't see spurious changes.)
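For the filter to actually run on the file, it also needs a matching entry in .gitattributes, committed to the repository ("lexicon.txt" here is a placeholder for your actual file name):

    lexicon.txt filter=compress

Without that line, git never invokes the compress filter and the file is stored as-is.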

GitHub will see this as a compressed file, but on each computer, it will appear to be a text file.

See https://git-scm.com/book/en/v2/Customizing-Git-Git-Attributes for more details.

Alternatively, you could have the clean filter post the content to an online pastebin such as http://pastebin.com/, and the smudge filter fetch it back. Many other combinations are possible with clean and smudge.
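As a purely hypothetical illustration (the endpoint and its interface below are placeholders, not a real pastebin API), the pair could be two small scripts: clean uploads the content and prints only a paste ID for git to store, while smudge reads that ID back and downloads the content:

    #!/bin/sh
    # clean-paste.sh (hypothetical): upload stdin to a paste service and
    # print the returned paste ID; https://paste.example/ is a placeholder.
    curl -s --data-binary @- https://paste.example/api/upload

    #!/bin/sh
    # smudge-paste.sh (hypothetical): read a paste ID from stdin and
    # print the stored content.
    read -r id
    curl -s "https://paste.example/api/raw/$id"

    git config filter.paste.clean ./clean-paste.sh
    git config filter.paste.smudge ./smudge-paste.sh

Note that this would make every clone and checkout depend on the paste service staying available.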
