Transferring a legacy code base from CVS to a distributed repository (e.g. git or mercurial): suggestions needed for initial repository design

Introduction and Background

We are in the process of changing our source control system and are currently evaluating git and mercurial. The total code base is around 6 million lines of code, so not massive and not really small either.

Let me first start off with a very brief introduction to how the current repository design looks.

We have one base folder for the complete code base, and beneath that level there are all sorts of modules used in several different contexts. For example "dllproject1" and "dllproject2" can be looked at as completely separate projects.

The software we are developing is something we call a configurator, which can be customized endlessly for different customer needs. In total we probably have 50 different versions of it. However, they have one thing in common: they all share a couple of mandatory modules (mandatory_module1 ..). These folders basically contain kernel/core code, common language resources, etc. Each customization can then be any combination of the other modules (module1 ..).

Since we currently are using cvs we've added aliases in the CVSROOT/modules file. They might look something like:

core -a mandatory_module1 mandatory_module2 mandatory_module3
project_x -a module1 module3 module5 core

So if someone decides to work on project_x, he/she can quickly check out the modules needed with:

base> cvs co project_x

Questions

Intuitively it just feels wrong to have the base folder as a single repository. As a programmer you should be able to check out the exact subset of code needed for the project you are currently working on. What are your thoughts on this?

On the other hand it feels more right to have each of these modules in separate repositories. But this makes it harder for programmers to check out the modules that they need. You should be able to do this by a single command. So my question is: Are there similar ways of defining aliases in git/mercurial?

Any other questions, suggestions, pointers are highly welcome!

PS. I have searched for similar questions but didn’t feel that any of them applied 100% to my situation.

Solution

Just a quick comment to remind you that:

  • those migrations often offer the opportunity to reorganize the sources, not along modules (each with its own repository) but rather along a functional domain split (several modules of the same functional domain being put in the same repository).

Then submodules are to be used, as a way to define a configuration.
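As a rough illustration of "submodules as a configuration", the sketch below expands the CVS aliases from the question into a flat module list and prints one `git submodule add` command per module. The base URL, the one-repository-per-module layout, and the helper names `expand` and `submodule_commands` are assumptions made for this example; with a functional-domain split the table would list domain repositories instead.

```python
# Alias table mirroring the CVSROOT/modules entries quoted in the question.
ALIASES = {
    "core": ["mandatory_module1", "mandatory_module2", "mandatory_module3"],
    "project_x": ["module1", "module3", "module5", "core"],
}

# Assumed hosting location; the question does not say where repositories live.
BASE_URL = "ssh://git.example.com/repos"


def expand(name, aliases, seen=None):
    """Recursively expand an alias into a flat, ordered list of module names."""
    seen = set() if seen is None else seen
    if name in seen:
        raise ValueError(f"alias cycle detected at {name!r}")
    if name not in aliases:
        return [name]  # a plain module, not an alias
    seen = seen | {name}
    modules = []
    for member in aliases[name]:
        for module in expand(member, aliases, seen):
            if module not in modules:  # keep first occurrence, drop repeats
                modules.append(module)
    return modules


def submodule_commands(project, aliases, base_url=BASE_URL):
    """Build the `git submodule add <url> <path>` commands for one project."""
    return [
        ["git", "submodule", "add", f"{base_url}/{module}.git", module]
        for module in expand(project, aliases)
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in submodule_commands("project_x", ALIASES):
        print(" ".join(command))
```

Run inside a freshly created `project_x` superproject, the printed commands would record each module in `.gitmodules`, so the superproject itself plays the role of the CVS alias.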

[...] CVS, ie it really ends up being pretty much oriented to a "one file at a time" model.

Which is nice in that you can have a million files, and then only check out a few of them - you'll never even see the impact of the other 999,995 files.

Git fundamentally never really looks at less than the whole repo. Even if you limit things a bit (ie check out just a portion, or have the history go back just a bit), git ends up still always caring about the whole thing, and carrying the knowledge around.

So git scales really badly if you force it to look at everything as one huge repository. I don't think that part is really fixable, although we can probably improve on it.

And yes, then there's the "big file" issues. I really don't know what to do about huge files. We suck at them, I know.


Those two aforementioned points advocate a more component-oriented approach for large systems (and large legacy repositories).

With Git submodules, you can check them out in your project (even if it is a two-step process). There are, however, tools that can make submodule management easier (git.rake, for instance).
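The two-step checkout, and the single-command alternative that recent Git versions offer, might be wrapped as in the sketch below. The superproject URL is hypothetical, and `checkout_commands` and `run` are helper names invented for this example; the Git options used (`clone --recurse-submodules`, `submodule update --init --recursive`, `-C`) are standard.

```python
import subprocess


def checkout_commands(url, target, one_step=True):
    """Return the git commands that check out a superproject and its submodules."""
    if one_step:
        # Single command: clone and initialize all submodules in one go.
        return [["git", "clone", "--recurse-submodules", url, target]]
    # Two-step process: clone first, then fetch the submodules.
    return [
        ["git", "clone", url, target],
        ["git", "-C", target, "submodule", "update", "--init", "--recursive"],
    ]


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical superproject URL standing in for the old `cvs co project_x`.
    run(checkout_commands("ssh://git.example.com/repos/project_x.git", "project_x"))
```

The one-step form is the closest equivalent to the `cvs co project_x` alias, assuming each configuration is kept as its own superproject.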


When I'm thinking of fixing a bug in a module that's shared between several projects, I just fix the bug and commit it and all just do their updates

That is what I describe in the post Vendor Branch as the "system approach": everyone works on the latest (HEAD) of everything, and it is effective for a small number of projects.
For a large number of modules though, the notion of "module" is still very useful, but its management is not the same with a DVCS:

  • for closely related modules (aka "in the same functional domain", like "all modules related to PNL - Profit aNd Losses" or "Risk analysis", in a financial domain), you do need to work with the latest (HEAD) of all components involved.
    That would be achieved with the use of a subtree strategy, not in order for you to publish (push) corrections on those other modules, but to track work done by other teams.
    Git allows that with the added bonus that this "tracking" does not have to take place between your repository and one "central" repository, but can also take place between you and the local repository of the other team, allowing for very quick back-and-forth integration and testing between projects of a similar nature.

  • however, for modules which are not directly in your functional domain, submodules are a better option, because they refer to a fixed version of a module (a commit):
    when a low-level framework changes, you do not want the change to be propagated instantaneously, since it would impact all the other teams, which would then have to drop what they were doing to adapt their code to that new version (you do, however, want all the other teams to be aware of this new version, so that they do not forget to update that low-level component or "module").
    That allows you to work only with official, stable, identified versions of other modules, and not with potentially unstable or not fully tested HEADs.
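The difference between tracking HEAD (subtree-style) and pinning a commit (submodule-style) can be sketched with a toy model. This is an illustration of the idea only, not of Git's internal data structures; the class names, module names, and commit ids are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleRepo:
    """Toy stand-in for a module repository: an ordered list of commit ids."""
    name: str
    commits: list = field(default_factory=list)

    def commit(self, commit_id):
        self.commits.append(commit_id)

    @property
    def head(self):
        return self.commits[-1]


@dataclass
class Project:
    """A configuration mixing modules tracked at HEAD and modules pinned to a commit."""
    tracked: dict = field(default_factory=dict)  # name -> repo, always seen at HEAD
    pinned: dict = field(default_factory=dict)   # name -> (repo, pinned commit id)

    def track(self, repo):
        self.tracked[repo.name] = repo

    def pin(self, repo, commit_id=None):
        self.pinned[repo.name] = (repo, commit_id or repo.head)

    def versions(self):
        """Commit id the project currently builds against, per module."""
        result = {name: repo.head for name, repo in self.tracked.items()}
        result.update({name: cid for name, (_, cid) in self.pinned.items()})
        return result

    def outdated(self):
        """Pinned modules for which a newer commit exists upstream."""
        return [name for name, (repo, cid) in self.pinned.items() if repo.head != cid]


if __name__ == "__main__":
    risk = ModuleRepo("risk_analysis", ["r1"])  # same functional domain
    framework = ModuleRepo("framework", ["f1"])  # low-level, other domain
    project = Project()
    project.track(risk)
    project.pin(framework)
    risk.commit("r2")
    framework.commit("f2")
    print(project.versions())
    print(project.outdated())
```

In this model the tracked module moves to the new commit immediately, while the pinned framework stays at its recorded commit until someone calls `pin` again; `outdated` corresponds to the teams being made aware that a newer version exists.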
