How does GitHub figure out a project's language?
I was recently working on a GitHub project in both JavaScript and C++, and noticed that GitHub tagged the project as C++. If you have to pick a single language, this is probably the correct designation, since the C++ code is compiled as a JavaScript library, but it made me wonder: how does GitHub figure out which language to tag each project with?
Update April 2013, by nuclearsandwich (GitHub support team or "supportocat"):
The help page "My repository is marked as the wrong language" now says that GitHub uses the Linguist library to determine file languages for syntax highlighting and repository statistics. Linguist excludes certain file names and paths, such as vendored files and directories, from the statistics.
The help page "Why isn't my favorite language recognized?" adds:
If your desired language is not receiving syntax highlighting you can contribute to the Linguist library to add it.
(Original answer, Oct. 2012)
This thread on GitHub support explains it:
It just sums up file sizes for each extension. Largest one "wins".
We'd like to avoid opening files up and parsing their content, as both would slow down the process... but that might be the only method of resolving conflicts like this one.
Since this is not 100% accurate, it has led some to add:
I, too, would vote for a simple manual-override switch for the cases where the guess is wrong.
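The size-per-extension heuristic quoted above can be sketched in a few lines of Python. This is only an illustration of the idea as described in the support thread, not GitHub's actual code; the function names `bytes_per_extension` and `dominant_extension` are hypothetical, and skipping the `.git` directory is an assumption made for the sketch.

```python
import os
from collections import defaultdict


def bytes_per_extension(root):
    """Sum file sizes under `root`, grouped by lowercase file extension."""
    totals = defaultdict(int)
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption for this sketch: ignore Git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext:  # files without an extension are not counted
                totals[ext] += os.path.getsize(os.path.join(dirpath, name))
    return dict(totals)


def dominant_extension(root):
    """Return the extension with the largest total size, or None if none found."""
    totals = bytes_per_extension(root)
    if not totals:
        return None
    return max(totals, key=totals.get)
```

Calling `dominant_extension(".")` in a checkout returns the extension that accounts for the most bytes. No file is ever opened, which is why the approach is fast but can guess wrong, for example when a large bundled library outweighs the project's own code.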
Note: as Mark Rushakoff mentions in his answer (upvoted), the guessing has gotten better since then thanks to the Linguist project (open-sourced in June 2011).
You can see there are still issues though: GitHub Linguist Issues.
See here for more details:
Once the language has been detected, it is passed to Albino, a Pygments wrapper, which does the actual syntax highlighting.
You can also add Linguist directives in a .gitattributes file.
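As a rough illustration, a `.gitattributes` file using Linguist's override attributes might look like the fragment below. The attribute names (`linguist-language`, `linguist-vendored`, `linguist-documentation`) come from Linguist's documentation; the paths are hypothetical and would need to match your own repository layout.

```
# Hypothetical paths; adjust to your repository layout.
# Count header files as C++ in the language statistics.
*.h linguist-language=C++
# Exclude third-party code from the statistics.
third_party/** linguist-vendored
# Exclude documentation from the statistics.
docs/** linguist-documentation
```

This gives the manual override that commenters asked for: the repository's own configuration, not the size-based guess, decides how those files are counted.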