How to find the 100 largest GitHub repositories for a past date?


Problem description

I am trying to understand the evolution of the 100 largest repositories on GitHub. I can easily access the 100 largest repositories as of today (measured by total number of contributors, stars, forks or LOC) using the GitHub search function or GithubArchive.org.

However, I would like to look at the 100 largest repositories at a given date in history (say, 1st of April 2011), so that I can track their growth (or decline) from that point on. How can I identify the 100 largest repositories on GitHub (measured by stars, forks, or LOC) for a date in the past?

Solution

I think the GitHub archive project can be of help: http://www.githubarchive.org/

It stores all the public events from the GitHub timeline and exposes them for processing. The events contain info about the repositories, so you should be able to pull the data out of there to fit your use-case.

For example, I've just used the following query in the BigQuery console ( https://bigquery.cloud.google.com/?pli=1 ) to find out the number of forks of the joyent/node repository for the date 2012-03-15:

SELECT repository_forks, created_at FROM [publicdata:samples.github_timeline] WHERE (repository_url = "https://github.com/joyent/node") AND (created_at CONTAINS "2012-03-15") LIMIT 1

And here are the results:

Row forks   created_at   
1   1579    2012-03-15 07:49:54  

Obviously, you would use the BigQuery API to do something similar (extract the data you want, fetch data for a range of dates, etc.).
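To fetch data for a range of dates, one option is to generate one query per day and submit each of them through whatever BigQuery client you use. The following is only a minimal Python sketch of the query-generation part. The helper names top_repos_query and queries_for_range are made up for illustration, the query text follows the legacy-SQL queries in this answer, and actually submitting the queries to BigQuery is left out.

```python
from datetime import date, timedelta

# Table name as used in the queries of this answer (legacy BigQuery SQL syntax).
TABLE = "[publicdata:samples.github_timeline]"


def top_repos_query(day, limit=100):
    """Build the query for the top `limit` repositories by forks on `day`.

    Hypothetical helper: it only builds the query string, it does not run it.
    """
    return (
        "SELECT MAX(repository_forks) AS forks, repository_url "
        f"FROM {TABLE} "
        f'WHERE (created_at CONTAINS "{day.isoformat()}") '
        "GROUP BY repository_url "
        f"ORDER BY forks DESC LIMIT {int(limit)}"
    )


def queries_for_range(start, end, step_days=1, limit=100):
    """Yield (day, query) pairs from `start` to `end` inclusive."""
    if step_days < 1:
        raise ValueError("step_days must be at least 1")
    day = start
    while day <= end:
        yield day, top_repos_query(day, limit)
        day += timedelta(days=step_days)


if __name__ == "__main__":
    # Print one query per day for a three-day window.
    for day, sql in queries_for_range(date(2012, 3, 15), date(2012, 3, 17)):
        print(day, sql)
```

Each generated string can then be run in the BigQuery console or passed to a client library with legacy SQL enabled.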

And here is a query for fetching the single largest repository (by forks) for a given date:

SELECT repository_forks, repository_url FROM [publicdata:samples.github_timeline] WHERE (created_at CONTAINS "2012-03-15") ORDER BY repository_forks DESC LIMIT 1

Result:

Row forks   repository_url   
1   6341    https://github.com/octocat/Spoon-Knife   

And here is the query to fetch the top 100 repositories by forks for a given date:

SELECT MAX(repository_forks) as forks, repository_url FROM [publicdata:samples.github_timeline] WHERE (created_at CONTAINS "2012-03-15") GROUP BY repository_url ORDER BY forks DESC LIMIT 100

Result:

Row forks   repository_url   
1   6341    https://github.com/octocat/Spoon-Knife   
2   4452    https://github.com/twitter/bootstrap     
3   3647    https://github.com/mxcl/homebrew     
4   2888    https://github.com/rails/rails
...
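
The top-100 query groups by repository_url because a repository can show up in many events on the same day, each carrying the fork count at the time of the event; MAX keeps the highest value seen that day. As a rough illustration of that logic outside BigQuery, here is a small Python sketch that applies the same grouping, ordering and limit to rows you have already downloaded. The function name top_repos_by_forks and the (repository_url, repository_forks) tuple layout are assumptions made for this example.

```python
def top_repos_by_forks(events, limit=100):
    """Rank repositories by their highest fork count among the given events.

    `events` is an iterable of (repository_url, repository_forks) tuples
    for a single date (a hypothetical layout chosen for this sketch).
    """
    max_forks = {}
    for url, forks in events:
        # Equivalent of GROUP BY repository_url with MAX(repository_forks).
        if url not in max_forks or forks > max_forks[url]:
            max_forks[url] = forks
    # Equivalent of ORDER BY forks DESC LIMIT <limit>.
    ranked = sorted(max_forks.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

Running the same function over the rows of two different dates and comparing the fork counts per repository_url would give the growth or decline asked about in the question.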
