Get the metadata for the first N commits in a remote Git repository
Question
Using the following GitHub API, it is possible to get the metadata for the commits in a repository, ordered from the latest to the oldest:
https://api.github.com/repos/git/git/commits
Is there a way to obtain similar metadata but in chronological order, that is, starting with the oldest commits in the repository?
NOTE: I want to obtain such metadata without having to download the full repository.
Thanks
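Recommended answer
This is possible with a workaround that uses the GraphQL API. The method is essentially the same as the one for getting the first commit in a repository: fetch the latest commit and return totalCount and endCursor: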
{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 1) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}
It returns something like this for totalCount and the pageInfo object:
"totalCount": 950329,
"pageInfo": {
"endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 0"
}
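As a minimal sketch of reading these two values out of a response, the helper below walks the same path the query uses. The function name extract_history_info and the sample payload are illustrative assumptions, not part of the GitHub API; the "errors" check relies on the usual GraphQL convention of reporting failures under a top-level "errors" key.

```python
from typing import Any, Dict, Tuple


def extract_history_info(payload: Dict[str, Any]) -> Tuple[int, str]:
    """Return (totalCount, endCursor) from a decoded GraphQL response.

    Hypothetical helper: the key path mirrors the query above.
    """
    # GraphQL servers report failures under a top-level "errors" key.
    if payload.get("errors"):
        raise RuntimeError(f"GraphQL errors: {payload['errors']}")
    history = payload["data"]["repository"]["ref"]["target"]["history"]
    return history["totalCount"], history["pageInfo"]["endCursor"]


# Sample payload built from the values shown above (not a live response).
sample = {
    "data": {
        "repository": {
            "ref": {
                "target": {
                    "history": {
                        "totalCount": 950329,
                        "pageInfo": {
                            "endCursor": "b961f8dc8976c091180839f4483d67b7c2ca2578 0"
                        },
                    }
                }
            }
        }
    }
}

total_count, end_cursor = extract_history_info(sample)
print(total_count, end_cursor)
```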
I don't have any source for the cursor string format b961f8dc8976c091180839f4483d67b7c2ca2578 0, but I've tested it with some other repositories that have more than 1000 commits, and it seems to always be formatted like this:
<static hash> <incremented_number>
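Assuming that observed (and undocumented) format holds, splitting a cursor into its two parts could look like the sketch below. The name split_cursor is hypothetical, and the function simply refuses anything that does not match the pattern rather than guessing.

```python
from typing import Tuple


def split_cursor(cursor: str) -> Tuple[str, int]:
    """Split "<static hash> <incremented_number>" into (hash, number).

    Assumption: the cursor format is the one observed above; GitHub does
    not document it, so unexpected input raises ValueError.
    """
    oid, _, number = cursor.rpartition(" ")
    if not oid or not number.isdigit():
        raise ValueError(f"unexpected cursor format: {cursor!r}")
    return oid, int(number)


print(split_cursor("b961f8dc8976c091180839f4483d67b7c2ca2578 0"))
```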
In order to iterate from the first commit to the newest, replace the number in the cursor with totalCount - 1 - <number_perpage>*<page>, starting from page 1.
For example, to get the first 20 commits from the linux repository (with the totalCount of 950329 shown above, the number is 950329 - 1 - 20*1 = 950308):
{
  repository(name: "linux", owner: "torvalds") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 20, after: "fc4f28bb3daf3265d6bc5f73b497306985bb23ab 950308") {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}
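The offset arithmetic behind that query can be sketched as a small helper. The name page_window is hypothetical, and the handling of a final partial page (shrinking first and dropping the cursor when the offset would go negative) is my own assumption layered on top of the observed cursor format, not something the API documents.

```python
from typing import Optional, Tuple


def page_window(total_count: int, per_page: int, page: int,
                head_oid: str) -> Tuple[int, Optional[str]]:
    """Return (first, after) for the page-th page counted from the oldest commit.

    Pages start at 1. `after` is None when the page reaches the newest
    commit, meaning the query should be sent without a cursor.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    offset = total_count - 1 - per_page * page
    if offset >= 0:
        return per_page, f"{head_oid} {offset}"
    # The page runs past the newest commit: request only what is left.
    first = per_page + offset + 1
    if first <= 0:
        raise ValueError("page is beyond the end of the history")
    return first, None


# Reproduces the cursor used in the query above.
print(page_window(950329, 20, 1, "fc4f28bb3daf3265d6bc5f73b497306985bb23ab"))
```

For a repository with 150 commits and 100 per page, page 2 would come back as (50, None): the 50 newest commits, fetched without a cursor.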
Note that the total commit count in this repo changes over time, so you need to get the total count value before running the query.
Here is a Python example iterating over the first 300 commits of the Linux repository (starting from the oldest):
import requests

token = "YOUR_ACCESS_TOKEN"
name = "linux"
owner = "torvalds"
branch = "master"
iteration = 3   # number of pages to fetch
per_page = 100  # commits per page
commits = []

# The two %s placeholders are filled with the page size and the cursor
# (either null or a quoted cursor string).
query = """
query ($name: String!, $owner: String!, $branch: String!){
  repository(name: $name, owner: $owner) {
    ref(qualifiedName: $branch) {
      target {
        ... on Commit {
          history(first: %s, after: %s) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
            totalCount
            pageInfo {
              endCursor
            }
          }
        }
      }
    }
  }
}
"""

def getHistory(cursor):
    r = requests.post("https://api.github.com/graphql",
        headers = {
            "Authorization": f"Bearer {token}"
        },
        json = {
            "query": query % (per_page, cursor),
            "variables": {
                "name": name,
                "owner": owner,
                "branch": branch
            }
        })
    r.raise_for_status()  # fail early on HTTP errors (e.g. a bad token)
    return r.json()["data"]["repository"]["ref"]["target"]["history"]

# in the first request, cursor is null
history = getHistory("null")
totalCount = history["totalCount"]
if (totalCount > 1):
    # endCursor looks like "<static hash> <incremented_number>"
    cursor = history["pageInfo"]["endCursor"].split(" ")
    # assumes the repository has more than iteration * per_page commits,
    # otherwise the computed number below becomes negative
    for i in range(1, iteration + 1):
        cursor[1] = str(totalCount - 1 - i*per_page)
        history = getHistory(f"\"{' '.join(cursor)}\"")
        # each page comes back newest first, so reverse it
        commits += history["nodes"][::-1]
else:
    commits = history["nodes"]
print(commits)