YouTube API v3 page tokens

Question

I'm using the search API and following nextPageToken to paginate through the results, but I can't retrieve all of them this way. I only get 500 results out of approximately 455,000.

Here's the Java code that fetches the search results:

import com.google.api.client.http.HttpRequest;
import com.google.api.client.http.HttpRequestInitializer;
import com.google.api.services.youtube.YouTube;
import com.google.api.services.youtube.model.SearchListResponse;
import java.io.IOException;

youtube = new YouTube.Builder(Auth.HTTP_TRANSPORT, Auth.JSON_FACTORY, new HttpRequestInitializer() {
    public void initialize(HttpRequest request) throws IOException {
        // Nothing to initialize: requests are authorized with an API key.
    }
}).setApplicationName("youtube-search").build();

YouTube.Search.List search = youtube.search().list("id,snippet");
String apiKey = properties.getProperty("youtube.apikey");
search.setKey(apiKey);
search.setType("video");
search.setMaxResults(50L); // setMaxResults takes a Long, so the literal needs the L suffix
search.setQ(queryTerm);

boolean allResultsRead = false;
while (!allResultsRead) {
    SearchListResponse searchResponse = search.execute();
    System.out.println("Printed " + searchResponse.getPageInfo().getResultsPerPage()
            + " out of " + searchResponse.getPageInfo().getTotalResults()
            + ". Current page token: " + search.getPageToken()
            + " Next page token: " + searchResponse.getNextPageToken()
            + ". Prev page token " + searchResponse.getPrevPageToken());
    if (searchResponse.getNextPageToken() == null) {
        // No further page is offered, so stop paging.
        allResultsRead = true;
    } else {
        // Request the following page on the next iteration.
        search.setPageToken(searchResponse.getNextPageToken());
    }
}

The output is:

Printed 50 out of 455085. Current page token: null Next page token: CDIQAA. Prev page token null
Printed 50 out of 454983. Current page token: CDIQAA Next page token: CGQQAA. Prev page token CDIQAQ
Printed 50 out of 455081. Current page token: CGQQAA Next page token: CJYBEAA. Prev page token CGQQAQ
Printed 50 out of 454981. Current page token: CJYBEAA Next page token: CMgBEAA. Prev page token CJYBEAE
Printed 50 out of 455081. Current page token: CMgBEAA Next page token: CPoBEAA. Prev page token CMgBEAE
Printed 50 out of 454981. Current page token: CPoBEAA Next page token: CKwCEAA. Prev page token CPoBEAE
Printed 50 out of 455081. Current page token: CKwCEAA Next page token: CN4CEAA. Prev page token CKwCEAE
Printed 50 out of 454980. Current page token: CN4CEAA Next page token: CJADEAA. Prev page token CN4CEAE
Printed 50 out of 455081. Current page token: CJADEAA Next page token: CMIDEAA. Prev page token CJADEAE
Printed 50 out of 455081. Current page token: CMIDEAA Next page token: null. Prev page token CMIDEAE

After 10 iterations of the while loop, it exits because the next page token is null.

I'm new to the YouTube API and not sure what I'm doing wrong here. I have two questions: 1. How do I get all the results? 2. Why is the previous page token for page 3 not the same as the current token of page 2?

Any help would be appreciated. Thanks!

Recommended answer

What you're seeing is the intended behavior: using the nextPageToken, you can only get up to 500 results. If you're interested in how this came about, you can read through this thread:

https://code.google.com/p/gdata-issues/issues/detail?id=4282

As a summary of that thread, it basically comes down to the fact that, with so much data on YouTube, the search algorithms are radically different from what most people think they are. This isn't just a simple database search for content in fields; an incredible number of signals are processed to make the results relevant, and after about 500 results the algorithms start to lose the ability to make the results worthwhile.

One thing that has helped me wrap my mind around this is to realize that when YouTube talks about search, they are talking about probability rather than matching, so the results are ordered, based on your parameters, by their likelihood of being relevant to your query. As you paginate through, you eventually reach a point where, statistically speaking, the probability of relevance is low enough that it isn't computationally worth it to allow those results to come back. So 500 is the decided-upon limit.

(Also note that the number of "results" isn't an approximation of matches; it's an approximation of potential matches. As you start to retrieve them, many of those possible matches get cast aside as not being relevant at all, so that number doesn't really mean what people think it does. Google search works the same way.)
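In practice this means a paging loop should stop when nextPageToken comes back null, and should not expect to reach the reported total. The following is a minimal sketch of that pattern, not the YouTube client itself: `CappedPager`, `Page`, `collectAll`, and `fakeSearch` are hypothetical names, and `fakeSearch` is a stand-in that imitates the behavior seen in the question (50 items per page, an estimated total of 455085, and no next token after 500 items).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class CappedPager {
    /** One page of results, the token for the next page (null if none), and the reported estimate. */
    static final class Page {
        final List<String> items;
        final String nextPageToken;
        final long totalResults;

        Page(List<String> items, String nextPageToken, long totalResults) {
            this.items = items;
            this.nextPageToken = nextPageToken;
            this.totalResults = totalResults;
        }
    }

    /** Follows nextPageToken until it is null; never loops on totalResults, which is only an estimate. */
    static List<String> collectAll(Function<String, Page> fetchPage) {
        List<String> all = new ArrayList<>();
        String token = null; // a null token requests the first page
        do {
            Page page = fetchPage.apply(token);
            all.addAll(page.items);
            token = page.nextPageToken;
        } while (token != null);
        return all;
    }

    /** Hypothetical stand-in for the real search call: stops offering pages after 500 items. */
    static Page fakeSearch(String token) {
        int start = (token == null) ? 0 : Integer.parseInt(token);
        List<String> items = new ArrayList<>();
        for (int i = start; i < start + 50; i++) {
            items.add("video-" + i);
        }
        int next = start + 50;
        String nextToken = (next < 500) ? String.valueOf(next) : null;
        return new Page(items, nextToken, 455085L);
    }

    public static void main(String[] args) {
        List<String> ids = collectAll(CappedPager::fakeSearch);
        long estimate = fakeSearch(null).totalResults;
        System.out.println("Collected " + ids.size() + " of an estimated " + estimate + " results");
    }
}
```

With the real client, the function passed to `collectAll` would set the page token on the search request and call `execute()`. The gap between the collected count and the estimate is expected, not a sign of a bug in the loop.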

You might wonder why YouTube search functions this way rather than doing more traditional string/data matching. With so much search volume, if they were to actually do a complete search of all the data for every query, you'd be waiting minutes at a time, if not more. It's really a technical marvel, if you think about it, that the algorithms are able to get such relevant results for the top 500 cases when they're operating on prediction, probability, and the like.

As to your second question, the page tokens don't represent a unique set of results but instead represent a sort of algorithmic state, and are thus pointers to your query, the progress of the query, and the direction of the query. So iteration 3, for example, is referenced by both the nextPageToken of iteration 2 and the prevPageToken of iteration 4, but those two tokens are slightly different so that they can indicate the direction they came from.
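The tokens printed in the question can be used to illustrate this. The sketch below Base64-decodes a token and reads the bytes as a sequence of varints. This is only an observation about those specific tokens, not documented behavior: the token format is opaque, the interpretation is an assumption, and real code should pass tokens back unchanged. `PageTokenPeek` and `decodeVarints` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

class PageTokenPeek {
    /** Base64-decodes a token and reads its bytes as consecutive varints (7 data bits per byte). */
    static List<Long> decodeVarints(String token) {
        byte[] bytes = Base64.getUrlDecoder().decode(token); // padding is not required by this decoder
        List<Long> values = new ArrayList<>();
        long current = 0;
        int shift = 0;
        for (byte b : bytes) {
            current |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) != 0) {
                shift += 7; // continuation bit set: the value continues in the next byte
            } else {
                values.add(current);
                current = 0;
                shift = 0;
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Tokens copied from the output in the question.
        for (String token : new String[] {"CDIQAA", "CDIQAQ", "CGQQAA", "CJYBEAA"}) {
            System.out.println(token + " -> " + decodeVarints(token));
        }
        // CDIQAA -> [8, 50, 16, 0]
        // CDIQAQ -> [8, 50, 16, 1]
        // CGQQAA -> [8, 100, 16, 0]
        // CJYBEAA -> [8, 150, 16, 0]
    }
}
```

In these samples the second value tracks the position (50, 100, 150), and the last value is 0 in the next-page tokens and 1 in the previous-page token. That is consistent with the explanation above: two tokens can point at the same page while differing in the direction they encode.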
