How do I retrieve YouTube's autocomplete results using Jsoup (Java)?

Problem description

As shown in this image I want to retrieve autocomplete search results using Jsoup. I'm already retrieving the video URL, video title and thumbnail using the video id, but I am stuck at retrieving them from the search results.

I have to do this without YouTube's Data API, using only Jsoup.

Any suggestions that can point me in the right direction would be appreciated.

Recommended answer

The search results are generated dynamically, via JavaScript. That means Jsoup cannot parse them out of the page, because Jsoup only "sees" the static markup the server sends and does not execute scripts. However, we can get the results directly from the web service that the page itself calls.

YouTube's autocomplete search results are acquired from a web service (provided by Google). Every time we add a letter in the search bar, a request is made to that service in the background and the response is rendered on the page. We can discover such APIs with a browser's Developer Tools. For example, I found this API with the following procedure:

  • Open YouTube in a browser.
  • Open the Developer Console. (Ctrl + Shift + I).
  • Go to the Network tab. Here we can find detailed information about our browser's connections to web-servers.
  • Add a letter in YouTube's search bar. At this point, we can see a new GET request to https://clients1.google.com/complete/search.
  • Click on that request and go to the box on the right, where we can examine the request-response more carefully. In the Headers tab, we see that the URL contains our search query; in the Response tab, the response body contains the autocomplete results.
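
The request URL seen in the steps above can be rebuilt by hand. Below is a minimal sketch of that, not necessarily what the page does internally. The class and method names are made up for illustration. `client=youtube` and `ds=yt` are two of the parameters that appear in the request URL used in this answer. That these two are enough on their own is an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the GET URL for one autocomplete query.
final class SuggestUrlSketch {
    // Endpoint from the Network tab; the parameter subset is an assumption.
    static final String ENDPOINT = "https://clients1.google.com/complete/search";

    static String buildUrl(String query) {
        try {
            // URL-encode the query so spaces and symbols survive in the query string.
            return ENDPOINT + "?client=youtube&ds=yt&q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("jsoup java"));
    }
}
```

Opening the printed URL in a browser is a quick way to check whether the service still answers with that parameter set before writing any parsing code.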

The response is a JavaScript snippet that contains our data in an array, so it can be parsed with regular expressions. Jsoup can be used for the HTTP request, but any HTTP client will do.

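import java.io.IOException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;

public static ArrayList<String> autocompleteResults(String query) throws IOException {
    // Endpoint observed in the browser's Network tab; the URL-encoded query is appended.
    String url = "https://clients1.google.com/complete/search?client=youtube&hl=en&gs_rn=64&gs_ri=youtube&ds=yt&cp=10&gs_id=b2&q=";
    // Captures every quoted string that directly follows an opening bracket.
    String re = "\\[\"(.*?)\",";

    // ignoreContentType(true) keeps Jsoup from rejecting a non-HTML response body.
    Response resp = Jsoup.connect(url + URLEncoder.encode(query, "UTF-8"))
            .ignoreContentType(true)
            .execute();
    Matcher match = Pattern.compile(re, Pattern.DOTALL).matcher(resp.body());

    ArrayList<String> data = new ArrayList<String>();
    while (match.find()) {
        data.add(match.group(1));
    }
    return data;
}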
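
To see what the regular expression picks up without making a network call, here is a small sketch that runs the same pattern over a hard-coded string. The sample body is hypothetical: it only imitates the "JavaScript snippet with an array" shape described above, and the real response may differ. The class and method names are also made up for illustration. In this assumed shape the first match is the echoed query rather than a suggestion, so the sketch includes a helper that drops it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical offline demo of the regex used in the answer.
final class AutocompleteParseSketch {
    private static final Pattern ITEM = Pattern.compile("\\[\"(.*?)\",", Pattern.DOTALL);

    // Returns every quoted string that directly follows an opening bracket.
    static List<String> parse(String body) {
        List<String> out = new ArrayList<>();
        Matcher m = ITEM.matcher(body);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    // Drops the first match, which is the echoed query in the assumed response shape.
    static List<String> suggestionsOnly(String body) {
        List<String> all = parse(body);
        return all.isEmpty() ? all : new ArrayList<>(all.subList(1, all.size()));
    }

    public static void main(String[] args) {
        // Made-up sample body, not a captured response.
        String sample = "window.google.ac.h([\"jso\",[[\"jsoup\",0],[\"json\",0]],{\"k\":1}])";
        System.out.println(parse(sample));           // [jso, jsoup, json]
        System.out.println(suggestionsOnly(sample)); // [jsoup, json]
    }
}
```

If the real response does echo the query first, the same trimming can be applied to the list returned by autocompleteResults. A regex is fine for a quick extraction, but it will break if the response layout changes.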

The code provided was written and tested with VS Code and Java 8 on Windows, but it should also work in Android Studio.
