How to use XMLHttpRequest to download an HTML page in the background and extract a text element from it?

Views: 350 · Published: 2019/1/25 18:29:22 · Tags: javascript, xmlhttprequest, cross-domain, greasemonkey, tampermonkey


Question

I want to write a Greasemonkey script that, while I am on URL_1, fetches the full HTML page at URL_2 in the background and extracts a text element from it.

Specifically, I want to download a page's entire HTML (a Rotten Tomatoes page) in the background, store it in a variable, and then use getElementsByClassName(...)[0] to pull the text I want from the element with the class name "critic_consensus".

I found "HTML in XMLHttpRequest" on MDN and ended up with the following code, which unfortunately does not work:

var xhr = new XMLHttpRequest();
xhr.onload = function() {
  alert(this.responseXML.getElementsByClassName("critic_consensus")[0].innerHTML);
};
xhr.open("GET", "http://www.rottentomatoes.com/m/godfather/",true);
xhr.responseType = "document";
xhr.send();

Running it in Firefox Scratchpad produces this error message:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://www.rottentomatoes.com/m/godfather/. This can be fixed by moving the resource to the same domain or enabling CORS.

P.S. I am not using the Rotten Tomatoes API because they have removed the critics consensus from it.

Recommended answer

For cross-origin requests to a site that has not helpfully set a permissive CORS policy, Greasemonkey provides the GM_xmlhttpRequest() function. (Most other userscript engines provide it as well.)
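To make "cross-origin" concrete: two URLs share an origin only when scheme, host, and port all match. The sketch below uses a hypothetical helper, `isSameOrigin` (not part of any userscript API), built on the standard `URL` class, and the Stack Overflow URL is only a stand-in for whatever page the script runs on.

```javascript
// Hypothetical helper for illustration: returns true when both URLs have
// the same origin (scheme + host + port), which is what the browser's
// same-origin policy compares.
function isSameOrigin(pageUrl, targetUrl) {
    return new URL(pageUrl).origin === new URL(targetUrl).origin;
}

// The page the script runs on versus the page it wants to read.
console.log(isSameOrigin(
    "http://stackoverflow.com/questions/1",
    "http://www.rottentomatoes.com/m/godfather/"
)); // false

// Two pages on the same site.
console.log(isSameOrigin(
    "http://www.rottentomatoes.com/",
    "http://www.rottentomatoes.com/m/godfather/"
)); // true
```

When the result is false and the target site sends no permissive CORS headers, a plain XMLHttpRequest cannot read the response.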

GM_xmlhttpRequest is expressly designed to allow cross-origin requests.
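If callbacks get unwieldy, the call can be wrapped in a Promise. This is only a sketch: `gmFetchText` is a hypothetical name, and the request function is passed in as a parameter so the helper is not tied to one userscript engine. It assumes the response object carries `status` and `responseText`, as GM_xmlhttpRequest responses do.

```javascript
// Hypothetical helper: wraps a GM_xmlhttpRequest-style function in a Promise
// that resolves with the response body as text.
function gmFetchText(url, requestFn) {
    return new Promise(function (resolve, reject) {
        requestFn({
            method: "GET",
            url: url,
            onload: function (response) {
                // Treat only 2xx statuses as success.
                if (response.status >= 200 && response.status < 300) {
                    resolve(response.responseText);
                } else {
                    reject(new Error("HTTP " + response.status + " for " + url));
                }
            },
            onerror: function () { reject(new Error("Network error for " + url)); },
            onabort: function () { reject(new Error("Request aborted for " + url)); },
            ontimeout: function () { reject(new Error("Request timed out for " + url)); }
        });
    });
}

// Runs only inside a userscript that has been granted GM_xmlhttpRequest.
if (typeof GM_xmlhttpRequest !== "undefined") {
    gmFetchText("http://www.rottentomatoes.com/m/godfather/", GM_xmlhttpRequest)
        .then(function (html) { console.log("Fetched " + html.length + " characters"); })
        .catch(function (err) { console.error(err); });
}
```

Greasemonkey 4 exposes the same call as GM.xmlHttpRequest, which could be passed as `requestFn` instead.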

To get the target information, parse the response with a DOMParser. Do not use jQuery methods for this, because they cause extraneous images, scripts, and objects to load, which slows things down or can crash the page.
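The lookup on the parsed document throws if the page no longer contains the expected class. A small guard avoids that. In the sketch below, `firstTextByClass` is a hypothetical helper name and the sample HTML is invented for illustration. The demo part runs only where DOMParser exists (a browser or userscript).

```javascript
// Hypothetical helper: returns the trimmed text of the first element with
// the given class, or null when no such element exists.
function firstTextByClass(doc, className) {
    var nodes = doc.getElementsByClassName(className);
    if (nodes.length === 0) {
        return null;
    }
    return nodes[0].textContent.trim();
}

// Demo: parse an HTML string without inserting it into the live page,
// so no images or scripts from it are loaded.
if (typeof DOMParser !== "undefined") {
    var sampleHtml = '<p class="critic_consensus"> A sample consensus. </p>';
    var sampleDoc = new DOMParser().parseFromString(sampleHtml, "text/html");
    console.log(firstTextByClass(sampleDoc, "critic_consensus"));
    console.log(firstTextByClass(sampleDoc, "no_such_class"));
}
```

A null result can then be handled explicitly, for example by skipping the page update.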

Here is a complete script that illustrates the process:

// ==UserScript==
// @name        _Parse Ajax Response for specific nodes
// @include     http://stackoverflow.com/questions/*
// @require     http://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js
// @grant       GM_xmlhttpRequest
// ==/UserScript==

GM_xmlhttpRequest ( {
    method: "GET",
    url:    "http://www.rottentomatoes.com/m/godfather/",
    onload: function (response) {
        var parser  = new DOMParser ();
        /* IMPORTANT!
            1) For Chrome, see
            https://developer.mozilla.org/en-US/docs/Web/API/DOMParser#DOMParser_HTML_extension_for_other_browsers
            for a work-around.

            2) jQuery.parseHTML() and similar are bad because it causes images, etc., to be loaded.
        */
        var doc         = parser.parseFromString (response.responseText, "text/html");
        var criticTxt   = doc.getElementsByClassName ("critic_consensus")[0].textContent;

        $("body").prepend ('<h1>' + criticTxt + '</h1>');
    },
    onerror: function (e) {
        console.error ('**** error ', e);
    },
    onabort: function (e) {
        console.error ('**** abort ', e);
    },
    ontimeout: function (e) {
        console.error ('**** timeout ', e);
    }
} );
