Chrome extension - Get html from a separate page of a website in the background


Question


I have made an extension that tracks which manga a person reads on a manga site and lists the chapter they last read for each one in their favorites page. I've recently come up with a feature that would make the extension a little better: giving the user the option to track only manga that they have favorited on the site. As they read, the extension would check in the background whether the current manga is in their favorites, saving it if so and skipping it if not.

The website has a favorites page that lists all of the manga a person has favorited. I would like to grab the name of each manga listed on that page in the background, hidden from the user.

So my question is: is there any way to grab the HTML of a specific page in the background and repeatedly extract specific data, such as the text of certain elements, into an array, without the user actually having to be on the favorites page?

Edit: Solution

var barray = [];
function getbm(callback) {
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function() {
        // readyState 4 means the request has finished
        if (xhr.readyState === 4) {
            // pass the HTML on success, null on any other status
            callback(xhr.status === 200 ? xhr.responseText : null);
        }
    };
    var url = 'http://mangafox.me/bookmark/index.php?status=all';
    xhr.open('GET', url, true);
    xhr.send();
}
function res(data) {
    // request failed: keep whatever was stored before
    if (data === null) {
        return;
    }
    // start from an empty list so repeated calls do not append duplicates
    barray = [];
    // parse into a detached <div> so the markup never touches the live document
    var parsed = $('<div />').append($.parseHTML(data));
    parsed.find('h2.title').each(function() {
        var bmanga = $(this).children('a.title').text();
        barray.push({"manga": bmanga});
    });
    chrome.storage.local.set({'bData': barray});
}
getbm(res);
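
With `bData` stored in the shape above (an array of `{manga: name}` objects), the per-chapter check described in the question could look roughly like this. It is only a sketch: `normalizeTitle`, `buildFavoriteSet` and `isFavorited` are hypothetical helper names, and the case and whitespace normalization is an assumption about how titles might differ between the reader page and the bookmark page.

```javascript
// Hypothetical helpers; none of these names come from the original post.

// Lower-case and collapse whitespace so small formatting differences
// between pages do not break the comparison.
function normalizeTitle(title) {
  return String(title).trim().replace(/\s+/g, ' ').toLowerCase();
}

// bData is the array stored by res(): [{manga: 'Name'}, ...]
function buildFavoriteSet(bData) {
  return new Set((bData || []).map(function (entry) {
    return normalizeTitle(entry.manga);
  }));
}

// True when the given title is in the set built above.
function isFavorited(favoriteSet, title) {
  return favoriteSet.has(normalizeTitle(title));
}

// Example with made-up titles:
var favorites = buildFavoriteSet([{ manga: 'Example Manga' }, { manga: 'Another  Title' }]);
console.log(isFavorited(favorites, 'example manga'));  // true
console.log(isFavorited(favorites, 'Not Bookmarked')); // false
```

In the extension, the array would come from `chrome.storage.local.get('bData', ...)` rather than a literal, and the set would be rebuilt whenever the stored list changes.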

Answer

It heavily depends on how the page in question is constructed.

If the page is static (HTTP response includes the data you need), then scraping the page via XMLHttpRequest is the way to go.
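
For the static case, a promise-based variant of the same request might look like the sketch below. It assumes the `fetch` API is available in the extension's background context and that the manifest grants host permission for the site. `fetchPageHtml` is a hypothetical name, and the `fetchImpl` parameter exists only so the function can be exercised without a network.

```javascript
// Hypothetical helper: resolves to the page's HTML text, or null on failure.
function fetchPageHtml(url, fetchImpl) {
  var doFetch = fetchImpl || globalThis.fetch;
  // credentials: 'include' sends the user's cookies, which a logged-in
  // bookmark page presumably needs (an assumption about the site).
  return doFetch(url, { credentials: 'include' })
    .then(function (response) {
      return response.ok ? response.text() : null;
    })
    .catch(function () {
      return null; // network error: same "no data" signal as a bad status
    });
}
```

The resolved value can then be handed to a parser such as the `res` function from the question, for example `fetchPageHtml(url).then(res)`.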

If the page is dynamic (no data initially, and JavaScript on the page then queries the server to fill it), then the XHR route will not work on its own. You can try to observe the network requests made by that page and replicate them.
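
If the network panel shows the page loading its list from a JSON endpoint, replicating that request mostly comes down to requesting the same URL and reading the names out of the response. The sketch below is purely illustrative: the payload shape (`{bookmarks: [{title: ...}]}`) and the function name `titlesFromPayload` are assumptions, since nothing here says what the site actually returns.

```javascript
// Hypothetical payload shape: { "bookmarks": [ { "title": "..." }, ... ] }
function titlesFromPayload(jsonText) {
  var payload;
  try {
    payload = JSON.parse(jsonText);
  } catch (e) {
    return []; // not JSON, e.g. a login page was returned instead
  }
  var items = payload && Array.isArray(payload.bookmarks) ? payload.bookmarks : [];
  return items
    .map(function (item) { return item && item.title; })
    .filter(function (title) { return typeof title === 'string' && title.length > 0; });
}
```

Returning an empty array for anything unexpected keeps the caller simple, at the cost of not distinguishing "no favorites" from "request went wrong"; a real extension may want to tell those apart.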

Of note: while it's unlikely, check if the site has a public API. That will save you the reverse-engineering effort and let you avoid the grey area of automated data scraping.


Also, see if you can somehow check from the page you're normally tracking if the item is favourited or not. It will be easier than scraping another page.
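
If the reader page itself marks favorited titles (for example with a distinct bookmark button), a content script could read that state directly. A minimal sketch, assuming such a marker exists: the function name and the `.bookmarked` selector are made up and would have to be replaced with whatever the real page uses.

```javascript
// Hypothetical check: `root` is normally `document` in a content script.
// The default selector is an assumption; inspect the real page to find one.
function isFavoritedOnPage(root, selector) {
  return root.querySelector(selector || '.bookmarked') !== null;
}
```

In a content script this would be called as `isFavoritedOnPage(document)`, which avoids the second request entirely.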
