Load a SPA webpage via AJAX

Problem description

I'm trying to fetch an entire webpage using JavaScript by plugging in the URL. However, the website is built as a Single Page Application (SPA) that uses JavaScript / Backbone.js to dynamically load most of its contents after rendering the initial response.

So for example, when I route to the following address:

https://connect.garmin.com/modern/activity/1915361012

And then enter this into the console (after the page has loaded):

var $page = $("html")
console.log("%c✔: ", "color:green;", $page.find(".inline-edit-target.page-title-overflow").text().trim());
console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());

Then I'll get the dynamically loaded activity title as well as the statically loaded page footer.


However, when I try to load the webpage via an AJAX call with either $.get() or .load(), I only get the initial response (the same content that view-source shows):

view-source:https://connect.garmin.com/modern/activity/1915361012

So if I use either of the following AJAX calls:

// jQuery.get()
var url = "https://connect.garmin.com/modern/activity/1915361012";
jQuery.get(url,function(data) {
    var $page = $("<div>").html(data)
    console.log("%c✖: ", "color:red;",   $page.find(".page-title").text().trim());
    console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
});

// jQuery.load()
var url = "https://connect.garmin.com/modern/activity/1915361012";
var $page = $("<div>")
$page.load(url, function(data) {
    console.log("%c✖: ", "color:red;",   $page.find(".page-title").text().trim()    );
    console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
});

I'll still get the initial footer, but won't get any of the other page contents.


I've tried the solution here to eval() the contents of every script tag, but that doesn't appear robust enough to actually load the page:

jQuery.get(url,function(data) {
    var $page = $("<div>").html(data)
    $page.find("script").each(function() {
        var scriptContent = $(this).html(); //Grab the content of this tag
        eval(scriptContent); //Execute the content
    });
    console.log("%c✖: ", "color:red;",   $page.find(".page-title").text().trim());
    console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
});

Q: Are there any options to fully load a webpage so that it can be scraped with JavaScript?

Solution

You will never be able to fully replicate by yourself what an arbitrary (SPA) page does.
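One likely reason the eval() approach falls short: a script tag that points at an external file has no inline content, so there is nothing to evaluate. The sketch below lists the script tags in a made-up initial response. The sample markup, the bundle path, and the listScripts name are all assumptions for illustration, not taken from the Garmin page.

```javascript
// Hypothetical initial response of an SPA: an empty title placeholder,
// one external bundle, one small inline script, and a static footer.
const initialHtml = `
<div class="page-title"></div>
<script src="/static/app.bundle.js"></script>
<script>window.APP_CONFIG = { locale: "en" };</script>
<footer><div class="details">Static footer text</div></footer>`;

// Collect each <script> tag's src attribute (if any) and its inline body.
// A regex is enough for this controlled sample; a real HTML parser is the
// safer choice for arbitrary pages.
function listScripts(html) {
  const pattern = /<script\b([^>]*)>([\s\S]*?)<\/script>/gi;
  const scripts = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    const srcMatch = /\bsrc\s*=\s*["']([^"']*)["']/i.exec(match[1]);
    scripts.push({
      src: srcMatch ? srcMatch[1] : null,
      inline: match[2].trim(),
    });
  }
  return scripts;
}

const scripts = listScripts(initialHtml);
for (const s of scripts) {
  // An external script has an empty body: its code lives at s.src.
  console.log(s.src
    ? `external: ${s.src} (inline body is empty)`
    : `inline: ${s.inline}`);
}
```

Even if every external file were fetched and evaluated as well, the code would run against the current document rather than the fetched markup, which is part of why a real browser engine is the practical route.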

The only way I see is using a headless browser such as PhantomJS, Headless Chrome, or Headless Firefox.

I wanted to try Headless Chrome so let's see what it can do with your page:

Quick check using internal REPL

Load that page with Chrome Headless (you'll need Chrome 59 on Mac/Linux, Chrome 60 on Windows), and find the page title with JavaScript from the REPL:

% chrome --headless --disable-gpu --repl https://connect.garmin.com/modern/activity/1915361012
[0830/171405.025582:INFO:headless_shell.cc(303)] Type a Javascript expression to evaluate or "quit" to exit.
>>> $('body').find('.page-title').text().trim() 
{"result":{"type":"string","value":"Daily Mile - Round 2 - Day 27"}}

NB: to get chrome command line working on a Mac I did this beforehand:

alias chrome="'/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'"

Programmatic use with Node & Puppeteer

Puppeteer is a Node library (by Google Chrome developers) which provides a high-level API to control headless Chrome over the DevTools Protocol. It can also be configured to use full (non-headless) Chrome.

(Step 0 : Install Node & Yarn if you don't have them)

In a new directory:

yarn init
yarn add puppeteer

Create index.js with this:

const puppeteer = require('puppeteer');
(async () => {
    const url = 'https://connect.garmin.com/modern/activity/1915361012';
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    // Go to URL and wait for page to load.
    // Older Puppeteer releases used 'networkidle'; current ones accept
    // 'networkidle0' or 'networkidle2' instead.
    await page.goto(url, {waitUntil: 'networkidle2'});
    // Wait for the results to show up
    await page.waitForSelector('.page-title');
    // Extract the results from the page
    const text = await page.evaluate(() => {
        const title = document.querySelector('.page-title');
        return title.innerText.trim();
    });
    console.log(`Found: ${text}`);
    // Wait for the browser process to shut down before exiting
    await browser.close();
})();

Result:

$ node index.js 
Found: Daily Mile - Round 2 - Day 27
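If you need this for more than one page, the same steps can be wrapped in a small helper. This is only a sketch: scrapeText and its parameters are names I made up, and the Puppeteer module is passed in as an argument so the function can be tried with a stub. It uses 'networkidle2', one of the waitUntil values accepted by recent Puppeteer releases, and closes the browser in a finally block so a failed navigation or a selector timeout does not leave a Chrome process behind.

```javascript
// Hypothetical helper: load `url` in headless Chrome, wait for `selector`,
// and resolve with that element's trimmed text.
// `puppeteer` is the module itself (or a stub exposing the same methods).
async function scrapeText(puppeteer, url, selector, timeoutMs = 30000) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Navigate and wait until the network has been nearly idle.
    await page.goto(url, { waitUntil: 'networkidle2', timeout: timeoutMs });
    // Wait for the dynamically rendered element to appear.
    await page.waitForSelector(selector, { timeout: timeoutMs });
    // $eval runs the callback inside the page on the first matching element.
    return await page.$eval(selector, (el) => el.innerText.trim());
  } finally {
    // Runs on success and on failure alike.
    await browser.close();
  }
}

// Usage (assumes `yarn add puppeteer` has been run in this directory):
// scrapeText(require('puppeteer'),
//            'https://connect.garmin.com/modern/activity/1915361012',
//            '.page-title')
//   .then((text) => console.log(`Found: ${text}`))
//   .catch((err) => console.error(`Scrape failed: ${err.message}`));
```

Passing the module in is a design choice for testability; requiring it at the top of the file works just as well in a one-off script.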
