Load a SPA webpage via AJAX
I'm trying to fetch an entire webpage using JavaScript by plugging in the URL. However, the website is built as a Single Page Application (SPA) that uses JavaScript / backbone.js to dynamically load most of its contents after rendering the initial response.
So for example, when I route to the following address:
https://connect.garmin.com/modern/activity/1915361012
And then enter this into the console (after the page has loaded):
var $page = $("html")
console.log("%c✔: ", "color:green;", $page.find(".inline-edit-target.page-title-overflow").text().trim());
console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
Then I'll get the dynamically loaded activity title as well as the statically loaded page footer:
However, when I try to load the webpage via an AJAX call with either $.get() or .load(), I only receive the initial response (the same content that view-source shows):
view-source:https://connect.garmin.com/modern/activity/1915361012
So if I use either of the following AJAX calls:
// jQuery.get()
var url = "https://connect.garmin.com/modern/activity/1915361012";
jQuery.get(url,function(data) {
var $page = $("<div>").html(data)
console.log("%c✖: ", "color:red;", $page.find(".page-title").text().trim());
console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
});
// jQuery.load()
var url = "https://connect.garmin.com/modern/activity/1915361012";
var $page = $("<div>")
$page.load(url, function(data) {
console.log("%c✖: ", "color:red;", $page.find(".page-title").text().trim() );
console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
});
I'll still get the initial footer, but won't get any of the other page contents:
I've tried the solution here to eval() the contents of every script tag, but that doesn't appear robust enough to actually load the page:
jQuery.get(url,function(data) {
var $page = $("<div>").html(data)
$page.find("script").each(function() {
var scriptContent = $(this).html(); //Grab the content of this tag
eval(scriptContent); //Execute the content
});
console.log("%c✖: ", "color:red;", $page.find(".page-title").text().trim());
console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
});
Q: Are there any options to fully load a webpage so that it can be scraped with JavaScript?
You will never be able to fully replicate by yourself what an arbitrary (SPA) page does.
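One reason the eval() approach falls short can be sketched as follows. A script tag that points at an external file through src has no inline body, so $(this).html() returns an empty string and eval() has nothing to run. The helper name classifyScripts and the sample markup are made up for illustration (this is not Garmin's actual response), and the regex is only good enough for a demonstration, not for parsing arbitrary HTML.

```javascript
// Hypothetical helper: split the <script> tags found in an HTML string into
// inline scripts (which have a body to eval) and external ones (src only).
function classifyScripts(html) {
  const inline = [];
  const external = [];
  const scriptTag = /<script\b([^>]*)>([\s\S]*?)<\/script>/gi;
  let match;
  while ((match = scriptTag.exec(html)) !== null) {
    const attrs = match[1];
    const body = match[2].trim();
    const src = /\bsrc\s*=\s*["']([^"']+)["']/i.exec(attrs);
    if (src) {
      // Nothing to eval here: the code lives in a separate file.
      external.push(src[1]);
    } else if (body) {
      inline.push(body);
    }
  }
  return { inline, external };
}

// Made-up initial response, standing in for what $.get() delivers.
const sampleHtml = `
  <div class="page-title"></div>
  <script src="/static/app.bundle.js"></script>
  <script>window.CONFIG = { locale: "en" };</script>
  <footer><span class="details">Footer text</span></footer>`;

const result = classifyScripts(sampleHtml);
console.log('external scripts (skipped by eval):', result.external);
console.log('inline scripts (reached by eval):', result.inline);
```

In this sample only the small inline config script would be evaluated; the bundle that would actually render the title is never fetched or executed.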
The only way I see is to use a headless browser such as PhantomJS, Headless Chrome, or Headless Firefox.
I wanted to try Headless Chrome so let's see what it can do with your page:
Quick check using internal REPL
Load that page with Chrome Headless (you'll need Chrome 59 on Mac/Linux, Chrome 60 on Windows), and find the page title with JavaScript from the REPL:
% chrome --headless --disable-gpu --repl https://connect.garmin.com/modern/activity/1915361012
[0830/171405.025582:INFO:headless_shell.cc(303)] Type a Javascript expression to evaluate or "quit" to exit.
>>> $('body').find('.page-title').text().trim()
{"result":{"type":"string","value":"Daily Mile - Round 2 - Day 27"}}
NB: to get the chrome command line working on a Mac I did this beforehand:
alias chrome="'/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'"
Using it programmatically with Node & Puppeteer
Puppeteer is a Node library (by Google Chrome developers) which provides a high-level API to control headless Chrome over the DevTools Protocol. It can also be configured to use full (non-headless) Chrome.
(Step 0 : Install Node & Yarn if you don't have them)
In a new directory:
yarn init
yarn add puppeteer
Create index.js with this:
const puppeteer = require('puppeteer');
(async() => {
const url = 'https://connect.garmin.com/modern/activity/1915361012';
const browser = await puppeteer.launch();
const page = await browser.newPage();
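// Go to URL and wait for network activity to settle
// (early Puppeteer releases spelled this 'networkidle'; later ones expect 'networkidle0' or 'networkidle2')
await page.goto(url, {waitUntil: 'networkidle2'});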
// Wait for the results to show up
await page.waitForSelector('.page-title');
// Extract the results from the page
const text = await page.evaluate(() => {
const title = document.querySelector('.page-title');
return title.innerText.trim();
});
console.log(`Found: ${text}`);
await browser.close();
})();
Result:
$ node index.js
Found: Daily Mile - Round 2 - Day 27
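Since the question asks for the entire rendered page, the same steps could be wrapped in a small reusable helper that also returns the full HTML. This is only a sketch under a few assumptions: the function name fetchRendered is made up, the puppeteer module is passed in as an argument rather than required at the top, and the Puppeteer version is assumed to accept 'networkidle2' for waitUntil. The try/finally is there so the browser is closed even if the selector never shows up.

```javascript
// Hypothetical helper: load `url` in headless Chrome, wait until `selector`
// appears, then return its text plus the fully rendered HTML of the page.
// `puppeteer` is injected; in real use pass require('puppeteer').
async function fetchRendered(puppeteer, url, selector) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Assumption: this Puppeteer version supports 'networkidle2'.
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.waitForSelector(selector);
    // Runs inside the page: read the text of the first matching element.
    const text = await page.$eval(selector, (el) => el.innerText.trim());
    // Serialized DOM after the SPA's scripts have run.
    const html = await page.content();
    return { text, html };
  } finally {
    // Always release the browser process, even on errors or timeouts.
    await browser.close();
  }
}

// Usage (assumes puppeteer is installed in the project):
// const puppeteer = require('puppeteer');
// fetchRendered(puppeteer, 'https://connect.garmin.com/modern/activity/1915361012', '.page-title')
//   .then(({ text }) => console.log(`Found: ${text}`));
```

Passing the module in keeps the helper easy to exercise with a stub, and the returned html can be handed to whatever parsing code was previously run on the AJAX response.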