How to scrape links with PhantomJS

Views: 177 Published: 2016/8/5 18:56:51 Tags: javascript beautifulsoup phantomjs casperjs

Question

I am trying to search on Etsy and visit all the links in turn. In Python, I know how to do this (with BeautifulSoup), but today I want to see if I can do the same with PhantomJS. I'm not getting very far.

This script should search for "hello kitty" on Etsy, collect all of the product links (<a class="listing-thumb" href=...></a>), and print them to the console. Ideally I'd visit them later on and get the information I need. Right now it just freezes. Any ideas?

var page = require('webpage').create();
var url = 'http://www.etsy.com/search?q=hello%20kitty';

page.open(url, function(status){
    // list all the a.href links in the hello kitty etsy page
    var link = page.evaluate(function() {
        return document.querySelectorAll('a.listing-thumb');
    });
    for(var i = 0; i < link.length; i++){ console.log(link[i].href); }
    phantom.exit();
});

I have toyed with using CasperJS, which may be better designed for this.

Recommended answer

PhantomJS's evaluate() cannot serialize and return complex objects such as HTMLElements or NodeLists, so you have to map them to serializable values first:

var page = require('webpage').create();
var url = 'http://www.etsy.com/search?q=hello%20kitty';

page.open(url, function(status) {
    // list all the a.href links in the hello kitty etsy page
    var links = page.evaluate(function() {
        return [].map.call(document.querySelectorAll('a.listing-thumb'), function(link) {
            return link.getAttribute('href');
        });
    });
    console.log(links.join('\n'));
    phantom.exit();
});

Note: here we use [].map.call() in order to treat a NodeList as a standard Array.
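
The two ideas above (values must be serializable to leave evaluate(), and [].map.call() works on array-like objects) can be tried outside PhantomJS. The following is only a rough sketch in plain JavaScript: crossBoundary, makeFakeAnchor and fakeNodeList are hypothetical stand-ins, and the JSON round trip is an assumed approximation of the boundary, not PhantomJS's actual mechanism.

```javascript
// Rough stand-in for the evaluate() boundary (an assumption for illustration):
// only what JSON can carry makes it across.
function crossBoundary(value) {
    return JSON.parse(JSON.stringify(value));
}

// Hypothetical element stand-in. Here `href` is an accessor on the prototype
// rather than an own data property, so JSON does not see it.
function makeFakeAnchor(href) {
    var proto = {};
    Object.defineProperty(proto, 'href', {
        get: function () { return href; }
    });
    return Object.create(proto);
}

// Array-like stand-in for a NodeList: indexed entries and a length,
// but none of the Array methods.
var fakeNodeList = {
    0: makeFakeAnchor('/listing/1'),
    1: makeFakeAnchor('/listing/2'),
    length: 2
};

// Returning the list itself: the elements arrive as empty objects.
var raw = crossBoundary(fakeNodeList);
console.log(raw[0].href); // undefined

// Mapping to strings first: [].map.call borrows Array.prototype.map for the
// array-like object, and plain strings serialize cleanly.
var hrefs = crossBoundary([].map.call(fakeNodeList, function (node) {
    return node.href;
}));
console.log(hrefs.join('\n'));
```

In this sketch the mapped version yields a real Array of strings, which is why join() is available on the result in the answer's script.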
