如何用刮链接phantomjs [英] how to scrape links with phantomjs

查看:177
本文介绍了如何用刮链接phantomjs的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

可以 PhantomJS 中使用的到的 BeautifulSoup

我想搜索的Etsy的参观足月的所有环节。在Python中,我知道如何做到这一点(与BeautifulSoup),但今天我想看看我能不能做同样的PhantomJS。我没有得到很远。

I am trying to search on Etsy and visit all the links in term. In Python, I know how to do this (with BeautifulSoup) but today I want to see if I can do the same with PhantomJS. I'm not getting very far.

该脚本应搜索的Etsy的凯蒂猫,并返回所有产品
<一类=挂牌拇指的href = ...>< / A> 并在控制台打印出来。理想情况下我会去拜访他们以后得到我需要的信息。现在,它只是冻结。任何想法?

This script should search "hello kitty" on Etsy and return all the of products <a class="listing-thumb" href=...></a> and print them in the console. Ideally I'd visit them later on and get the information I need. Right now it just freezes. Any ideas?

var page = require('webpage').create();
var url = 'http://www.etsy.com/search?q=hello%20kitty';

page.open(url, function(status){
    // list all the a.href links in the hello kitty etsy page
    var link = page.evaluate(function() {
        return document.querySelectorAll('a.listing-thumb');
    });
    for(var i = 0; i < link.length; i++){ console.log(link[i].href); }
    phantom.exit();
});

我已经玩弄使用 CasperJS ,这可能为这个更好的设计。

I have toyed with using CasperJS, which may be better designed for this.

推荐答案

PhantomJS 评估()不能序列化和返回象HTML元素或的NodeLists复杂的对象,所以你必须映射他们之前序列化的东西:

PhantomJS evaluate() cannot serialize and return complex objects like HTMLElements or NodeLists, so you have to map them to serializable things before:

var page = require('webpage').create();
var url = 'http://www.etsy.com/search?q=hello%20kitty';

page.open(url, function(status) {
    // list all the a.href links in the hello kitty etsy page
    var links = page.evaluate(function() {
        return [].map.call(document.querySelectorAll('a.listing-thumb'), function(link) {
            return link.getAttribute('href');
        });
    });
    console.log(links.join('\n'));
    phantom.exit();
});

请注意:在这里我们使用 [] map.call()为了治疗节点列表作为。标准阵列

Note: here we use [].map.call() in order to treat a NodeList as a standard Array.

这篇关于如何用刮链接phantomjs的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆