Scraping URLs from a web page with Node.js

Question

I'm trying to scrape all URLs from a website and put them into an array, but I'm having trouble with the array index. If I print a single element by index, such as array[2], the console prints "undefined". If I remove the index and print the whole thing, it prints all the URLs line by line. I want each URL to be its own element, like:


  • array[0] = the first URL found

  • array[1] = the second URL found

  • array[2] = the third URL found, etc.

Can anyone point me in the right direction? Thank you.

```javascript
var request = require('request');
var cheerio = require('cheerio');

var url = 'http://www.hobo-web.co.uk/';

request(url, function(err, resp, body){
  $ = cheerio.load(body);
  links = $('a'); //use your CSS selector here
  $(links).each(function(i, link){
    var array = $(link).attr('href');
    console.log(array[2]);
  });
});
```
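The "undefined" comes from what `array` holds inside the callback: `$(link).attr('href')` returns one string for one link, so `array[2]` reads the third *character* of that href. Short values such as `/` or `#` have no third character, and an `<a>` with no href at all would make `array[2]` throw a TypeError instead. A minimal illustration with no scraping involved; the href values and the helper name `thirdCharacterOfEach` are hypothetical:

```javascript
// Hypothetical href values, standing in for what $(link).attr('href') returns per link
const hrefs = ['http://www.hobo-web.co.uk/', '/', '#'];

// Mimics the question's callback: `array` is a single string, so [2] picks a character
function thirdCharacterOfEach(values) {
  return values.map(function (href) {
    const array = href;
    return array[2];
  });
}

console.log(thirdCharacterOfEach(hrefs)); // [ 't', undefined, undefined ]
```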


Answer

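You need to create the array first, as a variable declared outside the .each callback, and then push each href value onto it. Because request is asynchronous, the array is only complete once the request callback has run, so read from it (for example array[2]) inside that callback.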

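```javascript
var request = require('request');
var cheerio = require('cheerio');

var url = 'http://www.hobo-web.co.uk/';

// Declared outside the .each callback so every iteration pushes into the same array
var array = [];

request(url, function (err, resp, body) {
  if (err) {
    return console.error(err);
  }
  var $ = cheerio.load(body);
  var links = $('a'); // use your CSS selector here
  links.each(function (i, link) {
    var href = $(link).attr('href');
    array.push(href);
  });
  // request is asynchronous: the array is only filled once this callback runs
  console.log(array[2]); // the third URL found
});
```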
