How do I get the absolute path for '<img src=''>' in Node from a response.body

Question

I want to use request-promise to pull the body of a page. Once I have the page, I want to collect all the img tags and get an array of their src values. Assume the src attributes on the page are a mix of relative and absolute paths. I want an array of absolute paths for the imgs on the page. I know I could build the absolute paths with some string manipulation and the npm path module, but I would like to find a better way of doing it.

var rp = require('request-promise'),
    cheerio = require('cheerio');

var options = {
    uri: 'http://www.google.com',
    method: 'GET',
    resolveWithFullResponse: true
};

rp(options)
    .then(function (response) {
        // Declare $ locally instead of leaking an implicit global
        var $ = cheerio.load(response.body);
        // Log each img src and whether it looks absolute or relative
        $('img').each(function () {
            var link = $(this).attr('src');
            console.log(link);
            if (!link) {
                return; // img tag without a src attribute
            }
            if (link.startsWith('http')) {
                console.log('abs');
            } else {
                console.log('rel');
            }
        });
    });

Result

  /logos/doodles/2016/phoebe-snetsingers-85th-birthday-5179281716019200-hp.gif
  rel


Recommended answer

To get an array of image links in your scenario, you can use url.resolve to resolve the src attribute of each img tag against the request URL, which yields an absolute URL. The resulting array is passed to the final then; there you can do something other than console.log with it if you wish.

var rp = require('request-promise'),
    cheerio = require('cheerio'),
    url = require('url'),
    base = 'http://www.google.com';

var options = {
    uri: base,
    method: 'GET',
    resolveWithFullResponse: true
};

rp(options)
    .then(function (response) {
        var $ = cheerio.load(response.body);

        // Select only img tags that have a src attribute;
        // url.resolve throws if it is given undefined
        return $('img[src]').map(function () {
            return url.resolve(base, $(this).attr('src'));
        }).toArray();
    })
    .then(console.log);

url.resolve handles both absolute and relative URLs: a relative src is combined with the request URL to produce an absolute URL, while a src that is already absolute is returned unchanged. For example, if the page at google contains img tags with /logos/cat.gif and https://test.com/dog.gif as their src attributes, this would output:

[ 
    'http://www.google.com/logos/cat.gif',
    'https://test.com/dog.gif'
]
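
As a side note, the same resolution can also be done with the WHATWG URL class, which is available as a global in recent Node.js versions. The sketch below is not part of the original answer; it is an alternative offered under that assumption. The helper name toAbsolute and the third sample path (images/bird.png) are made up for illustration.

```javascript
// Sketch: resolve img src values against a base URL with the WHATWG URL class.
// toAbsolute is a hypothetical helper name, not part of any library.
function toAbsolute(src, base) {
    if (typeof src !== 'string') {
        return null; // e.g. an img tag without a src attribute
    }
    try {
        // A relative src is joined to base; an absolute src is kept as-is
        return new URL(src, base).href;
    } catch (err) {
        return null; // src could not be parsed as a URL
    }
}

var base = 'http://www.google.com';
var samples = ['/logos/cat.gif', 'https://test.com/dog.gif', 'images/bird.png'];

var resolved = samples.map(function (src) {
    return toAbsolute(src, base);
});

console.log(resolved);
// [ 'http://www.google.com/logos/cat.gif',
//   'https://test.com/dog.gif',
//   'http://www.google.com/images/bird.png' ]
```

Returning null for a missing or unparseable src lets the caller filter those entries out rather than having the whole promise chain reject.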
