How do I get the absolute path for `<img src=''>` in Node from a response.body
Question
So I want to use request-promise to pull the body of a page. Once I have the page, I want to collect all the `<img>` tags and get an array of their `src` values. Assume the `src` attributes on the page contain both relative and absolute paths. I want an array of absolute URLs for the images on the page. I know I could build the absolute URLs with some string manipulation and the `path` module, but I want to find a better way of doing it.
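As a quick illustration of why the `path` idea is fragile (a sketch using only Node built-ins; the `src` value is a hypothetical example): `path.posix.join` treats the URL as a file path and collapses the `//` after the scheme, while the WHATWG `URL` constructor resolves the reference against the page URL.

```javascript
// Sketch: path.posix.join vs. URL resolution, using only Node built-ins.
const path = require('path');

const base = 'http://www.google.com'; // page URL from the question
const src = '/logos/cat.gif';         // hypothetical root-relative src

// path normalizes "//" to "/", which mangles the "http://" separator.
const viaPath = path.posix.join(base, src);

// new URL(relative, base) resolves src against base as a URL reference.
const viaUrl = new URL(src, base).href;

console.log(viaPath); // http:/www.google.com/logos/cat.gif
console.log(viaUrl);  // http://www.google.com/logos/cat.gif
```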
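```javascript
var rp = require('request-promise'),
    cheerio = require('cheerio');

var options = {
  uri: 'http://www.google.com',
  method: 'GET',
  resolveWithFullResponse: true // resolve with the full response, not just the body
};

rp(options)
  .then(function (response) {
    var $ = cheerio.load(response.body); // declare $ locally instead of leaking a global
    var relativeLinks = $('img');
    relativeLinks.each(function () {
      var link = $(this).attr('src');
      if (!link) return; // skip <img> tags without a src attribute
      console.log(link);
      // Naive check: anything starting with "http" is treated as absolute
      if (link.startsWith('http')) {
        console.log('abs');
      } else {
        console.log('rel');
      }
    });
  })
  .catch(console.error);
```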
Result
```
/logos/doodles/2016/phoebe-snetsingers-85th-birthday-5179281716019200-hp.gif
rel
```
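The `startsWith('http')` test is only a heuristic: it would call a relative file name such as `httpfoo.gif` absolute and a `data:` URI relative. A minimal alternative, assuming a Node version where `URL` is a global (Node 10+), is to let the URL parser decide: parsing without a base only succeeds when the string carries its own scheme. The sample values other than the Google path are hypothetical.

```javascript
// Sketch: classify src values by asking the WHATWG URL parser.
function isAbsoluteUrl(src) {
  try {
    new URL(src); // no base given: succeeds only if src has its own scheme
    return true;
  } catch (err) {
    return false; // TypeError (Invalid URL): a relative reference
  }
}

const samples = [
  '/logos/doodles/2016/phoebe-snetsingers-85th-birthday-5179281716019200-hp.gif', // rel
  'https://test.com/dog.gif',                  // abs
  'data:image/gif;base64,R0lGODlhAQABAAAAACw=', // abs (hypothetical data URI)
  'httpfoo.gif',                               // rel (hypothetical file name)
];

for (const src of samples) {
  console.log(src, isAbsoluteUrl(src) ? 'abs' : 'rel');
}
```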
Recommended answer
To get an array of image links in your scenario, you can use `url.resolve` to resolve the relative `src` attributes of `img` tags against the request URL, which yields absolute URLs. The array is passed to the final `.then()`; you can do something other than `console.log` with it if you like.
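```javascript
var rp = require('request-promise'),
    cheerio = require('cheerio'),
    url = require('url'),
    base = 'http://www.google.com';

var options = {
  uri: base,
  method: 'GET',
  resolveWithFullResponse: true
};

rp(options)
  .then(function (response) {
    var $ = cheerio.load(response.body);
    // img[src] skips <img> tags without a src attribute,
    // which would otherwise make url.resolve throw on undefined
    return $('img[src]').map(function () {
      // Relative src values are resolved against base; absolute ones pass through
      return url.resolve(base, $(this).attr('src'));
    }).toArray();
  })
  .then(console.log)
  .catch(console.error);
```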
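On current Node versions the documentation files `url.resolve()` under the legacy URL API and points to the WHATWG `URL` class instead. Below is a sketch of the same resolution step written with `new URL(src, base)`, kept free of network and cheerio calls so it runs on its own; `toAbsolute` is a hypothetical helper name, not part of any library.

```javascript
// Sketch: resolve src values with the WHATWG URL API instead of url.resolve.
// toAbsolute is a hypothetical helper, not part of any library.
function toAbsolute(base, srcs) {
  return srcs
    // Skip <img> tags whose src is missing or empty
    .filter((src) => typeof src === 'string' && src.trim() !== '')
    // new URL(relative, base) resolves against base; absolute inputs ignore base
    .map((src) => new URL(src, base).href);
}

console.log(toAbsolute('http://www.google.com', [
  '/logos/cat.gif',
  'https://test.com/dog.gif',
  undefined,
]));
```

In the answer's code, it could be fed the raw `src` values collected with cheerio, for example `toAbsolute(base, $('img').map(function () { return $(this).attr('src'); }).toArray())`. Note that `new URL` throws a `TypeError` if `base` itself is not a valid absolute URL.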
`url.resolve` works for both relative and absolute URLs: when the `src` is a relative path, it combines it with the request URL and returns the resulting absolute URL; when the `src` is already absolute, it returns it unchanged. For example, if a page on google.com had `img` tags with `/logos/cat.gif` and `https://test.com/dog.gif` as their `src` attributes, this would output:
```
[
  'http://www.google.com/logos/cat.gif',
  'https://test.com/dog.gif'
]
```
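To make the resolution rules concrete, here is a sketch that runs `url.resolve` against a hypothetical page URL in a sub-directory. It covers a directory-relative path, a `../` segment, a root-relative path, a protocol-relative URL and an already-absolute URL; the expected results are noted in the comments.

```javascript
const url = require('url');

// Hypothetical page URL that sits in a sub-directory
const page = 'http://example.com/gallery/index.html';

const cases = [
  'cat.gif',                  // -> http://example.com/gallery/cat.gif (page's directory)
  '../img/dog.gif',           // -> http://example.com/img/dog.gif (parent segment collapsed)
  '/logos/cat.gif',           // -> http://example.com/logos/cat.gif (replaces whole path)
  '//cdn.example.com/a.gif',  // -> http://cdn.example.com/a.gif (inherits http:)
  'https://test.com/dog.gif', // -> https://test.com/dog.gif (absolute, unchanged)
];

for (const src of cases) {
  console.log(src, '->', url.resolve(page, src));
}
```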