Relative URL to absolute URL in Scrapy

Views: 53 | Published: 2021/7/16 21:51:12 | Tag: scrapy

Problem description

I need help converting relative URLs to absolute URLs in a Scrapy spider.

I need to convert the links on my start pages to absolute URLs so that I can get the images of the crawled items, which are listed on those start pages. I have tried several ways to achieve this without success, and I'm stuck. Any suggestions?

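import scrapy
from scrapy import Request

# ExampleItem is the project's own Item class (its import is not shown in the question).


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = [
        "http://www.example.com/billboard",
        "http://www.example.com/billboard?page=1",
    ]

    def parse(self, response):
        image_urls = response.xpath('//div[@class="content"]/section[2]/div[2]/div/div/div/a/article/img/@src').extract()
        relative_urls = response.xpath(u'''//div[contains(concat(" ", normalize-space(@class), " "), " content ")]/a/@href''').extract()
        # absolute_urls is never defined: turning relative_urls into absolute URLs is the step I am stuck on.

        for image_url, url in zip(image_urls, absolute_urls):
            item = ExampleItem()
            # One image per item, paired with its link
            item['image_urls'] = [image_url]

            # Follow the item's link and hand the partially filled item to the next callback
            request = Request(url, callback=self.parse_dir_contents)
            request.meta['item'] = item
            yield request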

Recommended answer

There are mainly three ways to achieve that:

Using the urljoin function from urllib:

from urllib.parse import urljoin
# Same as: from w3lib.url import urljoin

url = urljoin(base_url, relative_url)
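
To make the resolution rules concrete, here is a minimal sketch of how urljoin treats a few common kinds of link. The helper name resolve_all, the base URL, and the sample links are made up for illustration; only urljoin itself comes from the standard library.

```python
from urllib.parse import urljoin


def resolve_all(base_url, links):
    """Resolve every link against base_url (hypothetical helper)."""
    return [urljoin(base_url, link) for link in links]


# Illustrative values only; in a spider the base would be the page URL.
base_url = "http://www.example.com/billboard?page=1"
links = [
    "/images/cover.jpg",             # root-relative path
    "images/cover.jpg",              # path relative to the page's directory
    "?page=2",                       # query-only reference
    "//cdn.example.com/cover.jpg",   # scheme-relative URL
    "http://other.example.org/a.jpg" # already absolute, returned unchanged
]

for absolute in resolve_all(base_url, links):
    print(absolute)
```

Links that are already absolute pass through unchanged, so it is safe to run every extracted href through the same call.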

Using the response's urljoin wrapper method, as mentioned by Steve.

url = response.urljoin(relative_url)

If you also want to yield a request from that link, you can use the response's handy follow method:

# It will create a new request using the above "urljoin" method
yield response.follow(relative_url, callback=self.parse)
