Scrapy Modify Link to include Domain Name


Question

I have an item, item['link'], of this form:

item['link'] = site.select('div[2]/div/h3/a/@href').extract()

The links it extracts are of this form:

'link': [u'/watch?v=1PTw-uy6LA0&list=SP3DB54B154E6D121D&index=189'],

I want them like this:

'link': [u'http://www.youtube.com/watch?v=1PTw-uy6LA0&list=SP3DB54B154E6D121D&index=189'],

Is it possible to do this directly in Scrapy, instead of re-editing the list afterwards?
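For reference, the desired transformation is just resolving each extracted href against the page URL. A minimal sketch using the stdlib `urljoin` (`urlparse.urljoin` on Python 2), with stand-in values for `response.url` and the XPath result from the question:

```python
from urllib.parse import urljoin  # urlparse.urljoin on Python 2

# Stand-ins for response.url and site.select(...).extract():
response_url = 'http://www.youtube.com/playlist?list=SP3DB54B154E6D121D'
extracted = ['/watch?v=1PTw-uy6LA0&list=SP3DB54B154E6D121D&index=189']

# Join each relative href against the page URL as it is extracted,
# instead of post-processing the item afterwards:
item_link = [urljoin(response_url, href) for href in extracted]
print(item_link)
# → ['http://www.youtube.com/watch?v=1PTw-uy6LA0&list=SP3DB54B154E6D121D&index=189']
```

Because each href starts with "/", `urljoin` keeps only the scheme and host from the base URL and replaces the rest.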

Answer

Yeah, every time I'm grabbing a link I have to use the urlparse.urljoin method.

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    # only grab urls with "content" in the href
    urls = hxs.select('//a[contains(@href, "content")]/@href').extract()
    for i in urls:
        yield Request(urlparse.urljoin(response.url, i[1:]), callback=self.parse_url)

I imagine you're trying to grab the entire URL to parse it, right? If that's the case, a simple two-method system works on a BaseSpider: the parse method finds the link and sends it to the parse_url method, which outputs what you're extracting to the pipeline.

import urlparse  # urllib.parse in Python 3

from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector


def parse(self, response):
    hxs = HtmlXPathSelector(response)
    # only grab urls with "content" in the href
    urls = hxs.select('//a[contains(@href, "content")]/@href').extract()
    for i in urls:
        yield Request(urlparse.urljoin(response.url, i[1:]), callback=self.parse_url)

def parse_url(self, response):
    hxs = HtmlXPathSelector(response)
    item = ZipgrabberItem()
    # this grabs it
    item['zip'] = hxs.select("//div[contains(@class,'odd')]/text()").extract()
    return item
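As a side note on the `urljoin(response.url, i[1:])` call above: slicing off the leading "/" changes how the href is resolved. A short sketch with a hypothetical page URL showing both cases:

```python
from urllib.parse import urljoin  # urlparse.urljoin on Python 2

base = 'http://www.youtube.com/user/example'  # hypothetical page URL

# A root-relative href (leading "/") is resolved against the site root:
print(urljoin(base, '/watch?v=1PTw-uy6LA0'))
# → http://www.youtube.com/watch?v=1PTw-uy6LA0

# A bare relative href (what i[1:] produces) is resolved against the
# directory of the current page instead:
print(urljoin(base, 'watch?v=1PTw-uy6LA0'))
# → http://www.youtube.com/user/watch?v=1PTw-uy6LA0
```

So for links that already start with "/", passing `i` unchanged to `urljoin` is usually what you want; the `i[1:]` variant only makes sense when crawling from the site root.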

