Scrapy getting href out of div


Problem description


I started using Scrapy for a small project, but I can't extract the link. Instead of the URL, I get only "[]" each time the class is found. Am I missing something obvious?

from scrapy.selector import Selector

sel = Selector(response)
for entry in sel.xpath("//div[@class='recipe-description']"):
    print(entry.xpath('href').extract())  # prints [] for every match

Sample from the website:

<div class="recipe-description">
    <a href="http://www.url.com/">
        <h2 class="rows-2"><span>SomeText</span></h2>
    </a>
</div>

Solution

Your XPath query is wrong.

for entry in sel.xpath("//div[@class='recipe-description']"):

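In this line you are iterating over the div elements themselves, and a div has no href attribute. Note also that the relative query 'href' looks for a child element named href; an attribute is addressed as '@href'.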

To fix it, select the anchor elements inside the div and read their href attribute:

for entry in sel.xpath("//div[@class='recipe-description']/a"):
    print(entry.xpath('@href').extract())  # '@href' selects the attribute

A better solution is to select the href attribute directly in the query that the for loop iterates over:

for href in sel.xpath("//div[@class='recipe-description']/a/@href").extract():
    print(href)

For simplicity, you can also use CSS selectors:

for href in sel.css("div.recipe-description a::attr(href)").extract():
    print(href)
