Scrapy getting href out of div
Problem description
I started using Scrapy for a small project and I can't extract the link. Instead of the URL, I get only "[]" each time the class is found. Am I missing something obvious?
sel = Selector(response)
for entry in sel.xpath("//div[@class='recipe-description']"):
    print(entry.xpath('href').extract())
Sample from the website:
<div class="recipe-description">
    <a href="http://www.url.com/">
        <h2 class="rows-2"><span>SomeText</span></h2>
    </a>
</div>
Your XPath query is wrong.
for entry in sel.xpath("//div[@class='recipe-description']"):
In this line you are iterating over the div elements, which have no href attribute. Note also that the relative path 'href' matches child elements named href; an attribute is addressed as '@href'.
To fix it, select the anchor (a) elements inside the div and read their href attribute:
for entry in sel.xpath("//div[@class='recipe-description']/a"):
    print(entry.xpath('@href').extract())
The best solution is to extract the href attribute directly in the for loop:
for href in sel.xpath("//div[@class='recipe-description']/a/@href").extract():
    print(href)
For simplicity, you can also use CSS selectors:
for href in sel.css("div.recipe-description a::attr(href)").extract():
    print(href)
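The difference between a child element named href and an href attribute can be checked without running a spider. The following is a minimal sketch that uses Python's standard-library ElementTree instead of Scrapy (an assumption made so that it runs with no extra packages). ElementTree supports only a subset of XPath and cannot select '@href' as a path step, so the attribute is read with .get(). The html/body wrapper around the sample markup is also added here for illustration.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question (well-formed, so an XML parser can read it).
# The <html>/<body> wrapper is an illustrative assumption.
HTML = """
<html><body>
<div class="recipe-description">
  <a href="http://www.url.com/">
    <h2 class="rows-2"><span>SomeText</span></h2>
  </a>
</div>
</body></html>
"""

root = ET.fromstring(HTML)
divs = root.findall(".//div[@class='recipe-description']")

# The path 'href' looks for child *elements* named <href>, so it finds nothing.
child_elements = [el for div in divs for el in div.findall("href")]

# Selecting the <a> children and reading their href attribute finds the link.
hrefs = [a.get("href") for div in divs for a in div.findall("a")]

print(child_elements)
print(hrefs)
```

The empty first list mirrors the "[]" seen in the question, while the second list contains the link taken from the a element's attribute.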