XPath works for just one item when I add // to it
Question
I have this HTML page:
<page>
<div class="results-list">
<div class="item paid-featured-item"></div>
<div class="item paid-featured-item"></div>
<div class="item paid-featured-item"></div>
<div class="item paid-featured-item"></div>
<div class="item paid-featured-item"></div>
<div class="item paid-featured-item"></div>
<div class="item paid-featured-item"></div>
<div class="item paid-featured-item"></div>
</div>
</page>
Inside each "item paid-featured-item" I have this:
<div class="item paid-featured-item">
<div class="somethign">
<div class="title">
This is the title
</div>
</div>
<div class="anotherthing">
</div>
</div>
I want to extract the title using XPath.
What I have tried:
Container = "//div[@class='results-list']"
for item in Container:
    title = "//div[@class='title']/text()"
I get 8 titles, but each one is the title of the first item.
I am sure that is because I used //.
What should I do, please?
First, I don't want to use CSS selectors, because they are not allowed in my work.
Second, I don't want to use class="something", because this div does not always exist on my page.
Third, I am using Scrapy with Python.
Fourth, I appreciate your help.
Solution
Say your page looks like this (page.html):
<page>
<div id="results-list">
<div class="item paid-featured-item">
<div class="something">
<div class="title">Title 1</div>
</div>
<div class="anotherthing"></div>
</div>
<div class="item paid-featured-item">
<div class="something">
<div class="title">Title 2</div>
</div>
<div class="anotherthing"></div>
</div>
<div class="item paid-featured-item">
<div class="something">
<div class="title">Title 3</div>
</div>
<div class="anotherthing"></div>
</div>
<div class="item paid-featured-item">
<div class="something">
<div class="title">Title 4</div>
</div>
<div class="anotherthing"></div>
</div>
</div>
</page>
To extract each title, you do:
from scrapy.selector import Selector

sel = Selector(text=open('page.html').read())
container = sel.xpath('//div[@id="results-list"]')
items = container.xpath('.//div[@class="item paid-featured-item"]')
for item in items:
    # *extracted* is a single-item list containing the title text.
    extracted = item.xpath('.//div[@class="title"]/text()').extract()
    title = extracted[0]
    print(title)
This will output:
Title 1
Title 2
Title 3
Title 4
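The reason the original loop returned the same title 8 times is the leading // in the inner expression: //div[@class="title"]/text() always searches from the root of the whole document, so every iteration matches the document's first title. Prefixing the expression with a dot, .//div[@class="title"]/text(), makes the search relative to the current item node, which is why each iteration yields its own title.

If you are running this inside a spider rather than against a saved file, the same relative-XPath pattern applies to response.xpath. The sketch below is only an illustration under assumptions: the spider name, the start URL, and the yielded field name are placeholders, not anything taken from the question.

import scrapy

class TitlesSpider(scrapy.Spider):
    # Hypothetical spider; name and start_urls are placeholders.
    name = "titles"
    start_urls = ["http://example.com/results"]

    def parse(self, response):
        # Select every item container from the document root once...
        for item in response.xpath('//div[@class="item paid-featured-item"]'):
            # ...then query relative to the current item with the leading dot.
            extracted = item.xpath('.//div[@class="title"]/text()').extract()
            if extracted:
                yield {"title": extracted[0]}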