XPath works for just one item when I add // to it


Question

I have this HTML page:

<page>
<div class="results-list">

    <div class="item paid-featured-item"></div>
    <div class="item paid-featured-item"></div>
    <div class="item paid-featured-item"></div>
    <div class="item paid-featured-item"></div>
    <div class="item paid-featured-item"></div>
    <div class="item paid-featured-item"></div>
    <div class="item paid-featured-item"></div>
    <div class="item paid-featured-item"></div>

</div>
</page>

Inside each "item paid-featured-item" div, I have this:

<div class="item paid-featured-item">
    <div class="something">
        <div class="title">
            This is the title
        </div>
    </div>
    <div class="anotherthing">
    </div>
</div>

I want to extract the title of each item using XPath.

What I have tried:

Container = "//div[@class='results-list']"

for item in Container:
    title = "//div[@class='title']/text()"

I get 8 titles, but each one is the title of the first item.

I am sure that is because I used //.

What should I do?

First, I don't want to use CSS selectors because they are not allowed at my work.

Second, I don't want to rely on class="something" because this div does not always exist in my page.

Third, I am using Scrapy with Python.

Fourth, I appreciate your help.

Solution

Say your page looks like this (page.html):

<page>
  <div id="results-list">
    <div class="item paid-featured-item">
      <div class="something">
        <div class="title">Title 1</div>
      </div>
      <div class="anotherthing"></div>
    </div>
    <div class="item paid-featured-item">
      <div class="something">
        <div class="title">Title 2</div>
      </div>
      <div class="anotherthing"></div>
    </div>
    <div class="item paid-featured-item">
      <div class="something">
        <div class="title">Title 3</div>
      </div>
      <div class="anotherthing"></div>
    </div>
    <div class="item paid-featured-item">
      <div class="something">
        <div class="title">Title 4</div>
      </div>
      <div class="anotherthing"></div>
    </div>
  </div>
</page>

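To extract each title, start the inner expressions with .// instead of //. A path that begins with // is evaluated from the document root no matter which node you call it on, while .// searches only the descendants of the current node: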

from scrapy.selector import Selector

# Read the sample page and wrap it in a Scrapy selector.
with open('page.html') as f:
    sel = Selector(text=f.read())

container = sel.xpath('//div[@id="results-list"]')
# The leading "." makes each expression relative to the current node.
items = container.xpath('.//div[@class="item paid-featured-item"]')
for item in items:
    # *extracted* is a single-item list containing the title.
    extracted = item.xpath('.//div[@class="title"]/text()').extract()
    title = extracted[0]
    print(title)

This will output:

Title 1
Title 2
Title 3
Title 4
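
The difference between searching from the root and searching from each item can also be sketched without Scrapy. The example below is only an illustration. It swaps in Python's standard-library xml.etree.ElementTree (not Scrapy's Selector) so that it runs with no dependencies, and the PAGE markup and the title_of helper are hypothetical names made up for this sketch. ElementTree does not accept absolute // paths on an element, so the absolute search is imitated by searching from the root inside the loop. The sketch also shows one possible way to handle an item that has no "something" wrapper or no title at all, a case where the extracted[0] lookup above would raise an IndexError.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modeled on page.html: the second item has no
# "something" wrapper and the third item has no title at all.
PAGE = """
<page>
  <div id="results-list">
    <div class="item paid-featured-item">
      <div class="something"><div class="title">Title 1</div></div>
    </div>
    <div class="item paid-featured-item">
      <div class="title">Title 2</div>
    </div>
    <div class="item paid-featured-item">
      <div class="anotherthing"></div>
    </div>
  </div>
</page>
"""

root = ET.fromstring(PAGE)
items = root.findall(".//div[@class='item paid-featured-item']")

# Searching from the root inside the loop imitates an absolute '//' path:
# the current item is ignored, so every iteration finds the first title.
from_root = [root.find(".//div[@class='title']").text for _ in items]


def title_of(item):
    """Return the title text under this item only, or None if it has none."""
    node = item.find(".//div[@class='title']")
    return node.text.strip() if node is not None else None


# Searching from each item imitates a relative './/' path.
relative = [title_of(item) for item in items]

print(from_root)  # ['Title 1', 'Title 1', 'Title 1']
print(relative)   # ['Title 1', 'Title 2', None]
```

Because .// matches descendants at any depth, the title is found whether or not the "something" div is present, which is why the expression does not need to mention that class.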
