scrapy get nth-child text of same class
Question
I've attached a picture.
The problem I'm facing is getting the first element of the same class. I'm trying to get:
.adxHeader > .adxExtraInfo (1st one) > .adxExtraInfoPart (1st one) > a::text
I wrote the following code, but it isn't working. Any ideas?
response.css('div.adxViewContainer div.adxHeader div.adxExtraInfo:nth-child(1) div.adxExtraInfoPart:nth-child(1) a::text').extract_first()
Expected output: الرياض

Relevant HTML:
<div class="adxHeader">
<h3 itemprop="name"> » درج داخلي للاجار جديد حي المونسيه</h3>
<div class="adxExtraInfo">
<div class="adxExtraInfoPart"><a href="/city/الرياض"><i class="fa fa-map-marker"></i> الرياض</a></div>
<div class="adxExtraInfoPart"><a href="/users/ابو نوره"><i class="fa fa-user"></i> ابو نوره</a></div>
</div>
<div class="adxExtraInfo">
<div class="adxExtraInfoPart"> قبل ساعه و 27 دقيقه</div>
<div class="adxExtraInfoPart">#20467014</div>
</div>
<div class="moveLeft">
<a href="www.google.com" class="nextad"> ← التالي </a>
<br />
</div>
</div>
Answer
The <div class="adxExtraInfo"> that you are targeting is not the 1st child of its <div class="adxHeader"> parent; the <h3> is. :nth-child() counts every element sibling, regardless of its tag or class, so div.adxExtraInfo:nth-child(1) will not match anything in your input:
>>> s = scrapy.Selector(text='''<div class="adxHeader">
... <h3 itemprop="name"> » درج داخلي للاجار جديد حي المونسيه</h3>
...
... <div class="adxExtraInfo">
... <div class="adxExtraInfoPart"><a href="/city/الرياض"><i class="fa fa-map-marker"></i> الرياض</a></div>
... <div class="adxExtraInfoPart"><a href="/users/ابو نوره"><i class="fa fa-user"></i> ابو نوره</a></div>
... </div>
...
... <div class="adxExtraInfo">
... <div class="adxExtraInfoPart"> قبل ساعه و 27 دقيقه</div>
... <div class="adxExtraInfoPart">#20467014</div>
... </div>
... <div class="moveLeft">
...
...
... <a href="www.google.com" class="nextad"> ← التالي </a>
... <br />
...
... </div>
...
... </div>''')
>>> s.css('div.adxHeader > div.adxExtraInfo:nth-child(1)').extract()
[]
>>> s.css('div.adxHeader > *:nth-child(1)').extract()
[u'<h3 itemprop="name"> \xbb \u062f\u0631\u062c \u062f\u0627\u062e\u0644\u064a \u0644\u0644\u0627\u062c\u0627\u0631 \u062c\u062f\u064a\u062f \u062d\u064a \u0627\u0644\u0645\u0648\u0646\u0633\u064a\u0647</h3>']
>>>
In this case, you may want to anchor div.adxExtraInfo to the <h3> instead, using the adjacent sibling combinator (+), which selects the <div class="adxExtraInfo"> immediately following the <h3>:
>>> print(
... s.css('''div.adxHeader
... > h3:nth-child(1) + div.adxExtraInfo
... div.adxExtraInfoPart:nth-child(1) a::text''').extract_first())
الرياض
>>>