Scrapy add.xpath or join xpath


Problem description

I hope everyone is doing well.

I have this code (part of it) for a spider. This is the last part of the scraping: here it starts scraping and then writes to the CSV file. So I have a doubt: is it possible to join or add XPath results together in what gets written to the file? For example:

        <h5>Soundbooster</h5> <br><br>
          <p class="details">
            <b>Filtro attuale</b>
          </p>
          <blockquote>
            <p>
              <b>Catalogo:</b> 
                Aliant</br>
              <b>Marca e Modello:</b> 
                Mazda - 3 </br>
              <b>Versione:</b> 
                (3th gen) 2013-now (Petrol)
            </p>
          </blockquote>

I want to join the following into a single field of the CSV file, so that it looks something like this:

Soundbooster per Mazda - 3 - (3th gen) 2013-now (Petrol)

This is where I am lost. Is it possible? I don't know whether I have to use add.xpath, join, or some other method, or how to use it correctly.

This is part of my code:

import csv
import os

# ... inside the spider class:

    def parse_content_details(self, response):
        # check before open(), because mode "a+" creates the file if it is missing
        exists = os.path.isfile("ntp/ntp_aliant.csv")
        with open("ntp/ntp_aliant.csv", "a+", newline='') as csvfile:
            fieldnames = ['*Action(SiteID=Italy|Country=IT|Currency=EUR|Version=745|CC=UTF-8)', '*Category', '*Title', 'Model', 'ConditionID', 'PostalCode',
                'VATPercent', '*C:Marca', 'Product:EAN', '*C:MPN', 'PicURL', 'Description', '*Format', '*Duration', 'StartPrice', '*Quantity', 'PayPalAccepted', 'PayPalEmailAddress',
                'PaymentInstructions', '*Location', 'ShippingService-1:FreeShipping', 'ShippingService-1:Option', 'ShippingService-1:Cost', 'ShippingService-1:Priority',
                'ShippingService-2:Option', 'ShippingService-2:Cost', 'ShippingService-2:Priority', 'ShippingService-3:Option', 'ShippingService-3:Cost',
                'ShippingService-3:Priority', 'ShippingService-4:Option', 'ShippingService-4:Cost', 'ShippingService-4:Priority', '*DispatchTimeMax',
                '*ReturnsAcceptedOption', 'ReturnsWithinOption', 'RefundOption', 'ShippingCostPaidByOption']
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            # write the header only when the file is new
            if not exists:
                writer.writeheader()

            for ntp in response.css('div.content-1col-nobox'):

                name = ntp.xpath('normalize-space(//h5/text())').extract_first()
                brand = ntp.xpath('normalize-space(//div/blockquote[1]/p/text()[4])').extract_first()
                version = ntp.xpath('normalize-space(//div/blockquote[1]/p/text()[6])').extract_first()
                # my attempt at the join: response.xpath() expects an XPath expression, not the joined text
                result = response.xpath(name + " per " + brand + " - " + version)
                MPN = ntp.xpath('normalize-space(//tr[2]/td[1]/text())').extract_first()
                description = ntp.xpath('normalize-space(//div[6]/div[1]/div[2]/div/blockquote[2]/p/text())').extract_first()
                price = ntp.xpath('normalize-space(//tr[2]/td[@id="right_cell"][1])').extract()[0].split(None, 1)[0].replace(",", ".")
                picUrl = response.urljoin(ntp.xpath('//div/p[3]/img/@src').extract_first())

                writer.writerow({
                    '*Action(SiteID=Italy|Country=IT|Currency=EUR|Version=745|CC=UTF-8)': 'Add',
                    '*Category': '30895',
                    '*Title': name,
                    'Model': result,
                    'ConditionID': '1000',
                    'PostalCode': '154',
                    'VATPercent': '22',
                    '*C:Marca': 'Priority Parts',
                    'Product:EAN': '',
                    '*C:MPN': MPN,
                    'PicURL': picUrl,
                    'Description': description,
                    '*Format': 'FixedPrice',
                    '*Duration': 'GTC',
                    'StartPrice': price,
                    '*Quantity': '3',
                    'PayPalAccepted': '1',
                    'PayPalEmailAddress': 'your@gmail.com',
                    'PaymentInstructions': 'your@gmail.com',
                    '*Location': 'Italia',
                    'ShippingService-1:FreeShipping': '1',
                    'ShippingService-1:Option': 'IT_OtherCourier3To5Days',
                    'ShippingService-1:Cost': '10',
                    'ShippingService-1:Priority': '1',
                    'ShippingService-2:Option': 'IT_QuickPackage3',
                    'ShippingService-2:Cost': '15',
                    'ShippingService-2:Priority': '2',
                    'ShippingService-3:Option': 'IT_QuickPackage1',
                    'ShippingService-3:Cost': '12',
                    'ShippingService-3:Priority': '3',
                    'ShippingService-4:Option': 'IT_Pickup',
                    'ShippingService-4:Cost': '0',
                    'ShippingService-4:Priority': '4',
                    '*DispatchTimeMax': '5',
                    '*ReturnsAcceptedOption': 'ReturnsAccepted',
                    'ReturnsWithinOption': 'Days_14',
                    'RefundOption': 'MoneyBackOrExchange',
                    'ShippingCostPaidByOption': 'Buyer'})

Any help will be appreciated. Cheers, Valter.
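The CSV handling in the callback above follows a common append pattern: check whether the file exists *before* opening it in append mode (which creates it), and write the header only on that first run. A minimal stdlib-only sketch of just that pattern, assuming a temporary file in place of `ntp/ntp_aliant.csv`, two columns in place of the full eBay field list, and a hypothetical helper name `append_row`:

```python
import csv
import os
import tempfile

# Assumption: two columns stand in for the full field list of the spider.
FIELDNAMES = ["*Title", "StartPrice"]


def append_row(path, row):
    """Hypothetical helper: append one row, writing the header only for a new file."""
    exists = os.path.isfile(path)  # must run before open(), which creates the file
    with open(path, "a+", newline="") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
        if not exists:
            writer.writeheader()
        writer.writerow(row)


# Assumption: a temporary directory stands in for the real "ntp/" folder.
path = os.path.join(tempfile.mkdtemp(), "ntp_aliant.csv")
append_row(path, {"*Title": "Soundbooster", "StartPrice": "10.50"})
append_row(path, {"*Title": "Soundbooster", "StartPrice": "12.00"})

with open(path, newline="") as f:
    print(f.read())
```

Calling it twice leaves a single header line followed by two data rows, which is why the spider can run repeatedly against the same file.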

Solution

In the end @Casper was right; the correct answer is in the comments:

"{} per {} - {}".format(name, brand, version)

This is the final result:

            name = ntp.xpath('normalize-space(//h5/text())').extract_first()
            brand = ntp.xpath('normalize-space(//div/blockquote[1]/p//text()[4])').extract_first()
            version = ntp.xpath('normalize-space(//div/blockquote[1]/p//text()[6])').extract_first()
            result = "{} per {} - {}".format(name, brand, version)

            writer.writerow({
                '*Title': result,
                # ... the other columns as in the question ...
            })
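This works because `name`, `brand` and `version` are already plain Python strings once `extract_first()` returns, so joining them is ordinary string formatting rather than another XPath query. A small sketch of that step in isolation, with the sample values hard-coded (an assumption; in the spider they come from the XPath calls). The hypothetical `build_title` helper also covers a case the thread does not discuss: `normalize-space()` yields an empty string when a node is missing, and the plain `format` call would then leave a dangling separator such as `Soundbooster per Mazda - 3 - `.

```python
def build_title(name, brand, version):
    """Hypothetical helper: build '<name> per <brand> - <version>',
    adding each separator only when the part after it is non-empty."""
    title = name or ""
    if brand:
        title += " per " + brand
    if version:
        title += " - " + version
    return title


# Sample values from the HTML in the question (hard-coded for illustration)
name = "Soundbooster"
brand = "Mazda - 3"
version = "(3th gen) 2013-now (Petrol)"

# The accepted str.format call and an equivalent f-string (Python 3.6+)
result = "{} per {} - {}".format(name, brand, version)
result_f = f"{name} per {brand} - {version}"

print(result)                        # Soundbooster per Mazda - 3 - (3th gen) 2013-now (Petrol)
print(build_title(name, brand, ""))  # Soundbooster per Mazda - 3
```

Either `result` or `build_title(...)` can then be passed as the `'*Title'` value in `writer.writerow`.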
