Scrapy add.xpath or join xpath

Published: 2018/6/25 18:54:23 · Tags: python, html, xpath, scrapy


Problem description


I hope everyone is doing well.

I have this code (part of it) for a spider. This is the last part of the scraping: here it starts to scrape and then writes to the CSV file. So I have this question: is it possible to join or add XPath results and print the combined result to the file? For example, given this markup:

```html
<h5>Soundbooster</h5><br><br>
<p class="details">
<b>Filtro attuale</b>
</p>
<blockquote>
<p>
<b>Catalogo:</b> Aliant</br>
<b>Marca e Modello:</b> Mazda - 3</br>
<b>Versione:</b> (3th gen) 2013-now (Petrol)
</p>
</blockquote>
```

I want to join the following into one field of the CSV file, which should look something like this:

Soundbooster per Mazda - 3 - (3th gen) 2013-now (Petrol)

This is where I am lost: is it possible? I don't know whether I have to use add.xpath, join, or another method, and how to use it correctly.

This is part of my code:
```python
def parse_content_details(self, response):
    exists = os.path.isfile("ntp/ntp_aliant.csv")
    with open("ntp/ntp_aliant.csv", "a+", newline='') as csvfile:
        fieldnames = ['*Action(SiteID=Italy|Country=IT|Currency=EUR|Version=745|CC=UTF-8)', '*Category', '*Title', 'Model', 'ConditionID', 'PostalCode',
                      'VATPercent', '*C:Marca', 'Product:EAN', '*C:MPN', 'PicURL', 'Description', '*Format', '*Duration', 'StartPrice', '*Quantity', 'PayPalAccepted', 'PayPalEmailAddress',
                      'PaymentInstructions', '*Location', 'ShippingService-1:FreeShipping', 'ShippingService-1:Option', 'ShippingService-1:Cost', 'ShippingService-1:Priority',
                      'ShippingService-2:Option', 'ShippingService-2:Cost', 'ShippingService-2:Priority', 'ShippingService-3:Option', 'ShippingService-3:Cost',
                      'ShippingService-3:Priority', 'ShippingService-4:Option', 'ShippingService-4:Cost', 'ShippingService-4:Priority', '*DispatchTimeMax',
                      '*ReturnsAcceptedOption', 'ReturnsWithinOption', 'RefundOption', 'ShippingCostPaidByOption']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        if not exists:
            writer.writeheader()
        for ntp in response.css('div.content-1col-nobox'):
            name = ntp.xpath('normalize-space(//h5/text())').extract_first()
            brand = ntp.xpath('normalize-space(//div/blockquote[1]/p/text()[4])').extract_first()
            version = ntp.xpath('normalize-space(//div/blockquote[1]/p/text()[6])').extract_first()
            result = response.xpath(name + " per " + brand + " - " + version)
            MPN = ntp.xpath('normalize-space(//tr[2]/td[1]/text())').extract_first()
            description = ntp.xpath('normalize-space(//div[6]/div[1]/div[2]/div/blockquote[2]/p/text())').extract_first()
            price = ntp.xpath('normalize-space(//tr[2]/td[@id="right_cell"][1])').extract()[0].split(None, 1)[0].replace(",", ".")
            picUrl = response.urljoin(ntp.xpath('//div/p[3]/img/@src').extract_first())
            writer.writerow({
                '*Action(SiteID=Italy|Country=IT|Currency=EUR|Version=745|CC=UTF-8)': 'Add',
                '*Category': '30895',
                '*Title': name,
                'Model': result,
                'ConditionID': '1000',
                'PostalCode': '154',
                'VATPercent': '22',
                '*C:Marca': 'Priority Parts',
                'Product:EAN': '',
                '*C:MPN': MPN,
                'PicURL': picUrl,
                'Description': description,
                '*Format': 'FixedPrice',
                '*Duration': 'GTC',
                'StartPrice': price,
                '*Quantity': '3',
                'PayPalAccepted': '1',
                'PayPalEmailAddress': 'your@gmail.com',
                'PaymentInstructions': 'your@gmail.com',
                '*Location': 'Italia',
                'ShippingService-1:FreeShipping': '1',
                'ShippingService-1:Option': 'IT_OtherCourier3To5Days',
                'ShippingService-1:Cost': '10',
                'ShippingService-1:Priority': '1',
                'ShippingService-2:Option': 'IT_QuickPackage3',
                'ShippingService-2:Cost': '15',
                'ShippingService-2:Priority': '2',
                'ShippingService-3:Option': 'IT_QuickPackage1',
                'ShippingService-3:Cost': '12',
                'ShippingService-3:Priority': '3',
                'ShippingService-4:Option': 'IT_Pickup',
                'ShippingService-4:Cost': '0',
                'ShippingService-4:Priority': '4',
                '*DispatchTimeMax': '5',
                '*ReturnsAcceptedOption': 'ReturnsAccepted',
                'ReturnsWithinOption': 'Days_14',
                'RefundOption': 'MoneyBackOrExchange',
                'ShippingCostPaidByOption': 'Buyer'})
```
Any help will be appreciated. Cheers, Valter.
Solution
In the end @Casper was right; the correct answer was given in the comments:

```python
"{} per {} - {}".format(name, brand, version)
```

This is the final result:
```python
name = ntp.xpath('normalize-space(//h5/text())').extract_first()
brand = ntp.xpath('normalize-space(//div/blockquote[1]/p//text()[4])').extract_first()
version = ntp.xpath('normalize-space(//div/blockquote[1]/p//text()[6])').extract_first()
result = "{} per {} - {}".format(name, brand, version)
writer.writerow({'*Title': result,
```
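The fix is plain Python string formatting. Once `name`, `brand`, and `version` have been extracted, no further XPath call is needed. Passing the concatenated text to `response.xpath()`, as the question's code does, asks Scrapy to evaluate that text as an XPath expression, which is not what is wanted. The sketch below shows the formatting and CSV steps outside Scrapy under stated assumptions: the three input strings are copied from the desired output in the question rather than scraped, `build_title` is a hypothetical helper name, and the two-column header plus the `io.StringIO` buffer stand in for the real `fieldnames` list and `ntp/ntp_aliant.csv`. The `str.join` line covers the "join" idea from the question for the case where every piece is separated by a single space.

```python
import csv
import io


def build_title(name: str, brand: str, version: str) -> str:
    """Hypothetical helper: combine the three scraped strings into one field.

    Produces the same text as "{} per {} - {}".format(name, brand, version).
    """
    return f"{name} per {brand} - {version}"


# Assumed return values of the three extract_first() calls for the sample
# page in the question (taken from the desired output, not from a crawl).
name = "Soundbooster"
brand = "Mazda - 3"
version = "(3th gen) 2013-now (Petrol)"

result = build_title(name, brand, version)
# str.join gives the same text when every piece is separated by one space.
joined = " ".join([name, "per", brand, "-", version])
assert result == joined == "{} per {} - {}".format(name, brand, version)

# Hypothetical two-column header; the real spider keeps its full fieldnames
# list and writes to ntp/ntp_aliant.csv instead of an in-memory buffer.
fieldnames = ["*Title", "Model"]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
# "Model" is omitted, so DictWriter fills it with restval ("" by default).
writer.writerow({"*Title": result})

print(result)  # Soundbooster per Mazda - 3 - (3th gen) 2013-now (Petrol)
print(buffer.getvalue())
```

Separately, every expression in the question's loop, such as `//h5/text()`, begins with `//`. Scrapy evaluates such a path against the whole document rather than the current `ntp` element. Scrapy's selector documentation recommends a leading `.` (for example `.//h5/text()`) when the query should stay inside the element.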
