Scrapy add.xpath or join xpath
Problem description
I hope everyone is doing well.
I have this code (part of it) for a spider. This is the last part of the scraping: here it starts to scrape and then writes to the CSV file. My doubt is whether it is possible to join or add XPath results so that the combined text is written to the file. For example:
<h5>Soundbooster</h5> <br><br>
<p class="details">
<b>Filtro attuale</b>
</p>
<blockquote>
<p>
<b>Catalogo:</b>
Aliant</br>
<b>Marca e Modello:</b>
Mazda - 3 </br>
<b>Versione:</b>
(3th gen) 2013-now (Petrol)
</p>
</blockquote>
I want to join these values into one field in the CSV file, so that it looks something like this:
Soundbooster per Mazda - 3 - (3th gen) 2013-now (Petrol)
This is where I am lost. Is it possible? I don't know whether I have to use add.xpath, join, or another method, or how to use it correctly.
This is part of my code:
def parse_content_details(self, response):
    exists = os.path.isfile("ntp/ntp_aliant.csv")
    with open("ntp/ntp_aliant.csv", "a+", newline='') as csvfile:
        fieldnames = ['*Action(SiteID=Italy|Country=IT|Currency=EUR|Version=745|CC=UTF-8)', '*Category', '*Title', 'Model', 'ConditionID', 'PostalCode',
                      'VATPercent', '*C:Marca', 'Product:EAN', '*C:MPN', 'PicURL', 'Description', '*Format', '*Duration', 'StartPrice', '*Quantity', 'PayPalAccepted', 'PayPalEmailAddress',
                      'PaymentInstructions', '*Location', 'ShippingService-1:FreeShipping', 'ShippingService-1:Option', 'ShippingService-1:Cost', 'ShippingService-1:Priority',
                      'ShippingService-2:Option', 'ShippingService-2:Cost', 'ShippingService-2:Priority', 'ShippingService-3:Option', 'ShippingService-3:Cost',
                      'ShippingService-3:Priority', 'ShippingService-4:Option', 'ShippingService-4:Cost', 'ShippingService-4:Priority', '*DispatchTimeMax',
                      '*ReturnsAcceptedOption', 'ReturnsWithinOption', 'RefundOption', 'ShippingCostPaidByOption']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        if not exists:
            writer.writeheader()
        for ntp in response.css('div.content-1col-nobox'):
            name = ntp.xpath('normalize-space(//h5/text())').extract_first()
            brand = ntp.xpath('normalize-space(//div/blockquote[1]/p/text()[4])').extract_first()
            version = ntp.xpath('normalize-space(//div/blockquote[1]/p/text()[6])').extract_first()
            result = response.xpath(name + " per " + brand + " - " + version)
            MPN = ntp.xpath('normalize-space(//tr[2]/td[1]/text())').extract_first()
            description = ntp.xpath('normalize-space(//div[6]/div[1]/div[2]/div/blockquote[2]/p/text())').extract_first()
            price = ntp.xpath('normalize-space(//tr[2]/td[@id="right_cell"][1])').extract()[0].split(None, 1)[0].replace(",", ".")
            picUrl = response.urljoin(ntp.xpath('//div/p[3]/img/@src').extract_first())
            writer.writerow({
                '*Action(SiteID=Italy|Country=IT|Currency=EUR|Version=745|CC=UTF-8)': 'Add',
                '*Category': '30895',
                '*Title': name,
                'Model': result,
                'ConditionID': '1000',
                'PostalCode': '154',
                'VATPercent': '22',
                '*C:Marca': 'Priority Parts',
                'Product:EAN': '',
                '*C:MPN': MPN,
                'PicURL': picUrl,
                'Description': description,
                '*Format': 'FixedPrice',
                '*Duration': 'GTC',
                'StartPrice': price,
                '*Quantity': '3',
                'PayPalAccepted': '1',
                'PayPalEmailAddress': 'your@gmail.com',
                'PaymentInstructions': 'your@gmail.com',
                '*Location': 'Italia',
                'ShippingService-1:FreeShipping': '1',
                'ShippingService-1:Option': 'IT_OtherCourier3To5Days',
                'ShippingService-1:Cost': '10',
                'ShippingService-1:Priority': '1',
                'ShippingService-2:Option': 'IT_QuickPackage3',
                'ShippingService-2:Cost': '15',
                'ShippingService-2:Priority': '2',
                'ShippingService-3:Option': 'IT_QuickPackage1',
                'ShippingService-3:Cost': '12',
                'ShippingService-3:Priority': '3',
                'ShippingService-4:Option': 'IT_Pickup',
                'ShippingService-4:Cost': '0',
                'ShippingService-4:Priority': '4',
                '*DispatchTimeMax': '5',
                '*ReturnsAcceptedOption': 'ReturnsAccepted',
                'ReturnsWithinOption': 'Days_14',
                'RefundOption': 'MoneyBackOrExchange',
                'ShippingCostPaidByOption': 'Buyer'})
Any help will be appreciated. Cheers, Valter.
In the end @Casper was right; the correct answer is in the comments:
"{} per {} - {}".format(name, brand, version)
This is the final result:
name = ntp.xpath('normalize-space(//h5/text())').extract_first()
brand = ntp.xpath('normalize-space(//div/blockquote[1]/p//text()[4])').extract_first()
version = ntp.xpath('normalize-space(//div/blockquote[1]/p//text()[6])').extract_first()
result = ("{} per {} - {}".format(name, brand, version))
writer.writerow({
    '*Title': result,
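The reason this works is that extract_first() already returns plain Python strings, so combining them is ordinary string formatting rather than another XPath query. When the XPath expression is wrapped in normalize-space(), a missing node typically yields an empty string, which would leave a dangling " per " or " - " in the title. Here is a minimal sketch that guards against that. The helper name build_title is hypothetical and not part of Scrapy, and the sample values are copied from the HTML in the question:

```python
def build_title(name, brand, version):
    """Join already-extracted strings into 'name per brand - version'.

    Empty or None parts are skipped so no dangling separator is left behind.
    build_title is a hypothetical helper, not a Scrapy API.
    """
    name = (name or "").strip()
    # Keep only the non-empty brand/version parts and join them with " - "
    tail = " - ".join(p.strip() for p in (brand, version) if p and p.strip())
    if name and tail:
        return "{} per {}".format(name, tail)
    return name or tail


# Sample values matching the HTML shown in the question
result = build_title("Soundbooster", "Mazda - 3", "(3th gen) 2013-now (Petrol)")
print(result)
```

Inside the spider, the three arguments would be the name, brand and version values returned by extract_first(), and result would be written to the '*Title' field as in the snippet above.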