Write to CSV in Python: append columns horizontally each time

Question

I wrote this piece of code, which scrapes Amazon for some elements using each page URL. Now I want to add a CSV function that lets me append columns horizontally with the following variables: Date_time, price, Merchant, Sellers_count. Each time I run the code, these columns should be added on the right without removing any existing columns. Here is the code and the table format I want to add to:

# -*- coding: cp1252 -*-
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
import requests, csv, time, urllib2, gspread, os, ast, datetime
from scrapy import Selector as s
from lxml import html
from random import randint
from oauth2client.client import SignedJwtAssertionCredentials

x = lambda x: source.xpath(x).extract()

links = ['http://www.amazon.com/dp/B00064NZCK',
         'http://www.amazon.com/dp/B000CIU7F8',
         'http://www.amazon.com/dp/B000H5839I',
         'http://www.amazon.com/dp/B000LTLBHG',
         'http://www.amazon.com/dp/B000SDLXKU',
         'http://www.amazon.com/dp/B000SDLXNC',
         'http://www.amazon.com/dp/B000SPHPWI',
         'http://www.amazon.com/dp/B000UUMHRE']

driver = webdriver.Firefox()
#driver.set_page_load_timeout(30)

for Url in links:
    try:
        driver.get(Url)
    except:
        pass
    time.sleep(randint(1,3))
    try:
        html = driver.page_source
        source = s(text=html,type="html")
    except:
        pass
    try:
        Page_link = x('//link[@rel="canonical"]//@href')
    except:
        pass
    try:
        Product_Name = x('//span[@id="productTitle"]/text()')
    except:
        pass
    Product_Name = str(Product_Name).encode('utf-8'); Product_Name = Product_Name.replace("[u'","").replace("']","")
    try:
        price = x('//span[@id="priceblock_ourprice"]//text()')
    except:
        pass
    try:
        Merchant = x('//div[@id="merchant-info"]//a//text()')
    except:
        pass
    try:
        Sellers_count = x('//span[@class="olp-padding-right"]//a/text()')
    except:
        pass
    if Merchant == []:
        Merchant = 'Amazon'
    else:
        Merchant = Merchant[0]
    price = str(price).replace("[u'","").replace("']","")
    if len(Sellers_count)>0:
        Sellers_count = Sellers_count[0].encode('utf-8')
    else:
        Sellers_count = str(Sellers_count).encode('utf-8')
    try:
        Sellers_count = Sellers_count.replace(" new",""); Sellers_count = int(Sellers_count)-1
    except:
        pass
    if Sellers_count == []:
        Sellers_count = str(Sellers_count).replace("[]","")
    else:
        Sellers_count = Sellers_count
    Date_time = datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
    print Date_time, Product_Name, Url, price, Merchant, Sellers_count
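The clean-up at the end of this loop relies on str(list).replace(...) and bare except clauses. As a hedged sketch in Python 3 (the question's code is Python 2), the same rules can be applied to the raw XPath result lists in one function that returns the four values to store per URL. The name clean_fields and its now parameter are hypothetical. Unlike the original, a missing value comes back as '' rather than a string such as '[]'; the "minus one" on the seller count is simply copied from the original code.

```python
import datetime


def clean_fields(price, merchant, sellers_count, now=None):
    """Turn raw XPath result lists into (Date_time, price, Merchant, Sellers_count).

    Hypothetical helper following the scraper's rules: use the first match,
    default Merchant to 'Amazon', and turn a sellers text such as '5 new'
    into 4 (the number minus one). Missing price/sellers become ''.
    """
    now = now or datetime.datetime.now()
    date_time = now.strftime('%Y-%m-%d_%H-%M-%S')  # same format as the original

    price_value = price[0].strip() if price else ''
    merchant_value = merchant[0].strip() if merchant else 'Amazon'

    sellers_value = ''
    if sellers_count:
        text = sellers_count[0].replace(' new', '').strip()
        try:
            sellers_value = int(text) - 1
        except ValueError:
            sellers_value = text  # keep unexpected text, as the original does
    return date_time, price_value, merchant_value, sellers_value
```

Inside the loop, calling clean_fields(price, Merchant, Sellers_count) right after the three x(...) lookups would replace the manual string handling.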

The existing table format I want to append to:

ASIN | ID | PRODUCT | URL
B00064NZCK | MG-5690 | BigMouth Inc Over The Hill Parking Privelege Permit | http://www.amazon.com/dp/B00064NZCK
B000CIU7F8 | BM1102 | BigMouth Inc Pocket Disgusting Sounds Machine | http://www.amazon.com/dp/B000CIU7F8
B000H5839I | MG-4774 | BigMouth Inc All Occasion Over The Hill Cane | http://www.amazon.com/dp/B000H5839I
B000LTLBHG | BM1234 | BigMouth Inc Beer Belt / 6 Pack Holster(Black) | http://www.amazon.com/dp/B000LTLBHG
B000SDLXKU | BM1103 | BigMouth Inc Covert Clicker | http://www.amazon.com/dp/B000SDLXKU
B000SDLXNC | BM1254 | BigMouth Inc Inflatable John | http://www.amazon.com/dp/B000SDLXNC
B000SPHPWI | SO:AP | Design Sense Generic Weener Kleener Soap | http://www.amazon.com/dp/B000SPHPWI
B000UUMHRE | MG-5305 | BigMouth Inc Over the Hill Rectal Thermometer | http://www.amazon.com/dp/B000UUMHRE


Answer

The following should do what you need. It reads your existing CSV file and adds the four new column headings. For each URL, your code then obtains the new data, which is appended to the end of the matching existing row (the order of links does not matter, because rows are looked up by URL). Finally, an updated CSV file is written:

import csv

links = ['http://www.amazon.com/dp/B00064NZCK',
         'http://www.amazon.com/dp/B000CIU7F8',
         'http://www.amazon.com/dp/B000H5839I',
         'http://www.amazon.com/dp/B000LTLBHG',
         'http://www.amazon.com/dp/B000SDLXKU',
         'http://www.amazon.com/dp/B000SDLXNC',
         'http://www.amazon.com/dp/B000SPHPWI',
         'http://www.amazon.com/dp/B000UUMHRE']

with open('existing.csv', 'r') as f_input:
    csv_input = csv.reader(f_input)

    # Read the existing header row and append the four new column headings
    headers = next(csv_input) + ["Date_time", "price", "Merchant", "Sellers_count"]
    rows = list(csv_input)

    # Map each URL (the fourth column) to its row index, in case the row order
    # differs from links or the file contains other entries
    url_indexes = {row[3]: index for index, row in enumerate(rows)}

    for url in links:
        # Insert your existing scraping code here to get the actual data;
        # the values below are placeholders

        Date_time = "2015-08-27_12-34-56"
        price = "123.45"
        Merchant = "Def"
        Sellers_count = "42"

        rows[url_indexes[url]].extend([Date_time, price, Merchant, Sellers_count])

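# Write the updated CSV to a new file.
# 'wb' is the Python 2 idiom for csv output; on Python 3 use
# open('updated.csv', 'w', newline='') instead (and newline='' for the input file too).
with open('updated.csv', 'wb') as f_output: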
    csv_output = csv.writer(f_output)
    csv_output.writerow(headers)
    csv_output.writerows(rows)
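The code above targets Python 2 and writes to a separate updated.csv. Since the goal is to add four more columns on every run, a Python 3 sketch that rewrites the same file in place may be closer to what is needed. Like the answer, it assumes the URL sits in the fourth column; append_columns and values_by_url are hypothetical names. Rows whose URL was not scraped in this run get blank cells so every row keeps the same width.

```python
import csv
import os
import tempfile


def append_columns(path, new_headers, values_by_url, url_col=3):
    """Append one set of columns to every row of the CSV at `path`, in place.

    values_by_url maps a URL (the value in column `url_col`) to a list of
    len(new_headers) values. Rows without an entry get blank cells.
    """
    with open(path, newline='') as f_input:
        csv_input = csv.reader(f_input)
        headers = next(csv_input) + list(new_headers)
        rows = list(csv_input)

    blank = [''] * len(new_headers)
    for row in rows:
        url = row[url_col] if len(row) > url_col else None
        row.extend(values_by_url.get(url, blank))

    # Write to a temporary file in the same directory, then swap it in,
    # so an interrupted write cannot leave a half-written CSV behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.csv')
    with os.fdopen(fd, 'w', newline='') as f_output:
        csv_output = csv.writer(f_output)
        csv_output.writerow(headers)
        csv_output.writerows(rows)
    os.replace(tmp_path, path)
```

After the scraping loop, collect the values into a dict such as {Url: [Date_time, price, Merchant, Sellers_count]} and call append_columns('existing.csv', ["Date_time", "price", "Merchant", "Sellers_count"], that_dict). Each run then adds four columns on the right; because the headings repeat, you may want to include the run's timestamp in them to tell runs apart.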
