Python - Convert bytes / unicode tab delimited data to csv file


Question

I'm pulling the following line of data from an API. The data starts with a b prefix, which according to the Python 3.3 documentation indicates a bytes literal, with the escape sequences \t and \n representing the ASCII horizontal tab (TAB) and ASCII linefeed (LF) respectively.

b'settlement-id\tsettlement-start-date\tsettlement-end-date\tdeposit-date\ttotal-amount\tcurrency\ttransaction-type\torder-id\tmerchant-order-id\tadjustment-id\tshipment-id\tmarketplace-name\tamount-type\tamount-description\tamount\tfulfillment-id\tposted-date\tposted-date-time\torder-item-code\tmerchant-order-item-id\tmerchant-adjustment-item-id\tsku\tquantity-purchased\n7293436482\t03.05.2018 09:10:07 UTC\t04.05.2018 20:30:23 UTC\t06.05.2018 20:30:23 UTC\t53,44\tEUR\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n7293436482\t\t\t\t\t\tOrder\t303-3746292-6119509\t\t\tDRGC8lFbB\tAmazon.de\tItemPrice\tPrincipal\t179,99\tMFN\t03.05.2018\t03.05.2018 17:12:22 UTC\t30407746733299\t\t\t3700546702556-180412-chp-18c10347-1\t1\n7293436482\t\t\t\t\t\tOrder\t303-3746292-6119509\t\t\tDRGC8lFbB\tAmazon.de\tItemFees\tCommission\t-32,40\tMFN\t03.05.2018\t03.05.2018 17:12:22 UTC\t30407746733299\t\t\t3700546702556-180412-chp-18c10347-1\t1\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemPrice\tPrincipal\t-109,99\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemFees\tCommission\t19,80\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemFees\tRefundCommission\t-3,96\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n'

When I convert this data to a string using .decode("utf-8"), I get the corresponding tab delimited data:

settlement-id   settlement-start-date   settlement-end-date deposit-date    total-amount    currency    transaction-type    order-id    merchant-order-id   adjustment-id   shipment-id marketplace-name    amount-type amount-description  amount  fulfillment-id  posted-date posted-date-time    order-item-code merchant-order-item-id  merchant-adjustment-item-id sku quantity-purchased
7293436482  03.05.2018 09:10:07 UTC 04.05.2018 20:30:23 UTC 06.05.2018 20:30:23 UTC 53,44   EUR                                                                 
7293436482                      Order   303-3746292-6119509         DRGC8lFbB   Amazon.de   ItemPrice   Principal   179,99  MFN 03.05.2018  03.05.2018 17:12:22 UTC 30407746733299          3700546702556-180412-chp-18c10347-1 1
7293436482                      Order   303-3746292-6119509         DRGC8lFbB   Amazon.de   ItemFees    Commission  -32,40  MFN 03.05.2018  03.05.2018 17:12:22 UTC 30407746733299          3700546702556-180412-chp-18c10347-1 1
7293436482                      Refund  305-1251749-5602732 305-1251749-5602732 amzn1:crow:YZkTuxs4RhO8FpZez3cGCg       Amazon.de   ItemPrice   Principal   -109,99 AFN 04.05.2018  04.05.2018 18:24:39 UTC 38048998219979      142721169810    3700546702082-180124-jpn-131N28-6   
7293436482                      Refund  305-1251749-5602732 305-1251749-5602732 amzn1:crow:YZkTuxs4RhO8FpZez3cGCg       Amazon.de   ItemFees    Commission  19,80   AFN 04.05.2018  04.05.2018 18:24:39 UTC 38048998219979      142721169810    3700546702082-180124-jpn-131N28-6   
7293436482                      Refund  305-1251749-5602732 305-1251749-5602732 amzn1:crow:YZkTuxs4RhO8FpZez3cGCg       Amazon.de   ItemFees    RefundCommission    -3,96   AFN 04.05.2018  04.05.2018 18:24:39 UTC 38048998219979      142721169810    3700546702082-180124-jpn-131N28-6   

However, I cannot seem to save this data to a tab delimited csv file. I have tried several methods, all of which have failed, including the following:

with open("folder_GET_V2_SETTLEMENT_REPORT_DATA_FLAT_FILE_V2_/" + grl_id + ".csv", "w") as csv_file:
    writer = csv.writer(csv_file)
    for row in csv_file:
        print(row)

This gives me the following error:

    for row in csv_file:
io.UnsupportedOperation: not readable

Update: It turns out the problem lies elsewhere. During my various tests I had actually managed to generate the same file as you, but thought it wasn't working because the output looked wrong. When I opened the file in Excel, the data was split into two columns.

I have now figured out the reason: some numbers use the European decimal notation, which is a comma, as in 179,99. Excel therefore interprets the comma as a delimiter, whereas if I open the file in Notepad it reads correctly.
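One way to sidestep this ambiguity, sketched here rather than taken from the original thread, is to write a comma-separated file with every field quoted, so the comma inside a value such as 179,99 cannot be mistaken for a separator. The helper name rows_to_quoted_csv and the two-row sample are made up for illustration, and how Excel finally displays the file still depends on its regional settings.

```python
import csv
import io


def rows_to_quoted_csv(rows, delimiter=","):
    """Serialize rows to CSV text with every field wrapped in double quotes."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_ALL,   # quote every field, including "179,99"
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical two-row sample using the column names from the report
sample = [["amount-description", "amount"], ["Principal", "179,99"]]
print(rows_to_quoted_csv(sample))
```

Because the decimal comma sits inside quotes, a CSV parser that honours quoting reads 179,99 as a single field.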

Recommended answer

You are getting the error because the file is opened for writing ("w"), but the for loop tries to read from it. If I understand correctly, you want to take the bytes object and write it out cleanly as a tab-separated csv file. The following code does that:

import csv, re

orig = b'settlement-id\tsettlement-start-date\tsettlement-end-date\tdeposit-date\ttotal-amount\tcurrency\ttransaction-type\torder-id\tmerchant-order-id\tadjustment-id\tshipment-id\tmarketplace-name\tamount-type\tamount-description\tamount\tfulfillment-id\tposted-date\tposted-date-time\torder-item-code\tmerchant-order-item-id\tmerchant-adjustment-item-id\tsku\tquantity-purchased\n7293436482\t03.05.2018 09:10:07 UTC\t04.05.2018 20:30:23 UTC\t06.05.2018 20:30:23 UTC\t53,44\tEUR\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n7293436482\t\t\t\t\t\tOrder\t303-3746292-6119509\t\t\tDRGC8lFbB\tAmazon.de\tItemPrice\tPrincipal\t179,99\tMFN\t03.05.2018\t03.05.2018 17:12:22 UTC\t30407746733299\t\t\t3700546702556-180412-chp-18c10347-1\t1\n7293436482\t\t\t\t\t\tOrder\t303-3746292-6119509\t\t\tDRGC8lFbB\tAmazon.de\tItemFees\tCommission\t-32,40\tMFN\t03.05.2018\t03.05.2018 17:12:22 UTC\t30407746733299\t\t\t3700546702556-180412-chp-18c10347-1\t1\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemPrice\tPrincipal\t-109,99\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemFees\tCommission\t19,80\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n7293436482\t\t\t\t\t\tRefund\t305-1251749-5602732\t305-1251749-5602732\tamzn1:crow:YZkTuxs4RhO8FpZez3cGCg\t\tAmazon.de\tItemFees\tRefundCommission\t-3,96\tAFN\t04.05.2018\t04.05.2018 18:24:39 UTC\t38048998219979\t\t142721169810\t3700546702082-180124-jpn-131N28-6\t\n'

# Decode the bytes and split the resulting text into a list of lines
data = orig.decode('utf-8').splitlines()

# Open the file for writing; newline='' leaves line endings to the csv module
with open("tmp.csv", "w", newline='') as csv_file:
    # Create the writer object with tab delimiter
    writer = csv.writer(csv_file, delimiter='\t')
    for line in data:
        # writerow() needs a list of fields, so split each line at the tab characters.
        # Splitting on all whitespace would break values such as "03.05.2018 09:10:07 UTC"
        # and collapse the empty fields.
        writer.writerow(line.split('\t'))
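
As an alternative sketch: because the payload is already tab-delimited, the csv module can parse it directly instead of each line being split by hand. This keeps empty fields and values that contain spaces (such as 03.05.2018 09:10:07 UTC) intact. The helper name tsv_bytes_to_rows and the shortened sample payload are illustrative assumptions, not part of the original answer.

```python
import csv
import io


def tsv_bytes_to_rows(raw, encoding="utf-8"):
    """Decode tab-delimited bytes and return the rows as lists of strings."""
    text = raw.decode(encoding)
    # newline="" passes line endings through untouched so csv can handle them
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    return list(reader)


# Shortened, hypothetical payload in the same shape as the API response
raw = b"settlement-id\ttotal-amount\tcurrency\n7293436482\t53,44\tEUR\n7293436482\t\t\n"
rows = tsv_bytes_to_rows(raw)

# Write the parsed rows back out as a tab-delimited file
with open("tmp.csv", "w", newline="", encoding="utf-8") as csv_file:
    csv.writer(csv_file, delimiter="\t").writerows(rows)
```

Going through csv.reader first gives a list of rows that can be inspected or cleaned (for example, to normalise the amount column) before anything is written to disk.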
