使用python重新格式化csv文件? [英] Reformat csv file using python?

查看:76
本文介绍了使用python重新格式化csv文件?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个只有两个条目的csv文件.在这里:

I have this csv file with only two entries. Here it is:

Meat One,['Abattoirs', 'Exporters', 'Food Delivery', 'Butchers Retail', 'Meat Dealers-Retail', 'Meat Freezer', 'Meat Packers']

第一个是标题,第二个是业务标题.

First one is title and second is a business headings.

问题出在第二项.

这是我的代码:

import csv

with open('phonebookCOMPK-Directory.csv', "rt") as textfile:
    reader = csv.reader(textfile)

    for row in reader:
        row5 = row[5].replace("[", "").replace("]", "")
        listt = [(''.join(row5))]
        print (listt[0])

它打印:

'Abattoirs', 'Exporters', 'Food Delivery', 'Butchers Retail', 'Meat Dealers-Retail', 'Meat Freezer', 'Meat Packers'

我需要做的是创建一个包含这些单词的列表,然后使用for循环像这样打印它们以分别打印每个项目:

What i need to do is that i want to create a list containing these words and then print them like this using for loop to print every item separately:

Abattoirs
Exporters
Food Delivery
Butchers Retail
Meat Dealers-Retail
Meat Freezer
Meat Packers

实际上,我正在尝试重新格式化当前的csv文件并对其进行清理,以使其更加精确和易于理解.

Actually I am trying to reformat my current csv file and clean it so it can be more precise and understandable.

完整的csv第一行是这样的:

Complete 1st line of csv is this:

Meat One,+92-21-111163281,Al Shaheer Corporation,Retailers,2008,"['Abattoirs', 'Exporters', 'Food Delivery', 'Butchers Retail', 'Meat Dealers-Retail', 'Meat Freezer', 'Meat Packers']","[[' Outlets Address : Shop No. Z-10, Station Shopping Complex, MES Market, Malir-Cantt, Karachi. Landmarks : MES Market, Station Shopping Complex City : Karachi UAN : +92-21-111163281 '], [' Outlets Address : Shop 13, Ground Floor, Plot 14-D, Sky Garden, Main Tipu Sultan Road, KDA Scheme No.1, Karachi. Landmarks : Nadra Chowrangi, Sky Garden, Tipu Sultan Road City : Karachi UAN : +92-21-111163281 '], ["" Outlets Address : Near Jan's Broast, Boat Basin, Khayaban-e-Roomi, Block 5, Clifton, Karachi. Landmarks : Boat Basin, Jans Broast, Khayaban-e-Roomi City : Karachi UAN : +92-21-111163281 View Map ""], [' Outlets Address : Gulistan-e-Johar, Karachi. Landmarks : Perfume Chowk City : Karachi UAN : +92-21-111163281 '], [' Outlets Address : Tee Emm Mart, Creek Vista Appartments, Khayaban-e-Shaheen, Phase VIII, DHA, Karachi. Landmarks : Creek Vista Appartments, Nueplex Cinema, Tee Emm Mart, The Place City : Karachi Mobile : 0302-8333666 '], [' Outlets Address : Y-Block, DHA, Lahore. Landmarks : Y-Block City : Lahore UAN : +92-42-111163281 '], [' Outlets Address : Adj. PSO, Main Bhittai Road, Jinnah Supermarket, F-7 Markaz, Islamabad. Landmarks : Bhittai Road, Jinnah Super Market, PSO Petrol Pump City : Islamabad UAN : +92-51-111163281 ']]","Agriculture, fishing & Forestry > Farming equipment & services > Abattoirs in Pakistan"

First column is Name
Second column is Number
Third column is Owner
Forth column is Business type
Fifth column is Y.O.E
Sixth column is Business Headings
Seventh column is Outlets (List of lists containing every branch address)
Eighth column is classification

使用csv.reader没有任何限制,我可以使用任何可用于清理文件的技术.

There is no restriction of using csv.reader, I am open to any technique available to clean my file.

推荐答案

从两项独立的任务来思考:

Think of it in terms of two separate tasks:

  • 从肮脏"源(此CSV文件)中收集一些数据项
  • 将数据存储在某个地方,以便可以轻松地以编程方式访问和操作(根据您要对其进行的操作)

一种实现方法是使用函数 deserialize_business()从CSV中的每个传入行中提取结构化的业务信息.此功能可能很复杂,因为这是任务的本质,但还是建议将其拆分为自包含的较小函数(例如 get_outlets() get_headings(), 等等).此函数可以返回字典,但根据您的需要,它可以是[named]元组,自定义对象等.

One way to do this is to have a function deserialize_business() to distill structured business information from each incoming line in your CSV. This function can be complex because that’s the nature of the task, but still it’s advisable to split it into self-containing smaller functions (such as get_outlets(), get_headings(), and so on). This function can return a dictionary but depending on what you want it can be a [named] tuple, a custom object, etc.

此功能将是此特定CSV数据源的适配器".

This function would be an ‘adapter’ for this particular CSV data source.

反序列化功能示例:

def deserialize_business(csv_line):
    """
    Distills structured business information from given raw CSV line.
    Returns a dictionary like {name, phone, owner,
    btype, yoe, headings[], outlets[], category}.
    """

    pieces = [piece.strip("[[\"\']] ") for piece in line.strip().split(',')]

    name = pieces[0]
    phone = pieces[1]
    owner = pieces[2]
    btype = pieces[3]
    yoe = pieces[4]

    # after yoe headings begin, until substring Outlets Address
    headings = pieces[4:pieces.index("Outlets Address")]

    # outlets go from substring Outlets Address until category
    outlet_pieces = pieces[pieces.index("Outlets Address"):-1]

    # combine each individual outlet information into a string
    # and let ``deserialize_outlet()`` deal with that
    raw_outlets = ', '.join(outlet_pieces).split("Outlets Address")
    outlets = [deserialize_outlet(outlet) for outlet in raw_outlets]

    # category is the last piece
    category = pieces[-1]

    return {
        'name': name,
        'phone': phone,
        'owner': owner,
        'btype': btype,
        'yoe': yoe,
        'headings': headings,
        'outlets': outlets,
        'category': category,
    }

调用示例:

with open("phonebookCOMPK-Directory.csv") as f:
    lineno = 0

    for line in f:
        lineno += 1

        try:
            business = deserialize_business(line)

        except:
            # Bad line formatting?
            log.exception(u"Failed to deserialize line #%s!", lineno)

        else:
            # All is well
            store_business(business)

存储数据

您将使用 store_business()函数获取您的数据结构并将其写入某处.也许它将是另一个结构更好的CSV,也许是多个CSV,一个JSON文件,或者您可以利用SQLite关系数据库功能,因为Python内置了它.

Storing the data

You’ll have the store_business() function take your data structure and write it somewhere. Maybe it’ll be another CSV that’s better structured, maybe multiple CSVs, a JSON file, or you can make use of SQLite relational database facilities since Python has it built-in.

这取决于您以后要做什么.

It all depends on what you want to do later.

在这种情况下,您的数据将被拆分到多个表中.(我使用表"一词,但它可以是CSV文件,尽管您也可以使用SQLite DB,因为Python内置了该表.)

In this case your data would be split across multiple tables. (I’m using the word "table" but it can be a CSV file, although you can as well make use of an SQLite DB since Python has that built-in.)

标识所有可能的业务标题的表:

Table identifying all possible business headings:

business heading ID, name
1, Abattoirs
2, Exporters
3, Food Delivery
4, Butchers Retail
5, Meat Dealers-Retail
6, Meat Freezer
7, Meat Packers

该表标识了所有可能的类别:

Table identifying all possible categories:

category ID, parent category, name
1, NULL, "Agriculture, fishing & Forestry"
2, 1, "Farming equipment & services"
3, 2, "Abattoirs in Pakistan"

识别商家的表格:

business ID, name, phone, owner, type, yoe, category
1, Meat One, +92-21-111163281, Al Shaheer Corporation, Retailers, 2008, 3

描述其店铺的表格:

business ID, city, address, landmarks, phone
1, Karachi UAN, "Shop 13, Ground Floor, Plot 14-D, Sky Garden, Main Tipu Sultan Road, KDA Scheme No.1, Karachi", "Nadra Chowrangi, Sky Garden, Tipu Sultan Road", +92-21-111163281
1, Karachi UAN, "Near Jan's Broast, Boat Basin, Khayaban-e-Roomi, Block 5, Clifton, Karachi", "Boat Basin, Jans Broast, Khayaban-e-Roomi", +92-21-111163281

描述其标题的表:

business ID, business heading ID
1, 1
1, 2
1, 3
…

处理所有这些需要复杂的 store_business()函数.如果采用关系方式保存数据,可能值得研究SQLite和一些ORM框架.

Handling all this would require a complex store_business() function. It may be worth looking into SQLite and some ORM framework, if going with relational way of keeping the data.

这篇关于使用python重新格式化csv文件?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆