Find a specific header in a CSV file using Python 3 code


Problem description

Right now I have Python 3 code that takes a column of data in a CSV file, splits the phrase in each cell into individual words on spaces, then exports the data to a new CSV file.

What I am wondering is whether there is a way to tell Python to apply the formatting code only to the column with a particular header.

This is what my source data looks like:

Keyword              Source       Number 
Lions Tigers Bears     US          3
Dogs Zebra            Canada       5
Sharks Guppies         US          2

And here is my code, which splits the phrase in each cell into individual words on spaces:

with open(b'C:\Users\jk\Desktop\helloworld.csv', 'r') as datafile:
    data = []
    for row in datafile:
        data.extend(item.strip() for item in row.split())
with open('test.csv', 'w') as a_file:
    for result in data:
        result = ''.join(result)
        a_file.write(result + '\n')
        print(result)

so that the source data becomes:

 Keywords         Source         Number
 Lions            US              3
 Tigers
 Bears
 Dogs             Canada          5

In this case, I only need this code to apply to the one column with the heading Keyword. Ideally, I would also like to extend the data found in the "Source" and "Number" columns to the newly created rows (Lions US 3, Tigers US 3, Bears US 3, etc.), but I haven't figured out that part yet!

I've been poking around the forum for a while trying to find an answer. I know you can tell Python to read the first line of the CSV file, where the headers are (headers = file.readline()), but beyond that I am lost. Would this be an easier task using the CSV reader?

Recommended answer

Use the csv module to split your data into columns. A csv.DictReader() object makes it easier to select a column by its header:

import csv

source = r'C:\Users\jk\Desktop\helloworld.csv'
dest = 'test.csv'

with open(source, newline='') as inf, open(dest, 'w', newline='') as outf:
    reader = csv.DictReader(inf)
    writer = csv.DictWriter(outf, fieldnames=reader.fieldnames)
    for row in reader:
        words = row['Keyword'].split()
        row['Keyword'] = words[0]
        writer.writerow(row)
        writer.writerows({'Keyword': w} for w in words[1:])

The DictReader() reads the first row of your file and uses it as the keys of the dictionary produced for each following row, so a row looks like:

{'Keyword': 'Lions Tigers Bears', 'Source': 'US', 'Number': '3'}

Now you can address each column individually, and update the dictionary with just the first word of the Keyword column before producing additional rows for the remaining words.
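The question also asks about carrying Source and Number over to the newly created rows. The following is a minimal sketch of one way to do that, not part of the original answer: it writes one full row per word and also emits the header line with writeheader(). The function name explode_keywords and its column parameter are hypothetical names chosen for illustration.

```python
import csv
from io import StringIO


def explode_keywords(infile, outfile, column="Keyword"):
    """Write one output row per word in `column`, copying the other columns.

    `explode_keywords` is a hypothetical helper, not part of the csv module.
    """
    reader = csv.DictReader(infile)
    fieldnames = reader.fieldnames or []  # None when the input is empty
    if column not in fieldnames:
        raise KeyError(f"no {column!r} header in {fieldnames}")
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()  # repeat the header line in the output
    for row in reader:
        # A row whose cell is empty produces no output rows at all.
        for word in (row[column] or "").split():
            # Copy every column, replacing only the target cell.
            writer.writerow({**row, column: word})


sample = StringIO(
    "Keyword,Source,Number\n"
    "Lions Tigers Bears,US,3\n"
    "Dogs Zebra,Canada,5\n"
)
output = StringIO()
explode_keywords(sample, output)
print(output.getvalue())
```

With real files, pass objects opened with newline='' as in the answer's code above. Checking reader.fieldnames first gives a clear error when the expected header is missing, rather than failing on the first data row.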

I'm assuming here that your files are comma-separated. If a different delimiter is needed, set the delimiter argument to that character:

reader = csv.DictReader(inf, delimiter='\t')

for a tab-separated format. See the module documentation for the various options, including predefined format combinations called dialects.
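As a small illustration of dialects, the sketch below reads tab-separated data by naming the built-in 'excel-tab' dialect instead of passing delimiter directly. The sample data is made up for the example.

```python
import csv
from io import StringIO

# Made-up tab-separated sample mirroring the question's columns.
sample = StringIO(
    "Keyword\tSource\tNumber\n"
    "Lions Tigers Bears\tUS\t3\n"
)

# 'excel-tab' is registered by the csv module itself; it behaves like the
# default 'excel' dialect but uses a tab as the delimiter.
reader = csv.DictReader(sample, dialect="excel-tab")
rows = list(reader)

print(csv.list_dialects())   # names of all registered dialects
print(rows[0]["Keyword"])
```

A dialect bundles the delimiter, quoting, and line-terminator settings under one name, which can be tidier than repeating the individual arguments for every reader and writer.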

Demo:

>>> import sys
>>> import csv
>>> from io import StringIO
>>> sample = StringIO('''\
... Keyword,Source,Number
... Lions Tigers Bears,US,3
... Dogs Zebra,Canada,5
... Sharks Guppies,US,2
... ''')
>>> output = StringIO()
>>> reader = csv.DictReader(sample)
>>> writer = csv.DictWriter(output, fieldnames=reader.fieldnames)
>>> for row in reader:
...     words = row['Keyword'].split()
...     row['Keyword'] = words[0]
...     writer.writerow(row)
...     writer.writerows({'Keyword': w} for w in words[1:])
... 
12
15
13
>>> print(output.getvalue())
Lions,US,3
Tigers,,
Bears,,
Dogs,Canada,5
Zebra,,
Sharks,US,2
Guppies,,
