Add column with a header to a tab-delimited text file?
I realize that there is a way to add a column using 'awk'.
But I'm not so familiar with this alternative, so I thought I'd ask whether there's a way to add a column to a tab-delimited text file using Python.
Specifically, here's the scenario I need to add a column in:
I have data in the following format (I realize looking at it that the format may not be so clear, but the phone, email, and website correspond to different columns):
name                             phone         email              website
D G Albright M.S.
Lannister G. Cersei M.A.T., CEP  111-222-3333  cersei@got.com     www.got.com
Argle D. Bargle Ed.M.
Sam D. Man Ed.M.                 000-000-1111  dman123@gmail.com  www.daManWithThePlan.com
Sam D. Man Ed.M.
Sam D. Man Ed.M.                 111-222-333   dman123@gmail.com  www.daManWithThePlan.com
D G Bamf M.S.
Amy Tramy Lamy Ph.D.
I'm writing a parser for the first column. I want to move the 'area of practice' (for example, 'CEP') into a new column entitled 'area'. I iterate through the file and use the pop function to separate the area from the rest of the first column. Then I add this to a list, which just dies in the function because it's never added to the spreadsheet.
Here's my script:
import re

def parse_ieca_gc(s):
    ### HANDLE NAME ELEMENT ######
    degrees = ['M.A.T.', 'Ph.D.', 'MA', 'J.D.',
               'Ed.M.', 'M.A.', 'M.B.A.',
               'Ed.S.', 'M.Div.', 'M.Ed.',
               'RN', 'B.S.Ed.', 'M.D.', 'M.S.']
    degrees_list = []
    # Check whether the name string has
    # an area of practice by
    # checking if there's a comma separator.
    if ',' in s['name']:
        # Separate the area of practice from the name
        # and degree and bind it to the variable 'area'.
        split_area_nmdeg = s['name'].split(',')
        area = split_area_nmdeg.pop()
        # Split the name and degree on whitespace.
        # If there's a degree, it will match one
        # of the elements and be stored in the degree list.
        # The degree is removed from the name_deg list
        # and all that's left is the name.
        split_name_deg = re.split(r'\s', split_area_nmdeg[0])
        for word in split_name_deg:
            for deg in degrees:
                if deg == word:
                    degrees_list.append(split_name_deg.pop())
        name = ' '.join(split_name_deg)
Expected output:
name                 phone         email              website                   area  degrees
D G Albright                                                                          M.A.
Lannister G. Cersei  111-222-3333  cersei@got.com     www.got.com               CEP   M.A.T.
Argle D. Bargle                                                                       Ed.M.
Sam D. Man           000-000-1111  dman123@gmail.com  www.daManWithThePlan.com        Ed.M.
Sam D. Man                                                                            Ed.M.
Sam D. Man           111-222-333   dman123@gmail.com  www.daManWithThePlan.com        Ed.M.
D G Bamf                                                                              M.S.
Amy Tramy Lamy                                                                        Ph.D.
This code is also not working:
fieldnames = ['name', 'degrees', 'area', 'phone', 'email', 'website']
with open('ieca_first_col_fake_text.txt', 'r') as input:
    with open('new_col_dict.txt', 'w') as output:
        dict_writer = csv.DictWriter(output, fieldnames, delimiter='\t')
        dict_reader = csv.DictReader(input, delimiter='\t')
        #dict_writer.writeheader(fieldnames)
        for row in dict_reader:
            print row
            dict_writer.writerow(fieldnames)
            dict_writer.writerow(row)
See the answer to "How to add a new column to a CSV file using Python?": a tab-delimited file is like a CSV file with a tab as the separator.
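As a rough illustration of treating the file as CSV with a tab separator, here is a minimal sketch, assuming Python 3 and the csv module's DictReader and DictWriter. The helper names split_name and add_columns are hypothetical, not from the question. The degree list is copied from the question's script, and everything after the last comma in the name field is assumed to be the area of practice.

```python
import csv
import io

# Degree tokens copied from the question's list.
DEGREES = {'M.A.T.', 'Ph.D.', 'MA', 'J.D.', 'Ed.M.', 'M.A.', 'M.B.A.',
           'Ed.S.', 'M.Div.', 'M.Ed.', 'RN', 'B.S.Ed.', 'M.D.', 'M.S.'}


def split_name(raw):
    """Split 'Name Deg., AREA' into (name, area, degrees). Hypothetical helper."""
    area = ''
    if ',' in raw:
        # Assumption: everything after the last comma is the area of practice.
        raw, area = raw.rsplit(',', 1)
    words = raw.split()
    degrees = [w for w in words if w in DEGREES]
    name = ' '.join(w for w in words if w not in DEGREES)
    return name, area.strip(), ' '.join(degrees)


def add_columns(infile, outfile):
    """Copy tab-delimited rows from infile to outfile, adding two columns."""
    reader = csv.DictReader(infile, delimiter='\t')
    # Keep the input columns and append the new ones at the end.
    fieldnames = list(reader.fieldnames) + ['area', 'degrees']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames, delimiter='\t',
                            lineterminator='\n')
    writer.writeheader()  # takes no arguments; call it once, before the loop
    for row in reader:
        row['name'], row['area'], row['degrees'] = split_name(row['name'])
        writer.writerow(row)  # writerow expects a dict, not the fieldnames list


# Small in-memory demonstration using two rows from the question.
sample = (
    'name\tphone\temail\twebsite\n'
    'Lannister G. Cersei M.A.T., CEP\t111-222-3333\tcersei@got.com\twww.got.com\n'
    'Amy Tramy Lamy Ph.D.\t\t\t\n'
)
result = io.StringIO()
add_columns(io.StringIO(sample), result)
print(result.getvalue())
```

With real files, the same function can be called on handles opened with newline='', for example open('ieca_first_col_fake_text.txt', newline='') for reading and open('new_col_dict.txt', 'w', newline='') for writing. The sketch assumes every data row has the same number of tab-separated fields as the header; rows with extra fields would need additional handling.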