Finding a pattern match and concatenating the rest of the lines in Python
Question
I have a small data set to clean. I have opened the text file in PyCharm. The data set looks like this:
Code-6667+
Name of xyz company+
Address +
Number+
Contact person+
Code-6668+
Name of abc company, Address, number, contact person+
Code-6669+
name of company, Address+
number, contact person +
I need to separate out the code lines and concatenate (or paste) the remaining lines together until the next code line appears. That way I can split my data into two fields: the company code, and all the company details in a single field. The eventual output is a table. The output should look something like this:
Code6667 - Company details
Code6668 - Company details
Is there a way to do this with a loop? I tried it in R, but I am now attempting it in Python.
Recommended answer
(Note: I'm not quite sure whether you want to keep the "+" sign. The following code assumes you do. Otherwise, it is easy to get rid of the "+" with a bit of string manipulation.)
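If the "+" signs are not wanted, the string manipulation mentioned in the note could look something like the sketch below. The helper name strip_plus is hypothetical (it is not part of the answer's code), and the sketch assumes the "+" only ever appears at the end of a line, possibly followed by spaces.

```python
def strip_plus(line):
    """Remove surrounding whitespace and any trailing '+' from one line."""
    # rstrip("+") removes every trailing '+', so 'Number++' also becomes 'Number'
    return line.strip().rstrip("+").strip()


# detail lines for Code-6667, copied from the sample data
details = ["Name of xyz company+", "Address +", "Number+", "Contact person+"]
cleaned = [strip_plus(item) for item in details]

# join the cleaned lines into one field
print(", ".join(cleaned))
# prints: Name of xyz company, Address, Number, Contact person
```

The same helper can be applied to the code lines, so that "Code-6667+" becomes "Code-6667".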
Here is the input file, dat1.txt:
Code-6667+
Name of xyz company+
Address +
Number+
Contact person+
Code-6668+
Name of abc company,Address, number, contact person+
Code-6669+
name of company , Address+
number , contact person +
Code
Here is the code, mycode.py (comment or uncomment the print block for the Python 2.x or 3.x version):
import sys
print(sys.version)

# initialise our final output - a phone book
phone_book = {}

# open input text file and parse its data into the phone book
code = ''
with open("dat1.txt", "r") as f:
    for line in f:
        if line[:5] == 'Code-':
            # new record: drop the hyphen, so 'Code-6667+' becomes 'Code6667+'
            code = (line[:4] + line[5:]).strip()
            phone_book[code] = []
        elif code:
            # detail line: attach it to the most recent code
            phone_book[code].append(line.strip())
        else:
            # skip anything that appears before the first code line
            continue

# print result to console (for ease of debugging). Comment this block if you want:
for key, value in phone_book.items():
    # python 3.x
    print("{0} - Company details: {1}".format(key, value))
    # python 2.x
    # print key + " - Company details: " + "".join(value)

# write phone_book to dat2.txt
with open("dat2.txt", "w") as f2:
    for key, value in phone_book.items():
        f2.write("{0} - Company details: {1}\n".format(key, value))
Output
Here is what you will see in the console (via print()) or in dat2.txt (via f2.write()):
# Code6667+ - Company details: ['Name of xyz company+', 'Address +', 'Number+', 'Contact person+']
# Code6668+ - Company details: ['Name of abc company,Address, number, contact person+']
# Code6669+ - Company details: ['name of company , Address+', 'number , contact person +']
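Since the question asks for a table with two fields, one possible variation is sketched below. It is not the answer's code: it matches code lines with a regular expression, drops the "+" signs, joins each company's detail lines into a single string, and writes the rows as CSV. The names SAMPLE, CODE_RE and parse_records are hypothetical, the sample data is held in memory so the sketch runs on its own, and the ", " separator and the "code"/"details" column headers are assumptions.

```python
import csv
import io
import re

# Sample input copied from dat1.txt, held in memory so the sketch is self-contained.
SAMPLE = """\
Code-6667+
Name of xyz company+
Address +
Number+
Contact person+
Code-6668+
Name of abc company,Address, number, contact person+
Code-6669+
name of company , Address+
number , contact person +
"""

CODE_RE = re.compile(r"^Code-(\d+)")  # assumed shape of a code line


def parse_records(lines):
    """Return a list of (code, details) pairs, one per code line."""
    records = []
    for raw in lines:
        text = raw.strip().rstrip("+").strip()  # drop the trailing '+'
        if not text:
            continue  # ignore blank lines
        match = CODE_RE.match(text)
        if match:
            # start a new record, e.g. 'Code-6667' -> 'Code6667'
            records.append(("Code" + match.group(1), []))
        elif records:
            # attach the detail line to the most recent code
            records[-1][1].append(text)
    # join each record's detail lines into a single field
    return [(code, ", ".join(details)) for code, details in records]


rows = parse_records(SAMPLE.splitlines())

# Write the two-column table as CSV (to a string here; a real file works the same way).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["code", "details"])
writer.writerows(rows)
print(buffer.getvalue())
```

To read from the real file instead, pass the open file object to parse_records, and pass a file opened with newline="" to csv.writer. The csv module quotes the details field automatically because it contains commas.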