Finding a pattern match and concatenating the rest of the lines in Python

Question

I have a small data set to clean. I have opened the text file in PyCharm. The data set looks like this:

Code-6667+
Name of xyz company+ 
Address +
Number+ 
Contact person+
Code-6668+
Name of abc company, Address, number, contact person+
Code-6669+
name of company, Address+
number, contact person +

I need to separate out the code lines and concatenate (or paste) the remaining lines together until the next code line appears. That way I can split my data into two fields: the company code, and all of the company's details in a single field. The eventual output is a table. The output should look something like this:

Code6667 - Company details 
Code6668 - Company details

Is there a way I could use a loop to do this? I tried this in R, but I am now attempting it in Python.

Recommended answer

(Note: I'm not quite sure whether you want to keep the + sign. The following code assumes you do. Otherwise it's easy to get rid of the + with a bit of string manipulation.)
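If the trailing + is not wanted, one possible way to drop it is sketched below. The helper name clean is hypothetical and not part of the original answer, and the sketch assumes the + only ever appears at the end of a line.

```python
def clean(line):
    """Strip surrounding whitespace and any trailing '+' from one line."""
    # strip() removes the newline and outer spaces, rstrip('+') removes the
    # trailing plus sign(s), and the final rstrip() removes a space that may
    # have sat in front of the '+', as in "Address +".
    return line.strip().rstrip('+').rstrip()


cleaned = [clean(s) for s in ["Code-6667+\n", "Address +\n", "Number+ \n"]]
```

Calling clean() in place of line.strip() while parsing would give keys and details without the + sign.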

Here is the input file...

dat1.txt:

Code-6667+
Name of xyz company+ 
Address +
Number+ 
Contact person+
Code-6668+
Name of abc company,Address, number, contact person+
Code-6669+
name of company , Address+
number , contact person +

Code

Here is the code... (comment/uncomment the print line in the print block to match your Python 2.x/3.x version)

mycode.py:

import sys
print(sys.version)

# initialise our final output - a phone book (code -> list of detail lines)
phone_book = {}

# parse text file data to phone book, in a specific format
code = ''
with open("dat1.txt", "r") as f:
    for line in f:
        if line.startswith('Code-'):
            # "Code-6667+" becomes "Code6667+": drop the hyphen, keep the rest
            code = (line[:4] + line[5:]).strip()
            phone_book[code] = []
        elif code:
            # any other line belongs to the most recent code line
            phone_book[code].append(line.strip())

# print result to console (for ease of debugging). Comment this block if you want:
for key, value in phone_book.items():

    # python 3.x
    print("{0} - Company details: {1}".format(key, value))

    # python 2.x
    # print key + " - Company details: " + "".join(value)

# write phone_book to dat2.txt
with open("dat2.txt", "w") as f2:
    for key, value in phone_book.items():
        f2.write("{0} - Company details: {1}\n".format(key, value))

Output

Here is what you will see in the console (via print()) or in dat2.txt (via f2.write())...

# Code6667+ - Company details: ['Name of xyz company+', 'Address +', 'Number+', 'Contact person+']
# Code6668+ - Company details: ['Name of abc company,Address, number, contact person+']
# Code6669+ - Company details: ['name of company , Address+', 'number , contact person +']
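The output above keeps each company's details as a Python list. Since the question asks for a two-field table, a minimal sketch of one way to get there follows: it joins each record's lines into a single string and writes the rows as CSV. The function name group_by_code, the ", " separator, the column names, and the decision to drop the + sign are all assumptions for illustration, not part of the original answer.

```python
import csv
import io


def group_by_code(lines, prefix="Code-"):
    """Return [(code, details), ...] with each record's lines joined into one string."""
    rows = []
    for raw in lines:
        # Drop the newline, outer spaces and the trailing '+' (an assumption).
        text = raw.strip().rstrip("+").rstrip()
        if not text:
            continue  # skip blank lines
        if text.startswith(prefix):
            # Start a new record; "Code-6667" becomes "Code6667".
            rows.append([text.replace("-", "", 1), []])
        elif rows:
            # Attach the line to the most recent code line.
            rows[-1][1].append(text)
    return [(code, ", ".join(parts)) for code, parts in rows]


sample = [
    "Code-6667+\n",
    "Name of xyz company+ \n",
    "Address +\n",
    "Number+ \n",
    "Contact person+\n",
    "Code-6668+\n",
    "Name of abc company,Address, number, contact person+\n",
]

table = group_by_code(sample)

# Write the two-column table as CSV; io.StringIO stands in for a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["code", "details"])
writer.writerows(table)
csv_text = buffer.getvalue()
```

To read from dat1.txt instead of the inline sample, the open file object can be passed directly as lines, since iterating over a file yields one line at a time.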
