String manipulation and adding values based on the row they are in

Problem description

I have a delimited text file. For each line, I am trying to generate every pairwise (2-item) combination of its fields and tag each item in the pair with the line number.

Here is an example (you can also download it from https://gist.github.com/anonymous/4107418c63b88c6da44281a8ae7a321f):

"A,B     "
"AFD,DNGS,SGDH   "
"NHYG,QHD,lkd,uyete"
"AFD,TTT"     

I want output like this:

A_1     B_1
AFD_2   DNGS_2
AFD_2   SGDH_2
DNGS_2  SGDH_2
NHYG_3  QHD_3
NHYG_3  lkd_3
NHYG_3  uyete_3
QHD_3   lkd_3
QHD_3   uyete_3
lkd_3   uyete_3
AFD_4   TTT_4

That is, A_1 and B_1 come from the first row, AFD_2 and DNGS_2 come from the second row, and so on.

I have tried to do this but cannot figure it out:

#!/usr/bin/python
import itertools
# make my output
out = {}
# give a name to my data 
file_name = 'data.txt'
# read all the lines 
for n, line in enumerate(open(file_name).readlines()):
    # split each line by comma
    item1 = line.split('\t')
    # split each string from the others by a comma
    item2 = item1.split(',')
    # iterate over all combinations of 2 strings
    for i in itertools.combinations(item2,2):
        # save the data into out 
        out.write('\t'.join(i))

Output of answer 1

"A_1,   B     "_1
"AFD_2, DNGS_2
"AFD_2, SGDH   "_2
DNGS_2, SGDH   "_2
"NHYG_3,    QHD_3
"NHYG_3,    lkd_3
"NHYG_3,    uyete"_3
QHD_3,  lkd_3
QHD_3,  uyete"_3
lkd_3,  uyete"_3
"AFD_4, TTT"_4  

Output of answer 2

"A_1    B     "_1
"AFD_2  DNGS_2
"AFD_2  SGDH   "_2
DNGS_2  SGDH   "_2
"NHYG_3 QHD_3
"NHYG_3 lkd_3
"NHYG_3 uyete"_3
QHD_3   lkd_3
QHD_3   uyete"_3
lkd_3   uyete"_3
"AFD_4  TTT"_4

Recommended answer

Try this:

#!/usr/bin/python
from itertools import combinations

with open('data1.txt') as f:
    result = []
    for n, line in enumerate(f, start=1):
        items = line.strip().split(',')

        # tag both members of every 2-combination with the line number
        pairs = [['%s_%d' % (item, n) for item in pair] for pair in combinations(items, 2)]
        result.append(pairs)

for res in result:
    for elem in res:
        print(',\t'.join(elem))

You need a list of lists of lists to represent the pairs: one list per input line, each holding the two-element lists for that line. You can build them with a list comprehension inside the loop.
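To make that nested structure concrete, here is a minimal sketch for a single line. The helper name `tag_pairs` is hypothetical and not part of the original answer; it only isolates the comprehension so the result for one row can be inspected.

```python
from itertools import combinations

def tag_pairs(line, n):
    """Return every 2-combination of the comma-separated items in `line`,
    with each item suffixed by the line number `n`."""
    items = line.strip().split(',')
    return [['%s_%d' % (item, n) for item in pair]
            for pair in combinations(items, 2)]

# Second row of the sample data, without quotes or padding.
pairs = tag_pairs('AFD,DNGS,SGDH', 2)
for pair in pairs:
    print('\t'.join(pair))
```

Each inner list is one pair, so a row with three items yields three pairs and a row with four items yields six.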

I wasn't sure what output format you actually wanted, but this prints your expected output.

If there are quotes in the input file, the simple fix is

items = line.replace("\"", "").strip().split(',')

Use this in place of the corresponding line in the code above. It removes every double quote on the line, so it breaks if the data itself contains double quotes; if you know it does not, this is fine.
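A small illustration of the difference, assuming a made-up line that has a double quote inside a field (the sample string is hypothetical and does not come from the question's data):

```python
line = '"AB"C,DEF  "\n'

# Removing every double quote also deletes the one inside the first field.
stripped_all = line.replace('"', '').strip().split(',')
print(stripped_all)

# Removing only the outermost pair of quotes keeps the inner one.
s = line.strip()
if s.startswith('"') and s.endswith('"'):
    s = s[1:-1]
outer_only = s.strip().split(',')
print(outer_only)
```

The first approach silently changes the field to `ABC`, while the second preserves `AB"C`.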

Otherwise, create a small function to strip the quotes. This example also writes to a file.

#!/usr/bin/python
from itertools import combinations

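def remquotes(s):
    # Strip one leading and one trailing double quote, if present.
    # startswith/endswith are safe on an empty string (blank line), unlike s[0].
    if s.startswith('"'):
        s = s[1:]
    if s.endswith('"'):
        s = s[:-1]
    return s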

with open('data1.txt') as f:
    result = []
    for n, line in enumerate(f, start=1):
        items = remquotes(line.strip()).strip().split(',')

        # tag both members of every 2-combination with the line number
        pairs = [['%s_%d' % (item, n) for item in pair] for pair in combinations(items, 2)]
        result.append(pairs)

with open('out.txt', 'w') as fout:
    for res in result:
        for elem in res:                
            linestr = ',\t'.join(elem)
            print(linestr)
            fout.write(linestr + '\n')
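The question's desired output separates the two items with a tab only, whereas `',\t'.join` also inserts a comma. If the tab-only format is what is needed, one possible variant is sketched below. The generator name `iter_pairs` is hypothetical, and the sketch assumes the only quotes are the ones wrapping each line; note that `strip('"')` removes all leading and trailing double quotes, not just one.

```python
from itertools import combinations

def iter_pairs(lines):
    """Yield one tab-separated, line-numbered pair per 2-combination."""
    for n, line in enumerate(lines, start=1):
        # Drop the newline, the surrounding quotes and any padding, then split.
        items = [item.strip() for item in line.strip().strip('"').split(',')]
        for a, b in combinations(items, 2):
            yield '%s_%d\t%s_%d' % (a, n, b, n)

# Two rows in the same shape as the question's sample data.
sample = ['"A,B     "\n', '"AFD,TTT"     \n']
output = list(iter_pairs(sample))
for row in output:
    print(row)
```

Because `iter_pairs` accepts any iterable of lines, an open file object can be passed in directly, and the rows can be written out with `fout.write(row + '\n')` as in the code above.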
