String manipulation and adding values based on the row they are in
Problem description
I have a delimited text file. For each line, I am trying to build every pairwise combination of its items and tag each item in a pair with the line number it came from.
Here is an example (you can also download it from https://gist.github.com/anonymous/4107418c63b88c6da44281a8ae7a321f if you want):
"A,B "
"AFD,DNGS,SGDH "
"NHYG,QHD,lkd,uyete"
"AFD,TTT"
I want output like this:
A_1 B_1
AFD_2 DNGS_2
AFD_2 SGDH_2
DNGS_2 SGDH_2
NHYG_3 QHD_3
NHYG_3 lkd_3
NHYG_3 uyete_3
QHD_3 lkd_3
QHD_3 uyete_3
lkd_3 uyete_3
AFD_4 TTT_4
That is, A_1 and B_1 come from the first row, AFD_2 and DNGS_2 come from the second row, and so on.
I have tried to do it, but I cannot figure it out:
#!/usr/bin/python
import itertools
# make my output
out = {}
# give a name to my data
file_name = 'data.txt'
# read all the lines
for n, line in enumerate(open(file_name).readlines()):
    # split each line by comma
    item1 = line.split('\t')
    # split each string from another one by a comma
    item2 = item1.split(',')
    # iterate over all combinations of 2 strings
    for i in itertools.combinations(item2, 2):
        # save the data into out
        out.write('\t'.join(i))
Output of answer 1
"A_1, B "_1
"AFD_2, DNGS_2
"AFD_2, SGDH "_2
DNGS_2, SGDH "_2
"NHYG_3, QHD_3
"NHYG_3, lkd_3
"NHYG_3, uyete"_3
QHD_3, lkd_3
QHD_3, uyete"_3
lkd_3, uyete"_3
"AFD_4, TTT"_4
Output of answer 2
"A_1 B "_1
"AFD_2 DNGS_2
"AFD_2 SGDH "_2
DNGS_2 SGDH "_2
"NHYG_3 QHD_3
"NHYG_3 lkd_3
"NHYG_3 uyete"_3
QHD_3 lkd_3
QHD_3 uyete"_3
lkd_3 uyete"_3
"AFD_4 TTT"_4
Recommended answer
Try this:
#!/usr/bin/python
from itertools import combinations

with open('data1.txt') as f:
    result = []
    # n is the 1-based line number used as the suffix
    for n, line in enumerate(f, start=1):
        items = line.strip().split(',')
        # one [first_n, second_n] list per 2-item combination on this line
        x = [['%s_%d' % (x, n) for x in item] for item in combinations(items, 2)]
        result.append(x)

for res in result:
    for elem in res:
        print(',\t'.join(elem))
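You need a list of lists of lists: one entry per input line, each holding that line's pairs, where each pair is itself a list of two tagged strings. You can build them with a list comprehension inside the loop.
I wasn't sure what you wanted as your actual output format; this prints the expected pairs, one per line, with the two items joined by a comma and a tab. Change the string that join is called on if you want a different separator.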
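To make the nested structure concrete, here is a minimal sketch for a single line. The helper name tag_pairs is hypothetical and not part of the original answer; it returns the list of pairs that the loop would append to result for one line, and joins each pair with a space as in the question's expected output.

```python
from itertools import combinations

def tag_pairs(line, n):
    """Return every 2-item combination of the comma-separated items in
    `line`, with `_<n>` appended to each item. (Hypothetical helper.)"""
    items = line.strip().split(',')
    return [['%s_%d' % (item, n) for item in pair]
            for pair in combinations(items, 2)]

# Second line of the sample data, without the surrounding quotes.
pairs = tag_pairs('AFD,DNGS,SGDH\n', 2)
for pair in pairs:
    print(' '.join(pair))
```

combinations keeps the items in their input order, so the pairs appear in the same order as in the expected output.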
If there are quotes in the input file, the simple fix is to use
items = line.replace("\"", "").strip().split(',')
in the code above. This removes every double quote on the line, so it would break if the data itself contained double quotes. If you know it doesn't, this is fine.
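The difference is easiest to see on a line that has a quote inside the data. The sketch below is illustrative only: the function names and the sample line are made up, and str.strip('"') is used as a stand-in for trimming only the ends of the line. Note that strip('"') removes any run of quotes at either end, not just one.

```python
def split_replace(line):
    # Removes every double quote, including ones inside the data.
    return line.replace('"', '').strip().split(',')

def split_trim(line):
    # Removes double quotes only at the two ends of the line.
    return line.strip().strip('"').strip().split(',')

# Made-up line with quotes around the record and inside one item.
line = '"A,say "hi",B "\n'
print(split_replace(line))  # the inner quotes are lost
print(split_trim(line))     # the inner quotes are kept
```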
Otherwise, create a small function that strips only the surrounding quotes. This example also writes the result to a file.
#!/usr/bin/python
from itertools import combinations

def remquotes(s):
    # Drop one leading and one trailing double quote, if present.
    # startswith/endswith also work on an empty string (blank line).
    beg, end = 0, len(s)
    if s.startswith('"'):
        beg = 1
    if s.endswith('"'):
        end = -1
    return s[beg:end]

with open('data1.txt') as f:
    result = []
    for n, line in enumerate(f, start=1):
        # strip the newline, then the quotes, then padding inside the quotes
        items = remquotes(line.strip()).strip().split(',')
        x = [['%s_%d' % (x, n) for x in item] for item in combinations(items, 2)]
        result.append(x)

with open('out.txt', 'w') as fout:
    for res in result:
        for elem in res:
            linestr = ',\t'.join(elem)
            print(linestr)
            fout.write(linestr + '\n')
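If the input is large, the intermediate result list is not strictly needed. The following is a sketch of a streaming variant, assuming it is acceptable to write each pair as soon as its line has been read. The function name write_pairs and its sep parameter are hypothetical, and io.StringIO objects stand in for the real input and output files so that the example is self-contained.

```python
import io
from itertools import combinations

def write_pairs(lines, fout, sep=',\t'):
    """Write one tagged pair per line to `fout` as each input line is read.
    (Hypothetical helper; `sep` is the string placed between the two items.)"""
    for n, line in enumerate(lines, start=1):
        # Trim the newline, the surrounding quotes, then padding inside them.
        items = line.strip().strip('"').strip().split(',')
        for pair in combinations(items, 2):
            fout.write(sep.join('%s_%d' % (item, n) for item in pair) + '\n')

# In-memory stand-ins for the input and output files.
src = io.StringIO('"A,B "\n"AFD,TTT"\n')
dst = io.StringIO()
write_pairs(src, dst, sep=' ')
print(dst.getvalue(), end='')
```

With real files, the same function can be called with the two open file objects in place of src and dst.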