How to slice numbered lists into sublists
Problem description
I have opened a file and used readlines() and split() with the separator '\t' to split each line on TABs, which resulted in the following lists:
["1", "cats", "--,"]
["2", "chase", "--,"]
["3", "dogs", "--,"]
["1", "the", "--,"]
["2", "car", "--,"]
["3", "is", "--,"]
["4", "gray", "--,"]
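For reference, lists of this shape come from splitting each line on the tab character. The Python 3 sketch below assumes a tab-separated file with one numbered token per line; io.StringIO and the sample text are stand-ins for the real file. On its own, split('\t') would leave the trailing newline attached to the last field, so the newline is stripped first.

```python
import io

# Hypothetical sample input standing in for the real tab-separated file
sample = io.StringIO(
    "1\tcats\t--,\n"
    "2\tchase\t--,\n"
    "3\tdogs\t--,\n"
    "1\tthe\t--,\n"
    "2\tcar\t--,\n"
    "3\tis\t--,\n"
    "4\tgray\t--,\n"
)

# Strip the newline before splitting so the last field is "--," rather than "--,\n";
# blank lines are skipped because they would otherwise produce [""]
rows = [line.rstrip("\n").split("\t") for line in sample if line.strip()]

for row in rows:
    print(row)
```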
Now I want to loop over the integers at index [0], treating them as sentence boundaries, and slice the lists into sentences such as "cats chase dogs" and "the car is gray". For instance, 1 to 3 gives the sublist "cats chase dogs", then the count restarts and 1 to 4 gives "the car is gray", and so on for the rest of the lists, so that I end up with sublists such as ["the", "car", "is", "gray"]. How do I do this?
I've tried this, but I'm getting an error:
Can't concatenate int + str
The problem seems to be that "i" in the for loop is a string element rather than an integer:
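with open(buffer, 'r') as f:
    words = []
    for line in f:
        # keep only the first field, e.g. ['1']; it is still a string
        items = line.split('\t')[:1]
        for i in items:
            while i>1:
                i = i+1   # fails here: i is a str such as '1', not an int
                print i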
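The error comes from the fact that every field produced by split() is a string, so i is '1', not 1. Under Python 2 (which the print i statement suggests), i>1 compares a str with an int without complaint, so the failure only shows up at i+1. A minimal Python 3 illustration of the cause and the usual fix, converting with int() before doing arithmetic (field and number are illustrative names):

```python
field = "1"  # what line.split('\t')[0] returns: a string

try:
    field + 1  # str + int is not allowed
except TypeError as exc:
    print("TypeError:", exc)

number = int(field)  # convert the text to an integer first
print(number + 1)
```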
Recommended answer
Something like this:
from itertools import groupby

with open('yourfile') as fin:
    # split each non-blank line into fields
    lines = (line.split() for line in fin if line.strip())
    # group by consecutive ints: line index minus word number is constant within a sentence
    grouped = groupby(enumerate(lines), lambda pair: pair[0] - int(pair[1][0]))
    # build sentences from the words (second field) in each group
    sentences = [' '.join(el[1][1] for el in g) for k, g in grouped]
    # ['cats chase dogs', 'the car is gray']
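To see why the key works: within one sentence the line index and the word number both go up by one per line, so their difference stays constant; when the numbering restarts at 1 the difference jumps, and groupby starts a new group. The Python 3 sketch below applies the same key to an in-memory copy of the example rows (no file), prints each key (-1 for the first three rows, 2 for the last four) and also builds the word sublists. It assumes every sentence is numbered consecutively from 1; group_key is an illustrative helper name.

```python
from itertools import groupby

# In-memory copy of the example rows (stands in for the file)
rows = [
    ["1", "cats", "--,"],
    ["2", "chase", "--,"],
    ["3", "dogs", "--,"],
    ["1", "the", "--,"],
    ["2", "car", "--,"],
    ["3", "is", "--,"],
    ["4", "gray", "--,"],
]


def group_key(pair):
    """Line index minus word number: constant inside one consecutively numbered sentence."""
    idx, row = pair
    return idx - int(row[0])


for idx, row in enumerate(rows):
    print(idx, row[0], group_key((idx, row)))

# Each group is one sentence; keep only the word column
sublists = [[row[1] for _, row in group]
            for _, group in groupby(enumerate(rows), key=group_key)]
print(sublists)  # [['cats', 'chase', 'dogs'], ['the', 'car', 'is', 'gray']]
print([" ".join(words) for words in sublists])
```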
NB: This is based on your example data:
example = [
["1", "cats", "--,"],
["2", "chase", "--,"],
["3", "dogs", "--,"],
["1", "the", "--,"],
["2", "car", "--,"],
["3", "is", "--,"],
["4", "gray", "--,"]
]
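If a new sentence always starts at 1, a plain loop without itertools is an alternative (a different technique from the groupby answer, shown only as a sketch): start a new sublist whenever the number in column 0 is 1, otherwise append the word to the current sublist. This yields word sublists such as ['the', 'car', 'is', 'gray'], which is the form the question asks for. split_sentences is a hypothetical helper name.

```python
def split_sentences(rows):
    """Group [number, word, ...] rows into word lists, starting a new list at number 1.

    Hypothetical helper; assumes each sentence's numbering starts at 1.
    """
    sentences = []
    for row in rows:
        number, word = int(row[0]), row[1]
        if number == 1 or not sentences:
            sentences.append([])  # number 1 marks a sentence boundary
        sentences[-1].append(word)
    return sentences


example = [
    ["1", "cats", "--,"],
    ["2", "chase", "--,"],
    ["3", "dogs", "--,"],
    ["1", "the", "--,"],
    ["2", "car", "--,"],
    ["3", "is", "--,"],
    ["4", "gray", "--,"],
]

sublists = split_sentences(example)
print(sublists)  # [['cats', 'chase', 'dogs'], ['the', 'car', 'is', 'gray']]
```

Unlike the groupby key, this only looks at whether the number is 1, so it does not check that the numbers in between are consecutive.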