How can I merge CSV rows that have the same value in the first cell?
Question
Here is the file: https://drive.google.com/file/d/0B5v-nJeoVouHc25wTGdqaDV1WW8/view?usp=sharing
As you can see, there are duplicates in the first column, but if I were to combine the duplicate rows, no data would get overridden in the other columns. Is there any way I can combine the rows with duplicate values in the first column?
For example, turn "1,A,A,," and "1,,,T,T" into "1,A,A,T,T".
Recommended answer
Plain Python:
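import csv

result = {}
# newline='' is what the csv module recommends when opening CSV files
with open('combined.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        idx = row[0]
        values = row[1:]
        if idx in result:
            # Keep each existing cell unless it is empty, then take the new one
            result[idx] = [result[idx][i] or v for i, v in enumerate(values)]
        else:
            # First time we see this index: store the row values as they are
            result[idx] = values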
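How this magic works:
- iterate over the rows in the CSV file
- for every record, check whether there was an earlier record with the same index
- if this is the first time we see this index, just copy the row values
- if this is a duplicate, assign the row values to the empty cells.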
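The steps above can be sketched as a small self-contained function. This is only an illustration: the name merge_rows and the in-memory sample are assumptions, not part of the original answer, and the sample simply reuses the two rows from the question. Note that this sketch pairs cells with zip, which stops at the shorter row, instead of indexing as the answer does.

```python
import csv
import io


def merge_rows(rows):
    """Merge rows sharing the same first cell; non-empty cells seen first win."""
    result = {}
    for row in rows:
        idx, values = row[0], row[1:]
        if idx in result:
            # Fill only the cells that are still empty in the stored row
            result[idx] = [old or new for old, new in zip(result[idx], values)]
        else:
            # First occurrence of this index: copy the row values
            result[idx] = values
    return result


# Hypothetical sample data: the two rows from the question
sample = io.StringIO("1,A,A,,\n1,,,T,T\n")
merged = merge_rows(csv.reader(sample))
print(merged)  # {'1': ['A', 'A', 'T', 'T']}
```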
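The last step is done via the or trick: None or value returns value, and value or anything returns value when value is non-empty. The same holds for the empty strings that csv.reader produces for empty cells. So result[idx][i] or v returns the existing value if it is not empty, and the row value otherwise.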
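The trick can be checked in isolation with a few literal values. This is a minimal sketch; the variable names are only for illustration. Since csv.reader returns every cell as a string, a cell holding 0 arrives as '0', which is truthy and is therefore kept.

```python
# Empty CSV cells are read as '', which is falsy, just like None
first = '' or 'T'      # existing cell is empty, so the new value is taken
second = 'A' or 'T'    # existing cell is non-empty, so it is kept
third = None or 'T'    # None behaves like the empty string here
zero = '0' or 'T'      # the string '0' is non-empty, so it is kept

print(first, second, third, zero)  # T A T 0
```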
To output this without losing the duplicated rows, we need to keep the indices as well, then iterate over them and output the corresponding result entries:
indices = []
for row in reader:
    # ...
    indices.append(idx)

# newline='' avoids blank lines between rows on Windows
with open('outfile.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for idx in indices:
        writer.writerow([idx] + result[idx])
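Putting both snippets together, an end-to-end version might look like the sketch below. The function name merge_and_write and the in-memory buffers standing in for combined.csv and outfile.csv are assumptions made for illustration, and zip is used in place of the answer's index-based lookup.

```python
import csv
import io


def merge_and_write(src, dst):
    """Read rows from src, merge by first cell, write one row per input row."""
    result = {}
    indices = []  # first-column values in the order the rows were read
    for row in csv.reader(src):
        idx, values = row[0], row[1:]
        if idx in result:
            # Fill only the cells that are still empty in the stored row
            result[idx] = [old or new for old, new in zip(result[idx], values)]
        else:
            result[idx] = values
        indices.append(idx)

    writer = csv.writer(dst)
    for idx in indices:
        writer.writerow([idx] + result[idx])


# Hypothetical in-memory stand-ins for combined.csv and outfile.csv
src = io.StringIO("1,A,A,,\n2,X,,,\n1,,,T,T\n")
dst = io.StringIO()
merge_and_write(src, dst)
print(dst.getvalue())
```

Each index appears in the output as many times as it did in the input, every time with the merged values. If only one line per index is wanted, iterating over result.items() instead of indices would be one option, since dicts preserve insertion order in Python 3.7 and later.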