如何将特定列与其余列分开 [英] How to divide specific column with rest of columns
问题描述
我有这样的矩阵(其余的第一列名称为值,分隔符为i标签):
I have matrix like this (first column names rest are values, separator i tab):
name1 A1 B1 C1 D1
name2 A2 B2 C2 D2
矩阵可能很大(平均约有数百行和列).大小相同.我可以期待零值.
Matrix could be huge (it is mean about hundreds rows and columns). It is allays same size. I can expect zero values.
我需要这样的输出:
name1 A1 B1 C1 D1 A1/B1 A1/C1 A1/D1
name2 A2 B2 C2 D2 A2/B2 A2/C2 A2/D2
此组合保存到新文件.然后进行另一种组合:
This combination save to new file. And then make another combination:
name1 A1 B1 C1 D1 B1/A1 B1/C1 B1/D1
name2 A2 B2 C2 D2 B2/A2 B2/C2 B2/D2
,依此类推=>将矩阵中的每一列与其余各列相除,并另存为TSV到新文件中.并四舍五入到小数点后三位.
and so on so on => divide each column with rest of columns in matrix and save as TSV to new file. And also round to three decimal places.
我可以使用脚本手动完成此操作
I can do this manually with script:
awk '{OFS="\t"}{$6=$2/($3+0.001); $7=$2/($4+0.001); $8=$2/($5+0.001)}1' input_file.tsv
为什么我要加0.001的原因是不可能除以零.我可以使用wile循环创建shell脚本,但是需要很长时间.
Reason why I add number 0.001 is that division by zero is impossible. I can create shell script with wile loop, but it takes long time.
对于此过程的任何自动化,我将感到非常高兴.
I would be very happy for any automation this process.
推荐答案
由于您使用 python-3.x
标记了问题,因此以下脚本可以实现您想要的功能(它需要Python 3.6+,因为 f-strings
):
Since you tagged the question with python-3.x
, here is a script to achieve what you want (it requires Python 3.6+ though, because of f-strings
):
from pathlib import Path
import csv
source = Path('input.tsv')
with source.open() as src:
csvreader = csv.reader(src, dialect='excel-tab')
# get number of columns and rewind
cols = len(next(csvreader)[1:])
src.seek(0)
csvwriters = []
# create a csv.writer for each column
for i in range(cols):
# output_col_01.tsv, output_col_02.tsv ...
csvwriters.append(
csv.writer(
Path(f'output_col_{i + 1:02d}.tsv').open('w'),
dialect='excel-tab'
)
)
nan = float('nan')
for name, *cols in csvreader:
for i, a in enumerate(cols):
row = [name]
for j, b in enumerate(cols):
# skip the quotient of a col by itself
if i != j:
a = float(a)
b = float(b)
# nan if division by zero
row.append(round(a / b, 4) if b else nan)
csvwriters[i].writerow(row)
我没有为除数为 0
的操作添加 0.001
,而是选择返回 float('nan')
.
Instead of adding 0.001
for operations where the divisor is 0
, I opted to return float('nan')
.
它不会将列本身分开,而是将商四舍五入到小数点后四位.
It will not divide a column by itself and will round the quotients to 4 decimal places.
最后,如果您使用的Python版本早于3.6(但由于 pathlib.Path()
,您仍然需要Python版本3.4+),然后替换以下行:/p>
Finally, if you are using a Python version earlier than 3.6 (but you will still need a Python version 3.4+, because of pathlib.Path()
), then replace the following line:
Path(f'output_col_{i + 1:02d}.tsv').open('w'),
具有:
Path('output_col_%02d.tsv' % (i + 1)).open('w'),
这是必需的,因为 f-strings
是在Python 3.6中引入的.
That's needed because f-strings
were introduced in Python 3.6.
这篇关于如何将特定列与其余列分开的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!