如何将特定列与其余列分开 [英] How to divide specific column with rest of columns

查看:40
本文介绍了如何将特定列与其余列分开的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有这样的矩阵(其余的第一列名称为值,分隔符为i标签):

I have matrix like this (first column names rest are values, separator i tab):

name1 A1 B1 C1 D1
name2 A2 B2 C2 D2

矩阵可能很大(平均约有数百行和列).大小相同.我可以期待零值.

Matrix could be huge (it is mean about hundreds rows and columns). It is allays same size. I can expect zero values.

我需要这样的输出:

name1 A1 B1 C1 D1 A1/B1 A1/C1 A1/D1
name2 A2 B2 C2 D2 A2/B2 A2/C2 A2/D2

此组合保存到新文件.然后进行另一种组合:

This combination save to new file. And then make another combination:

name1 A1 B1 C1 D1 B1/A1 B1/C1 B1/D1
name2 A2 B2 C2 D2 B2/A2 B2/C2 B2/D2

,依此类推=>将矩阵中的每一列与其余各列相除,并另存为TSV到新文件中.并四舍五入到小数点后三位.

and so on so on => divide each column with rest of columns in matrix and save as TSV to new file. And also round to three decimal places.

我可以使用脚本手动完成此操作

I can do this manually with script:

awk '{OFS="\t"}{$6=$2/($3+0.001); $7=$2/($4+0.001); $8=$2/($5+0.001)}1' input_file.tsv

为什么我要加0.001的原因是不可能除以零.我可以使用wile循环创建shell脚本,但是需要很长时间.

Reason why I add number 0.001 is that division by zero is impossible. I can create shell script with wile loop, but it takes long time.

对于此过程的任何自动化,我将感到非常高兴.

I would be very happy for any automation this process.

推荐答案

由于您使用 python-3.x 标记了问题,因此以下脚本可以实现您想要的功能(它需要Python 3.6+,因为 f-strings ):

Since you tagged the question with python-3.x, here is a script to achieve what you want (it requires Python 3.6+ though, because of f-strings):

from pathlib import Path
import csv

source = Path('input.tsv')

with source.open() as src:
    csvreader = csv.reader(src, dialect='excel-tab')

    # get number of columns and rewind
    cols = len(next(csvreader)[1:])
    src.seek(0)

    csvwriters = []

    # create a csv.writer for each column
    for i in range(cols):
        # output_col_01.tsv, output_col_02.tsv ...
        csvwriters.append(
            csv.writer(
                Path(f'output_col_{i + 1:02d}.tsv').open('w'),
                dialect='excel-tab'
            )
        )

    nan = float('nan')

    for name, *cols in csvreader:
        for i, a in enumerate(cols):
            row = [name]
            for j, b in enumerate(cols):
                # skip the quotient of a col by itself
                if i != j:
                    a = float(a)
                    b = float(b)
                    # nan if division by zero
                    row.append(round(a / b, 4) if b else nan)

            csvwriters[i].writerow(row)

我没有为除数为 0 的操作添加 0.001 ,而是选择返回 float('nan').

Instead of adding 0.001 for operations where the divisor is 0, I opted to return float('nan').

它不会将列本身分开,而是将商四舍五入到小数点后四位.

It will not divide a column by itself and will round the quotients to 4 decimal places.

最后,如果您使用的Python版本早于3.6(但由于 pathlib.Path(),您仍然需要Python版本3.4+),然后替换以下行:/p>

Finally, if you are using a Python version earlier than 3.6 (but you will still need a Python version 3.4+, because of pathlib.Path()), then replace the following line:

Path(f'output_col_{i + 1:02d}.tsv').open('w'),

具有:

Path('output_col_%02d.tsv' % (i + 1)).open('w'),

这是必需的,因为 f-strings 是在Python 3.6中引入的.

That's needed because f-strings were introduced in Python 3.6.

这篇关于如何将特定列与其余列分开的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆