Reshaping from wide to long format

Views: 117 Published: 2020/5/25 0:16:06 Tags: bash parsing unix

Problem description

I am trying to use Unix tools to transform a tab-delimited file from a short/wide format to a long format, similar to what the reshape function does in R. I want to create three rows for each row in the starting file. Column 4 currently contains 3 values separated by commas. For each starting row, columns 1, 2, and 3 should stay the same, while column 4 holds one of the values from the initial column 4. This example probably makes it clearer than I can describe verbally:

current file:  
A1  A2  A3  A4,A5,A6  
B1  B2  B3  B4,B5,B6  
C1  C2  C3  C4,C5,C6  

goal:  
A1  A2  A3  A4  
A1  A2  A3  A5  
A1  A2  A3  A6  
B1  B2  B3  B4  
B1  B2  B3  B5  
B1  B2  B3  B6  
C1  C2  C3  C4  
C1  C2  C3  C5  
C1  C2  C3  C6

As someone just becoming familiar with this language, my initial thought was to use sed to find the commas and replace them with a hard return:

sed 's/,/&\n/' data.frame

I am really not sure how to include the values for columns 1-3 on each new line. I had low hopes of this working, but the only thing I could think of was to try inserting the column values with {print $1, $2, $3}.

sed 's/,/&\n{print $1, $2, $3}/' data.frame

Unsurprisingly, the output looked like this:

A1  A2  A3  A4  
{print $1, $2, $3}  A5  
{print $1, $2, $3}  A6  
B1  B2  B3  B4  
{print $1, $2, $3}  B5  
{print $1, $2, $3}  B6  
C1  C2  C3  C4  
{print $1, $2, $3}  C5  
{print $1, $2, $3}  C6

It seems like one approach might be to store the values of columns 1-3 and then insert them. I am not really sure how to store the values. I think it may involve adapting the following script, but I am having a hard time understanding all of its components.

NR==FNR{a[$1, $2, $3]=1}
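For reference, the snippet above is an awk idiom for two-file processing rather than for reshaping a single file. `NR` counts records across all input files while `FNR` restarts at 1 for each file, so `NR==FNR` is true only while the first file is being read, and `a[$1, $2, $3]=1` records the three columns as one array key. The sketch below shows the idiom in a typical setting; the function name `match_on_three_columns`, the array name `seen`, and the added `next` are assumptions for illustration and are not part of the original snippet.

```shell
# match_on_three_columns FILE1 FILE2  (hypothetical helper name)
# Prints the rows of FILE2 whose first three columns also occur in FILE1.
match_on_three_columns() {
  awk 'BEGIN { FS = "\t" }
  # First file only: remember each (col1, col2, col3) combination,
  # then skip to the next record so the rule below does not run.
  NR == FNR { seen[$1, $2, $3] = 1; next }
  # Second file: print rows whose key was stored while reading the first.
  ($1, $2, $3) in seen' "$1" "$2"
}
```

Because the task here involves only one input file, this idiom is probably not needed for it.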

Thanks in advance for your thoughts on this.
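One possible approach, sketched with awk instead of sed: `{print $1, $2, $3}` is awk syntax, so sed inserts it as literal text, whereas awk can split the fourth column on commas and print columns 1-3 once per value. This is a minimal sketch that assumes the input really is tab-delimited and that the comma-separated values sit in column 4; the function name `reshape_long` and the array name `parts` are made up for illustration.

```shell
# reshape_long [FILE...]  (hypothetical helper name)
# Reads tab-delimited rows and writes one output row per comma-separated
# value found in column 4, repeating columns 1-3 on each row.
reshape_long() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    # Split column 4 on commas; n is the number of values found.
    n = split($4, parts, ",")
    # Emit columns 1-3 followed by each value in turn.
    for (i = 1; i <= n; i++)
      print $1, $2, $3, parts[i]
  }' "$@"
}

# Usage: reshape_long data.frame
```

The loop does not hard-code three values, so rows with a different number of comma-separated entries should be handled the same way. A row whose fourth column is empty produces no output in this sketch.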
