Fill empty columns in text file with 0


Question

I have a data set that I cut and pasted from a Google Spreadsheet into my text editor (Sublime Text 2), and the data set doesn't quite match my needs for processing.

As it comes from the spreadsheet, the data starts with one line of strings, one for each column, followed by a number of rows of data. In the data rows, each column either has the value 1 or is blank. I don't know whether the data is tab-separated when it leaves the spreadsheet, but after pasting it into the text file it is not. If the last 1 in a row is not in the last column, the line is padded with spaces up to but not including the last column.

I tried doing something with awk, but I couldn't figure out how to handle the fact that a space is both the separator and a column value. Next, I tried a few commands with sed, including replacing repeated spaces with zeros and piping the result to another sed that replaced 10 with 1 0, but then I sometimes got extra zeros inserted, and I don't know where in the affected rows that happened.
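The awk difficulty described above can be reproduced with a minimal sketch. It assumes awk's default field splitting, which discards leading and trailing blanks and treats any run of blanks as a single separator; `count_fields` is a hypothetical helper name used only for this illustration.

```shell
# Hypothetical helper: print the number of fields awk sees on each input line.
count_fields() {
  awk '{ print NF }'
}

# Two data rows from the example, each meant to hold 4 columns.
# With default splitting, awk only sees the cells that contain a 1.
printf '%s\n' '  1 1 ' '1     1 ' | count_fields
```

Both rows report 2 fields instead of 4, so the positions of the blank cells are lost before any awk rule can inspect them.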

This is some example data (there are 13 columns in the real file). I've added $ after the last character on each line so you can see how far the lines are padded.

"1" "2" "3" "4"                           "1" "2" "3" "4"
  1 1 $                                   0 1 1 0
1     1 $                                 1 0 0 1
  1   $                                   0 1 0 0
1 1   1 $                                 1 1 0 1

I would like to end up with something like the right-hand side (and then I don't care where the line ends) so I can process it with awk.

And by the way, I have seen this question, which doesn't solve my problem, since the solution there relies on the file being tab-delimited, with no value at all in the "empty" cells. To reiterate, my file is space-delimited, with spaces in the empty cells.

Recommended answer

My first attempt was wrong. Here is my fourth try, based on the modified input, which determines the number of columns automatically:

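awk '
NR==1 { for (; N<NF; ++N) sp = " 0" sp }  # header row: N = column count, sp = N copies of " 0"
NR>1 {
  $0 = " " $0; sub(" +$", "")             # prepend one space, strip trailing padding
  gsub("  ", " 0")                        # each empty cell (two spaces) becomes " 0"
  $0 = substr($0 sp, 2, 2*N-1)            # pad with zeros, drop the added space, keep N columns
}
1' <<EOT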
"1" "2" "3" "4"
  1 1 
1     1 
  1   
1 1   1 
EOT

Each cell is two characters wide: the value (or a blank) followed by a separator space. A run of spaces at the start of a line therefore has even length, while a run between two 1s has odd length, so I prepend one space to let the same gsub handle both cases. It is not clear how many trailing spaces are present, so the script simply strips them. The variable sp holds " 0" repeated once per field and is appended to fill in any missing trailing columns. substr starts at position 2 to cut the added leading space and takes (number of fields)*2-1 characters to cut the surplus padding.
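The same two-characters-per-cell observation also allows a positional variant, sketched here as an alternative and not as the answer's own method. It assumes every data cell is exactly one character wide followed by one separator space, as in the sample, so column i sits at character 2*i-1; `fill_zeros` is a hypothetical helper name.

```shell
# Hypothetical helper: read the table on stdin and print it with blank cells as 0.
fill_zeros() {
  awk '
    NR == 1 { n = NF; print; next }   # header row gives the column count
    {
      out = ""
      for (i = 1; i <= n; i++) {
        # Assumption: column i occupies character 2*i-1 of the line;
        # substr returns "" past the end of a short line, which counts as 0.
        bit = (substr($0, 2*i - 1, 1) == "1") ? 1 : 0
        out = (i == 1) ? bit : out " " bit
      }
      print out
    }'
}

printf '%s\n' '"1" "2" "3" "4"' '  1 1 ' '1     1 ' '  1   ' '1 1   1 ' | fill_zeros
```

Because it reads fixed positions instead of rewriting runs of spaces, this variant does not depend on how far each line is padded, but it breaks if any value is wider than one character.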

Output:

"1" "2" "3" "4"
0 1 1 0
1 0 0 1
0 1 0 0
1 1 0 1
