Split CSV to Multiple Files Containing a Set Number of Unique Field Values

Problem description

As an awk beginner, I am able to split the data by unique value with

awk -F, '{ f = $1 ".csv"; print >> f; close(f) }' myfile.csv

But I would like to split a large CSV file based on an additional condition: the number of unique values in a specific column.

Specifically, with input

111,1,0,1
111,1,1,1
222,1,1,1
333,1,0,0
333,1,1,1
444,1,1,1
444,0,0,0
555,1,1,1
666,1,0,0

I would like the output files to be

111,1,0,1
111,1,1,1
222,1,1,1
333,1,0,0
333,1,1,1

and

444,1,1,1
444,0,0,0
555,1,1,1
666,1,0,0

Each file contains three (in this case) unique values in the first column: 111, 222, 333 and 444, 555, 666 respectively. Any help would be appreciated.

Solution

This will do the trick and I find it pretty readable and easy to understand:

awk -F',' 'BEGIN { count=0; filename=1 }
            x[$1]++==0 {count++}              # first time this value is seen
            count==4 { count=1; filename++}   # fourth new value: next file
            {print >> (filename".csv"); close(filename".csv");}' file

We start with count at 0 and filename at 1. We then count each unique value seen in the first column, and whenever we reach the fourth one, we reset count to 1 and move on to the next filename.
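The counting logic described above can also be written with the group size as a parameter. This is a minimal sketch rather than a replacement for the answer: the variable names `n`, `part`, `out` and `seen`, and the file name `sample.csv`, are assumptions made for illustration.

```shell
# Sample input from the question, written to a hypothetical file name.
printf '%s\n' 111,1,0,1 111,1,1,1 222,1,1,1 333,1,0,0 333,1,1,1 \
    444,1,1,1 444,0,0,0 555,1,1,1 666,1,0,0 > sample.csv

# n (assumed name): number of unique first-column values per output file.
awk -F',' -v n=3 '
    BEGIN { part = 1; out = part ".csv" }
    !seen[$1]++ {                 # runs only the first time a value appears
        if (++count > n) {        # value number n+1 opens the next file
            close(out)
            count = 1
            out = ++part ".csv"
        }
    }
    { print > out }               # ">" truncates once per run, then appends
' sample.csv
```

With the sample input this writes the first five rows to 1.csv and the remaining four to 2.csv. Two design points are worth noting. The sketch uses `>` and closes a file only when moving on, so rerunning it overwrites earlier output, whereas `>>` appends to files left over from a previous run. And, like the one-liner, it assumes rows with the same first-column value are adjacent: a value that reappears later is not counted again and is written to whichever file is current.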

Here's some sample data I used, which is just yours with some additional lines.

~$ cat test.txt
111,1,0,1
111,1,1,1
222,1,1,1
333,1,0,0
333,1,1,1
444,1,1,1
444,0,0,0
555,1,1,1
666,1,0,0
777,1,1,1
777,1,0,1
777,1,1,0
777,1,1,1
888,1,0,1
888,1,1,1
999,1,1,1
999,0,0,0
999,0,0,1
101,0,0,0
102,0,0,0

Running the awk command like so:

~$ awk -F',' 'BEGIN { count=0; filename=1 }
            x[$1]++==0 {count++}
            count==4 { count=1; filename++}
            {print >> (filename".csv"); close(filename".csv");}' test.txt

We see the following output files and content:

~$ cat 1.csv
111,1,0,1
111,1,1,1
222,1,1,1
333,1,0,0
333,1,1,1

~$ cat 2.csv
444,1,1,1
444,0,0,0
555,1,1,1
666,1,0,0

~$ cat 3.csv
777,1,1,1
777,1,0,1
777,1,1,0
777,1,1,1
888,1,0,1
888,1,1,1
999,1,1,1
999,0,0,0
999,0,0,1

~$ cat 4.csv
101,0,0,0
102,0,0,0
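
One way to check a split like this is to count the distinct first-column values in each output file. The sketch below is only an illustration: `count_unique_keys` is a hypothetical helper name, and the two files are recreated from the listings above so that the snippet runs on its own.

```shell
# Recreate two of the output files shown above (contents copied from the listings).
printf '%s\n' 111,1,0,1 111,1,1,1 222,1,1,1 333,1,0,0 333,1,1,1 > 1.csv
printf '%s\n' 101,0,0,0 102,0,0,0 > 4.csv

# count_unique_keys (hypothetical helper): for each file given, print
# "<file> <number of distinct first-column values>".
count_unique_keys() {
    awk -F',' '
        !seen[FILENAME, $1]++ { distinct[FILENAME]++ }   # new value in this file
        END { for (f in distinct) print f, distinct[f] }
    ' "$@" | sort
}

count_unique_keys 1.csv 4.csv
# 1.csv 3
# 4.csv 2
```

Every file except possibly the last should report the chosen group size; the last file holds whatever values remain.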
