Split CSV to Multiple Files Containing a Set Number of Unique Field Values
As a beginner with awk, I am able to split the data by the unique values in the first column with

awk -F, '{print >> $1".csv"; close($1".csv")}' myfile.csv

But I would like to split a large CSV file based on an additional condition: the number of distinct values seen in a specific column.
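For concreteness, the per-key one-liner writes every distinct value in column 1 to its own file, named after the value itself. A minimal, self-contained sketch (`sample.csv` and its contents are made up for illustration):

```shell
# Build a tiny sample, then split it into one file per unique first field.
printf '111,a\n111,b\n222,c\n' > sample.csv

# Calling close() after every write keeps only one file handle open at a
# time, which matters when column 1 has many distinct values.
awk -F, '{ out = $1 ".csv"; print >> out; close(out) }' sample.csv
```

Afterwards `111.csv` holds the two `111` rows and `222.csv` holds the single `222` row.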
Specifically, with input
111,1,0,1
111,1,1,1
222,1,1,1
333,1,0,0
333,1,1,1
444,1,1,1
444,0,0,0
555,1,1,1
666,1,0,0
I would like the output files to be
111,1,0,1
111,1,1,1
222,1,1,1
333,1,0,0
333,1,1,1
and
444,1,1,1
444,0,0,0
555,1,1,1
666,1,0,0
each of which contains three (in this case) unique values in the first column: 111, 222, 333 and 444, 555, 666 respectively.
Any help would be appreciated.
This will do the trick and I find it pretty readable and easy to understand:
awk -F',' 'BEGIN { count=0; filename=1 }
x[$1]++==0 {count++}
count==4 { count=1; filename++}
{print >> filename".csv"; close(filename".csv");}' file
We start with our count at 0 and our filename at 1. We then count each unique value we get from the first column, and whenever it's the 4th one, we reset our count and move on to the next filename.
Here's some sample data I used, which is just yours with some additional lines.
~$ cat test.txt
111,1,0,1
111,1,1,1
222,1,1,1
333,1,0,0
333,1,1,1
444,1,1,1
444,0,0,0
555,1,1,1
666,1,0,0
777,1,1,1
777,1,0,1
777,1,1,0
777,1,1,1
888,1,0,1
888,1,1,1
999,1,1,1
999,0,0,0
999,0,0,1
101,0,0,0
102,0,0,0
And running the awk like so:
~$ awk -F',' 'BEGIN { count=0; filename=1 }
x[$1]++==0 {count++}
count==4 { count=1; filename++}
{print >> filename".csv"; close(filename".csv");}' test.txt
We see the following output files and content:
~$ cat 1.csv
111,1,0,1
111,1,1,1
222,1,1,1
333,1,0,0
333,1,1,1
~$ cat 2.csv
444,1,1,1
444,0,0,0
555,1,1,1
666,1,0,0
~$ cat 3.csv
777,1,1,1
777,1,0,1
777,1,1,0
777,1,1,1
888,1,0,1
888,1,1,1
999,1,1,1
999,0,0,0
999,0,0,1
~$ cat 4.csv
101,0,0,0
102,0,0,0
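The group size is hard-coded above (three unique keys per file, checked as `count==4`), but it can be lifted into a variable. Here is a sketch of the same logic with the group size passed via `-v`; the names `n` and `prefix` are my own additions, not part of the answer above:

```shell
# Recreate the question's input, then split it so each output file holds
# at most n distinct first-column values.
printf '111,1,0,1\n111,1,1,1\n222,1,1,1\n333,1,0,0\n333,1,1,1\n' >  test.txt
printf '444,1,1,1\n444,0,0,0\n555,1,1,1\n666,1,0,0\n'            >> test.txt

awk -F',' -v n=3 -v prefix=part '
    BEGIN       { count = 0; filename = 1 }
    !seen[$1]++ { count++ }                  # first sighting of this key
    count > n   { count = 1; filename++ }    # limit exceeded: next file
    { out = prefix filename ".csv"; print >> out; close(out) }
' test.txt
```

With `n=3` this reproduces the `1.csv`/`2.csv` split from the question's sample, just named `part1.csv` and `part2.csv`.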