Merging CSV files: appending instead of merging

Problem description

So basically I want to merge a couple of CSV files. I'm using the following command to do that:

paste -d , *.csv > final.txt

This has worked for me in the past, but this time it doesn't. It places the data side by side instead of one file below the other. For instance, take two files that contain records in the following format:

CreatedAt   ID
Mon Jul 07 20:43:47 +0000 2014  4.86249E+17
Mon Jul 07 19:58:29 +0000 2014  4.86238E+17
Mon Jul 07 19:42:33 +0000 2014  4.86234E+17

When merged, they give:

CreatedAt   ID CreatedAt    ID
Mon Jul 07 20:43:47 +0000 2014  4.86249E+17 Mon Jul 07 18:25:53 +0000 2014  4.86215E+17
Mon Jul 07 19:58:29 +0000 2014  4.86238E+17 Mon Jul 07 17:19:18 +0000 2014  4.86198E+17
Mon Jul 07 19:42:33 +0000 2014  4.86234E+17 Mon Jul 07 15:45:13 +0000 2014  4.86174E+17
                                            Mon Jul 07 15:34:13 +0000 2014  4.86176E+17

Does anyone know the reason behind this? Or what can I do to force the records to be merged one below the other?

Solution

Assuming that all the CSV files have the same format and all start with the same header, you can write a small script like the following to append all the files into a single one and keep the header only once.

#!/bin/bash
OutFileName="X.csv"                              # Fix the output name
i=0                                              # Reset the counter
for filename in ./*.csv; do
  # Avoid recursion: the glob yields ./X.csv, so compare with the ./ prefix
  if [ "$filename" != "./$OutFileName" ]; then
    if [ "$i" -eq 0 ]; then
      head -n 1 "$filename" > "$OutFileName"     # Copy the header from the first file
    fi
    tail -n +2 "$filename" >> "$OutFileName"     # Append each file from its 2nd line on
    i=$(( i + 1 ))                               # Increase the counter
  fi
done

Notes:

  • head -1 (or head -n 1) prints the first line of a file (the head).
  • tail -n +2 prints the tail of a file starting from line 2 (+2).
  • The [ ... ] test excludes the output file from the input list.
  • The output file is overwritten on every run.
  • cat a.csv b.csv > X.csv is enough to put a.csv and b.csv into a single file, but the header then appears twice.
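
The effect of head and tail described in the notes can be checked on a throwaway file. This is only a minimal sketch: the temporary directory, the file name a.csv, and its contents are made up for illustration.

```shell
#!/bin/bash
# Hypothetical sample data in a temporary directory (not from the question).
workdir=$(mktemp -d)
printf 'CreatedAt,ID\nrow1,1\nrow2,2\n' > "$workdir/a.csv"

header=$(head -n 1 "$workdir/a.csv")   # first line only
body=$(tail -n +2 "$workdir/a.csv")    # everything from line 2 onward

echo "header: $header"
echo "body:"
echo "$body"
```

Here header holds the single line CreatedAt,ID and body holds the two data rows, which is exactly the split the script relies on.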

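The paste command places the files side by side: line N of every input file is joined into line N of the output. If one file has fewer lines than another (or contains blank lines), its column is simply left empty, which gives output like the one you reported above.
The -d , option tells paste to use a comma as the separator between the joined columns. It does not stack the files one below the other, and the files you showed above do not look comma-separated in the first place.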
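
The side-by-side behavior of paste is easy to reproduce with two small files of different lengths. This is a sketch under assumed data: the file names and contents are invented for the example.

```shell
#!/bin/bash
# Hypothetical sample files: a.csv has 3 lines, b.csv has 4.
workdir=$(mktemp -d)
printf 'A\n1\n2\n' > "$workdir/a.csv"
printf 'A\n3\n4\n5\n' > "$workdir/b.csv"

# paste joins line N of each file, separated by the -d delimiter.
side_by_side=$(paste -d , "$workdir/a.csv" "$workdir/b.csv")
echo "$side_by_side"
```

The last output line is ,5 because a.csv has already run out of lines, so its column is left empty. That matches the extra, right-shifted record in the output shown in the question.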

The cat command, by contrast, concatenates files and prints them on standard output, which means it writes one file after the other.
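
To see the difference, the sketch below stacks two sample files with cat and then, as an alternative that is not part of the answer above, with an awk one-liner that drops the header of every file except the first. The file names and contents are assumptions, and both commands expect each file to end with a newline.

```shell
#!/bin/bash
# Hypothetical sample files that share the same header line.
workdir=$(mktemp -d)
printf 'CreatedAt,ID\nrow1,1\n' > "$workdir/a.csv"
printf 'CreatedAt,ID\nrow2,2\n' > "$workdir/b.csv"

# cat stacks the files as they are, so the header appears twice.
with_cat=$(cat "$workdir/a.csv" "$workdir/b.csv")

# awk: FNR is the line number within the current file, NR the overall one.
# Skip line 1 of every file except the very first line read.
with_awk=$(awk 'FNR == 1 && NR != 1 { next } { print }' \
  "$workdir/a.csv" "$workdir/b.csv")

echo "$with_cat"
echo "---"
echo "$with_awk"
```

When redirecting the awk output to a file, it is safer to write it outside the directory being globbed (or under a name the glob does not match), so that the output is never read back as an input.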

Refer to man head or man tail for the exact option syntax (some versions accept head -1, others require head -n 1).
