How to split a file and keep the first line in each of the pieces?
Question
Given: one big text-data file (e.g. CSV format) with a 'special' first line (e.g. field names).
Wanted: an equivalent of the coreutils split -l command, but with the additional requirement that the header line from the original file appears at the beginning of each of the resulting pieces.
I am guessing some concoction of split and head will do the trick?
Recommended answer
This is robhruska's script, cleaned up a bit:
tail -n +2 file.txt | split -l 4 - split_
for file in split_*
do
    head -n 1 file.txt > tmp_file
    cat "$file" >> tmp_file
    mv -f tmp_file "$file"
done
I removed wc, cut, ls and echo in the places where they were unnecessary. I changed some of the filenames to make them a little more meaningful. I broke it out onto multiple lines only to make it easier to read.
If you want to get fancy, you could use mktemp or tempfile to create a temporary filename instead of using a hard-coded one.
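As a rough sketch of the mktemp variant, the loop might look like the following. The sample file contents and the tmp_split.XXXXXX template are made up for illustration. The temporary file is created in the current directory on the assumption that keeping it on the same filesystem lets mv rename it rather than copy it.

```shell
#!/bin/sh
# Sample input, made up for illustration: one header line plus six data lines.
printf '%s\n' 'id,name' '1,a' '2,b' '3,c' '4,d' '5,e' '6,f' > file.txt

# Split everything after the header into 4-line pieces: split_aa, split_ab, ...
tail -n +2 file.txt | split -l 4 - split_

for file in split_*
do
    # mktemp picks a unique name; the template here is an arbitrary choice.
    tmp_file=$(mktemp ./tmp_split.XXXXXX) || exit 1
    head -n 1 file.txt > "$tmp_file"   # header first
    cat "$file" >> "$tmp_file"         # then the piece's own lines
    mv -f "$tmp_file" "$file"          # replace the piece with the headed copy
done
```

One thing to watch: mktemp typically creates files that only the owner can read and write, and mv carries those permissions over to the final pieces, so a chmod may be needed afterwards.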
Edit
Using GNU split it's possible to do this:
split_filter () { { head -n 1 file.txt; cat; } > "$FILE"; }; export -f split_filter; tail -n +2 file.txt | split --lines=4 --filter=split_filter - split_
Broken out for readability:
split_filter () { { head -n 1 file.txt; cat; } > "$FILE"; }
export -f split_filter
tail -n +2 file.txt | split --lines=4 --filter=split_filter - split_
When --filter is specified, split runs the command (a function in this case, which must be exported) for each output file and sets the variable FILE, in the command's environment, to the filename.
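Because export -f is a bash feature, it may be worth noting that the filter can also be passed inline as a quoted command string, which avoids the exported function altogether. The following is a minimal sketch assuming GNU split; the sample file contents are made up for illustration.

```shell
#!/bin/sh
# Sample input, made up for illustration: one header line plus six data lines.
printf '%s\n' 'id,name' '1,a' '2,b' '3,c' '4,d' '5,e' '6,f' > file.txt

# The filter is single-quoted, so "$FILE" is not expanded here; it is expanded
# by the shell that split starts for each piece, where FILE holds the piece's name.
tail -n +2 file.txt |
    split --lines=4 --filter='{ head -n 1 file.txt; cat; } > "$FILE"' - split_
```

With this input the result should be two pieces, split_aa and split_ab, each beginning with the header line. When --filter is used, split does not write the output files itself, so the redirection to "$FILE" inside the filter is what creates them.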
A filter script or function can manipulate the output contents, or even the filename, in any way it likes. An example of the latter would be writing to a fixed filename in a variable directory, for example: > "$FILE/data.dat"
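A sketch of that directory-per-piece idea, again assuming GNU split, could look like this. The part_ prefix and the sample data are illustrative choices, and the mkdir -p step is an addition needed so that the directory exists before the redirection runs.

```shell
#!/bin/sh
# Sample input, made up for illustration: one header line plus six data lines.
printf '%s\n' 'id,name' '1,a' '2,b' '3,c' '4,d' '5,e' '6,f' > file.txt

# Here "$FILE" (part_aa, part_ab, ...) is used as a directory name, and every
# piece is written to the same fixed filename, data.dat, inside its directory.
tail -n +2 file.txt |
    split --lines=4 \
        --filter='mkdir -p "$FILE" && { head -n 1 file.txt; cat; } > "$FILE/data.dat"' \
        - part_
```

This should leave part_aa/data.dat and part_ab/data.dat, each starting with the header line.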