A good way to merge files according to the file name: Python or Bash?
Question
I have thousands of gzipped CSV files named like this:
result-20120705-181535.csv.gz
181535 means 18:15:35. I want to merge these files on a daily basis (I have over a week of data, all named like the example above), where a day runs from 2:00 am until 2:00 am the next day, and then move the processed files into a folder called merged.
So in the current folder I have tons of .csv.gz files. I want to scan the names and merge everything matching 20120705-02*, 20120705-03* ... up to 20120706-01* into 20120705-result.csv.gz, then move those 20120705-02*, 20120705-03* ... 20120706-01* files into a folder called merged, and then go on to the next day's data: 20120706-02* ... 20120707-01*.
I am wondering whether to use a Python or a Bash script to do this, and how?
Recommended answer
Create a text file containing these lines:
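#!/bin/bash
# Usage: ./myscript.sh YYYYMMDD
mkdir -p merged
# extglob enables the @(a|b) patterns; nullglob skips the loop if nothing matches
shopt -s extglob nullglob
d1=$1
# The day after d1, in the same YYYYMMDD format (needs GNU date)
d2=$(date +%Y%m%d -d "$d1 +1 day")
# Hours 02-23 of d1, followed by hours 00-01 of d2
for f in result-@($d1-@(0[2-9]|[1-2][0-9])|$d2-0[01])*.csv.gz ; do
    gzip -cd "$f"
    mv "$f" "merged/$f"
done | gzip > "$d1-result.csv.gz"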
and save it with a .sh extension (say, myscript.sh). Next, in a terminal, type
chmod +x myscript.sh
Now you can type
./myscript.sh 20120705
which will then do as you described.
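Since the question mentions more than a week of existing data, the script can also be called once per day in a loop. The following is only a sketch: list_days is a hypothetical helper, the start date 20120705 and the count 7 are assumed values, and it relies on GNU date just like the script itself.

```shell
#!/bin/bash
# Print n consecutive days in YYYYMMDD form, starting at first_day.
# Relies on GNU date's -d option, like the script above.
list_days() {
    local first_day=$1 n=$2 i=0
    while [ "$i" -lt "$n" ]; do
        date +%Y%m%d -d "$first_day +$i day"
        i=$((i + 1))
    done
}

# 20120705 and 7 are assumed values; adjust them to your data.
for d in $(list_days 20120705 7); do
    echo ./myscript.sh "$d"    # drop the "echo" to actually run the merge
done
```

As written it only prints the commands it would run, which makes it easy to check the dates before anything is merged or moved.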
To execute this automatically every day, you can put a line in your /etc/crontab file, something like
2 2 * * * root ./myscript.sh
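This runs the script at 02:02 every day, assuming creating the last .csv.gz file takes 1 minute, plus 1 extra minute just to be sure :) Note that cron normally starts jobs in the user's home directory rather than in your data folder, so in practice give the full path to the script and make sure it changes to the data folder first.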
For this kind of automation to work properly, the script above needs to be modified a bit. Since it will then run on the current day and process the window that started the day before, change the two lines defining the dates:
d1=$(date +%Y%m%d -d "now -1 day")
d2=$(date +%Y%m%d)
That should do it. As always, test it thoroughly before automating it!
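One low-risk way to test is a dry run that only lists the files a given day's window would pick up, without decompressing or moving anything. This is a sketch under assumptions: window_files is a hypothetical helper that reuses the glob pattern from the script, 20120705 is just a default taken from the example, and GNU date is required.

```shell
#!/bin/bash
# extglob must be enabled before the function below is parsed
shopt -s extglob nullglob

# Hypothetical helper: list the files in the 02:00-02:00 window
# that starts on day $1 (YYYYMMDD), without touching them.
window_files() {
    local d1=$1 d2 f
    d2=$(date +%Y%m%d -d "$d1 +1 day")
    for f in result-@($d1-@(0[2-9]|[1-2][0-9])|$d2-0[01])*.csv.gz; do
        echo "$f"
    done
}

# Default day taken from the example above; pass another day as an argument.
window_files "${1:-20120705}"
```

If the list looks right, in chronological order and with nothing from before 02:00 or from 02:00 onwards on the next day, the real script should pick up the same files.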