A good way to merge files according to the file name, in Python or bash
Question
I have thousands of zipped csv files named like this:
result-20120705-181535.csv.gz
181535 means 18:15:35. I want to merge these files on a daily basis (I have over a week of data, all named like the example above), with each day running from 2:00 am until 2:00 am the next day, and then move the processed files into a folder called merged.
So in the current folder I have tons of .csv.gz files. I want to scan the names, merge everything matching 20120705-02*, 20120705-03* ... up to 20120706-01* into 20120705-result.csv.gz, then move those 20120705-02*, 20120705-03* ... 20120706-01* files into a folder called merged, and go on to the next day's data: 20120706-02* ... 20120707-01*
I am wondering whether to use Python or a bash script to do it, and how?
Sorry if this question is really stupid, but I am quite new to programming and really have no clue how to do it. Could anyone give me a hint? Many thanks!
Recommended answer
Create a text file containing these lines:
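#!/bin/bash
# Usage: ./myscript.sh YYYYMMDD
mkdir -p merged
# extglob enables the @(...) patterns; nullglob skips the loop if nothing matches
shopt -s extglob nullglob
d1=$1
# Day after d1, in the same YYYYMMDD form as the file names (needs GNU date)
d2=$(date -d "$d1 +1 day" +%Y%m%d)
# Hours 02-23 of d1, plus hours 00-01 of d2
for f in result-@($d1-@(0[2-9]|[1-2][0-9])|$d2-0[01])*.csv.gz ; do
    gzip -cd "$f"
    mv "$f" "merged/$f"
done | gzip > "$d1-result.csv.gz"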
and save it with a .sh extension (say, myscript.sh). Next, in a terminal, type
chmod +x myscript.sh
Now you can run it like this:
./myscript.sh 20120705
which will then do as you described.
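To check which merged day a single file would land in, the 2:00 am cutoff can be sketched as a small function. This is only an illustration: merge_day is a hypothetical helper name, it assumes the result-YYYYMMDD-HHMMSS.csv.gz naming from the question, and it relies on GNU date.

```shell
#!/bin/bash
# Hypothetical helper (not part of the script above): print the merged day
# (YYYYMMDD) that a file named result-YYYYMMDD-HHMMSS.csv.gz belongs to.
merge_day() {
    local name=$1 day hour
    day=${name#result-}            # strip the "result-" prefix
    day=${day%%-*}                 # keep YYYYMMDD
    hour=${name#result-????????-}  # strip prefix and date
    hour=${hour:0:2}               # keep HH
    # Files stamped before 02:00 count toward the previous day.
    # 10# forces base 10 so that "08" and "09" are not read as octal.
    if (( 10#$hour < 2 )); then
        date -d "$day -1 day" +%Y%m%d   # GNU date assumed
    else
        echo "$day"
    fi
}

merge_day result-20120705-181535.csv.gz   # prints 20120705
merge_day result-20120706-013000.csv.gz   # prints 20120705
```

A file stamped 01:30:00 on 20120706 is therefore merged into 20120705-result.csv.gz, which matches the window the script selects with its glob pattern.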
To automatically execute this on a daily basis, you can put a line in your /etc/crontab file, something like
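2 2 * * * root cd /path/to/data && ./myscript.sh
This runs at 02:02, assuming creating the last .csv.gz file takes 1 minute, plus 1 extra minute just to be sure :) Here /path/to/data is a placeholder for the folder that holds the files and the script; cron does not start in that folder, so the relative ./myscript.sh needs the cd in front of it.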
For this way of automation to work properly, the script above needs to be modified a bit. Since the job runs just after 2:00 am, it should process the window that started on the previous day and ended a few minutes ago, so change the two lines defining the dates:
d1=$(date +%Y%m%d -d "now -1 day")
d2=$(date +%Y%m%d)
That should do. As always, test it thoroughly before automating it!
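One way to test without touching real data is a dry run that only lists the files the pattern would pick for a given day. The sketch below is an assumption-laden example: list_window is a hypothetical helper, the file names are empty dummies created in a temporary folder, and GNU date is required.

```shell
#!/bin/bash
# extglob must be switched on before the function below is parsed
shopt -s extglob nullglob

# Hypothetical helper: list the files the merge script would select for
# day $1 in directory $2 (default: current folder), without merging or moving.
list_window() {
    local d1=$1 dir=${2:-.} d2 f
    d2=$(date -d "$d1 +1 day" +%Y%m%d)   # GNU date assumed
    for f in "$dir"/result-@($d1-@(0[2-9]|[1-2][0-9])|$d2-0[01])*.csv.gz; do
        basename "$f"
    done
}

# Try it on empty dummy files in a throwaway folder
tmp=$(mktemp -d)
touch "$tmp"/result-20120705-{015959,020000,181535}.csv.gz
touch "$tmp"/result-20120706-{013000,020000}.csv.gz
list_window 20120705 "$tmp"
rm -r "$tmp"
```

For 20120705 this lists the 020000 and 181535 files of that day plus the 013000 file of 20120706. The 015959 file belongs to the previous window and the 020000 file of 20120706 to the next one, so neither is listed.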