A good way to merge files according to the file name, in Python or bash


Problem description

I have thousands of gzipped CSV files named like this:

result-20120705-181535.csv.gz

Here 181535 means 18:15:35. I want to merge these files on a daily basis (I have more than a week of data, all named like the example above), where a day runs from 2:00 am until 2:00 am the next day, and then move the processed files into a folder called merged.

So in the current folder I have tons of .csv.gz files. I want to scan the names, merge everything matching 20120705-02*, 20120705-03*, ... up to 20120706-01* into 20120705-result.csv.gz, then move those files (20120705-02* through 20120706-01*) into a folder called merged, and continue with the next day's data: 20120706-02* ... 20120707-01*
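The grouping rule described above (a "day" runs from 02:00 to 02:00) can be sketched as a small shell function. This is only an illustration: the name day_bucket is hypothetical, it assumes file names of exactly the form result-YYYYMMDD-HHMMSS.csv.gz, and it relies on the -d option of GNU date.

```bash
#!/bin/bash
# day_bucket (hypothetical helper name): print the YYYYMMDD day a file
# belongs to when a "day" runs from 02:00 until 02:00 the next day.
day_bucket() {
  stamp=${1#result-}     # e.g. 20120705-181535.csv.gz
  day=${stamp%%-*}       # 20120705
  rest=${stamp#*-}       # 181535.csv.gz
  case $rest in
    # Hours 00 and 01 still count towards the previous day (GNU date syntax).
    00*|01*) date +%Y%m%d -d "$day -1 day" ;;
    *)       echo "$day" ;;
  esac
}

day_bucket result-20120705-181535.csv.gz   # afternoon file
day_bucket result-20120706-013000.csv.gz   # 01:30 on the following morning
```

Both example files fall into the 20120705 bucket, so both would end up in 20120705-result.csv.gz.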

I am wondering whether to use Python or a bash script to do it, and how?

Sorry if this question is really stupid, but I am quite new to programming and really have no clue how to do it. Could anyone give me a hint? Many thanks!

Recommended answer

Create a text file containing these lines:

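#!/bin/bash

# Usage: ./myscript.sh YYYYMMDD   (requires GNU date)
mkdir -p merged      # -p: no error if the folder already exists
shopt -s extglob     # enables the @(...) patterns used below
shopt -s nullglob    # skip the loop entirely when nothing matches

d1=$1
# d2 must use the same YYYYMMDD format as the file names
d2=$(date +%Y%m%d -d "$d1 +1 day")

# Hours 02-23 of day d1, plus hours 00-01 of day d2
for f in result-@($d1-@(0[2-9]|[1-2][0-9])|$d2-0[01])*.csv.gz ; do
  gzip -cd "$f"
  mv "$f" "merged/$f"
done | gzip > "$d1-result.csv.gz"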

Save it with a .sh extension (say, myscript.sh). Next, in a terminal, type

chmod +x myscript.sh

Now you can call it like

./myscript.sh 20120705

which will then do as you described.
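Since the question mentions more than a week of existing data, the script can be called once per day in a loop. The following is a minimal sketch, not part of the original answer: for_each_day is a hypothetical helper name, the dates are examples, and the day arithmetic relies on GNU date.

```bash
#!/bin/bash
# for_each_day (hypothetical helper): run a command once for every day from
# START to END (both YYYYMMDD, inclusive), passing the day as last argument.
for_each_day() {
  d=$1
  end=$2
  shift 2
  while [ "$d" -le "$end" ]; do
    "$@" "$d"
    d=$(date +%Y%m%d -d "$d +1 day")   # GNU date syntax
  done
}

# Dry run: print the commands instead of executing them.
for_each_day 20120705 20120707 echo ./myscript.sh
```

Dropping the echo (for_each_day 20120705 20120707 ./myscript.sh) would run the merge for each of those days in turn.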

To automatically execute this on a daily basis, you can put a line in your /etc/crontab file, something like

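# /path/to/data is a placeholder for the folder holding the files and myscript.sh
2 2 * * * root cd /path/to/data && ./myscript.sh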

This runs at 02:02 every day, assuming creating the last .csv.gz file takes 1 minute, plus 1 extra minute just to be sure :)

For this kind of automation to work properly, the script above needs to be modified a bit. Since cron runs it without an argument just after 2:00 am, it should operate on the day that has just ended, so change the two lines defining the dates:

d1=$(date +%Y%m%d -d "now -1 day")
d2=$(date +%Y%m%d)

That should do. As always, test it thoroughly before automating it!
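One way to test without touching real data is to run the script in a scratch directory filled with tiny sample files that sit right around the 02:00 boundaries. The sketch below is an assumption about how such a test could be set up; make_samples is a hypothetical helper name and the timestamps are made up for illustration.

```bash
#!/bin/bash
# make_samples (hypothetical helper): create a temporary directory with small
# gzipped files around the 02:00 boundaries and print the directory's path.
make_samples() {
  dir=$(mktemp -d)
  for stamp in 20120705-015959 20120705-020000 20120705-181535 \
               20120706-015959 20120706-020000; do
    # Each file contains just its own timestamp, which makes the merged
    # output easy to inspect afterwards.
    echo "$stamp" | gzip > "$dir/result-$stamp.csv.gz"
  done
  echo "$dir"
}

samples=$(make_samples)
ls "$samples"
```

After changing into that directory and running the script for 20120705, the merged file should contain only the 20120705-020000, 20120705-181535 and 20120706-015959 lines, and exactly those three source files should have moved to merged.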
