A good way to merge files according to the file name, in Python or bash


Question

I have thousands of gzipped CSV files named like this:

result-20120705-181535.csv.gz

Here 181535 means 18:15:35. I want to merge these files on a daily basis (I have over a week of data, all named like the example above), where a day runs from 2:00 am until 2:00 am the next day, and then move the processed files into a folder called merged.

So the current folder holds tons of .csv.gz files. I want to scan the names, merge everything matching 20120705-02*, 20120705-03*, ... up to 20120706-01* into 20120705-result.csv.gz, move those input files into a folder called merged, and then continue with the next day's data: 20120706-02* ... 20120707-01*.

I am wondering whether to use a Python or a bash script for this, and how?

Recommended answer

Create a text file containing these lines:

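#!/bin/bash

# Create the target folder; -p avoids an error if it already exists.
mkdir -p merged
# extglob enables the @(...) patterns; nullglob skips the loop when nothing matches.
shopt -s extglob nullglob

d1=$1                                 # day to merge, e.g. 20120705
d2=$(date -d "$d1 +1 day" +%Y%m%d)    # the following day, in the same format (GNU date)

# Hours 02-23 of $d1 plus hours 00-01 of $d2.
for f in result-@($d1-@(0[2-9]|[1-2][0-9])|$d2-0[01])*.csv.gz ; do
  # Move a file only after it has been decompressed successfully.
  gzip -cd "$f" && mv "$f" merged/
done | gzip > "$d1-result.csv.gz"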

and save it with a .sh extension (say, myscript.sh). Next, in a terminal, type

chmod +x myscript.sh

Now you can type

./myscript.sh 20120705

which will then do as you described.
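Because the script moves files as it goes, it can help to preview which files a given day would pick up first. The following is only a sketch: next_day, in_window and preview are hypothetical helper names, not part of the answer's script, and the sketch expresses the same 02:00-to-02:00 selection with plain case patterns instead of extglob. It assumes GNU date, as the script itself does.

```shell
#!/bin/sh
# next_day DAY: print the day after DAY (YYYYMMDD). Assumes GNU date.
next_day() {
  date -d "$1 +1 day" +%Y%m%d
}

# in_window FILE D1 D2: succeed if FILE belongs to the window that starts
# at 02:00 on D1 and ends just before 02:00 on D2.
in_window() {
  case $1 in
    result-"$2"-0[2-9]*.csv.gz)    return 0 ;;  # hours 02-09 of D1
    result-"$2"-[12][0-9]*.csv.gz) return 0 ;;  # hours 10-23 of D1
    result-"$3"-0[01]*.csv.gz)     return 0 ;;  # hours 00-01 of D2
  esac
  return 1
}

# preview DAY: list the files in the current folder that DAY would merge,
# without decompressing or moving anything.
preview() {
  d2=$(next_day "$1") || return 2
  for f in result-*.csv.gz; do
    if in_window "$f" "$1" "$d2"; then
      printf '%s\n' "$f"
    fi
  done
}
```

After sourcing these functions, preview 20120705 prints the candidate files for that day, one per line.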

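To run this automatically every day, you can put a line in your /etc/crontab file, something like the one below. cron starts jobs in the user's home directory rather than in your data folder, so change into the data folder first (/path/to/csv is a placeholder for wherever the files and the script live):

2 2 * * * root cd /path/to/csv && ./myscript.sh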

assuming that creating the last .csv.gz file takes 1 minute, plus 1 extra minute just to be sure :)

For this kind of automation to work properly, the script above needs a small modification. cron passes no date argument, and the job runs just after 2:00 am, when the window that started the previous day has closed, so change the two lines defining the dates:

d1=$(date +%Y%m%d -d "now -1 day")
d2=$(date +%Y%m%d)

That should do it. As always, test it thoroughly before automating it!
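The question mentions more than a week of files already sitting in the folder, which the cron job alone would not catch up on. One possible way to backfill is to generate the list of days and call the argument-taking version of the script once per day. This is a minimal sketch, assuming GNU date; list_days is a hypothetical helper name.

```shell
#!/bin/sh
# list_days START COUNT: print COUNT consecutive days (YYYYMMDD),
# one per line, starting at START. Assumes GNU date.
list_days() {
  i=0
  while [ "$i" -lt "$2" ]; do
    date -d "$1 +$i day" +%Y%m%d
    i=$((i + 1))
  done
}
```

With the function sourced, a loop such as for d in $(list_days 20120705 7); do ./myscript.sh "$d"; done would process seven days in order. This only works with the version of the script that reads the day from $1, not the cron variant.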
