Split File - Java/Linux

Question

I have a large file containing nearly 250 million characters. I want to split it into parts of 30 million characters each (so the first 8 parts will contain 30 million characters and the last part will contain 10 million). I also want to include the last 1000 characters of each part at the beginning of the next part (meaning part 1's last 1000 characters are prepended to part 2, so part 2 contains 30 million + 1000 characters, and so on). Can anybody help me do this programmatically (in Java) or with Linux commands (in a fast way)?

Recommended Answer

One way is to use regular Unix commands to split the file and then prepend the last 1000 bytes of the previous part to each following part (bytes and characters coincide for a single-byte encoding such as ASCII).

First, split the file:

split -b 30000000 inputfile part.

Then, for each part (ignoring the first), make a new file starting with the last 1000 bytes of the previous part:

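unset prev
for i in part.*
do
  if [ -n "${prev}" ]
  then
    # The previous part was already extended in the last iteration, but its
    # last 1000 bytes are unchanged, so they are still the right overlap.
    # tmp.part is used as the temp name so it never matches the part.* glob.
    tail -c 1000 "${prev}" > tmp.part
    cat "${i}" >> tmp.part
    mv tmp.part "${i}"
  fi
  prev=${i}
done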
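Alternatively, each overlapping part can be cut directly from the input, skipping the separate split and prepend passes. This is a minimal sketch under stated assumptions, not part of the original answer: `split_with_overlap` is a hypothetical helper name, all sizes are in bytes (not characters), it relies on `tail -c +K` and `head -c` as provided by GNU coreutils, it assumes the overlap is smaller than the chunk size, and it names parts with a three-digit numeric suffix.

```shell
# Hypothetical helper: cut overlapping parts straight from the input file.
# Part 1 holds bytes 1..CHUNK; every later part starts OVERLAP bytes before
# its own chunk and is CHUNK+OVERLAP bytes long (the last one may be shorter).
# Assumes OVERLAP < CHUNK; all sizes are in bytes.
split_with_overlap() {
    input=$1
    prefix=$2
    chunk=${3:-30000000}
    overlap=${4:-1000}

    # Arithmetic expansion drops any padding spaces wc may print.
    size=$(( $(wc -c < "$input") ))
    n=0
    start=0    # 0-based offset of the first new (non-overlap) byte of a part
    while [ "$start" -lt "$size" ]; do
        n=$((n + 1))
        if [ "$n" -eq 1 ]; then
            from=$start
            len=$chunk
        else
            from=$((start - overlap))
            len=$((chunk + overlap))
        fi
        # tail -c +K starts output at 1-based byte K; head -c caps the length.
        tail -c +"$((from + 1))" "$input" | head -c "$len" \
            > "$(printf '%s%03d' "$prefix" "$n")"
        start=$((start + chunk))
    done
}
```

Called as `split_with_overlap inputfile part. 30000000 1000` on an input of exactly 250,000,000 bytes, this should produce nine files: `part.001` with 30,000,000 bytes, `part.002` to `part.008` with 30,001,000 bytes each, and `part.009` with 10,001,000 bytes. The zero-padded suffix keeps the `part.*` glob in order.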

Before reassembling, iterate over the files again, ignoring the first, and throw away the first 1000 bytes of each:

unset prev
for i in part.*
do
  if [ -n "${prev}" ]
  then
    # tail -c +1001 prints from byte 1001 onward, dropping the first 1000 bytes
    tail -c +1001 "${i}" > tmp.part
    mv tmp.part "${i}"
  fi
  prev=${i}
done

The last step is to reassemble the files:

cat part.* > newfile
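If the overlapped parts should stay untouched, the strip and reassemble passes can be folded into one: copy the first part whole, then append every later part without its first 1000 bytes. A minimal sketch, assuming the parts are passed in the right order and the output file name does not match the part glob; `join_overlapped` is a hypothetical helper name.

```shell
# Hypothetical helper: rebuild the original file from overlapped parts
# without rewriting the parts themselves.
# Usage: join_overlapped OUTPUT OVERLAP PART...
join_overlapped() {
    output=$1
    overlap=$2
    shift 2
    first=1
    : > "$output"    # create or truncate the output file
    for part in "$@"; do
        if [ "$first" -eq 1 ]; then
            # The first part has no overlap prefix, so copy it whole.
            cat "$part" >> "$output"
            first=0
        else
            # Skip the OVERLAP bytes that repeat the end of the previous part.
            tail -c +"$((overlap + 1))" "$part" >> "$output"
        fi
    done
}
```

For example, `join_overlapped newfile 1000 part.* && cmp inputfile newfile` rebuilds the file and then uses `cmp`, which exits with status 0 only when both files are byte-identical.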

Since there was no explanation of why the overlap was needed, I just created it and then threw it away.
