How to split a file into equal parts, without breaking individual lines?
Question
I was wondering whether it is possible to split a file into equal parts (edit: all equal except for the last) without breaking any lines. With the split command in Unix, lines may be broken in half. Is there a way to, say, split a file into 5 equal parts that still consist only of whole lines (it's no problem if one of the files is a little larger or smaller)? I know I could just calculate the number of lines, but I have to do this for a lot of files in a bash script. Many thanks!
Recommended answer
If you mean an equal number of lines, split has an option for this:
split --lines=75
If you need to know what that 75 should really be for N equal parts, it's:
lines_per_part = int((total_lines + N - 1) / N)
You can get the total number of lines with wc -l.
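To see why N - 1 is added before dividing, here is a small sketch of the arithmetic using the figures from this answer (70 lines, 6 parts). The helper name ceil_div is made up for illustration and is not part of split or bash.

```bash
#!/usr/bin/env bash
# Ceiling division using integer arithmetic only.
# ceil_div is a hypothetical helper name, not a bash builtin.
ceil_div() {
    local total=$1 parts=$2
    echo $(( (total + parts - 1) / parts ))
}

floor=$(( 70 / 6 ))      # plain integer division truncates
ceil=$(ceil_div 70 6)    # rounded up instead

# How many files would each chunk size produce for 70 lines?
echo "floor: ${floor} lines per part -> $(ceil_div 70 "${floor}") parts"
echo "ceil:  ${ceil} lines per part -> $(ceil_div 70 "${ceil}") parts"
# floor: 11 lines per part -> 7 parts
# ceil:  12 lines per part -> 6 parts
```

Rounding down would leave four leftover lines that spill into a seventh file, which is why the formula rounds up.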
See the following script for an example:
#!/usr/bin/bash
# Configuration stuff
fspec=qq.c
num_files=6
# Work out lines per file.
total_lines=$(wc -l <"${fspec}")
((lines_per_file = (total_lines + num_files - 1) / num_files))
# Split the actual file, maintaining lines.
split --lines="${lines_per_file}" "${fspec}" xyzzy.
# Debug information
echo "Total lines = ${total_lines}"
echo "Lines per file = ${lines_per_file}"
wc -l xyzzy.*
This will output:
Total lines = 70
Lines per file = 12
12 xyzzy.aa
12 xyzzy.ab
12 xyzzy.ac
12 xyzzy.ad
12 xyzzy.ae
10 xyzzy.af
70 total
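Since the question mentions doing this for many files, the same steps could be wrapped in a function and called in a loop. This is only a sketch: the function name split_by_lines, the .part. output prefix, and the throwaway demo files are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# split_by_lines is a hypothetical helper: split FILE into at most PARTS
# pieces named FILE.part.aa, FILE.part.ab, ... without breaking lines.
split_by_lines() {
    local file=$1 parts=$2 total per_file
    total=$(wc -l <"${file}")
    # split rejects a line count of 0, so skip empty files.
    if (( total == 0 )); then
        echo "skipping empty file: ${file}" >&2
        return 0
    fi
    (( per_file = (total + parts - 1) / parts ))
    split --lines="${per_file}" -- "${file}" "${file}.part."
}

# Demo on throwaway files (names and sizes are made up for illustration).
workdir=$(mktemp -d)
seq 1 70 >"${workdir}/a.txt"
seq 1 23 >"${workdir}/b.txt"
for f in "${workdir}"/*.txt; do
    split_by_lines "${f}" 6
done
wc -l "${workdir}"/*.part.*
```

Note that rounding up can yield fewer than PARTS files for short inputs: 7 lines split 6 ways gives 2 lines per file and therefore only 4 files.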
More recent versions of split allow you to specify a number of CHUNKS with the -n/--number option. You can therefore use something like:
split --number=l/6 ${fspec} xyzzy.
(that's ell-slash-six, meaning lines, not one-slash-six).
That will give you roughly equal files in terms of size, with no mid-line splits.
I mention that last point because it gives you roughly the same number of characters in each file, not roughly the same number of lines.
So, if you have one 20-character line and 19 1-character lines (twenty lines in total) and split to five files, you most likely won't get four lines in every file.
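That example can be tried directly. The sketch below assumes a split that supports -n/--number (GNU coreutils); the file names sample.txt and chunk. are made up for illustration. It prints line and byte counts per chunk without assuming any particular distribution, since that depends on where the byte boundaries fall.

```bash
#!/usr/bin/env bash
# Build the example file: one 20-character line, then 19 one-character lines.
demo_dir=$(mktemp -d)
{
    printf '%020d\n' 0                       # twenty '0' characters
    for _ in $(seq 1 19); do echo 'b'; done  # nineteen 1-character lines
} >"${demo_dir}/sample.txt"

# Split into 5 chunks by size, never breaking a line (ell-slash-five).
split --number=l/5 "${demo_dir}/sample.txt" "${demo_dir}/chunk."

# Lines and bytes per chunk: compare the two columns across the files.
wc -lc "${demo_dir}"/chunk.*
```

If equal line counts matter more than equal sizes, the --lines approach shown earlier is the one to use.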