Split into fixed-length sequences and leave the extra out


Question

I would like to split a file so that every output file has the same fixed length, except the last one, which may be any size up to 557 bytes. This means the number of files may be larger than the count given by the -n flag of the split command.

$ seq -w 1 1671 > /tmp/k && gsplit -n15 /tmp/k && wc -c xaa && wc -c xao
557 xaa
557 xao

Here xaa is the first file of the sequence and xao is the last. When I extend the sequence by one number, the last file xao grows by 5 bytes (557 -> 562), which I do not understand:

$ seq -w 1 1672 > /tmp/k && gsplit -n15 /tmp/k && wc -c xaa && wc -c xao
557 xaa
562 xao

Why does adding one number to the sequence increase the last file (xao) by 5 bytes?

The same test with the newlines deleted:

$ seq -w 1 1671 | gsed ':a;N;$!ba;s/\n//g' > /tmp/k && gsplit -n15 /tmp/k && wc -c xaa && wc -c xao
445 xaa
455 xao
$ seq -w 1 1672 | gsed ':a;N;$!ba;s/\n//g' > /tmp/k && gsplit -n15 /tmp/k && wc -c xaa && wc -c xao
445 xaa
459 xao

So extending the input by one number (4 characters) increases the last file by 4 characters (455 -> 459), in contrast to the first example, where the increase is 5 characters.

Now let's fix each sequence unit to be 4 characters with seq -w 0 0.0001 1 | gsed 's/\.//g':

$ seq -w 0 0.0001 1 | gsed 's/\.//g' | gsed ':a;N;$!ba;s/\n//g' > /tmp/k && gsplit -n15 /tmp/k && wc -c xaa && wc -c xao
3333 xaa
3344 xao
$ seq -w 0 0.0001 1.0001 | gsed 's/\.//g' | gsed ':a;N;$!ba;s/\n//g' > /tmp/k && gsplit -n15 /tmp/k && wc -c xaa && wc -c xao
3334 xaa
3335 xao

So adding one more number to the sequence increases xaa by one byte but decreases xao by 9 bytes. This behavior does not seem logical to me.

How can I fix the length of each file first, for instance at 557 bytes, and only then determine how many files result?

Answer

Original answer: for Code 1

Because seq -w 1 1671 generates 5 characters per number: 4 digits and 1 newline. So adding one number to the sequence adds 5 bytes to the output, and all 5 end up in the last file, xao.
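The byte counts are easy to check directly. This is a minimal sketch, assuming a seq that supports -w (GNU seq behaves this way); the variable names first, total_1671 and total_1672 are only illustrative:

```shell
# Size of one zero-padded number plus its newline, e.g. "0001\n".
first=$(seq -w 1 1671 | head -n 1 | wc -c | tr -d ' ')

# Total size of each sequence used in the question.
total_1671=$(seq -w 1 1671 | wc -c | tr -d ' ')
total_1672=$(seq -w 1 1672 | wc -c | tr -d ' ')

echo "bytes per number:  $first"
echo "1671 numbers:      $total_1671 bytes"
echo "1672 numbers:      $total_1672 bytes"
echo "difference:        $(( total_1672 - total_1671 )) bytes"
```

The difference between the two totals is exactly one number's worth of bytes, newline included.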

You've asked GNU split (aka gsplit) to split the input file into 15 chunks. It does its best to even out the sizes, but there's a limit to what it can do when the total number of bytes is not a multiple of 15. There are options to control what happens.
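One option that matches the last part of the question is -b, which fixes the size of each piece and lets the number of files follow from it. The sketch below is a suggestion rather than something the answer spells out; it uses plain split (gsplit on systems where GNU coreutils is installed with a g prefix), and the temporary directory and the part_ prefix are hypothetical choices for illustration:

```shell
# Work in a scratch directory so no existing x?? files are touched.
workdir=$(mktemp -d)
seq -w 1 1672 > "$workdir/k"          # 1672 numbers, 5 bytes each

# -b 557: every piece is 557 bytes, except possibly the last one.
split -b 557 "$workdir/k" "$workdir/part_"

count=$(ls "$workdir"/part_* | wc -l | tr -d ' ')
first=$(wc -c < "$workdir/part_aa" | tr -d ' ')
last_file=$(ls "$workdir"/part_* | tail -n 1)
last=$(wc -c < "$last_file" | tr -d ' ')

echo "files: $count, first: $first bytes, last: $last bytes"
rm -rf "$workdir"
```

With this input the leftover bytes go into one extra, shorter file instead of being appended to the fifteenth one.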

However, in its basic form, -n 15 gives each of the first 14 output files the total size divided by 15 (rounded down), and the last file takes that amount plus the remainder. For the second example, with the newlines deleted, the first 14 files each get 445 bytes and the last gets 455, because /tmp/k holds 6685 = 445 * 15 + 10 bytes. When you add another 4 bytes to the file (one more number, with the newlines deleted), the last file gets the additional 4 bytes (because 6689 = 445 * 15 + 14).
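The arithmetic can be reproduced without running split at all. This sketch models only the distribution described in this answer (equal chunks, remainder added to the last file); other versions of split may spread the remainder differently, and chunk_sizes is a hypothetical helper name:

```shell
# Print "<size of each of the first n-1 files> <size of the last file>"
# for a file of <total> bytes split into <n> chunks, assuming the
# remainder is added to the last chunk.
chunk_sizes() {
    total=$1
    n=$2
    base=$(( total / n ))
    last=$(( base + total % n ))
    echo "$base $last"
}

chunk_sizes 8355 15   # seq -w 1 1671, newlines kept
chunk_sizes 8360 15   # seq -w 1 1672, newlines kept
chunk_sizes 6685 15   # seq -w 1 1671, newlines deleted
chunk_sizes 6689 15   # seq -w 1 1672, newlines deleted
```

Each line of output corresponds to the xaa and xao sizes reported in the question for that input.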

For the third example, first of all, the output from seq -w 0 0.0001 1 looks like:

0.0000
0.0001
0.0002
…
0.9998
0.9999
1.0000

So after the output is edited with the first sed, the numbers from 00000 to 10000 are present, one per line, with 6 characters per line (including the newline), not 4. The second sed then eliminates the newlines again, apart from the final one.
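The effect of the two sed commands can be seen on a three-line sample. This is a small sketch, assuming sed is GNU sed (the gsed of the question), since the one-line loop syntax is not portable to every sed; the sample data and variable names are only illustrative:

```shell
sample=$(printf '0.0000\n0.0001\n0.0002')

# Bytes in the first line after the dot is removed: "00000" plus a newline.
per_line=$(printf '%s\n' "$sample" | sed 's/\.//g' | head -n 1 | wc -c | tr -d ' ')

# Bytes after the lines are joined: 3 * 5 digits plus the one final newline.
joined=$(printf '%s\n' "$sample" | sed 's/\.//g' | sed ':a;N;$!ba;s/\n//g' | wc -c | tr -d ' ')

echo "per line: $per_line bytes, joined: $joined bytes"
```

The joined output keeps a single trailing newline, which is why the totals in this example are one more than a multiple of 5.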

There are 50006 bytes in /tmp/k, all on one line. That's equal to 15 * 3333 + 11, hence the first output (3333 for xaa and 3333 + 11 = 3344 for xao). The second variant has 50011 bytes in /tmp/k, which is 15 * 3334 + 1, so xaa gets 3334 and xao gets 3335. Hence the difference of only one between the two files.
