To split into fixed sequences smoothly by group values of flags


Problem description

I want to split a file as evenly as possible, with no more than one byte of difference between the sizes of the output files, by varying the number of files within some minimum and maximum number of files. The first thread of this discussion, about the load behavior of sequences, is here; in it I provided too few cases to explain the sequence behavior, but it shows the increment leading to a 5-character increase in the last sequence. Different conditions of flags can be used.

This smoothing cannot be accomplished with a well-defined algorithm alone. I just have an intuition that a partial index could work, because only a small subset ever has data and the smoothing occurs dynamically through the entries of the directory. The solution may involve some well-chosen data structure together with some algorithm.

I would like to affect the loading behavior of the characters into the resulting files, which at the moment happens rather illogically and not smoothly.

$ seq -w 0 0.0001 1                                \
| gsed 's/\.//g'                                   \
| gsed ':a;N;$!ba;s/\n//g' > /tmp/k                \
&& gsplit -n{a,b} -e -b{k,n,m} /tmp/k              \
&& wc -c 1stFile && wc -c lastFile

where

  • parts of the command, gsplit -n{a,b} -b{k,n,m}, are just a pseudocommand
  • the flags -n and -b can be used
  • -e elides empty files from the output, but on its own it is not enough to force the output to be in some interval
  • the flag -n with a single value leads to a fixed number of files, which -e can narrow down toward the minimum, so processing each unit of the group separately is the possible situation here
  • the definiteness of the flag -n, when given only a single value, leads to strange loading behavior of sequences into files

How can you better control the loading of new sequences into new files, without rapid peaks in some files?

Recommended answer

Here's a shell script that will figure out allowable combinations of file sizes and quantities given various parameters. It exits successfully if any combinations are found, and exits with failure if no possible combinations are found for the given inputs. Note that not all possible combinations of parameters have a solution. If it is necessary that a solution be provided, the number of allowed files can be increased or decreased. The trivial cases of two files, or a number of files equal to the number of bytes, are always solvable.

#!/bin/sh

# N is the bytes total.
# L is the lowest number of files allowable.
# H is the highest number of files allowable.
# F is the actual number of files used
# B is the minimum bytes per file
# R is the remaining bytes if all files are of size B
# K is the maximum number of files allowed to be one byte larger than the
# minimum, K < F
# 
# So, you need to determine if there is some L <= F <= H such that R <= K.
# 
# For a given candidate F:
# B = floor(N / F)
# R = N % F
# if R <= K then the candidate F is allowable, F files will be used,
# R of them will be of size B+1 and F-R of them will be of size B.

# usage: <program> <bytes> <min files> <max files> [max larger files]
# copyright disclaimed, this program is in the public domain

N=$1
L=$2
H=$3
K=${4:-1} # default to one file allowed to be larger

status=1
echo checking number of files F: $L '<= F <=' $H, at most $K files one byte larger
for F in $(seq $L $H); do
        B=$(($N / $F))
        R=$(($N % $F))
        if [ $R -le $K ]; then
                if [ $R -eq 0 ]; then
                echo Usable: $F files, size $B
                else
                echo Usable: $F files, $(($F - $R)) size $B, $R size $(($B+1))
                fi
                status=0;
        fi
done
exit $status

Some examples:

A fairly large prime number of bytes:

% sh trysplit 16769023 3 100; echo $?
checking number of files F: 3 <= F <= 100, at most 1 files one byte larger
Usable: 3 files, 2 size 5589674, 1 size 5589675
Usable: 6 files, 5 size 2794837, 1 size 2794838
Usable: 61 files, 60 size 274902, 1 size 274903
0
% 

Well, it has some solutions, but ugh.

How about a luckier number:

% sh trysplit 16769024 3 100; echo $?
checking number of files F: 3 <= F <= 100, at most 1 files one byte larger
Usable: 4 files, size 4192256
Usable: 8 files, size 2096128
Usable: 16 files, size 1048064
Usable: 23 files, size 729088
Usable: 32 files, size 524032
Usable: 46 files, size 364544
Usable: 64 files, size 262016
Usable: 89 files, size 188416
Usable: 92 files, size 182272
0
% 

One byte larger and you've got lots of choices.
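The difference comes down to divisors. As a side check (this factorization is added arithmetic, not part of the original answer):

```latex
16769024 = 4095^2 - 1 = 4094 \cdot 4096 = 2^{13} \cdot 23 \cdot 89
```

Its divisors between 3 and 100 are 4, 8, 16, 23, 32, 46, 64, 89 and 92, which are exactly the nine file counts listed above with remainder 0. Since 16769023 is prime, no file count strictly between 1 and the byte total divides it, so every usable split there needs at least one larger file.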

What if we allow more than one file to be larger:

% sh trysplit 16769023 3 100 2; echo $?
checking number of files F: 3 <= F <= 100, at most 2 files one byte larger
Usable: 3 files, 2 size 5589674, 1 size 5589675
Usable: 6 files, 5 size 2794837, 1 size 2794838
Usable: 17 files, 15 size 986413, 2 size 986414
Usable: 61 files, 60 size 274902, 1 size 274903
0
%

What if any of them can be larger? I think in this case, though I haven't proved it, that you can use any number of files you want; it will just affect how many of them are one byte larger. You can use the script to check whether the exact number of files you want works by setting the minimum and maximum number of files to the same value, and the number allowed to be larger to one less than that.
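The hunch can be backed by the division algorithm. A short sketch using the script's own symbols (N bytes, F files, B minimum bytes per file, R remainder, K files allowed to be larger):

```latex
N = F \cdot B + R, \qquad B = \left\lfloor \frac{N}{F} \right\rfloor, \qquad 0 \le R = N \bmod F \le F - 1
```

So if K = F - 1, the condition R <= K holds for every F, and R files of size B+1 plus F-R files of size B add up to R(B+1) + (F-R)B = FB + R = N. The only caveat is F > N, where B = 0 and some files come out empty.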

This can be adapted to just print out the parameters you are interested in, so you can use it to populate a shell variable that can then be used to construct the appropriate split command.
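One possible adaptation is sketched below. The function names first_usable and split_even are hypothetical, not part of the original script, and the sketch assumes a head that supports -c (GNU and BSD versions do, but it is not strictly POSIX). Rather than guessing how a particular split implementation distributes the remainder, it cuts the pieces with tail and head directly.

```shell
#!/bin/sh
# Hypothetical helper: print the first usable combination as "F B R".
# Arguments follow the script above: bytes, min files, max files, [max larger].
first_usable() {
    N=$1 L=$2 H=$3 K=${4:-1}
    F=$L
    while [ "$F" -le "$H" ]; do
        B=$((N / F))        # minimum bytes per file
        R=$((N % F))        # bytes left over, one extra byte for R files
        if [ "$R" -le "$K" ]; then
            echo "$F $B $R"
            return 0
        fi
        F=$((F + 1))
    done
    return 1                # no usable file count in the range
}

# Hypothetical helper: cut FILE into F pieces PREFIX.0, PREFIX.1, ...
# The first R pieces get B+1 bytes, the remaining F-R pieces get B bytes.
split_even() {
    file=$1 prefix=$2 F=$3 B=$4 R=$5
    offset=1                # tail -c +N counts bytes starting from 1
    i=0
    while [ "$i" -lt "$F" ]; do
        size=$B
        if [ "$i" -lt "$R" ]; then
            size=$((B + 1))
        fi
        tail -c +"$offset" "$file" | head -c "$size" > "$prefix.$i"
        offset=$((offset + size))
        i=$((i + 1))
    done
}
```

For example, something like set -- $(first_usable $(wc -c < /tmp/k) 3 100) && split_even /tmp/k /tmp/k.part "$1" "$2" "$3" would pick the first usable file count between 3 and 100 and write the pieces next to the input file.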
