Create a file of a specific size with random printable strings in bash

Question

I want to create a file of a specific size containing only printable strings in bash.

My first thought was to use /dev/urandom:

dd if=/dev/urandom of=/tmp/file bs=1M count=100
  100+0 records in
  100+0 records out
  104857600 bytes (105 MB, 100 MiB) copied, 10,3641 s, 10,1 MB/s

file /tmp/file && du -h /tmp/file
  /tmp/file: data
  101M  /tmp/file

This leaves me with a file, of my desired size, but not only containing printable strings.

Now, I can use strings to create a file only containing printable strings.

cat /tmp/file | strings > /tmp/file.txt
file /tmp/file.txt && du -h /tmp/file.txt 
  /tmp/file.txt: ASCII text
  7,0M  /tmp/file.txt

This leaves me with a file containing only printable strings, but with the wrong file size.

TL;DR

How can I create a file of a specific size, containing only printable strings, in bash?

Recommended answer

The correct way is to use a transformation such as base64 to convert the random bytes to characters. That does not remove any of the randomness from the source; it only converts it to another form.
For a file of 1 megabyte (actually a little bigger):

dd if=/dev/urandom bs=786438 count=1 | base64 > /tmp/file

The resulting file will contain characters in the range A–Za–z0–9 and +/=.
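
That claim is easy to check. As a rough sketch, the snippet below encodes a small random sample and counts the bytes that fall outside the base64 alphabet, ignoring the line breaks that base64 inserts. The variable names and the use of mktemp are illustrative and not part of the original answer.

```shell
# Encode a small random sample into a temporary file (path chosen by mktemp).
sample=$(mktemp)
head -c 3000 /dev/urandom | base64 > "$sample"

# Delete every base64 character and every newline; whatever remains is unexpected.
unexpected=$(LC_ALL=C tr -d 'A-Za-z0-9+/=\n' < "$sample" | wc -c)
echo "bytes outside the base64 alphabet: $((unexpected))"

rm -f "$sample"
```

A count of 0 means the sample contains nothing but the characters listed above and newlines.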

Below is the reason why the file is a little bigger, and a solution.

You could add a filter to translate from that list to some other list (of the same size or less) with tr.

tr 'A-Za-z0-9+/' 'a-z0-9A-Z$%' < /tmp/file

I have left the = out of the translation because, for a uniform random distribution, it is better to leave out the trailing characters, which will (almost) always be =.
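
To see what such a translation does, it can help to run it on a fixed input instead of random data. This is only an illustration: the sample string and the variable name `mapped` are made up, and = is deliberately not listed in the first set, so it passes through unchanged.

```shell
# Fixed input so the mapping is easy to follow; '=' is not in the first set.
mapped=$(printf 'ABCabc012+/=\n' | LC_ALL=C tr 'A-Za-z0-9+/' 'a-z0-9A-Z$%')
echo "$mapped"
# prints: abc012QRS$%=
```

Both sets contain 64 characters, so every base64 character maps to exactly one replacement and the distribution stays uniform.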

The file will be larger than the number of bytes read from /dev/urandom by a factor of 4/3. That is because we are transforming 256 possible byte values into 64 different characters. This is done by taking 6 bits at a time from the stream of bytes to encode each character. When 4 characters have been encoded (6*4=24 bits), only three bytes have been consumed (8*3=24).
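
The 3-to-4 ratio can be observed directly. The sketch below first encodes three known bytes, then measures the encoded length of 300 random bytes after stripping the line breaks that base64 adds; the variable name `encoded_len` is only illustrative.

```shell
# Three input bytes become exactly four base64 characters.
printf 'abc' | base64          # prints: YWJj

# 300 random bytes -> 400 characters once the line breaks are removed.
encoded_len=$(head -c 300 /dev/urandom | base64 | tr -d '\n' | wc -c)
echo "encoded length: $((encoded_len))"   # 300 * 4 / 3 = 400
```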

So the number of bytes read should be a multiple of 3 (so that the encoding ends without = padding), and the encoded size is then a multiple of 4, which is the number we divide by to get back to the input size.
In addition, base64 from GNU coreutils wraps its output at 76 characters by default, and every line break adds one byte, so hitting exactly 1024 bytes (1k) or 1024*1024 = 1,048,576 bytes (1M) directly is awkward. But we can produce a file a little bigger and truncate it (if such precision is needed):

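wanted_size=$((1024*1024))
# Smallest multiple of 12 that is strictly greater than wanted_size.
file_size=$(( ((wanted_size/12)+1)*12 ))
# base64 turns every 3 input bytes into 4 output characters.
read_size=$((file_size*3/4))

echo "wanted=$wanted_size file=$file_size read=$read_size"

dd if=/dev/urandom bs="$read_size" count=1 | base64 > /tmp/file

# Cut the slightly oversized file down to the requested size.
truncate -s "$wanted_size" /tmp/file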

The last step to truncate to the exact value is optional.
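
The same idea can be wrapped in a small function. The following is a sketch under a few assumptions, not the answer's own method: `make_printable_file` is a hypothetical helper name, it uses head -c instead of dd and truncate, and it strips the line breaks that base64 inserts, so the result is a single long line.

```shell
# make_printable_file SIZE PATH  (hypothetical helper, not from the answer)
# Writes exactly SIZE bytes of base64 characters to PATH.
make_printable_file() {
    local size=$1
    local path=$2
    # Read enough random bytes that the encoded stream is longer than SIZE:
    # every 3 input bytes yield 4 output characters.
    local read_size=$(( (size / 4 + 1) * 3 ))
    head -c "$read_size" /dev/urandom \
        | base64 \
        | tr -d '\n' \
        | head -c "$size" > "$path"
}

out=$(mktemp)                       # temporary output path for the demo
make_printable_file 1048576 "$out"
wc -c < "$out"                      # size of the result in bytes
```

Because the number of bytes read is always a multiple of 3, the encoded stream carries no = padding, and the final head -c trims it to the requested length, so wc should report 1048576 here.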

Because you are going to extract so many random values, read them from /dev/urandom and not from /dev/random; otherwise your app may be blocked for a long time and the rest of the computer will be left without randomness.

I recommend that you install the package haveged:

haveged uses HAVEGE (HArdware Volatile Entropy Gathering and Expansion) to maintain a 1M pool of random bytes used to fill /dev/random whenever the supply of random bits in /dev/random falls below the low water mark of the device.

If that is possible.
