Shuffle multiple files in same order
Question
Setup:
I have 50 files, each with 25000 lines.
To-do:
I need to shuffle all of them "in the same order". E.g.:
If before shuffle:
File 1  File 2  File 3
A       A       A
B       B       B
C       C       C
then after shuffle I should get:
File 1  File 2  File 3
B       B       B
C       C       C
A       A       A
i.e. corresponding rows in the files should be shuffled in the same order.
Also, the shuffle should be deterministic, i.e. if I give File A as input, it should always produce the same shuffled output.
I can write a Java program to do it, or probably a script too. Something like: shuffle the numbers from 1 to 25000 and store them in a file, say shuffle_order, then process one file at a time and order its rows according to shuffle_order. But is there a better/quicker way to do this?
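A minimal bash sketch of the plan described in the question (not the answer's method), assuming a fixed seed so the shuffle is repeatable. The seed 42, the input names f1 f2 f3 and the .shuf suffix are hypothetical. The order comes from awk's srand(seed)/rand(), which repeats on every run of the same awk implementation; different awk implementations may produce different sequences. Here shuffle_order lists, for each output position, the input line number to take.

```shell
#!/bin/bash
# Sketch: deterministic "same order" shuffle of several files.
# Assumptions (hypothetical): seed 42, input files f1 f2 f3, output suffix .shuf
set -euo pipefail
cd "$(mktemp -d)"                     # scratch directory for the demo files

seed=42
printf 'A1\nB1\nC1\n' > f1            # tiny demo inputs, one row per line
printf 'A2\nB2\nC2\n' > f2
printf 'A3\nB3\nC3\n' > f3

count=$(grep -c '^' f1)               # all files are assumed to have this many lines

# 1) Build shuffle_order once: tag 1..count with a seeded random key,
#    sort on the fixed-width key, keep only the line numbers.
seq "$count" |
  awk -v seed="$seed" 'BEGIN { srand(seed) } { printf "%.12f %s\n", rand(), $1 }' |
  LC_ALL=C sort -k1,1 | cut -d' ' -f2 > shuffle_order

# 2) Apply the same order to every file:
#    output line k is input line number shuffle_order[k].
for file in f1 f2 f3; do
  awk 'NR == FNR { line[FNR] = $0; next } { print line[$1] }' "$file" shuffle_order > "$file.shuf"
done

paste f1.shuf f2.shuf f3.shuf         # rows stay aligned across the three files
```

The lookup in step 2 keeps each file in memory (25000 lines is small) and avoids sorting every file.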
Please let me know if more info needed.
Solution
The following uses only basic bash commands. The principle is:
- generate a random order (numbers)
- reorder all files according to this same order
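To see why the second step keeps the files aligned, here is the paste | sort | cut idiom the script uses, run on a hand-written order instead of a random one (the names order, A and B exist only for this demo). Input line i is prefixed with the i-th number of the order file, the lines are sorted on that number, and the prefix is cut off again, so line i lands at position order[i] in every file.

```shell
#!/bin/bash
# Demo of the paste | sort | cut step with a fixed (non-random) order.
set -euo pipefail
cd "$(mktemp -d)"                 # scratch directory for the demo files

printf '3\n1\n2\n' > order        # hypothetical order: line 1 -> position 3, line 2 -> 1, line 3 -> 2
printf 'a1\nb1\nc1\n' > A
printf 'a2\nb2\nc2\n' > B

for file in A B; do
  # prefix each line with its order number, sort numerically on it, drop the prefix
  paste -d' ' order "$file" | sort -k1,1n | cut -d' ' -f2- > "$file.rand"
done

paste -d' ' A.rand B.rand
# b1 b2
# c1 c2
# a1 a2
```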
The code:
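#!/bin/bash
# Usage: bash sorter.sh file1 file2 ...  (all files must have the same number of lines)
case "$#" in
0) echo "Usage: $0 files....." ; exit 1;;
esac
ORDER="./.rand.$$"
trap 'rm -f "$ORDER"; exit 1' 1 2   # remove the order file on HUP or INT
count=$(grep -c '^' "$1")           # number of lines, taken from the first file
# pair each zero-padded line number with a random 32-bit number (4 bytes per line),
# sort on the random number and keep only the line numbers
odcount=$((count * 4))
paste -d" " <(od -A n -N "$odcount" -t u4 /dev/urandom | grep -o '[0-9]*') <(seq -w "$count") |\
sort -k1n | cut -d " " -f2 > "$ORDER"
#if your system has the "shuf" command you can replace the above 3 lines with a simple
#seq -w "$count" | shuf > "$ORDER"
for file in "$@"
do
    # prefix every line with its number from $ORDER, sort on it, strip it again
    paste -d' ' "$ORDER" "$file" | sort -k1n | cut -d' ' -f2- > "$file.rand"
done
echo "the order is in the file $ORDER" # remove this line
#rm -f "$ORDER"   # and uncomment this if you don't need to keep the order
paste -d " " *.rand #remove this line - it is only for showing test result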
From the input files:
A B C
--------
a1 a2 a3
b1 b2 b3
c1 c2 c3
d1 d2 d3
e1 e2 e3
f1 f2 f3
g1 g2 g3
h1 h2 h3
i1 i2 i3
j1 j2 j3
the script will create A.rand, B.rand and C.rand, with content like the following (shown side by side; the actual order is random):
g1 g2 g3
e1 e2 e3
b1 b2 b3
c1 c2 c3
f1 f2 f3
j1 j2 j3
d1 d2 d3
h1 h2 h3
i1 i2 i3
a1 a2 a3
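One way to check that the shuffle kept the rows together is to compare the rows of the pasted files before and after: if every row of paste A B C appears exactly once in paste A.rand B.rand C.rand, all files were permuted in the same order. A self-contained sketch that rebuilds the example files A, B, C and shuffles them with the script's shuf alternative (this assumes GNU coreutils shuf is installed):

```shell
#!/bin/bash
# Sketch: verify that several shuffled files still have their rows aligned.
set -euo pipefail
cd "$(mktemp -d)"                            # scratch directory

n=0
for f in A B C; do                           # A: a1..j1, B: a2..j2, C: a3..j3
  n=$((n + 1))
  printf "%s$n\n" a b c d e f g h i j > "$f"
done

seq -w "$(grep -c '^' A)" | shuf > order     # one random order for all files
for f in A B C; do
  paste -d' ' order "$f" | sort -k1,1n | cut -d' ' -f2- > "$f.rand"
done

# Same rows (compared after sorting) means no row was split across files.
if cmp -s <(paste -d' ' A B C | sort) <(paste -d' ' A.rand B.rand C.rand | sort); then
  echo "rows still aligned"
else
  echo "rows got mixed up" >&2
  exit 1
fi
```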
Real test: generating 50 files with 25k lines each:
line="Consequatur qui et qui. Mollitia expedita aut excepturi modi. Enim nihil et laboriosam sit a tenetur."
for n in $(seq -w 50)
do
seq -f "$line %g" 25000 >file.$n
done
Running the script:
bash sorter.sh file.??
Result on my notebook:
real 1m13.404s
user 0m56.127s
sys 0m5.143s
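The question also asks for a deterministic shuffle. The script draws a fresh order from /dev/urandom on every run, but because it can keep the order file ($ORDER), applying that saved file again reproduces exactly the same output. A minimal sketch of that reuse; the helper name apply_order and the file names are hypothetical, and shuf is again assumed to be GNU coreutils:

```shell
#!/bin/bash
# Sketch: reuse a saved order file so the same input always gives the same output.
set -euo pipefail
cd "$(mktemp -d)"                          # scratch directory

# Hypothetical helper: reorder file $2 with order file $1 (same pipeline as the script).
apply_order() {
  paste -d' ' "$1" "$2" | sort -k1,1n | cut -d' ' -f2-
}

printf 'x%s\n' 1 2 3 4 5 > data            # demo input with 5 lines
seq -w 5 | shuf > saved.order              # generate the order once, then keep it

apply_order saved.order data > run1
apply_order saved.order data > run2        # a later run with the same saved order
cmp -s run1 run2 && echo "identical output"
```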