Create a random txt file of approximate size in Java
Question
OK, I found the following code somewhere; it generates a random txt file. Basically, I want random words separated by whitespace so that I can run MapReduce word-count simulations.
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Random;

public class MainClass {
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        try {
            PrintWriter writer = new PrintWriter("bigfile.txt", "UTF-8");
            Random random = new Random();
            for (int i = 0; i < 23695522; i++) {
                // words of length 3 through 10. (1 and 2 letter words are boring.)
                char[] word = new char[random.nextInt(8) + 3];
                for (int j = 0; j < word.length; j++) {
                    word[j] = (char) ('a' + random.nextInt(26));
                }
                writer.print(new String(word) + ' ');
                if (i % 10 == 0) {
                    writer.println();
                }
            }
            writer.close();
        } catch (IOException e) {
            // do something
        }
    }
}
Now I want to alter this code so that it runs as many iterations as needed for the file to reach approximately a predefined size. Every iteration produces about 6.5 characters on average (word lengths are chosen uniformly from 3 to 10), which I take to be 2 bytes each. So I divide the desired file size in bytes by (6.5*2), use the result as the number of for-loop iterations, and get a file much smaller than I expect it to be.
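The per-iteration estimate can be checked with a few lines of arithmetic. The sketch below assumes the file is written as UTF-8 (so each of the letters 'a' to 'z' and the space is 1 byte, not 2) and that println() writes the platform line separator once every 10 iterations. The class and method names are illustrative and not part of the original code.

```java
// Sketch: expected on-disk bytes per loop iteration for the generator above.
// Assumptions: UTF-8 output (ASCII letters and spaces are 1 byte each) and
// one line separator written every 10th iteration.
final class SizeEstimate {

    /** Average bytes per iteration: word + trailing space + 1/10 of a line break. */
    static double expectedBytesPerIteration(int lineSeparatorBytes) {
        double avgWordLength = (3 + 10) / 2.0;        // uniform over 3..10 gives 6.5
        double space = 1.0;                           // one ' ' after every word
        double lineBreak = lineSeparatorBytes / 10.0; // println() on every 10th word
        return avgWordLength + space + lineBreak;
    }

    /** Approximate number of iterations needed to reach targetBytes. */
    static long iterationsFor(long targetBytes, int lineSeparatorBytes) {
        return Math.round(targetBytes / expectedBytesPerIteration(lineSeparatorBytes));
    }

    public static void main(String[] args) {
        int sep = System.lineSeparator().length();    // 2 on Windows, 1 on Unix-like systems
        System.out.println("bytes per iteration: " + expectedBytesPerIteration(sep));
        System.out.println("iterations for 100 MiB: " + iterationsFor(100L * 1024 * 1024, sep));
    }
}
```

Under these assumptions each iteration writes roughly 7.6 to 7.7 bytes rather than 13, which would explain a file noticeably smaller than the target.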
Recommended answer
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Random;

public class MainClass {
    public static void main(String[] args) {
        long count = 0; // running total of bytes written
        try {
            File file = new File("bigfile.txt");
            PrintWriter writer = new PrintWriter(file, "UTF-8");
            Random random = new Random();
            for (int i = 0; i < 23695522; i++) {
                // words of length 3 through 10. (1 and 2 letter words are boring.)
                char[] word = new char[random.nextInt(8) + 3];
                count += word.length; // each letter is 1 byte in UTF-8
                for (int j = 0; j < word.length; j++) {
                    word[j] = (char) ('a' + random.nextInt(26));
                }
                writer.print(new String(word) + ' ');
                count += 1; // the trailing space
                if (i % 10 == 0) {
                    writer.println();
                    // println() writes the platform line separator:
                    // 2 bytes ("\r\n") on Windows, 1 byte ("\n") on Unix-like systems
                    count += System.lineSeparator().length();
                }
            }
            writer.close();
        } catch (IOException e) {
            // do something
        }
        System.out.println(count);
    }
}
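Try this one. It counts the bytes as they are written. In UTF-8 the letters and spaces take 1 byte each, not 2; only the line break written by println() can take 2 bytes ("\r\n" on Windows), and it takes 1 byte ("\n") on Unix-like systems.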
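The same byte counting can drive the loop directly, so that writing stops once the target size is reached instead of after a fixed number of iterations. This is a minimal sketch under a few assumptions: the class name RandomTextFile and its write method are made up for illustration, the line break is always written as "\n" so that the count does not depend on the platform, and the line break replaces the space after every 10th word.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

// Sketch: write random lowercase words until a target size is reached.
final class RandomTextFile {

    /**
     * Writes random words to path until at least targetBytes bytes have been
     * written, and returns the number of bytes written.
     */
    static long write(Path path, long targetBytes, Random random) throws IOException {
        long written = 0;
        try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            for (long i = 0; written < targetBytes; i++) {
                StringBuilder chunk = new StringBuilder();
                int length = random.nextInt(8) + 3;   // words of length 3 through 10
                for (int j = 0; j < length; j++) {
                    chunk.append((char) ('a' + random.nextInt(26)));
                }
                // Every 10th word ends its line with "\n"; the others are followed by a space.
                chunk.append(i % 10 == 9 ? '\n' : ' ');
                writer.write(chunk.toString());
                written += chunk.length();            // ASCII only, so chars equal UTF-8 bytes
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get("bigfile.txt");
        long bytes = write(out, 100L * 1024 * 1024, new Random());
        System.out.println(bytes + " bytes counted, " + Files.size(out) + " bytes on disk");
    }
}
```

Because the check happens before each word is written, the file overshoots the target by at most one word plus its separator, that is, by fewer than 11 bytes.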