OutOfMemoryError when trying to read/write from a huge text file
Problem description
I'm trying to read/write a huge text file, but when I do, I get this error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
at java.lang.AbstractStringBuilder.append(Unknown Source)
at java.lang.StringBuilder.append(Unknown Source)
at ReadWriteTextFile.getContents(ReadWriteTextFile.java:52)
at ReadWriteTextFile.main(ReadWriteTextFile.java:148)
My code is as follows:
import java.io.*;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
public class ReadWriteTextFile {
/**
* Fetch the entire contents of a text file, and return it in a String.
* This style of implementation does not throw Exceptions to the caller.
*
* @param aFile is a file which already exists and can be read.
*/
static public String getContents(File aFile) {
//...checks on aFile are elided
StringBuilder contents = new StringBuilder();
int maxlines = 1000; //max number of lines to read/write to the file
BufferedReader input = null;
BufferedWriter bw = null;
try {
//use buffering, reading one line at a time
//FileReader always assumes default encoding is OK!
input = new BufferedReader(new FileReader(aFile));
try {
String line = null; //not declared within while loop
/*
* readLine is a bit quirky :
* it returns the content of a line MINUS the newline.
* it returns null only for the END of the stream.
* it returns an empty String if two newlines appear in a row.
*/
//for (int i = 0; i < 100; i++){
//int count = 0;//initiates the line counter
while (( line = input.readLine()) != null){
int count = 0;//initiates the line counter
String modified1 = line.substring(2,17);
String modified2 = line.substring(18,33);
String modified3 = line.substring(40);
String result = "empty";
result = modified1 + ",," +modified2 + modified3;
System.out.println (result);
// contents.append(line);
// contents.append(System.getProperty("line.separator"));
//int count = 0;//initiates the line counter
try {
contents.append(line);
contents.append(System.getProperty("line.separator"));
String content = result;
File file = new File("C:\\temp\\out.txt");//output path
// if file doesnt exists, then create it
if (!file.exists()) {
file.createNewFile();
}
for ( int i = 0; i < 1000; i++){
if (count++ % maxlines == 0) {
FileWriter fw = new FileWriter(file.getAbsoluteFile(),true);
bw = new BufferedWriter(fw);
bw.write(content);
bw.newLine();
}
bw.close();
}
} catch (IOException e) {
e.printStackTrace();
}
//}
}
}
finally {
input.close();
bw.close();
}
}
catch (IOException ex){
ex.printStackTrace();
}
return contents.toString();
}
/**
* Change the contents of text file in its entirety, overwriting any
* existing text.
*
* This style of implementation throws all exceptions to the caller.
*
* @param aFile is an existing file which can be written to.
* @throws IllegalArgumentException if param does not comply.
* @throws FileNotFoundException if the file does not exist.
* @throws IOException if problem encountered during write.
*/
static public void setContents(File aFile, String aContents)
throws FileNotFoundException, IOException {
if (aFile == null) {
throw new IllegalArgumentException("File should not be null.");
}
if (!aFile.exists()) {
throw new FileNotFoundException ("File does not exist: " + aFile);
}
if (!aFile.isFile()) {
throw new IllegalArgumentException("Should not be a directory: " + aFile);
}
if (!aFile.canWrite()) {
throw new IllegalArgumentException("File cannot be written: " + aFile);
}
//use buffering
Writer output = new BufferedWriter(new FileWriter(aFile, true));
try {
//FileWriter always assumes default encoding is OK!
output.write( aContents );
}
finally {
output.close();
}
}
/** Simple test harness. */
public static void main (String... aArguments) throws IOException {
File testFile = new File("C:\\temp\\in.txt");//input path
System.out.println("\n" + getContents(testFile));
}
}
I tried adding a counter (count) so that the buffer would be flushed after a certain number of lines had been read. It didn't work. I know the counter does not work correctly: it does not reset to zero after a given number of iterations of the while loop. I added a for loop before and after the while loop to reset the counter, but that didn't work either.
Any suggestions?
Recommended answer
Try using a FileInputStream instead of a BufferedReader/Writer. When I used a FileInputStream, I could copy a dummy log file that had over 36 million lines and was almost 500 MB in size within a few seconds.
// 'from' and 'to' are the source and destination paths
FileInputStream in = new FileInputStream(from); //Read data from a file
FileOutputStream out = new FileOutputStream(to); //Write data to a file
byte[] buffer = new byte[4096]; //Buffer size, usually 1024-4096
int len;
//read() returns the number of bytes read, or -1 at end of file
while ((len = in.read(buffer, 0, buffer.length)) > 0) {
out.write(buffer, 0, len);
}
//Close the file streams
in.close();
out.close();
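For reference, here is a minimal self-contained sketch of the same chunked copy. The class name `ChunkedCopy`, the method name `copy`, and the byte-count return value are illustrative assumptions, not part of the original answer. It uses try-with-resources so that both streams are closed even if a read or write throws.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper class; the name and the return value are illustrative.
class ChunkedCopy {
    /** Copies 'from' to 'to' in 4 KB chunks and returns the number of bytes copied. */
    static long copy(String from, String to) throws IOException {
        long total = 0;
        // try-with-resources closes both streams even if read/write throws
        try (FileInputStream in = new FileInputStream(from);
             FileOutputStream out = new FileOutputStream(to)) {
            byte[] buffer = new byte[4096];
            int len;
            while ((len = in.read(buffer)) != -1) { // -1 signals end of stream
                out.write(buffer, 0, len);
                total += len;
            }
        }
        return total;
    }
}
```

Only one 4 KB buffer is live at any time, so memory use stays constant regardless of the file size.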
If you want to read the file line by line instead of in chunks of bytes, you can still use a BufferedReader, but in a different way.
// Removed redundant exists()/createNewFile() calls altogether
// 'aFile' is the input file and 'file' is the output file from the question
try (BufferedReader br = new BufferedReader(new FileReader(aFile));
BufferedWriter output = new BufferedWriter(new FileWriter(file, true))) {
String line;
while ((line = br.readLine()) != null) {
String modified1 = line.substring(2, 17);
String modified2 = line.substring(18, 33);
String modified3 = line.substring(40);
String result = modified1 + ",," + modified2 + modified3;
System.out.println(result);
output.write(result);
output.newLine(); //Writes the platform line separator (\r\n on Windows)
}
} //Both streams are closed automatically here
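One caveat with the loop above: the fixed substring indices throw a StringIndexOutOfBoundsException for any line shorter than 40 characters. The sketch below is one way to guard against that. The class name `LineTransformer`, the `MIN_LENGTH` constant, the choice to skip short lines, and the UTF-8 encoding are all assumptions for illustration. Note also that `Files.newBufferedWriter` overwrites the output file by default, whereas the code above appends.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class; MIN_LENGTH and the skip-short-lines policy are assumptions.
class LineTransformer {
    static final int MIN_LENGTH = 40; // substring(40) needs at least 40 characters

    /** Same column extraction as in the answer's loop. */
    static String transform(String line) {
        return line.substring(2, 17) + ",," + line.substring(18, 33) + line.substring(40);
    }

    /** Streams 'in' to 'out' one line at a time; returns the number of lines written. */
    static long transformFile(Path in, Path out) throws IOException {
        long written = 0;
        try (BufferedReader br = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter bw = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.length() < MIN_LENGTH) {
                    continue; // too short for the fixed columns; skipped in this sketch
                }
                bw.write(transform(line));
                bw.newLine();
                written++;
            }
        }
        return written;
    }
}
```

If skipping is not appropriate for your data, the same check could log the line or write it through unchanged instead.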
Like EJP said, don't read an entire file into memory; that's not a smart thing to do at all. Your best bet is to read the file one line at a time or in chunks. For accuracy, reading it line by line is probably best.
Inside the while ((line = br.readLine()) != null) loop, do whatever processing would otherwise have required the entire file to be loaded, while only one line is held in memory at a time (for example, checking whether a line contains _ or extracting text from it).
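As a small illustration of doing the work inside the loop, the sketch below counts the lines that contain a given marker such as _ while keeping only a running total. The class name `LineScanner` and the method name `countLinesContaining` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper; the class and method names are illustrative.
class LineScanner {
    /** Counts lines containing 'marker' while holding only one line in memory. */
    static long countLinesContaining(String path, String marker) throws IOException {
        long matches = 0;
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.contains(marker)) {
                    matches++; // keep only the running result, not the line itself
                }
            }
        }
        return matches;
    }
}
```

Each line becomes eligible for garbage collection as soon as the next one is read, which is what keeps heap use flat.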
Another thing you could try in order to avoid the OOM exception is to split the contents across multiple Strings.
if (contents.length() >= (Integer.MAX_VALUE - 5000)) { //-5000 to give some headroom when checking
. . .
}
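One possible shape for the multiple-Strings idea is sketched below. The class `ChunkedText`, its methods, and the configurable chunk limit are hypothetical and not taken from the answer. Keep in mind that this only sidesteps the length limit of a single StringBuilder: all of the text still lives on the heap, so it does not help when the heap itself is too small, and streaming line by line remains the safer approach.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container; the name and the chunk limit are illustrative.
class ChunkedText {
    private final int chunkLimit;
    private final List<StringBuilder> chunks = new ArrayList<>();

    ChunkedText(int chunkLimit) {
        this.chunkLimit = chunkLimit;
        chunks.add(new StringBuilder());
    }

    /** Appends 'line', starting a new chunk when the current one would exceed the limit. */
    void appendLine(String line) {
        StringBuilder current = chunks.get(chunks.size() - 1);
        if (current.length() > 0 && (long) current.length() + line.length() > chunkLimit) {
            current = new StringBuilder();
            chunks.add(current);
        }
        current.append(line);
    }

    /** Number of chunks currently in use. */
    int chunkCount() {
        return chunks.size();
    }

    /** Total number of characters stored across all chunks. */
    long totalLength() {
        long total = 0;
        for (StringBuilder sb : chunks) {
            total += sb.length();
        }
        return total;
    }
}
```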