Limit CsvListReader to one line

Question

I am working on an application that processes large CSV files (several hundred MB). Recently I hit a problem that at first looked like a memory leak in the application, but after some investigation it turned out to be a combination of a badly formatted CSV file and CsvListReader trying to parse a never-ending line. As a result, I got the following exception:

at java.lang.OutOfMemoryError.<init>(<unknown string>)
at java.util.Arrays.copyOf(<unknown string>)
   Local Variable: char[]#13624
at java.lang.AbstractStringBuilder.expandCapacity(<unknown string>)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(<unknown string>)
at java.lang.AbstractStringBuilder.append(<unknown string>)
at java.lang.StringBuilder.append(<unknown string>)
   Local Variable: java.lang.StringBuilder#3
at org.supercsv.io.Tokenizer.readStringList(<unknown string>)
   Local Variable: java.util.ArrayList#642
   Local Variable: org.supercsv.io.Tokenizer#1
   Local Variable: org.supercsv.io.PARSERSTATE#2
   Local Variable: java.lang.String#14960
at org.supercsv.io.CsvListReader.read(<unknown string>)

By analyzing the heap dump, and then the CSV file based on what the dump showed, I noticed that one of the columns in one of the CSV lines was missing its closing quote. This evidently caused the reader to look for the end of the line by appending file content to an internal string buffer until the heap ran out.

Anyway, that was the problem, and it was caused by the badly formatted CSV: once I removed the offending line, the problem disappeared. What I want to achieve is to tell the reader that:

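  • Everything it interprets should always end at a newline character, even if the quotes are not properly closed (no multi-line support)
  • Or, alternatively, a CSV line is subject to some size limit (in bytes, for example)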

Is there a clean way to do this in SuperCSV using CsvListReader (which is preferred in my case)?

Recommended answer

That issue has been reported, and I'm working on some enhancements (for a future major release) at the moment that should make both options a bit easier.

For now, you'll have to supply your own Tokenizer to the reader (so Super CSV uses yours instead of its own). I'd suggest taking a copy of Super CSV's Tokenizer and modifying it with your changes. That way you don't have to modify Super CSV itself, and you won't waste time.
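
As a rough illustration of what such a modified tokenizer might do, here is a minimal, self-contained sketch that combines both behaviours asked for in the question: every record ends at the first line break, even inside an unclosed quote, and a line longer than a configured limit is rejected instead of being buffered indefinitely. The class name BoundedLineTokenizer and its constructor parameters are hypothetical and are not part of Super CSV; the readColumns method is only modelled on the tokenizer idea described above. The limit is counted in characters rather than bytes, and the quote handling is deliberately simplified.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.List;

/** Hypothetical helper (not a Super CSV class): one physical line is one record, with a size cap. */
final class BoundedLineTokenizer {
    private final BufferedReader in;
    private final int maxLineChars;
    private final char delimiter;
    private final char quote;
    private int lineNumber = 0;

    BoundedLineTokenizer(Reader reader, int maxLineChars, char delimiter, char quote) {
        this.in = new BufferedReader(reader);
        this.maxLineChars = maxLineChars;
        this.delimiter = delimiter;
        this.quote = quote;
    }

    /** Fills {@code columns} with the next line's fields; returns false at end of input. */
    boolean readColumns(List<String> columns) throws IOException {
        columns.clear();
        String line = readBoundedLine();
        if (line == null) {
            return false;
        }
        lineNumber++;
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    field.append(quote); // doubled quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == delimiter && !inQuotes) {
                columns.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        // An unclosed quote simply ends with the line: no multi-line fields.
        columns.add(field.toString());
        return true;
    }

    /** Reads up to the next line break, failing fast once the cap is exceeded. */
    private String readBoundedLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        boolean sawAny = false;
        int c;
        while ((c = in.read()) != -1) {
            sawAny = true;
            if (c == '\n') {
                break;
            }
            if (c == '\r') {
                in.mark(1); // swallow the '\n' of a "\r\n" pair if present
                int next = in.read();
                if (next != '\n' && next != -1) {
                    in.reset();
                }
                break;
            }
            if (sb.length() >= maxLineChars) {
                throw new IOException("line " + (lineNumber + 1)
                        + " exceeds " + maxLineChars + " characters");
            }
            sb.append((char) c);
        }
        return sawAny ? sb.toString() : null;
    }
}
```

The line is read character by character rather than with BufferedReader.readLine(), because readLine() would itself buffer an arbitrarily long line before any limit could be checked. To use logic like this with CsvListReader, it would have to be adapted to whatever tokenizer interface your Super CSV version expects the reader to be given; check the Javadoc for your version rather than relying on this sketch.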
