Effective way to read a file and parse each line
Question
I have a text file in the following format: each line starts with a string, followed by a sequence of numbers. The length of each line is not known in advance (it can hold anywhere from 0 to 1000 numbers).
string_1 3 90 12 0 3
string_2 49 0 12 94 13 8 38 1 95 3
.......
string_n 9 43
Afterwards I must handle each line with the handleLine method, which accepts two arguments: the string name and the set of numbers (see code below).
How can I read the file and handle each line with handleLine efficiently?
My approach:
- Read the file line by line with Java 8 streams (Files.lines). Is it blocking?
- Split each line with a regexp
- Convert each line into a header string and a set of numbers
I think this is pretty inefficient because of the 2nd and 3rd steps. The 1st step means that Java first converts the file's bytes to a String, and then in the 2nd and 3rd steps I convert that String again, into a String and a Set<Integer>. Does that affect performance much? If so, how can I do better?
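import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public void handleFile(String filePath) {
    // try-with-resources closes the underlying file when the stream is done
    try (Stream<String> stream = Files.lines(Paths.get(filePath))) {
        stream.forEach(this::indexLine);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

// Splits one raw line into its leading string and its numbers
private void indexLine(String line) {
    List<String> resultList = this.parse(line);
    String string_i = resultList.remove(0);
    Set<Integer> numbers = resultList.stream().map(Integer::valueOf).collect(Collectors.toSet());
    handleLine(string_i, numbers); // Here is the final computation, which must be done with only the string_i & numbers arguments
}

// Collects every run of digits, lowercase letters, or uppercase letters as a token
private List<String> parse(String str) {
    List<String> output = new LinkedList<String>();
    Matcher match = Pattern.compile("[0-9]+|[a-z]+|[A-Z]+").matcher(str);
    while (match.find()) {
        output.add(match.group());
    }
    return output;
}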
Recommended answer
Regarding your first question, it depends on how you use the Stream. Streams are inherently lazy and don't do any work unless a result is actually going to be consumed. For example, the call to Files.lines doesn't actually read the file's lines until you invoke a terminal operation on the Stream.
From the Java docs:
Read all lines from a file as a Stream. Unlike readAllLines, this method does not read all lines into a List, but instead populates lazily as the stream is consumed.
The forEach(Consumer<T>) call is a terminal operation, and at that point the lines of the file are read one by one and passed to your indexLine method.
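The laziness described above can be observed with a small sketch. The class name LazyLinesDemo, the demo method, and the temporary file are made up for this illustration and are not part of the question's code. The sketch attaches a peek callback to the stream and then either runs or skips a terminal operation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original question.
class LazyLinesDemo {

    // Returns the lines that were actually pulled through the pipeline.
    static List<String> demo(Path file, boolean runTerminalOp) throws IOException {
        List<String> seen = new ArrayList<>();
        try (Stream<String> lines = Files.lines(file)) {
            // peek is an intermediate operation: it only registers the callback.
            Stream<String> pipeline = lines.peek(seen::add);
            if (runTerminalOp) {
                // forEach is a terminal operation: lines are read one by one here.
                pipeline.forEach(line -> { });
            }
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lazy-lines", ".txt");
        Files.write(file, Arrays.asList("string_1 3 90 12", "string_2 49 0"));
        System.out.println(demo(file, false).size()); // prints 0: no terminal operation, no lines read
        System.out.println(demo(file, true).size());  // prints 2: forEach pulled both lines
        Files.delete(file);
    }
}
```

Note that Files.lines still opens the file when it is called, which is why the stream sits in a try-with-resources block even when nothing is consumed.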
Regarding your other comments, you don't really have a question here. What are you trying to measure or minimize? Just because something takes multiple steps doesn't inherently make it perform poorly. Even if you wrote a whiz-bang one-liner that converted the File bytes directly to your String & Set, you would probably just be doing the intermediate mapping anonymously, or calling something that causes the compiler to do it anyway.
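For comparison, here is a minimal sketch of the parsing step written with String.split on whitespace instead of repeated regex matching. It is only one possible variant, not something the answer prescribes, and whether it is faster is something to measure rather than assume. The names LineParser, ParsedLine, and parseLine are hypothetical. One behavioral difference is worth noting: the pattern in the question would split a header such as string_1 into string and 1, whereas splitting on whitespace keeps it intact.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of the original question.
class LineParser {

    // Simple holder for the two values that handleLine expects.
    static final class ParsedLine {
        final String name;
        final Set<Integer> numbers;

        ParsedLine(String name, Set<Integer> numbers) {
            this.name = name;
            this.numbers = numbers;
        }
    }

    // Assumes tokens are separated by whitespace and every token after
    // the first is a valid integer (otherwise NumberFormatException).
    static ParsedLine parseLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        Set<Integer> numbers = new LinkedHashSet<>();
        for (int i = 1; i < tokens.length; i++) {
            numbers.add(Integer.valueOf(tokens[i]));
        }
        return new ParsedLine(tokens[0], numbers);
    }
}
```

A caller could then write something like handleLine(parsed.name, parsed.numbers) inside the forEach callback, keeping the final computation unchanged.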