Effective way to read a file and parse each line


Question

I have a text file in the following format: each line starts with a string, followed by a sequence of numbers. The length of each line is not known in advance (a line holds anywhere from 0 to 1000 numbers).

string_1 3 90 12 0 3
string_2 49 0 12 94 13 8 38 1 95 3
.......
string_n 9 43

Afterwards I must process each line with a handleLine method that accepts two arguments: the string name and the set of numbers (see the code below).

How can I read the file and handle each line with handleLine efficiently?

My approach:

  1. Read the file line by line with the Java 8 stream Files.lines. Is it blocking?
  2. Split each line with a regexp
  3. Convert each line into a header string and a set of numbers

I think this is pretty inefficient because of the 2nd and 3rd steps. The 1st step means that Java first converts the file bytes to a String, and then in the 2nd and 3rd steps I convert that String again into a String and a Set<Integer>. Does that affect performance much? If so, how can I do better?

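import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Reads the file lazily and passes each line to indexLine.
public void handleFile(String filePath) {
    try (Stream<String> stream = Files.lines(Paths.get(filePath))) {
        stream.forEach(this::indexLine);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

// Splits one line into its header and its numbers, then delegates to handleLine.
private void indexLine(String line) {
    List<String> resultList = this.parse(line);
    String string_i = resultList.remove(0);
    Set<Integer> numbers = resultList.stream().map(Integer::valueOf).collect(Collectors.toSet());
    handleLine(string_i, numbers); // Here is the final computation, which must be done with only the string_i and numbers arguments
}

// Collects every run of digits, lowercase letters, or uppercase letters as a token.
// Note: this pattern splits a header such as string_1 into "string" and "1".
private List<String> parse(String str) {
    List<String> output = new LinkedList<String>();
    Matcher match = Pattern.compile("[0-9]+|[a-z]+|[A-Z]+").matcher(str);
    while (match.find()) {
        output.add(match.group());
    }
    return output;
}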

Recommended answer

Regarding your first question, it depends on how you use the Stream. Streams are inherently lazy and do no work until something consumes them. For example, the call to Files.lines doesn't actually read the file's lines until you invoke a terminal operation on the Stream.

From the Java docs:

Read all lines from a file as a Stream. Unlike readAllLines, this method does not read all lines into a List, but instead populates lazily as the stream is consumed.

The forEach(Consumer<T>) call is a terminal operation. At that point the lines of the file are read one by one and passed to your indexLine method.
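The laziness described above can be observed directly. The following is only a minimal sketch: the class name LazyLinesDemo, the helper countLinesSeen, and the temporary file are hypothetical names introduced for illustration and are not part of the original question. It counts how many lines have passed through the pipeline before and after the terminal operation runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazyLinesDemo {

    // Returns {lines seen before the terminal operation, lines seen after it}.
    public static int[] countLinesSeen(Path file) throws IOException {
        AtomicInteger seen = new AtomicInteger();
        try (Stream<String> lines = Files.lines(file)) {
            // peek is an intermediate operation: it only registers the callback.
            Stream<String> counted = lines.peek(line -> seen.incrementAndGet());
            int before = seen.get();       // nothing has flowed through the pipeline yet
            counted.forEach(line -> { });  // terminal operation: lines are pulled now
            int after = seen.get();
            return new int[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, created only for this demonstration.
        Path file = Files.createTempFile("lazy-lines", ".txt");
        try {
            Files.write(file, Arrays.asList("string_1 3 90 12 0 3", "string_n 9 43"));
            int[] result = countLinesSeen(file);
            System.out.println("before terminal op: " + result[0]);
            System.out.println("after terminal op: " + result[1]);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With the two-line sample file, the count is 0 before forEach runs and 2 afterwards, which matches the behavior the documentation describes.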

Regarding your other comments, you don't really have a question here. What are you trying to measure or minimize? Just because something takes multiple steps doesn't inherently make it perform poorly. Even if you wrote a whiz-bang one-liner to convert the file bytes directly into your String and Set, you would probably just be doing the intermediate mapping anonymously, or calling something that causes the same work to be done anyway.
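If measurement did show the parsing step to matter, one simpler variant to compare against would be splitting on whitespace instead of running a regex matcher. This is only a sketch under the assumption that tokens are separated by whitespace, as in the sample data; the class LineParserSketch and the method parseLine are hypothetical names, not part of the question or the answer. It still goes through an intermediate String for each token, which illustrates the point above that the intermediate mapping does not disappear.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class LineParserSketch {

    // Splits one line into its header (first token) and its set of numbers.
    // A blank line yields an empty header and an empty set.
    public static Map.Entry<String, Set<Integer>> parseLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        Set<Integer> numbers = new LinkedHashSet<>();
        for (int i = 1; i < tokens.length; i++) {
            numbers.add(Integer.parseInt(tokens[i]));  // duplicates collapse, as in any Set
        }
        return new SimpleEntry<>(tokens[0], numbers);
    }

    public static void main(String[] args) {
        Map.Entry<String, Set<Integer>> parsed = parseLine("string_1 3 90 12 0 3");
        System.out.println(parsed.getKey() + " -> " + parsed.getValue());
    }
}
```

Whether this is actually faster than the regex version for a given file is something only a measurement on real data can tell.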
