Is this a bug in Files.lines(), or am I misunderstanding something about parallel streams?


Question

Environment: Ubuntu x86_64 (14.10), Oracle JDK 1.8u25

I am trying to use a parallel stream from Files.lines(), but I want to .skip() the first line (it's a CSV file with a header). So I tried this:

try (
    final Stream<String> stream = Files.lines(thePath, StandardCharsets.UTF_8)
        .skip(1L).parallel();
) {
    // etc
}

But then one column failed to parse as an int...

So I tried some simple code. The file in question is dead simple:

$ cat info.csv 
startDate;treeDepth;nrMatchers;nrLines;nrChars;nrCodePoints;nrNodes
1422758875023;34;54;151;4375;4375;27486
$

The code is equally simple:

public static void main(final String... args) throws IOException
{
    final Path path = Paths.get("/home/fge/tmp/dd/info.csv");
    Files.lines(path, StandardCharsets.UTF_8).skip(1L).parallel()
        .forEach(System.out::println);
}

And I systematically get the following result (OK, I have only run it around 20 times):

startDate;treeDepth;nrMatchers;nrLines;nrChars;nrCodePoints;nrNodes

What am I missing here?

EDIT: It seems the problem, or misunderstanding, is much more deeply rooted than that (the two examples below were cooked up by a fellow on FreeNode's ##java):

public static void main(final String... args)
{
    new BufferedReader(new StringReader("Hello\nWorld")).lines()
        .skip(1L).parallel()
        .forEach(System.out::println);

    final Iterator<String> iter
        = Arrays.asList("Hello", "World").iterator();
    final Spliterator<String> spliterator
        = Spliterators.spliteratorUnknownSize(iter, Spliterator.ORDERED);
    final Stream<String> s
        = StreamSupport.stream(spliterator, true);

    s.skip(1L).forEach(System.out::println);
}

This prints:

Hello
Hello

Uh.

@Holger suggested that this happens for any stream which is ORDERED and not SIZED, with this other sample:

Stream.of("Hello", "World")
    .filter(x -> true)
    .parallel()
    .skip(1L)
    .forEach(System.out::println);

Also, it emerges from all the discussion that has already taken place that the problem (if it is one) lies with .forEach() (as @SotiriosDelimanolis first pointed out).
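
For comparison, here is a minimal sketch (not part of the original discussion) of the same reader-based pipeline drained with forEachOrdered(), which is documented to process elements one at a time in encounter order. The class name ForEachOrderedSketch and the helper skipHeader are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class ForEachOrderedSketch
{
    // Hypothetical helper: drops the first line of an in-memory "file"
    // and returns the remaining lines in encounter order.
    static List<String> skipHeader(final String content)
    {
        final List<String> out = new ArrayList<>();
        new BufferedReader(new StringReader(content)).lines()
            .skip(1L).parallel()
            // forEachOrdered() handles one element at a time, in encounter
            // order, so a plain ArrayList is sufficient as the sink here
            .forEachOrdered(out::add);
        return out;
    }

    public static void main(final String... args)
    {
        System.out.println(skipHeader("Hello\nWorld"));
    }
}
```

The trade-off is that forEachOrdered() gives up some of the parallelism in the terminal step, so it is only worth using when the order of side effects actually matters.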

Recommended answer

Since the current state of the issue is quite the opposite of the earlier statements made here, it should be noted that there is now an explicit statement by Brian Goetz that the back-propagation of the unordered characteristic past a skip operation is considered a bug. It is also stated that the current view is that the ordered-ness of a terminal operation should not be back-propagated at all.
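
To make the distinction concrete, here is a hedged sketch contrasting an ordered pipeline with one that opts out of ordering explicitly. The class and method names (UnorderedSkipSketch, orderedSkip, unorderedSkip) are made up for illustration. In the first pipeline skip(1) has to drop "Hello"; in the second, unordered() permits skip(1) to drop either element, so the result is not predictable.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class UnorderedSkipSketch
{
    // Ordered, non-SIZED pipeline: skip(1) must drop the first element
    // in encounter order.
    static List<String> orderedSkip()
    {
        return Stream.of("Hello", "World")
            .filter(x -> true)
            .parallel()
            .skip(1L)
            .collect(Collectors.toList());
    }

    // Explicitly unordered pipeline: skip(1) is free to drop any one
    // element, so either "Hello" or "World" may survive.
    static List<String> unorderedSkip()
    {
        return Stream.of("Hello", "World")
            .unordered()
            .filter(x -> true)
            .parallel()
            .skip(1L)
            .collect(Collectors.toList());
    }

    public static void main(final String... args)
    {
        System.out.println("ordered:   " + orderedSkip());
        System.out.println("unordered: " + unorderedSkip());
    }
}
```

In other words, relaxing the ordering constraint is something the pipeline has to ask for; the choice of terminal operation alone should not change which element skip() removes.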

There is also a related bug report, JDK-8129120, whose status is "fixed in Java 9"; the fix has been backported to Java 8 update 60.

I did some tests with jdk1.8.0_60, and the implementation now indeed seems to exhibit the more intuitive behavior.
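
A check along these lines can be sketched as follows, assuming a JDK that contains the fix (8u60 or later). This is not the exact test that was run; the class name SkipRegressionCheck and its helpers are made up for illustration. It reruns the three samples from the question with the plain forEach() terminal operation and records what each one emits.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class SkipRegressionCheck
{
    // Drains a stream with forEach() into a thread-safe list, since
    // forEach() may invoke the action from several threads.
    static List<String> drain(final Stream<String> stream)
    {
        final List<String> seen = new CopyOnWriteArrayList<>();
        stream.forEach(seen::add);
        return seen;
    }

    // First sample from the question: BufferedReader.lines()
    static List<String> readerSample()
    {
        return drain(new BufferedReader(new StringReader("Hello\nWorld"))
            .lines().skip(1L).parallel());
    }

    // Second sample: ORDERED spliterator of unknown size, parallel stream
    static List<String> spliteratorSample()
    {
        final Iterator<String> iter
            = Arrays.asList("Hello", "World").iterator();
        final Spliterator<String> spliterator
            = Spliterators.spliteratorUnknownSize(iter, Spliterator.ORDERED);
        return drain(StreamSupport.stream(spliterator, true).skip(1L));
    }

    // Third sample: filter() removes the SIZED characteristic
    static List<String> filterSample()
    {
        return drain(Stream.of("Hello", "World")
            .filter(x -> true).parallel().skip(1L));
    }

    public static void main(final String... args)
    {
        System.out.println(readerSample());
        System.out.println(spliteratorSample());
        System.out.println(filterSample());
    }
}
```

On a fixed JDK each list should contain only "World"; on 8u25, as described in the question, the same pipelines could emit "Hello" instead.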
