Scanner's nextLine(), Only fetching partial

Views: 33 Published: 2021/2/10 18:34:30 Tags: java java.util.scanner


Problem description

So, using something like:

for (int i = 0; i < files.length; i++) {
    if (!files[i].isDirectory() && files[i].canRead()) {
        try {
            Scanner scan = new Scanner(files[i]);
            System.out.println("Generating Categories for " + files[i].toPath());
            while (scan.hasNextLine()) {
                count++;
                String line = scan.nextLine();
                System.out.println("  ->" + line);
                // Each record is "<key>\t<JSON>"; keep only the JSON part.
                line = line.split("\t", 2)[1];
                System.out.println("!- " + line);
                JsonParser parser = new JsonParser();
                JsonObject object = parser.parse(line).getAsJsonObject();
                Set<Entry<String, JsonElement>> entrySet = object.entrySet();
                exploreSet(entrySet);
            }
            scan.close();
            // System.out.println(keyset);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}

As the loop goes over a Hadoop output file, one of the JSON objects in the middle breaks, because scan.nextLine() is not fetching the whole line before the line is passed to split. That is, the output is:

  ->0   {"Flags":"0","transactions":{"totalTransactionAmount":"0","totalQuantitySold":"0"},"listingStatus":"NULL","conditionRollupId":"0","photoDisplayType":"0","title":"NULL","quantityAvailable":"0","viewItemCount":"0","visitCount":"0","itemCountryId":"0","itemAspects":{   ...  "sellerSiteId":"0","siteId":"0","pictureUrl":"http://somewhere.com/45/x/AlphaNumeric/$(KGrHqR,!rgF!6n5wJSTBQO-G4k(Ww~~
!- {"Flags":"0","transactions":{"totalTransactionAmount":"0","totalQuantitySold":"0"},"listingStatus":"NULL","conditionRollupId":"0","photoDisplayType":"0","title":"NULL","quantityAvailable":"0","viewItemCount":"0","visitCount":"0","itemCountryId":"0","itemAspects":{   ...  "sellerSiteId":"0","siteId":"0","pictureUrl":"http://somewhere.com/45/x/AlphaNumeric/$(KGrHqR,!rgF!6n5wJSTBQO-G4k(Ww~~

Most of the above data has been sanitized (though, for the most part, not the URL).

In the file, the URL continues as: $(KGrHqZHJCgFBsO4dC3MBQdC2)Y4Tg~~60_1.JPG?set_id=8800005007

So it's slightly miffing.

This is also entry #112, and I have had other files parse without errors... but this one is screwing with my mind, mostly because I don't see how scan.nextLine() isn't working.

According to the debug output, the JSON error is caused by the string not being split properly.
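One thing that may be worth checking here: per the Scanner documentation, a Scanner does not propagate an IOException thrown by its underlying source. It assumes the end of input has been reached, and the exception is only available through ioException(). The sketch below illustrates that behavior with a hypothetical FailingReader that hands over the start of a line and then throws. ScannerErrorCheck, FailingReader, and describe are names invented for this example, and nothing here confirms that a read failure is what happened with the Hadoop file.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Scanner;

public class ScannerErrorCheck {

    /** Hypothetical stand-in for a source that fails part-way through a line. */
    public static final class FailingReader extends Reader {
        private final String head;
        private boolean delivered = false;

        public FailingReader(String head) {
            this.head = head;
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            if (delivered) {
                // Second read: simulate the source breaking mid-line.
                throw new IOException("simulated read failure");
            }
            delivered = true;
            int n = Math.min(len, head.length());
            head.getChars(0, n, cbuf, off);
            return n;
        }

        @Override
        public void close() {
            // Nothing to release in this in-memory stand-in.
        }
    }

    /** Reads every line, then reports whether the Scanner recorded an IOException. */
    public static String describe(Readable source) {
        StringBuilder out = new StringBuilder();
        Scanner scan = new Scanner(source);
        while (scan.hasNextLine()) {
            out.append("line=").append(scan.nextLine()).append('\n');
        }
        // The loop ended without any exception; ask the Scanner what it swallowed.
        IOException hidden = scan.ioException();
        out.append("hidden=").append(hidden == null ? "none" : hidden.getMessage());
        scan.close();
        return out.toString();
    }

    public static void main(String[] args) {
        // The partial line is returned as if it were complete.
        System.out.println(describe(new FailingReader("0\t{\"pictureUrl\":\"http://exa")));
    }
}
```

Calling scan.ioException() after the while loop in the code above would show whether the scanner stopped early for this kind of reason.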

And I almost forgot: it also works JUST FINE if I put the offending line in its own file and parse just that.

It also blows up in about the same place if I remove the offending line.

Attempted with JVM 1.6 and 1.7.

Workaround: use BufferedReader scan = new BufferedReader(new FileReader(files[i])); instead of Scanner.
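A minimal sketch of that workaround, assuming each record has the form key, tab, JSON as in the output shown earlier. BufferedReaderSketch and payloads are hypothetical names. The sample uses a StringReader so that it runs without a file, and the JSON parsing step is left out. It also skips lines that contain no tab, which the original split(...)[1] would not tolerate. UncheckedIOException requires Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class BufferedReaderSketch {

    /** Returns the text after the first tab of every line; lines without a tab are skipped. */
    public static List<String> payloads(Reader source) {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // readLine() returns null at end of input and throws on I/O errors.
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    result.add(parts[1]);
                }
            }
        } catch (IOException e) {
            // Unlike Scanner, a failed read surfaces here instead of ending the loop quietly.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "0\t{\"Flags\":\"0\"}\n1\t{\"siteId\":\"0\"}\n";
        for (String json : payloads(new StringReader(sample))) {
            System.out.println("!- " + json);
        }
    }
}
```

For a real file, pass new FileReader(files[i]) as in the workaround. Note that FileReader decodes with the platform default charset.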
