Searching anagrams with Java 8

Problem description

I have to write a program that reads a text file and prints each word followed by its anagrams. The text file is very large: after reading it with a Scanner, listOfWords contains 25000 entries.

Example output:

word anagram1 anagram2 anagram3 ...
word2 anagram1 anagram2...

I have code that works, but it is very slow:

  private static List<String> listOfWords = new ArrayList<String>();
  private static List<ArrayList<String>> allAnagrams = new ArrayList<ArrayList<String>>();

  public static void main(String[] args) throws Exception {
    URL url = new URL("www.xxx.pl/textFile.txt");
    Scanner scanner = new Scanner(url.openStream());
    while (scanner.hasNext()) {
      String nextToken = scanner.next();
      listOfWords.add(nextToken);
    }
    scanner.close();

    while (listOfWords.isEmpty() == false) {
      ArrayList<String> anagramy = new ArrayList<String>();
      String wzor = listOfWords.remove(0);
      anagramy.add(wzor);
      char[] ch = wzor.toCharArray();
      Arrays.sort(ch);
      for (int i = 0; i < listOfWords.size(); i++) {
        String slowo = listOfWords.get(i);
        char[] cha = slowo.toCharArray();
        Arrays.sort(cha);
        if (Arrays.equals(ch, cha)) {
          anagramy.add(slowo);
          listOfWords.remove(i);
          i--;
        }
      }
      allAnagrams.add(anagramy);
    }

    for (ArrayList<String> ar : allAnagrams) {
      String result = "";
      if (ar.size() > 1) {
        for (int i = 1; i < ar.size(); i++) {
          result += ar.get(i) + " ";
        }
        System.out.println(ar.get(0) + " " + result);
      }
    }
  }

I have to rewrite it using Java 8 streams, but I don't know how. Is it possible to use streams both for reading from the URL and for searching for anagrams? Could you help me with the stream-based anagram search? My teacher told me that the code, including reading the whole list, should be shorter than mine. Only a few lines: is that possible?

Recommended answer

Let's create a separate method that sorts the letters of a word. You can do this with the Stream API as well:

private static String canonicalize(String s) {
    return Stream.of(s.split("")).sorted().collect(Collectors.joining());
}
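
As a quick check of the idea, here is a minimal sketch showing that anagrams share the same canonical form. The class name `CanonicalizeDemo` and the `char[]`-based variant `canonicalizeChars` are illustrative assumptions, not part of the original answer; the variant simply reuses the sorting approach from the question's loop.

```java
import java.util.Arrays;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CanonicalizeDemo {
    // Stream-based version from the answer: split into one-character strings, sort, join.
    static String canonicalize(String s) {
        return Stream.of(s.split("")).sorted().collect(Collectors.joining());
    }

    // Hypothetical alternative: sort the char array directly, as the question's code does.
    static String canonicalizeChars(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("cab"));      // abc
        System.out.println(canonicalizeChars("tat")); // att
        // Two words are anagrams when their canonical forms are equal.
        System.out.println(canonicalize("abc").equals(canonicalize("cab"))); // true
    }
}
```

For plain lowercase words like these, both variants should give the same result; the stream version is shorter, while the array version avoids creating one small string per letter.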

Now you can read from a Reader, extract the words, and group them by their canonical form:

Map<String, Set<String>> map = new BufferedReader(reader).lines()
             .flatMap(Pattern.compile("\\W+")::splitAsStream)
             .collect(Collectors.groupingBy(Anagrams::canonicalize, Collectors.toSet()));

Next, you can use the Stream API a third time to drop the groups that contain only a single word:

return map.values().stream().filter(list -> list.size() > 1).collect(Collectors.toList());

Now you can pass any Reader to this code to extract the anagrams from it. Here's the complete code:

import java.io.*;
import java.util.*;
import java.util.regex.Pattern;
import java.util.stream.*;

public class Anagrams {
    private static String canonicalize(String s) {
        return Stream.of(s.split("")).sorted().collect(Collectors.joining());
    }

    public static List<Set<String>> getAnagrams(Reader reader) {
        Map<String, Set<String>> map = new BufferedReader(reader).lines()
                .flatMap(Pattern.compile("\\W+")::splitAsStream)
                .collect(Collectors.groupingBy(Anagrams::canonicalize, Collectors.toSet()));
        return map.values().stream().filter(list -> list.size() > 1).collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        getAnagrams(new StringReader("abc cab tat aaa\natt tat bbb"))
                .forEach(System.out::println);
    }
}

It prints:

[att, tat]
[abc, cab]

If you want to use a URL, just replace the StringReader with new InputStreamReader(new URL("www.xxx.pl/textFile.txt").openStream(), StandardCharsets.UTF_8). Note that a real URL string must start with a scheme such as http://, otherwise the URL constructor throws a MalformedURLException; the snippet also needs imports for java.net.URL and java.nio.charset.StandardCharsets.
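
The reading step can be tried without a network connection. The following is a minimal sketch, not the answer's own code: the class `WordSource` and the method `readWords` are hypothetical names, and an in-memory `ByteArrayInputStream` stands in for the stream returned by `url.openStream()`. It also closes the reader with try-with-resources and filters out empty tokens, two details the short snippets above leave out.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordSource {
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    // Reads all words from a byte stream, decoding it as UTF-8.
    // try-with-resources closes the reader (and the wrapped stream) when done.
    static List<String> readWords(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines()
                    .flatMap(NON_WORD::splitAsStream)
                    // An empty line or a line starting with a separator can yield an empty token.
                    .filter(word -> !word.isEmpty())
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for url.openStream(); no network access is needed for this demo.
        InputStream in = new ByteArrayInputStream(
                "abc cab\ntat att".getBytes(StandardCharsets.UTF_8));
        System.out.println(readWords(in)); // [abc, cab, tat, att]
    }
}
```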

If you want to treat the first element of each anagram set separately, the solution should be modified slightly:

public static Map<String, Set<String>> getAnagrams(Reader reader) {
    Map<String, List<String>> map = new BufferedReader(reader).lines()
       .flatMap(Pattern.compile("\\W+")::splitAsStream)
       .distinct() // remove repeating words
       .collect(Collectors.groupingBy(Anagrams::canonicalize));
    return map.values().stream()
       .filter(list -> list.size() > 1)
       .collect(Collectors.toMap(list -> list.get(0), 
                                 list -> new TreeSet<>(list.subList(1, list.size()))));
}

Here the result is a map whose key is the first element of each anagram set (the one that occurs first in the input file) and whose value is the set of remaining elements, sorted alphabetically. I take a sublist to skip the first element, then put the rest into a TreeSet to sort them; an alternative would be list.stream().skip(1).sorted().collect(Collectors.toList()).
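
The two ways of building the "rest of the group" can be compared side by side. This is only a sketch: the class `RestOfGroup` and its method names are made up for illustration, and the sample group is hand-written rather than produced by the grouping code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class RestOfGroup {
    // Variant used in the answer: skip the first element via subList, sort via TreeSet.
    static Set<String> viaTreeSet(List<String> group) {
        return new TreeSet<>(group.subList(1, group.size()));
    }

    // Alternative mentioned in the answer: skip and sort inside a stream, collect to a List.
    static List<String> viaStream(List<String> group) {
        return group.stream().skip(1).sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> group = Arrays.asList("tat", "tta", "att");
        System.out.println(viaTreeSet(group)); // [att, tta]
        System.out.println(viaStream(group));  // [att, tta]
    }
}
```

The visible difference is the result type: a TreeSet would also drop duplicates, which does not matter here because distinct() has already removed repeated words.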

Example usage:

getAnagrams(new StringReader("abc cab tat aaa\natt tat bbb\ntta\ncabr\nrbac cab crab cabrc cabr"))
        .entrySet().forEach(System.out::println);
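
To get the "word anagram1 anagram2 ..." lines the question asks for, the map entries can be joined into strings. The sketch below assumes the map-returning getAnagrams variant shown above and repeats it so the example is self-contained; `AnagramPrinter` and `formatLines` are hypothetical names, and the lines are sorted only to make the output order deterministic, since the map returned by Collectors.toMap gives no ordering guarantee.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.*;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class AnagramPrinter {
    static String canonicalize(String s) {
        return Stream.of(s.split("")).sorted().collect(Collectors.joining());
    }

    // Same grouping as the answer's second getAnagrams:
    // first word of each group -> remaining anagrams, sorted alphabetically.
    static Map<String, Set<String>> getAnagrams(Reader reader) {
        Map<String, List<String>> map = new BufferedReader(reader).lines()
                .flatMap(Pattern.compile("\\W+")::splitAsStream)
                .distinct() // remove repeating words
                .collect(Collectors.groupingBy(AnagramPrinter::canonicalize));
        return map.values().stream()
                .filter(list -> list.size() > 1)
                .collect(Collectors.toMap(list -> list.get(0),
                        list -> new TreeSet<>(list.subList(1, list.size()))));
    }

    // Hypothetical helper: one "word anagram1 anagram2 ..." line per entry,
    // with the lines sorted alphabetically.
    static List<String> formatLines(Map<String, Set<String>> anagrams) {
        return anagrams.entrySet().stream()
                .map(e -> e.getKey() + " " + String.join(" ", e.getValue()))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        formatLines(getAnagrams(new StringReader("abc cab tat aaa\natt tat bbb\ntta")))
                .forEach(System.out::println);
        // abc cab
        // tat att tta
    }
}
```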
