Match a pattern and write the stream to a file using Java 8 Stream
Question
I'm trying to read a huge file, extract the text within "quotes", put the extracted text into a set, and write the content of the set to a file using Java 8 Stream.
public class DataMiner {
    private static final Pattern quoteRegex = Pattern.compile("\"([^\"]*)\"");

    public static void main(String[] args) {
        String fileName = "c://exec.log";
        try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
            Set<String> dataSet = stream
                    // How do I perform the pattern match here?
                    .collect(Collectors.toSet());
            Files.write(Paths.get(fileName), dataSet);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
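Please help me. Thanks!
Answers to the questions asked:
- No, there are no multi-quoted texts.
- I could have used a simple loop, but I want to use Java 8 streams.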
Recommended answer
Unfortunately, the Java regular expression classes don't provide a stream of match results, only a splitAsStream() method, and splitting is not what you want here.
Note: this was added in Java 9 as Matcher.results().
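For readers on Java 9 or later, the following is a minimal sketch of how Matcher.results() could be used for this task. The class name QuotedTextJava9, the method extractQuoted, and the sample lines are assumptions made for illustration; they do not come from the question.

```java
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name, for illustration only. Requires Java 9 or later.
public class QuotedTextJava9 {
    private static final Pattern QUOTE_REGEX = Pattern.compile("\"([^\"]*)\"");

    // Collects the text found between each pair of double quotes across all lines.
    static Set<String> extractQuoted(Stream<String> lines) {
        return lines
                .flatMap(line -> QUOTE_REGEX.matcher(line).results()) // one MatchResult per match
                .map(r -> r.group(1))                                  // text between the quotes
                .collect(Collectors.toSet());                          // duplicates collapse here
    }

    public static void main(String[] args) {
        // Sample input standing in for the lines of the log file.
        Stream<String> sample = Stream.of(
                "say \"hello\" and \"bye\"",
                "no quotes here",
                "\"hello\" again");
        System.out.println(extractQuoted(sample));
    }
}
```

In the real program the sample stream would be replaced by Files.lines(...), as in the question.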
On Java 8 you can, however, create a generic helper class for it yourself:
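import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public final class PatternStreamer {
    private final Pattern pattern;

    public PatternStreamer(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Collects all matches into a temporary list and streams that list.
    public Stream<MatchResult> results(CharSequence input) {
        List<MatchResult> list = new ArrayList<>();
        for (Matcher m = this.pattern.matcher(input); m.find(); )
            list.add(m.toMatchResult()); // immutable snapshot of the current match
        return list.stream();
    }
}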
Then your code becomes easy by using flatMap():
private static final PatternStreamer quoteRegex = new PatternStreamer("\"([^\"]*)\"");

public static void main(String[] args) throws Exception {
    String inFileName = "c:\\exec.log";
    String outFileName = "c:\\exec_quoted.txt";
    try (Stream<String> stream = Files.lines(Paths.get(inFileName))) {
        Set<String> dataSet = stream.flatMap(quoteRegex::results)
                                    .map(r -> r.group(1))
                                    .collect(Collectors.toSet());
        Files.write(Paths.get(outFileName), dataSet);
    }
}
Since you only process a line at a time, the temporary List is fine. If the input string is very long and will have a lot of matches, then a Spliterator would be a better choice. See How do I create a Stream of regex matches?
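As a rough sketch of the Spliterator approach mentioned above, the helper below produces matches on demand instead of collecting them into a list first. The class name LazyPatternStreamer is hypothetical, and this is one possible way to write it under Java 8, not necessarily the code from the linked question.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper, for illustration only.
public final class LazyPatternStreamer {
    private final Pattern pattern;

    public LazyPatternStreamer(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Returns a sequential stream that calls Matcher.find() only when the
    // stream asks for the next element, so no intermediate list is built.
    public Stream<MatchResult> results(CharSequence input) {
        Matcher m = pattern.matcher(input);
        Spliterator<MatchResult> spliterator =
                new Spliterators.AbstractSpliterator<MatchResult>(
                        Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
                    @Override
                    public boolean tryAdvance(Consumer<? super MatchResult> action) {
                        if (!m.find()) {
                            return false; // no more matches: end of stream
                        }
                        action.accept(m.toMatchResult()); // snapshot of the current match
                        return true;
                    }
                };
        return StreamSupport.stream(spliterator, false);
    }
}
```

It exposes the same results(CharSequence) method as PatternStreamer, so it could be swapped in where a single very long input string is matched directly.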