Regex lookaround construct in Java: advice on optimization needed
Question
I am trying to search for filenames in a comma-separated list:
text.txt,temp_doc.doc,template.tmpl,empty.zip
I use Java's regex implementation. Requirements for output are as follows:
- Show only the file name, without its extension
- Exclude files whose names start with "temp_"
It should look like this:
text
template
empty
So far I have managed to write a more or less satisfactory regex for the first task:
[^\\.,]++(?=\\.[^,]*+,?+)
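A minimal runnable sketch of the first-task regex above, applied with Matcher.find() to the sample list. The class name FirstTaskDemo and the helper names() are hypothetical; the pattern itself is copied unchanged from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstTaskDemo {
    // The first-task regex from the question, unchanged:
    //   [^\.,]++          possessively grab a run of characters that are neither '.' nor ','
    //   (?=\.[^,]*+,?+)   require that run to be followed by a dot (start of an extension);
    //                     the tail [^,]*+,?+ can match the empty string, so it adds no constraint
    static final Pattern NAME = Pattern.compile("[^\\.,]++(?=\\.[^,]*+,?+)");

    // Hypothetical helper: collects every match in a comma-separated list.
    static List<String> names(String list) {
        List<String> result = new ArrayList<>();
        Matcher m = NAME.matcher(list);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Extensions are stripped, but temp_doc is still included.
        System.out.println(names("text.txt,temp_doc.doc,template.tmpl,empty.zip")); // [text, temp_doc, template, empty]
        // The character class excludes '.', so a dotted name is split in two.
        System.out.println(names("my.file.txt")); // [my, file]
    }
}
```

Because [^,]*+ and ,?+ can both match the empty string, the lookahead succeeds exactly when the next character is a dot, so (?=\\.) on its own should behave the same here.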
I believe the best option for meeting the second requirement is to use lookaround constructs, but I'm not sure how to write a reliable and optimized expression. The following regex does seem to do what is required, but it is obviously a flawed solution, if for no other reason than that it relies on an explicit maximum filename length.
(?!temp_|emp_|mp_|p_|_)(?<!temp_\\w{0,50})[^\\.,]++(?=\\.[^,]*+,?+)
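The reason for the extra alternatives becomes visible when the first-task regex is run with only (?!temp_) in front of it, next to the question's version. This is a sketch: the class TempPrefixDemo, the field names NAIVE and WORKAROUND, and the helper names() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TempPrefixDemo {
    // Only a negative lookahead: nothing stops the engine from starting
    // the match one character later, inside "temp_doc".
    static final Pattern NAIVE =
        Pattern.compile("(?!temp_)[^\\.,]++(?=\\.[^,]*+,?+)");

    // The question's workaround: block every suffix of "temp_" in the lookahead,
    // and use a bounded lookbehind to block starts further inside the name.
    static final Pattern WORKAROUND =
        Pattern.compile("(?!temp_|emp_|mp_|p_|_)(?<!temp_\\w{0,50})[^\\.,]++(?=\\.[^,]*+,?+)");

    // Hypothetical helper: collects every match of p in a comma-separated list.
    static List<String> names(Pattern p, String list) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(list);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String list = "text.txt,temp_doc.doc,template.tmpl,empty.zip";
        System.out.println(names(NAIVE, list));      // [text, emp_doc, template, empty]
        System.out.println(names(WORKAROUND, list)); // [text, template, empty]
    }
}
```

With only (?!temp_), the engine fails at the t of temp_doc and simply retries one position later, matching emp_doc. The bounded lookbehind closes that gap only for up to 50 characters after temp_: a name consisting of temp_ followed by 60 letters and an extension would still produce a match starting beyond the lookbehind's reach.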
P.S. I've been studying regexes only for a few days, so please don't laugh at this newbie-style overcomplicated code :)
Answer
How about this:
import java.util.regex.Pattern;
Pattern regex = Pattern.compile(
"\\b # Start at word boundary\n" +
"(?!temp_) # Exclude words starting with temp_\n" +
"[^,]+ # Match one or more characters except comma\n" +
"(?=\\.) # until the last available dot",
Pattern.COMMENTS);
This also allows dots within filenames, since the greedy [^,]+ stops at the last dot before the next comma.
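A self-contained sketch that runs the answer's pattern with Matcher.find(). The class AnswerDemo and the helper names() are hypothetical, and the second input list is made up to show a dotted name, a temp_ file and a name without an extension.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnswerDemo {
    // The pattern from the answer; only the alignment whitespace differs.
    // COMMENTS mode ignores whitespace and treats '#' as a comment to end of line.
    static final Pattern REGEX = Pattern.compile(
        "\\b       # Start at word boundary\n" +
        "(?!temp_) # Exclude words starting with temp_\n" +
        "[^,]+     # Match one or more characters except comma\n" +
        "(?=\\.)   # until the last available dot",
        Pattern.COMMENTS);

    // Hypothetical helper: collects every match in a comma-separated list.
    static List<String> names(String list) {
        List<String> result = new ArrayList<>();
        Matcher m = REGEX.matcher(list);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(names("text.txt,temp_doc.doc,template.tmpl,empty.zip")); // [text, template, empty]
        System.out.println(names("my.file.txt,temp_old.txt,notes"));                // [my.file]
    }
}
```

The \b anchor is what makes a single (?!temp_) sufficient: inside temp_doc there is no word boundary between word characters, so the match cannot restart at emp_doc. Note that notes, which contains no dot, is not matched at all, because (?=\\.) requires one.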