Regex lookaround construct in Java: advice on optimization needed


Question

I am trying to search for filenames in a comma-separated list:

text.txt,temp_doc.doc,template.tmpl,empty.zip


I am using Java's regex implementation. The requirements for the output are as follows:

  1. Display only the filenames, without their extensions
  2. Exclude files that begin with "temp_"

It should look like this:

text

template

empty

So far I have managed to write a more or less satisfactory regex to cope with the first task:

[^\\.,]++(?=\\.[^,]*+,?+)
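For reference, here is a minimal sketch of how this first pattern behaves when applied with Matcher.find() to the sample list. The class and method names (FirstAttemptDemo, baseNames) are hypothetical and only serve the illustration; the pattern itself is copied unchanged from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern string comes from the question.
final class FirstAttemptDemo {
    private static final Pattern BASE_NAME =
            Pattern.compile("[^\\.,]++(?=\\.[^,]*+,?+)");

    // Collects every substring the pattern matches, in order of appearance.
    static List<String> baseNames(String csv) {
        List<String> names = new ArrayList<>();
        Matcher m = BASE_NAME.matcher(csv);
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints [text, temp_doc, template, empty]
        System.out.println(baseNames("text.txt,temp_doc.doc,template.tmpl,empty.zip"));
    }
}
```

The extensions are dropped as intended, but temp_doc is still reported, which is the part the second requirement has to address.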

I believe the best way to meet the second requirement is to use lookaround constructs, but I am not sure how to write a reliable and optimized expression. The following regex does seem to do what is required, but it is obviously flawed, if for no other reason than that it relies on an explicit maximum filename length.

(?!temp_|emp_|mp_|p_|_)(?<!temp_\\w{0,50})[^\\.,]++(?=\\.[^,]*+,?+)
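The length dependence can be made concrete with a small sketch. It assumes a hypothetical helper class (LookbehindLimitDemo) and an artificial filename with 60 letters after "temp_"; the pattern is the one from the question. Because the lookbehind only reaches back over at most 50 word characters, the tail of a longer temp_ name slips through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern string comes from the question.
final class LookbehindLimitDemo {
    private static final Pattern PATTERN = Pattern.compile(
            "(?!temp_|emp_|mp_|p_|_)(?<!temp_\\w{0,50})[^\\.,]++(?=\\.[^,]*+,?+)");

    // Collects every substring the pattern matches, in order of appearance.
    static List<String> matches(String csv) {
        List<String> found = new ArrayList<>();
        Matcher m = PATTERN.matcher(csv);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Builds an artificial name: "temp_" + n times 'a' + ".doc".
    static String tempName(int n) {
        StringBuilder sb = new StringBuilder("temp_");
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.append(".doc").toString();
    }

    public static void main(String[] args) {
        // Prints [text, template, empty]
        System.out.println(matches("text.txt,temp_doc.doc,template.tmpl,empty.zip"));
        // 50 letters after "temp_" are still covered by the lookbehind: prints []
        System.out.println(matches(tempName(50)));
        // 60 letters exceed the {0,50} bound, so the last 9 leak: prints [aaaaaaaaa]
        System.out.println(matches(tempName(60)));
    }
}
```

Raising the bound only moves the problem, since the lookbehind here is written with a fixed upper limit.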

P.S. I have only been studying regexes for a few days, so please don't laugh at this newbie-style, overcomplicated code :)

Recommended answer

How about this:

Pattern regex = Pattern.compile(
    "\\b        # Start at word boundary\n" +
    "(?!temp_)  # Exclude words starting with temp_\n" +
    "[^,]+      # Match one or more characters except comma\n" +
    "(?=\\.)    # until the last available dot", 
    Pattern.COMMENTS);

This also allows dots within filenames.
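As a rough sketch of how the answer's pattern might be used, the class below runs it over the sample list and over a name containing an extra dot. The class name (AnswerPatternDemo), the helper method, and the extra inputs (archive.tar.gz, temp_a.b.c) are assumptions made for illustration. The second pattern, ANCHORED, is not from the answer: it is a possible variation that swaps \b for a one-character negative lookbehind, so that a match may only start at the beginning of the list or right after a comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class built around the pattern from the answer.
final class AnswerPatternDemo {
    // Pattern from the answer, unchanged.
    static final Pattern ANSWER = Pattern.compile(
            "\\b        # Start at word boundary\n" +
            "(?!temp_)  # Exclude words starting with temp_\n" +
            "[^,]+      # Match one or more characters except comma\n" +
            "(?=\\.)    # until the last available dot",
            Pattern.COMMENTS);

    // Assumed variation (not from the answer): start only at the beginning
    // of the input or directly after a comma, instead of at any word boundary.
    static final Pattern ANCHORED =
            Pattern.compile("(?<![^,])(?!temp_)[^,]+(?=\\.)");

    // Collects every substring the given pattern matches, in order.
    static List<String> matches(Pattern p, String csv) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(csv);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "text.txt,temp_doc.doc,template.tmpl,empty.zip";
        System.out.println(matches(ANSWER, sample));            // [text, template, empty]
        System.out.println(matches(ANSWER, "archive.tar.gz"));  // [archive.tar]
        System.out.println(matches(ANSWER, "temp_a.b.c"));      // [.b]
        System.out.println(matches(ANCHORED, sample));          // [text, template, empty]
        System.out.println(matches(ANCHORED, "temp_a.b.c"));    // []
    }
}
```

One thing to watch: \b also matches right before a dot, so with the answer's pattern a temp_ name that contains more than one dot (temp_a.b.c here) still yields the fragment .b. For the sample list in the question this does not arise. The anchored variation avoids it under the assumption that entries are separated by bare commas with no surrounding whitespace.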
