How to get the positions of all matches in a String?

Question

I have a text document and a query (the query may be more than one word). I want to find the positions of all occurrences of the query in the document.

I thought of using documentText.indexOf(query) or a regular expression, but I could not make either work.

I ended up with the following approach.

First, I created a data type called QueryOccurrence:

import java.io.Serializable;

// Holds the start and end positions of one occurrence of the query.
public class QueryOccurrence implements Serializable {
  private int start;
  private int end;

  public QueryOccurrence() {}

  // nameText is accepted but not stored.
  public QueryOccurrence(int nameStart, int nameEnd, String nameText) {
    start = nameStart;
    end = nameEnd;
  }

  public int getStart() {
    return start;
  }

  public int getEnd() {
    return end;
  }

  public void SetStart(int i) {
    start = i;
  }

  public void SetEnd(int i) {
    end = i;
  }
}

Then I used this data type in the following method:

import java.util.ArrayList;
import java.util.List;

public static List<QueryOccurrence> FindQueryPositions(String documentText, String query) {

    // Normalize does the following: lower case, trim, and remove punctuation
    String normalizedQuery = Normalize.Normalize(query);
    String normalizedDocument = Normalize.Normalize(documentText);

    String[] documentWords = normalizedDocument.split(" ");
    String[] queryArray = normalizedQuery.split(" ");

    List<QueryOccurrence> foundQueries = new ArrayList<>();
    QueryOccurrence foundQuery = new QueryOccurrence();

    // Positions are word indices in the normalized document, not character offsets
    int index = 0;

    for (String word : documentWords) {

        if (word.equals(queryArray[0])) {
            foundQuery.SetStart(index);
        }

        if (word.equals(queryArray[queryArray.length - 1])) {
            foundQuery.SetEnd(index);
            // Only the first and last query words are compared; the span length must equal the query length
            if ((foundQuery.getEnd() - foundQuery.getStart()) + 1 == queryArray.length) {

                // add the found query to the list
                foundQueries.add(foundQuery);
                // start a fresh QueryOccurrence for the next match
                foundQuery = new QueryOccurrence();
            }
        }

        index++;
    }
    return foundQueries;
}

This method returns a list of all occurrences of the query in the document, each with its position.

Could you suggest an easier and faster way to accomplish this task?

Thanks.

Recommended answer

Your first approach was a good idea, but String.indexOf does not support regular expressions; it only searches for a literal substring.
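
For a plain literal query, indexOf can still do the job if its two-argument overload, indexOf(String, int), is called in a loop. The following is a minimal sketch under that assumption; the class name LiteralMatchPositions and the method findAll are hypothetical names chosen for illustration, and the returned values are character offsets rather than the word indices used in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class LiteralMatchPositions {

    // Returns the start index of every occurrence of a literal query in text.
    static List<Integer> findAll(String text, String query) {
        List<Integer> positions = new ArrayList<>();
        if (query.isEmpty()) {
            return positions; // an empty query would otherwise match at every index
        }
        int from = 0;
        while (true) {
            int idx = text.indexOf(query, from); // -1 when there are no more matches
            if (idx < 0) {
                break;
            }
            positions.add(idx);
            from = idx + 1; // advance by one so overlapping matches are found too
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(findAll("the cat and the hat", "the")); // prints [0, 12]
    }
}
```

Advancing by one character after each hit means overlapping occurrences are reported as well; advancing by query.length() instead would skip them.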

A simpler way, along similar lines but done in two steps, is as follows:

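import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<Integer> positions = new ArrayList<>();
Pattern p = Pattern.compile(queryPattern);  // insert your pattern here
Matcher m = p.matcher(documentText);
while (m.find()) {
    positions.add(m.start());  // start index of the current match
}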

After the loop, positions holds the start positions of all the matches.
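
If the end of each match is needed as well, as in the QueryOccurrence type from the question, Matcher.end() can be recorded alongside Matcher.start(). The sketch below is one possible way to do it, not a definitive implementation: the class QuerySpans and the method findSpans are hypothetical names, Pattern.quote is used on the assumption that the query is literal text rather than a regular expression, and the CASE_INSENSITIVE flag is an assumed stand-in for the lower-casing step in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class QuerySpans {

    // Returns {start, end} character offsets (end exclusive) for each match.
    static List<int[]> findSpans(String documentText, String query) {
        List<int[]> spans = new ArrayList<>();
        if (query.isEmpty()) {
            return spans; // nothing meaningful to search for
        }
        // Assumption: the query is literal text, so quote it to disable regex syntax.
        Pattern p = Pattern.compile(Pattern.quote(query), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(documentText);
        while (m.find()) {
            spans.add(new int[] { m.start(), m.end() });
        }
        return spans;
    }

    public static void main(String[] args) {
        for (int[] span : findSpans("New York, new york", "new york")) {
            System.out.println(span[0] + "-" + span[1]); // prints 0-8, then 10-18
        }
    }
}
```

Matcher.find() resumes searching after the end of the previous match, so overlapping occurrences are not reported by this version.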
