Highlighting using Regex in JSoup for Android


Question

I am using the JSoup parser to find particular parts of an HTML document (defined by a regex) and highlight them by wrapping each matched string in a <span> tag. Here is the code that does the highlighting:

    public String highlightRegex() {
        Document doc = Jsoup.parse(htmlContent);

        NodeTraversor nd  = new NodeTraversor(new NodeVisitor() {

            @Override
            public void tail(Node node, int depth) {
                if (node instanceof Element) {

                    Element elem = (Element) node;

                    StringBuffer obtainedText;
                    for(Element tn : elem.getElementsMatchingOwnText(pat)) {

                        Log.e("HELLO", tn.baseUri());
                        Log.e("HELLO", tn.text());
                        obtainedText = new StringBuffer(tn.ownText());
                        mat = pat.matcher(obtainedText.toString());
                        int nextStart = 0;
                        while(mat.find(nextStart)) {
                            obtainedText = obtainedText.replace(mat.start(), mat.end(), "<span>" + mat.group() + "</span>");
                            nextStart = mat.end() + 1;
                        }
                        tn.text(obtainedText.toString());
                        Log.e("HELLO" , "AFTER:" + tn.text());

                    }
                }
            }

            @Override
            public void head(Node node, int depth) {        
            }
        });

        nd.traverse(doc.body());
        return doc.toString();
    }

It does work, but the <span> tag is visible as literal text inside the WebView. What am I doing wrong?

Recommended answer

Looks like no one knows. Here's some code that I've come up with. It is slow and inefficient, but it works. Suggestions are welcome :)

This class can be used to highlight any HTML using a regex.

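import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeTraversor;
import org.jsoup.select.NodeVisitor;

public class Highlighter {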

    private String regex;
    private String htmlContent;
    Pattern pat;
    Matcher mat;


    public Highlighter(String searchString, String htmlString) {
        regex = buildRegexFromQuery(searchString);
        htmlContent = htmlString;
        pat = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }

    public String getHighlightedHtml() {

        Document doc = Jsoup.parse(htmlContent);

        final List<TextNode> nodesToChange = new ArrayList<TextNode>();

        NodeTraversor nd  = new NodeTraversor(new NodeVisitor() {

            @Override
            public void tail(Node node, int depth) {
                if (node instanceof TextNode) {
                    TextNode textNode = (TextNode) node;
                    String text = textNode.getWholeText();

                    mat = pat.matcher(text);

                    if(mat.find()) {
                        nodesToChange.add(textNode);
                    }
                }
            }

            @Override
            public void head(Node node, int depth) {        
            }
        });

        nd.traverse(doc.body());

        for (TextNode textNode : nodesToChange) {
            Node newNode = buildElementForText(textNode);
            textNode.replaceWith(newNode);
        }
        return doc.toString();
    }

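    private static String buildRegexFromQuery(String queryString) {
        /* Clean up query: turn punctuation into spaces, then collapse whitespace.
           "+" is used instead of "*" so that empty matches do not insert
           a space between every character. */
        String queryToConvert = queryString.replaceAll("\\p{Punct}+", " ");
        queryToConvert = queryToConvert.replaceAll("\\s+", " ").trim();

        // Split the cleaned query (not the raw one) into individual terms
        String[] regexArray = queryToConvert.split(" ");

        StringBuilder regex = new StringBuilder("(");
        for (int i = 0; i < regexArray.length - 1; i++) {
            // Every term except the last must match as a whole word
            regex.append("(\\b)").append(regexArray[i]).append("(\\b)|");
        }

        // The last term also matches as the prefix of a longer word
        regex.append("(\\b)").append(regexArray[regexArray.length - 1]).append("[a-zA-Z0-9]*?(\\b))");
        return regex.toString();
    }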

    private Node buildElementForText(TextNode textNode) {
        // Keep leading/trailing whitespace so words next to inline tags stay separated
        String text = textNode.getWholeText();

        ArrayList<MatchedWord> matchedWordSet = new ArrayList<MatchedWord>();

        mat = pat.matcher(text);

        while(mat.find()) {
            matchedWordSet.add(new MatchedWord(mat.start(), mat.end()));
        }

        StringBuffer newText = new StringBuffer(text);

        for(int i = matchedWordSet.size() - 1; i >= 0; i-- ) {
            String wordToReplace = newText.substring(matchedWordSet.get(i).start, matchedWordSet.get(i).end);
            wordToReplace = "<b>" + wordToReplace+ "</b>";
            newText = newText.replace(matchedWordSet.get(i).start, matchedWordSet.get(i).end, wordToReplace);       
        }
        return new DataNode(newText.toString(), textNode.baseUri());
    }

    class MatchedWord {
        public int start;
        public int end;

        public MatchedWord(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }
}
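
For context: jsoup's Element.text(String) sets plain text, so any markup passed to it is escaped on output, which is why the <span> in the question showed up literally. The class above gets around this by emitting a DataNode, whose content is written out unescaped. One side effect is that characters such as < or & in the original text are then also written out raw. The sketch below is not part of the original answer; it is a minimal, jsoup-free illustration of how the wrapping step could be done in a single forward pass while escaping the text around the matches. The class and method names (HighlightSketch, escapeHtml, wrapMatches) are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HighlightSketch {

    /** Escapes the characters that are special in HTML text content. */
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps every non-empty match of pat in <b>...</b> and escapes everything else. */
    static String wrapMatches(Pattern pat, String text) {
        Matcher m = pat.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0; // end of the previous match in the original text
        while (m.find()) {
            if (m.end() == m.start()) {
                continue; // skip empty matches; find() advances on its own
            }
            out.append(escapeHtml(text.substring(last, m.start())));
            out.append("<b>").append(escapeHtml(m.group())).append("</b>");
            last = m.end();
        }
        out.append(escapeHtml(text.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern pat = Pattern.compile("\\babc\\b", Pattern.CASE_INSENSITIVE);
        // prints: <b>abc</b> &lt; <b>ABC</b> &amp; xyz
        System.out.println(wrapMatches(pat, "abc < ABC & xyz"));
    }
}
```

Because the output is assembled from the original string rather than edited in place, the match offsets never shift, so there is no need to collect the matches first and replace them back to front.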

You need these two calls to get the highlighted HTML:

Highlighter hl = new Highlighter("abc def", htmlString);
String newhtmlString = hl.getHighlightedHtml();

This highlights everything that matches the generated regex, which for this query is ((\b)abc(\b)|(\b)def[a-zA-Z0-9]*?(\b)): the whole word abc, or any word that starts with def, ignoring case. To change how the regex is built, modify the buildRegexFromQuery() function.
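
To see what such a pattern actually picks up, here is a small standalone check using only java.util.regex. The regex string is the one buildRegexFromQuery() assembles for the query "abc def"; the class name QueryRegexDemo, the helper findAll, and the sample sentence are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QueryRegexDemo {

    /** Returns every substring of text matched by regex, ignoring case. */
    static List<String> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Pattern assembled by buildRegexFromQuery() for the query "abc def"
        String regex = "((\\b)abc(\\b)|(\\b)def[a-zA-Z0-9]*?(\\b))";
        // prints: [ABC, def, define]
        System.out.println(findAll(regex, "ABC abcd def define undef"));
    }
}
```

"abcd" is skipped because every term except the last must match as a whole word, and "undef" is skipped because there is no word boundary in front of "def".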
