Highlighting using Regex in JSoup for Android
Problem description
I am using the JSoup parser to find particular parts of an HTML document (defined by a regex) and highlight them by wrapping each matched string in a <span> tag. Here is my code that does the highlighting:
public String highlightRegex() {
    Document doc = Jsoup.parse(htmlContent);
    NodeTraversor nd = new NodeTraversor(new NodeVisitor() {
        @Override
        public void tail(Node node, int depth) {
            if (node instanceof Element) {
                Element elem = (Element) node;
                StringBuffer obtainedText;
                for (Element tn : elem.getElementsMatchingOwnText(pat)) {
                    Log.e("HELLO", tn.baseUri());
                    Log.e("HELLO", tn.text());
                    obtainedText = new StringBuffer(tn.ownText());
                    mat = pat.matcher(obtainedText.toString());
                    int nextStart = 0;
                    while (mat.find(nextStart)) {
                        obtainedText = obtainedText.replace(mat.start(), mat.end(), "<span>" + mat.group() + "</span>");
                        nextStart = mat.end() + 1;
                    }
                    tn.text(obtainedText.toString());
                    Log.e("HELLO", "AFTER:" + tn.text());
                }
            }
        }

        @Override
        public void head(Node node, int depth) {
        }
    });
    nd.traverse(doc.body());
    return doc.toString();
}
It does work, but the <span> tag is visible as text inside the WebView. What am I doing wrong?
Recommended answer
Looks like no one knows. Here's some code that I've come up with. It is slow and inefficient, but it works. Suggestions are welcome :)
This class can be used to highlight any HTML using a regex.
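import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeTraversor;
import org.jsoup.select.NodeVisitor;

// Uses the NodeTraversor and DataNode constructors as they existed when this
// answer was written; check them against the jsoup version you build with.
public class Highlighter {
    private String regex;
    private String htmlContent;
    Pattern pat;
    Matcher mat;

    public Highlighter(String searchString, String htmlString) {
        regex = buildRegexFromQuery(searchString);
        htmlContent = htmlString;
        // Matching ignores case, so "abc" also highlights "ABC".
        pat = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }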
    public String getHighlightedHtml() {
        Document doc = Jsoup.parse(htmlContent);
        // Collect matching text nodes first; the tree is only modified after traversal.
        final List<TextNode> nodesToChange = new ArrayList<TextNode>();
        NodeTraversor nd = new NodeTraversor(new NodeVisitor() {
            @Override
            public void tail(Node node, int depth) {
                if (node instanceof TextNode) {
                    TextNode textNode = (TextNode) node;
                    String text = textNode.getWholeText();
                    mat = pat.matcher(text);
                    if (mat.find()) {
                        nodesToChange.add(textNode);
                    }
                }
            }

            @Override
            public void head(Node node, int depth) {
            }
        });
        nd.traverse(doc.body());
        for (TextNode textNode : nodesToChange) {
            Node newNode = buildElementForText(textNode);
            textNode.replaceWith(newNode);
        }
        return doc.toString();
    }
    private static String buildRegexFromQuery(String queryString) {
        // Clean up the query: punctuation becomes a space, whitespace runs collapse.
        // "+" is used instead of "*" so the patterns cannot match the empty string.
        String cleaned = queryString.replaceAll("\\p{Punct}+", " ")
                                    .replaceAll("\\s+", " ")
                                    .trim();
        // Split the cleaned query (not the raw one) into search terms.
        String[] terms = cleaned.split(" ");
        StringBuilder regex = new StringBuilder("(");
        // Every term except the last must match as a whole word.
        for (int i = 0; i < terms.length - 1; i++) {
            regex.append("(\\b)").append(terms[i]).append("(\\b)|");
        }
        // The last term may also be the prefix of a longer word.
        regex.append("(\\b)").append(terms[terms.length - 1]).append("[a-zA-Z0-9]*?(\\b))");
        return regex.toString();
    }
    private Node buildElementForText(TextNode textNode) {
        // Not trimmed: trimming would drop the spaces next to inline elements.
        String text = textNode.getWholeText();
        ArrayList<MatchedWord> matchedWordSet = new ArrayList<MatchedWord>();
        mat = pat.matcher(text);
        while (mat.find()) {
            matchedWordSet.add(new MatchedWord(mat.start(), mat.end()));
        }
        StringBuilder newText = new StringBuilder(text);
        // Replace from the last match backwards so earlier offsets stay valid.
        for (int i = matchedWordSet.size() - 1; i >= 0; i--) {
            MatchedWord word = matchedWordSet.get(i);
            String wordToReplace = "<b>" + newText.substring(word.start, word.end) + "</b>";
            newText.replace(word.start, word.end, wordToReplace);
        }
        // A DataNode is written out unescaped, so the <b> tags survive as markup.
        // Caveat: any literal '<' or '&' in the original text is not re-escaped either.
        return new DataNode(newText.toString(), textNode.baseUri());
    }
    // Start (inclusive) and end (exclusive) offsets of one regex match.
    class MatchedWord {
        public int start;
        public int end;

        public MatchedWord(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }
}
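As background on why the code in the question shows literal tags: Element.text(String) stores its argument as plain text, so jsoup escapes characters such as < and > when the document is serialized, whereas the class above emits its markup through a DataNode, which is written out as-is. The sketch below only imitates that escaping step in plain Java; EscapeSketch and escapeText are hypothetical names for illustration, not jsoup calls.

```java
final class EscapeSketch {
    // Hypothetical helper: imitates how an HTML serializer escapes plain text.
    static String escapeText(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String wrapped = "find <span>abc</span> here";
        // Treated as text: the browser renders the tag characters literally.
        System.out.println(escapeText(wrapped)); // find &lt;span&gt;abc&lt;/span&gt; here
        // Treated as markup: the browser sees a real <span> element.
        System.out.println(wrapped);             // find <span>abc</span> here
    }
}
```

In other words, the wrapped string has to reach the document as markup rather than as text for the highlight to render.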
To get the highlighted HTML, create a Highlighter and call getHighlightedHtml():
Highlighter hl = new Highlighter("abc def", htmlString);
String newhtmlString = hl.getHighlightedHtml();
For the query "abc def", this highlights everything that matches the regex ((\b)abc(\b)|(\b)def[a-zA-Z0-9]*?(\b)), ignoring case: abc as a whole word, plus any word that starts with def.
You can change how the regex is built by modifying the buildRegexFromQuery() function.
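To experiment with the pattern without jsoup, here is a minimal stdlib-only sketch of the same idea: build an alternation of whole-word terms (the last term also matching as a prefix) and wrap each match in <b>. QueryRegexSketch, buildRegex and highlight are illustrative names, not part of the class above, and the sketch uses Matcher.appendReplacement instead of collecting offsets and replacing backwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QueryRegexSketch {
    // Builds "(\bterm1\b|...|\blastTerm[a-zA-Z0-9]*\b)" from a free-text query.
    static String buildRegex(String query) {
        // Punctuation and whitespace both act as term separators.
        String[] terms = query.trim().split("[\\p{Punct}\\s]+");
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            if (terms[i].isEmpty()) {
                continue; // e.g. a query that starts with punctuation
            }
            boolean last = (i == terms.length - 1);
            parts.add("\\b" + terms[i] + (last ? "[a-zA-Z0-9]*" : "") + "\\b");
        }
        // "(?!)" never matches, so an empty query highlights nothing.
        return parts.isEmpty() ? "(?!)" : "(" + String.join("|", parts) + ")";
    }

    // Wraps every case-insensitive match of the query in <b>...</b>.
    static String highlight(String text, String query) {
        Matcher m = Pattern.compile(buildRegex(query), Pattern.CASE_INSENSITIVE).matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the match from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement("<b>" + m.group() + "</b>"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("abc defined abcd DEF", "abc def"));
    }
}
```

The plain-text input here stands in for the content of a single text node; applying it to a full HTML string would also match inside tags and attributes, which is why the answer walks the parsed text nodes instead.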