JMeter - regex in BeanShell (Matcher/Pattern) is cutting national characters
Question
I need to cut some words out of the server response data.
Using a Regular Expression Extractor I get:
<span class="snippet_word">Działalność</span> <span class="snippet_word">lecznicza</span>.</a>
From that I need just: "Działalność lecznicza"
So I wrote a BeanShell program that should do that, but there is a problem, because I get:
"lecznicza lecznicza"
Here is my program:
import java.util.regex;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String pattern = "\\w+(?=\\<)";
String co = vars.get("tresc");
int len = Integer.parseInt(vars.get("length"));
String phrase="";
StringBuffer sb = new StringBuffer();
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(co);
for(i=0; i < len ;i++){
if (m.find()){
strbuf = new StringBuffer(m.group(0));
}
else {
phrase="notfound";
}
sb.append(" ");
sb.append(strbuf);
}
phrase = sb.toString();
return phrase;
tresc is the source text from which I extract the words matching the pattern. length tells me how many words I am extracting.
The program works fine for phrases without national characters. That's why I think the problem is with encoding, or somewhere here:
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(co);
but I don't know how to change my code.
Recommended answer
By default, \w matches only ASCII word characters ([a-zA-Z_0-9]), so it does not match letters such as ł, ś or ć. To match any Unicode letter in a regex, you can use \p{L}:
String pattern = "\\p{L}+(?=\\<)";
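To see the difference between the two patterns, here is a minimal standalone Java sketch that runs both against the sample response from the question. The class name SnippetWords and its helper methods are made up for illustration, and the code runs outside JMeter, so there is no vars object. With \w, the only word that ends directly before a "<" is "lecznicza", so the second find() in the original loop fails and the previous match gets appended again, which would explain the "lecznicza lecznicza" result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of JMeter or BeanShell.
final class SnippetWords {
    // \p{L}+ matches a run of Unicode letters; the lookahead (?=<) requires
    // that run to end directly before a '<' (here: before a closing tag).
    static final Pattern LETTERS_BEFORE_TAG = Pattern.compile("\\p{L}+(?=<)");

    // The pattern from the question, kept for comparison.
    static final Pattern WORD_CHARS_BEFORE_TAG = Pattern.compile("\\w+(?=<)");

    // Collects every match of the pattern in the input, in order.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> words = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    // Joins all letter runs found before a tag with single spaces.
    static String extractPhrase(String input) {
        return String.join(" ", findAll(LETTERS_BEFORE_TAG, input));
    }

    public static void main(String[] args) {
        // The sample response from the question, written with Unicode escapes
        // so that the encoding of this source file does not matter.
        String tresc = "<span class=\"snippet_word\">Dzia\u0142alno\u015b\u0107</span> "
                + "<span class=\"snippet_word\">lecznicza</span>.</a>";

        // \w finds one word only, because the first word contains non-ASCII letters.
        System.out.println(findAll(WORD_CHARS_BEFORE_TAG, tresc));
        // \p{L} finds both words.
        System.out.println(extractPhrase(tresc));
    }
}
```

Looping with while (m.find()) also removes the need for a separate length variable, since the loop simply stops when there are no more matches.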
Although for this type of work I would recommend using an XML parser, since regular expressions are completely unsuitable for parsing HTML/XML, as described in this post.
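As a rough illustration of the parser approach, the sketch below uses the DOM parser that ships with the JDK. It assumes the fragment is well-formed XML: the sample in the question ends with a stray </a>, so the opening <a href="#"> used here is invented to make it parse, and a real HTML response may need an HTML-tolerant parser instead. The class name SnippetWordParser is also just a placeholder.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JMeter or BeanShell.
final class SnippetWordParser {
    // Returns the text of every <span class="snippet_word"> element,
    // joined with single spaces. The input must be well-formed XML.
    static String extractPhrase(String wellFormedXml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Parsing from a Reader avoids any byte-level encoding guesswork.
            Document doc = builder.parse(new InputSource(new StringReader(wellFormedXml)));

            NodeList spans = doc.getElementsByTagName("span");
            List<String> words = new ArrayList<>();
            for (int i = 0; i < spans.getLength(); i++) {
                Element span = (Element) spans.item(i);
                if ("snippet_word".equals(span.getAttribute("class"))) {
                    words.add(span.getTextContent());
                }
            }
            return String.join(" ", words);
        } catch (Exception e) {
            // Malformed input or parser setup failure.
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        // The opening <a href="#"> is an assumption added to make the
        // fragment from the question well-formed.
        String tresc = "<a href=\"#\"><span class=\"snippet_word\">Dzia\u0142alno\u015b\u0107</span> "
                + "<span class=\"snippet_word\">lecznicza</span>.</a>";
        System.out.println(extractPhrase(tresc));
    }
}
```

Selecting elements by tag name and class attribute means the result no longer depends on which characters the words contain.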