Why is this regex not giving the expected output?
Question
I have a string containing the value shown below. I want to replace the HTML img tags that contain a specific customerId with some new text. I tried a small Java program, but it does not give the output I expected. Here are the details.
My input string is:
String inputText = "Starting here.. <img src=\"getCustomers.do?custCode=2&customerId=3334&param1=123/></p>"
+ "<p>someText</p><img src=\"getCustomers.do?custCode=2&customerId=3340&param2=456/> ..Ending here";
The regex is:
String regex = "(?s)\\<img.*?customerId=3340.*?>";
The new text I want to put inside the input string:
EDIT start:
String newText = "<img src=\"getCustomerNew.do\">";
EDIT end:
Now I am doing:
String outputText = inputText.replaceAll(regex, newText);
The output is:
Starting here.. Replacing Text ..Ending here
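But my expected output is:
Starting here.. <img src=\"getCustomers.do?custCode=2&customerId=3334&param1=123/></p><p>someText</p>Replacing Text ..Ending here
Note that in my expected output only the img tag containing customerId=3340 is replaced with the replacement text. I don't understand why both img tags are replaced in the output.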
Answer
As other people have told you in the comments, HTML is not a regular language, so manipulating it with regex is usually painful. Your best option is to use an HTML parser. I haven't used Jsoup before, but after a little googling it seems you need something like:
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.*;

public class MyJsoupExample {
    public static void main(String[] args) {
        String inputText = "<html><head></head><body><p><img src=\"getCustomers.do?custCode=2&customerId=3334&param1=123\"/></p>"
                + "<p>someText <img src=\"getCustomers.do?custCode=2&customerId=3340&param2=456\"/></p></body></html>";
        Document doc = Jsoup.parse(inputText);
        // Select every img whose src attribute contains "customerId=3340"
        Elements myImgs = doc.select("img[src*=customerId=3340]");
        for (Element element : myImgs) {
            // Single-argument constructor (Jsoup 1.11.1+); older versions use new TextNode(text, "")
            element.replaceWith(new TextNode("my replaced text"));
        }
        System.out.println(doc.toString());
    }
}
Basically, the code gets the list of img nodes whose src attribute contains the given string:
Elements myImgs = doc.select("img[src*=customerId=3340]");
Then it loops over the list and replaces those nodes with some text.
Update
If you don't want to replace the whole img node with text, but instead need to give its src attribute a new value, you can replace the body of the for loop with:
element.attr("src", "my new value");
Or, if you want to change just a part of the src value, you can do:
String srcValue = element.attr("src");
element.attr("src", srcValue.replace("getCustomers.do", "getCustomerNew.do"));
This is very similar to what I posted in this thread.
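As for why the original regex replaced both tags: a lazy `.*?` only matches as little as possible from where the match starts. The match starts at the first `<img`, and nothing stops `.*?` from running past that tag's `>` until it reaches `customerId=3340` in the second tag, so one match covers both. The sketch below uses only `java.util.regex` and compares the question's pattern with a bounded variant that uses `[^>]*` and so cannot leave the current tag. The class name `ImgRegexDemo`, the constant names and the `replace` helper are made up for illustration, and the input has the closing quotes on `src` added. It is still a workaround rather than a replacement for a parser: a `>` inside an attribute value would break it, and `customerId=3340` would also match `customerId=33401`.

```java
import java.util.regex.Matcher;

public class ImgRegexDemo {

    // Pattern from the question: the lazy ".*?" can still cross a '>' into the next tag.
    public static final String CROSSING = "(?s)<img.*?customerId=3340.*?>";

    // Assumed alternative: "[^>]*" never matches '>', so the match stays inside one tag.
    public static final String BOUNDED = "<img[^>]*customerId=3340[^>]*>";

    // Hypothetical helper: replaces every match, treating the replacement as literal text.
    public static String replace(String regex, String input, String replacement) {
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String inputText = "Starting here.. <img src=\"getCustomers.do?custCode=2&customerId=3334&param1=123\"/></p>"
                + "<p>someText</p><img src=\"getCustomers.do?custCode=2&customerId=3340&param2=456\"/> ..Ending here";
        String newText = "<img src=\"getCustomerNew.do\">";

        // One match spans from the first <img to the end of the second tag.
        System.out.println(replace(CROSSING, inputText, newText));

        // Only the tag that contains customerId=3340 is replaced.
        System.out.println(replace(BOUNDED, inputText, newText));
    }
}
```

With the bounded pattern, the first img tag and the `<p>someText</p>` between the tags are left untouched.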