Why is this regex not giving the expected output?

Question

I have a string containing some values, as given below. I want to replace the HTML img tags containing a specific customerId with some new text. I tried a small Java program, but it does not give me the expected output. Here are the details of the program.

My input string is:

 String inputText = "Starting here.. <img src=\"getCustomers.do?custCode=2&customerId=3334&param1=123/></p>"
    + "<p>someText</p><img src=\"getCustomers.do?custCode=2&customerId=3340&param2=456/> ..Ending here";

The regex is:

  String regex = "(?s)\\<img.*?customerId=3340.*?>";

The new text I want to put into the input string:

Edit start:

String newText = "<img src=\"getCustomerNew.do\">";

Edit end:

Now I am doing:

  String outputText = inputText.replaceAll(regex, newText);

The output is:

 Starting here.. Replacing Text ..Ending here

But my expected output is:

 Starting here.. <img src="getCustomers.do?custCode=2&customerId=3334&param1=123/></p><p>someText</p>Replacing Text ..Ending here

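Note that in my expected output, only the img tag containing customerId=3340 is replaced with the replacement text. I don't understand why both img tags are replaced in the actual output.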

Recommended answer

As other people have told you in the comments, HTML is not a regular language, so manipulating it with regex is usually painful. Your best option is to use an HTML parser. I haven't used Jsoup before, but after a little googling it seems you need something like:

import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.*;

public class MyJsoupExample {
    public static void main(String[] args) {
        String inputText = "<html><head></head><body><p><img src=\"getCustomers.do?custCode=2&customerId=3334&param1=123\"/></p>"
            + "<p>someText <img src=\"getCustomers.do?custCode=2&customerId=3340&param2=456\"/></p></body></html>";
        Document doc = Jsoup.parse(inputText);
        // Select every img whose src attribute contains "customerId=3340".
        Elements myImgs = doc.select("img[src*=customerId=3340]");
        for (Element element : myImgs) {
            // Recent jsoup versions take only the text here; older releases
            // used a two-argument (text, baseUri) constructor instead.
            element.replaceWith(new TextNode("my replaced text"));
        }
        System.out.println(doc.toString());
    }
}

Basically, the code gets the list of img nodes whose src attribute contains a given string:

Elements myImgs = doc.select("img[src*=customerId=3340]");

It then loops over the list and replaces those nodes with some text.

Update

If you don't want to replace the whole img node with text, but instead need to give its src attribute a new value, you can replace the body of the for loop with:

element.attr("src", "my new value");

Or, if you want to change just a part of the src value, you can do:

String srcValue = element.attr("src");
element.attr("src", srcValue.replace("getCustomers.do", "getCustomerNew.do"));

This is very similar to what I posted in this thread.
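
As for why the original regex replaces both tags: the lazy `.*?` after `<img` is allowed to match any character, including the `>` that closes the first tag. The match therefore starts at the first `<img` and stretches until it reaches `customerId=3340` in the second tag. The sketch below reproduces this with the input from the question and compares it with a tighter pattern that uses `[^>]*` so the match cannot leave the current tag. The class name `ImgRegexDemo`, the constant names, and the `replace` helper are illustrative, and the tighter pattern assumes that no attribute value contains a literal `>`. It is a workaround, not a substitute for an HTML parser.

```java
class ImgRegexDemo {
    // Input string from the question (the src attributes have no closing quote there).
    static final String INPUT =
        "Starting here.. <img src=\"getCustomers.do?custCode=2&customerId=3334&param1=123/></p>"
        + "<p>someText</p><img src=\"getCustomers.do?custCode=2&customerId=3340&param2=456/> ..Ending here";

    // Pattern from the question: ".*?" may cross the first tag's '>' to reach customerId=3340.
    static final String ORIGINAL = "(?s)\\<img.*?customerId=3340.*?>";

    // Illustrative alternative: "[^>]*" stops at the end of the current tag.
    static final String TIGHTER = "<img[^>]*customerId=3340[^>]*>";

    // Hypothetical helper: applies the given regex to the question's input.
    static String replace(String regex, String replacement) {
        return INPUT.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        // Both img tags and everything between them are replaced.
        System.out.println(replace(ORIGINAL, "Replacing Text"));
        // Only the img tag containing customerId=3340 is replaced.
        System.out.println(replace(TIGHTER, "Replacing Text"));
    }
}
```

Note that `customerId=3340` would also match a longer id such as `customerId=33401`. Appending a negative lookahead like `(?!\\d)` after the id is one way to guard against that.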
