Remove HTML tags from a string using Java

Views: 1107 | Posted: 2018/6/15 12:05:06 | Tags: java, html, string


Question

I am writing a program that reads emails and separates spam from ham. Right now I read the input with Java's BufferedReader class. I can already remove unwanted characters such as '(' or '.' with the replaceAll() method. I also want to remove HTML tags, including entities such as &amp;. How can I achieve this?

thanks

EDIT: Thanks for the responses, but I already have a regex. How can I combine both needs into a single expression? Here is the regex I am using now:
lines.replaceAll("[^a-zA-Z]", " ")
Note: I am reading the lines from a .txt file. Any other suggestions?

Solution

Maybe this will work:
String noHTMLString = htmlString.replaceAll("\\<.*?>","");
It uses a regular expression to remove all HTML tags from a string.

More specifically, it removes every XML-like tag from the string, so <1234> will be removed even though it is not a valid HTML tag. Still, it is good enough for most intents and purposes.

Hope this helps.
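
To address the EDIT, here is a minimal sketch of how the tag removal and the existing [^a-zA-Z] rule might be combined into one expression. The class name HtmlStripper, the method clean, and the entity sub-pattern are illustrative assumptions, not part of the original answer. It uses <[^>]*> instead of <.*?> so that the tag part does not depend on how '.' treats line breaks, and it drops entities such as &amp; rather than decoding them.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
final class HtmlStripper {

    // One or more consecutive "junk" pieces, where each piece is one of:
    //   <[^>]*>              a tag-like run from '<' to the next '>'
    //   &name; &#39; &#x27;  a character entity (the trailing ';' is required)
    //   [^a-zA-Z]            any single non-letter (the rule from the question)
    private static final Pattern JUNK = Pattern.compile(
            "(?:<[^>]*>"
            + "|&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);"
            + "|[^a-zA-Z])+");

    // Replaces every run of junk with a single space and trims the ends.
    static String clean(String line) {
        return JUNK.matcher(line).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // prints "Buy now save"
        System.out.println(clean("<p>Buy <b>now</b> &amp; save 50%!</p>"));
    }
}
```

Each run is replaced with a space rather than an empty string so that words separated only by a tag (for example foo<br>bar) do not get glued together. Like the one-liner above, this treats anything between '<' and '>' as a tag, and a tag split across two lines of the file will not be caught when lines are cleaned one at a time. For messy real-world HTML a proper HTML parser is likely to be more robust than any regex.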
