How to extract web page textual content in Java?
Question
I am looking for a way to extract the text from a web page (initially HTML) using the JDK or another library. Please help.
Thanks
Recommended answer
Use an HTML parser if at all possible; there are many available for Java.
Or you can use regex, as many people do. This is generally not advisable, however, unless the processing you need is very simple.
- Java HTML Parsing
- Which Html Parser is best?
- Any good Java HTML parsers?
- recommendations for a java HTML parser/editor
- What HTML parsing libraries do you recommend in Java
Text extraction:
- Text Extraction from HTML Java
- Text extraction with java html parsers
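As a rough illustration of the parser-based approach, here is a minimal sketch that uses only the HTML parser bundled with the JDK (`javax.swing.text.html.parser.ParserDelegator`). The class name `HtmlTextExtractor` and the method `extractText` are made up for this example, and fetching the page is assumed to be a separate step that already produced an HTML string. The Swing parser is old and lenient, so a dedicated third-party parser will likely cope better with modern pages.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of any library.
public class HtmlTextExtractor {

    /** Collects the text chunks reported by the JDK's built-in HTML parser. */
    public static String extractText(String html) throws IOException {
        StringBuilder text = new StringBuilder();

        // The parser calls handleText once for each run of text it finds.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append(' ');
            }
        };

        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset declared inside the document itself
            new ParserDelegator().parse(reader, callback, true);
        }

        // Collapse runs of whitespace so the result reads as one line of text.
        return text.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><h1>Hello</h1><p>Some <b>bold</b> text.</p></body></html>";
        System.out.println(extractText(html));
    }
}
```

The callback only overrides `handleText`, so tags and attributes are dropped. Anything the parser reports as text is kept as-is, so real pages may need extra filtering.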
Tag stripping:
- Stripping HTML tags in Java
- How to strip HTML attributes except "src" and "alt" in JAVA
- Removing HTML from a Java String
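For the very simple cases where regex is tolerable, tag stripping might look like the sketch below. The class name `NaiveTagStripper` and the method `stripTags` are invented for illustration. The pattern is deliberately naive: it does not decode entities such as `&amp;`, it keeps the contents of `<script>` and `<style>` elements, and it breaks on a `>` inside an attribute value, which is why a parser is the safer choice.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
public class NaiveTagStripper {

    // Matches anything that looks like a tag: '<', any run of non-'>' characters, then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Replaces each tag with a space, then collapses whitespace. */
    public static String stripTags(String html) {
        String noTags = TAG.matcher(html).replaceAll(" ");
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello, <b>world</b>!</p>"));
        // prints "Hello, world !" (each removed tag leaves a space behind)
    }
}
```

Replacing tags with a space instead of an empty string keeps words from adjacent elements from running together, at the cost of stray spaces around inline tags.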