How to read a text from a web page with Java?
Question
I want to read the text from a web page. I don't want to get the web page's HTML code. I found this code:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

try {
    // Create a URL for the desired page
    URL url = new URL("http://www.uefa.com/uefa/aboutuefa/organisation/congress/news/newsid=1772321.html#uefa+moving+with+tide+history");
    // Read all the text returned by the server
    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
    String str;
    while ((str = in.readLine()) != null) {
        // str is one line of text; readLine() strips the newline character(s)
        System.out.println(str);
    }
    in.close();
} catch (MalformedURLException e) {
    e.printStackTrace(); // the URL string is not well formed
} catch (IOException e) {
    e.printStackTrace(); // opening the connection or reading from it failed
}
But this code gives me the HTML code of the web page. I want to get all of the text inside this page. How can I do this with Java?
Recommended answer
You may want to have a look at jsoup for this:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html); // build a DOM from the HTML string
String text = doc.body().text(); // "An example link."
This example is an extract from one on their site.
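If adding a library is not an option, the same idea (parse the HTML, keep only the text nodes) can be sketched with the HTML parser that ships with the JDK in javax.swing.text.html. This is a minimal sketch rather than a replacement for jsoup: the class name PageText and the method extractText are made up for illustration, the page is assumed to be UTF-8 encoded, and the Swing parser is generally less tolerant of modern or malformed markup than jsoup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class; the names here are not from the original post.
public class PageText {

    // Parses HTML from the reader and returns only the text found between tags.
    static String extractText(Reader html) throws IOException {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once per run of text; a space keeps adjacent runs apart.
                out.append(data).append(' ');
            }
        };
        // true = ignore any charset declared inside the document itself
        new ParserDelegator().parse(html, callback, true);
        // Collapse runs of whitespace into single spaces.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No URL given: run on the HTML string from the answer instead.
            String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
            System.out.println(extractText(new StringReader(html)));
            return;
        }
        // Assumption: the page is UTF-8; a real client would honour the declared charset.
        try (Reader in = new BufferedReader(new InputStreamReader(
                URI.create(args[0]).toURL().openStream(), StandardCharsets.UTF_8))) {
            System.out.println(extractText(in));
        }
    }
}
```

Run it with the page address as the only argument to print that page's text, or with no argument to process the sample string. Because a space is appended after every text run, a tag placed in the middle of a word will split that word in the output; jsoup handles such cases more carefully.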