How to read the text from a web page with Java?

Question

I want to read the text from a web page, not the page's HTML code. I found this code:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;

    try {
        // Create a URL for the desired page
        URL url = new URL("http://www.uefa.com/uefa/aboutuefa/organisation/congress/news/newsid=1772321.html#uefa+moving+with+tide+history");

        // Read all the text returned by the server; try-with-resources closes the reader
        try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String str;
            // Call readLine() only once per iteration, otherwise every other line is skipped
            while ((str = in.readLine()) != null) {
                // str is one line of text; readLine() strips the newline character(s)
                System.out.println(str);
            }
        }
    } catch (IOException e) {
        // Also covers MalformedURLException, which is a subclass of IOException
        e.printStackTrace();
    }

But this code gives me the HTML code of the web page. I want to get all the text inside the page. How can I do this with Java?

Recommended answer

You might want to look at jsoup for this:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
    Document doc = Jsoup.parse(html);
    String text = doc.body().text(); // "An example link."

This example is taken from one on the jsoup website.
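
If adding a library is not an option, the JDK ships an old HTML parser in `javax.swing.text.html` that can do a rough version of the same job. The sketch below is a swapped-in technique, not part of the answer above: it feeds HTML to `ParserDelegator` and collects only the character data passed to `HTMLEditorKit.ParserCallback.handleText`, so tags never appear in the result. The class name `HtmlText` and the whitespace-collapsing step are illustrative choices, and this parser targets HTML 3.2, so expect it to be much less robust than jsoup on modern pages.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class: a JDK-only sketch, not the jsoup API
public class HtmlText {

    /** Collects the text chunks the parser reports, separated by spaces. */
    public static String extractText(Reader html) throws IOException {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for runs of character data; tags are never passed here
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(data);
            }
        };
        // true: ignore <meta> charset declarations instead of throwing ChangedCharSetException
        new ParserDelegator().parse(html, callback, true);
        // Collapse runs of whitespace into single spaces
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    /** Convenience overload for HTML already held in a String. */
    public static String extractText(String html) {
        try {
            return extractText(new StringReader(html));
        } catch (IOException e) {
            // StringReader does no real I/O, so this is not expected in practice
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
        System.out.println(extractText(html));
    }
}
```

To use it on a live page, the `new InputStreamReader(url.openStream())` from the question's code can be passed to `extractText(Reader)` instead of printing the lines one by one.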
