Get raw text from HTML


Question

I'm at quite a basic level of Android development.

I would like to get the text from a page such as "http://www.google.com". (The page I will be using contains only text, so no pictures or anything like that.) To be clear: I want to get the text written on a page into, e.g., a string in my application.

I tried this code, but I'm not even sure it does what I want.

URL url = new URL("http://www.google.com");
URLConnection connection = url.openConnection();
// Get the response     
BufferedReader rd = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line = "";

I can't get any text from it, though. How should I do this?

Recommended answer

In the sample code you gave, you are not even reading the response from the request. I would get the HTML with the following code:

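import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

// Open a connection to the page (these calls can throw IOException)
URL u = new URL("http://www.google.com");
URLConnection conn = u.openConnection();
BufferedReader in = new BufferedReader(
                        new InputStreamReader(
                            conn.getInputStream()));
StringBuilder buffer = new StringBuilder();
String inputLine;
// readLine() strips the line terminator, so add it back
while ((inputLine = in.readLine()) != null)
    buffer.append(inputLine).append('\n');
in.close();
System.out.println(buffer.toString());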
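The snippet above can be wrapped into a small helper so the stream is always closed, even when reading fails. This is only a minimal sketch: the class name `PageFetcher` and the methods `readAll` and `fetch` are hypothetical names chosen for illustration, and the page is assumed to be UTF-8 encoded.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class PageFetcher {

    // Reads everything from the reader, restoring the line breaks
    // that readLine() strips. The reader is closed when done.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Downloads the page at the given address as one string.
    // Assumption: the page is encoded as UTF-8.
    static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        return readAll(new InputStreamReader(
                conn.getInputStream(), StandardCharsets.UTF_8));
    }
}
```

A call such as `String html = PageFetcher.fetch("http://www.google.com");` would then return the page source. On Android, network calls like this have to run off the main thread (otherwise a `NetworkOnMainThreadException` is thrown), and the app needs the `INTERNET` permission in its manifest.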

From there you would need to pass the string into some kind of HTML parser if you want only the text. From what I've heard, JTidy would be a good library for this; however, I have never used any Java HTML parsing libraries.
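To illustrate the parsing step, here is a rough sketch that strips the tags with the HTML parser bundled in the desktop JDK (`javax.swing.text.html`). It is not JTidy, and the Swing classes are not available on Android, where something like `android.text.Html.fromHtml` or a third-party parser would be needed instead. The class name `HtmlText` and the method `extract` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class; desktop JDK only, not usable on Android.
class HtmlText {

    // Returns the text content of an HTML string with the tags removed.
    static String extract(String html) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of text between tags. A space is
                // appended so neighbouring runs do not fuse together
                // (this also splits words that contain inline tags).
                text.append(data).append(' ');
            }
        };
        // Third argument: ignore any charset declaration in the document,
        // since the input is already a decoded string.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        // Collapse runs of whitespace into single spaces.
        return text.toString().trim().replaceAll("\\s+", " ");
    }
}
```

Passing the downloaded page source to `HtmlText.extract(...)` would give back only the visible text, which matches the case described in the question where the page contains nothing but text.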
