Reading a website's contents into a string
Question
I'm currently working on a class that reads the contents of the website specified by a URL. I'm just beginning my adventures with java.io and java.net, so I'd like feedback on my design.
Usage:
TextURL url = new TextURL(urlString);
String contents = url.read();
My code:
package pl.maciejziarko.util;

import java.io.*;
import java.net.*;

public final class TextURL
{
    private static final int BUFFER_SIZE = 1024 * 10;
    private static final int ZERO = 0;
    private final byte[] dataBuffer = new byte[BUFFER_SIZE];
    private final URL urlObject;

    public TextURL(String urlString) throws MalformedURLException
    {
        this.urlObject = new URL(urlString);
    }

    public String read()
    {
        final StringBuilder sb = new StringBuilder();
        try
        {
            final BufferedInputStream in =
                new BufferedInputStream(urlObject.openStream());
            int bytesRead = ZERO;
            while ((bytesRead = in.read(dataBuffer, ZERO, BUFFER_SIZE)) >= ZERO)
            {
                sb.append(new String(dataBuffer, ZERO, bytesRead));
            }
        }
        catch (UnknownHostException e)
        {
            return null;
        }
        catch (IOException e)
        {
            return null;
        }
        return sb.toString();
    }

    //Usage:
    public static void main(String[] args)
    {
        try
        {
            TextURL url = new TextURL("http://www.flickr.com/explore/interesting/7days/");
            String contents = url.read();
            if (contents != null)
                System.out.println(contents);
            else
                System.out.println("ERROR!");
        }
        catch (MalformedURLException e)
        {
            System.out.println("Check you the url!");
        }
    }
}
My question is: Is this a good way to achieve what I want? Are there any better solutions?
I particularly don't like sb.append(new String(dataBuffer, ZERO, bytesRead)); but I wasn't able to express it in a different way. Is it good to create a new String on every iteration? I suppose not.
Are there any other weak points?
Thanks in advance!
Recommended answer
Consider using URLConnection instead. Furthermore, you might want to leverage IOUtils from Apache Commons IO to make the string reading easier too. For example:
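URL url = new URL("http://www.example.com/");
URLConnection con = url.openConnection();
InputStream in = con.getInputStream();
// The charset is a parameter of the Content-Type header, e.g. "text/html; charset=UTF-8",
// so it has to be parsed out. getContentEncoding() is the wrong call here: it returns the
// Content-Encoding header (e.g. "gzip"). Needs java.util.regex.Matcher and java.util.regex.Pattern.
String contentType = con.getContentType();
Matcher m = Pattern.compile("charset=\"?([^\";\\s]+)", Pattern.CASE_INSENSITIVE).matcher(contentType == null ? "" : contentType);
String encoding = m.find() ? m.group(1) : "UTF-8"; // fall back to UTF-8 when no charset is declared
String body = IOUtils.toString(in, encoding);
System.out.println(body);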
If you don't want to use IOUtils, I'd probably rewrite the IOUtils.toString line above as something like:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buf = new byte[8192];
int len = 0;
while ((len = in.read(buf)) != -1) {
    baos.write(buf, 0, len);
}
String body = new String(baos.toByteArray(), encoding);
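On the concern about creating a new String on every iteration: another JDK-only option is to decode through an InputStreamReader and append the decoded chars straight into a StringBuilder. Decoding per byte chunk with new String(...) can also garble a multi-byte character that happens to be split across two reads, which a Reader avoids. The following is a minimal sketch under assumptions: the class name StreamText and the method readAll are hypothetical helpers, not part of the question's code or of any library, and the demo reads from an in-memory stream instead of a real connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the class and method names are illustrative only.
final class StreamText {

    private StreamText() {}

    // Reads the whole stream and decodes it with the given charset.
    // The stream is closed when reading finishes or fails.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[8192];
            int len;
            while ((len = reader.read(buf)) != -1) {
                // Append decoded chars directly; no intermediate String per chunk.
                sb.append(buf, 0, len);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for con.getInputStream(), so the demo needs no network.
        byte[] bytes = "na\u00efve caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
    }
}
```

With a URLConnection con as in the snippets above, the call would look like StreamText.readAll(con.getInputStream(), Charset.forName(encoding)). Throwing IOException to the caller, rather than returning null as the question's read() does, keeps the cause of a failure visible.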