Can't read in HTML content from valid URL


Problem description

I am trying out a simple program that reads the HTML content from a given URL. The URL I am using in this case doesn't require any cookie, username, or password, but I still get a java.io.IOException: Server returned HTTP response code: 403 error. Can anyone tell me what I am doing wrong here? (I know there are similar questions on SO, but they didn't help):

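    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.net.URLConnection;

    public class urlcont {
        public static void main(String[] args) {
            try {
                URL u = new URL("http://www.amnesty.org/");
                URLConnection uc = u.openConnection();
                // Send a browser-like User-Agent header with the request
                uc.addRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
                uc.connect();
                File f = new File("C:\\Users\\kausta\\Desktop\\urlcont.txt");
                // try-with-resources closes both streams even if the copy fails;
                // FileOutputStream creates the file if it does not exist yet
                try (InputStream in = uc.getInputStream();
                     OutputStream s = new FileOutputStream(f)) {
                    int b;
                    // Copy the response body to the file one byte at a time
                    while ((b = in.read()) != -1) {
                        s.write(b);
                    }
                }
            } catch (MalformedURLException e) {
                System.err.println(e);
            } catch (IOException e) {
                System.err.println(e);
            }
        }
    }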

Recommended answer

If you can fetch the URL in a browser but not via Java, that suggests to me that the site is blocking programmatic access to the page via user-agent filtering. Try setting the user-agent on your connection so that your code appears, to the web server, to be a web browser.

See this thread for help on that: What is the proper way of setting headers in a URLConnection?
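
As a rough illustration of the suggestion above, the following sketch sets a User-Agent header before the connection is opened and checks the HTTP status code before reading the body. The class name BrowserLikeFetch, the helper methods open and fetch, and the particular User-Agent string are all assumptions made for this example, and the body is assumed to be UTF-8 text. Whether a given server accepts the request still depends on its own filtering rules, so this is not a guaranteed fix.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class BrowserLikeFetch {
    // Hypothetical browser-style value; substitute the string your own browser sends.
    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

    // Creates the connection object and sets the header. No network traffic
    // happens here; the request is only sent once the response is asked for.
    static HttpURLConnection open(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    // Returns the response body as text, or throws an IOException that
    // includes the status code when the server does not answer 200 OK.
    static String fetch(URL url) throws IOException {
        HttpURLConnection conn = open(url, USER_AGENT);
        int status = conn.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            throw new IOException("Server returned HTTP " + status + " for " + url);
        }
        StringBuilder body = new StringBuilder();
        // Assumes the page is UTF-8 encoded
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) {
        try {
            System.out.println(fetch(new URL("http://www.amnesty.org/")));
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}
```

Checking getResponseCode() first makes it easier to see exactly what the server answered (403, a redirect, and so on) instead of only catching the exception thrown by getInputStream().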
