Exceptions while extracting data from a website

Question

I am using Jsoup to extract data from a website by zip code. The zip codes are read from a text file and the results are written to the console. I have around 1,500 zip codes. The program throws two kinds of exceptions:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=500, URL=http://www.moving.com/real-estate/city-profile/...

java.net.SocketTimeoutException: Read timed out

I thought the solution was to fetch only a few zip codes at a time. So I used a counter: after every 200 zip codes read from the text file, the program pauses for 5 minutes. As I said, I still get the exceptions. So far, when I see an exception, I copy and paste the data collected so far and then restart with the remaining zip codes. But I want to read all the data without interruptions. Is this possible? Any hint will be appreciated. Thank you in advance!

This is my code for reading all data:

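    // readLine() returns null at end of file; br.ready() only reports whether
    // input is available right now, so it is not a reliable end-of-file test
    String s;
    while ((s = br.readLine()) != null)
    {
        count++;

        String str = "http://www.moving.com/real-estate/city-profile/results.asp?Zip=" + s;
        Document doc = Jsoup.connect(str).get();

        // Look for the "Per capita income" row in every data table
        for (Element table : doc.select("table.DataTbl"))
        {
            for (Element row : table.select("tr"))
            {
                Elements tds = row.select("td");
                // tds.get(2) requires at least three cells
                if (tds.size() > 2 && tds.get(0).text().contains("Per capita income"))
                    System.out.println(s + "," + tds.get(2).text());
            }
        }

        // Pause for 5 minutes after every 200 zip codes
        if (count % 200 == 0)
        {
            System.out.println("Stopped for 5 minutes");
            Thread.sleep(300000);
        }
    }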
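
The loop above ends at the first exception because nothing catches it inside the loop. A minimal sketch of one alternative, assuming the fetch-and-parse work for a single zip code is moved into its own function: catch IOException per zip code (both org.jsoup.HttpStatusException and java.net.SocketTimeoutException are subclasses of IOException), record the failed zip code, and keep going. The names ZipBatch, ZipFetcher and fetchAll are hypothetical, and the fetcher is left abstract so the sketch depends only on the JDK.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ZipBatch {
    // Hypothetical stand-in for "fetch and parse one zip code" (for example the
    // Jsoup code from the question); it may throw any IOException.
    interface ZipFetcher {
        String fetch(String zip) throws IOException;
    }

    // Fetches each zip code once. A failure is logged and the zip code is added
    // to `failed` instead of ending the run, so the remaining zip codes still run.
    static Map<String, String> fetchAll(List<String> zips, ZipFetcher fetcher, List<String> failed) {
        Map<String, String> results = new LinkedHashMap<>(); // keeps input order
        for (String zip : zips) {
            try {
                results.put(zip, fetcher.fetch(zip));
            } catch (IOException e) {
                // HttpStatusException (e.g. status 500) and SocketTimeoutException both land here
                System.err.println("Skipping " + zip + ": " + e);
                failed.add(zip);
            }
        }
        return results;
    }
}
```

Once the first pass finishes, the zip codes collected in the failed list can be passed to fetchAll again, for example after waiting a few minutes, instead of copying partial output by hand.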


Answer

Change the line Document doc = Jsoup.connect(str).get(); so that it sets a longer timeout on the connection:

        Connection conn = Jsoup.connect(str);
        conn.timeout(300000); //5 minutes
        Document doc = conn.get();
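
A longer timeout only lets each request wait longer; if the server answers with HTTP 500, the request still fails. A common complement is to retry a failed request a few times with a growing delay (exponential backoff). The following is a hedged, JDK-only sketch of that idea; Retry, IoCall and withRetry are hypothetical names, and the attempt count and delays are placeholders to tune, not values known to suit this site.

```java
import java.io.IOException;

class Retry {
    // Hypothetical wrapper for one request that may fail with an IOException.
    interface IoCall<T> {
        T call() throws IOException;
    }

    // Runs the call up to maxAttempts times, sleeping between attempts and
    // doubling the delay each time. Rethrows the last IOException if every
    // attempt fails, so the caller can still decide to skip that zip code.
    static <T> T withRetry(IoCall<T> call, int maxAttempts, long initialDelayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: propagate the last failure
                }
                System.err.println("Attempt " + attempt + " failed (" + e + "), retrying in " + delay + " ms");
                Thread.sleep(delay);
                delay *= 2; // exponential backoff
            }
        }
    }
}
```

In the question's loop the fetch could then be written as Retry.withRetry(() -> Jsoup.connect(str).timeout(60000).get(), 3, 2000); because the last IOException is still rethrown, this can be combined with catching the exception per zip code.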
