How to create a web crawler in Java?

Problem description

Hi, I want to create a web crawler in Java that retrieves some data, such as the title and description, from a web page and stores that data in a database.

Recommended answer

If you want to write your own, use the HttpClient included in the Android API.

Example usage of HttpClient (you only need to parse out the :

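import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class HttpTest {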
    public static void main(String... args) 
    throws ClientProtocolException, IOException {
        crawlPage("http://www.google.com/");
    }

    static Set<String> checked = new HashSet<String>();

    private static void crawlPage(String url) throws ClientProtocolException, IOException {

        if (checked.contains(url))
            return;

        checked.add(url);

        System.out.println("Crawling: " + url);

        HttpClient client = new DefaultHttpClient();
        // Request the URL passed to this method.
        HttpGet request = new HttpGet(url);
        HttpResponse response = client.execute(request);

        Reader reader = null;
        try {
            reader = new InputStreamReader(response.getEntity().getContent());

            Links links = new Links();
            new ParserDelegator().parse(reader, links, true);

            for (String link : links.list) 
                if (link.startsWith("http://"))
                    crawlPage(link);

        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }



    static class Links extends HTMLEditorKit.ParserCallback {

        List<String> list = new LinkedList<String>();

        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            if (t == HTML.Tag.A) {
                // Anchors without an href attribute yield null here.
                Object href = a.getAttribute(HTML.Attribute.HREF);
                if (href != null)
                    list.add(href.toString());
            }
        }
    }
}
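
The example above only collects links, while the question also asks for the page title and description. A minimal sketch of one way to pick those up with the same Swing ParserDelegator is shown here. The class name PageInfo and its fields are hypothetical, and it assumes the description lives in a <meta name="description"> tag; storing the values in a database is left out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the <title> text and the
// <meta name="description"> content of a single page.
class PageInfo extends HTMLEditorKit.ParserCallback {

    String title = "";
    String description = "";
    private boolean inTitle = false;

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TITLE)
            inTitle = true;
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TITLE)
            inTitle = false;
    }

    @Override
    public void handleText(char[] data, int pos) {
        // Only keep text that appears between <title> and </title>.
        if (inTitle)
            title += new String(data);
    }

    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // <meta> has no closing tag, so it arrives as a simple tag.
        if (t == HTML.Tag.META) {
            Object name = a.getAttribute(HTML.Attribute.NAME);
            Object content = a.getAttribute(HTML.Attribute.CONTENT);
            if (name != null && content != null
                    && "description".equalsIgnoreCase(name.toString()))
                description = content.toString();
        }
    }

    // Parses the HTML from the reader and returns the collected values.
    static PageInfo parse(Reader reader) throws IOException {
        PageInfo info = new PageInfo();
        new ParserDelegator().parse(reader, info, true);
        return info;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Example Page</title>"
                + "<meta name=\"description\" content=\"A short summary\">"
                + "</head><body><p>Hello</p></body></html>";
        PageInfo info = parse(new StringReader(html));
        System.out.println(info.title + " | " + info.description);
    }
}
```

In the crawler, the same callbacks could be merged into the Links class so that one parse pass yields the links, title and description together, since a Reader can only be consumed once.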
