How to create a web crawler in Java?
Question
Hi, I want to create a web crawler in Java that retrieves some data from a web page, like the title and description, and stores that data in a database.
Answer
If you want to roll your own, you can use the HttpClient included in the Android API (Apache HttpClient).
Example usage of HttpClient (you only need to parse out the fields you want, such as the title and description):
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class HttpTest {

    public static void main(String... args)
            throws ClientProtocolException, IOException {
        crawlPage("http://www.google.com/");
    }

    // Remember visited URLs so the same page is never crawled twice.
    static Set<String> checked = new HashSet<String>();

    private static void crawlPage(String url)
            throws ClientProtocolException, IOException {
        if (checked.contains(url))
            return;
        checked.add(url);
        System.out.println("Crawling: " + url);

        HttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(url); // use the parameter, not a hardcoded URL
        HttpResponse response = client.execute(request);
        Reader reader = null;
        try {
            reader = new InputStreamReader(response.getEntity().getContent());
            Links links = new Links();
            new ParserDelegator().parse(reader, links, true);
            for (String link : links.list)
                if (link.startsWith("http://"))
                    crawlPage(link);
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    // Collects the href of every <a> tag the parser encounters.
    static class Links extends HTMLEditorKit.ParserCallback {
        List<String> list = new LinkedList<String>();

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            if (t == HTML.Tag.A) {
                Object href = a.getAttribute(HTML.Attribute.HREF);
                if (href != null) // anchors without an href would cause an NPE
                    list.add(href.toString());
            }
        }
    }
}
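Since the question asks for the page title and description specifically, here is a minimal sketch of how those could be extracted with the same `HTMLEditorKit.ParserCallback` approach used above. The class name `PageMeta` is my own; the `<title>` text arrives via `handleText` between the title's start and end tags, while `<meta>` is an empty element and is reported through `handleSimpleTag`:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class PageMeta extends HTMLEditorKit.ParserCallback {
    String title = "";
    String description = "";
    private boolean inTitle = false;

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.TITLE)
            inTitle = true;
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TITLE)
            inTitle = false;
    }

    @Override
    public void handleText(char[] data, int pos) {
        // Text reported while inside <title>...</title> is the page title.
        if (inTitle)
            title = new String(data);
    }

    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // <meta name="description" content="..."> is an empty tag,
        // so it is delivered here rather than to handleStartTag.
        if (t == HTML.Tag.META) {
            Object name = a.getAttribute(HTML.Attribute.NAME);
            Object content = a.getAttribute(HTML.Attribute.CONTENT);
            if (name != null && "description".equalsIgnoreCase(name.toString())
                    && content != null) {
                description = content.toString();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Hello</title>"
                + "<meta name=\"description\" content=\"A test page\">"
                + "</head><body>body text</body></html>";
        PageMeta meta = new PageMeta();
        Reader reader = new StringReader(html);
        new ParserDelegator().parse(reader, meta, true);
        System.out.println(meta.title);
        System.out.println(meta.description);
    }
}
```

Inside the crawler above, you would pass a `PageMeta` instance to `ParserDelegator.parse` alongside (or instead of) `Links`, then write `meta.title` and `meta.description` to your database.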