Jsoup: save content into the database

Question

I have an array of URLs, and I want to store the information I read from each URL in a database. My problem is that the list is very large, so reading the URLs one after another, from top to bottom, and storing each result in the database takes a long time.

I know threads can be used for this, but I do not know how. Please help me, either with threads or with whatever other method you would use.

try {
    String lstUrls = "http://www.java2s.com/Tutorials/Java/Scala/index.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0020__Scala_Variables.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0040__Scala_Variable_Declarations.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0060__Scala_Semicolons.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0080__Scala_Code_Blocks.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0090__Scala_Comments.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0100__Scala_Type_Hierarchy.htm\n";
    String[] urls = lstUrls.split("\n");
    for (String url : urls) {
        Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get();
        Elements select = doc.select("div.row");
        String html = select.html();
        System.out.println(html);
        /*
         insert html to database
         */
    }
} catch (IOException ex) {
    ex.printStackTrace();
}


Recommended answer

To use multiple threads for retrieving the data, you can do something like this:

    Executor ex = Executors.newFixedThreadPool(3);
    String lstUrls = "http://www.java2s.com/Tutorials/Java/Scala/index.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0020__Scala_Variables.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0040__Scala_Variable_Declarations.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0060__Scala_Semicolons.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0080__Scala_Code_Blocks.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0090__Scala_Comments.htm\n"
            + "http://www.java2s.com/Tutorials/Java/Scala/0100__Scala_Type_Hierarchy.htm\n";
    String[] urls = lstUrls.split("\n");
    for (final String url : urls) {
        try {
            ex.execute(new Runnable() {
                @Override
                public void run() {
                    try {
                        Document doc = Jsoup
                                .connect(url)
                                .userAgent(
                                        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36")
                                .get();
                        Elements select = doc.select("div.row");
                        String html = select.html();
                        System.out.println(html);
                        /*
                         * insert html to database
                         */
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

This uses 3 threads to process the URLs concurrently. If you want more than 3 threads, change the argument in the line Executor ex = Executors.newFixedThreadPool(3); from 3 to whatever number you want.
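The snippet above submits the tasks but never shuts the pool down or waits for the results. Below is a minimal sketch of one way to handle both. It assumes the fetch step is passed in as a function (a stand-in for the Jsoup call, so the example runs without network access). The names ConcurrentFetch, fetchAll and fetcher are illustrative and do not come from the original answer.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ConcurrentFetch {

    /**
     * Fetches every URL on a fixed-size thread pool and returns url -> content.
     * The fetcher parameter is an assumption: it stands in for the Jsoup call.
     */
    static Map<String, String> fetchAll(List<String> urls,
                                        Function<String, String> fetcher,
                                        int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per URL; each Future will hold that page's content.
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (String url : urls) {
                Callable<String> task = () -> fetcher.apply(url);
                pending.put(url, pool.submit(task));
            }
            // Collect results in submission order; get() blocks until a task is done.
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> entry : pending.entrySet()) {
                try {
                    results.put(entry.getKey(), entry.getValue().get());
                } catch (ExecutionException e) {
                    // One failed page is logged and skipped; the others continue.
                    System.err.println("Failed: " + entry.getKey() + " (" + e.getCause() + ")");
                }
            }
            return results;
        } finally {
            // Without shutdown() the pool threads keep the JVM alive.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> urls = List.of(
                "http://www.java2s.com/Tutorials/Java/Scala/index.htm",
                "http://www.java2s.com/Tutorials/Java/Scala/0020__Scala_Variables.htm");
        // Placeholder fetcher so the sketch runs offline; swap in the Jsoup call in real use.
        Map<String, String> pages = fetchAll(urls, url -> "<div>content of " + url + "</div>", 3);
        pages.forEach((url, html) -> System.out.println(url + " -> " + html.length() + " chars"));
    }
}
```

In real use the fetcher would wrap the Jsoup chain from the answer (connect, userAgent, get, select, html) and rethrow IOException as an unchecked exception. Returning the pages to the calling thread also means the database step can run in one place instead of inside every worker.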

You can find more about executors in the Java documentation for the java.util.concurrent package.
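The code in this thread leaves the "insert html to database" step as a comment. One possible way to fill it in with plain JDBC is sketched below, assuming the fetched HTML has already been collected into a map from URL to HTML. The table pages with columns url and html, and the names PageStore and saveAll, are hypothetical and chosen only for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

class PageStore {

    // Hypothetical table, e.g. pages(url VARCHAR PRIMARY KEY, html TEXT).
    private static final String INSERT_SQL = "INSERT INTO pages (url, html) VALUES (?, ?)";

    /**
     * Inserts all pages as one JDBC batch inside one transaction.
     * Returns the number of rows that were queued for insertion.
     */
    static int saveAll(Connection conn, Map<String, String> pages) throws SQLException {
        // Commit once at the end instead of after every single row.
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int queued = 0;
            for (Map.Entry<String, String> page : pages.entrySet()) {
                ps.setString(1, page.getKey());   // url
                ps.setString(2, page.getValue()); // html
                ps.addBatch();
                queued++;
            }
            ps.executeBatch();
            conn.commit();
            return queued;
        } catch (SQLException e) {
            // Undo the partial batch so the table is not left half-filled.
            conn.rollback();
            throw e;
        }
    }
}
```

A caller would obtain the Connection from DriverManager.getConnection with the JDBC URL of its own database. Doing the inserts from a single thread in one batch avoids sharing a Connection between worker threads, which many drivers do not handle well.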
