Importing URLs for JSOUP to Scrape via Spreadsheet

Question

I finally got IntelliJ to work, and the code below works perfectly. I need it to loop, pulling links from a spreadsheet so it can look up the price of each item. The spreadsheet has a few sample URLs in column C, starting at row 2. How can I have JSOUP use the URLs in this spreadsheet and then write the results to column D?

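import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Scraper {

    public static void main(String[] args) throws Exception {

        // download the page and parse it
        final Document document = Jsoup.connect("examplesite.com").get();

        // print the text of every element matching the price selector
        for (Element row : document.select("#price")) {

            final String price = row.select("#price").text();

            System.out.println(price);
        }
    }
}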

Thanks in advance for any help! Eric

Recommended answer

You can use the JExcel library to read and edit sheets: https://sourceforge.net/projects/jexcelapi/

The zip file you download with the library also contains a very useful tutorial.html.

The comments explain each step:

import java.io.File;
import java.io.IOException;

import jxl.Cell;
import jxl.CellType;
import jxl.Workbook;
import jxl.write.Label;
import jxl.write.WritableSheet;
import jxl.write.WritableWorkbook;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StackoverflowQuestion51577491 {

    private static final int URL_COLUMN = 2; // Column C
    private static final int PRICE_COLUMN = 3; // Column D

    public static void main(final String[] args) throws Exception {

        // open worksheet with URLs
        Workbook originalWorkbook = Workbook.getWorkbook(new File("O:/original.xls"));
        // create editable copy
        WritableWorkbook workbook = Workbook.createWorkbook(new File("O:/updated.xls"), originalWorkbook);
        // close read-only workbook as it's not needed anymore
        originalWorkbook.close();
        // get first available sheet
        WritableSheet sheet = workbook.getSheet(0);
        // skip title row 0
        int currentRow = 1;
        Cell cell;
        // iterate each cell from column C until we find an empty one
        while (!(cell = sheet.getCell(URL_COLUMN, currentRow)).getType().equals(CellType.EMPTY)) {
            // read cell contents
            String url = cell.getContents();
            System.out.println("parsing URL: " + url);
            // parse and get the price
            String price = parseUrlWithJsoupAndGetProductPrice(url);
            System.out.println("found price: " + price);
            // create new cell with price
            Label cellWithPrice = new Label(PRICE_COLUMN, currentRow, price);
            sheet.addCell(cellWithPrice);
            // go to next row
            currentRow++;
        }
        // save and close file
        workbook.write();
        workbook.close();
    }

    private static String parseUrlWithJsoupAndGetProductPrice(String url) throws IOException {
        // download page and parse it to Document
        Document doc = Jsoup.connect(url).get();
        // get the price from html
        return doc.select("#priceblock_ourprice").text();
    }
}
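
Two small details of the code above are easy to get wrong when adapting it: the column constants are zero-based (C is 2, D is 3), and jsoup rejects a URL that has no http:// or https:// prefix. The following is only a sketch of two helpers that might cover those cases. The class name SheetHelpers and both method names are made up for illustration and are not part of JExcel or jsoup, and defaulting to https:// for a cell without a scheme is an assumption that may not suit every site.

```java
import java.util.Locale;

// Hypothetical helper class; none of these names come from JExcel or jsoup.
final class SheetHelpers {

    private SheetHelpers() {
    }

    // Converts a spreadsheet column name such as "C" or "AA" to the
    // zero-based column index used by getCell(column, row) above.
    static int columnIndex(String letters) {
        String s = letters.trim().toUpperCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty column name");
        }
        int index = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("not a column name: " + letters);
            }
            // treat the letters as base-26 digits where A=1 ... Z=26
            index = index * 26 + (c - 'A' + 1);
        }
        return index - 1;
    }

    // Trims the cell text and prepends "https://" when no http/https
    // scheme is present (assumption: the site is reachable over https).
    static String normalizeUrl(String cellText) {
        String s = cellText.trim();
        if (s.isEmpty()) {
            return s;
        }
        String lower = s.toLowerCase(Locale.ROOT);
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return s;
        }
        return "https://" + s;
    }

    public static void main(String[] args) {
        System.out.println(columnIndex("C"));
        System.out.println(columnIndex("D"));
        System.out.println(normalizeUrl(" examplesite.com "));
    }
}
```

With helpers like these, the constants could be written as columnIndex("C") and columnIndex("D"), and the loop could pass normalizeUrl(cell.getContents()) to the parsing method instead of the raw cell text.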
