How do I load a local html file into Jsoup?

Question

I can't seem to load a local html file using the Jsoup library, or at the very least Jsoup doesn't seem to recognise it. I hardcoded the exact html from the local file (as the variable 'html'), and when I parse that instead of the file input the code works perfectly. The file is reported as readable in both cases.

import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FileHtmlParser {

    public String input;

    // constructor
    public FileHtmlParser(String inputFile) {
        input = inputFile;
    }

    // methods
    public FileHtmlParser execute() {

        File file = new File(input);
        System.out.println("The file can be read: " + file.canRead());

        String html = "<html><head><title>First parse</title><meta>106</meta> <meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" /></head>"
                + "<body><p>Parsed HTML into a doc.</p>"
                + "<div id=\"navbar\">this is the div</div></body></html>";
        Document doc = Jsoup.parseBodyFragment(input);

        Elements content = doc.getElementsByTag("div");
        if (content.hasText()) {
            System.out.println("result is " + content.outerHtml());
        } else {
            System.out.println("nothing!");
        }

        return this;
    }

} /*endOfClass*/

Result when:
Document doc = Jsoup.parseBodyFragment(html)

The file can be read: true
result is <div id="navbar">
this is the div
</div>

Result when:
Document doc = Jsoup.parseBodyFragment(input)

The file can be read: true
nothing!


Recommended answer

Your mistake is in assuming that Jsoup.parseBodyFragment() can tell whether you're passing it the name of a file that contains html markup or a string that contains the html markup itself.

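Jsoup.parseBodyFragment(input) expects input to be a String that contains html markup, not a filename. Given a filename, it parses the path text itself as the body content, so the resulting document contains no div elements and the code prints "nothing!".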
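
To make the difference concrete, here is a minimal sketch that feeds both kinds of string to Jsoup.parseBodyFragment() and counts the div elements found. The class name FragmentVersusPath, the helper countDivs and the path "pages/first-parse.html" are made up for illustration; the file is never opened, which is the point.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FragmentVersusPath {

    // Counts the <div> elements jsoup finds when the given string is parsed as markup.
    static int countDivs(String markup) {
        Document doc = Jsoup.parseBodyFragment(markup);
        return doc.getElementsByTag("div").size();
    }

    public static void main(String[] args) {
        // Hypothetical path, used only for illustration; nothing is read from disk.
        String path = "pages/first-parse.html";
        String html = "<div id=\"navbar\">this is the div</div>";

        // The path is treated as plain text inside <body>, so no div is found.
        System.out.println("divs from path string: " + countDivs(path));
        // The markup string is parsed as html, so the div is found.
        System.out.println("divs from html string: " + countDivs(html));
        // The path characters themselves end up as the body text.
        System.out.println("body text from path string: "
                + Jsoup.parseBodyFragment(path).body().text());
    }
}
```

The path string yields a document whose body holds only that text, which matches the "nothing!" output in the question.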

To parse from a file, use the Jsoup.parse(File in, String charsetName) method instead. Passing null as the charset name lets Jsoup detect the encoding from the document, falling back to UTF-8:

File in = new File(input);
Document doc = Jsoup.parse(in, null);
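
For a version that runs on its own, the sketch below writes markup similar to the question's into a temporary file and then loads it with Jsoup.parse(File, String). The class name LocalFileExample, the helper divsFromFile and the temporary file are assumptions made for the example; in the asker's code the File would simply be built from input. Note that this overload throws IOException, so the caller has to declare or handle it.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class LocalFileExample {

    // Parses the given file and returns the outer HTML of its <div> elements.
    static String divsFromFile(File in) throws IOException {
        // null charset: jsoup detects the encoding, falling back to UTF-8.
        Document doc = Jsoup.parse(in, null);
        Elements content = doc.getElementsByTag("div");
        return content.outerHtml();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the local file: a temporary file with similar markup.
        File in = File.createTempFile("first-parse", ".html");
        in.deleteOnExit();
        String html = "<html><head><title>First parse</title></head>"
                + "<body><p>Parsed HTML into a doc.</p>"
                + "<div id=\"navbar\">this is the div</div></body></html>";
        Files.write(in.toPath(), html.getBytes(StandardCharsets.UTF_8));

        System.out.println("result is " + divsFromFile(in));
    }
}
```

The exact whitespace of the printed div depends on the jsoup version's pretty-printing, so the output may be laid out differently from the result shown in the question.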
