Extract CSS Styles from HTML using Jsoup in Java

Question

Can anyone help with extracting CSS styles from HTML using Jsoup in Java? For example, in the HTML below I want to extract .ft00 and .ft01.

<HTML>
<HEAD>
<TITLE>Page 1</TITLE>

<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<DIV style="position:relative;width:931;height:1243;">
<STYLE type="text/css">
<!--
    .ft00{font-size:11px;font-family:Times;color:#ffffff;}
    .ft01{font-size:11px;font-family:Times;color:#ffffff;}
-->
</STYLE>
</HEAD>
</HTML>

Recommended answer

If the style is inline on an element (in its style attribute), you just have to call .attr("style") on that Element.
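Note that .attr("style") returns the whole declaration string as-is, e.g. "position:relative;width:931;height:1243;" for the DIV above. A minimal stdlib-only sketch of splitting such a string into individual properties follows. It assumes no value contains a semicolon (for instance inside a quoted string or a url(...)), and the class name InlineStyle is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class InlineStyle {
    // Splits an inline style string (such as the value of element.attr("style"))
    // into property -> value pairs, keeping source order.
    // Assumption: no value contains ';' (quoted strings, url(...) with ';' are not handled).
    static Map<String, String> parse(String style) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String decl : style.split(";")) {
            int colon = decl.indexOf(':');
            if (colon < 0) continue; // skip empty or malformed entries
            // CSS property names are case-insensitive, so normalize them
            String name = decl.substring(0, colon).trim().toLowerCase();
            String value = decl.substring(colon + 1).trim();
            if (!name.isEmpty()) props.put(name, value);
        }
        return props;
    }

    public static void main(String[] args) {
        // The style attribute of the <DIV> in the question
        String style = "position:relative;width:931;height:1243;";
        System.out.println(parse(style));
    }
}
```

Running it prints {position=relative, width=931, height=1243}. Splitting at the first colon only means values that themselves contain a colon (such as url(http://...)) are kept intact.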

Jsoup is not an HTML renderer, only an HTML parser: it does not interpret CSS, so you have to parse the text of the retrieved <style> element yourself. A simple regular expression is enough for flat rules like the ones in the question, but it won't work in all cases (for example comments, nested at-rules such as @media, or compound selectors). For anything beyond simple cases, consider a dedicated CSS parser.

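import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Test {
    public static void main(String[] args) {
        String html = "<HTML>\n" +
                "<HEAD>\n"+
                "<TITLE>Page 1</TITLE>\n"+
                "<META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n"+
                "<DIV style=\"position:relative;width:931;height:1243;\">\n"+
                "<STYLE type=\"text/css\">\n"+
                "<!--\n"+
                "    .ft00{font-size:11px;font-family:Times;color:#ffffff;}\n"+
                "    .ft01{font-size:11px;font-family:Times;color:#ffffff;}\n"+
                "-->\n"+
                "</STYLE>\n"+
                "</HEAD>\n"+
                "</HTML>";

        Document doc = Jsoup.parse(html);
        // Take the first <style> element; its CSS is stored as raw data, not as child elements
        Element style = doc.select("style").first();
        if (style == null) {
            System.out.println("No <style> element found");
            return;
        }
        // Match ".className { declarations }" (flat class rules only)
        Matcher cssMatcher = Pattern.compile("[.](\\w+)\\s*[{]([^}]+)[}]").matcher(style.data());
        while (cssMatcher.find()) {
            // group(1) = class name, group(2) = declaration block
            System.out.println("Style `" + cssMatcher.group(1) + "`: " + cssMatcher.group(2));
        }
    }
}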

This prints:

Style `ft00`: font-size:11px;font-family:Times;color:#ffffff;
Style `ft01`: font-size:11px;font-family:Times;color:#ffffff;
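To illustrate the "won't work in all cases" caveat, here is a slightly more tolerant stdlib-only sketch (the class name SimpleCssRules is hypothetical). It takes the text of the <style> element, strips /* */ comments and the <!-- --> wrappers, accepts any selector rather than only .class names, and splits comma-separated selector lists. It is still not a CSS parser: it assumes flat rules, with no nested at-rules such as @media and no braces inside strings, and a selector that appears twice keeps only its last block.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleCssRules {
    // Matches "selector { declarations }" for flat (non-nested) rules.
    private static final Pattern RULE = Pattern.compile("([^{}]+)\\{([^{}]*)\\}");

    // Returns selector -> declaration block, in source order.
    // Assumption: no nested blocks (@media, @supports) and no braces inside strings.
    static Map<String, String> parse(String css) {
        // Drop /* ... */ comments and the legacy <!-- --> wrappers.
        String cleaned = css.replaceAll("(?s)/\\*.*?\\*/", "")
                .replace("<!--", "")
                .replace("-->", "");
        Map<String, String> rules = new LinkedHashMap<>();
        Matcher m = RULE.matcher(cleaned);
        while (m.find()) {
            String body = m.group(2).trim();
            // "h1, .ft00 { ... }" applies the same block to each selector.
            for (String selector : m.group(1).split(",")) {
                String s = selector.trim();
                if (!s.isEmpty()) rules.put(s, body); // later duplicates overwrite earlier ones
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        String css = "<!--\n"
                + "    /* page 1 fonts */\n"
                + "    .ft00{font-size:11px;font-family:Times;color:#ffffff;}\n"
                + "    .ft01, .ft02 { font-size:12px; }\n"
                + "-->";
        parse(css).forEach((sel, body) -> System.out.println(sel + " -> " + body));
    }
}
```

For the sample in main this prints one line each for .ft00, .ft01 and .ft02, the last two sharing the block font-size:12px;.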