在Javascript中解析HTML的最佳方法 [英] Best way to parse HTML in Javascript

查看:173
本文介绍了在Javascript中解析HTML的最佳方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在学习RegExp时遇到了很多麻烦,并提出了一个很好的算法来做到这一点。我有这个HTML字符串,我需要解析。请注意,当我解析它时,它仍然是一个字符串对象,而不是浏览器上的HTML,因为我需要在它到达之前解析它。 HTML如下所示:

I am having a lot of trouble learning RegExp and coming up with a good algorithm to do this. I have this string of HTML that I need to parse. Note that when I am parsing it, it is still a string object and not yet HTML on the browser as I need to parse it before it gets there. The HTML looks like this:

<html>
  <head>
    <title>Geoserver GetFeatureInfo output</title>
  </head>
  <style type="text/css">
    table.featureInfo, table.featureInfo td, table.featureInfo th {
        border:1px solid #ddd;
        border-collapse:collapse;
        margin:0;
        padding:0;
        font-size: 90%;
        padding:.2em .1em;
    }
    table.featureInfo th {
        padding:.2em .2em;
        font-weight:bold;
        background:#eee;
    }
    table.featureInfo td{
        background:#fff;
    }
    table.featureInfo tr.odd td{
        background:#eee;
    }
    table.featureInfo caption{
        text-align:left;
        font-size:100%;
        font-weight:bold;
        text-transform:uppercase;
        padding:.2em .2em;
    }
  </style>

  <body>
    <table class="featureInfo2">
    <tr>
        <th class="dataLayer" colspan="5">Tibetan Villages</th>
    </tr>
    <!-- EOF Data Layer -->
    <tr class="dataHeaders">
        <th>ID</th>
        <th>Latitude</th>
        <th>Longitude</th>
        <th>Place Name</th>
        <th>English Translation</th>
    </tr>
    <!-- EOF Data Headers -->
    <!-- Data -->
    <tr>
    <!-- Feature Info Data -->
        <td>3394</td>
        <td>29.1</td>
        <td>93.15</td>
        <td>བསྡམས་གྲོང་ཚོ།</td>
        <td>Dam Drongtso </td>
    </tr>
    <!-- EOF Feature Info Data -->
    <!-- End Data -->
    </table>
    <br/>
  </body>
</html>

我需要这样:

3394,
29.1,
93.15,
བསྡམས་གྲོང་ཚོ།,
Dam Drongtso

基本上是一个数组...如果它根据其字段标题匹配更好,以及它们以某种方式来自哪个表,看起来像这样:

Basically an array...even better if it matches according to its field headers and from which table they are somehow, which look like this:

Tibetan Villages

ID
Latitude
Longitude
Place Name
English Translation

查找JavaScript不支持精彩映射是一个无赖我有我想要的工作。然而,它是非常非常硬编码,我想我应该使用RegExp来更好地处理这个问题。不幸的是,我有一个非常艰难的时间:(。这是我解析我的字符串的函数(非常丑陋的IMO):

Finding out JavaScript does not support wonderful mapping was a bummer and I have what I want working already. However it is VERY VERY hard coded and I'm thinking I should probably use RegExp to handle this better. Unfortunately I am having a real tough time :(. Here is my function to parse my string (very ugly IMO):

    function parseHTML(html){

    //Getting the layer name
    alert(html);
    //Lousy attempt at RegExp
    var somestring = html.replace('/m//\<html\>+\<body\>//m/',' ');
    alert(somestring);
    var startPos = html.indexOf('<th class="dataLayer" colspan="5">');
    var length = ('<th class="dataLayer" colspan="5">').length;
    var endPos = html.indexOf('</th></tr><!-- EOF Data Layer -->');
    var dataLayer = html.substring(startPos + length, endPos);

    //Getting the data headers
    startPos = html.indexOf('<tr class="dataHeaders">');
    length = ('<tr class="dataHeaders">').length;
    endPos = html.indexOf('</tr><!-- EOF Data Headers -->');
    var newString = html.substring(startPos + length, endPos);
    newString = newString.replace(/<th>/g, '');
    newString = newString.substring(0, newString.lastIndexOf('</th>'));
    var featureInfoHeaders = new Array();
    featureInfoHeaders = newString.split('</th>');

    //Getting the data
    startPos = html.indexOf('<!-- Data -->');
    length = ('<!-- Data -->').length;
    endPos = html.indexOf('<!-- End Data -->');
    newString = html.substring(startPos + length, endPos);
    newString = newString.substring(0, newString.lastIndexOf('</tr><!-- EOF Feature Info Data -->'));
    var featureInfoData = new Array();
    featureInfoData = newString.split('</tr><!-- EOF Feature Info Data -->');

    for(var s = 0; s < featureInfoData.length; s++){
        startPos = featureInfoData[s].indexOf('<!-- Feature Info Data -->');
        length = ('<!-- Feature Info Data -->').length;
        endPos = featureInfoData[s].lastIndexOf('</td>');
        featureInfoData[s] = featureInfoData[s].substring(startPos + length, endPos);
        featureInfoData[s] = featureInfoData[s].replace(/<td>/g, '');
        featureInfoData[s] = featureInfoData[s].split('</td>');
    }//end for

    alert(featureInfoData);

    //Put all the feature info in one array
    var featureInfo = new Array();
    var len = featureInfoData.length;
    for(var j = 0; j < len; j++){
        featureInfo[j] = new Object();
        featureInfo[j].id = featureInfoData[j][0];
        featureInfo[j].latitude = featureInfoData[j][1];
        featureInfo[j].longitude = featureInfoData[j][2];
        featureInfo[j].placeName = featureInfoData[j][3];
        featureInfo[j].translation = featureInfoData[j][4];
        }//end for 

    //This can be ignored for now...
        var string = redesignHTML(featureInfoHeaders, featureInfo);
        return string;

    }//end parseHTML

因此你可以看到内容在那个字符串发生变化的过程中,我的代码将被严重破坏。我想尽可能地避免这种情况,并尝试编写更好的代码。我感谢您给我的所有帮助和建议。

So as you can see if the content in that string ever changes, my code will be horribly broken. I want to avoid that as much as possible and try to write better code. I appreciate all the help and advice you can give me.

推荐答案

您可以使用 jQuery 轻松遍历DOM并自动创建具有该结构的对象。

You can use jQuery to easily traverse the DOM and create an object with the structure automatically.

var $dom = $('<html>').html(the_html_string_variable_goes_here);
var featureInfo = {};

$('table:has(.dataLayer)', $dom).each(function(){
    var $tbl = $(this);
    var section = $tbl.find('.dataLayer').text();
    var obj = [];
    var $structure = $tbl.find('.dataHeaders');
    var structure = $structure.find('th').map(function(){return $(this).text().toLowerCase();});
    var $datarows= $structure.nextAll('tr');
    $datarows.each(function(i){
        obj[i] = {};
        $(this).find('td').each(function(index,element){
            obj[i][structure[index]] = $(element).text();
        });
    });
    featureInfo[section] = obj;
});

工作演示

代码可以使用内部具有不同结构的多个表。 。还有每个表中的多个数据行..

The code can work with multiple tables with different structures inside.. and also multiple data rows inside each table..

featureInfo将保存最终的结构和数据,并且可以像

The featureInfo will hold the final structure and data, and can be accessed like

alert( featureInfo['Tibetan Villages'][0]['English Translation'] );

alert( featureInfo['Tibetan Villages'][0].id );

这篇关于在Javascript中解析HTML的最佳方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆