HTML Parsing - Get data from a table inside a div?


Question


I am relatively new to the whole idea of HTML parsing/scraping. I was hoping that I could come here to get the help that I need!

Basically, what I am looking to do (I think) is specify the URL of the page I wish to grab the data from. In this case: http://www.epgpweb.com/guild/us/Caelestrasz/Crimson/

From there, I want to grab the table with class=listing inside the div with id=snapshot_table.

I then wish to embed that table in my own page and have it update whenever the original content is updated.

I have read a few of the other posts on Google and Stack Overflow, and I also had a look at a tutorial on Nettuts+, but it all seemed to be a bit too much to take in at once.

Hopefully someone here can help me out and make this as simple as possible :)

Cheers,

Mat

--Edit--

Current code as of 11:22am (GMT+10)

<?php
    # don't forget the library
    include('simple_html_dom.php');
?>
<html>
<head>
</head>
<body>
<?php
    $html = file_get_html('http://www.epgpweb.com/guild/us/Caelestrasz/Crimson/');
    $table = $html->find('#snapshot_table table.listing');
    print_r($table);
?>
</body>
</html>
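For reference, the same selection can also be sketched with PHP's bundled DOM extension instead of simple_html_dom. This is only an illustration: the `$sampleHtml` markup below is made up and stands in for the fetched page, and the XPath expression is a rough equivalent of the CSS selector `#snapshot_table table.listing`. It only helps if the table is actually present in the HTML that the server returns.

```php
<?php
// Made-up sample markup standing in for the fetched page; the structure of
// the real page is an assumption here.
$sampleHtml = <<<'MARKUP'
<html><body>
<div id="snapshot_table">
  <table class="listing">
    <tr><th>Member</th><th>EP</th></tr>
    <tr><td>Alice</td><td>100</td></tr>
  </table>
</div>
</body></html>
MARKUP;

$doc = new DOMDocument();
// Collect parser warnings internally; real-world HTML is rarely fully valid.
libxml_use_internal_errors(true);
$doc->loadHTML($sampleHtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Rough XPath equivalent of the CSS selector "#snapshot_table table.listing".
$query = '//*[@id="snapshot_table"]'
       . '//table[contains(concat(" ", normalize-space(@class), " "), " listing ")]';
$tables = $xpath->query($query);

$tableHtml = '';
if ($tables !== false && $tables->length > 0) {
    // saveHTML() with a node argument serialises only that node.
    $tableHtml = $doc->saveHTML($tables->item(0));
}
echo $tableHtml;
```

With simple_html_dom itself, `find()` called with only a selector returns an array of matching elements, so the first match has to be picked out before it can be printed.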

Solution

I think I got it to work, and I learned a lot! :)

<?php
//Get the current timestamp
$url = 'http://www.epgpweb.com/api/snapshot/us/Caelestrasz/Crimson';
$url = file_get_contents($url);
$url = substr($url,-12,10); 

//Get the member data based on the timestamp
$url = 'http://www.epgpweb.com/api/snapshot/us/Caelestrasz/Crimson/'.$url;
$url = file_get_contents($url);

//Convert \uXXXX unicode escape sequences to UTF-8 characters, as I found here: http://stackoverflow.com/questions/2934563/how-to-decode-unicode-escape-sequences-like-u00ed-to-proper-utf-8-encoded-char
function replace_unicode_escape_sequence($match) {
    return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
}
$url = preg_replace_callback('/\\\\u([0-9a-f]{4})/i', 'replace_unicode_escape_sequence', $url);

//erase/replace the insignificant parts, to put the data into an array
//(str_replace applies the searches in order, just like the explode/implode pairs did)
$url = str_replace(array("[[", "]]", "],"), ";", $url);
$url = str_replace(array('[', '"'), "", $url);
$url = str_replace(":", ",", $url);
$url = explode(";", $url);

//lose the front and end bits, and maintain the member data
array_shift($url);
array_pop($url);

//put the data into an array
$data = array();
foreach($url as $k=>$v){
    $v = explode(",",$v);
    foreach($v as $k2=>$v2){
        $data[$k][$k2] = $v2;
    }
    //PR = EP / GP, formatted to three decimals; guard against a GP of zero
    $gp = intval($data[$k][2]);
    $pr = ($gp != 0) ? intval($data[$k][1]) / $gp : 0;
    $data[$k][3] = sprintf('%.3f', $pr);
}

//sort the array by PR number (highest first)
function compare($x, $y)
{
    if ($x[3] == $y[3]) {
        return 0;
    } else if ($x[3] > $y[3]) {
        return -1;
    } else {
        return 1;
    }
}
usort($data, 'compare');

//output the data into a table
echo "<table><tbody><tr><th>Member</th><th>EP</th><th>GP</th><th>PR</th></tr>";
foreach($data as $k=>$v){
    echo "<tr>";
    foreach($v as $v2){ 
        echo "<td>".htmlspecialchars($v2, ENT_QUOTES, 'UTF-8')."</td>";
    }
    echo "</tr>";
}
echo "</tbody></table>";
?>
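Since the string juggling above suggests the API response is JSON, a possibly simpler route is `json_decode()`, which also decodes `\uXXXX` escapes on its own. The sketch below is a minimal illustration under stated assumptions: the payload is hard-coded, and the key name `roster` and the `[name, EP, GP]` row layout are hypothetical, not the documented format of the epgpweb API.

```php
<?php
// Hypothetical payload: the key "roster" and the [name, EP, GP] row layout
// are assumptions for illustration, not the real API format.
$json = '{"roster": [["Alice", 1200, 400], ["Zo\u00eb", 900, 150], ["Bob", 500, 0]]}';

// json_decode() turns \uXXXX escapes into UTF-8 characters by itself.
$payload = json_decode($json, true);
if (!is_array($payload) || !isset($payload['roster'])) {
    die('Unexpected response format');
}

$rows = array();
foreach ($payload['roster'] as $member) {
    list($name, $ep, $gp) = $member;
    // PR = EP / GP; guard against a GP of zero.
    $pr = ($gp != 0) ? $ep / $gp : 0;
    $rows[] = array($name, (int) $ep, (int) $gp, $pr);
}

// Sort by PR, highest first.
usort($rows, function ($a, $b) {
    if ($a[3] == $b[3]) {
        return 0;
    }
    return ($a[3] > $b[3]) ? -1 : 1;
});

// Build the table, escaping every cell before it goes into the markup.
$out = '<table><tbody><tr><th>Member</th><th>EP</th><th>GP</th><th>PR</th></tr>';
foreach ($rows as $row) {
    $row[3] = sprintf('%.3f', $row[3]);
    $out .= '<tr>';
    foreach ($row as $cell) {
        $out .= '<td>' . htmlspecialchars((string) $cell, ENT_QUOTES, 'UTF-8') . '</td>';
    }
    $out .= '</tr>';
}
$out .= '</tbody></table>';
echo $out;
```

Sorting on the numeric PR and formatting it only at output time avoids comparing padded strings, and the real response would need to be inspected first to find the actual key names.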
