Load time: is it quicker to parse HTML with PHP's DOMDocument or with Regular Expressions?

Question

I'm pulling images from my Flickr account into my website, and I used about nine lines of code built around preg_match_all to pull the images.

I've read several times that it is better to parse HTML through DOM.

Personally, I've found it more complicated to parse HTML through DOM. I wrote a similar function to pull the images with PHP's DOMDocument, and it's about 22 lines of code. It took a while to create, and I'm not sure what the benefit was.

The page loads in about the same time with either version, so I'm not sure why I would use DOMDocument.

Does DOMDocument work faster than preg_match_all?

I'll show you my code, if you're interested (you can see how lengthy the DOMDocument code is):

//here's the URL
$flickrGallery = 'http://www.flickr.com/photos/***/collections/***/';

//below is the DOMDocument method
$flickr = new DOMDocument();
$flickr->validateOnParse = true;
$flickr->loadHTMLFile($flickrGallery);
$elements = $flickr->getElementById('ViewCollection')->getElementsByTagName('div');
$flickr = array();
for($i=0;$i<$elements->length;$i++){
    if($elements->item($i)->hasAttribute('class')&&$elements->item($i)->getAttribute('class')=='setLinkDiv'){
        $flickr[] = array(
                          'href' => $elements->item($i)->getElementsByTagName('a')->item(0)->getAttribute('href'), 
                          'src' => $elements->item($i)->getElementsByTagName('img')->item(0)->getAttribute('src'), 
                          'title' => $elements->item($i)->getElementsByTagName('img')->item(0)->getAttribute('alt')
                          );
    }
}
$elements = NULL;
foreach($flickr as $k=>$v){
    $setQuery = explode("/",$flickr[$k]['href']);
    $setQuery = $setQuery[4];
    echo '<a href="?set='.$setQuery.'"><img src="'.$flickr[$k]['src'].'" title="'.$flickr[$k]['title'].'" width=75 height=75 /></a>';
}
$flickr = NULL;

//preg_match_all code is below

$sets = file_get_contents($flickrGallery);
preg_match_all('/(class="setLink" href="(.*?)".*?class="setThumb" src="(.*?)".*?alt="(.*?)")+/s',$sets,$sets,PREG_SET_ORDER);
foreach($sets as $k=>$v){
    $setQuery = explode("/",$sets[$k][2]);
    $setQuery = $setQuery[4];
    echo '<a href="?set='.$setQuery.'"><img src="'.$sets[$k][3].'" title="'.$sets[$k][4].'" width=75 height=75 /></a>';
}
$sets = NULL;

Solution

If you're willing to sacrifice correctness for speed, then go ahead and try to roll your own parser with regular expressions.
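
If speed really is the deciding factor, it is worth measuring the parsing step on its own rather than judging by page load, since both versions in the question also fetch the page over the network and that fetch probably dominates. The following is only a rough sketch: the markup in $html is made up for illustration (it is not the real Flickr page), and averageMs is a hypothetical helper, not a built-in.

```php
<?php
// Hypothetical sample markup; the real Flickr page is not reproduced here.
$html = '<div id="ViewCollection">'
      . str_repeat(
          '<div class="setLinkDiv"><a class="setLink" href="/photos/user/sets/123/">'
          . '<img class="setThumb" src="thumb.jpg" alt="A set" /></a></div>',
          200
      )
      . '</div>';

// Hypothetical helper: run a callable several times, return the average in milliseconds.
function averageMs(callable $fn, int $runs = 50): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (microtime(true) - $start) * 1000 / $runs;
}

// Regex approach: count the links by pattern matching on the raw string.
$viaRegex = function () use ($html): int {
    return preg_match_all('/class="setLink" href="(.*?)"/s', $html, $m);
};

// DOM approach: parse the string into a tree, then query it.
$viaDom = function () use ($html): int {
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    return $xpath->query('//a[@class="setLink"]')->length;
};

printf("regex: %d matches, %.3f ms\n", $viaRegex(), averageMs($viaRegex));
printf("DOM:   %d matches, %.3f ms\n", $viaDom(), averageMs($viaDom));
```

Whatever numbers this prints on your machine, compare them with how long file_get_contents or loadHTMLFile takes to download the page before drawing conclusions.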

You say "Personally, I've found it more complicated to parse HTML through DOM." Are you optimizing for correctness of results, or how easy it is for you to write the code?

If all you want is speed and code that's not complicated, why not just use this:

$array_of_photos = Array( 'booger.jpg', 'aunt-martha-on-a-horse.png' );

or maybe just

$array_of_photos = Array();

Those run in constant time, and they're easy to understand. No problem, right?

What's that? You want accurate results? Then don't parse HTML with regular expressions.
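
To make the accuracy point concrete, here is a small sketch with two made-up snippets that describe the same link. Only the attribute order, quoting style, and whitespace differ. The pattern is written in the spirit of the one in the question, and hrefViaRegex and hrefViaDom are hypothetical helper names.

```php
<?php
// Two hypothetical snippets for the same link; a browser treats them identically.
$variants = [
    '<a class="setLink" href="/photos/user/sets/123/">Set</a>',
    "<a href='/photos/user/sets/123/'\n   class='setLink'>Set</a>",
];

// Regex in the spirit of the question: assumes class comes first, double quotes only.
function hrefViaRegex(string $html): ?string
{
    return preg_match('/class="setLink" href="(.*?)"/s', $html, $m) ? $m[1] : null;
}

// DOM version: attribute order and quoting are handled by the parser.
function hrefViaDom(string $html): ?string
{
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->getAttribute('class') === 'setLink') {
            return $a->getAttribute('href');
        }
    }
    return null;
}

foreach ($variants as $html) {
    var_dump(hrefViaRegex($html), hrefViaDom($html));
}
```

The regex finds the href only in the first snippet and returns null for the second, while the DOM version returns the same href for both. If the site ever changes its markup in a similar harmless way, the regex version silently returns nothing.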

Finally, when you're working with a parser like DOM, you're working with a piece of code that has been well-tested and debugged for years. When you're writing your own regular expressions to do the parsing, you're working with code that you're going to have to write, test and debug yourself. Why would you not want to work with the tools that many people have been using for many years? Do you think you can do a better job yourself on the fly?
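
The DOM route also does not have to take 22 lines. As a rough sketch, DOMXPath can replace the manual loop and class check with a single query. The markup below is invented, modelled on the id and class names used in the question, so adjust the expressions to whatever the real page contains.

```php
<?php
// Hypothetical markup modelled on the id and class names used in the question.
$html = <<<HTML
<div id="ViewCollection">
  <div class="setLinkDiv">
    <a class="setLink" href="/photos/user/sets/123/"><img class="setThumb" src="a.jpg" alt="First set"></a>
  </div>
  <div class="other">ignored</div>
  <div class="setLinkDiv">
    <a class="setLink" href="/photos/user/sets/456/"><img class="setThumb" src="b.jpg" alt="Second set"></a>
  </div>
</div>
HTML;

libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$sets = [];
// One query selects every matching div inside the container.
foreach ($xpath->query('//*[@id="ViewCollection"]//div[@class="setLinkDiv"]') as $div) {
    $sets[] = [
        // string(...) yields the first matching attribute value, or '' if there is none
        'href'  => $xpath->evaluate('string(.//a/@href)', $div),
        'src'   => $xpath->evaluate('string(.//img/@src)', $div),
        'title' => $xpath->evaluate('string(.//img/@alt)', $div),
    ];
}

print_r($sets);
```

For a live page you would swap loadHTML($html) for loadHTMLFile($flickrGallery), as in the question. The extraction itself is about ten lines, close to the length of the regex version.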
