How to Enhance This? Get a Part of a Web Page in Another Domain

Views: 66 | Posted: 2018/6/26 20:42:29 | Tags: php, jquery, html, domdocument


Question

I have made this:

    <html>
    <head>
    <script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
    <script>
    $(document).ready(
        function()
        {
            $("body").html($("#HomePageTabs_cont_3").html());
        }
    );
    </script>
    </head>
    <body>
    <?php
        echo file_get_contents("http://www.bankasya.com.tr/index.jsp");
    ?>
    </body>
    </html>

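When I check my page with Firebug, it reports countless "missing file" errors (images, CSS files, JS files, etc.). I want only a part of the page, not all of it. This code does what I want, but I am wondering if there is a better way.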

EDIT:

The page does what I need. I do not need all of the content, so an iframe is useless to me. I just want the raw data of the div #HomePageTabs_cont_3.

Solution

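Your best bet is PHP server-side parsing. I have written a small snippet to show you how to do this using DOMDocument (and possibly tidy, if your server has it, to barf out all the malformed XHTML).

Caveat: the script outputs UTF-8. You can change this in the constructor of DOMDocument.

Caveat 2: it WILL barf out if its input is neither UTF-8 nor ISO-8859-9. The page's current charset is ISO-8859-9, and I see no reason why they would change this.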
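As an aside to the second caveat, before the full script: one way to sidestep the charset guessing is to normalize the fetched bytes to UTF-8 up front. This is a minimal sketch of that alternative, not what the script itself does (it relies on the XML declaration instead). The helper name `latin5ToUtf8` is hypothetical, and the sketch assumes the `iconv` extension is available and that the input really is ISO-8859-9.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the original answer): re-encodes
 * ISO-8859-9 (Turkish) bytes as UTF-8. Returns null if iconv fails.
 */
function latin5ToUtf8(string $bytes): ?string
{
    $utf8 = iconv('ISO-8859-9', 'UTF-8', $bytes);

    return $utf8 === false ? null : $utf8;
}

// Sample bytes only: in ISO-8859-9, "\xFD" is the dotless i,
// "\xFC" is u with diaeresis and "\xFE" is s with cedilla.
$raw = "Alt\xFDn / G\xFCm\xFC\xFE";
echo latin5ToUtf8($raw), PHP_EOL;
```

With the input normalized this way, the parser would only ever need to be told about UTF-8, at the cost of one extra pass over the data.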
    <?php
    header("content-type: text/html; charset=utf-8");
    $data = file_get_contents("http://www.bankasya.com.tr/index.jsp");

    // Clean it up
    if (class_exists("tidy")) {
        $dataTidy = new tidy();
        $dataTidy->parseString($data,
            array(
                "input-encoding" => "iso-8859-9",
                "output-encoding" => "iso-8859-9",
                "clean" => 1,
                "input-xml" => true,
                "output-xml" => true,
                "wrap" => 0,
                "anchor-as-name" => false
            )
        );
        $dataTidy->cleanRepair();
        $data = (string)$dataTidy;
    } else {
        // No tidy: strip every <script>...</script> block by hand
        $do = true;
        while ($do) {
            $start = stripos($data, '<script');
            $stop = stripos($data, '</script>');
            if ($start !== false && $stop !== false) {
                $data = substr($data, 0, $start).substr($data, $stop + strlen('</script>'));
            } else {
                $do = false;
            }
        }

        // nbsp breaks it?
        $data = str_replace("&nbsp;", " ", $data);

        // Fixes for any element that requires a self-closing tag
        if (preg_match_all("/<(link|img)([^>]+)>/is", $data, $mt, PREG_SET_ORDER)) {
            foreach ($mt as $v) {
                if (substr($v[2], -1) != "/") {
                    $data = str_replace($v[0], "<".$v[1].$v[2]."/>", $data);
                }
            }
        }

        // Barf out the inline JS
        $data = preg_replace("/javascript:[^;]+/is", "#", $data);

        // Barf out the noscripts
        $data = preg_replace("#<noscript>(.+?)</noscript>#is", "", $data);

        // Muppets. Malformed comment = one more regexp when they could just learn to write proper HTML...
        $data = preg_replace("#<!--(.*?)--!?>#is", "", $data);
    }

    $DOM = new \DOMDocument("1.0", "utf-8");
    $DOM->recover = true;

    // Turn parser warnings into exceptions so the fallback below can catch them
    function error_callback_xmlfunction($errno, $errstr) { throw new Exception($errstr); }
    $old = set_error_handler("error_callback_xmlfunction");

    // Throw out all the XML namespaces (if any)
    $data = preg_replace("#xmlns=[\"\']?([^\"\']+)[\"\']?#is", "", (string)$data);

    try {
        // First attempt: treat the input as UTF-8
        $DOM->loadXML(((substr($data, 0, 5) !== "<?xml") ? '<?xml version="1.0" encoding="utf-8"?>' : "").$data);
    } catch (Exception $e) {
        // Fallback: treat the input as ISO-8859-9
        $DOM->loadXML(((substr($data, 0, 5) !== "<?xml") ? '<?xml version="1.0" encoding="iso-8859-9"?>' : "").$data);
    }
    restore_error_handler();
    error_reporting(E_ALL);

    $DOM->substituteEntities = true;
    $xpath = new \DOMXPath($DOM);
    echo $DOM->saveXML($xpath->query("//div[@id=\"HomePageTabs_cont_3\"]")->item(0));

In order of appearance:

1. Fetch the data.
2. If we have tidy, sanitize the HTML with it.
3. Create a new DOMDocument and load our document ((string)$dataTidy is a shorthand tidy getter).
4. Create an XPath request path.
5. Use XPath to request all divs with the id we want, get the first item of the collection (->item(0), which will be a DOMElement), and ask the DOM to output its XML content (including the tag itself).

Hope it is what you're looking for... Though you might want to wrap it in a function.

Edit

Forgot to mention: http://rescrape.it/rs.php for the actual script output!

Edit 2

Correction: that site is not W3C-valid, so you'll either need to tidy it up or apply a set of regular expressions to the input before processing. I'm going to see if I can formulate a set to barf out the inconsistencies.

Edit 3

Added a fix for all those of us who do not have tidy.

Edit 4

Couldn't resist. If you'd actually like the values rather than the table, use this instead of the echo:

    $d = new stdClass();
    $rows = $xpath->query("//div[@id=\"HomePageTabs_cont_3\"]//tr");
    $rc = $rows->length;
    // Skips the first and last rows of the table
    for ($i = 1; $i < $rc-1; $i++) {
        $cols = $xpath->query($rows->item($i)->getNodePath()."/td");
        $d->{$cols->item(0)->textContent} = array(
            ((float)$cols->item(1)->textContent),
            ((float)$cols->item(2)->textContent)
        );
    }

I don't know about you, but for me, data works better than malformed tables.

(Welp, that one took a while to write)
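The answer suggests wrapping the extraction in a function. Here is a minimal sketch of what that could look like. Note that it swaps in a different technique from the script above: it uses `DOMDocument::loadHTML()` with `libxml_use_internal_errors()`, which tolerates malformed HTML, instead of tidy or regex cleanup followed by `loadXML()`. The function name `extractElementById` and the sample markup are hypothetical, and the sketch leaves charset handling aside (the sample is plain ASCII).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not from the original answer): returns the markup of
 * the first element whose id attribute equals $id, or null if none is found.
 */
function extractElementById(string $html, string $id): ?string
{
    // Keep the XPath expression simple: refuse ids that contain quotes.
    if (strpbrk($id, '"\'') !== false) {
        return null;
    }

    // Collect parser warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        return null;
    }

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Serialize only the matched element, including its own tag.
    $markup = $dom->saveHTML($nodes->item(0));

    return $markup === false ? null : $markup;
}

// Made-up sample input; the real page would come from file_get_contents().
$sample = '<html><body><div id="menu">skip me</div>'
        . '<div id="HomePageTabs_cont_3"><table><tr><td>USD</td><td>1.5</td></tr></table></div>'
        . '</body></html>';

echo extractElementById($sample, 'HomePageTabs_cont_3'), PHP_EOL;
```

Returning null rather than throwing keeps the calling page simple: it can print a fallback message when the remote markup changes and the id disappears.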
