Content from a large number of web pages into an array (PHP)


Problem description



I have an array ($x) containing 748 URLs. I want to fetch a specific part of each page and put all those parts into a new array: an array containing 748 pieces of text, each taken from a different URL in $x.

Here's the code I've got so far:

foreach ($x as $row) {
    $contents = file_get_contents($row);

    $regex = '/delimiter_start(.*?)delimiter_end/s';
    preg_match_all($regex, $contents, $output);
}

If I var_dump $output, I get a strange array whose content keeps repeating endlessly until I press Stop in my browser. The array looks like this:

array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(4786) "string 1. The one I want from the first page."
  }
  [1]=>
  array(1) {
    [0]=>
    string(4755) "string 1 again"
  }
}

array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(8223) "string 2. The one I want from the second page."
  }
  [1]=>
  array(1) {
    [0]=>
    string(8192) "string 2 again"
  }
}

EDIT: I can actually retrieve the results I'm looking for with $output[0]. But how do I create a new array with the same contents as $output[0] that is accessible outside the loop?

Solution

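The output you are seeing from preg_match_all is standard: the output array receives both the full matches and the captured groups. With the default PREG_PATTERN_ORDER flag, $output[0] holds every full match (delimiters included) and $output[1] holds the text captured by (.*?). Because $output is overwritten on every iteration, append each page's matches to an array declared before the loop so they are still available after it.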
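To see where the two entries in each dump come from, here is a minimal sketch that runs preg_match_all on a made-up string instead of a downloaded page. The sample text is hypothetical; the regex is the one from the question.

```php
<?php
// Hypothetical sample text standing in for one downloaded page
$contents = "header delimiter_startfirst part\nspans lines delimiter_end footer";
// The /s modifier lets . match newlines, so a section may span several lines
$regex = '/delimiter_start(.*?)delimiter_end/s';

// Default flag is PREG_PATTERN_ORDER:
//   $output[0] = list of full matches, $output[1] = list of group-1 captures
$count = preg_match_all($regex, $contents, $output);

var_dump($count);     // int(1)
print_r($output[0]);  // full match, delimiters included
print_r($output[1]);  // only the text between the delimiters
```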

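$lines = array();
// The pattern is the same for every page, so build it once
$regex = '/delimiter_start(.*?)delimiter_end/s';

foreach ($x as $row) {
    $contents = file_get_contents($row);
    preg_match_all($regex, $contents, $output);

    // $output[0] holds this page's full matches; keep them only if there were any
    if (is_array($output) && isset($output[0]) && !empty($output[0])) {
        $lines[] = $output[0];
    }
}

// $lines was declared before the loop, so it still holds every page's matches here
var_dump($lines);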
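If each page contains exactly one relevant section and you want a flat array of strings rather than an array of arrays, a variant along these lines may fit better. This is a minimal sketch, not the answer's code: the function name collect_sections and its $fetch parameter are hypothetical. It uses preg_match and the captured group $m[1] (text without the delimiters) instead of preg_match_all and $output[0], and it skips URLs whose download fails or whose page has no match.

```php
<?php
/**
 * Hypothetical helper: fetch each URL and keep the text captured by the
 * first group of $regex, keyed by URL. Assumes one relevant section per page.
 *
 * @param string[]      $urls
 * @param callable|null $fetch function (string $url): string|false;
 *                             defaults to file_get_contents
 * @return array<string, string>
 */
function collect_sections(array $urls, string $regex, ?callable $fetch = null): array
{
    $fetch = $fetch ?? 'file_get_contents';
    $sections = [];

    foreach ($urls as $url) {
        $contents = $fetch($url);
        if ($contents === false) {
            continue; // download failed: skip this URL
        }
        // preg_match stops at the first match; $m[1] is the captured group
        if (preg_match($regex, $contents, $m) === 1) {
            $sections[$url] = $m[1];
        }
    }
    return $sections;
}

// Usage with a stub fetcher (hypothetical URLs and page text), so no network is needed
$pages = [
    'https://example.com/1' => 'a delimiter_startfirst textdelimiter_end b',
    'https://example.com/2' => 'no section here',
];
$result = collect_sections(
    array_keys($pages),
    '/delimiter_start(.*?)delimiter_end/s',
    fn (string $url) => $pages[$url] ?? false
);
print_r($result);
```

With real URLs you would call collect_sections($x, $regex) and the default file_get_contents fetcher is used. Keying by URL shows which pages had no section; wrap the result in array_values() if a plain numerically indexed array is all you need.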
