简单HTML DOM缓存 [英] Simple Html DOM Caching

查看：112 发布时间：2020/9/28 5:06:55 php caching web-scraping

本文介绍了简单HTML DOM缓存的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在使用简单HTML DOM抓取（获得许可）某些网站。我基本上是用大约每天更新四次的统计数据来刮擦大约50个不同的网站。

I'm using Simple HTML DOM to scrape (with permission) some websites. I basically scrape around 50 different websites with statistical data which is updated around four times a day.

您可以想象需要花费很多时间进行刮擦，因此我需要

As you can imagine it takes times to do the scraping and therefore I need to speed up the process by doing some caching.

我的愿景是：

DATA-PRESENTATION.php //显示所有结果的地方

DATA-PRESENTATION.php // where all the results are shown

SCRAPING.php //完成工作的代码

SCRAPING.php // the code that makes the job

我想在SCRAPING.PHP上设置cron作业，使其每天执行4次，并将所有数据保存在caché中，然后DATA-PRESENTATION.PHP会请求该数据，从而使用户体验更快。

I want to set up a cron job on SCRAPING.PHP in a way it executes 4 times a day and save all the data in caché which then will be requested by DATA-PRESENTATION.PHP making the experience for the user way faster.

我的问题是我应该如何实现这种caché东西？我是PHP的新手，我一直在阅读教程，但是它们对您的帮助不是很大，只有几本书，所以我真的没法学习如何做。

My question is how can I implement this caché thing? I'm very rookie at PHP, I've been reading tutorials but they are not very helpfull and there are just a few so I just couldn't really learn how to do it.

我知道其他解决方案可能正在实现数据库，但我不想这样做。另外，我一直在阅读诸如memcached之类的高端解决方案，但是该站点非常简单并且可以个人使用，因此我不需要那种东西。

I know other solution might be implementing a database but I don't want to do that. Also, I've been reading about high end solutions like memcached, but the site is very simple and for personal use, so I don't need that kind of stuff.

谢谢!!

SCRAPING.PHP

<?php
include("simple_html_dom.php");

// Labour stats
$html7 = file_get_html('http://www.website1.html');
$web_title = $html7->find(".title h1");
$web_figure = $html7->find(".figures h2");

?>

DATA-PRESENTATION.PHP

 <div class="news-pitch">
 <h1>Webiste: <?php echo utf8_encode($web_title[0]->plaintext); ?></h1>
 <p>Unemployment rate: <?php echo utf8_encode($web_figure[0]->plaintext); ?></p>
 </div>

最终代码！非常感谢@jerjer和@ PaulD.Waite，没有您的帮助，我真的无法完成！

FINAL CODE! Many thanks @jerjer and @PaulD.Waite, I couldn't really get this done without your help!

文件：

1- DataPresentation.php //在这里，我显示请求Cache.html

1- DataPresentation.php // here I show the data requested to Cache.html

2- Scraping.php //在这里，我抓取网站，然后将结果保存到Cache.html

2- Scraping.php // here I scrape the sites and then save the results to Cache.html

3- Cache.html //在这里，抓取结果保存

3- Cache.html // here the scraping results are saved

我在Scraping.php上设置了一个Cron作业，告诉它每次都覆盖Cache.html。

I set up a Cron Job on Scraping.php telling it to overwrite Cache.html each time.

1- DataPresentation.php

<?php
include("simple_html_dom.php");

$html = file_get_html("cache/test.html");
$title = $html->find("h1");
echo $title[0]->plaintext;
?>

2- Scraping.php

<?php
include("simple_html_dom.php");

// by adding "->find("h1")" I speed up things as it only retrieves the information I'll be using and not the whole page.
$filename = "cache/test.html";
$content = file_get_html ('http://www.website.com/')->find("h1");
file_put_contents($filename, $content);
?>

3- Cache.html

<h1>Current unemployment 7,2%</h1>

它会立即加载，并通过这种方式进行设置，以确保始终会加载Caché文件。 / p>

It loads immediately and by setting things this way I assure there's always a Caché file to be loaded.

简单HTML DOM缓存 [英] Simple Html DOM Caching

问题描述

推荐答案

相关文章

PHP最新文章

热门教程

热门工具

登录关闭

简单HTML DOM缓存 [英] Simple Html DOM Caching

问题描述

推荐答案

相关文章

PHP最新文章

热门教程

热门工具

登录 关闭

登录关闭