PHP: crawl a website that is using Cloudflare
Problem description
I want to crawl some specific values (e.g. newstext) from a website (which is not my own).
file_get_contents() is not working, probably blocked by php.ini.
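One way to test the php.ini suspicion is to read the `allow_url_fopen` setting, which controls whether file_get_contents() may open http:// and https:// URLs. This is a minimal sketch; the helper name `urlFopenEnabled` is made up for illustration, and a disabled setting is only one possible cause of the failure.

```php
<?php
// Hypothetical helper: reports whether PHP's URL wrappers are enabled,
// i.e. whether file_get_contents() is allowed to open remote URLs.
function urlFopenEnabled(): bool
{
    // ini_get() returns the setting as a string (for example "1" or ""),
    // so normalise it to a real boolean.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

if (urlFopenEnabled()) {
    echo "allow_url_fopen is on; the failure probably has another cause.\n";
} else {
    echo "allow_url_fopen is off; file_get_contents() cannot fetch URLs here.\n";
}
```

If the setting is on, php.ini is unlikely to be the culprit and the request is more likely being answered by the Cloudflare page described below.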
So I tried to do it with curl. The problem is:
All I get is the redirection text from Cloudflare.
My crawler should do something like:
go to page -> wait out the 5-second Cloudflare redirect -> curl the page.
Any ideas how to crawl the page after the Cloudflare waiting time? (in PHP)
Edit: I have tried a lot of things, but the problem is still the same.
More specifically: it only crawls the Cloudflare redirect page. (I get a page that redirects to the host, with Cloudflare in front. When I run curl on localhost, the redirect resolves against localhost, so it obviously doesn't work.)
Is there no way to start saving the returned data after 5 seconds of "curling"?
"go to page -> wait the 5secs cloudflare redirect -> curl the page."
The 5 second interstitial page actually requires that JavaScript and cookies are enabled before a visitor can pass the check, which probably won't work if you're using a crawler or bot to access the site.
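Because plain cURL does not execute JavaScript, waiting five seconds and requesting again generally returns the same interstitial. What a crawler can do is recognise that it received the check page instead of the real content. The sketch below is only a heuristic: the function names are hypothetical, and the status codes and body markers are assumptions about what the check page looks like, which may differ per site and over time. It does not pass the check.

```php
<?php
// Heuristic: does this response look like a Cloudflare check page?
// ASSUMPTION: the status codes and text markers below are guesses
// and may not match what a given site actually serves.
function looksLikeCloudflareChallenge(int $status, string $body): bool
{
    if (!in_array($status, [403, 503], true)) {
        return false;
    }
    $markers = ['Checking your browser', 'Just a moment', 'jschl'];
    foreach ($markers as $marker) {
        if (stripos($body, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// Fetch a URL with cURL, keeping cookies and following redirects.
// Returns [status, body]; body is null when the transfer itself failed.
function fetchPage(string $url, string $cookieFile): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,        // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,        // follow HTTP redirects
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies here
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from here
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return [$status, $body === false ? null : $body];
}

// Offline demonstration of the heuristic with a made-up response body.
var_dump(looksLikeCloudflareChallenge(503, '<title>Just a moment...</title>')); // bool(true)
```

In a crawler, the pair returned by `fetchPage()` could be passed to `looksLikeCloudflareChallenge()` so that the check page is skipped or logged rather than parsed as if it were the news text.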