Can I use WGET to generate a sitemap of a website given its URL?
Question
I need a script that can spider a website and return the list of all crawled pages in plain text or a similar format, which I will submit to search engines as a sitemap. Can I use WGET to generate a sitemap of a website? Or is there a PHP script that can do the same?
Answer
wget --spider --recursive --no-verbose --output-file=wgetlog.txt http://somewebsite.com
sed -n "s@.\+ URL:\([^ ]\+\) .\+@\1@p" wgetlog.txt | sed "s@&@\&amp;@g" > sedlog.txt
This creates a file called sedlog.txt that contains all links found on the specified website. The second sed pass escapes ampersands (& becomes &amp;) so the URLs are valid inside XML. You can use PHP or a shell script to convert the plain-text list into an XML sitemap. Tweak the parameters of the wget command (accept/reject/include/exclude) to get only the links you need.
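As a rough sketch of that last step, the shell loop below wraps each line of sedlog.txt in the tags required by the sitemaps.org protocol. The file names sedlog.txt and sitemap.xml are assumptions carried over from the commands above; here a sample input file is generated in place of the real wget/sed output.

```shell
#!/bin/sh
set -eu

# Stand-in for the wget/sed output; ampersands are already escaped
# by the sed pass above, so they are written here as &amp;.
printf 'http://somewebsite.com/\nhttp://somewebsite.com/page?a=1&amp;b=2\n' > sedlog.txt

{
  echo '<?xml version="1.0" encoding="UTF-8"?>'
  echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  # Wrap each crawled URL in a <url><loc> entry.
  while IFS= read -r url; do
    printf '  <url><loc>%s</loc></url>\n' "$url"
  done < sedlog.txt
  echo '</urlset>'
} > sitemap.xml
```

This emits only the mandatory <loc> element; optional fields such as <lastmod> or <priority> can be added per URL if you have that data.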