How to generate a graphical sitemap of a large website
Question
I would like to generate a graphical sitemap for my website. There are two stages, as far as I can tell:
- Crawl the website and analyze the link relationships to extract a tree structure
- Generate a visually pleasing rendering of the tree
Does anyone have advice or experience with achieving this, or know of existing work I can build on (ideally in Python)?
I came across some nice CSS for rendering the tree, but it only works for 3 levels.
Thanks
Answer
Here is a Python web crawler, which should make a good starting point. Your general strategy is this:
- Take care never to follow outbound links, including links on the same domain that sit higher up than your starting point.
- As you spider the site, collect a hash of page URLs mapped to a list of all the internal URLs included in each page.
- Take a pass over this list, assigning a token to each unique URL.
- Use your hash of {token => [tokens]} to generate a Graphviz file that will lay out the graph for you.
- Convert the Graphviz output into an imagemap where each node links to its corresponding webpage.
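The first two steps (staying inside the starting point, and collecting a page-to-links hash) can be sketched in Python with only the standard library. This is a minimal sketch, not the crawler linked above: `LinkExtractor`, `is_internal`, `fetch_http` and `crawl` are hypothetical names, and it assumes "higher up than your starting point" means outside the directory that contains the start page. The fetcher is passed in as a function so the crawl logic can be tried without network access; there is no politeness delay, robots.txt handling or error handling.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def is_internal(url: str, start: str) -> bool:
    """Same host as start, and at or below the start page's directory."""
    u, s = urlparse(url), urlparse(start)
    start_path = s.path or "/"
    # Assumption: "higher up" means outside the directory containing the start page.
    prefix = start_path if start_path.endswith("/") else start_path.rsplit("/", 1)[0] + "/"
    return (u.scheme in ("http", "https")
            and u.netloc == s.netloc
            and (u.path or "/").startswith(prefix))


def fetch_http(url: str) -> str:
    """Download a page as text (no retries, no robots.txt check)."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start: str, fetch: Callable[[str], str],
          max_pages: int = 500) -> Dict[str, List[str]]:
    """Breadth-first crawl; returns {page_url: [internal URLs linked from it]}."""
    start = urldefrag(start)[0]
    site: Dict[str, List[str]] = {}
    queue = deque([start])
    seen = {start}
    while queue and len(site) < max_pages:
        page = queue.popleft()
        parser = LinkExtractor()
        parser.feed(fetch(page))
        links: List[str] = []
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments so each page has one URL.
            url = urldefrag(urljoin(page, href))[0]
            if not is_internal(url, start) or url in links:
                continue  # outbound, above the start point, or a duplicate on this page
            links.append(url)
            if url not in seen:
                seen.add(url)
                queue.append(url)
        site[page] = links
    return site
```

For example, `crawl("https://example.com/docs/", fetch_http)` would return a dict mapping each crawled page to the internal URLs it links to, which is the hash described in the second step. The `max_pages` cap is an arbitrary safety limit; links to pages beyond the cap still appear in the lists but are never fetched.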
The reason you need to do all this is, as leonm noted, that websites are graphs, not trees, and laying out graphs is a harder problem than you can solve in a simple piece of JavaScript and CSS. Graphviz is good at what it does.