How to generate a graphical sitemap of a large website


Question

I would like to generate a graphical sitemap for my website. There are two stages, as far as I can tell:

  1. Crawl the website and analyse the link relationships to extract a tree structure
  2. Generate a visually pleasing rendering of the tree

Does anyone have advice or experience with achieving this, or know of existing work I can build on (ideally in Python)?

I came across some nice CSS for rendering the tree, but it only works for 3 levels.

Thanks

Recommended answer

Here is a Python web crawler, which should make a good starting point. Your general strategy is this:

  • Take care that outbound links are never followed, including links on the same domain but higher up than your starting point.
  • As you spider the site, collect a hash of page URLs mapped to a list of all the internal URLs included in each page.
  • Take a pass over this hash, assigning a token to each unique URL.
  • Use your hash of {token => [tokens]} to generate a Graphviz file that will lay out a graph for you.
  • Convert the Graphviz output into an imagemap where each node links to its corresponding webpage.
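The first two steps can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the crawler the answer links to: the names `LinkParser`, `in_scope`, `internal_links` and `crawl` are hypothetical, the `fetch` argument is a caller-supplied function that returns a page's HTML, and "higher up than your starting point" is interpreted as "outside the directory of the start URL".

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def in_scope(url, start_url):
    """True if url is on the start host and at or below the start directory."""
    u, s = urlsplit(url), urlsplit(start_url)
    # Assumption: the crawl root is the directory that contains the start URL.
    base = s.path if s.path.endswith("/") else s.path.rsplit("/", 1)[0] + "/"
    path = u.path or "/"
    return (u.scheme in ("http", "https")
            and u.netloc == s.netloc
            and path.startswith(base))


def internal_links(page_url, html, start_url):
    """Absolute, fragment-free, de-duplicated in-scope links found in html."""
    parser = LinkParser()
    parser.feed(html)
    seen, result = set(), []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if in_scope(absolute, start_url) and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


def crawl(start_url, fetch):
    """Breadth-first crawl returning {page_url: [internal_urls]}.

    fetch(url) -> str is supplied by the caller (hypothetical helper).
    """
    link_map = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in link_map:
            continue  # already visited
        link_map[url] = internal_links(url, fetch(url), start_url)
        queue.extend(link_map[url])
    return link_map
```

Keeping the network access behind `fetch` lets the scoping logic be tried against canned HTML before pointing it at a real site. For a large website a real `fetch` would also want rate limiting and error handling, which this sketch leaves out.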

The reason you need to do all this is, as leonm noted, that websites are graphs, not trees, and laying out graphs is a harder problem than you can solve with a simple piece of JavaScript and CSS. Graphviz is good at what it does.
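As a rough illustration of handing the layout to Graphviz, the sketch below turns a {url: [urls]} mapping into DOT text. The function name `to_dot`, the `n0`, `n1`, ... token scheme and the example URLs are assumptions made for illustration. Each node carries a `URL` attribute, which is the attribute Graphviz reads when it emits an imagemap.

```python
def to_dot(link_map):
    """Render {url: [urls]} as Graphviz DOT text, one short token per URL."""
    # Assign a token (n0, n1, ...) to each unique URL in first-seen order.
    tokens = {}
    for source, targets in link_map.items():
        for url in [source, *targets]:
            tokens.setdefault(url, f"n{len(tokens)}")

    def quote(text):
        # Inside a DOT quoted string a double quote must be escaped.
        return '"' + text.replace('"', '\\"') + '"'

    lines = ["digraph sitemap {"]
    for url, token in tokens.items():
        # URL makes the node clickable in imagemap output.
        lines.append(f"  {token} [label={quote(url)}, URL={quote(url)}];")
    for source, targets in link_map.items():
        for target in targets:
            lines.append(f"  {tokens[source]} -> {tokens[target]};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    example = {
        "http://example.com/": ["http://example.com/a"],
        "http://example.com/a": ["http://example.com/"],
    }
    print(to_dot(example))
```

Assuming the Graphviz `dot` program is installed and the output is saved as `sitemap.dot`, running `dot -Tpng sitemap.dot -o sitemap.png` should produce the image and `dot -Tcmapx sitemap.dot -o sitemap.map` the matching client-side imagemap. For a very large site the full graph may be too dense to read, so restricting it to a subtree or a maximum depth is likely worthwhile.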
