How to efficiently serve massive sitemaps in django


Question



I have a site with about 150K pages in its sitemap. I'm using the sitemap index generator to make the sitemaps, but really, I need a way of caching it, because building the 150 sitemaps of 1,000 links each is brutal on my server.[1]

I COULD cache each of these sitemap pages with memcached, which is what I'm using elsewhere on the site...however, this is so many sitemaps that it would completely fill memcached....so that doesn't work.

What I think I need is a way to use the database as the cache for these, and to only generate them when there are changes to them (which as a result of the sitemap index means only changing the latest couple of sitemap pages, since the rest are always the same.)[2] But, as near as I can tell, I can only use one cache backend with django.
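For what it's worth, Django 1.3 and later do support several named cache backends side by side via the `CACHES` setting, which would let a database cache hold the sitemap pages while memcached keeps serving the rest of the site. A sketch, with the backend names and locations as assumptions:

```python
# settings.py sketch (Django 1.3+): multiple named cache backends can
# coexist, so sitemap pages can live in a database-backed cache while
# the rest of the site keeps using memcached. Names and locations here
# are examples, not anything from the original post.
CACHES = {
    "default": {  # memcached, as used elsewhere on the site
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
    },
    "sitemaps": {  # database cache reserved for sitemap pages
        "BACKEND": "django.core.cache.backends.db.DatabaseCache",
        # table created beforehand with: manage.py createcachetable sitemap_cache
        "LOCATION": "sitemap_cache",
    },
}
```

A view can then address the sitemap cache explicitly instead of the default one, so filling it never evicts the rest of the site's cached data.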

How can I have these sitemaps ready for when Google comes-a-crawlin' without killing my database or memcached?

Any thoughts?

[1] I've limited it to 1,000 links per sitemap page because generating the max, 50,000 links, just wasn't happening.

[2] For example, if I have sitemap.xml?page=1, page=2...sitemap.xml?page=50, I only really need to change sitemap.xml?page=50 until it is full with 1,000 links. Then I can cache it pretty much forever, focus on page 51 until it's full, cache that forever, and so on.
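The scheme in footnote [2] can be sketched as follows. `build_page` and `cache` are stand-ins: `build_page(page)` is whatever renders one page of XML, and `cache` is any Django-style backend with `get`/`set` (where a timeout of `None` means "never expire"):

```python
# Sketch of the footnote-[2] scheme: every full sitemap page is cached
# indefinitely; only the last, still-growing page gets rebuilt.
PAGE_SIZE = 1000

def get_sitemap_page(page, total_links, build_page, cache):
    last_page = (total_links + PAGE_SIZE - 1) // PAGE_SIZE
    key = "sitemap-page-%d" % page
    xml = cache.get(key)
    if xml is None:
        xml = build_page(page)
        # Full pages never change again, so they get no expiry; the
        # final page is still filling up, so expire it after an hour.
        timeout = None if page < last_page else 3600
        cache.set(key, xml, timeout)
    return xml
```

With 150K links that means 149 pages are cache hits forever and only page 150 is ever regenerated.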

EDIT, 2012-05-12: This has continued to be a problem, and I finally ditched Django's sitemap framework after using it with a file cache for about a year. Instead I'm now using Solr to generate the links I need in a really simple view, and I'm then passing them off to the Django template. This greatly simplified my sitemaps, made them perform just fine, and I'm up to about 2,250,000 links as of now. If you want to do that, just check out the sitemap template - it's all really obvious from there. You can see the code for this here: https://bitbucket.org/mlissner/search-and-awareness-platform-courtlistener/src/tip/alert/casepage/sitemap.py
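The core of that "really simple view" approach is just turning a list of links into a `<urlset>`. A framework-free sketch (in the author's setup the links come from Solr; here they are plain URL strings):

```python
# Sketch of rendering a sitemap <urlset> directly from a list of URLs,
# bypassing the sitemap framework entirely. In a Django view you would
# return this string with content type application/xml.
from xml.sax.saxutils import escape

def render_urlset(links):
    urls = "".join(
        "<url><loc>%s</loc></url>" % escape(link) for link in links
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "%s</urlset>" % urls
    )
```

Escaping matters: the sitemap protocol requires entity-escaped URLs, so query strings with `&` must become `&amp;`.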

Solution

I had a similar issue and decided to have django write the sitemap files to disk in the static media directory and let the webserver serve them. I regenerate the sitemap every couple of hours, since my content wasn't changing more often than that, but how often you need to rewrite the files depends on your content.

I used a django custom command with a cron job, but curl with a cron job is easier.
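Whichever way the command is triggered, one detail worth sketching is the disk write itself: write to a temporary file in the same directory and rename it into place, so a crawler that hits the URL mid-regeneration never downloads a half-written sitemap (the rename is atomic on POSIX filesystems):

```python
# Sketch of the write step for such a command: temp file + rename, so
# the webserver always serves either the old sitemap or the new one,
# never a partial file.
import os
import tempfile

def write_sitemap(xml, dest_path):
    dest_dir = os.path.dirname(dest_path)
    # the temp file must be on the same filesystem for rename to be atomic
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(xml)
        os.rename(tmp_path, dest_path)
    except Exception:
        os.remove(tmp_path)
        raise
```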

Here's how I use curl, and I have apache send /sitemap.xml as a static file, not through django:

curl -o /path/sitemap.xml http://example.com/generate/sitemap.xml
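The cron half of that recipe might look like the following (the path, user, and two-hour schedule are assumptions; match the schedule to how often your content changes):

```shell
# e.g. in /etc/cron.d/sitemap: regenerate the static sitemap every two
# hours. Apache serves /path/sitemap.xml directly; Django only answers
# the /generate/ URL that curl fetches here.
0 */2 * * * www-data curl -s -o /path/sitemap.xml http://example.com/generate/sitemap.xml
```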
