Looking for a unique ID pattern that is easily indexed by search engines


Problem description

    I mean IDs like those from Microsoft ("KB2756872"), the National Vulnerability Database ("CVE-2010-1428"), Red Hat ("RHSA-2010:0376"), OIDs ("1.3.6.1.4.1.311"), or UUIDs/GUIDs ("550e8400-e29b-41d4-a716-446655440000").

    I want these UIDs to do several jobs, described below.

    I develop blogging software and have the idea of putting a unique ID in the body of each post, so that I can easily tell which locally stored copy corresponds to which remotely published copy.

    I also want to publish to many different blogging services, so that if one goes down, the articles are still accessible from another. A link may die, but if the post carries a UID, anyone can run a web search for it and find the post on another service!

    This would also let me gather statistics on how an article spreads. Many sites just replicate content (via copying and rewriting bots and people) to game search engines. With a UID I could easily identify such sites...

    So my question is: in what form should I make the UIDs so that they are easily indexed by search engines, both web ones (like Google and Yahoo) and corporate ones (like Lucene, Solr, Sphinx, Xapian, etc.)?

    I know about some limitations of search engines, such as:

    • only search terms of at least 3 characters are supported
    • random-looking strings such as gfh6wytrh6wu56he5gahj763 are not indexed

    so this task is not easy...

    Any advice is appreciated (books, blog articles, etc.).

    Solution

    You could use Tag URIs (http://en.wikipedia.org/wiki/Tag_URI), as defined by RFC 4151.

    They are globally unique, and everyone who owned a domain name or an email address for at least a day can mint them.

    Note that these URIs only identify, they don’t locate. So a Tag URI doesn’t say anything about where something is published.

    Let’s say your site’s domain is "example.com". If you create a blog post, you could create the following Tag URI:

    tag:example.com,2012-12:cute-cat
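    The example above is just "tag:", then the authority name and a date joined by a comma, then a colon and a string you control. A minimal Python sketch of assembling one follows; the function name make_tag_uri and its granularity parameter are hypothetical, and it assumes the specific string already consists of characters RFC 4151 permits (it does not validate it).

```python
from datetime import date


def make_tag_uri(authority: str, owned_on: date, specific: str,
                 granularity: str = "day") -> str:
    """Build a Tag URI of the form tag:<authority>,<date>:<specific>.

    `granularity` selects how much of the date is written:
    "year" -> YYYY, "month" -> YYYY-MM, "day" -> YYYY-MM-DD.
    A shorter date denotes the first day of that year or month.
    Assumption: `specific` is already a valid RFC 4151 specific part.
    """
    formats = {"year": "%Y", "month": "%Y-%m", "day": "%Y-%m-%d"}
    if granularity not in formats:
        raise ValueError(f"unknown granularity: {granularity!r}")
    return f"tag:{authority},{owned_on.strftime(formats[granularity])}:{specific}"


print(make_tag_uri("example.com", date(2012, 12, 1), "cute-cat", "month"))
# tag:example.com,2012-12:cute-cat
```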
    

    Note that the date in this URI is not a publication date! It must be a (past) date on which you owned the domain (or email address). If you registered your domain in 2003, you could always use Tag URIs starting with tag:example.com,2004: (not "2003", because "2003" would mean "2003-01-01", which might be a date on which you didn't own the domain yet), followed by a (unique) string under your control. Of course, you could also use the publication date if you like. But don't use future dates.
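    The date rule can be checked mechanically. Below is a minimal Python sketch, assuming you have owned the domain continuously from a known registration date (owned_since) up to today; the helper names split_tag_uri, tag_date_start and tag_date_is_valid, and the 2003-06-15 registration date, are hypothetical. It expands a partial date to the first day it denotes, as described above, and parses leniently (it does not enforce the RFC's exact digit counts).

```python
from datetime import date


def split_tag_uri(uri: str) -> tuple[str, str, str]:
    """Split 'tag:<authority>,<date>:<specific>' into its three parts."""
    if not uri.startswith("tag:"):
        raise ValueError(f"not a tag URI: {uri!r}")
    # The tagging entity ends at the first colon after the "tag:" prefix;
    # the specific part may itself contain colons.
    entity, specific = uri[len("tag:"):].split(":", 1)
    # The date follows the last comma of the tagging entity.
    authority, tag_date = entity.rsplit(",", 1)
    return authority, tag_date, specific


def tag_date_start(tag_date: str) -> date:
    """Return the day a Tag URI date denotes:
    "2004" -> 2004-01-01, "2012-12" -> 2012-12-01, "2012-12-24" -> itself."""
    parts = [int(p) for p in tag_date.split("-")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"bad tag date: {tag_date!r}")
    # Missing month/day default to 1, i.e. the first day of the period.
    year, month, day = (parts + [1, 1])[:3]
    return date(year, month, day)


def tag_date_is_valid(tag_date: str, owned_since: date, today: date) -> bool:
    """True if the denoted day is neither before `owned_since` nor after
    `today` (assumes uninterrupted ownership between those two dates)."""
    return owned_since <= tag_date_start(tag_date) <= today


owned = date(2003, 6, 15)   # hypothetical domain registration date
today = date(2012, 12, 24)  # fixed for the demo; normally date.today()
for uri in ["tag:example.com,2003:post-1",        # False: means 2003-01-01
            "tag:example.com,2004:post-1",        # True
            "tag:example.com,2012-12:cute-cat",   # True
            "tag:example.com,2013:post-2"]:       # False: in the future
    _, tag_date, _ = split_tag_uri(uri)
    print(uri, tag_date_is_valid(tag_date, owned, today))
```

    Because a bare year is read as January 1, a year-only date is safe only from the first full calendar year of ownership onward, which is why "2004" rather than "2003" is the safe choice in the example.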
