What tag schema(s) are the most efficient/effective?

Question

http://tagging.pui.ch/post/37027745720/tags-database-schemas

Stack Overflow's tag handling is among the best that I've seen so far.

Does anyone know whether it follows a schema pattern I could get some ideas from?

Otherwise, I'm just looking for suggestions on what tag schemas others have successfully implemented.

Recommended answer

It all depends on data volumes and on the content-to-tag distribution and density ratios.

If you have a low tag distribution and density ratio (typical of human-generated data), you can simply generate a unique id or hash for each collection of tags actually in use by the data. Then associate that 'tag collection' id with each data instance that carries those tags.
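As a minimal sketch of the "hash per tag collection" idea, the function below derives one stable id from a set of tags. The function name, the normalization rules (trimming, lowercasing, de-duplicating, sorting) and the choice of SHA-1 are assumptions made for illustration, not something the answer specifies.

```python
import hashlib


def tag_collection_id(tags):
    """Return a stable id for a set of tags, independent of order and case."""
    # Normalize so that the same set of tags always yields the same id.
    canonical = sorted({t.strip().lower() for t in tags})
    # Join with a separator assumed never to occur inside a tag name.
    joined = "\x1f".join(canonical)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


a = tag_collection_id(["python", "sql", "tags"])
b = tag_collection_id(["Tags", "SQL", "python"])
print(a == b)  # the same tag set in a different order gives the same id
```

Two data instances tagged with the same set of tags then share one id, which is what keeps the number of distinct tag collections small.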

This can work surprisingly well for many forms of human-generated data.

e.g. Stack Overflow has ~500,000 questions and ~20,000 tags (too many dupe-ish tags!). Most questions have fewer than five tags. In the worst case you will have 500,000 'tag collection' ids to associate, but more realistically you will have several thousand.

You will also need either instance tracking or garbage collection on the 'tag collection' collection, because specific combinations of tags fall out of use over time.
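One way to picture the instance-tracking option is plain reference counting, sketched here in memory. The class and method names are hypothetical; a real system would keep the count in the database instead.

```python
class TagCollections:
    """Track how many data instances use each distinct set of tags."""

    def __init__(self):
        # frozenset of tag names -> number of data instances using that set
        self.counts = {}

    def acquire(self, tags):
        """Register one more data instance that uses this set of tags."""
        key = frozenset(tags)
        self.counts[key] = self.counts.get(key, 0) + 1
        return key

    def release(self, key):
        """Unregister one instance; drop the collection once it is unused."""
        self.counts[key] -= 1
        if self.counts[key] == 0:
            del self.counts[key]


collections = TagCollections()
k1 = collections.acquire(["sql", "tags"])
k2 = collections.acquire(["tags", "sql"])  # same set, count becomes 2
collections.release(k1)                    # still used by one instance
still_tracked = k1 in collections.counts
collections.release(k2)                    # last user gone, entry removed
```

The alternative, garbage collection, would skip the counter and periodically delete tag collections that no data row references any more.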

e.g.

  • Tag: id, tagName
  • TagCollection: id, instanceCount
  • TagCollectionTag: tagCollectionId, tagId
  • Data: id, title, content, tagCollectionId
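The four collections listed above could be expressed as relational tables roughly as follows. This is a sketch using SQLite through Python's `sqlite3` module; the column types, constraints and the two index names are assumptions, since the answer only names the fields.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Tag (
    id      INTEGER PRIMARY KEY,
    tagName TEXT NOT NULL UNIQUE
);
CREATE TABLE TagCollection (
    id            INTEGER PRIMARY KEY,
    instanceCount INTEGER NOT NULL DEFAULT 0  -- data rows using this collection
);
CREATE TABLE TagCollectionTag (
    tagCollectionId INTEGER NOT NULL REFERENCES TagCollection(id),
    tagId           INTEGER NOT NULL REFERENCES Tag(id),
    PRIMARY KEY (tagCollectionId, tagId)
);
-- Assumed index: supports "which collections contain tag X" lookups.
CREATE INDEX idx_tct_tag ON TagCollectionTag(tagId);
CREATE TABLE Data (
    id              INTEGER PRIMARY KEY,
    title           TEXT,
    content         TEXT,
    tagCollectionId INTEGER REFERENCES TagCollection(id)
);
-- Assumed index: supports "which data rows use collection Y" lookups.
CREATE INDEX idx_data_tc ON Data(tagCollectionId);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```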

Inserting tags is fast if a hash is used (a hash over all tags of the collection). Otherwise you have to search the TagCollection and TagCollectionTag collections to find a matching set, but these should not be too large anyway.
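The hash-based insert path might look like the sketch below. Note that `tagHash` is an assumed extra column on TagCollection (the field list in the answer does not include one), and the helper names `tag_hash`, `get_or_create_collection` and `insert_data` are hypothetical.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tag (id INTEGER PRIMARY KEY, tagName TEXT NOT NULL UNIQUE);
-- tagHash is an assumed column added for this sketch.
CREATE TABLE TagCollection (id INTEGER PRIMARY KEY,
                            tagHash TEXT NOT NULL UNIQUE,
                            instanceCount INTEGER NOT NULL DEFAULT 0);
CREATE TABLE TagCollectionTag (tagCollectionId INTEGER NOT NULL,
                               tagId INTEGER NOT NULL,
                               PRIMARY KEY (tagCollectionId, tagId));
CREATE TABLE Data (id INTEGER PRIMARY KEY, title TEXT, content TEXT,
                   tagCollectionId INTEGER);
""")


def tag_hash(tags):
    """Hash over all tags of the collection, independent of their order."""
    canonical = sorted(set(tags))
    return hashlib.sha1("\x1f".join(canonical).encode("utf-8")).hexdigest()


def get_or_create_collection(conn, tags):
    """Find the tag collection by its hash, creating it on first use."""
    h = tag_hash(tags)
    row = conn.execute(
        "SELECT id FROM TagCollection WHERE tagHash = ?", (h,)
    ).fetchone()
    if row:
        return row[0]  # single indexed lookup, no search over member rows
    collection_id = conn.execute(
        "INSERT INTO TagCollection (tagHash) VALUES (?)", (h,)
    ).lastrowid
    for name in set(tags):
        conn.execute("INSERT OR IGNORE INTO Tag (tagName) VALUES (?)", (name,))
        tag_id = conn.execute(
            "SELECT id FROM Tag WHERE tagName = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO TagCollectionTag VALUES (?, ?)", (collection_id, tag_id)
        )
    return collection_id


def insert_data(conn, title, content, tags):
    """Insert a data row and bump the usage count of its tag collection."""
    collection_id = get_or_create_collection(conn, tags)
    conn.execute(
        "UPDATE TagCollection SET instanceCount = instanceCount + 1 WHERE id = ?",
        (collection_id,),
    )
    return conn.execute(
        "INSERT INTO Data (title, content, tagCollectionId) VALUES (?, ?, ?)",
        (title, content, collection_id),
    ).lastrowid


insert_data(conn, "Q1", "body", ["sql", "tags"])
insert_data(conn, "Q2", "body", ["tags", "sql"])  # reuses the same collection
insert_data(conn, "Q3", "body", ["python"])
```

With the hash in place, an insert needs one lookup on TagCollection instead of a comparison of tag sets across TagCollectionTag.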

Searching is fast: search TagCollectionTag for the tag collections that contain the specific set of tags, then find the data rows with any of those tagCollectionIds.
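The two-step search can be sketched as a single query with a subquery: the inner part picks the tag collections that contain every requested tag, the outer part fetches data rows that point at any of them. The sample rows and the function name `find_data_with_tags` are made up for illustration, and the tables are trimmed to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tag (id INTEGER PRIMARY KEY, tagName TEXT NOT NULL UNIQUE);
CREATE TABLE TagCollectionTag (tagCollectionId INTEGER NOT NULL,
                               tagId INTEGER NOT NULL);
CREATE TABLE Data (id INTEGER PRIMARY KEY, title TEXT, tagCollectionId INTEGER);
INSERT INTO Tag VALUES (1, 'sql'), (2, 'tags'), (3, 'python');
-- collection 10 = {sql, tags}, 11 = {sql, tags, python}, 12 = {python}
INSERT INTO TagCollectionTag VALUES (10, 1), (10, 2),
                                    (11, 1), (11, 2), (11, 3),
                                    (12, 3);
INSERT INTO Data VALUES (1, 'Q1', 10), (2, 'Q2', 11), (3, 'Q3', 12), (4, 'Q4', 10);
""")


def find_data_with_tags(conn, tags):
    """Return (id, title) of data rows whose collection contains all given tags."""
    tags = sorted(set(tags))
    if not tags:
        return []
    marks = ",".join("?" * len(tags))  # one placeholder per requested tag
    sql = f"""
        SELECT d.id, d.title
        FROM Data d
        WHERE d.tagCollectionId IN (
            -- Step 1: collections that contain every requested tag.
            SELECT tct.tagCollectionId
            FROM TagCollectionTag tct
            JOIN Tag t ON t.id = tct.tagId
            WHERE t.tagName IN ({marks})
            GROUP BY tct.tagCollectionId
            HAVING COUNT(DISTINCT t.id) = ?
        )
        ORDER BY d.id
    """
    # Step 2 is the outer SELECT: data rows using any matching collection.
    return conn.execute(sql, (*tags, len(tags))).fetchall()


results = find_data_with_tags(conn, ["sql", "tags"])
```

Because the inner query only touches the comparatively small TagCollectionTag table, the expensive part of the search does not grow with the number of data rows.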

Hope that wasn't too confusing :-)
