What tag schema(s) are the most efficient/effective?

Question

http://tagging.pui.ch/post/37027745720/tags-database-schemas

Stack Overflow's tag handling is among the best that I've seen so far.

Does anyone know if it is a schema pattern I could get some ideas from?

Otherwise, I'm just looking for suggestions on what tag schemas others have successfully implemented.

Recommended answer

It all depends on data volumes and on the content-to-tag distribution and density ratios.

If you have a low tag distribution and density ratio (typical of human-generated data), you can simply generate a unique id or hash for each collection of tags actually in use by the data, then associate that 'tag collection' id with each data instance carrying those tags.

This can work surprisingly well for many forms of human-generated data.

For example, Stack Overflow has ~500,000 questions and ~20,000 tags (too many dupe-ish tags!). Most questions have fewer than five tags. In the worst case you will have 500,000 'tag collection' ids to associate, but more realistically you will have several thousand.

You will also need either instance tracking or garbage collection on the 'tag collection' collection, as specific combinations of tags fall out of use.

For example:

  • Tag: id, tagName
  • TagCollection: id, instanceCount
  • TagCollectionTag: tagCollectionId, tagId
  • Data: id, title, content, tagCollectionId
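The four collections above could be rendered as relational tables roughly as follows. This is only a sketch, assuming SQLite via Python's standard `sqlite3` module. The column types, the foreign keys, the composite primary key and the two indexes are my assumptions and are not part of the answer.

```python
import sqlite3

# Hypothetical SQLite rendering of the four collections listed above.
# Types, constraints and indexes are assumptions made for illustration.
SCHEMA = """
CREATE TABLE Tag (
    id      INTEGER PRIMARY KEY,
    tagName TEXT NOT NULL UNIQUE
);
CREATE TABLE TagCollection (
    id            INTEGER PRIMARY KEY,
    instanceCount INTEGER NOT NULL DEFAULT 0  -- data rows using this collection
);
CREATE TABLE TagCollectionTag (
    tagCollectionId INTEGER NOT NULL REFERENCES TagCollection(id),
    tagId           INTEGER NOT NULL REFERENCES Tag(id),
    PRIMARY KEY (tagCollectionId, tagId)
);
CREATE TABLE Data (
    id              INTEGER PRIMARY KEY,
    title           TEXT,
    content         TEXT,
    tagCollectionId INTEGER REFERENCES TagCollection(id)
);
-- Assumed indexes: look up collections by tag, and data rows by collection.
CREATE INDEX idx_tct_tag ON TagCollectionTag(tagId);
CREATE INDEX idx_data_collection ON Data(tagCollectionId);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the four tables in the given connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
```

Each Data row stores a single tagCollectionId, so tagging a row costs one column rather than one join row per tag.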

Inserting tags is fast if a hash is used (a hash over all the tags in the collection). Otherwise you have to search the TagCollection and TagCollectionTag collections, but these should not be too large anyway.
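One way to picture the hash-based insert, together with the instance tracking mentioned earlier, is the in-memory sketch below. The names `tag_collection_key`, `TagCollections`, `acquire` and `release` are hypothetical. So are the choices to lower-case tag names, to hash with SHA-1, and to join names with a separator that is assumed never to occur in a tag.

```python
import hashlib


def tag_collection_key(tags):
    """Return a deterministic key for a set of tag names."""
    # Normalise and sort so that order, case and duplicates do not matter
    # (lower-casing is an assumption, not something the answer specifies).
    normalised = sorted({t.strip().lower() for t in tags})
    # "\x1f" (unit separator) is assumed never to appear inside a tag name.
    joined = "\x1f".join(normalised)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


class TagCollections:
    """In-memory stand-in for the TagCollection table, keyed by hash."""

    def __init__(self):
        self.instance_count = {}  # key -> number of data rows using it
        self.tags = {}            # key -> frozenset of normalised tag names

    def acquire(self, tags):
        """Return the key for `tags`, creating the collection if needed."""
        key = tag_collection_key(tags)
        if key not in self.instance_count:
            self.tags[key] = frozenset(t.strip().lower() for t in tags)
            self.instance_count[key] = 0
        self.instance_count[key] += 1
        return key

    def release(self, key):
        """Drop one reference and delete the collection once it is unused."""
        self.instance_count[key] -= 1
        if self.instance_count[key] == 0:
            del self.instance_count[key]
            del self.tags[key]


store = TagCollections()
a = store.acquire(["python", "sqlite"])
b = store.acquire(["SQLite", "python"])  # same set, different order and case
```

Because the key is computed from the tags alone, finding an existing collection is a single dictionary (or index) lookup and never scans TagCollectionTag.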

Searching is fast: search TagCollectionTag for collections containing the specific set of tags, then find the data rows with any of those tagCollectionIds.
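The two-step search might look like the following sketch, again assuming SQLite through Python's `sqlite3`. The sample rows, the trimmed-down tables and the helper name `find_data_with_all_tags` are invented for illustration. The `GROUP BY ... HAVING COUNT(*)` query is one common way to match "contains all of these tags", not necessarily what the answer's author had in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tag (id INTEGER PRIMARY KEY, tagName TEXT UNIQUE);
CREATE TABLE TagCollectionTag (tagCollectionId INTEGER, tagId INTEGER,
                               PRIMARY KEY (tagCollectionId, tagId));
CREATE TABLE Data (id INTEGER PRIMARY KEY, title TEXT, tagCollectionId INTEGER);
-- Made-up sample data: collection 10 = {sql, tags},
-- collection 11 = {sql, tags, python}, collection 12 = {python}.
INSERT INTO Tag VALUES (1, 'sql'), (2, 'tags'), (3, 'python');
INSERT INTO TagCollectionTag VALUES (10,1),(10,2),(11,1),(11,2),(11,3),(12,3);
INSERT INTO Data VALUES (100,'q1',10),(101,'q2',11),(102,'q3',12),(103,'q4',10);
""")


def find_data_with_all_tags(conn, tag_names):
    """Return (id, title) of data rows whose collection has every given tag."""
    names = sorted(set(tag_names))
    if not names:
        return []
    marks = ",".join("?" * len(names))
    sql = f"""
        SELECT d.id, d.title
        FROM Data d
        WHERE d.tagCollectionId IN (
            -- Step 1: collections that contain all requested tags.
            SELECT tct.tagCollectionId
            FROM TagCollectionTag tct
            JOIN Tag t ON t.id = tct.tagId
            WHERE t.tagName IN ({marks})
            GROUP BY tct.tagCollectionId
            HAVING COUNT(*) = ?
        )
        ORDER BY d.id
    """
    # Step 2 is the outer query: data rows pointing at any such collection.
    return conn.execute(sql, (*names, len(names))).fetchall()


rows = find_data_with_all_tags(conn, ["sql", "tags"])
```

The inner query only touches TagCollectionTag, which stays small when there are few distinct tag combinations. The outer lookup on Data is then a plain match on tagCollectionId.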

Hope that wasn't too confusing :-)
