Web-scraping Rails App Getting Over-Modelled?

Problem description

I'd like some opinions on whether I'm over-modeling my app. In this app, I'm saving off HTML meta data that I download from websites. I download the meta tags and put them in an array. For each element in the meta_tags array, I want to save that element. But I need to account for situations where, for instance, there are two robots meta tags (one for index and one for follow). So my initial thought was to solve this by creating a "meta_tags" table and saving all the meta tags off to there. That would keep the sites table lean. I would just specify that the site table has many meta_tags.

But then I realized that the meta_tags table is going to have a lot of duplicate entries. For instance, if I have two websites that each have two robots meta tags (again, one for index and one for follow), then I've got four rows in that table when I only have two unique records. So now I'm thinking that I should have the Site model do the downloading of HTML, and then have a separate model called "meta tags" that lists all unique meta tags. Then I would associate the sites table with the meta_tags table through a join table called "site_meta_tags" that identifies which site had which meta tags. Is that the best way to set this up? Or am I making this too complicated?
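
To make the row-count argument concrete, here is a minimal sketch of both designs. It uses Python's built-in sqlite3 instead of Rails migrations so it runs stand-alone; the column names (url, name, content) and the example URLs are assumptions, not part of the original app.

```python
import sqlite3

# Hypothetical scraped data: two sites that each carry the same two robots tags.
SCRAPED = {
    "https://a.example": [("robots", "index"), ("robots", "follow")],
    "https://b.example": [("robots", "index"), ("robots", "follow")],
}


def naive_row_count(scraped):
    """First design: one meta_tags row per (site, tag) occurrence."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE sites (id INTEGER PRIMARY KEY, url TEXT UNIQUE);
        CREATE TABLE meta_tags (id INTEGER PRIMARY KEY,
                                site_id INTEGER REFERENCES sites(id),
                                name TEXT, content TEXT);
    """)
    for url, tags in scraped.items():
        site_id = db.execute("INSERT INTO sites (url) VALUES (?)", (url,)).lastrowid
        db.executemany(
            "INSERT INTO meta_tags (site_id, name, content) VALUES (?, ?, ?)",
            [(site_id, name, content) for name, content in tags],
        )
    return db.execute("SELECT COUNT(*) FROM meta_tags").fetchone()[0]


def normalized_row_counts(scraped):
    """Second design: unique meta_tags rows plus a site_meta_tags join table.

    Returns (number of tag rows, number of join rows)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE sites (id INTEGER PRIMARY KEY, url TEXT UNIQUE);
        CREATE TABLE meta_tags (id INTEGER PRIMARY KEY, name TEXT, content TEXT,
                                UNIQUE (name, content));
        CREATE TABLE site_meta_tags (site_id INTEGER REFERENCES sites(id),
                                     meta_tag_id INTEGER REFERENCES meta_tags(id),
                                     PRIMARY KEY (site_id, meta_tag_id));
    """)
    for url, tags in scraped.items():
        site_id = db.execute("INSERT INTO sites (url) VALUES (?)", (url,)).lastrowid
        for name, content in tags:
            # Create the tag only if this (name, content) pair is new, then reuse its id.
            db.execute("INSERT OR IGNORE INTO meta_tags (name, content) VALUES (?, ?)",
                       (name, content))
            tag_id = db.execute("SELECT id FROM meta_tags WHERE name = ? AND content = ?",
                                (name, content)).fetchone()[0]
            db.execute("INSERT INTO site_meta_tags (site_id, meta_tag_id) VALUES (?, ?)",
                       (site_id, tag_id))
    tag_rows = db.execute("SELECT COUNT(*) FROM meta_tags").fetchone()[0]
    join_rows = db.execute("SELECT COUNT(*) FROM site_meta_tags").fetchone()[0]
    return tag_rows, join_rows


print(naive_row_count(SCRAPED))        # prints 4
print(normalized_row_counts(SCRAPED))  # prints (2, 4)
```

The join table still holds four rows: normalizing does not remove the per-site repetition, it moves it into narrow pairs of integer ids while each distinct tag is stored once.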

UPDATE: I posted a follow up question here: Rails app has trouble with inter-model saving

Recommended answer

The "right" number of models and associations depends on your use cases and constraints. If database space is at a premium, database normalization might make more sense. If you want faster lookups, denormalization might make more sense. If you need to optimize certain kinds of lookups, arrange your models and relations for those lookups. All of this said, if you are just prototyping, don't worry too much right now; start with something that makes sense and see what happens.
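
For contrast with the join-table design, here is a minimal sketch of the denormalized end of that trade-off: each site row carries its own meta tags as a JSON array in a hypothetical `meta_tags` text column. It again uses Python's sqlite3 in place of Rails; the column name and URLs are assumptions. Reading one site's tags needs no join, but finding every site with a given tag means decoding every row.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
# Denormalized: the site row stores its meta tags as a JSON array of [name, content].
db.execute("CREATE TABLE sites (id INTEGER PRIMARY KEY, url TEXT UNIQUE, meta_tags TEXT)")


def save_site(url, tags):
    """Store a site together with all of its (name, content) tag pairs."""
    db.execute("INSERT INTO sites (url, meta_tags) VALUES (?, ?)", (url, json.dumps(tags)))


def tags_for(url):
    """Site -> tags: a single-row read, no join needed."""
    row = db.execute("SELECT meta_tags FROM sites WHERE url = ?", (url,)).fetchone()
    return [tuple(tag) for tag in json.loads(row[0])] if row else []


def sites_with(name, content):
    """Tag -> sites: every row is read and decoded in Python, because a plain
    index on the text column cannot answer "does this array contain X"."""
    return [url for url, raw in db.execute("SELECT url, meta_tags FROM sites ORDER BY id")
            if [name, content] in json.loads(raw)]


save_site("https://a.example", [("robots", "index"), ("robots", "follow")])
save_site("https://b.example", [("robots", "noindex")])
print(tags_for("https://a.example"))  # prints [('robots', 'index'), ('robots', 'follow')]
print(sites_with("robots", "index"))  # prints ['https://a.example']
```

Whether this beats a join table depends on which direction you query more often: it is cheap for site-to-tags reads and costly for tag-to-sites searches.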

The way you described (a many-to-many relationship) sounds fine to me if you want to be able to look up in both directions:

  1. Target the meta tags first, then find the associated sites
  2. Target the sites first, then find the associated meta tags
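
The two lookup directions above map onto two join queries over the same three tables. This is a minimal sketch using Python's sqlite3; the table names follow the Rails default discussed below, while the columns and sample rows are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sites (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE);
CREATE TABLE meta_tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        content TEXT NOT NULL, UNIQUE (name, content));
CREATE TABLE meta_tags_sites (
    meta_tag_id INTEGER NOT NULL REFERENCES meta_tags(id),
    site_id     INTEGER NOT NULL REFERENCES sites(id),
    PRIMARY KEY (meta_tag_id, site_id)
);
"""


def build_db():
    """Create an in-memory database with a few hypothetical rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO sites (id, url) VALUES (?, ?)",
                   [(1, "https://a.example"), (2, "https://b.example")])
    db.executemany("INSERT INTO meta_tags (id, name, content) VALUES (?, ?, ?)",
                   [(1, "robots", "index"), (2, "robots", "follow"), (3, "robots", "noindex")])
    db.executemany("INSERT INTO meta_tags_sites (meta_tag_id, site_id) VALUES (?, ?)",
                   [(1, 1), (2, 1), (3, 2), (2, 2)])
    return db


def sites_for_tag(db, name, content):
    """Direction 1: start from a meta tag, find the sites that carry it."""
    rows = db.execute("""
        SELECT s.url FROM meta_tags t
        JOIN meta_tags_sites j ON j.meta_tag_id = t.id
        JOIN sites s ON s.id = j.site_id
        WHERE t.name = ? AND t.content = ?
        ORDER BY s.url""", (name, content))
    return [url for (url,) in rows]


def tags_for_site(db, url):
    """Direction 2: start from a site, find its meta tags."""
    return db.execute("""
        SELECT t.name, t.content FROM sites s
        JOIN meta_tags_sites j ON j.site_id = s.id
        JOIN meta_tags t ON t.id = j.meta_tag_id
        WHERE s.url = ?
        ORDER BY t.id""", (url,)).fetchall()


db = build_db()
print(sites_for_tag(db, "robots", "follow"))  # prints ['https://a.example', 'https://b.example']
print(tags_for_site(db, "https://b.example"))  # prints [('robots', 'follow'), ('robots', 'noindex')]
```

In Rails, declaring `has_and_belongs_to_many` on both models gives you `site.meta_tags` and `meta_tag.sites`, which generate joins of this shape for you.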

(Note: don't forget to add your indexes.)
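
As a sketch of which indexes this schema would typically want, the example below creates one per lookup path and asks SQLite which index it would use. It uses Python's sqlite3; the index names imitate Rails' default `add_index` naming (`index_<table>_on_<columns>`), and the columns are assumptions. In a Rails migration the join-table pair would be roughly `add_index :meta_tags_sites, [:meta_tag_id, :site_id], unique: true` plus `add_index :meta_tags_sites, :site_id`.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sites (id INTEGER PRIMARY KEY, url TEXT NOT NULL);
CREATE TABLE meta_tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL, content TEXT NOT NULL);
CREATE TABLE meta_tags_sites (
    meta_tag_id INTEGER NOT NULL REFERENCES meta_tags(id),
    site_id     INTEGER NOT NULL REFERENCES sites(id)
);

-- Find a site by URL without scanning the table.
CREATE UNIQUE INDEX index_sites_on_url ON sites (url);
-- Find an existing tag quickly and forbid duplicate tag rows.
CREATE UNIQUE INDEX index_meta_tags_on_name_and_content ON meta_tags (name, content);
-- Tag -> sites direction; UNIQUE also stops the same link being stored twice.
CREATE UNIQUE INDEX index_meta_tags_sites_on_meta_tag_id_and_site_id
    ON meta_tags_sites (meta_tag_id, site_id);
-- Site -> tags direction.
CREATE INDEX index_meta_tags_sites_on_site_id ON meta_tags_sites (site_id);
""")


def plan(sql):
    """Return SQLite's query-plan detail strings (last column of each row)."""
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql)]


def link(meta_tag_id, site_id):
    """Insert a join row; return False if that pair is already present."""
    try:
        db.execute("INSERT INTO meta_tags_sites (meta_tag_id, site_id) VALUES (?, ?)",
                   (meta_tag_id, site_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(plan("SELECT meta_tag_id FROM meta_tags_sites WHERE site_id = 1"))
print(plan("SELECT site_id FROM meta_tags_sites WHERE meta_tag_id = 1"))
```

The exact wording of the plan output varies between SQLite versions, but each line should name the index chosen for that direction.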

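By the way, for a many-to-many join table, the Rails convention is to sort the two table names alphabetically before joining them with an underscore. So by default it would be "meta_tags_sites", not "sites_meta_tags". This default applies to has_and_belongs_to_many; with has_many :through, the join table belongs to its own model and is named after that model. See the "has_and_belongs_to_many" section in A Guide to Active Record Associations.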
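
The basic naming rule can be expressed in a few lines. This is an illustrative sketch in Python, not Rails' own code: the function name is hypothetical, and it covers only the alphabetical rule, so pass `join_table:` explicitly in Rails if a name ever surprises you.

```python
def default_habtm_join_table(table_a: str, table_b: str) -> str:
    """Basic alphabetical rule for a has_and_belongs_to_many join table name:
    sort the two table names lexically, then join them with an underscore.
    Ignores any special cases Rails itself may apply."""
    first, second = sorted([table_a, table_b])
    return f"{first}_{second}"


print(default_habtm_join_table("sites", "meta_tags"))  # prints meta_tags_sites
```

Because the comparison is plain string ordering, it also reproduces the guide's own example: "paper_boxes" and "papers" give "paper_boxes_papers", since "_" sorts before "s".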
