Tools for discovering de facto foreign keys in databases?


Question




      A good way to quickly survey the information in a database is to apply a tool that automatically creates a database diagram of all tables and all relationships between them.

      In my experience, such tools use declared foreign keys as the relationships, and most of the databases I try them on do not declare any. Sure, the data satisfies the constraints that foreign keys would express, but the database does not enforce them. So I end up with a 'diagram' consisting of a bunch of unrelated tables.

      So what I'm looking for is software that can compute "undeclared foreign keys" and either

      • use them as table relations in a database diagram, or
      • generate SQL code for the corresponding foreign key declarations

      Do you know any tools, free if possible, that can already do this?

      Solution

      Interesting question. You're looking to parse a database schema and its data to determine which tables are related, or should be related, to each other, without any strict definition of the relationship. In effect, you're trying to infer a relationship.

      I see two ways that you can infer such a relationship. First let me say that your approach might vary depending on the databases you're working with. A number of questions spring to mind (I don't want answers, but they are worth reflecting on):

      • are these in-house enterprise systems that follow some consistent naming convention or pattern?
      • or are they 'in-the-wild' databases that you come across anywhere, at any time?
      • what sort of assumptions are you prepared to make?
      • would you prefer to get more false positives or false negatives in your result?

      Note that this type of inference will almost certainly give false results, and is built on a lot of assumptions.

      So I offer two approaches that I'd use in concert.

      Inferring a relationship through structure / naming (symbolic analysis)

      A common database design convention is to name a PK column after the table (e.g. CustomerId on table Customer), or alternatively to name the PK column simply Id.

      A table with a FK relationship to another table often names the referencing column after the related table. In the Order table I'd expect a CustomerId column which refers to the CustomerId / Id column in the Customer table.

      This type of analysis would include

      • inspecting columns across tables for similar phrases / words
      • looking for column names that are similar to the names of other tables
      • checking for column names that contain the name of another column (e.g. FirstCustomerId & SecondCustomerId both refer to the CustomerId column in the Customer table)
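The naming checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the schema is passed in as a plain dict of table name to column names, the helper names `guess_primary_key` and `candidate_foreign_keys` are made up for this example, and the `Referral` table is hypothetical. The suffix match is deliberately loose, so expect false positives.

```python
def guess_primary_key(table, columns):
    """Guess the PK column from the naming conventions '<Table>Id', '<Table>_Id' or 'Id'."""
    lowered = {c.lower(): c for c in columns}
    for candidate in (table.lower() + "id", table.lower() + "_id", "id"):
        if candidate in lowered:
            return lowered[candidate]
    return None


def candidate_foreign_keys(schema):
    """Return (child_table, child_column, parent_table, parent_column) guesses.

    schema maps each table name to its list of column names (an assumed input format).
    """
    pks = {table: guess_primary_key(table, cols) for table, cols in schema.items()}
    candidates = []
    for table, cols in schema.items():
        for col in cols:
            if col == pks[table]:
                continue  # the table's own key is not treated as a reference
            name = col.lower()
            if not name.endswith("id"):
                continue
            stem = name[:-2].rstrip("_")  # 'FirstCustomerId' -> 'firstcustomer'
            for parent, pk in pks.items():
                # Loose match: the stem ends with the name of another table.
                if pk is not None and stem.endswith(parent.lower()):
                    candidates.append((table, col, parent, pk))
    return candidates


if __name__ == "__main__":
    schema = {
        "Customer": ["CustomerId", "Name"],
        "Order": ["Id", "CustomerId"],
        "Referral": ["Id", "FirstCustomerId", "SecondCustomerId"],  # hypothetical table
    }
    for guess in candidate_foreign_keys(schema):
        print(guess)
```

Each guess is only a candidate. The data checks described next are what would confirm or reject it.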

      Inferring a relationship through data (statistical analysis)

      Looking at the data, as you suggest you have done in your comments, will allow you to determine 'possible' references. If the CustomerId column in the Order table contains values which don't exist in the Id column of the Customer table, then it's reasonable to question whether this is a valid relationship (although you never know!)
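One way to run this containment check is sketched below with Python's standard `sqlite3` module. It assumes an SQLite database purely for illustration; the function name `orphan_ratio` is made up for this example, and the table and column names must come from trusted schema metadata because they are interpolated into the SQL text.

```python
import sqlite3


def quote(identifier):
    """Double-quote an SQL identifier so reserved words such as Order can be used."""
    return '"' + identifier.replace('"', '""') + '"'


def orphan_ratio(conn, child_table, child_col, parent_table, parent_col):
    """Fraction of distinct non-NULL child values that have no match in the parent column."""
    c_tab, c_col = quote(child_table), quote(child_col)
    p_tab, p_col = quote(parent_table), quote(parent_col)
    # COUNT(DISTINCT ...) ignores NULLs, so NULL references are not counted as orphans.
    total = conn.execute(f"SELECT COUNT(DISTINCT {c_col}) FROM {c_tab}").fetchone()[0]
    if total == 0:
        return 0.0
    orphans = conn.execute(
        f"SELECT COUNT(DISTINCT c.{c_col}) FROM {c_tab} AS c "
        f"WHERE c.{c_col} IS NOT NULL AND NOT EXISTS "
        f"(SELECT 1 FROM {p_tab} AS p WHERE p.{p_col} = c.{c_col})"
    ).fetchone()[0]
    return orphans / total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Customer (Id INTEGER);
        CREATE TABLE "Order" (Id INTEGER, CustomerId INTEGER);
        INSERT INTO Customer VALUES (1), (2), (3);
        INSERT INTO "Order" VALUES (10, 1), (11, 1), (12, 2), (13, 99), (14, NULL);
        """
    )
    print(orphan_ratio(conn, "Order", "CustomerId", "Customer", "Id"))
```

A ratio of zero is consistent with a foreign key but does not prove one. A small non-zero ratio may simply mean dirty data, which is where the false positive versus false negative trade-off comes in.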

      A simple form of data analysis uses dates and times. Rows that were created in close proximity to one another are more likely to be related to one another. If, for every Order row that was created, there also exist between 2 and 5 Item rows created within a few seconds, then a relationship between the two is likely.
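A rough sketch of the timestamp idea follows. It assumes that both tables carry a creation timestamp and that those timestamps have already been loaded into Python lists. The function names, the 5-second window and the 2 to 5 range are illustrative parameters, not part of any tool.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def nearby_counts(parent_times, child_times, window=timedelta(seconds=5)):
    """For each parent timestamp, count child timestamps within +/- window of it."""
    ordered = sorted(child_times)
    return [
        bisect_right(ordered, t + window) - bisect_left(ordered, t - window)
        for t in parent_times
    ]


def fraction_in_range(parent_times, child_times, low=2, high=5,
                      window=timedelta(seconds=5)):
    """Share of parent rows that have between low and high child rows created nearby."""
    counts = nearby_counts(parent_times, child_times, window)
    if not counts:
        return 0.0
    return sum(low <= c <= high for c in counts) / len(counts)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, 0)
    orders = [t0, t0 + timedelta(minutes=1)]
    items = [t0 + timedelta(seconds=s) for s in (1, 2, 61, 62, 63)]
    print(nearby_counts(orders, items))
    print(fraction_in_range(orders, items))
```

A fraction close to 1 suggests the two tables are written together. On a busy system unrelated tables will also show temporal overlap, so this signal is best combined with the naming and containment checks.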

      A more detailed analysis might look at the range and distribution of used values.

      For example, if your Order table has a St_Id column, you might infer using symbolic analysis that the column is likely to relate to either a State table or a Status table. Suppose the St_Id column has 6 discrete values, and 90% of the records are covered by 2 of them. The State table has 200 rows, and the Status table has 9 rows. You could quite reasonably infer that the St_Id column relates to the Status table, because it gives much greater coverage of the rows of that table (6 of 9 rows, or 2/3, are 'used', whereas only 6 of 200 rows, or 3%, of the State table would be used).
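The coverage comparison in this example is simple arithmetic, sketched here in Python. The function name `rank_parents_by_coverage` is made up, and the rule that a parent with fewer rows than the column has distinct values is skipped rests on the assumption that the referenced key is unique.

```python
def rank_parents_by_coverage(distinct_values, parent_row_counts):
    """Rank candidate parent tables by the share of their rows the column would use.

    distinct_values: number of distinct values in the referencing column.
    parent_row_counts: dict of candidate parent table name -> row count.
    """
    scores = {}
    for parent, rows in parent_row_counts.items():
        if rows < distinct_values or rows == 0:
            continue  # assuming a unique key, this table cannot supply enough values
        scores[parent] = distinct_values / rows
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # St_Id has 6 distinct values; State has 200 rows and Status has 9 rows.
    for parent, coverage in rank_parents_by_coverage(6, {"State": 200, "Status": 9}):
        print(f"{parent}: {coverage:.0%} of rows used")
```

With the numbers from the example, Status ranks first at roughly 67% coverage and State second at 3%.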

      If you perform data analysis on existing databases to gather 'real life data', I'd expect some patterns to emerge that could be used as guides for structural inference. When a table with a large number of records has a column with a small number of values repeated many times (not necessarily in order), it's more likely that this column relates to a table with a correspondingly small number of rows.

      In summary

      Best of luck. It's an interesting problem. I've just thrown some ideas out there, but this is very much a trial & error, data gathering and performance tuning situation.
