平板的红移性能与尺寸和事实的关系 [英] Redshift Performance of Flat Tables Vs Dimension and Facts

查看:97
本文介绍了平板的红移性能与尺寸和事实的关系的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试在平面OLTP表上创建尺寸模型(不在3NF中).

I am trying to create dimensional model on a flat OLTP tables (not in 3NF).

有些人认为不需要维模型表,因为报告的大多数数据都显示为单个表.但是该表包含的内容超出了我们所需的300列.我还是应该将平面表划分为维度和事实,还是直接在报表中使用平面表?

There are people who are thinking dimensional model table is not required because most of the data for the report present single table. But that table contains more than what we need like 300 columns. Should I still separate flat table into dimensions and facts or just use the flat tables directly in the reports.

推荐答案

仅出于报告目的创建表时(通常在数据仓库中使用),习惯上创建宽,具有非标准化数据的平面表,原因是:

When creating tables purely for reporting purposes (as is typical in a Data Warehouse), it is customary to create wide, flat tables with non-normalized data because:

  • 查询更容易
  • 它避免了因果关系用户可能会感到困惑和容易出错的JOIN
  • 查询运行速度更快(尤其是对于使用列数据存储的数据仓库系统)
  • It is easier to query
  • It avoids JOINs that can be confusing and error-prone for causal users
  • Queries run faster (especially for Data Warehouse systems that use columnar data storage)

此数据格式非常适合报告,但不适合应用程序的常规数据存储-用于OLTP的数据库应使用规范化表.

This data format is great for reporting, but is not suitable for normal data storage for applications — a database being used for OLTP should use normalized tables.

不要担心具有大量列-这对于数据仓库来说是很正常的.但是,300列确实听起来很大,这表明它们不一定被明智地使用.因此,您可能要检查是否需要它们.

Do not be worried about having a large number of columns — this is quite normal for a Data Warehouse. However, 300 columns does sound rather large and suggests that they aren't necessarily being used wisely. So, you might want to check whether they are required.

许多列的一个很好的例子是具有使编写WHERE子句容易的标志,例如WHERE customer_is_active,而不必连接到另一个表并弄清楚它们是否在过去30天中使用过该服务.这些列将需要每天重新计算,但是对于查询数据非常方便.

A great example of many columns is to have flags that make it easy to write WHERE clauses, such as WHERE customer_is_active rather than having to join to another table and figuring out whether they have used the service in the past 30 days. These columns would need to be recalculated daily, but are very convenient for querying data.

底线::使用数据仓库时,您应该将易用性置于性能之上.然后,找出如何使用数据仓库系统(如Amazon Redshift)来优化访问,该系统旨在非常有效地处理此类数据.

Bottom line: You should put ease of use above performance when using Data Warehousing. Then, figure out how to optimize access by using a Data Warehousing system such as Amazon Redshift that is designed to handle this type of data very efficiently.

这篇关于平板的红移性能与尺寸和事实的关系的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆