Which data warehouse and BI tool is right for me?


Question



Hi,
We are an ISV with a SaaS-based solution on the AWS cloud. Our OLTP relational databases are spread across 5 individual SQL Servers because we keep one database per account (customer). Currently, we have more than 5,000 databases on these 5 SQL Servers.
Our data warehouse is currently a 3-node AWS Redshift cluster. It performs relatively well for what we use it for: our in-house dashboarding. We pump data 24/7 using SQL Server SSIS ETL packages, and the lag between the production SQL Servers and Redshift is currently about 15 minutes. We want to keep this lag at the same level.
Given new requirements from some customers for a more modern BI tool, I experimented with many different BI tools on the market. All of them were slow when using Direct Query against Redshift (that is, without importing data into their own platform), and most are very expensive and outside our budget.
Among all of these BI tools, I liked MS Power BI, as it is very cheap to start with and fairly priced for real production use. The problem with Power BI, as with the other BI tools, is that unless we import data into its platform, dashboards are very slow to render.
Also, since Redshift can only have one sort order (no indexing), I defined a composite sort key that includes Account_ID. Redshift doesn't perform well when we issue queries that filter on another field (queries that don't include Account_ID).
Do you have a suggestion on which data warehouse and BI tool to use? I appreciate your help.
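
To make the sort-key problem described above concrete, here is a minimal Python sketch. It is a toy model, not Redshift itself: it assumes rows are stored in fixed-size blocks sorted by a composite key, and that the engine keeps a min/max range per column per block so it can skip blocks that cannot match a filter. The block size, column names (`account_id`, `order_day`) and row counts are all made up for illustration.

```python
import random

BLOCK_SIZE = 1000  # rows per block; an illustrative number, not Redshift's real block size


def build_blocks(rows, sort_cols):
    """Sort rows by a composite key and record per-block min/max for every column."""
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in sort_cols))
    blocks = []
    for start in range(0, len(ordered), BLOCK_SIZE):
        chunk = ordered[start:start + BLOCK_SIZE]
        zone_map = {
            col: (min(r[col] for r in chunk), max(r[col] for r in chunk))
            for col in chunk[0]
        }
        blocks.append(zone_map)
    return blocks


def blocks_scanned(blocks, col, value):
    """Count blocks whose [min, max] range for `col` could contain `value`."""
    return sum(1 for zm in blocks if zm[col][0] <= value <= zm[col][1])


random.seed(0)
rows = [
    {"account_id": random.randrange(5000), "order_day": random.randrange(365)}
    for _ in range(200_000)
]
blocks = build_blocks(rows, sort_cols=("account_id", "order_day"))

by_account = blocks_scanned(blocks, "account_id", 1234)
by_day = blocks_scanned(blocks, "order_day", 100)
print(f"blocks total: {len(blocks)}")
print(f"filter on account_id: {by_account} blocks")
print(f"filter on order_day:  {by_day} blocks")
```

In this toy model, a filter on the leading key column touches only a handful of blocks, while a filter on the second column has to look at nearly all of them, because every block contains a wide range of `order_day` values.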

Recommended answer

AL.M,


A typical enterprise Power BI implementation consists of a Power BI Server instance that hosts the developed data model, which users access through reports and dashboards published to the Power BI Server. On the backend, services pull data from the data warehouse instance(s) where the raw data resides. To keep the data model current, a schedule is set to refresh it with the latest raw data from the data source. This mitigates any lag between the data source and the Power BI report consumer, and it creates a level of consistency across all reports (report consumers access the same data model and reports/dashboards instead of individual reports/dashboards that reside on a user's laptop, etc.).
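
The scheduled-refresh pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea (serve reports from a local snapshot, reload it on a timer); it does not use any Power BI API. The class name `ImportedModel`, the `fetch_from_warehouse` stand-in, and the 900-second interval are all hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


class ImportedModel:
    """Holds a local copy of warehouse rows and refreshes it on a fixed interval."""

    def __init__(self, fetch: Callable[[], List[Row]], refresh_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._rows: List[Row] = []
        self._loaded_at: Optional[float] = None

    def refresh_if_due(self) -> bool:
        """Pull from the warehouse only when the snapshot is missing or stale."""
        now = self._clock()
        due = self._loaded_at is None or now - self._loaded_at >= self._refresh_seconds
        if due:
            self._rows = self._fetch()
            self._loaded_at = now
        return due

    def rows_for(self, account_id: int) -> List[Row]:
        """Serve a report query from the local snapshot, never from the warehouse."""
        return [r for r in self._rows if r["account_id"] == account_id]


# Hypothetical stand-in for a slow warehouse query; counts how often it is called.
warehouse_calls = {"count": 0}


def fetch_from_warehouse() -> List[Row]:
    warehouse_calls["count"] += 1
    return [{"account_id": 1, "total": 10}, {"account_id": 2, "total": 20}]


fake_now = [0.0]  # controllable clock so the example does not need to sleep
model = ImportedModel(fetch_from_warehouse, refresh_seconds=900,  # 15 minutes
                      clock=lambda: fake_now[0])

model.refresh_if_due()          # first load hits the warehouse
for _ in range(100):            # 100 report renders, no warehouse traffic
    model.refresh_if_due()
    model.rows_for(1)
fake_now[0] = 900.0             # 15 minutes later the snapshot is stale
model.refresh_if_due()
print(warehouse_calls["count"])  # 2
```

The trade-off is freshness: report consumers see data as of the last refresh, so the refresh interval adds to whatever lag the ETL pipeline already has.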


As for your Azure Data Warehouse of choice, there are two primary options in Azure:

Azure SQL Data Warehouse

Snowflake on Azure


Please let us know if there are additional questions.

Regards,

Mike

