Connecting Power BI to S3 Bucket


Problem Description

I need some guidance, as I am new to Power BI and Redshift.

My raw JSON data is stored in an Amazon S3 bucket in the form of .gz files (each .gz file contains multiple rows of JSON data), and I want to connect Power BI to the S3 bucket. Based on my research so far, I have found three approaches:

  1. Amazon S3 is a web service and supports the REST API. We can try to use the Web data source to get the data.

Question: Is it possible to unzip the .gz files (inside the S3 bucket or inside Power BI), extract the JSON data from S3, and connect it to Power BI?
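For reference, a minimal sketch of what this option would involve: fetching the object over HTTPS and decompressing it on the client side. The URL is a placeholder, and this assumes the object is publicly readable (or that you substitute a pre-signed URL):

import gzip
import io
import json

import requests

# Hypothetical object URL; replace with your bucket/key or a pre-signed URL
url = 'https://your_bucket.s3.amazonaws.com/path/to/file_name.json.gz'

resp = requests.get(url)
resp.raise_for_status()

# Decompress the gzipped body in memory and parse one JSON document per line
records = []
with gzip.GzipFile(fileobj=io.BytesIO(resp.content)) as gz:
    for line in gz:
        records.append(json.loads(line))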

  2. Import the data from Amazon S3 into Amazon Redshift. Do all the data manipulation inside Redshift using SQL Workbench. Use the Amazon Redshift connector to get the data into Power BI.

Question 1: Does Redshift allow loading gzipped JSON data from the S3 bucket? If yes, is it possible directly, or do I have to write code for it?
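For what it's worth, Redshift's COPY command does accept GZIP and JSON parameters, so no custom decompression code should be needed. A minimal sketch of issuing the COPY from Python; the table name, IAM role ARN, and connection details are all placeholders:

import psycopg2  # common PostgreSQL driver; Redshift speaks the same protocol

# Placeholder connection details for your Redshift cluster
conn = psycopg2.connect(
    host='your-cluster.abc123.us-east-2.redshift.amazonaws.com',
    port=5439,
    dbname='dev',
    user='awsuser',
    password='your_password',
)

# COPY loads gzipped JSON straight from S3; 'auto' maps JSON keys to columns
copy_sql = """
    COPY your_table
    FROM 's3://your_bucket/the folder inside your bucket/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/YourRedshiftRole'
    FORMAT AS JSON 'auto'
    GZIP;
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)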

Question 2: I have the S3 account; do I have to purchase a Redshift account/space separately? What is the cost?

  3. Move the data from the AWS S3 bucket to Azure Data Lake Store via Azure Data Factory, transform the data using Azure Data Lake Analytics (U-SQL), and then output it to Power BI.

U-SQL recognizes GZip-compressed files with the .gz file extension and automatically decompresses them as part of the extraction process. Is this process valid if my gzipped files contain rows of JSON data?

Please let me know if there is any other method, and please share your valuable suggestions on this post.

Thanks in advance.

Answer

Regarding your first question: I recently faced a similar issue (though extracting a CSV) and would like to share my solution.

Power BI still doesn't have a direct connector for downloading from S3 buckets, but you can do it with a Python script: Get Data --> Python Script.

PS: Make sure the boto3 and pandas libraries are installed in the same folder (or a subfolder) of the Python home directory you specified in Power BI's options, or in the Anaconda library folder (c:\users\USERNAME\anaconda3\lib\site-packages).

[Screenshot: Power BI window for the Python script option]

import boto3
import pandas as pd

# Placeholder credentials -- replace with your own values
AWS_ACCESS_KEY_ID = 'your_access_key_id'
AWS_SECRET_ACCESS_KEY = 'your_secret_access_key'

bucket_name = 'your_bucket'
folder_name = 'the folder inside your bucket/'
file_name = r'file_name.csv'  # or .json in your case
key = folder_name + file_name

s3 = boto3.resource(
    service_name='s3',
    region_name='your_bucket_region',  # ex: 'us-east-2'
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY
)

# Fetch the object and read its body straight into a dataframe
obj = s3.Bucket(bucket_name).Object(key).get()
df = pd.read_csv(obj['Body'])   # or pd.read_json(obj['Body']) in your case


The dataframe will be imported as a new query (named "df" in this example).

Apparently the pandas library can also read compressed files (.gz, for example). See the following topic: How can I read tar.gz file using pandas read_csv with gzip compression option?
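Applied to the asker's case, a minimal adaptation of the snippet above, assuming each .gz file holds newline-delimited JSON; the bucket, region, credentials, and key are placeholders:

import gzip

import boto3
import pandas as pd

s3 = boto3.resource(
    service_name='s3',
    region_name='your_bucket_region',
    aws_access_key_id='your_access_key_id',
    aws_secret_access_key='your_secret_access_key'
)

# Hypothetical key pointing at one gzipped file of JSON rows
key = 'the folder inside your bucket/file_name.json.gz'
obj = s3.Bucket('your_bucket').Object(key).get()

# gzip decompresses the streaming body on the fly; lines=True parses
# one JSON object per line into one dataframe row
with gzip.GzipFile(fileobj=obj['Body']) as gz:
    df = pd.read_json(gz, lines=True)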
