Connecting Power BI to S3 Bucket


Problem Description

Need some guidance as I am new to Power BI and Redshift.

My raw JSON data is stored in an Amazon S3 bucket in the form of .gz files (each .gz file has multiple rows of JSON data). I want to connect Power BI to the Amazon S3 bucket. So far, based on my research, I have found three ways:

  1. Amazon S3 is a web service and supports the REST API. We can try to use a Web data source to get the data.

Question: Is it possible to unzip the .gz files (inside the S3 bucket or inside Power BI), extract the JSON data from S3, and connect it to Power BI?

  2. Import the data from Amazon S3 into Amazon Redshift. Do all the data manipulation inside Redshift using SQL Workbench. Use the Amazon Redshift connector to get the data into Power BI.

Question 1: Does Redshift allow loading gzipped JSON data from the S3 bucket? If yes, is it possible directly, or do I have to write code for it?
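For reference, Redshift's COPY command can load gzipped JSON from S3 directly, so no decompression code is needed. A minimal sketch of the statement, built as a Python string so the pieces are labeled; the table name, S3 prefix, and IAM role below are hypothetical placeholders, not values from this post:

```python
# Hypothetical placeholders -- substitute your own table, prefix, and role.
table = 'raw_events'
s3_prefix = 's3://your_bucket/the folder inside your bucket/'
iam_role = 'arn:aws:iam::123456789012:role/your_redshift_role'

copy_sql = (
    f"COPY {table} "
    f"FROM '{s3_prefix}' "
    f"IAM_ROLE '{iam_role}' "
    "GZIP "                    # decompress the .gz files while loading
    "FORMAT AS JSON 'auto';"   # map JSON keys to table columns automatically
)

print(copy_sql)
```

You would run the resulting statement from SQL Workbench or any other Redshift client; COPY reads every object under the given prefix.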

Question 2: I have the S3 account; do I have to separately purchase a Redshift account/space? What is the cost?

  3. Move the data from the AWS S3 bucket to Azure Data Lake Store via Azure Data Factory, transform the data using Azure Data Lake Analytics (U-SQL), and then output the data to Power BI.

U-SQL recognizes GZip-compressed files with the .gz file extension and automatically decompresses them as part of the extraction process. Is this process valid if my gzipped files contain JSON data rows?

Please let me know if there is any other method; your valuable suggestions on this post are also welcome.

Thanks in advance.

Answer

About your first question: I faced a similar issue recently (but extracting a CSV), and I would like to share my solution.

Power BI still doesn't have a direct connector to download from S3 buckets, but you can do it using a Python script: Get data --> Python script.

PS: make sure the boto3 and pandas libraries are installed in the same folder (or subfolders) as the Python home directory you specified in Power BI's options, or in the Anaconda library folder (c:\users\USERNAME\anaconda3\lib\site-packages).

[Screenshot: Power BI window for the Python script option]

import boto3
import pandas as pd

# Fill in your own credentials here (or load them from a config file);
# the original script assumed these two variables were already defined.
AWS_ACCESS_KEY_ID = 'your_access_key_id'
AWS_SECRET_ACCESS_KEY = 'your_secret_access_key'

bucket_name = 'your_bucket'
folder_name = 'the folder inside your bucket/'
file_name = r'file_name.csv'  # or .json in your case
key = folder_name + file_name

s3 = boto3.resource(
    service_name='s3',
    region_name='your_bucket_region',  # ex: 'us-east-2'
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY
)

# Download the object and read its body straight into a dataframe.
obj = s3.Bucket(bucket_name).Object(key).get()
df = pd.read_csv(obj['Body'])   # or pd.read_json(obj['Body']) in your case


The dataframe will be imported as a new query (named "df" in this example).

Apparently the pandas library can also read zipped files (.gz, for example). See the following topic: How can I read tar.gz file using pandas read_csv with gzip compression option?
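Since the asker's .gz files contain multiple rows of JSON, the last line of the script above can pass compression='gzip' and lines=True to pandas. A self-contained sketch, using an in-memory gzip buffer in place of the S3 object's Body so it runs without AWS credentials:

```python
import gzip
import io

import pandas as pd

# Simulate a .gz object body containing two rows of JSON-lines data.
raw = b'{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'
body = io.BytesIO(gzip.compress(raw))

# With a real S3 object you would pass obj['Body'] the same way:
# df = pd.read_json(obj['Body'], lines=True, compression='gzip')
df = pd.read_json(body, lines=True, compression='gzip')
print(df.shape)  # (2, 2)
```

Each JSON line becomes one row of the dataframe, so the result is directly usable as a Power BI query.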

