How to make a reproducible data sample in PowerBI using Python?
Problem description
This is a self-answered post. Why? Because many Power BI questions go unanswered for lack of data samples. Also, many people seem to wonder how to edit data tables in Power BI using Python. And, of course, the world needs more widespread use of Python in Power BI. Some think that you have to apply a Python snippet to an existing table loaded from elsewhere. My answer to this post will show you how to build a (fairly big) data sample with a few lines of code in an otherwise empty Power BI file.
So, how can you build a data sample and make changes to it using Python in Power BI?
I'll show you how to build a dataset of 10000 rows that contains both categorical and numerical values. I'm using the Python libraries numpy and pandas for the data generation and table operations, respectively. The snippet below draws a random element from each of two lists 10000 times to build two columns with a few street and city names, and adds a column of random numbers to the mix. Then I use pandas to organize the data in a dataframe. When you use Python in the Power BI Power Query Editor, your input has to be a table, and your output has to be a pandas dataframe.
Python snippet:
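import numpy as np
import pandas as pd

# Fix the seed so every run produces the same sample
np.random.seed(123)

streets = ['Broadway', 'Bowery', 'Houston Street']
cities = ['New York', 'Chicago', 'Baltimore']
rows = 10000

# Draw one random city, street and number per row
lst_cities = np.random.choice(cities, rows).tolist()
lst_streets = np.random.choice(streets, rows).tolist()
lst_numbers = np.random.randint(low=0, high=100, size=rows).tolist()

# Organize the three lists as columns of a dataframe
df_dataset = pd.DataFrame({'City': lst_cities,
                           'Street': lst_streets,
                           'ID': lst_numbers})

# One-row dataframe holding the shape (rows, columns) of df_dataset
df_metadata = pd.DataFrame([df_dataset.shape])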
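Because the snippet fixes the seed with np.random.seed(123), rerunning it should give the same table every time, which is what makes the sample reproducible. A minimal sketch for checking this outside Power BI follows. The helper build_sample and its parameters are hypothetical names introduced here for illustration, not part of the original snippet.

```python
import numpy as np
import pandas as pd


def build_sample(rows=10000, seed=123):
    """Hypothetical wrapper around the snippet above (illustrative only)."""
    np.random.seed(seed)  # same seed -> same sequence of random draws
    streets = ['Broadway', 'Bowery', 'Houston Street']
    cities = ['New York', 'Chicago', 'Baltimore']
    lst_cities = np.random.choice(cities, rows).tolist()
    lst_streets = np.random.choice(streets, rows).tolist()
    lst_numbers = np.random.randint(low=0, high=100, size=rows).tolist()
    return pd.DataFrame({'City': lst_cities,
                         'Street': lst_streets,
                         'ID': lst_numbers})


first = build_sample()
second = build_sample()

# True when both runs produced identical values, columns and dtypes
same = first.equals(second)
print(same)
print(first.shape)
```

Anyone who pastes the same snippet with the same seed and the same library versions should therefore end up with the same table to discuss.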
Power BI:
In Power BI Desktop, click Enter Data to go to the Power Query Editor. In the dialog window that follows, do absolutely nothing but click OK. The result is an empty table and two steps under Applied steps.
Now, use Transform > Run Python Script, insert the snippet above and click OK.
You now have a preliminary table with 2 columns and 3 rows, and this is a pretty neat detail of how Python is implemented in Power BI. The three rows are three different datasets that are made available to you after running your snippet. dataset is constructed by default, but is empty since we started out with an empty table. If we had started out with some other data, it would hold that data, as the first line of the Run Python Script dialog explains: # 'dataset' holds the input data for this script. It is constructed in the form of a pandas dataframe. The last table, df_metadata, is only a brief description of the dataset we're really interested in, df_dataset, but I've added it to the mix to illustrate that all dataframes you create in your snippet will be available to you. You choose which table to continue working on by clicking Table next to its name.
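To see what df_metadata actually holds, here is a minimal sketch that builds it from a small stand-in dataframe. The three-row df_dataset and the name df_metadata_named are made up for illustration. pd.DataFrame([df.shape]) gives a single row whose two unnamed columns, 0 and 1, are the row count and the column count.

```python
import pandas as pd

# Stand-in for df_dataset; the values are illustrative only
df_dataset = pd.DataFrame({'City': ['New York', 'Chicago', 'Baltimore'],
                           'Street': ['Broadway', 'Bowery', 'Houston Street'],
                           'ID': [1, 2, 3]})

# Same construction as in the snippet: one row, columns labelled 0 and 1
df_metadata = pd.DataFrame([df_dataset.shape])

# Optional variant with readable column names (hypothetical addition)
df_metadata_named = pd.DataFrame([df_dataset.shape],
                                 columns=['rows', 'columns'])

print(df_metadata)
print(df_metadata_named)
```

Naming the columns is a matter of taste; the point is only that any dataframe you define in the script becomes a selectable table.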
And that's it! You now have a table of mixed datatypes that you can keep working on, using either Python or Power BI itself.
From here you can:
- Keep working on your table using any menu option
- Insert another Python script
- Duplicate your original dataframe and keep working on another version by creating a Reference: right-click Table under Queries
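A further Run Python Script step receives the current table as the pandas dataframe dataset. Below is a minimal sketch of what such a follow-up step might look like, assuming the table from the first script was selected. The stand-in dataset at the top exists only so the code runs outside Power BI, and df_summary is a hypothetical name.

```python
import pandas as pd

# Stand-in for the 'dataset' dataframe that Power BI supplies to the script.
# Leave this definition out when running inside Power BI.
dataset = pd.DataFrame({'City': ['New York', 'Chicago', 'New York', 'Baltimore'],
                        'Street': ['Broadway', 'Bowery', 'Bowery', 'Broadway'],
                        'ID': [10, 20, 30, 40]})

# Count the rows and average the ID column per city
df_summary = (dataset.groupby('City', as_index=False)
                     .agg(n_rows=('ID', 'size'), mean_id=('ID', 'mean')))

print(df_summary)
```

As with the first script, df_summary would appear alongside dataset as a table you can pick to continue with.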