What is the best data format for a time series in a Python Visualization in Power BI?
As of today, August 9 2018, Power BI supports Python Visualizations. They've had support for R Visualizations before, but I still find these integrations to be a bit awkward. Let me show you what I mean:
Let's say that you have a table with time series data, where the top row contains the names 'Date' and 'Value', and the contents are dates of the form yyyy-mm-dd and numbers, respectively:
Date,Value
2017-01-12,1
2017-01-13,4
2017-01-14,2
2017-01-15,4
2017-01-16,2
2017-01-17,2
2017-01-18,2
2017-01-19,5
2017-01-20,5
2017-01-21,5
2017-01-22,5
2017-01-23,6
2017-01-24,3
2017-01-25,6
2017-01-26,6
2017-01-27,5
2017-01-28,8
2017-01-29,4
2017-01-30,2
If you store that dataset as a text file like timerseries.csv and import it using Get Data | Text/CSV, you get a table under VISUALIZATIONS | FIELDS, like this:
You can inspect your table using VISUALIZATIONS | Table and get:
With this setup, you would think you were all set to unleash the power of a Py VISUALIZATION using this beautiful new feature:
If you click that, you get this:
And you're told to
Drag fields into the Values area in the Visualization pane to start scripting
If you start with Value, you get this default setup in the editor:
And if you follow the instructions given by the Power BI team in the August 2018 feature summary, you should be able to make a matplotlib plot quite easily.
But this is where it ends for me for the time being.
If the default dataframe in the editor shares the features of a standard dataframe, you should be able to reference a column in that dataframe and easily make a plot with this snippet:
import matplotlib.pyplot as plt
plt.plot(dataset['Value'])
plt.show()
But when you run it, it only returns an error:
And the details are elaborate to say the least.
I've also tried to import both Dates and Values, and I've tried plotting the dataframe directly with dataset.plot(), but nothing seems to be working. I've also tried stripping the date hierarchy down to simple dates this way:
So, any ideas on the data format, the import method and/or the snippet?
Thank you for any suggestions!
EDIT 1 - Following the answer from Foxan Ng:
Add both columns in the Value field:
This still returns an error ending with:
TypeError: from_bounds() takes 4 positional arguments but 6 were given
I didn't encounter the errors that you've mentioned. Have you dropped both columns into Values?
import matplotlib.pyplot as plt
plt.plot(dataset['Date'], dataset['Value'])
plt.show()
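One reason the fields placed in Values can matter: the comment block that Power BI generates at the top of the script editor states that `dataset` is built from the selected fields and that duplicated rows are removed. Assuming that behavior, the sketch below imitates it locally with a hypothetical helper, `make_dataset`, which is not part of Power BI. With only Value selected, the 19 sample rows collapse to 7 distinct values; with Date and Value together, every row survives. The `Agg` backend and the in-memory PNG are stand-ins for running outside Power BI, and none of this is claimed to explain the `from_bounds()` error quoted in the question.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for running outside Power BI
import matplotlib.pyplot as plt
import pandas as pd

# The sample data from the question.
CSV_TEXT = """Date,Value
2017-01-12,1
2017-01-13,4
2017-01-14,2
2017-01-15,4
2017-01-16,2
2017-01-17,2
2017-01-18,2
2017-01-19,5
2017-01-20,5
2017-01-21,5
2017-01-22,5
2017-01-23,6
2017-01-24,3
2017-01-25,6
2017-01-26,6
2017-01-27,5
2017-01-28,8
2017-01-29,4
2017-01-30,2
"""

source = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["Date"])


def make_dataset(frame, fields):
    """Hypothetical stand-in for the generated preamble:
    keep only the selected fields, then drop duplicated rows."""
    return frame[fields].drop_duplicates()


value_only = make_dataset(source, ["Value"])   # only Value dragged into Values
both = make_dataset(source, ["Date", "Value"])  # Date and Value dragged into Values

print(len(source), len(value_only), len(both))

# Plot the two-column dataset, sorted by date so the line runs left to right.
dataset = both.sort_values("Date")
fig, ax = plt.subplots()
ax.plot(dataset["Date"], dataset["Value"])
fig.autofmt_xdate()  # slant the date tick labels so they do not overlap

# Inside Power BI the script would end with plt.show(), as in the snippet above;
# here the figure is rendered to memory instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

If the preamble works as assumed, keeping Date next to Value is what preserves one row per day.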
UPDATED with M query:
let
Source = Csv.Document(File.Contents("C:\your-directory..\timerseries.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type date}, {"Value", Int64.Type}})
in
#"Changed Type"
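For readers more at home in pandas than in M, the three query steps can be mirrored roughly as follows. This is a sketch for comparison only: it uses the first three rows of the sample inline instead of the file on disk, and a pandas `datetime64` column is not the same thing as Power BI's date type, so it makes no claim about how Power BI hands the column to the Python script.

```python
import io

import pandas as pd

# First three rows of the sample, inline, so the sketch does not depend on a file path.
CSV_TEXT = """Date,Value
2017-01-12,1
2017-01-13,4
2017-01-14,2
"""

# Roughly Csv.Document: read everything as text, without treating any row as a header.
raw = pd.read_csv(io.StringIO(CSV_TEXT), header=None, dtype=str)

# Roughly Table.PromoteHeaders: the first row becomes the column names.
promoted = raw.iloc[1:].reset_index(drop=True)
promoted.columns = raw.iloc[0].tolist()

# Roughly Table.TransformColumnTypes: Date becomes a date, Value a 64-bit integer.
typed = promoted.assign(
    Date=pd.to_datetime(promoted["Date"], format="%Y-%m-%d"),
    Value=promoted["Value"].astype("int64"),
)

print(typed.dtypes)
```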