Building a Transition Matrix using words in Python/Numpy
Problem description
I'm trying to build a 3x3 transition matrix with this data:
days=['rain', 'rain', 'rain', 'clouds', 'rain', 'sun', 'clouds', 'clouds',
'rain', 'sun', 'rain', 'rain', 'clouds', 'clouds', 'sun', 'sun',
'clouds', 'clouds', 'rain', 'clouds', 'sun', 'rain', 'rain', 'sun',
'sun', 'clouds', 'clouds', 'rain', 'rain', 'sun', 'sun', 'rain',
'rain', 'sun', 'clouds', 'clouds', 'sun', 'sun', 'clouds', 'rain',
'rain', 'rain', 'rain', 'sun', 'sun', 'sun', 'sun', 'clouds', 'sun',
'clouds', 'clouds', 'sun', 'clouds', 'rain', 'sun', 'sun', 'sun',
'clouds', 'sun', 'rain', 'sun', 'sun', 'sun', 'sun', 'clouds',
'rain', 'clouds', 'clouds', 'sun', 'sun', 'sun', 'sun', 'sun', 'sun',
'clouds', 'clouds', 'clouds', 'clouds', 'clouds', 'sun', 'rain',
'rain', 'rain', 'clouds', 'sun', 'clouds', 'clouds', 'clouds', 'rain',
'clouds', 'rain', 'sun', 'sun', 'clouds', 'sun', 'sun', 'sun', 'sun',
'sun', 'sun', 'rain']
Currently, I'm doing it with some temporary dictionaries and some lists that calculate the probability of each weather type separately. It's not a pretty solution. Can someone please guide me to a more reasonable solution to this problem?
import numpy as np
from collections import Counter

self.transitionMatrix = np.zeros((3, 3))

# the columns are today
sun_total_count = 0
temp_dict = {'sun': 0, 'clouds': 0, 'rain': 0}
total_runs = 0
for (x, y), c in Counter(zip(data, data[1:])).items():
    # if column 0 is sun
    if x == 'sun':
        # find the sum of all the numbers in this column
        sun_total_count += c
        total_runs += 1
        # compare strings with ==, not `is` (identity is not guaranteed)
        if y == 'sun':
            temp_dict['sun'] = c
        if y == 'clouds':
            temp_dict['clouds'] = c
        if y == 'rain':
            temp_dict['rain'] = c
    if total_runs == 3:
        self.transitionMatrix[0][0] = temp_dict['sun'] / sun_total_count
        self.transitionMatrix[1][0] = temp_dict['clouds'] / sun_total_count
        self.transitionMatrix[2][0] = temp_dict['rain'] / sun_total_count
return self.transitionMatrix
For every type of weather, I need to calculate the probability of each type of weather on the next day.
Recommended answer
I like a combination of pandas and itertools for this. The code block is a bit longer than the one above, but don't conflate verbosity with speed. (The window function should be very fast; the pandas portion will admittedly be slower.)
First, make a "window" function. Here's one from the itertools cookbook. It yields the transitions as tuples (state1 to state2).
from itertools import islice

def window(seq, n=2):
    """Sliding window width n from seq. From old itertools recipes."""
    it = iter(seq)
    # Prime the window with the first n items
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    # Slide forward one element at a time
    for elem in it:
        result = result[1:] + (elem,)
        yield result
# list(window(days))
# [('rain', 'rain'),
# ('rain', 'rain'),
# ('rain', 'clouds'),
# ('clouds', 'rain'),
# ('rain', 'sun'),
# ...
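For the special case n=2, the same pairs can also be produced with `zip(seq, seq[1:])`, which is what the code in the question already does. The sketch below compares the two on a short, made-up list (`sample` is a hypothetical name and not the data from the question). The practical difference is that slicing requires a sequence such as a list, while `window` accepts any iterable.

```python
from itertools import islice

def window(seq, n=2):
    """Sliding window width n from seq. From old itertools recipes."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

# Hypothetical short sequence, used only for illustration.
sample = ['rain', 'rain', 'clouds', 'sun']

pairs_window = list(window(sample))        # works for any iterable
pairs_zip = list(zip(sample, sample[1:]))  # needs a sliceable sequence

print(pairs_window)
# [('rain', 'rain'), ('rain', 'clouds'), ('clouds', 'sun')]
```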
Then use a pandas groupby + value_counts operation to get a transition matrix from each state1 to each state2:
import pandas as pd
pairs = pd.DataFrame(window(days), columns=['state1', 'state2'])
counts = pairs.groupby('state1')['state2'].value_counts()
probs = (counts / counts.sum()).unstack()
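Your result looks like this. Note that counts.sum() is the total number of transitions, so the whole table sums to 1 rather than each row: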
print(probs)
state2  clouds  rain   sun
state1
clouds    0.13  0.09  0.10
rain      0.06  0.11  0.09
sun       0.13  0.06  0.23
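If each row should instead sum to 1, so that an entry reads as the probability of tomorrow's weather given today's, one option is to divide every row of the count table by its own total. The following is a minimal NumPy-only sketch of that idea, not the method from the answer above; the function name `transition_matrix` and the short list `sample` are hypothetical and chosen only for illustration.

```python
import numpy as np

def transition_matrix(seq):
    """Return (states, P), where P[i, j] estimates
    P(next == states[j] | current == states[i])."""
    # Map each label to an integer index; states come back sorted.
    states, idx = np.unique(seq, return_inverse=True)
    n = len(states)
    counts = np.zeros((n, n))
    # Count every observed (current, next) pair; add.at accumulates repeats.
    np.add.at(counts, (idx[:-1], idx[1:]), 1)
    # Normalize each row by its total; rows with no outgoing
    # transitions are left as zeros instead of dividing by zero.
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_sums,
                      out=np.zeros_like(counts), where=row_sums > 0)
    return states, probs

# Hypothetical short sequence, not the data from the question.
sample = ['rain', 'rain', 'sun', 'clouds', 'rain', 'sun', 'sun']
states, P = transition_matrix(sample)

print(states)  # ['clouds' 'rain' 'sun']
print(P)       # rows: today, columns: tomorrow
```

With this layout the rows are "today" and the columns are "tomorrow"; taking `P.T` gives the column-oriented layout that the question's code uses.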