Building a Transition Matrix using words in Python/Numpy


Question

I'm trying to build a 3x3 transition matrix with this data:

days=['rain', 'rain', 'rain', 'clouds', 'rain', 'sun', 'clouds', 'clouds', 
  'rain', 'sun', 'rain', 'rain', 'clouds', 'clouds', 'sun', 'sun', 
  'clouds', 'clouds', 'rain', 'clouds', 'sun', 'rain', 'rain', 'sun',
  'sun', 'clouds', 'clouds', 'rain', 'rain', 'sun', 'sun', 'rain', 
  'rain', 'sun', 'clouds', 'clouds', 'sun', 'sun', 'clouds', 'rain', 
  'rain', 'rain', 'rain', 'sun', 'sun', 'sun', 'sun', 'clouds', 'sun', 
  'clouds', 'clouds', 'sun', 'clouds', 'rain', 'sun', 'sun', 'sun', 
  'clouds', 'sun', 'rain', 'sun', 'sun', 'sun', 'sun', 'clouds', 
  'rain', 'clouds', 'clouds', 'sun', 'sun', 'sun', 'sun', 'sun', 'sun', 
  'clouds', 'clouds', 'clouds', 'clouds', 'clouds', 'sun', 'rain', 
  'rain', 'rain', 'clouds', 'sun', 'clouds', 'clouds', 'clouds', 'rain', 
  'clouds', 'rain', 'sun', 'sun', 'clouds', 'sun', 'sun', 'sun', 'sun',
  'sun', 'sun', 'rain']

Currently, I'm doing it with some temporary dictionaries and lists that calculate the probability of each weather type separately. It's not a pretty solution. Can someone please guide me toward a more reasonable solution to this problem?

self.transitionMatrix=np.zeros((3,3))

#the columns are today
sun_total_count = 0
temp_dict={'sun':0, 'clouds':0, 'rain':0}
total_runs = 0
for (x, y), c in Counter(zip(data, data[1:])).items():
    # column 0 holds the transitions out of 'sun'
    # (compare strings with ==, not 'is', which tests object identity)
    if x == 'sun':
        # accumulate the total number of transitions out of 'sun'
        sun_total_count += c
        total_runs += 1
        if y == 'sun':
            temp_dict['sun'] = c
        if y == 'clouds':
            temp_dict['clouds'] = c
        if y == 'rain':
            temp_dict['rain'] = c

        # once all three 'sun' -> y pairs have been seen, fill the column
        if total_runs == 3:
            self.transitionMatrix[0][0] = temp_dict['sun']/sun_total_count
            self.transitionMatrix[1][0] = temp_dict['clouds']/sun_total_count
            self.transitionMatrix[2][0] = temp_dict['rain']/sun_total_count

return self.transitionMatrix

For every type of weather, I need to calculate the probabilities for the next day.

Recommended answer

I like a combination of pandas and itertools for this. The code block is a bit longer than the one above, but don't conflate verbosity with speed. (The window function should be very fast; the pandas portion will admittedly be slower.)

First, make a "window" function. Here's one from the itertools recipes. It yields the transitions as tuples (state1 to state2); wrap the call in list() to get a list of them.

from itertools import islice

def window(seq, n=2):
    """Sliding window width n from seq.  From old itertools recipes."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

# list(window(days))
# [('rain', 'rain'),
#  ('rain', 'rain'),
#  ('rain', 'clouds'),
#  ('clouds', 'rain'),
#  ('rain', 'sun'),
# ...
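
As a quick check of how the window function behaves, here is a minimal sketch on a short, made-up sequence (the `sample` list and the `tally` name are illustrative assumptions, not part of the original answer). It repeats the function so the snippet runs on its own, then counts each transition with `collections.Counter`:

```python
from collections import Counter
from itertools import islice

def window(seq, n=2):
    """Sliding window width n from seq (same logic as the function above)."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

# Hypothetical short sequence, used only for illustration
sample = ['rain', 'rain', 'sun', 'rain']

pairs = list(window(sample))   # consecutive (state1, state2) pairs
tally = Counter(pairs)         # how often each transition occurs

print(pairs)
print(tally)
```

A sequence shorter than `n` yields no windows at all, which is why the function checks `len(result) == n` before the first `yield`.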

Then use a pandas groupby + value_counts operation to get a transition matrix from each state1 to each state2:

import pandas as pd

pairs = pd.DataFrame(window(days), columns=['state1', 'state2'])
counts = pairs.groupby('state1')['state2'].value_counts()
probs = (counts / counts.sum()).unstack()

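Your result looks like this. Because the counts are divided by counts.sum(), each entry is that transition's share of all observed transitions, so the table as a whole (not each row) sums to 1: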

print(probs)
state2  clouds  rain   sun
state1                    
clouds    0.13  0.09  0.10
rain      0.06  0.11  0.09
sun       0.13  0.06  0.23
