Creating a matrix from Pandas dataframe to display connectedness - 2

Question

This is a follow-up question to "Creating a matrix from Pandas dataframe to display connectedness". The difference is in the matrix: this time only the part below the diagonal is filled in.

I have my data in this format in a pandas dataframe:

Customer_ID  Location_ID
Alpha             A
Alpha             B
Alpha             C
Beta              A
Beta              B
Beta              D

I want to study the mobility patterns of the customers. My goal is to determine the clusters of locations that are most frequented by customers. I think the following matrix can provide such information:

   A  B  C  D
A  0  0  0  0
B  2  0  0  0
C  1  1  0  0
D  1  1  0  0
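Reading the example, each entry below the diagonal appears to count the customers who visited both the row location and the column location. This is an interpretation inferred from the sample data, not something the question states. Writing L(c) for the set of locations visited by customer c, and ordering locations alphabetically:

```latex
M_{xy} =
\begin{cases}
  \bigl|\{\, c \;:\; x \in L(c) \text{ and } y \in L(c) \,\}\bigr| & \text{if } x > y,\\
  0 & \text{otherwise.}
\end{cases}
```

For example, row B, column A is 2 because both Alpha and Beta visited A and B, while row D, column C is 0 because no customer visited both C and D.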

How do I do so in Python?

My dataset is quite large (hundreds of thousands of customers and about a hundred locations).

Answer

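Just for completeness, here's the modified version of my previous answer. Basically, you add a condition when updating the matrix, if edge > node, so that each pair of locations is counted only once and ends up below the diagonal (the row is the location that sorts later).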

```python
import pandas as pd

# I'm assuming you can get your data into a pandas data frame
# (sample data from the question):
data = {'Customer_ID': ['Alpha', 'Alpha', 'Alpha', 'Beta', 'Beta', 'Beta'],
        'Location_ID': ['A', 'B', 'C', 'A', 'B', 'D']}
df = pd.DataFrame(data)

# Initialize an empty matrix with one row/column per distinct location
matrix_size = df['Location_ID'].nunique()
matrix = [[0 for col in range(matrix_size)] for row in range(matrix_size)]

# To make life easier, I made a map to go from locations
# to row/col positions in the matrix (sorted, so A -> 0, B -> 1, ...)
location_set = sorted(set(df['Location_ID'].tolist()))
location_map = dict(zip(location_set, range(len(location_set))))

# Group data by customer, and create an adjacency list (dyct) for each
# Update the matrix accordingly
for name, group in df.groupby('Customer_ID'):
    locations = set(group['Location_ID'].tolist())
    dyct = {}
    for i in locations:
        # Pass {i}, not i: difference(i) would remove each character of a
        # multi-character location ID rather than the ID itself
        dyct[i] = list(locations.difference({i}))

    # Loop through the adjacency list and update matrix
    for node, edges in dyct.items():
        for edge in edges:
            # Add this condition to create bottom half of the symmetric matrix
            if edge > node:
                matrix[location_map[edge]][location_map[node]] += 1

# Show the result with location labels
print(pd.DataFrame(matrix, index=location_set, columns=location_set))
```
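Since the question mentions hundreds of thousands of customers, a Python loop over every customer may be slow. One vectorized alternative is sketched below. It is a different technique from the answer's loop, not the answerer's method, and it assumes that each row of the data frame is one (customer, location) visit and that only whether a customer visited a location matters, not how often. It builds a customer by location 0/1 table with pd.crosstab, multiplies its transpose by itself so that entry [x, y] counts customers who visited both x and y, and then keeps only the part strictly below the diagonal:

```python
import numpy as np
import pandas as pd

# Sample data in the question's format
df = pd.DataFrame({
    'Customer_ID': ['Alpha', 'Alpha', 'Alpha', 'Beta', 'Beta', 'Beta'],
    'Location_ID': ['A', 'B', 'C', 'A', 'B', 'D'],
})

# Customer x location incidence table: 1 if the customer visited the location.
# clip(upper=1) ignores repeat visits to the same location (an assumption).
incidence = pd.crosstab(df['Customer_ID'], df['Location_ID']).clip(upper=1)

# (locations x customers) @ (customers x locations): entry [x, y] counts the
# customers who visited both x and y; the diagonal counts visitors of x.
co = incidence.T @ incidence

# Zero out the diagonal and everything above it, as in the question's matrix
lower = pd.DataFrame(np.tril(co.to_numpy(), k=-1),
                     index=co.index, columns=co.columns)
print(lower)
```

The matrix product does the pair counting in compiled code instead of a per-customer Python loop. The incidence table is dense, though: with hundreds of thousands of customers and about a hundred locations it has tens of millions of cells, so memory is worth checking. A sparse matrix (for example from scipy.sparse) is a possible alternative that is not shown here.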
