InvalidIndexError when trying to map Series with duplicate values


Question

I am trying to map the names of hospitals to their UK postcodes. I have a csv of spine surgery in those hospitals (known as 'Trusts' in the UK); the csv is kate_spine.csv

I am importing one column from it (Trust) to simplify things.

import pandas as pd
spine = pd.read_csv('~/Dropbox/Work/NNAP/Spine/Kate_W/kate_spine2.csv', usecols = ['Trust'])

Showing the import:

spine.head()


Trust
0   THE WALTON CENTRE NHS FOUNDATION TRUST
1   CAMBRIDGE UNIVERSITY HOSPITALS NHS FOUNDATION ...
2   KING'S COLLEGE HOSPITAL NHS FOUNDATION TRUST
3   LEEDS TEACHING HOSPITALS NHS TRUST
4   NT424

These are the trust names, with a default integer index. My postcodes are in the csv all_all.csv. I am importing that file with 'Trust' as the index column, again to simplify. The postcodes are in the postcode column.

postcodes_all = pd.read_csv('all_all.csv', index_col = 'Trust')
postcodes_all.head()

                                                    Unnamed: 0  postcode
Trust
MANCHESTER UNIVERSITY NHS FOUNDATION TRUST                   0  M13 9WL
SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION TRUST           1  SR4 7TP
WORCESTERSHIRE HEALTH AND CARE NHS TRUST                     2  WR5 1JR
SOLENT NHS TRUST                                             3  SO19 8BR
SHROPSHIRE COMMUNITY HEALTH NHS TRUST                        4  SY3 8XL

I am trying to use map to get about 200 postcodes from a csv of 14,000. Here's my code:

spine['Trust'].map(postcodes_all['postcode'])

And the error:

InvalidIndexError                         Traceback (most recent call last)
<ipython-input-6-25212fe14f16> in <module>
----> 1 spine['Trust'].map(postcodes_all['postcode'])

~/anaconda3/lib/python3.7/site-packages/pandas/core/series.py in map(self, arg, na_action)
   3826         dtype: object
   3827         """
-> 3828         new_values = super()._map_values(arg, na_action=na_action)
   3829         return self._constructor(new_values, index=self.index).__finalize__(self)
   3830 

~/anaconda3/lib/python3.7/site-packages/pandas/core/base.py in _map_values(self, mapper, na_action)
   1275                 values = self.values
   1276 
-> 1277             indexer = mapper.index.get_indexer(values)
   1278             new_values = algorithms.take_1d(mapper._values, indexer)
   1279 

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   2983         if not self.is_unique:
   2984             raise InvalidIndexError(
-> 2985                 "Reindexing only valid with uniquely" " valued Index objects"
   2986             )
   2987 

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

The Trust column in the spine file does contain duplicate values, as each row describes an individual doctor's surgical activity within the Trust and there will be up to 10 doctors (therefore 10 duplicate Trust names) in the series. I thought of trying this after extracting the unique Trust names. Ideally, though, I would like to be able to map the series with its duplicates intact.

Recommended answer

"The Trust column in the spine file does contain duplicate values, as each row describes an individual doctor's surgical activity within the Trust and there will be up to 10 doctors (therefore 10 duplicate Trust names) in the series."

Duplicate values are the issue, though the ones that matter are in the index of the Series passed to map, not in the Series being mapped. pandas does not know which value to use when that index contains duplicates. See the example below.

import pandas as pd

s = pd.Series(['cat', 'dog', 'rabbit', 'cat'])
s

## Out
0       cat
1       dog
2    rabbit
3       cat
dtype: object

s2 = pd.Series(['carnivore', 'omnivore', 'herbivore', 'carnivore'])
# Set the value of `s` as the index of `s2`, since map looks at the Series index.
s2.index = s
s2

## Out
cat       carnivore
dog        omnivore
rabbit    herbivore
cat       carnivore
dtype: object


Since cat occurs twice in the index of s2, pandas does not know which of the two values to use when mapping s through s2 (you could say there is a one-to-two mapping from animal to feeding behavior for cat). Therefore, calling map now raises InvalidIndexError:

s.map(s2)

## Out
---------------------------------------------------------------------------

InvalidIndexError                         Traceback (most recent call last)

<ipython-input-43-1950a0742767> in <module>()
----> 1 s.map(s2)


~/miniconda3/envs/ds/lib/python3.7/site-packages/pandas/core/series.py in map(self, arg, na_action)
   3826         dtype: object
   3827         """
-> 3828         new_values = super()._map_values(arg, na_action=na_action)
   3829         return self._constructor(new_values, index=self.index).__finalize__(self)
   3830 


~/miniconda3/envs/ds/lib/python3.7/site-packages/pandas/core/base.py in _map_values(self, mapper, na_action)
   1275                 values = self.values
   1276 
-> 1277             indexer = mapper.index.get_indexer(values)
   1278             new_values = algorithms.take_1d(mapper._values, indexer)
   1279 


~/miniconda3/envs/ds/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   2983         if not self.is_unique:
   2984             raise InvalidIndexError(
-> 2985                 "Reindexing only valid with uniquely" " valued Index objects"
   2986             )
   2987 


InvalidIndexError: Reindexing only valid with uniquely valued Index objects
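
The traceback shows get_indexer testing the mapper index's is_unique attribute, so the same property can be checked up front. The following is a minimal sketch that rebuilds s and s2 from the example above (s2 is constructed directly with its index here, which is only a convenience) and lists the labels that would trigger the error:

```python
import pandas as pd

s = pd.Series(['cat', 'dog', 'rabbit', 'cat'])
s2 = pd.Series(
    ['carnivore', 'omnivore', 'herbivore', 'carnivore'],
    index=['cat', 'dog', 'rabbit', 'cat'],
)

# Duplicate values in the Series being mapped are harmless for map ...
values_unique = s.is_unique
# ... whereas a non-unique index on the mapper is what get_indexer rejects.
mapper_ok = s2.index.is_unique

if not mapper_ok:
    # Labels that occur more than once in the mapper's index.
    dup_labels = s2.index[s2.index.duplicated()].unique().tolist()
    print('duplicated labels in mapper index:', dup_labels)
```

Note that s itself also contains cat twice; that repetition is not what causes the exception.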


You will need to check the duplicate values and decide which one to use. You can do it like this:

s2[s2.index.duplicated(keep=False)]

## Out
cat    carnivore
cat    carnivore
dtype: object

In this case, both values of cat are the same and we can get rid of either one (which your description indicates is also true in your case). If they were different, you would have to choose which one to keep.

# `~` inverts the boolean mask, so only the first occurrence of each index label is kept
s2 = s2[~s2.index.duplicated()]
s2

## Out
cat       carnivore
dog        omnivore
rabbit    herbivore
dtype: object
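
Dropping duplicates blindly is only safe when the repeated labels agree. One possible way to confirm that, sketched here with a hypothetical Series s3 (not part of the question) in which cat carries two different values, is to count the distinct values per index label:

```python
import pandas as pd

# Hypothetical mapper in which 'cat' has conflicting values.
s3 = pd.Series(
    ['carnivore', 'omnivore', 'herbivore', 'obligate carnivore'],
    index=['cat', 'dog', 'rabbit', 'cat'],
)

# Number of distinct values attached to each index label.
distinct_per_label = s3.groupby(level=0).nunique()

# Labels with more than one distinct value need a manual decision.
conflicts = distinct_per_label[distinct_per_label > 1].index.tolist()
print(conflicts)
```

For the original s2 the list would be empty, because both cat rows hold carnivore, which is why keeping either row is fine there.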

s2 now has a one-to-one mapping from animal to feeding behavior, and we can safely map s through s2.

s.map(s2)

## Out
0    carnivore
1     omnivore
2    herbivore
3    carnivore
dtype: object
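
Applied to the data in the question, the same steps might look like the sketch below. The two small frames are hypothetical stand-ins for kate_spine2.csv and all_all.csv (the real files are not available here), the duplicated SOLENT NHS TRUST row is invented to reproduce the error condition, and the name lookup is introduced only for illustration:

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files in the question.
spine = pd.DataFrame({'Trust': [
    'SOLENT NHS TRUST',
    'SOLENT NHS TRUST',   # repeated: one row per doctor
    'MANCHESTER UNIVERSITY NHS FOUNDATION TRUST',
    'NT424',              # has no entry in the lookup
]})
postcodes_all = pd.DataFrame(
    {'postcode': ['M13 9WL', 'SO19 8BR', 'SO19 8BR']},
    index=pd.Index([
        'MANCHESTER UNIVERSITY NHS FOUNDATION TRUST',
        'SOLENT NHS TRUST',
        'SOLENT NHS TRUST',  # duplicated label, the cause of the error
    ], name='Trust'),
)

# Keep the first row for each Trust so the mapper's index is unique.
lookup = postcodes_all['postcode']
lookup = lookup[~lookup.index.duplicated(keep='first')]

# Duplicates in spine['Trust'] are preserved; unmatched names become NaN.
spine['postcode'] = spine['Trust'].map(lookup)
print(spine)
```

Every doctor's row keeps its Trust and receives that Trust's postcode, so there is no need to extract the unique Trust names first.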
