Determine Position of Element in Dataframe

Question

I'm looking for a function that returns the position of an element in a dataframe.

- The dataframe contains duplicate values.
- The dataframe is about 10 x 2000.
- The function will be applied to the dataframe using applymap().

    import pandas

    # initial dataframe
    df = pandas.DataFrame({"R1": [8,2,3], "R2": [2,3,4], "R3": [-3,4,-1]})

Example:

get_position(2) is ambiguous, because the value 2 appears in both "R1" and "R2". Is there another way for Python to know which position an element holds, possibly during the applymap() operation?

Edit:

df.rank(axis=1,pct=True)

EDIT2:

# initial dataframe

df_initial = pandas.DataFrame({"R1": [8,2,3], "R2": [2,3,4], "R3": [-3,4,-1]})

step1)

df_rank = df_initial.rank(axis=1,pct=True)

step2)

# Building Groups based on the percentage of the respective value

    def function103(x):

        if 0.0 <= x <= 0.1:
            P1.append(get_column_name1(x))
            return x
        elif 0.1 < x <= 0.2:
            P2.append(get_column_name1(x))
            return x
        elif 0.2 < x <= 0.3:
            P3.append(get_column_name1(x))
            return x
        elif 0.3 < x <= 0.4:
            P4.append(get_column_name1(x))
            return x
        elif 0.4 < x <= 0.5:
            P5.append(get_column_name1(x))
            return x
        elif 0.5 < x <= 0.6:
            P6.append(get_column_name1(x))
            return x
        elif 0.6 < x <= 0.7:
            P7.append(get_column_name1(x))
            return x
        elif 0.7 < x <= 0.8:
            P8.append(get_column_name1(x))
            return x
        elif 0.8 < x <= 0.9:
            P9.append(get_column_name1(x))
            return x
        elif 0.9 < x <= 1.0:
            P10.append(get_column_name1(x))
            return x
        else:
            return x

step3)

# trying to get the column name of the respective value
# my idea was to determine the position of each value and then write a function

    def get_column_name1(x):
        ...  # should return the column name of the value

step 4)

# apply the function

P1=[]
P2=[]
P3=[]
P4=[]
P5=[]
P6=[]
P7=[]
P8=[]
P9=[]
P10=[]
P11=[]
df_rank.applymap(function103).head()

Solution

If you need the index or column names for a given value in a DataFrame, use numpy.where to get the positions, then use those positions to select from the index and column values (converted to NumPy arrays):

import numpy as np
import pandas as pd

df = pd.DataFrame({"R1": [8,2,3], "R2": [2,3,4], "R3": [-3,4,-1]})

i, c = np.where(df == 2)
print (i, c)
[0 1] [1 0]

print (df.index.values[i])
[0 1]

print (df.columns.values[c])
['R2' 'R1']
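
The numpy.where lookup above could be wrapped in a small helper. This is only a sketch: the name `get_positions` is made up for illustration and is not part of pandas or NumPy. It returns every match, so duplicate values are reported once per cell.

```python
import numpy as np
import pandas as pd


def get_positions(df, value):
    """Return a list of (index label, column name) pairs where df equals value.

    Hypothetical helper, not a pandas API.
    """
    # np.where on a 2-D boolean array returns row positions and column positions.
    rows, cols = np.where(df.to_numpy() == value)
    # Translate integer positions back to labels.
    return list(zip(df.index[rows], df.columns[cols]))


df = pd.DataFrame({"R1": [8, 2, 3], "R2": [2, 3, 4], "R3": [-3, 4, -1]})
positions = get_positions(df, 2)
print(positions)  # two matches: row 0 in R2 and row 1 in R1
```

A value that does not occur in the dataframe yields an empty list.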

EDIT:

i, c = np.where(df == 2)

df1 = df.rank(axis=1,pct=True)
print (df1)
         R1        R2        R3
0  1.000000  0.666667  0.333333
1  0.333333  0.666667  1.000000
2  0.666667  1.000000  0.333333

print (df1.iloc[i, c])
         R2        R1
0  0.666667  1.000000
1  0.666667  0.333333

print (df1.where(df == 2).dropna(how='all').dropna(how='all', axis=1))
         R1        R2
0       NaN  0.666667
1  0.333333       NaN

Or:

out = df1.stack()[df.stack() == 2].rename_axis(('idx','cols')).reset_index(name='val')
print (out)
   idx cols       val
0    0   R2  0.666667
1    1   R1  0.333333
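
Note that `df1.iloc[i, c]` selects every combination of the rows in `i` and the columns in `c`, which is why its output above is a 2x2 block rather than two values. If only the rank of each matching cell is wanted, indexing the underlying NumPy array pairs the positions element by element. A minimal sketch, assuming the same `df` and `df1` as above (the variable names `block`, `ranks` and `matches` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"R1": [8, 2, 3], "R2": [2, 3, 4], "R3": [-3, 4, -1]})
df1 = df.rank(axis=1, pct=True)

i, c = np.where(df == 2)

# iloc with two lists takes the cross product: rows i x columns c.
block = df1.iloc[i, c]

# NumPy fancy indexing pairs i[k] with c[k], giving one rank per matching cell.
ranks = df1.to_numpy()[i, c]

matches = pd.DataFrame({"idx": df.index[i], "cols": df.columns[c], "val": ranks})
print(matches)
```

This gives the same rows as the stack-based version, without reshaping the whole dataframe.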

EDIT:

Solution for your function: iterate over a one-column DataFrame created by reshaping, and extract Series.name, which is the same as the column name:

def get_column_name1(x):
    return x.name


P1=[]
P2=[]
P3=[]
P4=[]
P5=[]
P6=[]
P7=[]
P8=[]
P9=[]
P10=[]
P11=[]

def function103(x):

    if 0.0 <= x[0] <= 0.1:
        P1.append(get_column_name1(x))
        return x
    elif 0.1 < x[0] <= 0.2:
        P2.append(get_column_name1(x))
        return x
    elif 0.2 < x[0] <= 0.3:
        P3.append(get_column_name1(x))
        return x
    elif 0.3 < x[0] <= 0.4:
        P4.append(get_column_name1(x))
        return x
    elif 0.4 < x[0] <= 0.5:
        P5.append(get_column_name1(x))
        return x
    elif 0.5 < x[0] <= 0.6:
        P6.append(get_column_name1(x))
        return x
    elif 0.6 < x[0] <= 0.7:
        P7.append(get_column_name1(x))
        return x
    elif 0.7 < x[0] <= 0.8:
        P8.append(get_column_name1(x))
        return x
    elif 0.8 < x[0] <= 0.9:
        P9.append(get_column_name1(x))
        return x
    elif 0.9 < x[0] <= 1.0:
        P10.append(get_column_name1(x))
        return x
    else:
        return x


a = df_rank.stack().reset_index(level=0, drop=True).to_frame().apply(function103, axis=1)


print (P4)
['R3', 'R1', 'R3']
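
As an alternative to the if/elif chain and the module-level lists, the same grouping can probably be expressed with `pandas.cut`. This is a sketch under assumptions, not the answer's method: the names `long`, `edges`, `labels` and `groups` are illustrative, and the bins are taken to be the ten right-closed intervals used in `function103`.

```python
import numpy as np
import pandas as pd

df_initial = pd.DataFrame({"R1": [8, 2, 3], "R2": [2, 3, 4], "R3": [-3, 4, -1]})
df_rank = df_initial.rank(axis=1, pct=True)

# One row per cell: (row label, column name) -> percentile rank.
long = df_rank.stack().rename_axis(["idx", "cols"]).reset_index(name="val")

# Bin edges 0.0, 0.1, ..., 1.0. Bins are right-closed by default, and
# include_lowest=True puts 0.0 into the first bin, like `0.0 <= x <= 0.1`.
edges = np.linspace(0.0, 1.0, 11)
labels = [f"P{k}" for k in range(1, 11)]
long["group"] = pd.cut(long["val"], bins=edges, labels=labels, include_lowest=True)

# Column names per group, in row-by-row order like the loop version.
groups = {label: long.loc[long["group"] == label, "cols"].tolist() for label in labels}
print(groups["P4"])
```

Here `groups["P4"]` holds the same column names as the `P4` list printed above, and the result is a single dictionary instead of eleven separate lists.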
