Fill in an entire DataFrame cell by cell based on index AND column names?

Problem description

I have a dataframe where the row indices and column headings should determine the content of each cell. I'm working with a much larger version of the following df:

import pandas as pd

df = pd.DataFrame(index=['afghijklde', 'afghijklmde', 'ade', 'afghilmde', 'amde'],
                  columns=['ae', 'azde', 'afgle', 'arlde', 'afghijklbcmde'])

Specifically, I want to apply the custom function edit_distance() or an equivalent (see here for function code), which calculates a difference score between two strings. The two inputs are the row name and the column name. The following works but is extremely slow:

for seq in df.index:
    for seq2 in df.columns:
        df.loc[seq, seq2] = edit_distance(seq, seq2) 

This produces the result I want:

            ae  azde    afgle   arlde   afghijklbcmde
afghijklde  8    7        5       6          3
afghijklmde 9    8        6       7          2
ade         1    1        3       2          10
afghilmde   7    6        4       5          4
amde        2    1        3       2          9

What is a better way to do this, perhaps using applymap()? Everything I've tried with applymap(), apply() or df.iterrows() has returned errors of the kind AttributeError: "'float' object has no attribute 'index'". Thanks.

Recommended answer

It turns out there is an even better way to do this. onepan's dictionary comprehension answer above is good, but it returns the df index and columns in random order. Using a nested .apply() accomplishes the same thing at about the same speed and doesn't change the row/column order. The key is not to get hung up on naming the df's rows and columns first and filling in the values second. Instead, do it the other way around, initially treating the future index and columns as standalone pandas Series.

series_rows = pd.Series(['afghijklde', 'afghijklmde', 'ade', 'afghilmde', 'amde'])
series_cols = pd.Series(['ae', 'azde', 'afgle', 'arlde', 'afghijklbcmde'])

df = pd.DataFrame(series_rows.apply(lambda x: series_cols.apply(lambda y: edit_distance(x, y))))
df.index = series_rows
df.columns = series_cols
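
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same "values first, labels second" idea. The edit_distance() below is only a stand-in (a plain Levenshtein distance), since the asker's linked function is not reproduced here and its scores may differ. The nested list comprehension is an alternative to the nested .apply() that should also keep the row/column order, because it walks the two lists in order.

```python
import pandas as pd


def edit_distance(a: str, b: str) -> int:
    """Stand-in Levenshtein distance (an assumption, not the asker's function)."""
    # prev[j] holds the distance between the prefix of `a` seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


rows = ['afghijklde', 'afghijklmde', 'ade', 'afghilmde', 'amde']
cols = ['ae', 'azde', 'afgle', 'arlde', 'afghijklbcmde']

# Compute every cell first, then build the frame once with its labels attached.
values = [[edit_distance(r, c) for c in cols] for r in rows]
df = pd.DataFrame(values, index=rows, columns=cols)
```

Building the frame in one constructor call avoids the repeated df.loc[...] assignments from the question, which are a likely source of the slowness there.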
