Split a text (with names and values) column into multiple columns in Pandas DataFrame
Problem description
My algorithm is too slow. I have a big DataFrame and want to create new columns from the names and values stored in another column, ideally with a Pandas solution. Before running, I don't know how many new columns there will be. Here is a simple schema:
"column"<==>"value"<br>"column"<==>"value"<br>...
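Each `<br>`-separated piece is one name/value pair, and the sample data below mixes two separator spellings, `<=>` and `<==>`. A minimal sketch of parsing one such string into a dict, assuming the separator is always `<`, one or more `=`, then `>`. `parse_params` and `PAIR_SEP` are hypothetical names, not taken from the question.

```python
import re

# Assumed separator: "<", one or more "=", then ">" (covers "<=>" and "<==>").
PAIR_SEP = re.compile(r"<=+>")

def parse_params(text):
    """Hypothetical helper: turn 'a<=>1<br>b<==>2' into {'a': '1', 'b': '2'}."""
    result = {}
    for piece in text.split("<br>"):
        # Split on the first separator only.
        parts = PAIR_SEP.split(piece, maxsplit=1)
        # Skip pieces without a separator or with an empty name/value.
        if len(parts) == 2 and parts[0] and parts[1]:
            result[parts[0]] = parts[1]
    return result

print(parse_params("currency<=>PLN<br>price<=>72.14<br>city<==>Berlin"))
# {'currency': 'PLN', 'price': '72.14', 'city': 'Berlin'}
```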
My DataFrame:
id | params
---|-----------------------------------------------------------------------------------
0  | currency<=>PLN<br>price<=>72.14<br>city<==>Berlin
1  | price<=>90<br>area<=>72.14<br>city<==>San Francisco<br>rooms<==>2<br>is_Free<==>1
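To make the example reproducible, the sample frame can be built directly. This sketch assumes `id` is an ordinary column and the frame uses a default integer index, which the question does not state.

```python
import pandas as pd

# The two sample rows from the question, copied verbatim
# (note the mixed "<=>" / "<==>" separators).
df = pd.DataFrame({
    "id": [0, 1],
    "params": [
        "currency<=>PLN<br>price<=>72.14<br>city<==>Berlin",
        "price<=>90<br>area<=>72.14<br>city<==>San Francisco"
        "<br>rooms<==>2<br>is_Free<==>1",
    ],
})
print(df.shape)  # (2, 2)
```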
I want something like this:
id | price | currency | city          | rooms | is_Free | area
---|-------|----------|---------------|-------|---------|------
0  | 72.14 | PLN      | Berlin        | NaN   | NaN     | NaN
1  | 90    | NaN      | San Francisco | 2     | 1       | 72.14
My solution:
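import re

def add_parameters(df):
    # Row-by-row loop: iterrows() plus one df.loc assignment per value
    # is what makes this slow on a big DataFrame.
    for i, row in df.iterrows():
        for parameter in row.params.split("<br>"):
            # The data mixes "<=>" and "<==>", so split on either form.
            elem_list = re.split(r"<=+>", parameter, maxsplit=1)
            if len(elem_list) == 2 and elem_list[0] and elem_list[1]:
                df.loc[i, elem_list[0]] = elem_list[1]
    return df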
Thanks.
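The loop in the question is slow mainly because `iterrows()` builds a Series for every row and each `df.loc[i, name] = value` is a separate indexing call (which also enlarges the frame the first time a name appears). A common, usually much faster pattern, sketched here rather than taken from the question or the answer, is to parse every string into a dict first and let pandas build all new columns in one `pd.DataFrame(...)` call. `params_to_columns` and `PAIR_SEP` are hypothetical names, the `<=+>` separator pattern is an assumption based on the sample data, and every `params` value is assumed to be a string.

```python
import re
import pandas as pd

PAIR_SEP = re.compile(r"<=+>")  # assumption: matches "<=>" and "<==>"

def params_to_columns(df, column="params"):
    """Hypothetical helper: expand name/value strings into new columns."""
    records = []
    for text in df[column]:
        record = {}
        for piece in text.split("<br>"):
            parts = PAIR_SEP.split(piece, maxsplit=1)
            if len(parts) == 2 and parts[0] and parts[1]:
                record[parts[0]] = parts[1]
        records.append(record)
    # One constructor call; names missing from a row become NaN.
    wide = pd.DataFrame(records, index=df.index)
    return df.drop(columns=column).join(wide)

df = pd.DataFrame({
    "id": [0, 1],
    "params": [
        "currency<=>PLN<br>price<=>72.14<br>city<==>Berlin",
        "price<=>90<br>area<=>72.14<br>city<==>San Francisco"
        "<br>rooms<==>2<br>is_Free<==>1",
    ],
})
print(params_to_columns(df))
```

Values stay strings, as in the question's loop; `pd.to_numeric` can convert columns such as `price` afterwards.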
Recommended answer
This is one way of approaching the problem.
import re
import pandas as pd

# Handle multiple separators ("<=>" and "<==>").
sep = re.compile(r"(<.*>)")

def split(value):
    ret = {}
    for s in value.split("<br>"):
        # Check whether a separator exists in the piece and split on it.
        match = sep.search(s)
        if match:
            name, val = s.split(match.group(), 1)
            ret[name] = val
    return ret

# df is the DataFrame from the question.
print(df['params'].apply(split).apply(pd.Series))
Output:
   currency  price           city   area  rooms  is_Free
0       PLN  72.14         Berlin    NaN    NaN      NaN
1       NaN     90  San Francisco  72.14      2        1
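For large frames, the split can also be pushed into pandas' string methods instead of a Python-level `apply`. This is a different technique from the answer above: a sketch using `Series.str.extractall` with named groups and then `unstack`, assuming names contain no `<` or `>` and values contain no `<`. `PAIR_RE` is a hypothetical name. Note that `unstack()` raises a `ValueError` if the same name appears twice in one row.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1],
    "params": [
        "currency<=>PLN<br>price<=>72.14<br>city<==>Berlin",
        "price<=>90<br>area<=>72.14<br>city<==>San Francisco"
        "<br>rooms<==>2<br>is_Free<==>1",
    ],
})

# Assumed pattern: "key" is the text before the separator, "val" the text after.
# [^<>]+ stops the "br>" fragment of a "<br>" from being read as part of a key.
PAIR_RE = r"(?P<key>[^<>]+)<=+>(?P<val>[^<]*)"

pairs = df["params"].str.extractall(PAIR_RE)  # one row per (row, match)
wide = (
    pairs.droplevel("match")                  # back to the original row labels
    .set_index("key", append=True)["val"]
    .unstack()                                # names become columns
    .reindex(df.index)                        # keep rows that had no pairs
)
result = df.drop(columns="params").join(wide)
print(result)
```

As with the answer's code, all extracted values are strings, and missing names become NaN.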