Elegant way to strip spaces across a whole dataframe at once rather than column by column

Question

I have a dataframe hash_file and it has two columns VARIABLE and concept_id.

import pandas as pd
hash_file = pd.DataFrame({'VARIABLE':['Tes ','Exam ','Evaluation '],'concept_id': [1,2,3]})

To strip spaces from the values of these two columns, I use the code below:

hash_file['VARIABLE']=hash_file['VARIABLE'].astype(str).str.strip()
hash_file['concept_id']=hash_file['concept_id'].astype(str).str.strip()
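A minimal sketch of the most direct generalization: looping the two lines above over every column. It assumes the question's sample data; `stripped` is just an illustrative name. It also makes one side effect visible: `astype(str)` converts the integer `concept_id` column to strings.

```python
import pandas as pd

# Sample data from the question.
hash_file = pd.DataFrame({'VARIABLE': ['Tes ', 'Exam ', 'Evaluation '],
                          'concept_id': [1, 2, 3]})

# Looping the question's approach over every column scales to any number
# of columns, but astype(str) also turns numeric columns into text.
stripped = hash_file.copy()
for col in stripped.columns:
    stripped[col] = stripped[col].astype(str).str.strip()

print(stripped['VARIABLE'].tolist())    # ['Tes', 'Exam', 'Evaluation']
print(stripped['concept_id'].tolist())  # ['1', '2', '3'] (strings, not ints)
```

This is short, but it still touches every column, including ones that cannot contain spaces.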

Though this works fine, I can't use this approach because my real dataframe has more than 150 columns.

Is there any way to strip spaces from all columns and their values at once, ideally in one line?

Answer

Use DataFrame.select_dtypes to select the object (string) columns, then apply Series.str.strip to each of them with DataFrame.apply:

cols = hash_file.select_dtypes(object).columns
hash_file[cols] = hash_file[cols].apply(lambda x: x.str.strip())
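A self-contained sketch of the approach above on the question's sample data. One value is replaced by NaN (an assumption added for illustration) to show that `Series.str.strip` leaves missing values as NaN and that the numeric column keeps its dtype. It selects `['object', 'string']` rather than only `object` so that columns stored with pandas' dedicated string dtype are also caught; the line that strips the column labels themselves is optional.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the NaN is added here for illustration.
hash_file = pd.DataFrame({'VARIABLE': ['Tes ', 'Exam ', np.nan],
                          'concept_id': [1, 2, 3]})

# Pick the text columns only (object dtype, plus pandas' 'string' dtype).
cols = hash_file.select_dtypes(include=['object', 'string']).columns

# Series.str.strip runs column by column and returns NaN for missing values.
hash_file[cols] = hash_file[cols].apply(lambda s: s.str.strip())

# Optional: strip surrounding spaces from the column labels as well.
hash_file.columns = hash_file.columns.str.strip()

print(hash_file['VARIABLE'].tolist())  # ['Tes', 'Exam', nan]
```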

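If the string columns contain no missing values, you can call str.strip element-wise with DataFrame.applymap instead (a NaN is a float, so calling .strip() on it raises an AttributeError):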

cols = hash_file.select_dtypes(object).columns
# applymap is deprecated since pandas 2.1; on newer versions use hash_file[cols].map(...)
hash_file[cols] = hash_file[cols].applymap(lambda x: x.strip())
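A hedged, NaN-tolerant variant of the element-wise approach, assuming missing values may appear. The helper `strip_if_str` (a name chosen here for illustration) strips only real strings and passes everything else through unchanged. Because `DataFrame.applymap` was renamed to `DataFrame.map` in pandas 2.1, the sketch uses whichever method the installed version provides.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'VARIABLE': ['Tes ', 'Exam ', np.nan],
                   'concept_id': [1, 2, 3]})


def strip_if_str(value):
    # Only strings have .strip(); NaN (a float) and other values pass through.
    return value.strip() if isinstance(value, str) else value


cols = df.select_dtypes(include=['object', 'string']).columns

# DataFrame.map replaced DataFrame.applymap in pandas 2.1; fall back on older versions.
elementwise = 'map' if hasattr(pd.DataFrame, 'map') else 'applymap'
df[cols] = getattr(df[cols], elementwise)(strip_if_str)
```

The `isinstance` check costs a little per cell, in exchange for not having to guarantee that the text columns are free of NaN.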

Performance:

Test data: 9000 rows x 150 columns (50% string columns), built as follows:


import pandas as pd
hash_file = pd.DataFrame({'VARIABLE':['Tes ','Exam ','Evaluation '],'concept_id': [1,2,3]})
hash_file = pd.concat([hash_file] * 3000, ignore_index=True)
hash_file = pd.concat([hash_file] * 75, ignore_index=True, axis=1)


In [14]: %%timeit
    ...: cols = hash_file.select_dtypes(object).columns
    ...: hash_file[cols] = hash_file[cols].applymap(lambda x: x.strip())
    ...: 
338 ms ± 14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [15]: %%timeit
    ...: cols = hash_file.select_dtypes(object).columns
    ...: hash_file[cols] = hash_file[cols].apply(lambda x: x.str.strip())
    ...: 
368 ms ± 7.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [16]: %%timeit
    ...: cols = hash_file.select_dtypes(object).columns
    ...: hash_file[cols] = hash_file[cols].stack().str.strip().unstack()
    ...: 
818 ms ± 17.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [17]: %%timeit
    ...: hash_file.astype(str).applymap(lambda x: x.strip())
    ...: 
1.09 s ± 21.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [18]: %%timeit
    ...: hash_file.astype(str).apply(lambda x: x.str.strip())
    ...: 
1.2 s ± 32.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [19]: %%timeit
    ...: hash_file.astype(str).stack().str.strip().unstack()
    ...: 
2 s ± 25.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
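Timings like these depend on the machine and pandas version. Below is a minimal sketch for re-running the two fastest variants yourself. It assumes a smaller frame (900 rows rather than 9000) so that it finishes quickly, and the function names are illustrative. With `axis=1`, `ignore_index=True` relabels the columns `0..n-1`, so the column labels are unique integers.

```python
import functools
import timeit

import pandas as pd


def make_frame(row_copies, col_copies):
    # Same construction as the benchmark: 3 * row_copies rows and
    # 2 * col_copies columns, with columns relabelled 0..n-1.
    base = pd.DataFrame({'VARIABLE': ['Tes ', 'Exam ', 'Evaluation '],
                         'concept_id': [1, 2, 3]})
    rows = pd.concat([base] * row_copies, ignore_index=True)
    return pd.concat([rows] * col_copies, ignore_index=True, axis=1)


def text_columns(df):
    # Object columns, plus pandas' dedicated 'string' dtype if it is used.
    return df.select_dtypes(include=['object', 'string']).columns


def strip_with_str_accessor(df):
    # Per-column Series.str.strip via DataFrame.apply (NaN-safe).
    out = df.copy()
    cols = text_columns(out)
    out[cols] = out[cols].apply(lambda s: s.str.strip())
    return out


def strip_elementwise(df):
    # Element-wise str.strip; assumes the text columns contain no NaN.
    out = df.copy()
    cols = text_columns(out)
    method = 'map' if hasattr(pd.DataFrame, 'map') else 'applymap'
    out[cols] = getattr(out[cols], method)(lambda x: x.strip())
    return out


df = make_frame(row_copies=300, col_copies=75)  # 900 x 150
for func in (strip_with_str_accessor, strip_elementwise):
    seconds = timeit.timeit(functools.partial(func, df), number=3) / 3
    print(f'{func.__name__}: {seconds * 1000:.1f} ms per call')
```

Both functions work on a copy, so repeated timing runs always strip the same unstripped input.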
