Python pandas error while removing extra white space


Problem description

I am trying to strip extra white space from a column in a DataFrame using the command below. The DataFrame has close to 8 million records.

datt2.My_variable=datt2.My_variable.str.replace('\s+', ' ')

I end up with this error:

MemoryError                               Traceback (most recent call last)
<ipython-input-10-158a51cfaa3d> in <module>()
----> 1 datt2.My_variable=datt2.My_variable.str.replace('\s+', ' ')

c:\python27\lib\site-packages\pandas\core\strings.pyc in replace(self, pat, repl, n, case, flags)
   1504     def replace(self, pat, repl, n=-1, case=True, flags=0):
   1505         result = str_replace(self._data, pat, repl, n=n, case=case,
-> 1506                              flags=flags)
   1507         return self._wrap_result(result)
   1508 

c:\python27\lib\site-packages\pandas\core\strings.pyc in str_replace(arr, pat, repl, n, case, flags)
    334         f = lambda x: x.replace(pat, repl, n)
    335 
--> 336     return _na_map(f, arr)
    337 
    338 

c:\python27\lib\site-packages\pandas\core\strings.pyc in _na_map(f, arr, na_result, dtype)
    152 def _na_map(f, arr, na_result=np.nan, dtype=object):
    153     # should really _check_ for NA
--> 154     return _map(f, arr, na_mask=True, na_value=na_result, dtype=dtype)
    155 
    156 

c:\python27\lib\site-packages\pandas\core\strings.pyc in _map(f, arr, na_mask, na_value, dtype)
    167         try:
    168             convert = not all(mask)
--> 169             result = lib.map_infer_mask(arr, f, mask.view(np.uint8), convert)
    170         except (TypeError, AttributeError):
    171 

pandas\src\inference.pyx in pandas.lib.map_infer_mask (pandas\lib.c:65837)()

pandas\src\inference.pyx in pandas.lib.maybe_convert_objects (pandas\lib.c:56806)()

MemoryError:

Recommended answer

Question: I am trying to clean a column in a data frame of extra white space ...
datt2.My_variable=datt2.My_variable.str.replace('\s+', ' ')

Please comment: do I understand your expression correctly?

 pandas       Column           Column              DataSeries
 DataFrame     Name           DataSeries             Method
|-^-|       |----^-----|   |-------^-------|  |----------^----------|
datt2       .My_variable = datt2.My_variable  .str.replace('\s+', ' ')


I'm pretty sure that using re.sub gives the same result as pandas' .str.replace(...), but without copying the whole column's data.

From the pandas doc:
Series.str.replace(pat, repl, n=-1, case=True, flags=0)
Replace occurrences of pattern/regex in the Series/Index with some other string.
Equivalent to str.replace() or re.sub().
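As a small illustration of that equivalence, the sketch below applies the same pattern once through Series.str.replace and once through re.sub on each element. It assumes a more recent pandas release than the one quoted above: the regex keyword is not part of the quoted signature, and it is passed explicitly here because newer versions may otherwise treat the pattern as a literal string. The sample values are made up.

```python
import re
import pandas as pd

# Hypothetical sample data; None stands in for a missing value.
s = pd.Series(["a   b", "c \t d", None])

# pandas route: regex=True is passed explicitly (assumes a recent pandas).
via_pandas = s.str.replace(r'\s+', ' ', regex=True)

# Pure re route: apply re.sub per element, leaving non-strings untouched.
via_re = s.map(lambda v: re.sub(r'\s+', ' ', v) if isinstance(v, str) else v)

print(via_pandas.tolist()[:2])
print(via_re.tolist()[:2])
```

Both routes should give the same strings for the non-missing rows. Whether either one avoids the MemoryError on 8 million rows depends on the machine and the pandas version.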


Try pure Python, for instance:

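    import re
    # Compile once; r'\s\s+' matches runs of two or more whitespace characters.
    pattern = re.compile(r'\s\s+')
    for idx in datt2.index:
        # Assumes every value is a string; a missing value (NaN) would raise TypeError.
        datt2.loc[idx, 'My_variable'] = pattern.sub(' ', datt2.loc[idx, 'My_variable'])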
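The loop above only works if every cell holds a string, because re.sub raises a TypeError for a missing value. A minimal sketch of a per-cell helper that skips non-strings could look like this. The name clean_cell and the sample list are hypothetical, and the code assumes Python 3 string types:

```python
import re

# Compiled once so the pattern is not re-parsed for each cell.
_WS_RUN = re.compile(r'\s\s+')

def clean_cell(value):
    """Collapse whitespace runs in a string; return anything else unchanged."""
    if isinstance(value, str):
        return _WS_RUN.sub(' ', value)
    return value  # e.g. None or float('nan') for a missing value

# Hypothetical sample cells, including two kinds of missing value.
cells = ["a   b", None, float("nan"), "no_extra space", "x \t y"]
cleaned = [clean_cell(v) for v in cells]
print(cleaned)
```

The same helper could be called inside the loop in place of the direct re.sub call.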

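Note: Consider using '\s\s+' instead of '\s+'.
'\s+' also matches a single whitespace character, so it replaces ONE BLANK with ONE BLANK, which is useless work. The two patterns are not fully interchangeable, though: '\s+' also turns a lone tab or newline into a space, while '\s\s+' leaves it unchanged.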
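To see the difference between the two patterns, here is a short standard-library sketch on a made-up sample string:

```python
import re

# Hypothetical sample: a double space, a lone tab, a space-newline-space run, a lone space.
sample = "a  b\tc \n d e"

# r'\s+' matches every whitespace run, including runs of length one.
collapsed_all = re.sub(r'\s+', ' ', sample)

# r'\s\s+' only matches runs of two or more whitespace characters.
collapsed_runs = re.sub(r'\s\s+', ' ', sample)

print(repr(collapsed_all))   # 'a b c d e'
print(repr(collapsed_runs))  # 'a b\tc d e'
```

With '\s\s+' the lone tab between b and c survives, so which pattern fits depends on whether single tabs and newlines should also become spaces.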

Tested with Python 3.4.2 and pandas 0.19.2.
Come back and flag your question as answered if this works for you, or comment on why it does not.
