How do you handle column names having spaces in them when using pd.read_clipboard?
Question
This is a real problem I've faced for a long time.
Take this dataframe:
A B THRESHOLD
NaN NaN NaN
-0.041158 -0.161571 0.329038
0.238156 0.525878 0.110370
0.606738 0.854177 -0.095147
0.200166 0.385453 0.166235
It is easy enough to read this in with pd.read_clipboard. However, if one of the column names contains a space:
A B Col #3
NaN NaN NaN
-0.041158 -0.161571 0.329038
0.238156 0.525878 0.110370
0.606738 0.854177 -0.095147
0.200166 0.385453 0.166235
Then it is read in like this, with the header split into four columns:
          A         B       Col  #3
0       NaN       NaN       NaN NaN
1 -0.041158 -0.161571  0.329038 NaN
2  0.238156  0.525878  0.110370 NaN
3  0.606738  0.854177 -0.095147 NaN
4  0.200166  0.385453  0.166235 NaN
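The misparse above can be reproduced without a clipboard. This is a minimal sketch, assuming pd.read_csv on a StringIO is a fair stand-in for pd.read_clipboard and that the default separator in play is the whitespace regex '\s+'. The variable names text and bad are illustrative only.

```python
from io import StringIO

import pandas as pd

# Stand-in for the clipboard contents: single spaces everywhere,
# including inside the "Col #3" header.
text = "\n".join([
    "A B Col #3",
    "NaN NaN NaN",
    "-0.041158 -0.161571 0.329038",
    "0.238156 0.525878 0.110370",
    "0.606738 0.854177 -0.095147",
    "0.200166 0.385453 0.166235",
])

# sep=r"\s+" splits on any run of whitespace, so the header yields four
# names while each data row supplies only three values.
bad = pd.read_csv(StringIO(text), sep=r"\s+")

print(list(bad.columns))        # the header was split into four names
print(bad["#3"].isna().all())   # whether the extra column is entirely NaN
```

The extra column appears because the header has one more field than the data rows, so every row is padded with a missing value.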
How can I prevent this?
Recommended answer
What I do in this situation is separate all my columns by two or more spaces and then use sep='\s\s+' as the delimiter. That way, a column heading containing a single space, such as Col #3 above, is treated as one column.
A  B  Col #3
NaN  NaN  NaN
-0.041158  -0.161571  0.329038
0.238156  0.525878  0.110370
0.606738  0.854177  -0.095147
0.200166  0.385453  0.166235
import pandas as pd
df = pd.read_clipboard(sep=r'\s\s+')
You do get this warning, but you can ignore it, since the data is parsed correctly. Or you can pass engine='python' if your OCD gets the best of you. :)
C:\Program Files\Anaconda3\lib\site-packages\pandas\io\clipboards.py:63: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  return read_table(StringIO(text), sep=sep, **kwargs)
print(df)
          A         B    Col #3
0       NaN       NaN       NaN
1 -0.041158 -0.161571  0.329038
2  0.238156  0.525878  0.110370
3  0.606738  0.854177 -0.095147
4  0.200166  0.385453  0.166235
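For anyone who wants to try the two-space approach without touching the clipboard, here is a minimal sketch. It assumes pd.read_csv on a StringIO behaves like pd.read_clipboard for this purpose, and it passes engine='python' explicitly so that the ParserWarning should not be raised. The names text, caught and parser_warnings are illustrative only.

```python
import warnings
from io import StringIO

import pandas as pd

# Stand-in for the clipboard contents: columns are two spaces apart,
# while the "Col #3" header keeps its single internal space.
text = "\n".join([
    "A  B  Col #3",
    "NaN  NaN  NaN",
    "-0.041158  -0.161571  0.329038",
    "0.238156  0.525878  0.110370",
    "0.606738  0.854177  -0.095147",
    "0.200166  0.385453  0.166235",
])

# Record any warnings so we can check whether a ParserWarning was emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # r"\s\s+" only matches runs of two or more whitespace characters,
    # so a single space inside a header does not split it.
    df = pd.read_csv(StringIO(text), sep=r"\s\s+", engine="python")

parser_warnings = [
    w for w in caught if issubclass(w.category, pd.errors.ParserWarning)
]

print(df)
print(len(parser_warnings))  # number of ParserWarnings recorded
```

The trade-off is that the pasted text must really have at least two spaces between columns. Data copied with single-space separators will collapse into one column under this separator.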