How to translate "bytes" objects into literal strings in pandas DataFrame, Python 3.x?

Question

I have a Python 3.x pandas DataFrame in which certain columns contain strings that are stored as bytes (as in Python 2.x):

import pandas as pd
df = pd.DataFrame(...)
df
       COLUMN1         ....
0      b'abcde'        ....
1      b'dog'          ....
2      b'cat1'         ....
3      b'bird1'        ....
4      b'elephant1'    ....

When I access by column with df.COLUMN1, I see Name: COLUMN1, dtype: object

However, if I access by element, it is a "bytes" object

df.COLUMN1.iloc[0].dtype
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'dtype'
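To reproduce this, here is a minimal sketch that rebuilds COLUMN1 from hypothetical sample values copied from the listing (the question's other columns are elided, so they are left out) and inspects one element. The Series reports `object` because pandas keeps bytes values as generic Python objects, while each element is a plain `bytes`, which has no `.dtype` attribute.

```python
import pandas as pd

# Hypothetical sample data copied from the listing; the other columns are elided
# in the question, so only COLUMN1 is rebuilt here.
df = pd.DataFrame({"COLUMN1": [b"abcde", b"dog", b"cat1", b"bird1", b"elephant1"]})

# Column-level view: the bytes values are stored with the generic object dtype.
print(df.COLUMN1.dtype)  # object

# Element-level view: .iloc[0] returns the raw Python bytes object.
first = df.COLUMN1.iloc[0]
print(type(first))  # <class 'bytes'>
print(hasattr(first, "dtype"))  # False
```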

How do I convert these into "regular" strings? That is, how can I get rid of this b'' prefix?

Answer

You can use vectorised str.decode to decode byte strings into ordinary strings:

df['COLUMN1'].str.decode("utf-8")
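Note that `str.decode` returns a new Series rather than changing `df` in place, so the result has to be assigned back to the column. A minimal sketch with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data: a column of UTF-8 encoded bytes.
df = pd.DataFrame({"COLUMN1": [b"abcde", b"dog", b"cat1"]})

# Series.str.decode works element-wise; assign the result to replace the column.
df["COLUMN1"] = df["COLUMN1"].str.decode("utf-8")

print(df["COLUMN1"].tolist())  # ['abcde', 'dog', 'cat1']
```

If some values might not be valid UTF-8, `str.decode` also takes an `errors` argument (for example `errors="replace"`), which is handed on to the underlying bytes decoding.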

To do this for multiple columns, you can first select just the object-dtype columns, which is where the bytes values are stored:

str_df = df.select_dtypes(include=["object"])
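As a quick check of what the selection keeps, here is a sketch with a hypothetical numeric column COLUMN2 added for contrast. Passing the dtype as the string `"object"` also sidesteps `np.object`, an alias that was removed in NumPy 1.24.

```python
import pandas as pd

# Hypothetical sample data: one bytes column and one numeric column.
df = pd.DataFrame({
    "COLUMN1": [b"abcde", b"dog", b"cat1"],
    "COLUMN2": [1, 2, 3],
})

# Keep only the object-dtype columns; the int64 column is excluded.
str_df = df.select_dtypes(include=["object"])

print(list(str_df.columns))  # ['COLUMN1']
```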

Then convert them all at once:

str_df = str_df.stack().str.decode('utf-8').unstack()
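The stack/unstack line turns the selected columns into one long Series so that a single `.str.decode` call covers every cell, then pivots back to the original shape. It assumes every cell is bytes: `str.decode` turns elements that are not bytes (for example values that are already `str`) into NaN. If the object columns might be mixed, one alternative (not part of the original answer) is to decode only the bytes cells, column by column, using `Series.map` and a small hypothetical helper `decode_if_bytes`:

```python
import pandas as pd


def decode_if_bytes(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes and pass every other value through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical mixed data: COLUMN1 already contains one decoded str.
df = pd.DataFrame({
    "COLUMN1": [b"abcde", "dog", b"cat1"],
    "COLUMN2": [b"bird1", b"elephant1", b"dog"],
})

str_df = df.select_dtypes(include=["object"])

# Apply the helper cell by cell, one column at a time; str values are kept as-is.
str_df = str_df.apply(lambda col: col.map(decode_if_bytes))

print(str_df["COLUMN1"].tolist())  # ['abcde', 'dog', 'cat1']
```

This trades the vectorised `.str` accessor for a Python-level call per cell, which is slower on large frames but never discards values that were already strings.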

You can then swap the converted columns back into the original df:

for col in str_df:
    df[col] = str_df[col]
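Putting the three steps of the answer together, here is an end-to-end sketch on hypothetical sample data with two bytes columns and one numeric column. The numeric column is never selected, so it passes through untouched.

```python
import pandas as pd

# Hypothetical sample data: two bytes columns and one numeric column.
df = pd.DataFrame({
    "COLUMN1": [b"abcde", b"dog", b"cat1"],
    "COLUMN2": [b"bird1", b"elephant1", b"dog"],
    "COLUMN3": [1, 2, 3],
})

# 1. Select the object-dtype columns (COLUMN1 and COLUMN2).
str_df = df.select_dtypes(include=["object"])

# 2. Stack into one long Series, decode every cell, then pivot back.
str_df = str_df.stack().str.decode("utf-8").unstack()

# 3. Swap the decoded columns back into the original DataFrame
#    (assignment aligns on the row index).
for col in str_df:
    df[col] = str_df[col]

print(df["COLUMN1"].tolist())  # ['abcde', 'dog', 'cat1']
```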
