pandas string data types
Question
I want to specify data types for pandas read_csv. Below is a quick example: the first call works without explicit types, and the second fails once types are specified. Why doesn't the second call work?
import io
import pandas as pd
csv = """foo,1234567,a,1
foo,2345678,b,3
bar,3456789,b,5
"""
# Works: no dtype is given, so pandas infers the column types.
df = pd.read_csv(io.StringIO(csv),
                 names=["fb", "num", "loc", "x"])
print(df)
# Does not work: dtype is passed as a list of strings.
df = pd.read_csv(io.StringIO(csv),
                 names=["fb", "num", "loc", "x"],
                 dtype=["|S3", "np.int64", "|S1", "np.int8"])
print(df)
I've updated the question, at BrenBarn's suggestion, to make it simpler and hopefully clearer. My real dataset is much larger, but I'd like to use this approach to specify types for all of my data on import.
Recommended answer
As Jeff indicated, my syntax was bad. The names and types have to be zipped into a dict-style mapping from column name to type. The code below works, but note that you can't specify a string width as a dtype; you can only declare the column as object.
import io
import numpy as np  # needed for np.int64 and np.int8 below
import pandas as pd
csv = """foo,1234567,a,1
foo,2345678,b,3
bar,3456789,b,5
"""
# dtype must be a mapping from column name to type, not a list.
df = pd.read_csv(io.StringIO(csv),
                 names=["fb", "num", "ab", "x"],
                 dtype={"fb": object, "num": np.int64, "ab": object, "x": np.int8})
print(df)
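For a larger dataset, where writing the mapping by hand gets tedious, the "zipping" mentioned above might look something like the sketch below. It assumes the column names and their types are already available as two parallel lists; the variable names `types` and `dtype_map` are made up for illustration and do not come from the original answer.

```python
import io

import numpy as np
import pandas as pd

csv = """foo,1234567,a,1
foo,2345678,b,3
bar,3456789,b,5
"""

names = ["fb", "num", "ab", "x"]
# Hypothetical list of types, kept in the same order as `names`.
types = [object, np.int64, object, np.int8]

# Pair each column name with its type to build the mapping read_csv expects.
dtype_map = dict(zip(names, types))

df = pd.read_csv(io.StringIO(csv), names=names, dtype=dtype_map)
print(df.dtypes)
```

Printing `df.dtypes` is a quick way to confirm that each column was read with the requested type.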