Pandas read_csv dtype: read all columns as strings except a few

Question

I'm using Pandas to read a bunch of CSVs. I pass a dict to the dtype parameter to tell pandas which columns to read as strings instead of the default:

dtype_dic= { 'service_id':str, 'end_date':str, ... }
feedArray = pd.read_csv(feedfile , dtype = dtype_dic)

In my scenario, all the columns except a few specific ones are to be read as strings. So instead of listing many columns as str in dtype_dic, I'd like to set just my chosen few as int or float. Is there a way to do that?

It's a loop cycling through various CSVs with differing columns, so converting columns directly after reading the whole csv as strings (dtype=str) would not be easy, as I would not immediately know which columns a given csv has. (I'd rather spend that effort defining all the columns in the dtype dict!)

But if there's a way to process a list of column names to be converted to numbers without erroring out when a column isn't present in that csv, then yes, that would be a valid solution, if there's no way to do this at the csv reading stage itself.

Note: this sounds like a previously asked question, but the answers there went down a very different path (bool related) which doesn't apply to this question. Please don't mark as duplicate!

Recommended answer

EDIT - sorry, I misread your question. Updated my answer.

You can read the entire csv as strings, then convert your desired columns to other types afterwards, like this:

import pandas as pd

# Read every column as a string.
df = pd.read_csv('/path/to/file.csv', dtype=str)
# example df; yours will be from pd.read_csv() above
df = pd.DataFrame({'A': ['1', '3', '5'], 'B': ['2', '4', '6'], 'C': ['x', 'y', 'z']})
# Convert only the chosen columns to numeric types.
types_dict = {'A': int, 'B': float}
for col, col_type in types_dict.items():
    df[col] = df[col].astype(col_type)
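
The loop above raises a KeyError if one of the columns in types_dict is missing from a given csv, which is the case the question worries about. A minimal sketch of one way around that, assuming it is acceptable to silently skip absent columns: filter types_dict down to the columns the file actually has before converting. The sample data and the name `present` are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample standing in for one of the CSV files; it has no 'B' column.
csv_text = "A,C\n1,x\n3,y\n5,z\n"

# Columns that should be numeric whenever they are present.
types_dict = {'A': int, 'B': float}

# Read everything as strings first.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Keep only the conversions whose column exists in this particular file.
present = {col: t for col, t in types_dict.items() if col in df.columns}

# DataFrame.astype accepts a {column: dtype} mapping and leaves other columns alone.
df = df.astype(present)
```

Here 'A' ends up as integers, 'C' stays as strings, and the missing 'B' is ignored rather than causing an error.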

Another approach, if you really want to specify the proper types for all columns when reading the file in, and not change them afterwards: read in just the column names (no rows), then use those to fill in which columns should be strings:

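import pandas as pd

# Read only the header row to get this file's column names.
col_names = pd.read_csv('file.csv', nrows=0).columns
types_dict = {'A': int, 'B': float}
# Every column not given a numeric type is read as a string.
types_dict.update({col: str for col in col_names if col not in types_dict})
df = pd.read_csv('file.csv', dtype=types_dict)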
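
Since the question loops over many CSVs with differing columns, the header-first approach could be wrapped in a small helper. This is only a sketch: the function name `read_mostly_str`, the temporary file, and the column names in the demo are hypothetical. It builds the dtype mapping from the columns each file actually contains, so numeric entries for absent columns are simply never used.

```python
import os
import tempfile
import pandas as pd

def read_mostly_str(path, numeric_types):
    """Read a CSV with every column as str except those listed in numeric_types."""
    # Header-only read: nrows=0 parses the column names without loading any rows.
    col_names = pd.read_csv(path, nrows=0).columns
    # Use the numeric type where one is given for a column in this file, str otherwise.
    dtypes = {col: numeric_types.get(col, str) for col in col_names}
    return pd.read_csv(path, dtype=dtypes)

# Demo with a throwaway file; 'price' is requested as float but absent from the file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "feed.csv")
    with open(path, "w") as f:
        f.write("service_id,count,end_date\n007,3,20240101\n")
    feed = read_mostly_str(path, {"count": int, "price": float})
```

In the demo, service_id keeps its leading zeros because it is read as a string, while count is parsed as an integer. The cost of this design is that each file's header is read twice.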
