Python - How to split column values using multiple separators

Question

I am reading a .csv file and creating a pandas DataFrame. From this DataFrame I fetch a value that is supposed to be a list of comma-separated values, but it comes out as a string, so I have to split it on a separator to get the individual values.

For example, I have a string variable named column_names with the following value:

column_names = "First_Name, Last_Name,Middle_Name"
column_names = column_names.split(',')

Please note the space before the second value. When I print this variable, the second element keeps that leading space, which causes trouble later when I extract values from the list.

print(column_names)

['First_Name', ' Last_Name', 'Middle_Name']

To work around this, I tried including the space in the separator (here ', '), but then the values are not split correctly, as shown below:

column_names = "First_Name, Last_Name,Middle_Name"
column_names = column_names.split(', ')
print(column_names)

['First_Name', 'Last_Name,Middle_Name']

Note the space after the comma in the separator. With this separator I get only two values instead of three, because the second comma has no space after it.

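My problem is that the variable may contain comma-separated values with spaces to the left or right of a comma, or with no spaces at all. I need to handle every case with a single command (if possible), something like passing several separator values to split at once.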

For example: column_names.split(',' | ', ' | ' ,') (not valid Python, just to illustrate the idea).

I am not sure whether such a thing exists, but any pointers would be helpful.
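Python's built-in str.split takes only one literal separator, so it cannot express "a comma, with or without spaces around it". re.split can, because it accepts a regular expression. A minimal sketch, assuming the goal is simply to drop any whitespace around each comma; SEPARATOR, split_names and samples are hypothetical names used only for illustration:

```python
import re
from typing import List

# Hypothetical pattern: a comma with any amount of whitespace (including none) on either side.
SEPARATOR = re.compile(r"\s*,\s*")


def split_names(value: str) -> List[str]:
    """Split a comma-separated string, discarding whitespace around each comma."""
    # strip() first so spaces at the very start or end of the string don't survive.
    return SEPARATOR.split(value.strip())


# Hypothetical inputs covering the three cases described in the question.
samples = [
    "First_Name, Last_Name,Middle_Name",    # space after a comma
    "First_Name ,Last_Name , Middle_Name",  # space before, and on both sides
    "First_Name,Last_Name,Middle_Name",     # no spaces at all
]

for s in samples:
    print(split_names(s))  # ['First_Name', 'Last_Name', 'Middle_Name'] for each sample
```

A regex-free alternative that gives the same result for these inputs is [part.strip() for part in value.split(',')]: split on the bare comma, then strip each piece.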

Answer

This is a common issue with CSVs. Fortunately, you can nip it in the bud by reading the CSV properly, so you don't need any of this post-processing later.

When reading the file with read_csv, pass a regex as the sep (or delimiter) argument:

df = pd.read_csv(..., sep=r'\s*,\s*', engine='python')

Now, df.columns should hold the column names as clean strings, with no stray whitespace around them.
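To see the suggestion end to end, here is a small sketch that feeds an in-memory CSV to read_csv through io.StringIO. The sample data is hypothetical and deliberately mixes ", ", " ," and "," in both the header and a data row:

```python
import io

import pandas as pd

# Hypothetical CSV text with inconsistent spacing around the commas.
raw = (
    "First_Name, Last_Name ,Middle_Name\n"
    "John,Smith , Paul\n"
)

# A separator longer than one character is treated as a regular expression;
# engine="python" selects the parser that supports regex separators.
df = pd.read_csv(io.StringIO(raw), sep=r"\s*,\s*", engine="python")

print(list(df.columns))     # ['First_Name', 'Last_Name', 'Middle_Name']
print(df.iloc[0].tolist())  # ['John', 'Smith', 'Paul']
```

One caveat noted in the pandas documentation: regex separators are prone to ignoring quoted data, so a quoted field that legitimately contains a comma may still be split.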
