Python - How to split column values using multiple separators
Problem description
I am reading a .csv file and creating a pandas DataFrame. From this DataFrame I am fetching a value that is supposed to be a list of comma-separated values. Instead it comes out as a string, and I have to split it on a separator myself.
For example, I have a string variable named column_names with the value below:
column_names = "First_Name, Last_Name,Middle_Name"
column_names = column_names.split(',')
Note the space before the second value. When I print this variable, the second element has a leading space, which causes further trouble when extracting values from this variable.
print(column_names)
['First_Name', ' Last_Name', 'Middle_Name']
To work around this, I tried including the space in the separator itself (here ', '), but then the values are not split properly, as seen below:
column_names = "First_Name, Last_Name,Middle_Name"
column_names = column_names.split(', ')
print(column_names)
['First_Name', 'Last_Name,Middle_Name']
Notice the space to the right of the comma in the separator. With this separator I get only two values instead of three.
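My problem is that the variable may contain comma-separated values with a space to the left of the comma, to the right of it, on both sides, or with no space at all. I have to handle all of these cases with a single command, if possible: something like passing multiple separator values when splitting.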
For example: column_names.split(','|', '|' ,').
I am not sure whether anything like this exists, but any pointers would be helpful.
Recommended answer
This is a common issue with CSVs. Fortunately, you can nip it in the bud simply by reading your CSV properly, so you don't have to do all this unnecessary post-processing later.
When reading your DataFrame with read_csv, pass a regex to sep (or its alias, delimiter):
df = pd.read_csv(..., sep=r'\s*,\s*', engine='python')
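To try the read_csv call above without a file on disk, here is a minimal sketch that feeds the same arguments an in-memory buffer. The sample rows and the csv_text name are made up for illustration and are not from the original question.

```python
import io

import pandas as pd

# Hypothetical sample data; note the inconsistent spacing around the commas.
csv_text = (
    "First_Name, Last_Name ,Middle_Name\n"
    "Ada, Lovelace ,King\n"
    "Grace,Hopper , Brewster\n"
)

# The regex matches a comma together with any whitespace on either side,
# so the padding is consumed as part of the separator.
df = pd.read_csv(io.StringIO(csv_text), sep=r"\s*,\s*", engine="python")

print(list(df.columns))
print(df.loc[0, "Last_Name"])
```

Regex separators are only handled by the Python parsing engine, which is why engine='python' is passed explicitly. One caveat worth checking against your data: the pandas documentation notes that regex delimiters are prone to ignoring quoted fields, so this approach suits files without quoted commas.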
Now, df.columns should contain plain strings with no surrounding whitespace (use list(df.columns) if you need an actual list).
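If the value is already in hand as a string, as in the question, the same regex idea can be applied directly. str.split accepts only one literal separator, so the sketch below uses re.split from the standard library instead. The split_names helper is a hypothetical name introduced here for illustration.

```python
import re


def split_names(raw):
    """Split on commas, ignoring any whitespace around each comma."""
    # strip() removes padding at the ends of the whole string;
    # the regex consumes padding on either side of every comma.
    return re.split(r"\s*,\s*", raw.strip())


column_names = split_names("First_Name, Last_Name ,Middle_Name")
print(column_names)
# ['First_Name', 'Last_Name', 'Middle_Name']

# An equivalent approach without regex: split on ',' and strip each piece.
stripped = [part.strip() for part in "First_Name, Last_Name ,Middle_Name".split(",")]
```

Both variants handle a space before the comma, after it, on both sides, or none at all. The list-comprehension form may be easier to read when the only separator is a comma.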