How can I split a column into 2 in the correct way in Python?
Problem description
I am web-scraping tables from a website and writing them to an Excel file. My goal is to split one column into 2 columns in the correct way.
The column I want to split: "STATUS"
I want this result:
First example: Estimated 3:17 PM --> Estimated and 3:17 PM
Second example: Delayed 3:00 PM --> Delayed and 3:00 PM
Third example: Canceled --> Canceled and (empty cell)
So I need to put the FIRST word in the first column and the remaining characters in the second column.
How can I do this?
Here is my relevant code, which already contains some formatting code:
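import os
from datetime import datetime

import numpy as np
import pandas as pd

# `datatable` (scraped rows) and `cols` (column names) come from the scraping code not shown here
df2 = pd.DataFrame(datatable, columns=cols)
# Split FLIGHT into a 2-character prefix and a zero-padded 4-digit number
df2['a'] = df2['FLIGHT'].str[:2]
df2['b'] = df2['FLIGHT'].str[2:].str.zfill(4)
df2["UPLOAD_TIME"] = datetime.now()
# Drop every row in which any cell contains "Scheduled"
mask = np.column_stack([df2[col].astype(str).str.contains(r"Scheduled", na=True) for col in df2])
df3 = df2.loc[~mask.any(axis=1)]
# Append to the existing CSV if there is one, otherwise create it
if os.path.isfile("output.csv"):
    df1 = pd.read_csv("output.csv", sep=";")
    df4 = pd.concat([df1, df3])
    df4.to_csv("output.csv", index=False, sep=";")
else:
    df3.to_csv("output.csv", index=False, sep=";")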
Here is a screenshot of my table in Excel:
Recommended answer
You can use str.split with n=1 to split on the first whitespace only, and expand=True to return a DataFrame, which can be assigned to new columns:
df2[['c','d']] = df2['STATUS'].str.split(n=1, expand=True)
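To see why n=1 matters, here is a small sketch comparing the default split (every whitespace is a split point) with n=1 on the three example values from the question. Without n=1, "Estimated 3:17 PM" becomes 3 pieces, so the result would have 3 columns and could not be assigned to the 2 columns 'c' and 'd'.

```python
import pandas as pd

status = pd.Series(['Estimated 3:17 PM', 'Delayed 3:00 PM', 'Canceled'])

# Default: split on every run of whitespace -> 'Estimated', '3:17', 'PM'
full = status.str.split(expand=True)
# n=1: split only on the first whitespace -> 'Estimated', '3:17 PM'
first_only = status.str.split(n=1, expand=True)

print(full.shape)        # (3, 3)
print(first_only.shape)  # (3, 2)
```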
Example:
import pandas as pd

df2 = pd.DataFrame({'STATUS':['Estimated 3:17 PM','Delayed 3:00 PM']})
df2[['c','d']] = df2['STATUS'].str.split(n=1, expand=True)
print (df2)
              STATUS          c        d
0  Estimated 3:17 PM  Estimated  3:17 PM
1    Delayed 3:00 PM    Delayed  3:00 PM
If a value contains no whitespace, the second column gets None:
df2 = pd.DataFrame({'STATUS':['Estimated 3:17 PM','Delayed 3:00 PM', 'Canceled']})
df2[['c','d']] = df2['STATUS'].str.split(n=1, expand=True)
print (df2)
              STATUS          c        d
0  Estimated 3:17 PM  Estimated  3:17 PM
1    Delayed 3:00 PM    Delayed  3:00 PM
2           Canceled   Canceled     None
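One edge case worth guarding against with scraped data: expand=True returns only as many columns as the longest split produces. If every STATUS in a batch is a single word (for example, all flights are Canceled), the result has just one column and the two-column assignment raises a ValueError. The sketch below pads the result to exactly two columns with reindex; split_status is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

def split_status(status: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: split on the first whitespace, always return 2 columns."""
    # At most 2 pieces per value; may still yield only 1 column for the whole Series
    parts = status.str.split(n=1, expand=True)
    # Force columns 0 and 1 to exist; a missing column 1 is filled with NaN
    return parts.reindex(columns=[0, 1])

# Every value is a single word, so str.split alone would give only one column
df2 = pd.DataFrame({'STATUS': ['Canceled', 'Canceled']})
df2[['c', 'd']] = split_status(df2['STATUS'])
print(df2)
```

The same helper works unchanged for mixed batches, since reindex leaves an already-present column 1 as it is.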
And if you need to replace None with an empty string, use fillna:
df2[['c','d']] = df2['STATUS'].str.split(n=1, expand=True)
df2['d'] = df2['d'].fillna('')
print (df2)
              STATUS          c        d
0  Estimated 3:17 PM  Estimated  3:17 PM
1    Delayed 3:00 PM    Delayed  3:00 PM
2           Canceled   Canceled
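A possible alternative, swapped in here rather than taken from the answer above, is str.partition, which always returns 3 columns (text before the first separator, the separator, text after it) and uses empty strings instead of None when the separator is missing, so no fillna step is needed. This sketch assumes the word and the time are separated by a single ordinary space.

```python
import pandas as pd

df2 = pd.DataFrame({'STATUS': ['Estimated 3:17 PM', 'Delayed 3:00 PM', 'Canceled']})

# Columns 0, 1, 2 = before the first ' ', the ' ' itself, everything after it.
# For 'Canceled' there is no ' ', so columns 1 and 2 are '' rather than None.
parts = df2['STATUS'].str.partition(' ')
df2['c'] = parts[0]
df2['d'] = parts[2]
print(df2)
```

Unlike str.split() with no pattern, partition matches one literal space only, so values with leading whitespace or non-breaking spaces (possible in scraped HTML) may need to be cleaned first, for example with str.strip().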