Excel CSV output appears wrong after calling a Python file
Problem description
I'm currently struggling with the following .csv output: various random characters appear within the players' names and values where there shouldn't be any.
(I have included a picture of the output below.)
I'm wondering where I'm going wrong in the code, as I'm struggling to eliminate the random characters.
I am trying to remove characters such as Â, Ã, ©, ‰, etc.
Any suggestions?
Python code
# Imports
import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {'User-Agent':
           'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like '
           'Gecko) Chrome/47.0.2526.106 Safari/537.36'}

# Fetch and parse the page
page = "https://www.transfermarkt.co.uk/transfers/transferrekorde/statistik/top/plus/0/galerie/0?saison_id=2000"
pageTree = requests.get(page, headers=headers)
pageSoup = BeautifulSoup(pageTree.content, 'html.parser')

# Collect the player name links
Players = pageSoup.find_all("a", {"class": "spielprofil_tooltip"})
# Let's look at the first name in the Players list.
Players[0].text

# Collect the transfer value cells
Values = pageSoup.find_all("td", {"class": "rechts hauptlink"})
# Let's look at the first value in the Values list.
Values[0].text

# Copy the text of the first 25 players and values into plain lists
PlayersList = []
ValuesList = []
for i in range(0, 25):
    PlayersList.append(Players[i].text)
    ValuesList.append(Values[i].text)

# Build the DataFrame and write it to CSV
df = pd.DataFrame({"Players": PlayersList, "Values": ValuesList})
df.to_csv('2000.csv', index=False)
df.head()
My Excel output
Recommended answer
...
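# In Python 3 the BOM is the single character U+FEFF; opening the file
# as UTF-8 makes it come out as the bytes EF BB BF.
utf8_bom = '\ufeff'
with open('2000.csv', 'w', encoding='utf-8', newline='') as csv_file:
    csv_file.write(utf8_bom)
    # pandas writes into the already open handle, right after the BOM
    df.to_csv(csv_file, index=False)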
Explanation: The BOM is the byte order mark (q.v.). If Excel finds it at the beginning of the CSV file, it uses it to determine the encoding, which in your case is UTF-8 (the encoding that pandas' to_csv uses by default in Python 3 when given a file path). Without a BOM, Excel usually falls back to a legacy code page such as Windows-1252, so each multi-byte UTF-8 character shows up as two or three stray characters like Â or Ã.
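To see where the stray characters come from, here is a minimal sketch that misreads UTF-8 bytes as Windows-1252. The sample name and value are assumptions chosen for illustration, not data taken from the scraped page, and the sketch assumes Excel falls back to Windows-1252 on the machine in question.

```python
import codecs

# Hypothetical sample values, chosen only because they contain non-ASCII characters.
name = "Hernán Crespo"
value = "£1.00m"

# What ends up in the file: UTF-8 bytes.
name_bytes = name.encode("utf-8")
value_bytes = value.encode("utf-8")

# What a reader sees if it wrongly assumes Windows-1252.
misread_name = name_bytes.decode("cp1252")
misread_value = value_bytes.decode("cp1252")
print(misread_name)   # HernÃ¡n Crespo
print(misread_value)  # Â£1.00m

# With the BOM in front, a BOM-aware reader decodes the text correctly.
with_bom = codecs.BOM_UTF8 + name_bytes
decoded_name = with_bom.decode("utf-8-sig")
print(decoded_name)   # Hernán Crespo
```

The data itself is fine in both cases; only the reader's guess about the encoding changes.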
Edit
As Mark Tolonen pointed out, the compact version of the above is the following code:
df.to_csv('2000.csv', encoding='utf-8-sig', index=False)
The -sig in the name of the encoding stands for "signature", i.e., the BOM at the beginning, which Microsoft software uses to detect the encoding. See also the Encodings and Unicode section (https://docs.python.org/3.7/library/codecs.html#encodings-and-unicode) of the codecs manual (https://docs.python.org/3/library/codecs.html).
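One way to check that the signature really ends up in the file is to inspect its first bytes. The sketch below uses only the standard library's csv module rather than pandas so that it runs on its own; the helper name write_csv_with_signature, the file name sample.csv, and the row contents are hypothetical.

```python
import codecs
import csv
import os
import tempfile


def write_csv_with_signature(path, rows):
    """Write rows as CSV, prefixed with the UTF-8 signature (BOM)."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


# Hypothetical rows containing non-ASCII characters.
rows = [["Players", "Values"], ["Hernán Crespo", "£1.00m"]]

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.csv")
write_csv_with_signature(path, rows)

# The raw file should begin with the three BOM bytes EF BB BF.
with open(path, "rb") as f:
    raw = f.read()
starts_with_bom = raw.startswith(codecs.BOM_UTF8)
print(starts_with_bom)

# Reading back with utf-8-sig strips the BOM again.
with open(path, encoding="utf-8-sig", newline="") as f:
    round_trip = list(csv.reader(f))
print(round_trip == rows)
```

The same byte check can be applied to the 2000.csv file produced by df.to_csv with encoding='utf-8-sig'.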