'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in...

强人锁男 · posted on 2018-9-18 18:35:20

I get the following error when reading and parsing an Amazon XML file:
'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in range(128)
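From what I have read online so far, the error occurs because the XML file is UTF-8 and Python wants to handle it as ASCII-encoded characters. Is there a simple way to make the error go away and have my program print the XML as it reads it?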
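For reference, here is a minimal Python 3 sketch that reproduces the same message. The sample string is hypothetical and was chosen only so that U+2019 (RIGHT SINGLE QUOTATION MARK) lands at position 16. It suggests the failure happens when Unicode text is encoded to ASCII, which is what printing to an ASCII-configured stream does. It does not happen when the UTF-8 file is decoded:

```python
# Hypothetical text standing in for a value read from the XML file;
# U+2019 (RIGHT SINGLE QUOTATION MARK) sits at index 16.
text = "Customer reviews\u2019 summary"

try:
    # The same step print() performs when the output stream's encoding is ASCII.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    bad_char = exc.object[exc.start]  # the character that could not be encoded
    position = exc.start              # its index in the string
    print(exc)
```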

社会诚哥 · posted on 2018-9-18 18:36:21

Try encoding the unicode string to ascii first:
unicodeData.encode('ascii', 'ignore')
The 'ignore' part tells it to simply skip those characters. From the Python docs:
>>> u = unichr(40960) + u'abcd' + unichr(1972)
>>> u.encode('utf-8')
'\xea\x80\x80abcd\xde\xb4'
>>> u.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\ua000' in position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
'abcd'
>>> u.encode('ascii', 'replace')
'?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
'&#40960;abcd&#1972;'
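The session above is Python 2, which uses unichr and implicit byte strings. Here is a sketch of the same error handlers in Python 3, where chr() replaces unichr() and encode() returns bytes. 'backslashreplace' is another standard handler that the original session does not show:

```python
# Python 3 version of the session above.
u = chr(40960) + "abcd" + chr(1972)

utf8 = u.encode("utf-8")  # always succeeds: UTF-8 can encode every code point

# Encode to ASCII with each standard error handler.
encoded = {
    handler: u.encode("ascii", handler)
    for handler in ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")
}

for handler, data in encoded.items():
    print(f"{handler:>18}: {data!r}")
```

'ignore' and 'replace' throw information away. 'xmlcharrefreplace' keeps every character recoverable, which suits XML or HTML output.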
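You might want to read this article: http://www.joelonsoftware.com/articles/Unicode.html. I found it very useful as a basic tutorial on what is going on. After reading it, you will stop feeling like you are just guessing which commands to use (or at least that was my experience).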

污妖王 · posted on 2018-9-18 18:36:56

A better solution:
if isinstance(value, str):
    # Python 2: str is a byte string here.
    # Ignore errors even if the string is not proper UTF-8 or has
    # broken marker bytes.
    # Python built-in function unicode() can do this.
    value = unicode(value, "utf-8", errors="ignore")
else:
    # Assume the value object has proper __unicode__() method
    value = unicode(value)
If you want to know more about why:
http://docs.plone.org/manage/troubleshooting/unicode.html#id1
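Here is a rough Python 3 counterpart of the helper above. It assumes the same policy: decode bytes as UTF-8 and drop invalid sequences. The function name to_text is hypothetical:

```python
def to_text(value):
    """Return value as a str (hypothetical helper name).

    bytes are decoded as UTF-8, silently dropping invalid sequences;
    anything else goes through str(), which calls the object's __str__().
    """
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="ignore")
    return str(value)


print(to_text(b"caf\xc3\xa9 \xff"))  # the stray 0xff byte is dropped
print(to_text(42))
```

Using errors="replace" would substitute U+FFFD instead of dropping bytes silently. That makes corrupted input easier to spot.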

令狐少侠 · posted on 2018-9-18 18:37:36

Do not hardcode your environment's character encoding in your script; print Unicode text directly instead:
assert isinstance(text, unicode) # or str on Python 3
print(text)
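Applied to the original question, here is a sketch using the standard library's xml.etree.ElementTree. It assumes the Amazon data arrives as UTF-8 bytes, and the sample document is made up. The parser decodes the bytes itself, so the extracted text is already Unicode and can be printed without any encode() call:

```python
import xml.etree.ElementTree as ET

# Hypothetical UTF-8 encoded XML, as it might be read from a file in binary mode.
raw = b"<item><title>Amazon\xe2\x80\x99s pick</title></item>"

root = ET.fromstring(raw)        # the parser handles the UTF-8 decoding
title = root.find("title").text  # already a str; no manual encode/decode

assert isinstance(title, str)
print(title)  # relies on the output stream's encoding being able to represent U+2019
```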
If your output is redirected to a file (or a pipe), you can use the PYTHONIOENCODING envvar to specify the character encoding:
$ PYTHONIOENCODING=utf-8 python your_script.py >output.utf8
Otherwise, python your_script.py should work as is: your locale settings are used to encode the text (on POSIX, check the LC_ALL, LC_CTYPE, and LANG envvars; set LANG to a UTF-8 locale if necessary).
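This sketch shows the effect of PYTHONIOENCODING. It runs a child interpreter with stdout piped, as with >output.utf8, and inspects the raw bytes the child writes. The child script and the helper name child_output are illustrative. The "encoding:errors" form of the variable is standard:

```python
import os
import subprocess
import sys

# Child script: print a non-ASCII character (U+2019) with no manual encoding.
CHILD = "print('\\u2019')"


def child_output(io_encoding):
    """Run CHILD with PYTHONIOENCODING set; return (exit code, stripped stdout bytes)."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,  # piped, so the locale is not consulted for stdout
        stderr=subprocess.PIPE,  # capture the traceback from the failing case
        check=False,
    )
    return result.returncode, result.stdout.strip()


print(child_output("utf-8"))          # U+2019 written as its UTF-8 bytes
print(child_output("ascii:replace"))  # unencodable characters become "?"
print(child_output("ascii")[0])       # strict ASCII: the child exits with an error
```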