UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1: ordinal not in range(128)
I'm having trouble encoding a string to UTF-8. I've tried many things, including string.encode('utf-8') and unicode(string), but I keep getting the error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1: ordinal not in range(128)
Here is my string:
(。・ω・。)ノ
I don't know what's going wrong:
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
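The UnicodeDecodeError in the title comes from a hidden step. In Python 2, calling .encode('utf-8') on a byte string (str) first decodes it with the default codec, ASCII, and byte 0xef at position 1 is not ASCII. The minimal sketch below emulates that hidden step in Python 3, where bytes and text are separate types. The helper names py2_style_encode and explicit_decode are hypothetical, chosen only for illustration.

```python
# Python 3 sketch: emulate the implicit ASCII decode that Python 2 performs
# when .encode() is called on a byte string.
RAW = b'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'


def py2_style_encode(raw: bytes, target: str = "utf-8") -> bytes:
    """Hypothetical helper: decode with ASCII (Python 2's default codec), then encode."""
    text = raw.decode("ascii")  # the hidden step; fails on any byte >= 0x80
    return text.encode(target)


def explicit_decode(raw: bytes) -> str:
    """Hypothetical helper: decode with the codec the bytes were actually written in."""
    return raw.decode("utf-8")


if __name__ == "__main__":
    try:
        py2_style_encode(RAW)
    except UnicodeDecodeError as exc:
        # 'ascii' codec can't decode byte 0xef in position 1: ordinal not in range(128)
        print(exc)
    text = explicit_decode(RAW)
    print(len(RAW), "bytes ->", len(text), "characters")  # 19 bytes -> 8 characters
```

Decoding with the right codec succeeds. The failure left in the session above is a different error, a UnicodeEncodeError, raised when print converts the text back into bytes.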
This happens because your terminal encoding is not set to UTF-8. Here is my terminal:
$ echo $LANG
en_GB.UTF-8
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59)
on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
(。・ω・。)ノ
>>>
In my terminal the example above works, but if I get rid of the LANG setting it no longer does:
$ unset LANG
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59)
on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
>>>
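Here is why LANG matters, in Python 3 terms. print hands the text to a text stream, and the stream encodes it with its own encoding. For a real terminal, Python derives that encoding from the locale (LANG here). In the sketch below, an in-memory stream stands in for sys.stdout. The helper print_to_terminal is hypothetical, and the encodings passed to it are assumptions: "utf-8" stands in for LANG=en_GB.UTF-8 and "ascii" for an unset LANG under Python 2.

```python
import io

RAW = b'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'


def print_to_terminal(text: str, terminal_encoding: str) -> bytes:
    """Hypothetical helper: print() text to a fake terminal, return the bytes it receives.

    The in-memory TextIOWrapper stands in for sys.stdout; its encoding plays
    the role that the locale (LANG) plays for a real terminal.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=terminal_encoding, errors="strict")
    try:
        print(text, end="", file=stream)  # the text -> bytes conversion happens here
        stream.flush()
        return raw.getvalue()
    finally:
        stream.detach()  # release the BytesIO without closing it


if __name__ == "__main__":
    text = RAW.decode("utf-8")
    print(print_to_terminal(text, "utf-8") == RAW)  # True, like LANG=en_GB.UTF-8
    try:
        print_to_terminal(text, "ascii")  # like the unset-LANG session above
    except UnicodeEncodeError as exc:
        # 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
        print(exc)
```

Python 3.7 and later coerce the C locale to UTF-8 (PEP 538, PEP 540), so unsetting LANG may not reproduce the failure there. Passing the encoding explicitly keeps the sketch independent of that behavior.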
Try:
string.decode('utf-8')
# or:
unicode(string, 'utf-8')
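In Python 3, byte strings and text are separate types, and the two suggestions above map onto bytes.decode and the str constructor with an encoding argument. Here is a minimal sketch using the byte string from the question:

```python
RAW = b'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'

# Python 2: string.decode('utf-8')  ->  Python 3: bytes.decode()
via_method = RAW.decode("utf-8")

# Python 2: unicode(string, 'utf-8')  ->  Python 3: str() with an encoding argument
via_constructor = str(RAW, "utf-8")

# Both produce the same text, and encoding it again restores the original bytes.
assert via_method == via_constructor
assert via_method.encode("utf-8") == RAW

# Pitfall: str() without an encoding does not decode; it returns the repr (starting "b'(").
assert str(RAW).startswith("b'(")
```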
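The problem is the print statement. It converts the unicode string to the console's encoding (ASCII here, as the UnicodeEncodeError shows), and that encoding cannot represent these characters. Try writing the string to a file instead and viewing the result in a decent editor that supports Unicode: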
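import codecs

s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
s1 = s.decode('utf-8')  # UTF-8 byte string -> unicode object

# codecs.open encodes everything written to it as UTF-8,
# so the console's encoding is never involved.
with codecs.open('out.txt', 'w', encoding='utf-8') as f:
    f.write(s1)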
Then you will see (。・ω・。)ノ.
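In Python 3 the same approach needs no codecs module, because the built-in open accepts an encoding argument. The sketch below writes the decoded text, then reads the raw bytes back to confirm the file holds exactly the original UTF-8 sequence. The helper names write_utf8 and read_bytes are hypothetical. The sketch also writes to a temporary directory rather than the current one.

```python
import os
import tempfile

RAW = b'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'


def write_utf8(path: str, text: str) -> None:
    """Hypothetical helper: write text as UTF-8, independent of the terminal's encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_bytes(path: str) -> bytes:
    """Hypothetical helper: return the file's raw contents without decoding."""
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    text = RAW.decode("utf-8")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        write_utf8(path, text)
        print(read_bytes(path) == RAW)  # True
```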