Most interesting tutorial I've found: http://boodebr.org/main/python/all-about-python-and-unicode
Key points:
- unicode strings are made of "symbols" or "objects" (no fixed computer representation, so don't think in bytes); codecs transform them into binary strings so you can print them, store them on disk, send them across the network...
- a unicode string example with some greek characters: unicodeString = u"abc_\u03a0\u03a3\u03a9.txt"
- you shouldn't 'print' a unicode string without encoding it first (when Python can't determine the output encoding, e.g. when the output is piped, it falls back to ascii, which can lead to errors if there are non-ascii characters)
- you can print a unicode "representation": print repr(unicodeString)
- you encode with the .encode method: binary = unicodeString.encode("utf-8")
- you can see the binary result like this: print "UTF-8", repr(unicodeString.encode('utf-8'))
- with the 'replace' error handler, characters that can't be encoded become '?': print "ASCII", unicodeString.encode('ascii', 'replace')
- from binary to unicode: unicode(utf8_string, 'utf-8') # you must specify the encoding; otherwise Python assumes it's ascii
- once you have a Unicode object, it behaves exactly like a regular string object, so there is no new syntax to learn (other than the \u and \U escapes)
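To make the key points concrete, here is a small sketch of the encode/decode round trip using the same Greek example string. The variable names (unicode_string, utf8_bytes, ascii_bytes, round_trip) are my own. It uses print(...) with a single argument and .decode() instead of unicode(), on the assumption that you want it to run unchanged on both Python 2 and Python 3.

```python
# Text containing the Greek capitals Pi, Sigma and Omega.
unicode_string = u"abc_\u03a0\u03a3\u03a9.txt"

# Encode: text -> bytes. Each Greek letter takes two bytes in UTF-8.
utf8_bytes = unicode_string.encode("utf-8")
print("UTF-8: " + repr(utf8_bytes))

# 'replace' substitutes '?' for every character ASCII cannot represent.
ascii_bytes = unicode_string.encode("ascii", "replace")
print("ASCII: " + repr(ascii_bytes))

# Without an error handler, encoding to ASCII raises UnicodeEncodeError.
try:
    unicode_string.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict ASCII failed: " + exc.reason)

# Decode: bytes -> text. Always name the encoding explicitly.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == unicode_string)
```

In Python 2, utf8_bytes.decode("utf-8") gives the same result as unicode(utf8_bytes, "utf-8").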
* http://www.jorendorff.com/articles/unicode/python.html (from my wikinote)
* http://evanjones.ca/python-utf8.html
* http://vim.sourceforge.net/tips/tip.php?tip_id=246
* http://farmdev.com/thoughts/23/what-i-thought-i-knew-about-unicode-in-python-amounted-to-nothing/