Encoding Charactersets - may the force be with you
Martin Hoermann
Despite Unicode, understanding and repairing garbled text (mojibake) remains an ongoing task in IT projects. Garbled text results when text is decoded with a character encoding other than the one it was encoded with.
Example: "Die UTF-8 Selbsthilfegruppe trifft sich heute Abend im grünen Saal" (German for "The UTF-8 self-help group meets tonight in the green hall")
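A minimal sketch of how such garbling can arise and be reversed, using the example sentence above. It assumes the text was encoded as UTF-8 and then mistakenly decoded as ISO-8859-1. The variable names are illustrative only, and real-world cases may involve other encoding pairs.

```python
original = "Die UTF-8 Selbsthilfegruppe trifft sich heute Abend im grünen Saal"

# Encode correctly as UTF-8, then decode with the wrong charset (ISO-8859-1).
# The two UTF-8 bytes of "ü" (0xC3 0xBC) become two separate characters.
garbled = original.encode("utf-8").decode("iso-8859-1")
print(garbled)  # ... im grÃ¼nen Saal

# Reverse the mistake: re-encode with the wrong charset to get the
# original bytes back, then decode them as UTF-8.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired == original)  # True
```

This round trip only works when the wrong decoding step lost no bytes. ISO-8859-1 maps every byte value to a character, which is why it is reversible here.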
This talk explains how to analyze and fix such encoding problems with Python. Topics covered:
- the difference between graphemes and code points
- Unicode vs. UTF-8
- decoding and encoding files, database result sets, and REST API calls
- the unicodedata module
- handling ISO character sets in the Unicode world
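Several of these topics can be illustrated with the standard library alone. The following is a small sketch, not material from the talk itself: it shows one visible character ("ü") represented by one or two code points, how the unicodedata module inspects and normalizes them, how code points differ from their UTF-8 bytes, and how an ISO character set covers only part of Unicode. The variable names are chosen for illustration.

```python
import unicodedata

composed = "\u00fc"     # "ü" as a single precomposed code point
decomposed = "u\u0308"  # "u" followed by a combining diaeresis

# Both render as one grapheme, but the number of code points differs.
print(len(composed), len(decomposed))  # 1 2
print(composed == decomposed)          # False

# unicodedata names each code point and can normalize to one form.
names = [unicodedata.name(ch) for ch in decomposed]
print(names)  # ['LATIN SMALL LETTER U', 'COMBINING DIAERESIS']
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == composed)          # True

# Unicode assigns code points; UTF-8 is one way to serialize them as bytes.
utf8_lengths = (len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))
print(utf8_lengths)                    # (2, 3)

# ISO-8859-1 has no euro sign, while ISO-8859-15 does.
try:
    "€".encode("iso-8859-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False
euro_latin9 = "€".encode("iso-8859-15")
print(euro_in_latin1, euro_latin9)     # False b'\xa4'
```

Normalizing to a single form (NFC here) before comparing or storing strings avoids mismatches between visually identical text.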
The talk shows short code examples for real-world problems and their solutions.
Martin Hoermann
Affiliation: ORDIX AG
Has worked for ORDIX AG for over 25 years as a consultant on databases and programming, focusing on Python programming in recent years. Gives lectures for beginners and advanced customers. Has lots of fun turning difficult but everyday problems into edutainment.