Despite Unicode, understanding and repairing garbled text (mojibake) remains a recurring task in IT projects. Garbled text results when text is decoded with a character encoding other than the one it was encoded with.

Example: Die UTF-8 Selbsthilfegruppe trifft sich heute Abend im grünen Saal
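
The example sentence above can be reproduced and repaired in a few lines. This is a minimal sketch that assumes the text was encoded as UTF-8 and then wrongly decoded as ISO-8859-1 (Latin-1); the variable names are illustrative only, and real data may have passed through a different wrong encoding, such as cp1252.

```python
# Assumption: the damage came from reading UTF-8 bytes as ISO-8859-1 (Latin-1).
original = "Die UTF-8 Selbsthilfegruppe trifft sich heute Abend im grünen Saal"

# Produce the mojibake: encode correctly, then decode with the wrong charset.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)

# Repair it by reversing the wrong step: back to bytes, then decode as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)
```

The repair only works if no bytes were lost or replaced along the way, which is why identifying the wrong encoding comes first.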

This talk explains how to analyze and fix such encoding problems with Python. Topics covered include:

  • the difference between graphemes and code points
  • Unicode vs. UTF-8
  • decoding and encoding files, database result sets, and REST API calls
  • the unicodedata module
  • handling ISO charsets in the Unicode world
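
Several of these topics can be illustrated with the standard library alone. The following is a small sketch, not material taken from the talk itself; the variable names are chosen for illustration. It shows one grapheme written with one or two code points, the distinction between a Unicode code point and its UTF-8 bytes, and how the same characters map to single bytes in ISO charsets.

```python
import unicodedata

# One visible character (grapheme), two possible code point sequences.
composed = "\u00fc"      # ü as a single code point
decomposed = "u\u0308"   # u followed by COMBINING DIAERESIS

print(len(composed), len(decomposed))   # number of code points, not graphemes
print(composed == decomposed)           # the strings differ until normalized
print([unicodedata.name(c) for c in decomposed])

# Normalization (here NFC) maps both spellings to the same sequence.
normalized = unicodedata.normalize("NFC", decomposed)

# Unicode assigns the code point; UTF-8 and ISO-8859-1 are byte encodings of it.
utf8_bytes = composed.encode("utf-8")         # two bytes
latin1_bytes = composed.encode("iso-8859-1")  # one byte

# ISO-8859-15 has a euro sign, ISO-8859-1 does not.
euro_bytes = "\u20ac".encode("iso-8859-15")
```

Normalizing text before comparing or storing it avoids treating visually identical strings as different.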

The talk presents short code examples of real-world problems and their solutions.

Martin Hoermann

Affiliation: ORDIX AG

Martin has worked for ORDIX AG for over 25 years as a consultant for databases and programming. In recent years he has focused on Python programming and gives courses for beginners and advanced customers. He enjoys presenting difficult but everyday problems in an entertaining way.
