

WordPress – non Latin characters in URL

March 15th, 2011

If a post title contains non-ASCII characters, those characters cannot be put directly into the URL, so they are percent-encoded. This is handled by sanitize_title_with_dashes_original() and utf8_uri_encode().
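A minimal sketch of the encoding step follows. It assumes the usual scheme: each non-ASCII character is encoded as UTF-8, and every resulting byte is written as %XX. The helper name percent_encode_non_ascii is hypothetical. It is not a port of utf8_uri_encode(), which is PHP code inside WordPress.

```python
from urllib.parse import quote


def percent_encode_non_ascii(title: str) -> str:
    """Percent-encode every non-ASCII character as its UTF-8 bytes.

    Hypothetical helper for illustration only; it is not WordPress's
    utf8_uri_encode(). ASCII characters are passed through unchanged.
    """
    out = []
    for ch in title:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # Each UTF-8 byte becomes %XX, written with uppercase hex digits.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


encoded = percent_encode_non_ascii("章子怡")
print(encoded)
# Python's standard urllib.parse.quote() also emits uppercase hex digits.
print(encoded == quote("章子怡"))
```

Each of the three characters takes three UTF-8 bytes, so the result is nine %XX escapes.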

The problem is that these two functions normalize too much. When they convert the slug to lowercase, they also lowercase the hexadecimal digits of the percent-encoded bytes.

For example, in the unit test data, “章子怡” (Zhang Ziyi, in Chinese) is currently converted to “%e7%ab%a0%e5%ad%90%e6%80%a1”, but it should be “%E7%AB%A0%E5%AD%90%E6%80%A1”.
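Lowercasing the entire encoded slug turns the hex digits lowercase. One possible remedy is to lowercase only the text outside the %XX escapes and write the escapes themselves in uppercase, which is the form RFC 3986 (section 2.1) recommends for percent-encodings. This may not be the approach the post goes on to describe. The function lowercase_outside_escapes is hypothetical, and the sketch assumes that every escape has the form % followed by two hex digits.

```python
import re

# Matches one percent-escape such as %E7 or %e7. The capturing group makes
# re.split() keep the escapes in its result, at the odd indices.
_ESCAPE = re.compile(r"(%[0-9A-Fa-f]{2})")


def lowercase_outside_escapes(slug: str) -> str:
    """Lowercase ordinary text but uppercase the hex digits of %XX escapes.

    Hypothetical helper; not part of WordPress.
    """
    parts = _ESCAPE.split(slug)
    return "".join(
        part.upper() if i % 2 else part.lower()
        for i, part in enumerate(parts)
    )


encoded = "%E7%AB%A0%E5%AD%90%E6%80%A1"
print(encoded.lower())                      # naive lowercasing breaks hex case
print(lowercase_outside_escapes(encoded))   # escapes stay uppercase
print(lowercase_outside_escapes("Hello-%e7%ab%a0"))
```

Since percent-escapes are case-insensitive, both forms decode to the same bytes. The point of the sketch is only that the slug comes out in a single, consistent form.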