WordPress – non-Latin characters in URLs
If a title contains non-ASCII letters, those letters cannot be put directly into the URL, so they are percent-encoded. This is handled in sanitize_title_with_dashes() and utf8_uri_encode().
The problem is that these two functions normalize too aggressively: they also convert the hex digits of the percent-encoded bytes to lowercase.
For example, in the unit test data, "章子怡" (Zhang Ziyi, written in Chinese) is currently converted to "%e7%ab%a0%e5%ad%90%e6%80%a1"; however, it should be "%E7%AB%A0%E5%AD%90%E6%80%A1".
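To make the difference concrete, here is a minimal sketch that runs outside WordPress. It uses PHP's built-in rawurlencode(), which emits uppercase hex digits, the form RFC 3986 recommends for percent-encodings. The variable names are only illustrative, and the title is spelled out as raw UTF-8 bytes so the result does not depend on the file's encoding.

```php
<?php
// UTF-8 bytes of the three-character title from the unit test data.
$title = "\xE7\xAB\xA0\xE5\xAD\x90\xE6\x80\xA1";

// rawurlencode() percent-encodes each non-ASCII byte with uppercase hex digits.
$upper = rawurlencode($title);

// Lowercasing the result reproduces the form WordPress currently outputs.
$lower = strtolower($upper);

echo $upper, "\n"; // %E7%AB%A0%E5%AD%90%E6%80%A1
echo $lower, "\n"; // %e7%ab%a0%e5%ad%90%e6%80%a1

// Both forms decode to the same bytes; only the spelling differs.
var_dump(rawurldecode($upper) === rawurldecode($lower)); // bool(true)
```

Both spellings identify the same resource, but software that compares URLs as plain strings will treat them as different.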
I have been running a quick patch on my Japanese WordPress blog for months, and it has worked for me. The patch goes in wp-includes/formatting.php, at the end of the function sanitize_title_with_dashes():
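$title = trim($title, '-');
// Uppercase the two hex digits of every percent-encoded byte.
// Note: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0.
$title = preg_replace('/%([a-fA-F0-9]{2})/e', "'%'.strtoupper('\\1')", $title);
return $title;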
It only treats the symptom rather than the cause, though.
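The patch relies on the /e pattern modifier, which was deprecated in PHP 5.5 and removed in PHP 7.0. Here is a sketch of the same idea using preg_replace_callback(), which works on current PHP versions. The helper name uppercase_percent_octets() is hypothetical and not part of WordPress; in the patch, its body would take the place of the preg_replace() line.

```php
<?php
/**
 * Hypothetical helper (not a WordPress function): uppercase the hex digits
 * of every percent-encoded byte and leave the rest of the string untouched.
 */
function uppercase_percent_octets($title)
{
    return preg_replace_callback(
        '/%[a-fA-F0-9]{2}/',
        function ($match) {
            // $match[0] is the whole "%xx" sequence; "%" is unaffected by strtoupper().
            return strtoupper($match[0]);
        },
        $title
    );
}

echo uppercase_percent_octets('%e7%ab%a0%e5%ad%90%e6%80%a1'), "\n"; // %E7%AB%A0%E5%AD%90%E6%80%A1
echo uppercase_percent_octets('hello-world'), "\n";                 // hello-world
```

Like the original patch, this only rewrites the output after the fact. It does not change where the lowercase digits are produced.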
http://core.trac.wordpress.org/ticket/6697