WordPress – non Latin characters in URL

If a title contains non-ASCII letters, those letters cannot be put directly in the URL, so they are percent-encoded. This is handled by sanitize_title_with_dashes_original() and utf8_uri_encode().

The problem is that these two functions normalize too aggressively: they convert the hex digits of the percent-encoded sequences to lowercase.

For example, in the unit test data, "Zhang Ziyi" (written in Chinese) is currently converted to "%e7%ab%a0%e5%ad%90%e6%80%a1". However, it should be "%E7%AB%A0%E5%AD%90%E6%80%A1", since RFC 3986 recommends uppercase hex digits in percent-encodings.
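To make the difference concrete, here is a minimal sketch in plain PHP, outside WordPress. It uses PHP's built-in rawurlencode() rather than the WordPress functions named above, and the strtolower() call is only my stand-in for the lowercasing that WordPress applies, not its actual code path.

```php
<?php
// "Zhang Ziyi" in Chinese, written as code point escapes (PHP 7+) so the
// example does not depend on the encoding of this file.
$title = "\u{7AE0}\u{5B50}\u{6021}";

// rawurlencode() percent-encodes each UTF-8 byte with uppercase hex digits.
$upper = rawurlencode( $title );

// Stand-in for the lowercasing step: this yields the form quoted above.
$lower = strtolower( $upper );

echo $upper, "\n"; // %E7%AB%A0%E5%AD%90%E6%80%A1
echo $lower, "\n"; // %e7%ab%a0%e5%ad%90%e6%80%a1
```

Both strings decode to the same three characters. They differ only in the case of the hex digits.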

I have been using a quick patch on my Japanese WordPress blog for months, and it works for me. The patch goes in wp-includes/formatting.php, at the end of the function sanitize_title_with_dashes():

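$title = trim( $title, '-' );
// Uppercase the hex digits of each percent-encoded byte (e.g. %e7 becomes %E7).
// preg_replace_callback() is used because the /e modifier was removed in PHP 7.
$title = preg_replace_callback( '/%[a-fA-F0-9]{2}/', function ( $m ) { return strtoupper( $m[0] ); }, $title );
return $title;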

It only treats the symptom, though.
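To try the replacement without touching WordPress core, the same idea can be wrapped in a standalone function. This is only a sketch: uppercase_percent_escapes() is a hypothetical name of my own, not a WordPress function, and the sample slug is made up for illustration.

```php
<?php
/**
 * Hypothetical helper, not part of WordPress: uppercase the hex digits of
 * every percent-encoded byte and leave the rest of the string untouched.
 */
function uppercase_percent_escapes( $slug ) {
    return preg_replace_callback(
        '/%[a-fA-F0-9]{2}/',
        function ( $match ) {
            // $match[0] is the whole escape, e.g. "%e7".
            return strtoupper( $match[0] );
        },
        $slug
    );
}

echo uppercase_percent_escapes( 'post-%e7%ab%a0%e5%ad%90%e6%80%a1' ), "\n";
// post-%E7%AB%A0%E5%AD%90%E6%80%A1
```

Only the percent-encoded sequences change. The ASCII part of the slug ("post-") keeps its lowercase letters.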

http://core.trac.wordpress.org/ticket/6697