Oct 18

Are you working with foreign languages or non-Latin character sets in WordPress?

Are you experiencing this problem:

  • You write a post in a foreign language that uses a non-Latin script, e.g. Chinese, Korean or Japanese. When you click ‘save’ or ‘publish’, the text is re-displayed as unreadable garbage characters.

It is likely that your WordPress MySQL database was created with the latin1 character set instead of the UTF-8 character set (called utf8 in MySQL), which can correctly represent most foreign languages. Many web hosting providers set latin1 as the default for MySQL.
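To see why latin1 cannot hold this kind of text, here is a small Python sketch. It is only an illustration: Python's "latin-1" codec stands in for MySQL's latin1 character set, the sample strings are my own, and the function name is hypothetical. It is not what WordPress or MySQL runs internally.

```python
# Illustration only: Python's "latin-1" codec is used as a stand-in
# for MySQL's latin1 character set (an assumption, not MySQL itself).

def store_as_latin1(text: str) -> bytes:
    """Encode text as latin1, replacing unrepresentable characters with '?'."""
    return text.encode("latin-1", errors="replace")


# Western European text fits into latin1 without loss.
print(store_as_latin1("café"))

# Korean characters have no latin1 equivalent, so each one is lost.
print(store_as_latin1("한국어"))
```

Once the original characters have been replaced like this, there is nothing left in the stored bytes to recover them from.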

I experienced this problem a while back, when I was setting up a website for a client who’s in the business of providing accommodation to international students and working holidaymakers coming to Australia. He wanted the website to have translated versions of each WordPress page in the 8 most common foreign languages spoken by his clients – French, German, Swedish, Korean, Japanese, Spanish, Portuguese and Chinese. To enable support for multiple languages in WordPress, I used the plugin WPML – The WordPress Multilingual Plugin.

Things were going well, but I hit a snag when I started to add a Korean page – it looked fine when I pasted it into the visual editor, but as soon as I saved the page and viewed it, the text rendered as unreadable garbage. After doing some research and a lot of testing and debugging, I eventually solved the problem:

Incorrect Advice That You Should NOT Follow

Whilst searching for the solution to this problem, I found two pieces of very bad advice that had been repeated many times:

1. Do NOT comment out the DB_CHARSET constant in wp-config.php

/** Database Charset to use in creating database tables. */
define('DB_CHARSET', 'utf8');

2. In the WordPress admin menu, do NOT change “Settings->Reading->Encoding for Pages and Feeds” from UTF-8 to something else

Following the above bad advice might appear to ‘fix’ the appearance of some languages, but others will still display incorrectly. Furthermore, these changes will cause your text to be encoded incorrectly, so that when you do implement the correct fix, your foreign-language pages will have bad character mappings and be filled with incorrect characters. You will then have to correct or re-enter the text from scratch.
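The "bad character mappings" are easier to picture with a sketch. The Python below assumes one plausible mechanism: UTF-8 bytes get stored as if each byte were a separate latin1 character, and a later conversion to UTF-8 then re-encodes every one of those characters. Python's "latin-1" codec again stands in for MySQL's latin1, and the function name and sample text are hypothetical.

```python
# Illustration only, under the assumption described above.

def double_encode(text: str) -> bytes:
    """Simulate UTF-8 text misread as latin1 and then converted to UTF-8."""
    utf8_bytes = text.encode("utf-8")
    # Each byte is wrongly treated as one latin1 character.
    misread = utf8_bytes.decode("latin-1")
    # A later "correct" conversion encodes those wrong characters again.
    return misread.encode("utf-8")


original = "日本語"
print(len(original.encode("utf-8")))   # bytes in the correct UTF-8 form
print(len(double_encode(original)))    # bytes after the double encoding
print(double_encode(original) == original.encode("utf-8"))
```

The double-encoded bytes no longer match the correct UTF-8 form, which is why text saved under the bad advice shows up as wrong characters after the real fix.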

The Correct Solution

To fix this problem, you need to change the default character set and collation of your WordPress site’s MySQL database to utf8 and utf8_general_ci, and then convert all the existing table data.

The traditional way to do this is to run various “ALTER DATABASE” and “ALTER TABLE” SQL queries within phpMyAdmin or via the command-line mysql client, and you can find some great instructions here:

http://en.gentoo-wiki.com/wiki/Convert_latin1_to_UTF-8_in_MySQL
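As a rough idea of what those queries look like, here is a Python sketch that only builds the SQL text; it does not connect to or change any database. The database name "wordpress", the table names (the usual WordPress defaults with the wp_ prefix) and both function names are assumptions for illustration. The statements are not necessarily the exact ones in the linked instructions.

```python
# Sketch only: generates SQL strings, does not execute anything.
# Database and table names below are assumed, not taken from a real site.

def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_conversion_sql(database, tables,
                         charset="utf8", collation="utf8_general_ci"):
    """Return one ALTER DATABASE statement plus one ALTER TABLE per table."""
    statements = [
        f"ALTER DATABASE {quote_identifier(database)} "
        f"CHARACTER SET {charset} COLLATE {collation};"
    ]
    for table in tables:
        statements.append(
            f"ALTER TABLE {quote_identifier(table)} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


for statement in build_conversion_sql(
        "wordpress", ["wp_posts", "wp_postmeta", "wp_comments", "wp_options"]):
    print(statement)
```

You would paste the printed statements into phpMyAdmin or the mysql client yourself, after taking a backup, and you would need one ALTER TABLE line for every table in your own database.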

Unfortunately, this procedure is tedious.

There is, however, a fantastic plugin named Convert WP Database to UTF-8. This plugin adds a sub-menu page named “UTF-8 DB Converter” to the Plugins menu. Simply click on “Start converting” and the plugin will automatically execute the required SQL queries to alter the character set and collation of all your existing WordPress tables.

You only need to do it once per website, and the problem is fixed for good. You can then uninstall the plugin.

Although this plugin has always run without a hitch for me, it is good practice to make a backup of your WordPress database before attempting this procedure, just in case something goes wrong. Most web hosts provide wizards that let you easily back up your MySQL databases, and you can also do this using phpMyAdmin or a WordPress backup plugin.

Also, you should place your website in maintenance mode using a plugin like WP Maintenance Mode to stop users accessing your website while the character set conversion is in progress. The time taken will vary depending on how big your database is, but for me, the conversion has always been completed in less than 1 minute.

Once you have done this, all your foreign language text should save and display correctly.

 

2 Responses to “Fix Problems Displaying Non-Latin Character Sets in WordPress”

  1. Timo Says:

    Thank you SO MUCH for this post…
    After 3-4 hours of bad advices and frustration…

    the plug in: “Convert WP Database to UTF-8” saved my day and solved the issue!

    Just confirm… your advice in this post is also correct for Arabic language.

    so… if someone is unable to read arabic characters in wordpress blog,
    or if he/she is seeing ????? marks instead of the Arabic characters…

    all what’s needed is to Convert the WP DB to UTF-8 (using the plugin you mentioned).

    Thanks again,

    Timo

  2. Darleen Witmer Says:

    Great post and yes there is a lot of sites reporting to comment the line out in wpconfig – which of course is a bandaid,, but not a correct solution.
