Correct strange character encodings after wordpress upgrade

If after upgrading or restoring a WordPress installation you notice strange characters like “’”, ““”, —, and “” in your posts or comments then somehow your character encoding got messed up. This might have happened because you were pasting text from Microsoft Word or some other terrible program you shouldn’t be using. Supposedly WordPress previously used a latin character encoding and now uses utf-8.

To fix the stranger characters, first uninstall Microsoft Word and resolve to never use it again and then use these mysql statements to correct your content. After uninstalling Microsoft Word you’ll also probably want to install LibreOffice which is a free and open source office suite.

Remember to backup your database before running any of these sql statements.

First, update the collation of your database tables:

ALTER TABLE `wp_posts` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci
ALTER TABLE `wp_comments` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci

Next, you can run the actual commands to replace the strange characters with the right ones.

update wp_posts set post_content = replace(post_content, '’', '\'');
update wp_posts set post_content = replace(post_content, '“', '\"');
update wp_posts set post_content = replace(post_content, 'â€', '\"');
update wp_posts set post_content = replace(post_content, 'Â', ' ');

update wp_comments set comment_content = replace(comment_content, '’', '\'');
update wp_comments set comment_content = replace(comment_content, '“', '\"');
update wp_comments set comment_content = replace(comment_content, 'â€', '\"');
update wp_comments set comment_content = replace(comment_content, 'Â', ' ');

If you find any more strange characters that need replacing, just figure out what they are supposed to be and update the sql statements accordingly. One thing you need to be aware of is the order that you replace the incorrect characters. If you were to replace “—, before you replaced “’”, then you would be left with the “™” character by itself and you might accidentally replace “— with the wrong character. Therefore you should replace the longer string of characters first before replacing the shorter ones.

Some other people have written WordPress plugins to correct this problem the right way by setting the character encodings correctly in the database which would therefore correct all the characters instead of some of them. However, I couldn’t seem to get them to work in my particular installation. More information about this solution can be found here and here.

Leave a Reply

Your email address will not be published. Required fields are marked *