Removing any and all inline styles from the_content()

The question:

For one of my current projects, I had to transfer blog posts from an old WordPress site to my project.

Things went smoothly until I saw that all the posts had been copy-pasted from Word, leaving this before pretty much every paragraph:

<span style="font-size: medium; font-family: georgia,palatino;">

And at some places things like these:

<p style="text-align: justify;">
<p style="text-align: justify;"><span style="font-size: medium; font-family: georgia,palatino;"><strong><span style="color: #000000;">

Because I don't have the 40 hours (much less the patience) to go into every post (there are about 100) and remove those unwanted tags, I'm looking for a filter that would remove all style attributes (except maybe those containing text-decoration: underline) before outputting the_content().

Is there such a thing?

The Solutions:

Below are the methods you can try. The first solution is probably the best; try the others if it doesn't work. Read each method carefully and adapt it to your case rather than copying and pasting it as is.

Method 1

To remove all inline styles, add the following code to functions.php.

add_filter( 'the_content', function ( $content ) {
    // Remove every inline style attribute, whether single- or double-quoted.
    // \1 requires the closing quote to match the opening one.
    $content = preg_replace( '/\sstyle=(["\'])(.*?)\1/is', '', $content );
    return $content;
}, 20 );
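
The question also asks to keep styles that contain text-decoration: underline, which a blanket replacement does not do. A minimal sketch of one way to handle that exception, using preg_replace_callback; the function name myprefix_strip_styles_keep_underline is hypothetical, and the choice to reduce a kept attribute to the underline declaration alone is an assumption about what the asker wants:

```php
/**
 * Strip inline style attributes, keeping only an underline declaration.
 * Hypothetical helper name; it is not part of WordPress.
 */
function myprefix_strip_styles_keep_underline( $content ) {
    return preg_replace_callback(
        // Match a style attribute quoted with either single or double quotes.
        '/\sstyle=(["\'])(.*?)\1/is',
        function ( $m ) {
            // $m[2] is the attribute value. Keep only the underline declaration.
            if ( preg_match( '/text-decoration\s*:\s*underline/i', $m[2] ) ) {
                return ' style="text-decoration: underline;"';
            }
            return ''; // No underline: drop the whole attribute.
        },
        $content
    );
}

// Register only when WordPress is loaded, so the function can be tried on its own.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'the_content', 'myprefix_strip_styles_keep_underline', 20 );
}
```

Because this runs on the_content, the stored posts are left untouched; only the rendered output changes.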

Method 2

Just add this to your functions.php.

Note: This filter runs when the post is saved or updated.

add_filter( 'wp_insert_post_data', 'filter_post_data', 99, 2 );

function filter_post_data( $data, $postarr ) {

    $content = $data['post_content'];

    // Rebuild each listed tag without its attributes; $1 is the tag's inner content.
    // Note that this drops every attribute on these tags, not only style.
    $content = preg_replace( '#<p\b[^>]*>(.*?)</p>#is', '<p>$1</p>', $content );
    $content = preg_replace( '#<span\b[^>]*>(.*?)</span>#is', '<span>$1</span>', $content );
    $content = preg_replace( '#<ol\b[^>]*>(.*?)</ol>#is', '<ol>$1</ol>', $content );
    $content = preg_replace( '#<ul\b[^>]*>(.*?)</ul>#is', '<ul>$1</ul>', $content );
    $content = preg_replace( '#<li\b[^>]*>(.*?)</li>#is', '<li>$1</li>', $content );

    $data['post_content'] = $content;

    return $data;
}

Note: This filter runs when the_content() is executed.

add_filter( 'the_content', 'the_content_filter', 20 );

function the_content_filter( $content ) {
    // Rebuild each listed tag without its attributes; $1 is the tag's inner content.
    $content = preg_replace( '#<p\b[^>]*>(.*?)</p>#is', '<p>$1</p>', $content );
    $content = preg_replace( '#<span\b[^>]*>(.*?)</span>#is', '<span>$1</span>', $content );
    $content = preg_replace( '#<ol\b[^>]*>(.*?)</ol>#is', '<ol>$1</ol>', $content );
    $content = preg_replace( '#<ul\b[^>]*>(.*?)</ul>#is', '<ul>$1</ul>', $content );
    $content = preg_replace( '#<li\b[^>]*>(.*?)</li>#is', '<li>$1</li>', $content );
    return $content;
}
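
The patterns above pair each opening tag with the next closing tag of the same name, so with nested tags such as the question's <span ...><strong><span ...>, the inner tag can keep its attributes. A possible variant, sketched here with the hypothetical name myprefix_strip_tag_attributes, rewrites only the opening tags and therefore does not depend on nesting. It assumes no attribute value contains a literal > character:

```php
/**
 * Remove all attributes from opening <p>, <span>, <ol>, <ul> and <li> tags.
 * Hypothetical helper name; it is not part of WordPress.
 */
function myprefix_strip_tag_attributes( $content ) {
    // \b stops "<p" from also matching "<pre"; $1 re-emits the tag name as written.
    return preg_replace( '#<(p|span|ol|ul|li)\b[^>]*>#i', '<$1>', $content );
}

// Register only when WordPress is loaded, so the function can be tried on its own.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'the_content', 'myprefix_strip_tag_attributes', 20 );
}
```

Like Method 2, this removes every attribute on the listed tags (class and id included), not just style.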

Method 3

I tried the save/update method above, but it didn't work for me, so I took another approach. I exported the whole wp_posts table, opened it in Sublime, and did a regex replace. I used style="*.*?" to find all cases and replaced them with nothing. Then I dropped the old table's contents and imported the new one.

If anyone tries this method, please make sure you have a clean backup in case there are other post types in the wp_posts table and things get a bit messy.

Method 4

I would check out the content_save_pre filter, and probably apply some fancy regex at that point.
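
A rough sketch of what that could look like. The function names are hypothetical, and it rests on the assumption that content reaches content_save_pre in slashed form (as core's own kses filter on this hook expects), so the callback unslashes, cleans, and re-slashes:

```php
/**
 * Remove style="..." and style='...' attributes from an HTML string.
 * Hypothetical helper name; it is not part of WordPress.
 */
function myprefix_remove_inline_styles( $content ) {
    return preg_replace( '/\sstyle=(["\'])(.*?)\1/is', '', $content );
}

/**
 * Hypothetical callback for the content_save_pre filter.
 * Assumption: $content is slashed here, so unslash it before matching quotes.
 */
function myprefix_content_save_pre( $content ) {
    return wp_slash( myprefix_remove_inline_styles( wp_unslash( $content ) ) );
}

// Register only when WordPress is loaded, so the helper can be tried on its own.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'content_save_pre', 'myprefix_content_save_pre' );
}
```

A filter at this point only cleans a post when it is saved, so the existing posts would each need to be re-saved before their stored content changes.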


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
