Ask your WordPress questions! Pay money and get answers fast! Comodo Trusted Site Seal
Official PayPal Seal

mySQL query: remove first IMG tag and surrounding link from DB WordPress

  • SOLVED

I have a series of posts held in wp_posts, several of which have a big image at the beginning.

I've used a plugin to pull this image and set it as a 'featured image' inside WP, but now I need to search through the post_content field and remove the first image <strong>and</strong> the containing link inside the post.

<em>Not all posts have images</em>, but if they do, they'll take one of the two following formats:

<a href="http://www.mysite.com/mycategoryname/postname"><img class="spotlight" src="http://www.mysite.com/pix/spotlight/picturename.jpg" alt="" /></a>

or

<img src="http://www.mysite.com/pix/spotlight/picturename.jpg" class="spotlight">

The images are a mix of jpg/gif/png, and these code blocks are always right at the very start of the field.

It's only a single-run operation, so doesn't have to be hugely clean, or all be contained in one SQL query - just needs to get the job done. Ideally it would remove any whitespace from the top of the field as well, but that's not essential.

Answers (2)

2011-04-19

Maor Barazany answers:

You should export your database to a .sql dump file, than make a copy of it to backup.
Then you should write a small function to <strong>file_get_contents</strong> of that sql file, and run on it a <strong>preg_replace</strong> that will find the pattern you mentioned and replace it with null.
Then, import the new database file again to MySQL.


Jon comments:

I have the whole DB exported, dropped into Sequel (running on MAMP in any event) - I just need an idea of what SQL query to run, or alternatively, what to drop into the functions file - that's where I'm completely stuck!


Maor Barazany comments:

You cannot do it with SQL query.
Create a php file and load that file.
You should be familiar with regular expressions to write pattern that will find the pattern you specified above and replace it.

Something like
<?php
$orig_file = file_get_contents('/path/to/filename.sql');
$pattern = '/regex_pattern_here/';
$replace_with = '';
$replace = preg_replace($pattern, $replace_with, $orig_file);
file_put_contents('/path/to/new_file.sql', $replace);
?>


Then you will have a new file created with name new_file.sql that contains the replacements and you will import it to the database.

If you need to try and find the regular expression pattern, it will take sometime, and cannot be done for 5$.

2011-04-19

Luke America answers:

First, Maor Barazany is correct 100%. Here's a mod of his snippet template with the pattern replacement code. As in your examples, it relies on images being in this path: "pix/spotlight/".

This should work quickly and flawlessly. As for compensation, I'd prefer you pay Maor the full 5 bucks, then "tip" me an amount that you consider fair.


<?php
// 1: get the SQL content from the file
$orig_file = file_get_contents('/path/to/filename.sql');

// 2: link/image combo replacement
$pattern = '/<a (.+?)><img (.+?)pix\/spotlight(.+?)<\/a>/i';
$replace_with = '';
$replace = preg_replace($pattern, $replace_with, $orig_file);

// 3: image only replacement
$pattern = '/<img (.+?)pix\/spotlight(.+?)\/>/i';
$replace_with = '';
$replace = preg_replace($pattern, $replace_with, $replace);

// 4: write the updated SQL content to a file
file_put_contents('/path/to/new_file.sql', $replace);
?>


Jon comments:

Cheers Luke - just to check 2 things:

* This will only strip the first image (there are some posts with images further down the post, which I don't want to strip out)
* This will work if the /img directory is defined as a relative path (still contains /pix/spotlight, but not mysite.com)

I'll give this a go once I hear back from you and tip a bit more once it's working.

Thanks again!


Luke America comments:

Jon, this will strip all images AND image-link-combos that are in the path "pix/spotlight/" ... as in your two examples.

It doesn't have to be a relative path ... it just has to contain "pix/spotlight/" in the path to the image (the link's path in inconsequential) ... only the image path matters. Also, the "mysite.com" portion is inconsequential ... only "pix/spotlight/" for the image location is taken into account.

If you have other images stored in a path that includes "pix/spotlight/" they will also be stripped. In fact, if that is the case, neither his solution nor my pattern replacement mod will work. Moreover, if you do have other images stored in that specific folder, there is no automated code method to remove the links/images.

I assumed (as did Maor, I'm sure) that you use the "pix/spotlight/" folder specifically for these spotlight images to be stored.

Also, when you're done and you've tested your site, you may want to delete those image files.



Jon comments:

OK, getting there...all of this is great, and I don't think the additional images are a problem as, as you say, they're not 'spotlights' - only problem is that the output SQL file is blank here? Am I missing anything obvious?

Happy to tip an extra, say, $15 to get this up and running?


Luke America comments:

My guess is that your server path is incorrect ... thus, the original file is never actually accessed ... nothing from nothing leaves nothing.

You might also try setting the file permissions on the folder and blank output file to 777 (with your FTP program) ... temporarily.

If these two don't solve it, you can email me a ZIP of the SQL file ... I'll run the filter routine on one of my servers ... and send it back to you.

[email protected]


Luke America comments:

Also, you might try putting the PHP file in the same folder as the SQL file. This way you avoid possible path issues.

OR ... let's revise the steps and the filter function a little. The last two steps can be handled differently to your preference.

1. create a folder named "test"
2. set the permissions for "test" folder to 777 (may not be needed)
3. create a blank file inside of this folder named "db_filter.php"
4. put the code below into "db_filter.php"
5. copy your SQL file to the same folder and name it "db_old.sql"
6. open "http://mysite.com/test/db_filter.php" in your browser
7. It will now display progress messages...
8. refresh your FTP file list (usually F5) to view the newly created file
9. When done, open "db_new.sql" to make sure it has content
10. If not, let me know (stop here)
11. If so, create a new database:
a. use the same password as the original database
b. import "db_new.sql" into the new database
12. now, update wp-config.php with the new database name

I've done a test run and this works correctly ... and quickly.


<?php

// 1: get the SQL content from the file
$orig_file = file_get_contents('db_old.sql');
if (strlen($orig_file) == 0)
{
echo 'db_old.sql file is empty!';
exit;
}
else
{
echo 'db_old.sql read successful.<br/><br/>';
}


// 2: link/image combo replacement
$pattern = '/<a (.+?)><img (.+?)pix\/spotlight(.+?)<\/a>/i';
$replace_with = ' '; // that's a blank space
$replace = preg_replace($pattern, $replace_with, $orig_file);
echo 'link/image replacements completed.<br/><br/>';


// 3: image-only replacement
$pattern = '/<img (.+?)pix\/spotlight(.+?)>/i';
$replace_with = ' '; // that's a blank space
$replace = preg_replace($pattern, $replace_with, $replace);
echo 'image-only replacements completed.<br/><br/>';


// 4: remove double spaces
$replace = str_replace(' ', ' ', $replace); // two blank spaces and an empty string
echo 'clean up completed.<br/><br/>';


// 5: write the updated SQL content to a file
file_put_contents('db_new.sql', $replace);
echo 'open db_new.sql to view the content.<br/><br/>';

?>


Jon comments:

Still seems to be outputting a blank file, even though all of these pop up successfully:

db_old.sql read successful.
link/image replacements completed.
image-only replacements completed.
clean up completed.
open db_new.sql to view the content.


I could send this through, but the problem is that once I work out what's going on here and have the site set up, I'm going to need to go through the whole process again the latest version of the content prior to launch. I guess if we can work out what's causing it, I can probably fix and repeat when I need to?


Luke America comments:

Did you do a CHMOD on the "test" folder to 777? Also, try 755.

After any failed attempt, try deleting the empty file, db_new.sql.

BTW, I've now run the code on three servers and it's worked correctly in each scenario.






Jon comments:

Just tried those again, but no luck. I've also tried on a couple of servers so I guess there might be something in the DB that is causing this to go wrong. I'm away until Monday, but I'll email you through codesnippets when I get back if that's OK with you? I'm happy to pay for you to give me a hand and think we may have exhausted all we can do here.


Luke America comments:

Sure; you can contact me when you get back. At this point, I can't offer any explanation as to why you get a resultant empty file. Maybe you could send me the first half of the SQL file to see what I encounter.