Use DOM To Remove HTML Tag

It is better to parse HTML content with the DOM than with regular expressions. For example, suppose we have content like this.

  <p><img class="aligncenter size-full wp-image-3172" src="//wordpress.org/news/files/2014/04/theme1.jpg" alt="theme" width="1003" height="558">
  <p class="wp-caption-text">Wordpress Theme</p><br>
  Looking for a new theme should be easy and fun. Lose yourself in the boundless supply of free WordPress.org themes with the beautiful new theme browser.</p>

We want to strip out the p tag with class="wp-caption-text" along with its content ("Wordpress Theme"). The tag sits right below the image. I wrote a function to accomplish the task. Here is the code.

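  <?php
  // Remove every <$tag> element, together with its content.
  // $attribute is an optional XPath attribute test without the leading "@",
  // e.g. "class='wp-caption-text'". Pass "" to match the tag alone.
  function remove_content_in_tag($content, $tag, $attribute) {
      $doc = new DOMDocument();
      // Collect parser warnings about imperfect HTML instead of printing them.
      libxml_use_internal_errors(true);
      $doc->loadHTML($content);
      libxml_clear_errors();

      // Build the XPath query once: "//p" or "//p[@class='wp-caption-text']".
      $query = "//" . $tag;
      if ($attribute != "") {
          $query .= "[@" . $attribute . "]";
      }

      $xpath = new DOMXPath($doc);
      $nlist = $xpath->query($query);
      for ($i = 0; $i < $nlist->length; $i++) {
          $node = $nlist->item($i);
          $node->parentNode->removeChild($node);
      }

      // saveHTML() serializes the whole document, including the
      // html/body wrapper that loadHTML() adds around a fragment.
      return $doc->saveHTML();
  }
  ?>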
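
One detail about the loop above: it removes nodes while walking the list by index. That appears to be safe here because DOMXPath::query() returns a static list, whereas DOMDocument::getElementsByTagName() returns a live list that shrinks as nodes are removed. The sketch below illustrates the difference. The two helper functions are hypothetical names made up for this demonstration and are not part of the original function.

```php
<?php
// Hypothetical demo helper: remove <p> elements through a LIVE list.
// The list re-indexes after each removal, so every other node is skipped.
function count_after_live_removal(string $html): int
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $live = $doc->getElementsByTagName('p');
    for ($i = 0; $i < $live->length; $i++) {
        $node = $live->item($i);
        $node->parentNode->removeChild($node);
    }
    // Number of <p> elements still in the document.
    return $doc->getElementsByTagName('p')->length;
}

// Hypothetical demo helper: remove <p> elements through a STATIC XPath result.
// The list keeps its original contents, so every node is visited.
function count_after_xpath_removal(string $html): int
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    $static = $xpath->query('//p');
    for ($i = 0; $i < $static->length; $i++) {
        $node = $static->item($i);
        $node->parentNode->removeChild($node);
    }
    // Run a fresh query to count what is left.
    return $xpath->query('//p')->length;
}

$html = '<p>a</p><p>b</p><p>c</p><p>d</p>';
echo count_after_live_removal($html), "\n";  // 2 paragraphs survive
echo count_after_xpath_removal($html), "\n"; // 0 paragraphs survive
```

If you ever switch the function to getElementsByTagName(), iterating backwards or copying the nodes into an array first avoids the skipped elements.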

To call the function and remove the tags, use it like this.

  <?php
  echo remove_content_in_tag($content, "p", "class='wp-caption-text'");
  ?>
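
The attribute test class='wp-caption-text' only matches when the class attribute is exactly that string. An element such as <p class="wp-caption-text aligncenter"> would be skipped. A common XPath idiom for matching a single class token is sketched below. The helper xpath_for_class is a hypothetical name introduced for this example, and it assumes the class name contains no quote characters.

```php
<?php
// Hypothetical helper (not part of the original function): build an XPath
// query that matches $tag elements whose class list contains $class as a
// whole token. Assumes $class contains no quote characters.
function xpath_for_class(string $tag, string $class): string
{
    return "//" . $tag
        . "[contains(concat(' ', normalize-space(@class), ' '), ' " . $class . " ')]";
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<p class="wp-caption-text aligncenter">Caption</p><p class="intro">Body</p>');
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Exact comparison against the whole attribute value.
$exact = $xpath->query("//p[@class='wp-caption-text']");
// Token comparison against the space-separated class list.
$token = $xpath->query(xpath_for_class('p', 'wp-caption-text'));

echo $exact->length, "\n"; // 0: the attribute holds two classes
echo $token->length, "\n"; // 1: the token is found among them
```

Padding both sides with spaces keeps a class such as wp-caption-text-large from matching by accident.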

Here is the complete code.

  <?php
  $content = <<<EOF
  <p><img class="aligncenter size-full wp-image-3172" src="//wordpress.org/news/files/2014/04/theme1.jpg" alt="theme" width="1003" height="558"><p class="wp-caption-text">Wordpress Theme</p><br>
  Looking for a new theme should be easy and fun. Lose yourself in the boundless supply of free WordPress.org themes with the beautiful new theme browser.</p>
  EOF;

  //echo $content;

  // Remove only the caption paragraph.
  echo remove_content_in_tag($content, "p", "class='wp-caption-text'");
  // Other examples: remove every <p>, or every <img>.
  //echo remove_content_in_tag($content, "p", "");
  //echo remove_content_in_tag($content, "img", "");

  // Remove every <$tag> element, together with its content.
  // $attribute is an optional XPath attribute test without the leading "@".
  function remove_content_in_tag($content, $tag, $attribute) {
      $doc = new DOMDocument();
      // Collect parser warnings about imperfect HTML instead of printing them.
      libxml_use_internal_errors(true);
      $doc->loadHTML($content);
      libxml_clear_errors();

      // Build the XPath query once: "//p" or "//p[@class='wp-caption-text']".
      $query = "//" . $tag;
      if ($attribute != "") {
          $query .= "[@" . $attribute . "]";
      }

      $xpath = new DOMXPath($doc);
      $nlist = $xpath->query($query);
      for ($i = 0; $i < $nlist->length; $i++) {
          $node = $nlist->item($i);
          $node->parentNode->removeChild($node);
      }

      // saveHTML() serializes the whole document.
      return $doc->saveHTML();
  }
  ?>
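
Note that saveHTML() called without arguments serializes the whole document, so the output of the code above typically comes wrapped in a DOCTYPE plus html and body tags that were not in the input. If only the cleaned fragment is wanted, one possible approach (a sketch, not the method used above) is to wrap the input in a container element and serialize just that container's children. The function name remove_tag_from_fragment and the id fragment-root are made up for this example.

```php
<?php
// Hypothetical variant: remove the nodes matched by $query and return only
// the cleaned fragment, without the DOCTYPE/html/body wrapper.
function remove_tag_from_fragment(string $html, string $query): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a known container so it can be found again later.
    $doc->loadHTML('<div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query($query) as $node) {
        $node->parentNode->removeChild($node);
    }

    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    if ($root === null) {
        return '';
    }

    // Serialize each child of the container, leaving the container itself out.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$content = '<p><img src="//wordpress.org/news/files/2014/04/theme1.jpg" alt="theme">'
    . '<p class="wp-caption-text">Wordpress Theme</p><br>'
    . 'Looking for a new theme should be easy and fun.</p>';

echo remove_tag_from_fragment($content, "//p[@class='wp-caption-text']");
```

The parser may still rearrange invalid markup, such as the nested p tags in this sample, so the result is the cleaned fragment as the parser understood it rather than a character-for-character copy of the input.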
