
Regex to replace line breaks but with exclusions for HTML lists? WordPress

  • SOLVED

Currently I use the following regex to replace line endings with <br/> tags:

preg_replace('/(\015\012)|(\015)|(\012)/','<br/>',$text)

I have now discovered an issue when the text includes an ordered or unordered HTML list:

text text
<ul>
<li>blabla</li>
<li>bla
bla</li>
</ul>
text text


which looks like this after the regex is run:

text text<br/>
<ul><br/>
<li>blabla</li><br/>
<li>bla<br/>
bla</li><br/>
</ul><br/>
text text


This leads to problems with tinyMCE: the extra <br/> tags within the HTML list cause another unordered list to be added within the list, which breaks the original list. See the following HTML output in tinyMCE after the regex is applied:

<ul>
<ul>
<li>blabla</li>
</ul>
</ul>
&nbsp;
<ul>
<li>bla
bla</li>
</ul>
&nbsp;

&nbsp;
text text


So I would need someone to adapt the regex so that it
1. replaces all new lines like the original regex,
2. BUT ignores new lines directly after <ul>, <ol> and </li> tags, and
3. DOES NOT ignore new lines between <li> and </li>

so the result should look like this:

text text<br/>
<ul>
<li>blabla</li>
<li>bla<br/>
bla</li>
</ul><br/>
text text


As I am not sure whether the above will work with tinyMCE as desired, I would also need an alternative regex which
1. replaces all new lines like the original regex
2. BUT ignores new lines between <ul> and </ul> tags as well as <ol> and </ol> tags

so that the result looks like this:

text text<br/>
<ul>
<li>blabla</li>
<li>bla
bla</li>
</ul><br/>
text text


Thanks!
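
For the alternative variant (leave everything between <ul>/</ul> and <ol>/</ol> untouched), one possible sketch is to match whole list blocks and bare line breaks in a single pattern and decide per match in a callback. The function name br_outside_lists is made up for illustration, and the pattern assumes that lists of the same type are not nested inside each other, since the lazy match stops at the first closing tag.

```php
<?php
// Hypothetical helper: replace line breaks with <br/> except inside <ul>/<ol> blocks.
// Assumption: no <ul> nested in another <ul> (or <ol> in another <ol>); the lazy .*?
// would stop at the first closing tag of the same type.
function br_outside_lists(string $text): string
{
    return preg_replace_callback(
        // Either a complete list block, or a single line break (CRLF, CR or LF).
        '#<(ul|ol)\b[^>]*>.*?</\1>|\r\n|\r|\n#si',
        function (array $m): string {
            // A match starting with "<" is a list block: keep it unchanged.
            return $m[0][0] === '<' ? $m[0] : '<br/>';
        },
        $text
    );
}

$in = "text text\n<ul>\n<li>blabla</li>\n<li>bla\nbla</li>\n</ul>\ntext text";
echo br_outside_lists($in);
```

With the sample from the question, the line breaks inside the list survive and only the two outside it become <br/>.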

Answers (1)

2014-05-16

Dbranes answers:

Hi, you could try this


$text = preg_replace( '/(\015\012)|(\015)|(\012)/','<br />', $text );
//$text = nl2br( $text );

$from = array(
'#<ul>(\s)*(<br\s*/?>)*(\s)*<li>#si',
'#</li>(\s)*(<br\s*/?>)*(\s)*<li>#si',
'#</li>(\s)*(<br\s*/?>)*(\s)*</ul>#si',
'#<ol>(\s)*(<br\s*/?>)*(\s)*</li>#si',
'#</li>(\s)*(<br\s*/?>)*(\s)*</ol>#si',
);
$to = array(
'<ul><li>',
'</li><li>',
'</li></ul>',
'<ol><li>',
'</li></ol>'
);

$text = preg_replace( $from, $to ,$text);


Then we could adjust it further.


rmaxwell comments:

Using nl2br() is a good approach, but for the second part I would need a more generic approach.


rmaxwell comments:

oh, I think I answered too quickly - could work. will test and let you know the results


Dbranes comments:

It was probably too greedy, so I adjusted it a little bit (please see the updated answer).

This seems to work for the few lists I tested.


rmaxwell comments:

Looks quite good so far. Is there a reason why you chose not to use nl2br()? Shouldn't that actually do the same as my regex, and perhaps even be a bit faster and more compatible?
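
On the nl2br() question: as far as the PHP manual describes it, nl2br() inserts a <br /> before each line break but keeps the line break itself, whereas the regex from the question replaces it. A small comparison on made-up sample input:

```php
<?php
$sample = "one\r\ntwo\nthree";

// The regex from the question: each line break is replaced, so none survive.
$viaRegex = preg_replace('/(\015\012)|(\015)|(\012)/', '<br/>', $sample);

// nl2br(): a <br /> is inserted before each line break, and the break is kept.
$viaNl2br = nl2br($sample);

var_dump($viaRegex === "one<br/>two<br/>three");          // bool(true)
var_dump($viaNl2br === "one<br />\r\ntwo<br />\nthree");  // bool(true)
```

Since the line breaks remain after nl2br(), any follow-up pattern has to tolerate whitespace around the <br /> tags, which the \s* parts of the list patterns in this thread do.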


rmaxwell comments:

Well, it seems like we're nearly there with that code. There's just one issue left:

As an HTML list has the default style display:block; the following text automatically starts on a new line (if there is no float). So adding a <br/> after the </ul> is not needed, as this adds extra blank space to the output that can't be removed. So the actual output should not be

text text<br/>
<ul>
<li>blabla</li>
<li>bla<br/>
bla</li>
</ul><br/>
text text


but

text text<br/>
<ul>
<li>blabla</li>
<li>bla<br/>
bla</li>
</ul>
text text


Could you please change your regex so that it also removes the first line break after a </ul> or </ol> tag?


Dbranes comments:

ok great, I was just testing your code instead of nl2br.

You could try something like:

$from = array(
'#<ul>\s*(<br\s*/?>)*\s*<li>#si',
'#</li>\s*(<br\s*/?>)*\s*<li>#si',
'#</li>\s*(<br\s*/?>)*\s*</ul>#si',
'#<ol>\s*(<br\s*/?>)*\s*</li>#si',
'#</li>\s*(<br\s*/?>)*\s*</ol>#si',
'#</ul>\s*(<br\s*/?>){1}#si',
'#</ol>\s*(<br\s*/?>){1}#si',
);
$to = array(
'<ul><li>',
'</li><li>',
'</li></ul>',
'<ol><li>',
'</li></ol>',
'</ul>',
'</ol>'
);


where I use {1} to match a single instance of <br />.
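
To illustrate the difference: {1} matches exactly one occurrence (so it behaves the same as writing no quantifier at all), while * would strip every <br /> that follows. A quick check on a made-up input with two consecutive breaks:

```php
<?php
$html = '</ul><br /><br />text text';

// {1}: only the first <br /> after </ul> is removed.
$one = preg_replace('#</ul>\s*(<br\s*/?>){1}#si', '</ul>', $html);

// *: all consecutive <br /> tags after </ul> are removed.
$all = preg_replace('#</ul>\s*(<br\s*/?>)*#si', '</ul>', $html);

echo $one, "\n"; // </ul><br />text text
echo $all, "\n"; // </ul>text text
```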


rmaxwell comments:

Thanks! I also removed the <br/> before <ol> and <ul> and fixed one issue with <ol><br/><li> from your code, and now everything works :-) Thanks a lot for your help!!!

For the record, the patterns used:

$from = array(
'#<ul>(\s)*(<br\s*/?>)*(\s)*<li>#si',
'#</li>(\s)*(<br\s*/?>)*(\s)*<li>#si',
'#</li>(\s)*(<br\s*/?>)*(\s)*</ul>#si',
'#<ol>(\s)*(<br\s*/?>)*(\s)*<li>#si',
'#</li>(\s)*(<br\s*/?>)*(\s)*</ol>#si',
'#(<br\s*/?>){1}\s*<ul>#si',
'#(<br\s*/?>){1}\s*<ol>#si',
'#</ul>\s*(<br\s*/?>){1}#si',
'#</ol>\s*(<br\s*/?>){1}#si',
);
$to = array(
'<ul><li>',
'</li><li>',
'</li></ul>',
'<ol><li>',
'</li></ol>',
'<ul>',
'<ol>',
'</ul>',
'</ol>'
);
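
Putting the pieces of this thread together, a minimal sketch of the whole conversion could look like the following. The wrapper name br_with_lists is hypothetical. The patterns are the ones posted above, so list tags carrying attributes (for example a class) would not be matched.

```php
<?php
// Hypothetical wrapper around the patterns posted in this thread.
function br_with_lists(string $text): string
{
    // Step 1: turn every line break (CRLF, CR or LF) into <br />.
    $text = preg_replace('/(\015\012)|(\015)|(\012)/', '<br />', $text);

    // Step 2: strip the <br /> tags that ended up around list tags.
    $from = array(
        '#<ul>(\s)*(<br\s*/?>)*(\s)*<li>#si',
        '#</li>(\s)*(<br\s*/?>)*(\s)*<li>#si',
        '#</li>(\s)*(<br\s*/?>)*(\s)*</ul>#si',
        '#<ol>(\s)*(<br\s*/?>)*(\s)*<li>#si',
        '#</li>(\s)*(<br\s*/?>)*(\s)*</ol>#si',
        '#(<br\s*/?>){1}\s*<ul>#si',
        '#(<br\s*/?>){1}\s*<ol>#si',
        '#</ul>\s*(<br\s*/?>){1}#si',
        '#</ol>\s*(<br\s*/?>){1}#si',
    );
    $to = array(
        '<ul><li>', '</li><li>', '</li></ul>',
        '<ol><li>', '</li></ol>',
        '<ul>', '<ol>', '</ul>', '</ol>',
    );
    // The pairs are applied one after another, in array order.
    return preg_replace($from, $to, $text);
}

echo br_with_lists("text text\n<ul>\n<li>blabla</li>\n<li>bla\nbla</li>\n</ul>\ntext text");
// text text<ul><li>blabla</li><li>bla<br />bla</li></ul>text text
```

Only the line break inside the second list item is kept as a <br />, and the ones around the list tags are gone.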