If your web application accepts content or input from your users, it’s nice to be able to display it back to them in a useful format. For example, some websites auto-link, converting any text that looks like a URL into a hyperlink, to improve the user experience. The user may type a plain URL into a form.
On display, our auto-linking script then wraps that URL in an anchor tag so it becomes a clickable link.
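As a rough illustration of the auto-linking step, here is a minimal sketch. The function name autolink_naive and its pattern are assumptions made for this example, not the script we actually use, and it presumes the text has already been escaped for output.

```php
<?php
// Hypothetical helper: wrap every bare http(s):// or ftp:// URL in an anchor tag.
// It makes no attempt to detect URLs that already sit inside a tag.
function autolink_naive(string $text): string
{
    return preg_replace(
        '~\b(?:https?|ftp)://[^\s<>"]+~', // scheme, then everything up to whitespace, a quote, or an angle bracket
        '<a href="$0">$0</a>',            // $0 is the whole matched URL
        $text
    );
}

echo autolink_naive('Visit http://particletree.com today');
```

Run against that sample sentence, the URL comes back wrapped in a link whose href and link text are both the URL itself.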
It’s also nice to give more web-savvy users the ability to use certain HTML tags (like em) that are left unescaped when the content is displayed back to them. We have taken the approach provided by Chris Shiflett to allow HTML and prevent XSS. When the user enters markup into a field, all of it is first escaped to prevent any XSS attacks.
The escaped text is then run through an HTML sanitizing script that restores only certain safe tags so they display properly.
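The escape-then-restore idea can be sketched roughly as follows. This is a simplified illustration in the spirit of that approach, not the exact code from either source: the function name sanitize_html and the default tag list are assumptions, and it only handles tags without attributes.

```php
<?php
// Hypothetical helper: escape everything first, then restore a short
// whitelist of attribute-less tags.
function sanitize_html(string $input, array $allowed = ['em', 'strong']): string
{
    // Step 1: neutralise all markup, including quotes.
    $escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Step 2: turn the escaped form of each allowed tag back into real HTML.
    foreach ($allowed as $tag) {
        $escaped = str_replace(
            ["&lt;{$tag}&gt;", "&lt;/{$tag}&gt;"],
            ["<{$tag}>", "</{$tag}>"],
            $escaped
        );
    }
    return $escaped;
}

echo sanitize_html('<em>hi</em><script>alert(1)</script>');
```

Because only the exact escaped strings are restored, something like an em tag carrying an onclick attribute stays escaped. Allowing tags with attributes, such as links, needs stricter matching than this sketch provides.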
When used in combination (and in a way that prevents security breaches), auto-linking and allowing approved HTML tags can lead to some unexpected formatting. The problem with using the two techniques together is that the auto-linking script has to be smart enough not to link anything inside of an a tag. For example, the following input, which has a URL inside of a link,
<a href="http://particletree.com">Particletree</a>
would otherwise be converted to the undesirable:
<a href="<a href="http://particletree.com">http://particletree.com</a>">Particletree</a>
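None of the PHP auto-linking scripts that we found accounted for this, so we added a regex negative lookbehind, (?<!="), as a solution. It skips any URL that is immediately preceded by =", which is how a URL appears inside an href attribute.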
$text = preg_replace("'(?<!=\")(http|ftp)://([\w\+\-\@\=\?\.\%\/\:\&\;~\|]+)(\.)?'", "<a title=\"Go to \\1://\\2\" href=\"\\1://\\2\">\\1://\\2</a>", $text);
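To see the lookbehind at work, the replacement can be wrapped in a small function and run against both kinds of input. The function name autolink is a hypothetical wrapper added for this example. The pattern and replacement are the ones shown above.

```php
<?php
// Hypothetical wrapper around the one-line replacement shown above.
function autolink(string $text): string
{
    return preg_replace(
        "'(?<!=\")(http|ftp)://([\w\+\-\@\=\?\.\%\/\:\&\;~\|]+)(\.)?'",
        "<a title=\"Go to \\1://\\2\" href=\"\\1://\\2\">\\1://\\2</a>",
        $text
    );
}

// A bare URL is preceded by a space, so the lookbehind allows the match.
echo autolink('See http://particletree.com now'), "\n";

// Here the URL is preceded by =", so the lookbehind rejects it
// and the existing link is left alone.
echo autolink('<a href="http://particletree.com">Particletree</a>'), "\n";
```

One limitation to keep in mind: the lookbehind only protects URLs that directly follow =". A URL used as the visible text of an existing link is not preceded by those characters, so it would still be wrapped a second time.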
Hope that helps others looking for a similar solution.