If your web application accepts content or input from your users, it’s nice to be able to display it back to them in a useful format. For example, some web sites auto-link, converting any text that looks like a URL into a hyperlink, to improve the user experience. A user may type the following URL into a form:

http://particletree.com

On display, our auto-linking script would then convert that to:

<a href="http://particletree.com">http://particletree.com</a>
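
A minimal sketch of that conversion, for illustration only. The function name `naive_autolink` and its pattern are assumptions, not the script used on this site.

```php
<?php
// Hypothetical helper: wrap every http:// or ftp:// URL in an anchor tag.
// It makes no attempt to detect URLs that are already inside HTML markup.
function naive_autolink(string $text): string
{
    return preg_replace(
        '~\b(?:http|ftp)://[^\s<>"]+~', // a URL runs until whitespace, <, > or "
        '<a href="$0">$0</a>',          // $0 is the whole matched URL
        $text
    );
}

echo naive_autolink('http://particletree.com'), "\n";
// <a href="http://particletree.com">http://particletree.com</a>
```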

It’s also nice to give more web-savvy users the ability to use certain HTML tags (like a, strong, em) and have them rendered unescaped when the content is displayed back to them. We have taken the approach described by Chris Shiflett to allow HTML and prevent XSS. When the user enters the following into a field:

<a href="http://particletree.com">Particletree</a>

It will be escaped to prevent any XSS attacks.

&lt;a href=&quot;http://particletree.com&quot;&gt;Particletree&lt;/a&gt;

It is then run through an HTML sanitizing script that restores certain safe tags so they display properly:

<a href="http://particletree.com">Particletree</a>
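
The escape-then-restore flow could be sketched roughly as follows. This is not Chris Shiflett's code or the sanitize script used on this site. The function name `allow_safe_html`, the tag whitelist, and the patterns are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch: escape everything, then restore a small whitelist.
function allow_safe_html(string $input): string
{
    // Step 1: escape all markup so nothing the user typed is live HTML.
    $escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Step 2: restore attribute-free tags from the whitelist.
    $escaped = preg_replace('~&lt;(/?)(strong|em)&gt;~', '<$1$2>', $escaped);

    // Step 3: restore anchors whose only attribute is an http(s) href.
    // The URL may contain &amp; but no other entity, so an escaped quote
    // or angle bracket can never end up inside the restored attribute.
    $escaped = preg_replace(
        '~&lt;a href=&quot;(https?://(?:[^&\s]|&amp;)+)&quot;&gt;~',
        '<a href="$1">',
        $escaped
    );

    // Step 4: restore closing anchor tags.
    return str_replace('&lt;/a&gt;', '</a>', $escaped);
}

echo allow_safe_html('<a href="http://particletree.com">Particletree</a>'), "\n";
// <a href="http://particletree.com">Particletree</a>
echo allow_safe_html('<script>alert(1)</script>'), "\n";
// &lt;script&gt;alert(1)&lt;/script&gt;
```

Note that this sketch restores closing tags without checking that they are balanced, so a production sanitizer would need more care than this.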

When used in combination (and in a way that prevents security breaches), auto-linking and allowing approved HTML tags can lead to some unexpected formatting. The problem with using the two techniques together is that the auto-linking script has to be smart enough not to link anything inside an a tag. For example, the following input

<a href="http://particletree.com">Particletree</a>

which has a URL inside of a link, would be converted to the undesirable:

<a href="<a href="http://particletree.com">http://particletree.com</a>">Particletree</a>

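None of the PHP auto-linking scripts that we found accounted for this, so we had to add a regex negative lookbehind as a solution. The (?<!=\") at the start of the pattern makes the script skip any URL that is immediately preceded by =", which is how a URL looks when it is already the value of an href attribute.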

$text = preg_replace("'(?<!=\")(http|ftp)://([\w\+\-\@\=\?\.\%\/\:\&\;~\|]+)(\.)?'", "<a title=\"Go to \\1://\\2\" href=\"\\1://\\2\">\\1://\\2</a>", $text);
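
To see the lookbehind at work, the line above can be wrapped in a function and run against both kinds of input. The wrapper name `smart_autolink` is made up for this example; the pattern and replacement are copied unchanged from the post.

```php
<?php
// Hypothetical wrapper; the pattern and replacement come from the post.
function smart_autolink($text)
{
    return preg_replace(
        "'(?<!=\")(http|ftp)://([\w\+\-\@\=\?\.\%\/\:\&\;~\|]+)(\.)?'",
        "<a title=\"Go to \\1://\\2\" href=\"\\1://\\2\">\\1://\\2</a>",
        $text
    );
}

// A bare URL is linked.
echo smart_autolink('http://particletree.com'), "\n";
// <a title="Go to http://particletree.com" href="http://particletree.com">http://particletree.com</a>

// A URL that is already an href value is left alone.
echo smart_autolink('<a href="http://particletree.com">Particletree</a>'), "\n";
// <a href="http://particletree.com">Particletree</a>
```

One caveat: the lookbehind only protects URLs that directly follow =". A URL used as the visible text of an existing link, as in <a href="http://particletree.com">http://particletree.com</a>, is preceded by > and would still be linked a second time.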

Hope that helps others looking for a similar solution.

Ryan Campbell

Smarter Auto-Linking by Ryan Campbell

This entry was posted 2 years ago and was filed under Notebooks.
Comments are currently closed.

· 11 Comments! ·

  1. Joao Prado Maia · 2 years ago

    Very useful, thanks.

  2. Markus · 2 years ago

    It is only a matter of execution order. If the auto-linking happened before the HTML cleanup, there would be no conflicts.

  3. Ryan Campbell · 2 years ago

    Markus, we do a good amount with Smarty, so we made the autolinking available as a modifier. The unescaping, on the other hand, is a good amount of code and is needed everywhere, so we do that on the PHP side. It is a specific oddity we ran into, but one that others may run into, so it still may help in some circumstances.

  4. Russell · 2 years ago

    Another good thing is to not auto-link things inside

    <pre> </pre>

    tags. This is a very annoying ‘feature’ in all of 37signals’ products.

  5. Niyaz PK · 2 years ago

    Thanks for that.

  6. hypotheek · 2 years ago

    what a nice solution! going to test it out! thanks alot!

  7. sean · 2 years ago

    Hey, I want to learn about APIs and how to build one, and I can find no good book on it. Do you know any good book that I could buy? My email address is in this comment.

  8. gossard · 2 years ago

    As a corollary to the point about Fitts’s Law not addressing movement in multiple dimensions or amidst distractions, consider that the notion of a target is relative. Sure you may want the user to click a particular button, but if the layout provides all germane interaction handles in visual clusters, that cluster can become the initial target. During the movement toward the larger target, the user may subdivide and segregate specific target from distraction. By the time they’ve discerned their specific target, the distance that they’ll need to cross is lessened, and they’re already in motion.

    No big science here. Just sharing a thought :)

  9. maomao · 2 years ago

    Markus, we do a good amount with Smarty, so we made the autolinking available as a modifier. The unescaping, on the other hand, is a good amount of code and is needed everywhere, so we do that on the PHP side. It is a specific oddity we ran into, but one that others may run into, so it still may help in some circumstances.

  10. stone KID · 2 years ago

    I think Alan Cooper pointed out that Fitts’s Law implies that besides the corners, the easiest target to acquire is “the current location of the pointer.” IOW, the biggest button is one you don’t have to move to at all. AFAIK this is not often used.

  11. Rexibit Web Services · 2 years ago

    I really like this. I am thinking of doing something similar with a CMS I am working on.