Unsafe string replacement
Overview
Unsafe use of string replacement functions to sanitize user input is extremely common. String replacement functions (str_replace in PHP) make a single pass over the input: every match is replaced once, but the result is not scanned again, so removing one match can join the surrounding characters into a new match. When the string being replaced is longer than a single character, the replacement must therefore be repeated in a loop until no unsafe string remains. This is a language-agnostic vulnerability, affecting any application that incorrectly uses string replacement for sanitization.
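The single-pass behaviour can be seen in a minimal sketch, assuming PHP's built-in str_replace; the variable names are illustrative only:

```php
<?php
// str_replace() scans the subject once; text formed by joining the
// pieces left behind after a removal is not scanned again.
$input  = '....//';                        // '../' nested inside another '../'
$output = str_replace('../', '', $input);  // only the inner '../' is removed

var_dump($output);                         // string(3) "../"
```

The value returned by the "sanitizing" call is exactly the string the filter was meant to remove.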
Example
A trivial example:
<?php
$filepath = $_GET['file'];
$safe_filepath = str_replace('../', '', $filepath);
echo("Safe filepath is '" . $safe_filepath . "'<br />");
include($safe_filepath);
?>
First, an attacker may try a simple file inclusion attack, using '../' sequences to escape the intended directory. The result:
Safe filepath is 'etc/passwd'
No dice: the dangerous string ('../') is dutifully removed by str_replace. But our attacker isn't going to give up yet. Now armed with the knowledge that '../' is being filtered out, he may try:
test.php?file=....//....//....//....//....//....//....//....//....//....//....//....//....//etc/passwd
The result:
Safe filepath is '../../../../../../../../../../../../../etc/passwd'
[contents of /etc/passwd]
Even if '../' is replaced twice, the filter is easily bypassed with '......///'. No matter how many fixed passes are made, the attacker simply needs to nest one more layer.
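The nesting argument can be checked with a short sketch. Both functions below are hypothetical helpers written for this illustration (they are not part of PHP or of the example script): one builds a payload with a given nesting depth, the other models a filter that applies the replacement a fixed number of times.

```php
<?php
// Hypothetical helper: wrap '../' inside itself $depth times,
// e.g. depth 1 => '....//', depth 2 => '......///'.
function nest_traversal(int $depth): string
{
    $payload = '../';
    for ($i = 0; $i < $depth; $i++) {
        // Place the current payload between '..' and '/'.
        $payload = '..' . $payload . '/';
    }
    return $payload;
}

// Hypothetical filter: apply the replacement a fixed number of times.
function strip_n_times(string $subject, int $passes): string
{
    for ($i = 0; $i < $passes; $i++) {
        $subject = str_replace('../', '', $subject);
    }
    return $subject;
}

$passes  = 3;
$payload = nest_traversal($passes) . 'etc/passwd';
$result  = strip_n_times($payload, $passes);

var_dump($result);  // string(13) "../etc/passwd"
```

Each pass peels off exactly one layer, so a payload nested as deeply as the number of passes still leaves one '../' behind.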
Other examples of unsafe uses of string replacement include:
str_replace('<?', '', $source);
Bypassed by '<<??'
str_replace(array('<script', '<img'), '', $source);
Bypassed by '<<imgscript>'
str_replace('file://', '', $source);
Bypassed by 'file:/file:///'
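As a quick check, the three filters above can be run against their bypass strings. This is only a sketch; the $cases and $results arrays are names made up for the illustration.

```php
<?php
// Each entry pairs a filter's search term(s) with the bypass input listed above.
$cases = [
    ['<?', '<<??'],
    [['<script', '<img'], '<<imgscript>'],
    ['file://', 'file:/file:///'],
];

$results = [];
foreach ($cases as [$search, $input]) {
    // One pass of str_replace, exactly as in the unsafe filters.
    $filtered  = str_replace($search, '', $input);
    $results[] = $filtered;
    echo $input, ' => ', $filtered, PHP_EOL;
}
// Prints:
// <<?? => <?
// <<imgscript> => <script>
// file:/file:/// => file://
```

In every case the output is the very string the filter was supposed to strip.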
Defense
If one must use str_replace for sanitization, it must be applied repeatedly until no match remains (conceptually a recursive process, though this implementation uses a loop for efficiency):
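function safe_str_replace($search, $replace, $subject) {
    // Reject arguments that would obviously keep the loop below from terminating.
    if ($search === '' || strpos($replace, $search) !== false) {
        throw new InvalidArgumentException('Search must be non-empty and absent from the replacement');
    }
    // Repeat the replacement until no occurrence of $search remains.
    while (strpos($subject, $search) !== false) {
        $subject = str_replace($search, $replace, $subject);
    }
    return $subject;
}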
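However, the use of whitelists or 'positive' regex matching (i.e. does the entire input match /\A[a-z]+\z/) is more effective. The pattern must be anchored at both ends; an unanchored /[a-z]+/ accepts any input that merely contains a lowercase letter.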
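A minimal sketch of the whitelist approach, assuming the only acceptable values are runs of lowercase ASCII letters; is_safe_name is a hypothetical function name chosen for this illustration:

```php
<?php
// Hypothetical validator: accept the value only if the whole string
// consists of lowercase ASCII letters; reject everything else.
function is_safe_name(string $value): bool
{
    // \A and \z anchor the match to the very start and end of the input.
    return preg_match('/\A[a-z]+\z/', $value) === 1;
}

var_dump(is_safe_name('report'));     // bool(true)
var_dump(is_safe_name('....//etc'));  // bool(false)
var_dump(is_safe_name("report\n"));   // bool(false)
```

Rejecting anything that does not match, rather than trying to repair it, means there is no leftover string for an attacker to reassemble.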