Preventing Malicious User Input in PHP
Although there are many things a security-conscious developer must know when writing web applications, a few are especially important to keep in mind when developing a website which serves dynamic content. In this tutorial, I will show you several of the important security considerations which should be put in place when developing dynamic websites in PHP, though the list is by no means all-inclusive.
So what exactly qualifies as user input? For the purposes of this article, any data which comes from a user's computer could be modified by the user, so it makes sense to treat all of it as user input. Even things which a legitimate user wouldn't modify, like cookies, need to be examined because a malicious user could change them.
Whether you are starting an application from scratch or reviewing an existing application, the first thing to determine
is where the application uses user input, because all of these pages must be examined. If register_globals is enabled
(it was deprecated in PHP 5.3.0 and removed in PHP 5.4.0), every single PHP page on your site takes user input, and every single page
must therefore be inspected. A more prudent option, however, would be to disable register_globals. If register_globals
is disabled, one way to figure out which existing pages make use of user input is to search your files for references to $_GET,
$_POST, $_COOKIE, $_REQUEST, $_FILES, and $HTTP_RAW_POST_DATA (removed in PHP 7.0.0 in favor of the php://input stream). Many website
development packages can search across a number of files, which is one way of checking for references to these variables.
Although these are some of the most common ways to access user input, remember that there are still other possible ways, so the fact that a script doesn't
reference any of these variables doesn't mean you can skip checking it. For example, if you have custom session handlers, then
the text of the session cookie would also be treated as user input, and you would need to identify code which makes use of that value.
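
If your editor cannot search across files, the same audit can be roughly approximated in PHP itself. The following is only a sketch: the function names find_user_input_references and audit_directory are my own, and the match is purely textual, so it will also flag mentions inside comments and strings and will miss input that arrives by other routes.

```php
<?php
// Hypothetical helper (not part of PHP): list the lines of a PHP source
// string that mention one of the input variables named above.
function find_user_input_references(string $source): array
{
    // The names come from the list above; extend it for your own code base.
    $pattern = '/\$(?:_GET|_POST|_COOKIE|_REQUEST|_FILES|HTTP_RAW_POST_DATA)\b/';
    $hits = [];
    foreach (preg_split('/\R/', $source) as $index => $line) {
        if (preg_match($pattern, $line) === 1) {
            $hits[$index + 1] = trim($line); // keyed by 1-based line number
        }
    }
    return $hits;
}

// Hypothetical helper: walk a directory tree and report every matching line
// found in files with a .php extension.
function audit_directory(string $root): array
{
    $report = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->isFile() && strtolower($file->getExtension()) === 'php') {
            $hits = find_user_input_references(file_get_contents($file->getPathname()));
            if ($hits !== []) {
                $report[$file->getPathname()] = $hits;
            }
        }
    }
    return $report;
}
```

Calling audit_directory(__DIR__) returns an array keyed by file path, which gives you a starting list of pages to audit.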
Once you have identified a list of pages to audit, the next step is to examine what the application does with the user input.
If the only thing being done with the data is comparing it against known values in an if statement and then
serving up a page based on the result, chances are there isn't much risk. This whitelist method is a reasonable design
for a small website, but if you are serving a large number of pages, it scales poorly.
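
As a minimal sketch of the whitelist method, the user's value below is only ever compared against a fixed set of keys and never becomes part of a file name. The page names, the file paths, and the resolve_page function are all assumptions made for illustration.

```php
<?php
// Hypothetical page map: the keys are the only values a user may request,
// and the file names on the right never come from user input.
const ALLOWED_PAGES = [
    'home'    => 'pages/home.php',
    'about'   => 'pages/about.php',
    'contact' => 'pages/contact.php',
];

// Return the file for a requested page, falling back to the home page for
// anything that is missing, not a string, or not in the map.
function resolve_page($requested): string
{
    if (is_string($requested) && array_key_exists($requested, ALLOWED_PAGES)) {
        return ALLOWED_PAGES[$requested];
    }
    return ALLOWED_PAGES['home'];
}

// Typical use (commented out so the sketch has no side effects):
// include resolve_page($_GET['page'] ?? null);
```

Every new page needs a new entry in the map, which is exactly why this approach becomes tedious on a large site.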
More often than not, however, you will be passing the user-submitted information to some sort of function to evaluate it. In these
scenarios, identify the preconditions each of these functions needs in order to work as intended, and ensure that no possible value of the user input
violates any of them. Some common functions whose preconditions can be violated by unmodified user-supplied data
include mysql's and mysqli's query functions (the old mysql extension was removed in PHP 7.0.0). For these, make sure you use their respective real_escape_string
functions on user-supplied data, and put the escaped value inside quotes in the query, so that it cannot modify the structure of the query. Failing to do
this can open you up to SQL injection attacks. Another common function is fopen.
In order to function as the developer intends, it has the precondition of being supplied a path to a file which the user should be able to
see. The function does not enforce this itself: in most cases, it will open the file wherever it is, whether or not you call the folder private or use
Apache rewrite rules to hide it. This means that you probably shouldn't call it if the user-supplied path contains ../
or anything else that could be problematic. You could use a regular expression such as
/^[insert allowed characters here with proper escape sequences when necessary]*$/, along with other regular expressions to filter out certain
undesired paths. Two other functions which people are often less vigilant about are the standard output ones (echo and
print). If you are outputting any sort of user input, make sure you properly escape it (for example with htmlspecialchars) so that someone can't send your
users a link with malicious JavaScript. A few other functions to watch out for (though this list is not exhaustive) include eval,
require, include, and $variable_containing_user_submitted_data(), if they are passed user input (and yes, I am aware that
some of these are language constructs and not functions in the PHP sense).
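
Two of the checks described above, filtering a user-supplied path and escaping output, might look something like the sketch below. The function names and the allowed character set are assumptions; choose the characters your own file names actually need.

```php
<?php
// Hypothetical check for a user-supplied relative path. The allowed
// characters here are an assumption, not a recommendation for every site.
function is_acceptable_path(string $path): bool
{
    // Letters, digits, underscore, hyphen, dot and forward slash only.
    // The D modifier stops "$" from matching before a trailing newline.
    if (preg_match('/^[A-Za-z0-9_\-.\/]+$/D', $path) !== 1) {
        return false;
    }
    // Reject absolute paths.
    if ($path[0] === '/') {
        return false;
    }
    // Reject any ".." segment, wherever it appears in the path.
    return !in_array('..', explode('/', $path), true);
}

// Hypothetical helper: escape user input before echoing it into HTML so
// that markup characters are displayed rather than interpreted.
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Typical use (commented out so the sketch produces no output):
// echo 'You searched for: ' . escape_html($_GET['q'] ?? '');
```

Note that escape_html is only appropriate for HTML text and quoted attribute values; input placed inside a script block or a URL needs different handling.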
Having to continually check through code to make sure that things are properly escaped and checked is time-consuming, so it is useful to find a more scalable way of implementing this sort of security. Some things, like MySQL, have ready-made ways of doing this, such as prepared statements. If you are using your own functions, one way of making this easier is to wrap each potentially vulnerable function in a function of your own that exhibits the expected behavior for all possible inputs, and then to call only these safe functions from your code.
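
As one possible illustration of such a wrapper, the sketch below wraps fopen so that it only opens files which resolve to a location inside a chosen base directory. The name open_public_file and the read-only policy are assumptions, and this variant relies on realpath rather than a regular expression.

```php
<?php
// Hypothetical wrapper: open a file for reading only if it resolves to a
// regular file inside $baseDir. Returns a stream resource, or false when
// the request is rejected.
function open_public_file(string $baseDir, string $userPath)
{
    // Null bytes are never valid in a file name; reject them up front.
    if (strpos($userPath, "\0") !== false) {
        return false;
    }
    $base = realpath($baseDir);
    if ($base === false) {
        return false; // the base directory itself does not exist
    }
    // realpath() resolves "..", "." and symlinks, and fails for missing files.
    $target = realpath($base . DIRECTORY_SEPARATOR . $userPath);
    if ($target === false || !is_file($target)) {
        return false;
    }
    // The resolved path must still start with the base directory.
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if (strncmp($target, $prefix, strlen($prefix)) !== 0) {
        return false;
    }
    return fopen($target, 'rb');
}
```

The rest of the application then calls open_public_file instead of fopen, so the path check lives in one place rather than being repeated at every call site.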