PHP Security Guide: Sessions
Session security is a sophisticated topic, and it’s no surprise that sessions are a frequent target of attack. Most session attacks involve impersonation, where the attacker attempts to gain access to another user’s session by posing as that user.
The most crucial piece of information for an attacker is the session identifier, because this is required for any impersonation attack. There are three common methods used to obtain a valid session identifier:
Prediction refers to guessing a valid session identifier. With PHP’s native session mechanism, the session identifier is extremely random, and this is unlikely to be the weakest point in your implementation.
Capturing a valid session identifier is the most common type of session attack, and there are numerous approaches. Because session identifiers are typically propagated in cookies or as GET variables, the different approaches focus on attacking these methods of transfer. While there have been a few browser vulnerabilities regarding cookies, these have mostly been Internet Explorer, and cookies are slightly less exposed than GET variables. Thus, for those users who enable cookies, you can provide them with a more secure mechanism by using a cookie to propagate the session identifier.
Fixation is the simplest method of obtaining a valid session identifier. While it’s not very difficult to defend against, if your session mechanism consists of nothing more than session_start(), you are vulnerable.
In order to demonstrate session fixation, I will use the following script, session.php:
Upon first visiting the page, you should see 1 output to the screen. On each subsequent visit, this should increment to reflect how many times you have visited the page.
To demonstrate session fixation, first make sure that you do not have an existing session identifier (perhaps delete your cookies), then visit this page with ?PHPSESSID=1234 appended to the URL. Next, with a completely different browser (or even a completely different computer), visit the same URL again with ?PHPSESSID=1234 appended. You will notice that you do not see 1 output on your first visit, but rather it continues the session you previously initiated.
Why can this be problematic? Most session fixation attacks simply use a link or a protocol-level redirect to send a user to a remote site with a session identifier appended to the URL. The user likely won’t notice, since the site will behave exactly the same. Because the attacker chose the session identifier, it is already known, and this can be used to launch impersonation attacks such as session hijacking.
A simplistic attack such as this is quite easy to prevent. If there isn’t an active session associated with a session identifier that the user is presenting, then regenerate it just to be sure:
The problem with such a simplistic defense is that an attacker can simply initialize a session for a particular session identifier, and then use that identifier to launch the attack.
To protect against this type of attack, first consider that session hijacking is only really useful after the user has logged in or otherwise obtained a heightened level of privilege. So, if we modify the approach to regenerate the session identifier whenever there is any change in privilege level (for example, after verifying a username and password), we will have practically eliminated the risk of a successful session fixation attack.
Arguably the most common session attack, session hijacking refers to all attacks that attempt to gain access to another user’s session.
As with session fixation, if your session mechanism only consists of session_start(), you are vulnerable, although the exploit isn’t as simple.
Rather than focusing on how to keep the session identifier from being captured, I am going to focus on how to make such a capture less problematic. The goal is to complicate impersonation, since every complication increases security. To do this, we will examine the steps necessary to successfully hijack a session. In each scenario, we will assume that the session identifier has been compromised.
With the most simplistic session mechanism, a valid session identifier is all that is needed to successfully hijack a session. In order to improve this, we need to see if there is anything extra in an HTTP request that we can use for extra identification.
It is unwise to rely on anything at the TCP/IP level, such as IP address, because these are lower level protocols that are not intended to accommodate activities taking place at the HTTP level. A single user can potentially have a different IP address for each request, and multiple users can potentially have the same IP address.
Recall a typical HTTP request:
GET / HTTP/1.1
User-Agent: Mozilla/5.0 Gecko
Accept: text/xml, image/png, image/jpeg, image/gif, */*
Only the Host header is required by HTTP/1.1, so it seems unwise to rely on anything else. However, consistency is really all we need, because we’re only interested in complicating impersonation without adversely affecting legitimate users.
Imagine that the previous request is followed by a request with a different User-Agent:
GET / HTTP/1.1
User-Agent: Mozilla Compatible (MSIE)
Accept: text/xml, image/png, image/jpeg, image/gif, */*
Although the same cookie is presented, should it be assumed that this is the same user? It seems highly unlikely that a browser would change the User-Agent header between requests, right? Let’s modify the session mechanism to perform an extra check:
Now an attacker must not only present a valid session identifier, but also the correct User-Agent header that is associated with the session. This complicates things slightly, and it is therefore a bit more secure.
Can we improve this? Consider that the most common method used to obtain cookie values is by exploiting a vulnerable browser such as Internet Explorer. These exploits involve the victim visiting the attacker’s site, so the attacker will be able to obtain the correct User-Agent header. Something additional is necessary to protect against this situation.
Imagine if we required the user to pass the MD5 of the User-Agent in each request. An attacker could no longer just recreate the headers that the victim’s requests contain, but it would also be necessary to pass this extra bit of information. While guessing the construction of this particular token isn’t too difficult, we can complicate such guesswork by simply adding an extra bit of randomness to the way we construct the token:
Keeping in mind that we’re passing the session identifier in a cookie, and this already requires that an attack be used to compromise this cookie (and likely all HTTP headers as well), we should pass this fingerprint as a URL variable. This must be in all URLs as if it were the session identifier, because both should be required in order for a session to be automatically continued (in addition to all checks passing).
In order to make sure that legitimate users aren’t treated like criminals, simply prompt for a password if a check fails. If there is an error in your mechanism that incorrectly suspects a user of an impersonation attack, prompting for a password before continuing is the least offensive way to handle the situation. In fact, your users may appreciate the extra bit of protection perceived from such a query.
There are many different methods you can use to complicate impersonation and protect your applications from session hijacking. Hopefully you will at least do something in addition to session_start() as well as be able to come up with a few ideas of your own. Just remember to make things difficult for the bad guys and easy for the good guys.
Some experts claim that the User-Agent header is not consistent enough to be used in the way described. The argument is that an HTTP proxy in a cluster can modify the User-Agent header inconsistently with other proxies in the same cluster. While I have never observed this myself (and feel comfortable relying on the consistency of User-Agent), it is something you may want to consider.
The Accept header has been known to change from request to request in Internet Explorer (depending on whether the user refreshes the browser), so this should not be relied upon for consistency.