Apache mod_rewrite in .htaccess in Depth (including precedence)

If you wish, you can skip all the other information about mod_rewrite on this webpage and jump directly to the section on the precedence of AND and OR in rulesets.

You could also, for now, skip the details and go directly to the section summarizing a few frequently asked questions.

If you're wrestling with a very unusual and complex problem, or if you foresee a whole set of very intricate .htaccess files, knowing more of the details of exactly what mod_rewrite does may be helpful. So this webpage presents a collection of arcane tidbits about mod_rewrite. If you'd rather not go that deep, you may instead prefer either a much simpler introduction just to munging URIs or an overview of all the different things .htaccess in Apache can be used for.

First consider whether or not you actually need to use mod_rewrite at all. There are often other -simpler- ways. And sometimes the best way is no way at all. Sometimes you can do a couple of very straightforward things with mod_rewrite simply by copying and pasting precomposed short generic sequences, without really understanding what's going on. However, the use of mod_rewrite quickly gets to the point where many copy-and-paste sequences don't work, and figuring out what to tweak requires a thorough understanding of the details of how mod_rewrite works.

Remember the KISS (Keep It Simple, Stupid) principle when using mod_rewrite. The full rewrite process has to occur for every web request, whether or not that particular page is affected. Something as arcane, error-prone, and hard to debug as mod_rewrite should not be over-used in the main path of a high-volume service. A seeming need to use mod_rewrite extensively may indicate that the website needs to be reorganized, or may suggest inadequate design or implementation elsewhere.

An .htaccess file that invokes mod_rewrite can be deployed on the majority of systems. However, debugging that file can be difficult without full root access to the whole system. A principal debugging technique is to enable and then scan the very detailed log produced by mod_rewrite. (The file is probably named /var/log/httpd/rewrite_log.) However, enabling that log is possible only system-wide, which is generally forbidden on shared webservers because it would dramatically affect other users too. There are only two realistic options: (i) keep your use of mod_rewrite extremely simple so bugs don't arise in the first place, or (ii) test and debug on a system you fully control (even though you're going to eventually deploy on a shared webserver).
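On a system you fully control, the rewrite log is switched on in the system-wide configuration (httpd.conf or a <VirtualHost> block), never in .htaccess. A minimal sketch, assuming the log path mentioned above; the directives changed between Apache versions, so use only the variant that matches yours:

```apache
# Apache 2.2 and earlier (server config or <VirtualHost> only):
#RewriteLog "/var/log/httpd/rewrite_log"
#RewriteLogLevel 3

# Apache 2.4 and later: RewriteLog was removed, and mod_rewrite's
# trace output goes to the main error log instead
LogLevel warn rewrite:trace3
```

Higher levels (up to 9 on 2.2, trace8 on 2.4) log much more and noticeably slow the server, so turn them back down when you're done.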

Execution of Apache Modules in General

Individual modules may access and process .htaccess files at different times. It's sometimes very hard to figure out the relation between two different modules. Rather than even try, just arrange your .htaccess files so there are no dependencies between statements that will be handled by different modules. If you use mod_rewrite, use only mod_rewrite for rewriting (not also mod_alias).

If you see both RewriteCond/RewriteRule and Redirect/RedirectMatch/Alias/AliasMatch lines in your .htaccess, change it to do everything with one or the other.
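For example, a simple mod_alias redirect can usually be restated as a mod_rewrite rule so that only one module is involved. A sketch, assuming the .htaccess file is in the website root; the page names are hypothetical:

```apache
# Old mod_alias version, to be removed:
#   Redirect 301 /old-page.html /new-page.html

# mod_rewrite replacement (per-dir, so the pattern has no leading slash)
RewriteEngine on
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]
```

(Redirect also matches anything below the given path, such as /old-page.html/extra; the RewriteRule above deliberately matches only the exact name.)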

(If for some reason you have to mix mod_alias with mod_rewrite, for starters know that despite any attempt to order the statements in the .htaccess file, all mod_rewrite statements will always be executed first, then all mod_alias statements will be executed in a separate pass using the result of the mod_rewrite statements as their starting point. Also, recognize that intelligently linking the two modules by some sort of conditional expression or an environment variable is difficult if not impossible.)

If you use mod_rewrite to set or read environment variables ([E=...], %{ENV:...}), then try to avoid using mod_env and mod_setenvif (SetEnv and SetEnvIf) at all. If you must use mod_rewrite together with either of them, then at least avoid having any module make assumptions about exactly when another module's statements will be executed. Better yet, simply avoid having any module read a variable that was set by a different module.

The one general rule of thumb that seems to always apply to all modules is that the physical order of lines has little effect. For example, even if RewriteOptions Inherit is placed before all the local RewriteCond/RewriteRule, the inherited rules will be executed after the local ones. You can generally expect the few cases where the order of a module's statements is important and useful to be well- and explicitly-documented.

Execution of mod_rewrite (location, order, inheritance)

It's generally true with .htaccess files that all such files, all the way from the subdirectory the request points to clear up to the webserver's root directory, will be consulted, and that whenever there are conflicting instructions the .htaccess file nearer the relevant subdirectory will take precedence. Thus it's fairly easy to set things like error documents and caching site-wide, yet allow individual subdirectories to override those settings. However, this is not true of mod_rewrite.

mod_rewrite processing of a request starts out in the subdirectory the request points to. If there's a .htaccess file there and it contains any mod_rewrite statements at all (and it includes RewriteEngine on), it will be executed by mod_rewrite. But if there's no .htaccess file at all, or there is a .htaccess file that doesn't contain any mod_rewrite statements at all (not even RewriteEngine off), then mod_rewrite continues its search with the parent directory. This process continues until mod_rewrite finds a suitable .htaccess file, possibly all the way up to the website's root directory.

Contrary to what happens in most cases with Apache .htaccess files, the default behavior of mod_rewrite is to execute only one .htaccess file, rather than all of them all the way up to the website's root directory (this can of course be changed if you wish; we're just talking about the default behavior). Thus a naive usage with only one .htaccess file, in the website's root directory, for the entire website works fine (and in fact is often the best way to use mod_rewrite). Likewise, .htaccess files in subdirs that override things like caching durations or error documents but contain no mod_rewrite statements (not even RewriteEngine ...) will work as expected.

(Take some care to avoid .htaccess files in subdirs that have RewriteEngine ... in them but no other mod_rewrite statements. Such a file won't hurt anything in itself, but it will keep the .htaccess file in the parent directory from being executed by mod_rewrite.)

If you want mod_rewrite to execute more than one .htaccess file for a single request, explicitly include RewriteOptions Inherit (or ... InheritBefore, ... InheritDown, ... InheritDownBefore, or ... IgnoreInherit, which are convenient and may be less error-prone, but none of which is supported by every version). In any .htaccess file, RewriteOptions Inherit will cause mod_rewrite to execute not only the statements in that file but also any mod_rewrite statements in the .htaccess file in the parent directory. If the parent .htaccess also includes an explicit RewriteOptions Inherit, then the statements from the .htaccess file in its parent directory will be executed also. Thus it's possible, by including RewriteOptions Inherit in all subdirectories, to have mod_rewrite execute many .htaccess files all the way from the subdir of the request up to the website root directory for a single request. (Just because it's possible doesn't mean it's always a good idea though.)
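A minimal sketch of a subdirectory file that also pulls in its parent's rules, assuming a hypothetical /blog/ directory whose parent is the website root:

```apache
# Hypothetical file: /blog/.htaccess
RewriteEngine on
# Also run the parent directory's mod_rewrite rules; regardless of where this
# line is placed, they run after the local rules below
RewriteOptions Inherit
# Local rule: per-dir, so the pattern sees only what follows /blog/
RewriteRule ^feed$ feed.xml [L]
```

On Apache 2.3.10 and later, RewriteOptions InheritBefore runs the parent's rules first instead.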

(Be a little wary of RewriteOptions Inherit in the .htaccess file in the website root directory. Depending on how the webserver is configured, such an option may pull in statements from the httpd.conf file, statements which you may not understand [or be able to change]. And in this context it's not always immediately clear whether parent means the containing Directory or the containing Virtual Host.)

Using mod_rewrite

General Availability

The mod_rewrite extension to Apache must be available on the system. Usually this means (i) adding the mod_rewrite binary to the Apache modules directory, and (ii) adding a LoadModule directive for it to the system-wide Apache configuration. If you're using a shared webserver and mod_rewrite is not already available, you may be able to get the provider to add it. But if the provider won't add it, you won't be able to use mod_rewrite at all; there is no workaround for using an Apache module the provider doesn't want you to use (for whatever reason:-).
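On a system you control, the system-wide side of this typically looks something like the following. A sketch only: the module path varies between distributions (some, such as Debian, provide the a2enmod command instead), and the document root shown is hypothetical:

```apache
# System-wide configuration (httpd.conf or an included file), never .htaccess
LoadModule rewrite_module modules/mod_rewrite.so

# Hypothetical document root. mod_rewrite statements in .htaccess files are
# only accepted if the FileInfo override class is allowed, and per-dir
# rewriting also needs FollowSymLinks (or SymLinksIfOwnerMatch)
<Directory "/var/www/html">
    AllowOverride FileInfo
    Options +FollowSymLinks
</Directory>
```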

Unlike most Apache modules, which automatically become active when loaded, the mod_rewrite extension to Apache must be not only loaded but also explicitly turned on (RewriteEngine on). Usually this is left to each .htaccess file.

The <IfModule> Apache directive can be problematic. It can be used to prevent your mod_rewrite statements from being parsed if mod_rewrite is not available on the system. This prevents the 500 Internal Server Error that Apache otherwise returns when a .htaccess file contains directives from a module that isn't loaded; your .htaccess file is parsed appropriately even on systems where mod_rewrite is not available. However, it may not actually make your webserver work correctly.

With the <IfModule> Apache directive, your mod_rewrite configuration may simply be ignored; no other technology with similar net behavior will be substituted. You probably had a good reason for trying to use mod_rewrite in the first place, and if the module is not available your website will at best have somewhat reduced functionality, and at worst be effectively unusable. You can sometimes use the <IfModule> Apache directive to implement graceful degradation (for example, silently correcting typos and accepting alternate names only when the module is available), but only if you plan ahead. Graceful degradation is not automatic.
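A minimal sketch of planned graceful degradation, assuming a hypothetical site whose contact page is often mistyped: with mod_rewrite the typo is quietly fixed, and without it the typo simply gets a 404 while everything else keeps working:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine on
    # Nice-to-have only: map a common (hypothetical) misspelling to the real page
    RewriteRule ^contcat\.html$ contact.html [L]
</IfModule>
```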

If there are any AllowOverrideList directives in the system-wide Apache configuration (which is seldom the case), your provider apparently has excruciatingly detailed rules about exactly what an .htaccess file is and is not allowed to do and how it must do it. In this case by far the simplest solution is to consult your provider's detailed instructions or documentation. If your provider has such an Apache configuration but does not provide adequate instructions or documentation, consider changing providers. It can probably be made to work anyway, but only with considerable effort by someone quite familiar with programming.

Individual Configuration

All the above is simply about the general enabling of the use of mod_rewrite. The detailed configuration of exactly what mod_rewrite should actually do is typically specified per-user (or per-directory) in your own .htaccess file(s) (and is the main subject of this webpage).

You will need access to your own .htaccess files, to edit them and in most cases also to create them. Although unix/linux operating systems give you this file access by default, it may be disabled in various ways by specific providers. A provider may disable access to all files whose names start with a dot. Or a provider may disable access to all possible Apache configuration files. Or a provider may disable access to all files except actual webpages (i.e. *.html files). Or a provider may instruct Apache not to use the conventional filename .htaccess but instead some alternate filename which is illegal or impossible to enter.

If your provider has disabled access to .htaccess files, request that you be granted an exception. If the exception is refused, there is nothing you can do, as the default behavior of mod_rewrite in the absence of a specific configuration is to do nothing. You will need to either rethink your need for mod_rewrite, or consider moving your website to a different provider's system.

Exact Meaning of Per-Dir

Saying the path/filename given to RewriteRule is always per-dir means that what's given to RewriteRule is relative to the directory where the .htaccess file is. The rootpath, all parent directories, and the directory the .htaccess file is in, are stripped off. (rootpath is the path all the way from the top of the whole local filesystem to the topmost directory of the website.) Only what remains of the request (the rightmost part) is passed to RewriteRule. This has an important implication, and there's also an important additional fact:

The important implication is that whether RewriteRule gets only a filename, part of the pathname too, or all of the pathname too, depends on which .htaccess file mod_rewrite started execution with. For .htaccess files in leaf directories, only the filename will be passed. Yet if the exact same request is passed to a .htaccess file in the website root, the entire path (everything below rootpath of course) will be passed as well. (This is true even if additional mod_rewrite rules have been pulled in from a parent directory using RewriteOptions ...Inherit.... The filename presented per-dir will always be relative to the subdirectory containing the first mod_rewrite .htaccess file [not necessarily the one the rules came from].)
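To make this concrete, here is what RewriteRule would see for a hypothetical request of /blog/2020/post.html, depending on which .htaccess file mod_rewrite starts with (the two rules belong in different files, and the variable STARTED_IN is hypothetical):

```apache
# In the website-root .htaccess, the per-dir path is:  blog/2020/post.html
RewriteRule ^blog/2020/post\.html$ - [E=STARTED_IN:root]

# In /blog/2020/.htaccess, the per-dir path is only:  post.html
RewriteRule ^post\.html$ - [E=STARTED_IN:leaf]
```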

The important additional fact is there's no variable presenting the entire current path but nothing else. REQUEST_FILENAME -perhaps the closest that exists- is the entire local file name and path (rootpath/pathname/filename.fileextension). But parsing out just the relative pathname part is error-prone. Rather than reinventing the wheel (and risking getting it wrong), just add a copy of the code below every time just before you need to use the path. (Note though this exacts a price for always working correctly: it saps performance and may reduce clarity.) It uses an RE to isolate the filename.fileextension, which is the end of the string and contains no slashes. Then it removes rootpath (available in variable DOCUMENT_ROOT) from the start of the string, leaving the pathname relative to the website root. (It's rumored that some versions of Apache set REQUEST_FILENAME differently, interpreting full path from the root ... as from the website root rather than from the filesystem root. If such versions exist, the following would not work correctly on them without some additional statements.)

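# test string: DOCUMENT_ROOT and REQUEST_FILENAME joined by a comma;
# \1 strips the document root, %2 is the relative path (empty for files
# directly in the website root), %3 is filename.fileextension
RewriteCond %{DOCUMENT_ROOT},%{REQUEST_FILENAME} ^([^,]*+),\1/*+(.*?/)?/*+([^/]*)$
RewriteRule ^ - [E=CURPATH:%2,E=CURNAME:%3]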

Perhaps the simplest is to only have one mod_rewrite .htaccess file and have it in the root directory of the entire website. That way the normal per-dir presentation will always include exactly the full pathname as well as the filename and extension.

Internal-redirect vs. External-redirect

The distinction between internal-redirect and external-redirect is very important; it's something you must understand before implementing any rewriting at all. The behavior is very different and the rules of thumb are quite different. The purposes for which they're used are very different. A common problem is doing one of them, then expecting some behavior associated only with the other. Neither is appropriate in all cases; depending on your purposes you may need to use both. (Documentation sometimes uses the word rewrite and other times the word redirect to mean the same thing. We'll try to consistently use the word redirect ...but you can translate as necessary.)

Another important distinction is between a relative address and an absolute address. It applies to both URIs and local filenames. The first character of a relative address is not a slash; it generally makes sense only within a subdir ...although mod_rewrite knows how to combine it with the rest of the address to make a full address. An absolute address starts with a slash and is interpreted all the way from the root. (Local filenames actually come in three flavors: (i) all the way from the root of the local filesystem, (ii) from the root of the website, and (iii) from some subdirectory of the website. Most of the time (iii) is called relative and (ii) is called absolute; in the few cases where (i) is discussed (which is usually clear from the context), it is, somewhat confusingly, also called absolute.)
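A sketch of how the leading slash changes the target of an internal-redirect, assuming a hypothetical /docs/ directory that is an ordinary subdirectory of the document root (no Alias involved) and hypothetical file names:

```apache
# Hypothetical file: /docs/.htaccess
RewriteEngine on
# Relative target: combined with this directory, so it serves /docs/manual.html
RewriteRule ^guide\.html$ manual.html [L]
# Absolute target: taken from the website root, so it serves /manual.html
RewriteRule ^help\.html$ /manual.html [L]
```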

So the rules of thumb are:

As you have seen, whether or not the first character of a redirect address is a slash is quite important. So be sure a leading slash doesn't sneak in as part of your parsed variables. Beware code like:

# condition - includes leading slash
RewriteCond %{REQUEST_FILENAME} (.*?/)/*+([^/]*)$
# action - leading slash sneaks in
RewriteRule ^ %1%2

as the new address will start with a slash even though that's probably not what you intended. Instead, write code like either this:

# condition - excludes leading slash
RewriteCond %{REQUEST_FILENAME} /*+(.*?/)/*+([^/]*)$
# action - consciously choosing NO leading slash
RewriteRule ^ %1%2

or this:

# condition - excludes leading slash
RewriteCond %{REQUEST_FILENAME} /*+(.*?/)/*+([^/]*)$
# action - consciously choosing WITH leading slash
RewriteRule ^ /%1%2

so you can intentionally choose either no leading slash or leading slash no matter exactly what the input was or the intricacies of RE behavior.

(The above example is only to illustrate handling of leading slashes. It does not correctly handle every single unusual situation.)

Predefined Variables

A significant number of useful global named variables are built into mod_rewrite. Virtually all of these are reset only at the start of processing of each new request, and never change over the entire duration of processing that request, no matter how much the request may be redirected or otherwise modified. The one exception is REQUEST_FILENAME, which always reflects the current (possibly modified) state of the request; in other words, it's updated immediately after every internal-redirect.

In addition, there are the $n and %n variables within each ruleset (some of them are cleared for each ruleset, others are cleared even more frequently, for each RewriteCond statement!). These predefined variables can be used anywhere within a ruleset; in fact a %n variable can even be used in the very next RewriteCond (within the same ruleset, of course).

The $n variables are set by the pre-execution of the RewriteRule at the beginning of the ruleset, then are not changed until the next ruleset. By contrast, the %n variables are set all over again by every individual RewriteCond as it's executed (except a negated RewriteCond may leave the %n variables unchanged or may clear them or set them to nonsensical values). Perhaps the simplest expression of only what really matters is that the %n variables are reset for every executed line, whereas the $n variables are reset only once at the start of each ruleset.
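A short illustration of the two kinds of back-reference in one ruleset, using a hypothetical rule that strips a leading www. from the host name:

```apache
# $1 is captured by the RewriteRule pattern, which is matched first;
# %1 is captured by the most recently matched RewriteCond
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
```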

Environment (Custom) Variables

You can think of environment variables as your custom variables that you can deploy in addition to the built-in ones. (Not 100% true - although environment variables are mostly custom variables you created, there are a few built-in variables too that are accessible via the same syntax.) Despite using the same name (environment variable), Apache environment variables are different from unix/linux shell variables (although the same syntax that reads the Apache variables can read the shell variables as well).

Be warned that referencing of environment variables is a little inconsistent. The few dozen built-in variables that relate to the original request must be referenced simply as %{VARIABLE_NAME}, but additional user-created environment variables, and the REDIRECT_... variables sometimes created by mod_rewrite, must be referenced as %{ENV:VARIABLE_NAME} (note the ENV:). If you get this wrong, you will not get any sort of error indication; you'll just notice when debugging (scanning through the rewrite log, etc.) that the variable appears to always have the empty value no matter what. To make matters worse, available documentation of which variables require ENV: and which do not is not always correct.
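A sketch of the difference, assuming a hypothetical custom variable DEBUG_MODE that the first rule sets from the query string (SHOW_TIMING is equally hypothetical):

```apache
# Built-in request variable: plain %{NAME}
RewriteCond %{QUERY_STRING} (^|&)debug=1(&|$)
RewriteRule ^ - [E=DEBUG_MODE:on]

# Custom variable: must be read as %{ENV:NAME}; writing %{DEBUG_MODE}
# here would, as described above, silently give an empty value instead
RewriteCond %{ENV:DEBUG_MODE} =on
RewriteRule ^ - [E=SHOW_TIMING:1]
```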

If mod_rewrite execution returns to the top because of a [N]ext flag, all Apache variables -including custom environment variables- still exist with their old values.

If mod_rewrite execution returns to the top because of an [L]ast flag, all environment variables cease to exist under their old names. Old environment variables that were in some way modified at least once reappear with slightly different names according to the following pattern: %{ENV:XXX} becomes %{ENV:REDIRECT_XXX} (and in some cases %{YYY} becomes %{REDIRECT_YYY}, while in other cases it simply disappears). However, custom environment variables that were never modified after they were created are completely lost. So to ensure that an environment variable is always carried forward, first set it to a blank value when creating it, then immediately re-set it to the real value; the change from the blank value to the real value will be counted as a modification, and the variable will be persisted. So if you want the value [for example yyyyy] of environment variable XXX to always be available after an [L]ast, code as follows:

# first create the custom variable with an empty value
# then immediately change it to the real desired value
RewriteRule ^ - [E=XXX:,E=XXX:yyyyy]

In the same circumstance, what happens to the built-in variables seems inconsistent. Some of them (ex: PATH_INFO) disappear completely; some of them (ex: REMOTE_ADDR) retain their old value and their original name, some of them (ex: QUERY_STRING) reappear as an environment variable with a slightly different name (ENV:REDIRECT_QUERY_STRING), and a few brand new environment variables are created (ex: ENV:REDIRECT_STATUS). Despite this lack of an obvious pattern in mod_rewrite's behavior, as a rule of thumb every bit of information you might care about at any time is available somehow.

An internal-redirect without an [L]ast flag (RewriteRule ^ /foobar) does not start over at the top of the .htaccess file. Rather, after the name of the destination is rewritten as specified, execution simply continues in sequence with the next line. Environment variables are not affected at all.

If a Redirect is sent back to the client (ex: [R=301,L]), you might think of this as just a continuation of the same request including a round-trip to the client. But webservers (including Apache) see returning a code and a new Location back to the client as completing the request, and see the re-request from the client with the revised Location as a brand new request. What this means is that after any remote redirect ([R=nnn]), all context, including all environment variables, is completely cleared (environment variables don't even reappear under different names). (If the Redirect sent back to the client causes it to request a document from an entirely different subdirectory, the new request will probably be handled by an entirely different .htaccess file anyway, possibly never again reaching this particular .htaccess file.)

The [N]ext Flag

Although it can sometimes appear to be quite convenient, the [N]ext flag is in fact very dangerous, so dangerous it's usually best to simply avoid it. It's way too easy for even mod_rewrite experts to inadvertently create a loop, causing the .htaccess to not work at all, the client's browser to lock up, and even the Apache server to become unresponsive (which is especially problematic on shared webservers).

Nevertheless, just to illustrate how it could be useful, here's an example of its correct use, in this case using internal-redirects to allow users to type underscores [_] even though all your filenames actually contain dashes [-].

RewriteRule ^/*+([^_]*+)_(.*)$ $1-$2 [N]

Perhaps it's a better idea to just unroll it rather than resorting to the [N]ext flag. This has the disadvantage of hard-coding a limit (6 in the example) on how many underscores can be changed for a single request, but the advantages of not interrupting the flow of the request through the .htaccess file and of guaranteeing there will never be a looping problem.

RewriteRule ^/*+([^_]*+)_([^_]*+)_([^_]*+)_(.*)$ $1-$2-$3-$4
RewriteRule ^/*+([^_]*+)_([^_]*+)_(.*)$ $1-$2-$3
RewriteRule ^/*+([^_]*+)_(.*)$ $1-$2

The examples above can be a problem though not only because of looping, but also because they use internal-redirects. http://..._... and http://...-... will fetch exactly the same webpage (file rootpath/...-... in this example). Some web indexes will sense duplicated content and will penalize your SEO. Here's the same example, except modified to use a single external-redirect to inform the client (possibly a spider-bot) of the correct spelling. As you can see, the example unfortunately becomes quite a bit longer and more arcane.

# set up "first time"
RewriteCond %{ENV:ORIGWORKING} ^$
RewriteRule ^ - [E=ORIGWORKING:%{REQUEST_URI},E=WORKING:%{ENV:ORIGWORKING}]
#
# change one '_' to '-'
RewriteCond %{ENV:WORKING} ^/*+([^_]*+)_(.*+)$
RewriteRule ^ - [E=WORKING:%1-%2,N]
#
# if anything has changed do external-redirect
RewriteCond %{ENV:ORIGWORKING}!=%{ENV:WORKING} !^(.*?)!=\1$
RewriteRule ^ /%{ENV:WORKING} [R=301,L]

Again this can be unrolled to not use the [N]ext flag at all. The example below illustrates one way translating underscores to dashes could be done; it's the only one of these examples that could actually be deployed in real life. (Note that this example will sometimes cause more than one round trip back to the client. This is judged to be infrequent enough and benign enough to not worry about. And counterbalancing the disadvantage is the advantage of not limiting the number of underscores that will be corrected.)

RewriteRule ^/*+([^_]*+)_([^_]*+)_([^_]*+)_(.*)$ /$1-$2-$3-$4 [R=301,L]
RewriteRule ^/*+([^_]*+)_([^_]*+)_(.*)$ /$1-$2-$3 [R=301,L]
RewriteRule ^/*+([^_]*+)_(.*)$ /$1-$2 [R=301,L]

The [L]ast Flag

Unexpected Behavior of the [L]ast Flag

From its name, and from most documentation, you would expect that the [L]ast flag immediately terminates all mod_rewrite processing for that request ...and in the system global Apache configuration file (typically httpd.conf) it indeed does exactly that. But in a .htaccess file it behaves a little differently. If anything about the request has been changed by mod_rewrite, it jumps back to the top of the .htaccess file. This unexpected behavior makes it very easy to get caught in an infinite loop. (Apache generally protects itself against infinite loops with counters and timers, so instead of the server simply hanging, your browser will probably get back an error, typically 500 Internal Server Error once the internal-redirect limit set by LimitInternalRecursion, 10 by default, is exceeded.)

Only if nothing has changed does it really do what you probably expect: terminate mod_rewrite processing. The conventional solution for preventing such loops and undesired behavior is to also invoke a second [L]ast flag very near the top of the file (before anything can change). (Of course this second [L]ast flag should skip over itself unless at least one [L]ast flag had already been executed.) Code to invoke the second [L]ast flag should be included in all mod_rewrite .htaccess files. Here's some boilerplate code to insert near the top of the .htaccess file to prevent inadvertent loops:

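# per-dir rewriting requires FollowSymLinks (or SymLinksIfOwnerMatch)
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
# stop at once if an earlier internal-redirect set REDIRECT_STATUS,
# or if REQUEST_FILENAME hasn't changed since it was saved,
# or if the URI is suspiciously long (300 characters or more)
RewriteCond %{ENV:REDIRECT_STATUS} \d\d\d [OR]
RewriteCond %{REQUEST_FILENAME}==%{ENV:SAVED_REQUEST_FILENAME} ^(.*?)==\1$ [OR]
RewriteCond %{REQUEST_URI} ^.{300}
RewriteRule ^ - [L]
# remember the current filename for comparison on a later pass
RewriteRule ^ - [E=SAVED_REQUEST_FILENAME:%{REQUEST_FILENAME}]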

(Alternatively, you may be able to use the [END] flag everywhere instead of the [L]ast flag. The [END] flag behaves in individual .htaccess files the same way the [L]ast flag does in the system-wide Apache configuration file. Note that it has no one-letter abbreviation: [E] is the environment-variable flag used throughout this page. However, the [END] flag is available only in Apache 2.3.9 and later; it may not be available to you at the moment, or its use may hamper some future migration.)
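A minimal sketch, assuming Apache 2.3.9 or later and a hypothetical catalog script; note that the flag is spelled out in full:

```apache
# [END] stops all rewriting for this request, with no jump back to the top;
# [E] would instead be the flag for setting an environment variable
RewriteRule ^products/([^/]+)$ catalog/item.php?id=$1 [END]
```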

Inconsistent Requirement for the [L]ast Flag

The way mod_rewrite works is that [F] and [G] (or any [R=4nn] or [R=5nn]) stop immediately, as though the [L]ast flag had been specified too. However, redirects (both internal-redirects, i.e. a new filename but no flags, and external-redirects, [R=3nn]) do not imply the [L]ast flag; execution will continue on to the next statement if the [L]ast flag is not explicitly specified.

An external-redirect virtually never makes sense without the [L]ast flag triggering an immediate round trip back to the client browser. Thus you will see [R=301,L] a lot, but almost never [R=301]. (Because external-redirects get the URI ready to send back to the client browser [i.e. add the http://...], even if you forgot the [L]ast flag subsequent rules probably wouldn't match.)

An internal-redirect also hardly ever makes sense without the [L]ast flag causing execution to return to the top of the mod_rewrite .htaccess file. So although you'll see RewriteRule RE newloc [L] quite a bit, you'll only ever see RewriteRule RE newloc in a few very carefully constructed mod_rewrite .htaccess files.

If a regular external-redirect was intended, including the [L]ast flag too is pretty much the only way it will work. If returning some sort of error code to the client browser was intended (403, 410, 503, etc.), [L]ast would be implied anyway; including it explicitly may document your intention more clearly, but it doesn't change mod_rewrite's behavior at all.

If an internal-redirect was intended, include [L]ast by default if you're doing something simple or if you're not sure whether or not it should be there.
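The rules of thumb above, side by side, in a sketch with hypothetical page names:

```apache
# External-redirect: always pair R=3xx with L
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]
# Error status: [G] (410 Gone) stops processing by itself; adding L changes nothing
RewriteRule ^retired-page\.html$ - [G]
# Internal-redirect: L is the safe default
RewriteRule ^pretty/([^/]+)$ real/$1.html [L]
```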

Symlinks as an Alternative to Internal-redirects

Often you can use symlinks (popular name for symbolic links) to do pretty much the same things you could do with mod_rewrite's internal-redirects ...provided of course symlinks are available to you (see below). You can set up alternate (alias) names for either whole directories/folders, or for individual files, or probably for both.

How would you choose between symlinks and internal-redirects? (To avoid needless confusion and bizarre problems, use one or the other, not some mixture. If your .htaccess file specifies any internal-redirects, don't also use symlinks.)

In summary, use symlinks for just a handful of simple aliases, but use mod_rewrite's internal-redirects for more numerous or more complex aliases.
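For example, where you might otherwise create a symlink in the website root named docs pointing at a directory named documentation (both names hypothetical), a single internal-redirect gives the same alias without any symlink:

```apache
# Replaces a symlink  docs -> documentation  (use one or the other, not both)
RewriteRule ^docs/(.*)$ documentation/$1 [L]
```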

To use symlinks they must be available to you, which they will be in either of the two following cases.

Regular Expressions (also called regex or RE) in Conditions

Most Conditions are expressed as a regular expression which either matches or doesn't match. (Even what you think of as logical or boolean variables are tested with a regular expression, perhaps ^.+$ [any non-empty value, i.e. of length at least one] for true and ^$ [completely empty (or undefined)] for false.) (This heavy reliance on regular expressions will delight some but frighten others:-)
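A sketch of that convention, assuming a hypothetical flag variable PREVIEW and hypothetical page names:

```apache
# Set the flag: any non-empty value means "true"
RewriteCond %{QUERY_STRING} (^|&)preview=1(&|$)
RewriteRule ^ - [E=PREVIEW:yes]

# "true" test: at least one character
RewriteCond %{ENV:PREVIEW} ^.+$
RewriteRule ^page\.html$ page-preview.html [L]

# "false" test: empty, or never set at all
RewriteCond %{ENV:PREVIEW} ^$
RewriteRule ^page\.html$ page-public.html [L]
```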

What About Numbers?

Everything in mod_rewrite is a text string. Nothing is stored as a binary number, or tested as a binary number. Since everything is a text string, almost all tests use the same sort of Regular Expressions. (A few mod_rewrite tests return a yes/no answer without using an RE, such as -f, meaning the argument is the name of an existing regular file (including empty files and files reached indirectly through a symlink), or -l, meaning the argument is the name of a symlink (which may or may not point at a valid file). All these special-purpose tests are written as a dash followed by a single letter (-x).) Note that the argument to the file-related tests such as -f, -d, and -l must be an absolute filename with a path all the way from the system's root. So a condition using one of these file-related tests typically looks something like: RewriteCond %{DOCUMENT_ROOT}/filepath/filename -f
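A common use of the file tests, sketched here with a hypothetical front-controller script index.php in the same directory as the .htaccess file. REQUEST_FILENAME already holds a full filesystem path, which is what these tests need:

```apache
# Only requests naming neither an existing file nor an existing directory
# are handed to the (hypothetical) script; once rewritten, index.php exists,
# so the conditions fail on the next pass and there is no loop
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.php [L]
```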

mod_rewrite also provides some tests that sort of look like numeric tests (<, =, and >), but these are in fact special forms of string tests. = is just syntactic sugar for an exact string comparison, equivalent to an RE with anchors at both beginning and end (i.e. the test =abc is the same as the RE test ^abc$), except that characters such as . are taken literally. The < and > tests compare the lexicographical order of the two strings, a behavior that doesn't improve comparison of numbers, and that's almost never useful even when the arguments are plainly text. Worse, if the two strings aren't the same length, < and > will expand the shorter string to make it the same length by padding it on the left, which is different from virtually all similar operators in all computer languages. Thus the recommendation to never use the <, =, or > tests. (Apache 2.4 and later do add genuine integer comparisons to RewriteCond, such as -eq, -lt, and -gt; older versions have nothing similar.)

In most cases treating everything as a text string is exactly what you want anyway and you won't notice any limitation. (In fact the consistency of almost all tests using the same sorts of RE makes using mod_rewrite a little less daunting.) But what if something was intended to represent a number? Suppose we have VAR1=1, VAR2=01, VAR3=1.0, and VAR4=01.00. Humans will immediately recognize all these as different ways to write the same number. But as mod_rewrite stores and tests them as text strings, it will say they're all different.

If the desire is to treat them all as the same number, how can that be done? One way is by establishing a canonical form for numbers and converting all potential numbers to that form (i.e. canonicalizing them) before doing any tests.

A typical canonical form for integers is to eliminate all leading zeros and eliminate the decimal point if everything after it is zero or blank (if not, it's not an integer anyway). Here's an example that canonicalizes all the example variables.

# strip leading zeros and an all-zero fractional part
# (the |0 alternative lets a value of zero canonicalize to 0)
RewriteCond %{ENV:VAR1} ^0*([1-9]\d*|0)(?:\.0*)?$
RewriteRule ^ - [E=VAR1:%1]
#
RewriteCond %{ENV:VAR2} ^0*([1-9]\d*|0)(?:\.0*)?$
RewriteRule ^ - [E=VAR2:%1]
#
RewriteCond %{ENV:VAR3} ^0*([1-9]\d*|0)(?:\.0*)?$
RewriteRule ^ - [E=VAR3:%1]
#
RewriteCond %{ENV:VAR4} ^0*([1-9]\d*|0)(?:\.0*)?$
RewriteRule ^ - [E=VAR4:%1]
#
# now VAR1=VAR2=VAR3=VAR4=1
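To try a canonicalizing pattern outside Apache, here's a small Python sketch using Python's re module, whose syntax for this pattern is the same as PCRE's (canonicalize is a hypothetical name). It uses ^0*([1-9]\d*|0)(?:\.0*)?$: the extra |0 alternative covers a value of zero, which a pattern requiring [1-9] as the first significant digit would never match, so 0, 00 or 0.0 would be left unconverted.

```python
import re

# Leading zeros are dropped, and so is a fractional part made only of zeros.
# The "|0" alternative lets a value of zero canonicalize to "0".
CANON_INT = re.compile(r"^0*([1-9]\d*|0)(?:\.0*)?$", re.ASCII)


def canonicalize(value: str) -> str:
    """Return the canonical integer form of value.

    If the pattern doesn't match (the value isn't an integer), the value is
    returned unchanged, just as the [E=...] in the RewriteRule is never
    applied when its RewriteCond fails.
    """
    m = CANON_INT.match(value)
    return m.group(1) if m else value


for raw in ["1", "01", "1.0", "01.00", "0", "00.0", "1.5"]:
    print(raw, "->", canonicalize(raw))
```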

Which Variant of Regular Expressions?

In mod_rewrite regular expressions are not limited to the simplistic grep flavor (nor even the extended egrep flavor). Rather, regular expressions in mod_rewrite support the full PERLish/PCRE flavor, including lookahead, lookbehind, non-capturing parentheses, character class shorthand, and both lazy (non-greedy) and possessive quantifiers. While the ability to construct very complex REs can be useful, the power need not be used; your REs can be very simple. Or to put it another way, you don't need to be an RE-nerd to use mod_rewrite.

PERLish/PCRE regular expressions include a few character shorthands that can allow REs to be written much more simply and much shorter. For example \w is equivalent to [_a-zA-Z0-9] (and \W is everything else), so \w+ is a word of letters, digits, and underscore (but not dash or period). As an example of their use, consider the input string abc_xyz. The regular expression ^\w+$ will get the whole thing, treating it all as a single word. \w+\b\w+ though will not match at all, as abc and xyz are not seen as separate words since they are connected by an underscore.
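The same \w behavior can be checked with Python's re module. This is a sketch that assumes re.ASCII mode is a fair stand-in for PCRE's default (non-Unicode) \w:

```python
import re

# re.ASCII restricts \w to [_a-zA-Z0-9], as described above.
whole_word = re.compile(r"^\w+$", re.ASCII)
two_words = re.compile(r"\w+\b\w+", re.ASCII)

print(bool(whole_word.search("abc_xyz")))  # True: the whole string is one word
print(bool(two_words.search("abc_xyz")))   # False: no word boundary inside
print(bool(two_words.search("abc xyz")))   # False: \w+\b\w+ can never match anything
print(bool(whole_word.search("abc-xyz")))  # False: a dash is not a word character
```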

Match Only What Matters

The general RE rule of thumb of matching only what matters -rather than the whole string- applies in mod_rewrite too. Don't fall victim to the mistaken idea that an RE is supposed to describe the entire string from beginning to end; it's most definitely not. Match only the parts that (a) you might modify or reference, or (b) add relevant specificity so the RE doesn't match unrelated things too.

Matching parts of a string that don't matter is never useful. It makes the task of coding the mod_rewrite instructions longer and trickier. It degrades performance unnecessarily. And most importantly it considerably increases the chances of errors. So don't do it. (An RE that begins .* ... or ^.* ... is almost always a red flag that something has gone awry.)

Best Throwaway Match

Sometimes you want a RewriteRule to not change (redirect) the target at all. For example you may wish to do nothing more than set up an environment variable:

RewriteCond %{REQUEST_METHOD} GET
RewriteRule ... ... [E=ISGET:true,S=1]
RewriteRule ... ... [E=ISGET:false]

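How exactly should the before and the after be coded in the null RewriteRule? It's fairly plain that an unchanged after should be specified with a single dash (-). But the before is a Regular Expression; what's the best way to specify a null RE? The best choice is simply ^ (start of string): it matches every possible input, even an empty one, and it succeeds immediately without scanning any characters (unlike .* or ^.*$).

So when completed the above example should look like this: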

RewriteCond %{REQUEST_METHOD} GET
RewriteRule ^ - [E=ISGET:true,S=1]
RewriteRule ^ - [E=ISGET:false]

White Space

You can use the character class shorthand \s indiscriminately without worrying about it inadvertently running on into the next line of text, since in all cases only one line (one string) at a time is tested.

If you have a literal blank character in an RE, it will screw up parsing of that line so that the wrong string is sent to the RE engine. As an example, abc xyz is seen not as a single RE but rather as two fields, the RE abc and something else that looks like xyz. Usually you can take care of this by simply using \s rather than the literal blank (i.e. abc\sxyz). If you can't do that (or if it's too unclear), an alternative is to enclose the whole thing in quotes, like this: "abc xyz". Everything goes inside the quote marks, even not ("!abc xyz") and start of line ("^abc xyz" or "!^abc xyz") and end of line ("abc xyz$" or "!abc xyz$").

If your RE includes both white space [ ] and one or more double quote marks ["], use single quote marks ['] instead. Likewise if your RE includes both white space [ ] and one or more single quote marks ['], use double quote marks ["] instead.
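A simplified Python model of this field splitting can help when a pattern misbehaves. It's only a sketch under stated assumptions (split_fields is a hypothetical name, and this is not Apache's actual parser): blanks separate fields, and a field that begins with a double or single quote runs to the matching closing quote, which is removed.

```python
def split_fields(line: str) -> list[str]:
    """Split a directive line into fields (simplified model).

    Blanks separate fields.  A field starting with " or ' runs to the
    matching closing quote (the quotes are dropped), so it may contain
    blanks and the other kind of quote mark.
    """
    fields = []
    i, n = 0, len(line)
    while i < n:
        if line[i].isspace():
            i += 1
            continue
        if line[i] in "\"'":
            quote = line[i]
            end = line.find(quote, i + 1)
            if end == -1:             # unterminated quote: take the rest
                end = n
            fields.append(line[i + 1:end])
            i = end + 1
        else:
            start = i
            while i < n and not line[i].isspace():
                i += 1
            fields.append(line[start:i])
    return fields


print(split_fields('RewriteCond %{HTTP_USER_AGENT} abc xyz'))      # 4 fields: RE split in two
print(split_fields('RewriteCond %{HTTP_USER_AGENT} "!^abc xyz"'))  # 3 fields: RE kept whole
```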

Capturing Parts of Input to Use in Output

Normal parens specify parts of the text that should be captured in the $1-$2-$3-... variables (or %1-%2-%3-... for RewriteCond). As in all REs, capture variable numbers are assigned according to the order in which opening capturing parens are encountered (reading from left to right), which means in the case of nested capturing parens the outer capture will be assigned a lower number than the nested inner capture. The PERLish/PCRE lookahead/lookbehind/etc. sequences (i.e. anything that has a question mark immediately after the opening paren: (?...)) do not capture anything and do not use up a variable number (the one exception is a named group such as (?<name>...), which does capture and is numbered like a normal paren). For example, if the RE www(.*?)xxx(?:.*?)yyy(.*?)zzz is fed the name wwwabcxxxdefyyyghizzz then $1 will have the captured value abc and $2 will have the captured value ghi (and def will not be captured at all in any variable and will not be available subsequently).

The operation of capturing vs. non-capturing parentheses is mostly pretty crisp. There's a bit of murkiness when quantifiers are applied to capturing parentheses though. For example, if the RE is xxx(abc)*yyy and it's applied to xxxabcabcabcyyy, does $1 then contain (i) nothing, (ii) abc, or (iii) abcabcabc? (And making a bad situation even worse, not all implementations of mod_rewrite handle this exactly the same way.)

To avoid this problem, simply recast the RE with the quantifier applied to non-capturing parentheses, then put the whole thing -including the quantifier- inside a second set of parentheses that do the capturing. (To continue the above example, simply recast the RE as xxx((?:abc)*)yyy, after which there will be no question that $1 contains abcabcabc.)
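Python's re module numbers groups the same way, so it's a convenient place to try such patterns before putting them in .htaccess. This sketch checks the numbering example and the quantifier recast. Note the group consuming def has to be non-capturing (?:...) rather than a lookahead (?=...): a lookahead is zero-width, so with it the RE would require yyy immediately after xxx and would not match this input at all. (What Python reports for the repeated group is only Python's answer, not a promise about every mod_rewrite.)

```python
import re

s1 = "wwwabcxxxdefyyyghizzz"
m = re.search(r"www(.*?)xxx(?:.*?)yyy(.*?)zzz", s1)
print(m.group(1), m.group(2))   # abc ghi  ((?:...) uses up no number)

s2 = "xxxabcabcabcyyy"
m = re.search(r"xxx(abc)*yyy", s2)
print(m.group(1))               # abc  (Python keeps only the last repetition)

m = re.search(r"xxx((?:abc)*)yyy", s2)
print(m.group(1))               # abcabcabc  (no ambiguity)
```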

Logical Operators in Conditions

mod_rewrite supports in some form all the usual logical operators. AND is the implied connector between multiple RewriteCond lines. OR is another possible connector between multiple conditions. And a form of NOT that applies only to one individual line is available.

NOT's syntax is neither a separate statement nor a flag - rather it's an integral part of the condition. Any condition (even the one in RewriteRule) can be reversed by adding a bang/exclamation-point (!) to the front of the condition. The condition is evaluated, then the result is reversed just before being used. This is true of all kinds of conditions, including the ones that don't involve REs (for example !-f means the file does not exist).

Note that when ! is used, capturing parentheses should not be used. A negated condition succeeds exactly when its RE did not match, so there is nothing to capture: after a statement like RewriteCond foobar !(abc).*(xyz), %1 and %2 probably do not contain abc and xyz (or anything useful at all).

NOT can only be easily applied to individual conditions one at a time. (There is no straightforward way to apply NOT either to many conditions all at once, or to the net result of a group of conditions.)

OR can be specified two different ways. One way is separate RewriteCond ... statements (with the [OR] flag on all but the last one). The other way is a combined RE constructed with a vertical bar (the RE alternative marker |) like this: REa|REb, or like this: (?:REa|REb), or like this: (REa|REb). The RE way is simpler and faster. But the way with the separate lines has some distinct advantages: it allows different conditions to test against different variables (the RE way performs all the tests against the same variable), and it's more easily maintained since lines can be inserted or deleted without affecting any of the other REs.

mod_rewrite does not support any way (such as parentheses) to group logical operators. The exact order of evaluation and combination of conditions is fixed; there's nothing you can do to modify it.

Short Circuit Condition Evaluation

As in most modern computer languages, all logical tests in mod_rewrite are short-circuit. If several conditions are connected by AND, mod_rewrite will stop right after the first one that's false and say the whole series is false, and never even test the rest of the conditions. If several conditions are connected by OR, mod_rewrite will stop right after the first one in the series that's true and say the whole series is true, and never even test the rest of the conditions.

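This does more than just improve performance. It means you can be certain that the %n variables were set by the first RewriteCond ... ... [OR] in the sequence that matched, since the conditions after it in that sequence are never tested. (Keep in mind %n always comes from the last matched RewriteCond, so a later matching RewriteCond in the same ruleset will replace them.)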

Sequences ([OR] and [C]hain)

Either an [OR] or a [C]hain sequence is a series of statements tied together and handled as one block, with each individual statement being either executed or skipped according to the results of the others. (Despite involving more than one statement, the [S=n]kip flag does not indicate a sequence.) The [OR] makes sense only on RewriteCond statements (in a single ruleset). The [C]hain flag makes sense only on RewriteRule statements (thus tying together several whole rulesets). Either specifying [OR] on a RewriteRule or [C]hain on a RewriteCond makes no sense, and may even cause strange behavior.

Both kinds of sequence are composed of the series of statements that contain the flag plus one more statement; the last statement in the sequence does not contain the flag. mod_rewrite is not very forgiving if you don't follow this syntactical requirement. For example, if the very last RewriteCond inadvertently also contains an [OR] flag, that final group of OR'ed conditions can never fail, so the corresponding RewriteRule will be executed even when none of those conditions match. And if a flag is missing in the middle of what was intended to be one sequence, the statements will behave as two separate sequences.

This can easily lead to brittle code and maintenance headaches. Adding or removing or shuffling statements may inadvertently (a) include an extra unrelated statement in a chain, or (b) break a chain into two unsynchronized chains, or (c) concatenate two existing chains into just one, or even (d) as noted above cause basic processing logic to malfunction. It's fairly simple to avoid all these sorts of problems initially. But problems can easily creep in unnoticed later when maintenance moves a few statements around.

I don't know any simple way to avoid these problems. All you can do is every time statements are added or deleted or moved, check your [OR] and [C]hain flags all over again to be sure all the sequences are what you intended.
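A small script can at least catch the purely mechanical mistakes. Here's a rough Python sketch of such a check (check_sequences is a hypothetical helper, and it assumes flags are the last whitespace-separated [...] field on a line, ignoring quoting). It reports an [OR] on the last RewriteCond of a ruleset, [OR] on a RewriteRule, [C] on a RewriteCond, and a [C] with no following RewriteRule; it can't tell whether two chains were accidentally merged or split, because that depends on intent that isn't in the file.

```python
import re

# Flags are assumed to be the last field on the line, written as [ ... ]
# and preceded by whitespace.
FLAGS_RE = re.compile(r"\s\[([^\]]*)\]\s*$")


def flags_of(line):
    """Return the set of upper-cased flag names, e.g. {'OR'} or {'E', 'C'}."""
    m = FLAGS_RE.search(line)
    if not m:
        return set()
    return {f.strip().split("=")[0].upper() for f in m.group(1).split(",")}


def directive_of(line):
    return line.split()[0].lower()


def check_sequences(lines):
    """Return (line_number, message) pairs for suspicious [OR]/[C] usage."""
    # Keep only directive lines, remembering their 1-based line numbers.
    stmts = [(num, text.strip()) for num, text in enumerate(lines, 1)
             if text.strip() and not text.strip().startswith("#")]
    problems = []
    for idx, (num, line) in enumerate(stmts):
        directive = directive_of(line)
        flags = flags_of(line)
        has_or = bool(flags & {"OR", "ORNEXT"})
        has_chain = bool(flags & {"C", "CHAIN"})
        following = [directive_of(text) for _, text in stmts[idx + 1:]]
        if directive == "rewritecond" and has_or and following[:1] != ["rewritecond"]:
            problems.append((num, "[OR] on the last RewriteCond of a ruleset"))
        if directive == "rewriterule" and has_or:
            problems.append((num, "[OR] on a RewriteRule"))
        if directive == "rewritecond" and has_chain:
            problems.append((num, "[C] on a RewriteCond"))
        if directive == "rewriterule" and has_chain and "rewriterule" not in following:
            problems.append((num, "[C] on the last RewriteRule: nothing to chain to"))
    return problems


print(check_sequences(["RewriteCond a [OR]", "RewriteCond b [OR]", "RewriteRule ^ - [L]"]))
```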

Simple Rulesets

A ruleset is one RewriteRule, optionally preceded by one or more RewriteCond(s). The simplest ruleset is one where the RewriteConds are either all connected by AND or all connected by OR.

Elaborate Rulesets

Rulesets can depend critically both on the pattern in the RewriteRule line and on the conditions in the RewriteCond lines. The RewriteRule pattern is actually matched first, before any RewriteCond is evaluated, and that pre-execution sets the $n variables so they can then be used by the RewriteCond lines. (Likewise the RewriteCond lines set the %n variables, which can then be used by the second execution of the RewriteRule, i.e. its substitution.) Also the RewriteConds may be linked by a combination of ANDs and ORs (rather than just all AND or all OR). All these more elaborate rulesets are trickier.

Precedence

Just being concerned with exactly how RewriteConds with both some ANDs and some ORs are combined suggests at least a moderately complex use of mod_rewrite. If you just want to keep it real simple, you may prefer this introduction to modifying URIs in Apache.

The way mod_rewrite implements the logic combining multiple conditions is a bit different from conventional operator precedence. Conditions are processed strictly in physical order. So long as the running result is still true, it is combined with the next condition according to whether the current condition defaults to AND or is explicitly marked [OR]. If the current condition was AND and the result is still true, move ahead. But if the condition was AND and the result just became false, quit and report false as the final answer. If the condition was OR and the result is still true, skip over one or more conditions all the way past the end of that OR sequence, then continue processing. But if the condition was OR and the result just became false, back out the false and instead try the next condition in that OR sequence (and if necessary keep trying all the way to the end of that OR sequence). If you reach the end of the OR sequence and every single result has been false, quit and report false as the final answer.

When you reach the end of all the conditions, report true as the final answer. (Because of the way the evaluation works, if the result would have been false, you'd never have gotten all the way to the end; the fact that you got all the way to the end implies the result is still true.)

mod_rewrite never reorders any evaluations according to operator precedence (no scan ahead, no multiple passes, no yacc-style parsing, no stack, etc.). So one could say there's no precedence in the conventional sense (although the net behavior -so long as there is no nesting- is similar to OR binding more tightly than AND). The net result of the above procedure completely ignores any parentheses you may have intended. mod_rewrite instead simply processes the RewriteConds strictly in physical order (top to bottom, or left to right). In all but the simplest case of all the logic at the top level with no nesting, correctly interpreting the boolean logic in a ruleset is so confusing and error prone that it may be better to simply avoid it. Mentally check that any proposed mod_rewrite ruleset actually does what you intend in all cases.

The actual implementation of combining logic works well in simple cases, providing a good simulation of OR binding more tightly than AND so long as the logic is all at the top level with no nesting. In more complex cases, although the behavior is well-defined and repeatable, it might not actually be quite what you think it's going to be. It's probably best to simply avoid such logic completely.

(Note I'm saying this is what it actually does, not anything one way or the other about what it should do. Although there may [or may not] be good reasons for it to be the way it is, that's not our concern here.)

As an example, here's a moderately complex ruleset along with the corresponding conventionally written boolean expression:

RewriteCond a [OR]
RewriteCond b
RewriteCond c [OR]
RewriteCond d
RewriteCond e [OR]
RewriteCond f [OR]
RewriteCond g
RewriteRule ...something

the corresponding conventionally written boolean statement is (a OR b) AND (c OR d) AND (e OR f OR g).
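The procedure can be captured in a short Python model. This is a sketch of the behavior as described here, not a transcription of Apache's source; eval_conditions and example_ruleset are hypothetical names. It checks the example ruleset against the conventional boolean expression for all 128 combinations of inputs:

```python
from itertools import product


def eval_conditions(conds):
    """Combine a ruleset's RewriteConds using the procedure described above.

    conds: list of (test, has_or_flag) pairs in physical order, where test
    is a zero-argument callable returning True or False.  Conditions that
    the procedure skips over are never called (short-circuit evaluation).
    """
    i, n = 0, len(conds)
    while i < n:
        test, has_or = conds[i]
        result = test()
        if has_or:
            if not result:
                i += 1              # false: try the next member of this OR sequence
                continue
            while i < n and conds[i][1]:
                i += 1              # true: skip the remaining [OR] members...
            # ...and the final member (the one without [OR]) via the i += 1 below
        elif not result:
            return False            # an AND-connected condition failed: stop now
        i += 1
    return True                     # got to the end, so the conditions hold


def example_ruleset(a, b, c, d, e, f, g):
    """RewriteCond a [OR] / b / c [OR] / d / e [OR] / f [OR] / g"""
    or_flags = [True, False, True, False, True, True, False]
    values = [a, b, c, d, e, f, g]
    return eval_conditions([(lambda v=v: v, flag) for v, flag in zip(values, or_flags)])


# Exhaustively compare with (a OR b) AND (c OR d) AND (e OR f OR g).
for bits in product([False, True], repeat=7):
    a, b, c, d, e, f, g = bits
    assert example_ruleset(*bits) == ((a or b) and (c or d) and (e or f or g))
print("all 128 combinations agree")
```

The same model reproduces the trailing-[OR] pitfall: eval_conditions([(lambda: False, True)]) returns True even though the only condition is false.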

DeMorgan's Rules

When what you want to do doesn't fit the simple case, the first thing to try is rewriting what you really want into a simpler ruleset by using DeMorgan's rules. These make the expression look completely different, yet produce exactly the same result. Fully written out they are:

# - DeMorgan's Rules -
#
# 1) NOT (a AND b) == (NOT a) OR (NOT b)
#
# 2) NOT (a OR b) == (NOT a) AND (NOT b)

Identify those portions of the ruleset logic that are causing problems, and use DeMorgan's rules to rewrite just those sub-expressions of the corresponding boolean expression. Rewriting of lowest-level sub-expressions is simplest as it has no impact on any other part or level of the expression. Rewriting of higher-level sub-expressions of a complex expression is more difficult to do accurately as the contained lower-level sub-expressions need to be negated/reversed, and doing so may require applying DeMorgan's rules to those sub-sub-expressions as well.

Expressing DeMorgan's rules as a rote recipe rather than boolean equations gives:

  1. reverse (negate) both lower-level tests

  2. also reverse (negate) the parenthesized result

  3. finally swap the AND to an OR (or the OR to an AND)

NOTs on the lowest-level tests are simply a matter of adding or deleting !. An outer NOT can often be implemented with a clever change to the RewriteRule action; but sometimes [S=1]kip is required. (A [S=1]kip with parameter always and only 1 is pretty straightforward and does not risk becoming a maintenance headache.)

Rewriting the entire boolean expression all at once with DeMorgan's rules is also possible. But it may not be useful, as the pendulum may swing too far, resulting in another expression that's equally complex and difficult to express as a mod_rewrite ruleset.

For what it's worth, here's the recipe for using DeMorgan's rules to rewrite all levels of an entire boolean expression all at once:

  1. be sure all possible parentheses are written explicitly, so evaluating the expression never requires using any rule or assumption about precedence

  2. reverse (negate) all lowest-level tests

  3. also reverse (negate) the outermost parenthesized result

  4. finally swap all the ANDs to ORs and ORs to ANDs
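As a quick sanity check of the recipe, here's a Python sketch that applies it to (a AND b) OR (c AND d), the expression used as an example in the next section, and compares the two forms on all 16 input combinations (original and rewritten are just illustrative names):

```python
from itertools import product


def original(a, b, c, d):
    return (a and b) or (c and d)


def rewritten(a, b, c, d):
    # Steps 2 and 4: negate every lowest-level test and swap AND <-> OR ...
    inner = ((not a) or (not b)) and ((not c) or (not d))
    # ... step 3: negate the outermost result.
    return not inner


mismatches = [v for v in product([False, True], repeat=4)
              if original(*v) != rewritten(*v)]
print("mismatches:", mismatches)   # mismatches: []
```

The inner expression has the all-top-level shape mod_rewrite handles directly (RewriteCond !a [OR], RewriteCond !b, RewriteCond !c [OR], RewriteCond !d), and the outer NOT can then be supplied by a RewriteRule ^ - [S=1] placed just ahead of the rule that does the real work.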

Interconnected Rulesets

If your logic is more complicated than what you can transform into a simple case (with DeMorgan's rules or some other way), maybe you just plain shouldn't be doing it. But if you want to keep going anyway, there are ways to handle even more complex logic.

As there is no way with mod_rewrite to group anything or otherwise explicitly specify where parentheses go, you will likely have to break your logic into several separate rulesets interconnected by intermediate environment variables. For example if you want to implement (a AND b) OR (c AND d) (and for some reason can't use DeMorgan's rules), do it like this:

# (it's assumed variables FIRSTPART and SECONDPART have never been used
# before and so do not have any values and can be interpreted as false
# - if this is not the case, you may need to explicitly set them to false
# before beginning these rulesets)
#
RewriteCond a
RewriteCond b
RewriteRule ^ - [E=FIRSTPART:true]
#
RewriteCond c
RewriteCond d
RewriteRule ^ - [E=SECONDPART:true]
#
RewriteCond %{ENV:FIRSTPART} true [OR]
RewriteCond %{ENV:SECONDPART} true
RewriteRule ...something

Alternate Matches

Another Elaborate Ruleset structure reduces the amount of typing, keeps REs from becoming so ridiculously complex, and reduces the chances maintenance will introduce an error. That structure is Alternate Matches. As always conditions are processed in order, each one looking for an RE match. And because everything is an [OR] sequence, the first condition that matches will stop processing of any further conditions, and those %n variables will be passed to the RewriteRule statement at the end of the ruleset. If it's arranged that all of the conditions produce the exact same %n variables, an Alternate Match ruleset will be formed and it can be quite useful. Here's an example of an Alternate Match ruleset:

# alternative #1: URI may be in the website root
# i.e. /bazfile.ext
RewriteCond %{REQUEST_URI} ^/*+()([^/]*)$ [OR]
#
# or alternative #2: URI may include a path
# i.e. /foopath/barpath/bazfile.ext
RewriteCond %{REQUEST_URI} ^/*+(.*?/)/*+([^/]*)$
#
# Now take the same action for both alternatives
# without having to consider which one matched
RewriteRule ^ - [E=PATHNAME:%1,E=FILENAME:%2]
#
# Net result:
# If the URI contained no path part
# environment variable PATHNAME will be empty
# If the URI ended with a slash
# environment variable FILENAME will be empty
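Here's a Python sketch of the same alternate-match idea, modeling the [OR] sequence as "try each pattern in order and keep the groups of the first one that matches" (split_uri is a hypothetical name). Python's re only supports possessive quantifiers from version 3.11, so the sketch uses plain /*, which gives the same results for the sample URIs shown:

```python
import re

# The two [OR]'ed RewriteCond patterns from the example, in order
# (plain /* instead of the possessive /*+).
ALTERNATIVES = [
    re.compile(r"^/*()([^/]*)$"),         # alternative #1: file in the root
    re.compile(r"^/*(.*?/)/*([^/]*)$"),   # alternative #2: URI includes a path
]


def split_uri(uri):
    """Return (PATHNAME, FILENAME) taken from the first pattern that matches,
    the way an [OR] sequence hands %1 and %2 on to the RewriteRule."""
    for pattern in ALTERNATIVES:
        m = pattern.match(uri)
        if m:
            return m.group(1), m.group(2)
    return None


for uri in ["/bazfile.ext", "/foopath/barpath/bazfile.ext", "/foopath/", "/"]:
    print(repr(uri), "->", split_uri(uri))
```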

(A couple things to keep in mind when constructing these: (i) all special RE sequences (?... are non-capturing (named groups such as (?<name>...) being the one exception), so they never use up a %n number, and (ii) it's valid to use an empty capture (); an empty capture will use up a %n number and that %n will always have a [defined] empty value after a match.)

(In the example, /*+ uses a possessive quantifier, which behaves like an atomic group [i.e. one which will not backtrack no matter what]. It's used purely to make these REs perform better and work right even in some unusual cases; it has nothing to do with the main thrust of the example [alternate matches].)

(Even though the above example shows how to do it, beware this sort of complex construct, as it is a yellow flag that something may be awry and it may be prudent to search for a simpler way.)

Special Case Ruleset Structures

Default Values

A common need is to supply a variable (for example VARNAME) with a default value if it isn't already set. (Although something similar to unix/linux shell syntax ${VARNAME:=defaultvalue} seems like a perfect solution, nothing similar is supported by mod_rewrite.) Instead, defaulting a variable in mod_rewrite is sometimes conceptualized as a degenerate case of if-then-else with only one statement in each branch. It can be written for mod_rewrite fairly straightforwardly as follows:

RewriteCond any
RewriteCond more-any
RewriteRule ^ - [E=VARNAME:newvalue,S=1]
RewriteRule ^ - [E=VARNAME:defaultvalue]

Because the [S=1]kip value is always one, and because there's no obvious way to expand either statement, this particular construct is unlikely to become a maintenance issue.

(Giving a variable a default value could also be written for mod_rewrite as follows, where the default value is always set but then overridden with a specific value if necessary. Performance of the first method is very slightly better, as the default value is set only when necessary rather than every time.)

RewriteRule ^ - [E=VARNAME:defaultvalue]
RewriteCond any
RewriteCond more-any
RewriteRule ^ - [E=VARNAME:newvalue]

Comparing Two Variables

Testing whether or not a variable has a particular value is straightforward - in fact that's most of what mod_rewrite instructions do. RewriteCond basically contains the variable name in its second column and in its third column the constant value we're testing for (expressed as an RE).

But what if we want to test one variable against another variable (not against a constant value we typed)? This is much harder, because RewriteCond only replaces variable names with their values in its second column, but not in its third column. So in order to get the values of both variables we have to put them both in the second column, and something else (but what?) has to go in the third column.

Although given how RewriteCond works this comparing of two variables might seem practically impossible, it's actually fairly simple once you get the strange-looking RE that goes in the third column. The functionality depends on an infrequently used RE feature called a backreference (\1 matches exactly the text already captured by the first capturing parentheses in the same RE).

Here it is: mod_rewrite instructions to check if VARA and VARB are the same.

# VARA and VARB both have (possibly empty) values
# We test to see if they have the _same_value
#
RewriteCond %{ENV:VARA}==%{ENV:VARB} ^(.*?)==\1$
RewriteRule ^ ...dosomething
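The trick is easy to try outside Apache. This Python sketch (same_value is a hypothetical name) builds the same VARA==VARB string and applies the same RE:

```python
import re

# Same RE as the RewriteCond: whatever is captured before the "==" separator
# must appear again, exactly, after it (\1 is the backreference).
SAME_VALUE = re.compile(r"^(.*?)==\1$")


def same_value(vara: str, varb: str) -> bool:
    """Mimic: RewriteCond %{ENV:VARA}==%{ENV:VARB} ^(.*?)==\\1$"""
    return SAME_VALUE.search(f"{vara}=={varb}") is not None


print(same_value("abc", "abc"))  # True
print(same_value("abc", "abd"))  # False
print(same_value("", ""))        # True: two empty values also count as equal
print(same_value("", "=="))      # True! the separator appears inside a value
```

The last case shows why the choice of separator matters: a value containing == can produce a false match.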

You could expand this slightly to do the action only if the variables are both the same and non-empty by adding one more line, as follows:

# VARA and VARB both have (possibly empty) values
# We test to see if they have the _same_non-empty value
#
RewriteCond %{ENV:VARA}==%{ENV:VARB} ^(.*?)==\1$
RewriteCond %{ENV:VARA} !^$
RewriteRule ^ ...dosomething

(The example assumes the magic string we've chosen to separate the two variables [== in this case] will never be contained in any of the possible values. If it might occur in a value, then we would need to re-select a different magic string, and change both the RewriteCond and its RE slightly to reflect this.)

Reversal of Result of a Whole Set of Conditions

Suppose you've drafted a ruleset like this

RewriteCond a
RewriteCond b
RewriteCond c
RewriteRule ...something

but it isn't quite right yet, as you want the rule to be executed if the net result of all the conditions is not true. You can always do this with logic manipulations that include DeMorgan's rules. But such manipulated rulesets can sometimes be unclear, error prone, and hard to maintain.

An alternative that reverses the action is as follows:

RewriteCond a
RewriteCond b
RewriteCond c
RewriteRule ^ - [S=1]
RewriteRule ...something

If the net result of all the conditions is true, the [S=1] RewriteRule will skip over the following RewriteRule that would actually perform the action. Only if the net result of all the conditions is false will the RewriteRule that actually performs the action be reached.

Multi-rulesets

Interconnecting multiple rulesets to implement very complex logic is tricky and error-prone (but possible). There are no control structures in mod_rewrite. mod_rewrite instructions are not really a procedural language; they're more like a specification language. Before proceeding, check again that no more straightforward solution is possible and you really do need to use very complex logic with mod_rewrite.

There are several strategies (such as those described above) for getting by with few or no interconnected rulesets.

If you really do want to go ahead and implement quite complex logic, use custom environment variables to communicate between the rulesets in a group. Have all rulesets in the group except the final one set those environment variables as their only action; do not try to make them do anything further. Then have the final ruleset in the group make a decision by checking and combining all the custom environment variables, then depending on that decision possibly invoke the significant action.

Here's a trivial example that tries to tell spiders indexing images that all the "internal" pictures -which are just for our own use and should not appear in any index- are "gone" (return code 410):

# (To keep this example simpler and focused on multi-rulesets,
# all the variables are _assumed_ to be cleared.)
#
RewriteCond %{HTTP_USER_AGENT} bot|spider [NC]
RewriteRule ^ - [E=ISBOT:true]
#
RewriteCond %{ENV:ISBOT} true
RewriteCond %{HTTP_USER_AGENT} Image|Media [NC]
RewriteRule ^ - [E=ISIMAGEBOT:true]
#
RewriteCond %{REQUEST_URI} \.(?:jpe?g|gif|png|tiff?)$ [NC]
RewriteRule ^ - [E=ISPICTURE:true]
#
RewriteRule \binternal|internal\b - [E=ISINTERNAL:true]
#
# final decision: act only if all three flags are set
RewriteCond %{ENV:ISIMAGEBOT} true
RewriteCond %{ENV:ISPICTURE} true
RewriteCond %{ENV:ISINTERNAL} true
RewriteRule ^ - [G]
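The shape of this logic (several independent flags, one final decision) can be sketched in Python with booleans standing in for the environment variables. decide is a hypothetical name, it assumes the .htaccess sits in the document root, and the user-agent strings below are just sample values:

```python
import re


def decide(user_agent: str, request_uri: str) -> int:
    """Return 410 ("gone") or 200, combining flags like the rulesets do.

    Each boolean stands in for one custom environment variable; only the
    final step looks at all of them and decides.
    """
    is_bot = re.search(r"bot|spider", user_agent, re.IGNORECASE) is not None
    is_image_bot = is_bot and re.search(r"Image|Media", user_agent, re.IGNORECASE) is not None
    is_picture = re.search(r"\.(?:jpe?g|gif|png|tiff?)$", request_uri, re.IGNORECASE) is not None
    # Assumption: .htaccess in the document root, so the path a RewriteRule
    # sees is the URI without its leading slash.
    path = request_uri.lstrip("/")
    is_internal = re.search(r"\binternal|internal\b", path) is not None
    # Final ruleset: act only when every flag is set.
    return 410 if (is_image_bot and is_picture and is_internal) else 200


print(decide("Googlebot-Image/1.0", "/internal/photo.jpg"))  # 410
print(decide("Mozilla/5.0", "/internal/photo.jpg"))          # 200
```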

Frequent Questions About mod_rewrite

Here are descriptions of the mod_rewrite issues that are asked about most often.

(Some of these summary descriptions reiterate some of the information already discussed above:-)

Q#1: What Boilerplate Code Should Go in Every mod_rewrite .htaccess File?

Always insert the following (even though doing so may feel silly as it seems these sorts of things really should be the default, even though some of it is not really necessary in many cases, and even though some of it may look just plain wrong). Doing so will often save you a great deal of grief.

Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteCond %{ENV:REDIRECT_STATUS} \d\d\d [OR]
RewriteCond %{REQUEST_FILENAME}==%{ENV:SAVED_REQUEST_FILENAME} ^(.*?)==\1$ [OR]
RewriteCond %{REQUEST_URI} ^.{300}
RewriteRule ^ - [L]
RewriteRule ^ - [E=SAVED_REQUEST_FILENAME:%{REQUEST_FILENAME}]

(A detailed explanation of what the last two lines do and how they do it is available above.)

You also have the option of adding further sanity checks near the top; for example, if you know all URIs on your site have reasonable lengths, summarily reject extremely long requests.

Q#2: Are There Logical Structures in mod_rewrite?

Most everybody looks for some way to implement familiar logical procedural programming structures in mod_rewrite. (The most frequently sought logical structure is if-then-else.)

The new way to do it (<If>, <ElseIf>, <Else>) may not be available to you, as it is only provided by Apache 2.4 and later, which is not installed on every system.

There is not really any old way to do it; with mod_rewrite there is simply no predefined logical structure no matter what. (In particular there is no if-then or if-then-else construct.)

Equivalent behavior can be created with a wide variety of tricks (many of which are described above): rearrangement of tests, rewriting of tests (probably using DeMorgan's rules), duplication of tests, intermediate variables, immediate exit triggered by the [L]ast (or [END]) flag, even clever uses of [C]haining. When nothing else will work, there's always [S=n]kipping. The [S=n]kip flag is the universal fallback. It will always work ...but using that very manual procedure is risky because it's so easy to not keep the count perfectly accurate.

Q#3: What's the Precedence of AND and OR in mod_rewrite Conditions?

Because it's so tightly intertwined with other issues, the thorough discussion of precedence is above rather than being presented here in isolation.

Q#4: How Can the [S=n]kip Flag be Used?

The [S=n]kip flag can be used to implement behavior equivalent to a wide variety of logical constructs. As perhaps the most important case, [S=n]kip can be used to implement the equivalent of an if-then-else construct, even if both branches are long and even if there are nested conditions. It's done as follows:

...
# <if>
RewriteCond any
RewriteCond more-any
## skip to true branch ([S] always 1)
RewriteRule ^ - [S=1]
## skip to false branch
## ([S=n] *MUST* be adjusted to always exactly match
## the number of RewriteRule lines in the true branch,
## including the [S=2] skip at its end)
RewriteRule ^ - [S=3]
RewriteCond any-2
RewriteRule ...dosomething-2
RewriteCond any-3
RewriteCond more-any-3
RewriteRule ...dosomething-3
## skip over false branch
## ([S=n] *MUST* be adjusted to always exactly match
## the number of RewriteRule lines in the false branch)
RewriteRule ^ - [S=2]
# </if>
#
# <else>
RewriteCond any-4
RewriteRule ...dosomething-4
RewriteCond any-5
RewriteCond more-any-5
RewriteRule ...dosomething-5
# </else>
...

This always works and can be used without restriction. However, as the [S=n]kip values currently must be generated by manual counting, and as there's no way to guarantee (or even check) their correctness, it can easily become a maintenance nightmare.

(Some sort of compiler or pre-processor that auto-generated all the [S=n]kip counts would make the use of mod_rewrite a whole lot easier and more robust. Unfortunately to my knowledge no such pre-processor exists [maybe you could create your own though].)
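As a starting point for such a pre-processor, here's a hypothetical Python helper (emit_if_else is an illustrative name, not an existing tool) that lays out the if-then-else pattern shown earlier and computes the [S=n] counts. It assumes [S=n] counts RewriteRule lines only (RewriteCond lines belong to the rule that follows them) and that the branch rulesets don't themselves use [S] or [C] to jump out of their branch.

```python
def is_rule(line):
    """True for a RewriteRule line (case-insensitive directive name)."""
    words = line.split()
    return bool(words) and words[0].lower() == "rewriterule"


def rule_count(rulesets):
    """[S=n] is assumed to count RewriteRule lines only."""
    return sum(1 for ruleset in rulesets for line in ruleset if is_rule(line))


def emit_if_else(if_conds, then_rulesets, else_rulesets):
    """Lay out an if-then-else with [S=n] skips, computing every count.

    if_conds: RewriteCond lines forming the test.
    then_rulesets, else_rulesets: lists of rulesets, each a list of lines
    (any RewriteCond lines followed by one RewriteRule).
    """
    else_len = rule_count(else_rulesets)
    # The true branch ends with a rule that skips the false branch, if any.
    then_len = rule_count(then_rulesets) + (1 if else_len else 0)
    lines = ["# <if>", *if_conds,
             "RewriteRule ^ - [S=1]",              # test true: skip into the true branch
             f"RewriteRule ^ - [S={then_len}]"]    # test false: skip the true branch
    for ruleset in then_rulesets:
        lines.extend(ruleset)
    if else_len:
        lines.append(f"RewriteRule ^ - [S={else_len}]")  # skip over the false branch
    lines += ["# </if>", "# <else>"]
    for ruleset in else_rulesets:
        lines.extend(ruleset)
    lines.append("# </else>")
    return lines


# The earlier example, with its counts generated instead of hand-written.
block = emit_if_else(
    ["RewriteCond any", "RewriteCond more-any"],
    [["RewriteCond any-2", "RewriteRule ...dosomething-2"],
     ["RewriteCond any-3", "RewriteCond more-any-3", "RewriteRule ...dosomething-3"]],
    [["RewriteCond any-4", "RewriteRule ...dosomething-4"],
     ["RewriteCond any-5", "RewriteCond more-any-5", "RewriteRule ...dosomething-5"]],
)
print("\n".join(block))
```

Regenerating the block after every edit keeps the counts correct by construction, which is the whole point of a pre-processor.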

Q#5: How Can I Make Good Use of mod_rewrite's [C]hain Flag?

Many large, complex .htaccess files do not use the [C]hain flag at all. The flag provides one more tool in the bag of tricks a very clever coder might use when trying to slightly shorten the logic and/or improve the performance. But it's often not worth the trouble, especially if you're not completely comfortable with its esoteric logic.

The [C]hain flag links several rulesets together so they follow each other immediately. A [C]hain sequence is somewhat like the reverse of an [OR] sequence, except for whole rulesets rather than individual conditionals. Execution proceeds onward to the next ruleset in the sequence until there is some unmet condition or problem (in either RewriteRule or RewriteCond), after which the entire remainder of the [C]hain sequence is completely skipped (not even testing any of their conditions).

The logic of a chained sequence is a little different from anything you've seen elsewhere. Any negative condition (either the pattern in the RewriteRule or the logical combination of the RewriteCond patterns) skips not only that ruleset but also all the rest of the rulesets in the [C]hain. This may sometimes be quite useful. However it doesn't quite correspond to any commonly encountered logical operation, and as you've probably had zero experience with anything like it, you may not be able to make good use of it.

A common way to make the [C]hain flag useful without wading too deeply into unfamiliar logic is to use it only to tie together pairs of rulesets, not trying to construct longer blocks. Here's an example of one such pair:

# vvv - Begin pair of rulesets tied together in [C]hain - vvv
RewriteCond %{QUERY_STRING} ^[\?&]*(.*?&)?(?<=^|&)name=([^&]*)&?((?!&).*)$ [NC]
RewriteRule ^(.*)$ $1/%2?%1%3 [E=NAMEMOVED:1,C]
# Now from the above we already know we need to do an R=301
# and have already substituted the new URI (and modified the query string).
# Do we also need to clean up the site name?
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
# ^^^ - End pair of rulesets tied together in [C]hain - ^^^
# This third ruleset is part of the same logic, but NOT part of the [C]hain.
# No site name cleanup needed, so we'll just do R=301 with the other information.
# Being outside the [C]hain, it's reached even when the first ruleset didn't
# match, so it tests the (illustratively named) NAMEMOVED flag set above
# to keep from redirecting every request to itself.
RewriteCond %{ENV:NAMEMOVED} =1
RewriteRule ^(.*)$ /$1 [R=301,L]

Chaining can sometimes also be used off-label as an aid in implementing familiar logic constructs, somewhat simplifying either rule creation or rule maintenance. For example, it can implement a form of automatic if-then ...but only if every ruleset in the [C]hain sequence (except the first, of course) evaluates to true. Given the input foo.html, this will execute all five rulesets (ruleset 1 rewrites foo.html to bar.html, which the RE in RewriteRule 3 then matches):

# <if>
RewriteRule foo bar.html [E=VAR1:1,C]
RewriteRule ^ - [E=VAR1:2,C]
RewriteRule bar - [E=VAR1:3,C]
RewriteRule ^ - [E=VAR1:4]
# </if>
RewriteRule ^ - [E=VAR1:5]

But it's awfully easy for a collection of rules like this to not behave exactly as you desired and expected. For example, given the very same input (foo.html), this almost-identical collection of rules will execute only rulesets 1, 2, and 5. Ruleset 1 has already rewritten the URI to bar.html, so the RE foo in RewriteRule 3 won't match; ruleset 3 won't be executed, and execution will then skip past the end of the [C]hain, so ruleset 4 won't even be considered.

# <if>
RewriteRule foo bar.html [E=VAR1:1,C]
RewriteRule ^ - [E=VAR1:2,C]
RewriteRule foo - [E=VAR1:3,C]
RewriteRule ^ - [E=VAR1:4]
# </if>
RewriteRule ^ - [E=VAR1:5]

Although the example below looks more sophisticated with its RewriteCond, it behaves exactly the same as the example above, for the same kind of reason: no real request method contains foobar, so ruleset 3 is false. (As always, mod_rewrite doesn't care whether it was the RewriteRule or the combination of RewriteConds that was false, only that the ruleset as a whole is false.)

# <if>
RewriteRule foo bar.html [E=VAR1:1,C]
RewriteRule ^ - [E=VAR1:2,C]
# vvv additional line vvv (Apache doesn't allow a comment on the same line as a directive)
RewriteCond %{REQUEST_METHOD} foobar
RewriteRule ^ - [E=VAR1:3,C]
RewriteRule ^ - [E=VAR1:4]
# </if>
RewriteRule ^ - [E=VAR1:5]
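To make the skipping in these three examples concrete, here's a deliberately simplified Python model of how one pass steps through rulesets flagged with [C] and [S=n]. It's an illustration under stated assumptions, not mod_rewrite's actual code: the Ruleset, run, and example names are hypothetical, each ruleset's RewriteCond result is reduced to a single conds_ok boolean, a substitution replaces the whole URI with no back-references, and [L], redirects, and per-directory re-processing are ignored.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Ruleset:
    """Simplified stand-in for one ruleset: RewriteConds plus a RewriteRule."""
    pattern: str              # RewriteRule regex, searched (not anchored) like mod_rewrite
    substitution: str = "-"   # "-" leaves the URI alone; no $N/%N back-references here
    conds_ok: bool = True     # combined result of the ruleset's RewriteConds
    chain: bool = False       # [C]
    skip: int = 0             # [S=n]


def run(rulesets: List[Ruleset], uri: str) -> List[int]:
    """Return the 1-based numbers of the rulesets whose actions were executed."""
    executed: List[int] = []
    i = 0
    while i < len(rulesets):
        rs = rulesets[i]
        if re.search(rs.pattern, uri) and rs.conds_ok:
            if rs.substitution != "-":
                uri = rs.substitution          # later rulesets see the rewritten URI
            executed.append(i + 1)
            i += 1 + rs.skip                   # [S=n] hops over the next n rulesets
        else:
            # A false ruleset skips every ruleset chained after it,
            # then execution resumes just past the end of the chain.
            while i < len(rulesets) and rulesets[i].chain:
                i += 1
            i += 1
    return executed


def example(third: Ruleset) -> List[Ruleset]:
    """The three foo.html examples above differ only in ruleset 3."""
    return [
        Ruleset("foo", "bar.html", chain=True),  # E=VAR1:1
        Ruleset("^", chain=True),                # E=VAR1:2
        third,                                   # E=VAR1:3
        Ruleset("^"),                            # E=VAR1:4, end of chain
        Ruleset("^"),                            # E=VAR1:5, outside the chain
    ]


print(run(example(Ruleset("bar", chain=True)), "foo.html"))                # [1, 2, 3, 4, 5]
print(run(example(Ruleset("foo", chain=True)), "foo.html"))                # [1, 2, 5]
print(run(example(Ruleset("^", conds_ok=False, chain=True)), "foo.html"))  # [1, 2, 5]
```

Giving ruleset 4 skip=1 instead reproduces the if-then-else simulation discussed next: with foo.html rulesets 1 through 4 run and 5 is skipped, while with any input the first ruleset rejects, execution lands directly on ruleset 5.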

It's also sometimes possible to use [C]hain to reasonably implement if-then-else in the special case where all the <if> rulesets except the first will always be true and the false/<else> branch has only one rule in it, as shown in the following example. If the first ruleset is false, mod_rewrite will jump over the entire [C]hain, effectively always jumping directly to what you think of as the <else>. But if the first ruleset condition is true, the entire [C]hain (corresponding to the <if> part) will be executed (assuming all those rulesets are true). Then the [S=n]kip on the last ruleset of the [C]hain will jump over the single ruleset comprising the false/<else> branch.

# <if>
RewriteRule foo bar.html [E=VAR1:1,C]
RewriteRule ^ - [E=VAR1:2,C]
RewriteRule ^ - [E=VAR1:3,C]
RewriteRule ^ - [E=VAR1:4,S=1]
# </if>
#
# <else>
RewriteRule ^ - [E=VAR1:5]
# </else>
...

It's even (sorta) possible to further generalize the simulation of if-then-else so that both branches can be of any length, like this:

# <if>
RewriteRule foo bar.html [E=VAR1:1,C]
RewriteRule ^ - [E=VAR1:2,C]
RewriteRule ^ - [E=VAR1:3,C]
## skip over false/<else> branch
## (the argument to [S] *MUST* vary with
## the length of the false/<else> branch)
RewriteRule ^ - [E=VAR1:4,S=4]
# </if>
#
# <else>
RewriteRule ^ - [E=VAR1:5,C]
RewriteRule ^ - [E=VAR1:6,C]
RewriteRule ^ - [E=VAR1:7,C]
RewriteRule ^ - [E=VAR1:8]
# </else>
...

This is partially automatic: because each branch is a [C]hain, mod_rewrite finds the end of whichever branch it's executing on its own, so a ruleset can be added to or deleted from the true/<if> branch without touching any count. However, the [S=n]kip value at the end of the true/<if> branch, which jumps over the entire false/<else> branch, is not automatic; it must be updated whenever a ruleset is added to or removed from the false/<else> branch. The risk of maintenance headaches is considerably reduced, but not eliminated.

Beyond simple pairs, it's likely the only way you'll find to use chaining is to construct an if-then-else like the one above, with no nested conditionals in either chained branch. Using the [C]hain flag this way is a bit less brittle than doing the same thing with just the [S=n]kip flag (see above), because with [C]hain, rulesets can often be inserted into or deleted from the true branch without disturbing any counts. Nevertheless there's still enough risk of a maintenance headache that this construct is often avoided. (Besides, having no nested conditionals can be a pretty severe restriction.)

Because it's so oddly restrictive (and because RewriteRule actions can usually all be combined in one statement anyway), because maintenance headaches are only reduced rather than eliminated, and because real use of [C]hain requires mastery of a new and unfamiliar type of logic construct, the [C]hain flag is often considered not very useful.




Rather than this in-depth description of mod_rewrite, you may prefer a simpler introduction to rewriting URIs with Apache.


Location: (N) 42.67995, (W) -70.83761
 (North America> USA> Massachusetts> Boston Metro North> Ipswich)

Email comments to Chuck Kollars
Time: UTC-5 (USA Eastern Time Zone)
 (UTC-4 summertime --"daylight saving time")

Chuck Kollars' other web presences include Chuck's books and Chuck's movies.

You may also wish to look at Dad's photo album.

All content on this Personal Website (including text, photographs, audio files, and any other original works), unless otherwise noted on individual webpages, is available to anyone for re-use (reproduction, modification, derivation, distribution, etc.) for any non-commercial purpose under a Creative Commons License.