Notes on configuring the Apache web server

Access control

  1. The Order directive is not intuitive, but luckily it has only two states in practical use: allow,deny and deny,allow (which are actually single tokens, so must not include whitespace). A third state, mutual-failure, is deprecated and has the same effect as allow,deny.

    • With Order deny,allow (or no Order directive, since this is the default) and …

      • … no other directives: all access is allowed (the second word of the Order directive is the default);

      • … only Deny directives: access is denied only under the listed conditions;

      • … only Allow directives: they are superfluous (the default is to allow anyway);

      • Allow directives which list conditions that are independent of all Deny conditions are superfluous (the default is to allow anyway);

      • Deny directives which list a subset of Allow conditions are irrelevant (the second word of the Order directive, allow, overrides);

      • Allow directives which list a subset of Deny conditions actually work as expected.

    • With Order allow,deny and …

      • … no other directives: all access is denied (the second word of the Order directive is the default);

      • … only Allow directives: access is allowed only under the listed conditions;

      • … only Deny directives: they are superfluous (the default is to deny anyway);

      • Deny directives which list conditions that are independent of all Allow conditions are superfluous (the default is to deny anyway);

      • Allow directives which list a subset of Deny conditions are irrelevant (the second word of the Order directive, deny, overrides);

      • Deny directives which list a subset of Allow conditions actually work as expected.

    • Another way to understand this: sort all directives according to the Order, then pretend that an implicit "from all" directive of the kind named by the second word of the Order sits at the very start (apart from the real directives of that kind, which are sorted to the end), and let the last matching directive win.

    • See:

  2. Configuration sections are processed in a certain order. Of the sections that apply to a request and contain access directives, the last one processed determines access.

    • See:

    • Thus subdirectories override parent directories, Files sections override directories, Location sections override all.

    • Order is not inherited: an Allow or Deny directive in a section implies the default Order (experimentally determined).

  3. Host-based access interacts with password-based access specified by AuthType and related directives via the Satisfy directive. If Satisfy any is in effect (from a previously processed section), Deny from all will not be completely effective, because a request then needs to pass only one of the two checks.

    • This is why Dokuwiki has the following sequence in its .htaccess file to completely deny access to certain files:

      Order allow,deny
      Deny from all
      Satisfy All
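
The "last match wins" reading of Order from item 1 can be sketched in Python. This is a minimal model under stated assumptions, not Apache's implementation: the names matches and is_allowed, the (kind, spec) tuple layout, and the restriction of conditions to "all" or a CIDR network are all invented for illustration.

```python
from ipaddress import ip_address, ip_network

def matches(spec, client):
    """True if the client IP falls under spec ('all' or a CIDR network)."""
    return spec == "all" or ip_address(client) in ip_network(spec)

def is_allowed(order, directives, client):
    """Model of Order evaluation.

    order      -- 'deny,allow' or 'allow,deny'
    directives -- list of (kind, spec), kind being 'allow' or 'deny'
    """
    first, second = order.split(",")
    # Implicit "from all" of the second kind goes at the very start,
    # followed by the real directives sorted according to the Order.
    ordered = [(second, "all")]
    ordered += [d for d in directives if d[0] == first]
    ordered += [d for d in directives if d[0] == second]
    verdict = None
    for kind, spec in ordered:
        if matches(spec, client):
            verdict = kind  # the last matching directive wins
    return verdict == "allow"

# Hypothetical rules: deny a network, allow a subset of it.
rules = [("deny", "10.0.0.0/8"), ("allow", "10.1.0.0/16")]
```

With these rules, Order deny,allow admits the allowed subset and everything outside the denied network, while Order allow,deny rejects every client, because the Allow conditions are a subset of the Deny conditions.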

URL rewriting

  1. See:

  2. Substitutions can target a filesystem path.

    • A substitution can only be considered a filesystem path if it starts with a slash.

    • The request path that is matched has initial slashes removed (experimentally determined), so substitutions which don't explicitly add an initial slash are never treated as filesystem paths.

    • The manual says that a path is considered a filesystem path if its first component matches a directory in the filesystem root. This would cause problems as a redirect to /images/$1 would start failing after mkdir /images. In fact, the substitution needs to match the entire DocumentRoot to be considered a filesystem path (experimentally determined).

    • Internally the current filesystem directory is prefixed to a target that is not a filesystem path (e.g. target.html → /wwwroot/target.html), which turns it into a filesystem path for further processing.

  3. Substitutions can target another host with a full URL.

    • These are also hard to trigger by accident, because double slashes are removed from the request path (experimentally determined).

    • Hard but possible. Consider:

      • RewriteRule .* $0/secret_filename.php

      • The request for http://example.com/http:/ gets externally redirected to http://secret_filename.php.

    • An external redirect with [R] simply prepends the hostname. If the target is a relative path, the current filesystem directory is prefixed like in an internal redirect before this happens (e.g. target.html → http://localhost/wwwroot/target.html). Specifying the correct HTTP directory in RewriteBase helps, as this replaces the filesystem directory in targets.

    • The body of the external redirect is a standard Apache status page. A custom page can be set with ErrorDocument 302, but this is not useful, since the redirect target is not passed to this page, so it can't send the correct Location header.

  4. [F] is equivalent to [R=403]. [G] is equivalent to [R=410]. [R=404] has no short form, but is also useful.

  5. Rewriting restarts processing after it rewrites the URL through an internal redirect.

    • For instance, this website hides its implementation: pages are served by PHP through internal redirects, but URLs that include a .php extension are forbidden.

      • In .htaccess:

        RewriteRule \.php$ - [F]
        RewriteCond %{DOCUMENT_ROOT}/$0.php -f
        RewriteRule .* $0.php [L]

    • This doesn't work on its own because after the [L] rule redirects, the [F] rule matches its target.

    • See:

    • The REDIRECT_STATUS environment variable is set after an internal redirect (experimentally determined), so it can be used to stop rule processing once a redirect has happened.

    • At the beginning of the ruleset in .htaccess:

      RewriteCond %{ENV:REDIRECT_STATUS} .
      RewriteRule ^ - [L]

    • Examples using REDIRECT_STATUS: tips and tricks, cheat sheet.
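
The restart behaviour in item 5 can be illustrated with a small Python simulation. It is a rough model of the three rules quoted above, not of mod_rewrite itself: the function serve, its parameters, and the boolean that stands in for a non-empty REDIRECT_STATUS are assumptions made for this sketch.

```python
import re

def serve(path, existing_files, guard=False, max_rounds=10):
    """Run the ruleset on a request path (no leading slash).

    Returns ('forbidden' | 'serve', final_path).
    existing_files -- set of paths that exist under the document root
    guard          -- whether the REDIRECT_STATUS rule heads the ruleset
    """
    redirected = False  # stands in for ENV:REDIRECT_STATUS being non-empty
    for _ in range(max_rounds):
        # RewriteCond %{ENV:REDIRECT_STATUS} .
        # RewriteRule ^ - [L]
        if guard and redirected:
            return ("serve", path)
        # RewriteRule \.php$ - [F]
        if re.search(r"\.php$", path):
            return ("forbidden", path)
        # RewriteCond %{DOCUMENT_ROOT}/$0.php -f
        # RewriteRule .* $0.php [L]
        if path + ".php" in existing_files:
            path = path + ".php"
            redirected = True
            continue  # internal redirect: the ruleset is processed again
        return ("serve", path)
    raise RuntimeError("rewrite loop")

files = {"about.php"}  # hypothetical document root contents
```

In this model, serve("about", files) ends in a 403 because the second pass sees about.php, whereas serve("about", files, guard=True) serves about.php. A direct request for about.php is still forbidden either way, since no redirect has happened yet when the [F] rule runs.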

Logging

  1. The combined log format includes some fields as sent by the client.

    • See:

    • The HTTP request line (%r) is included, enclosed in double quotes.

      • A conforming client would encode double quotes and backslashes that are part of the request line, but a malicious client can include these characters directly.

      • A regex for this field of the log (and other fields determined by request headers) is:

        • basic: /"\([^"\]*\(\\x..\)*\(\\[^x]\)*\)*"/

        • extended: /"([^"\]|\\x..|\\[^x])*"/

    • The authenticated user name (%u) is included, not enclosed in quotes.

      • The user name is not logged even if an Authorization header is sent, unless AuthType is enabled (experimentally determined).

      • But since the field is not enclosed in quotes, if authentication is enabled a malicious user can include spaces to confuse simple log parsers.

      • Special characters are quoted as specified in the note (referenced above) even though this field isn't mentioned (experimentally determined), so confusion is limited. In particular, it's not possible to produce the space followed by a double quote that appears next to the %r field. However, it may be possible to spoof the date.

      • The same applies to the remote username (%l), enabled with IdentityCheck.

      • The same might apply to the remote host (%h), enabled with HostnameLookups.

      • Workaround:

        • LogFormat "%h \"%l\" \"%u\" %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined

  2. Other interesting log elements:

    • Connection state: %X and %k;

    • Processing time: %T and %D (%<D and %>D seem to be the same in case of an internal redirect via RewriteRule).
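
The extended regex for quoted log fields from item 1 can be tried out in Python. This is a sketch under assumptions: the names QUOTED_FIELD and quoted_fields are invented here, the log line is made up, and the pattern is adapted slightly because Python's re module requires the backslash inside a bracket expression to be doubled, whereas in POSIX regexes a lone backslash there is literal.

```python
import re

# Python spelling of the extended regex: [^"\] becomes [^"\\].
# The added capture group returns the field without its enclosing quotes.
QUOTED_FIELD = re.compile(r'"((?:[^"\\]|\\x..|\\[^x])*)"')

def quoted_fields(line):
    """Return the contents of every double-quoted field in a log line."""
    return QUOTED_FIELD.findall(line)

# Made-up log line: the request line contains an escaped double quote
# and the user agent ends with an escaped backslash.
sample = (r'192.0.2.7 - - [10/Oct/2000:13:55:36 -0700] '
          r'"GET /a\"b HTTP/1.0" 200 2326 "-" "agent \\"')
fields = quoted_fields(sample)
```

A naive pattern such as "[^"]*" would stop at the escaped double quote inside the request line and misalign every field after it; the pattern above treats a backslash and the character that follows it as one unit.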

Compression

  1. mod_deflate encodes only with gzip, never with deflate. The two formats use identical compression (the DEFLATE algorithm, as implemented by zlib), but have different headers and footers. In deflate they take up at minimum 6 bytes; in gzip they take up at minimum 18 bytes. gzip has a better checksum, but for real integrity checking there is Content-MD5.

    • Most compressible files: AddOutputFilterByType DEFLATE text/plain text/html application/xhtml+xml text/css application/javascript application/ecmascript image/svg+xml application/atom+xml application/rss+xml text/xml application/xml text/csv application/json application/soap+xml

    • Maximum compression: DeflateCompressionLevel 9. This probably won't save more than a couple of bytes.
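
The header and footer sizes quoted in item 1 can be checked with Python's zlib module, which can emit the same compressed data bare, in the zlib container (HTTP "deflate"), or in the gzip container. This is a small sketch, not related to how mod_deflate is implemented; the helper name framed and the sample data are assumptions.

```python
import zlib

def framed(data, wbits, level=9):
    """Compress data; wbits selects the container around the DEFLATE stream."""
    c = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return c.compress(data) + c.flush()

data = b"hello hello hello hello\n" * 20  # arbitrary sample input

raw = framed(data, -15)      # bare DEFLATE stream, no header or footer
zlib_fmt = framed(data, 15)  # zlib container, used by HTTP "deflate"
gzip_fmt = framed(data, 31)  # gzip container with a minimal header

zlib_overhead = len(zlib_fmt) - len(raw)  # header plus Adler-32 checksum
gzip_overhead = len(gzip_fmt) - len(raw)  # header plus CRC-32 and length
```

Here zlib_overhead comes out as 6 and gzip_overhead as 18, matching the minimum sizes given above; a gzip header that carries optional fields such as a file name would be larger.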