First of all, an .htaccess file isn’t required for your site to function properly. However, it can be extremely useful, as it can handle a number of tasks in a relatively simple fashion. Your .htaccess file is a plain ASCII text file named exactly .htaccess (with no .txt extension), built in a text editor such as Notepad. Do NOT use Word or WordPad, or any editor that inserts formatting or hard line wraps, as these will render the file unusable.
Some of the common functions of .htaccess include:
1. Custom Error Messages
You can specify a custom page to be displayed for a specific error code, such as a 404 error. This is written as follows:
[code]ErrorDocument 404 /errors/notfound.html[/code]
where the custom file notfound.html is located in the errors folder. Thus, whenever a URL cannot be found, that custom 404 page would be displayed, rather than a generic message.
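The same directive can be repeated for other status codes. A minimal sketch, assuming hypothetical pages named forbidden.html and servererror.html have been placed in the same errors folder:

[code]
# File names below are illustrative; each page must actually exist under /errors/
ErrorDocument 403 /errors/forbidden.html
ErrorDocument 404 /errors/notfound.html
ErrorDocument 500 /errors/servererror.html
[/code]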
2. Password Protection
If you want to limit access to certain sections of your site to specific users, you can do so in your .htaccess file. You would create another ASCII file named .htpasswd, in which your authorized user names and their (hashed) passwords are listed, and only those users would be allowed access, after entering the appropriate user name and password. We won’t be going into detail on this function today, though; that’s a topic for another post altogether.
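For orientation only, the .htaccess side of this usually looks something like the sketch below. The realm name and the path to .htpasswd are assumptions for illustration; the path must be the full server path to your own file, ideally somewhere outside the web root.

[code]
# Assumed realm name and file path; adjust both for your own account
AuthType Basic
AuthName "Members Only"
AuthUserFile /home/username/.htpasswd
Require valid-user
[/code]

The .htpasswd file itself is normally generated with Apache’s htpasswd utility rather than typed by hand, since the passwords need to be hashed.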
3. Enabling SSI
If your host permits it, you can use .htaccess to enable SSI (server-side includes), but it’s wise to check first, as the TOS of a few hosting companies may not allow it. It’s a simple matter of telling the server that any file with the extension .shtml should be parsed for server-side commands, in this fashion:
[code]
AddType text/html .shtml
AddHandler server-parsed .shtml
Options +Includes
[/code]
4. Blocking Access by IP
You can also block users by specific IP address or by IP block, with a simple Order allow,deny block, such as:
[code]
order allow,deny
deny from 123.45.6.7
deny from 12.34.56
allow from all
[/code]
This will block the specific user at IP 123.45.6.7 and any IPs in the 12.34.56 block (12.34.56.1, 12.34.56.2, 12.34.56.3, etc.), while everyone else is still allowed in. Write the numbers without leading zeros.
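Note that the Order, Allow and Deny directives belong to Apache 2.2 and earlier. On Apache 2.4 they only work if the host has loaded the compatibility module, so if the block above has no effect (or throws an error), an equivalent in the newer Require syntax would look roughly like this sketch, using the same example addresses:

[code]
# Apache 2.4+ syntax: allow everyone except the listed addresses
<RequireAll>
    Require all granted
    Require not ip 123.45.6.7
    Require not ip 12.34.56
</RequireAll>
[/code]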
5. Blocking Access by Referer
If you find traffic coming from a site you’d like to exclude, you can do so by a referer block easily enough:
[code]
RewriteEngine on
RewriteCond %{HTTP_REFERER} badsite\.com [NC]
RewriteRule .* - [F]
[/code]
(Notice the backslash (\) before the .com in the URL. This escapes the “.” so that it matches a literal period, since an unescaped “.” in regex matches any character.)
This will stop any traffic arriving from badsite.com and return a 403 error code. The [NC] flag makes the match case-insensitive (so BADSite.com, BadSite.com, etc. are caught as well). You can even specify multiple referers, thus:
[code]
RewriteEngine on
RewriteCond %{HTTP_REFERER} badsite\.com [NC,OR]
RewriteCond %{HTTP_REFERER} anotherbadsite\.com [NC]
RewriteRule .* - [F]
[/code]
Rewrite rules in .htaccess depend on the FollowSymLinks option being enabled for your directory. If you get an error (such as a 500 Internal Server Error) when you use the code above as is, it’s likely because that option isn’t switched on in the server’s httpd.conf, so you’ll simply need to add this line before the first RewriteCond line:
[code]
Options +FollowSymLinks
[/code]
6. Blocking Bad Bots
Sometimes you’ll want to block a bot that is causing you problems by downloading all of your pages for offline reading, or scraping your email addresses. Since these bots rarely pay any attention to your robots.txt directives, you’ll want to ban their user agents in .htaccess. The process is simple, once you identify the bots you want to keep out. Each RewriteCond matches one user agent, and the final RewriteRule is what actually refuses the request:
[code]
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw
RewriteRule .* - [F]
[/code]
Any of the bots you list will be served a 403 error when they try to view your site, which can save you bandwidth and server resources.
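As the list grows, the separate conditions can be folded into a single pattern. This is only a sketch of one way to do it, using the same four example user agents; the [NC] flag is an addition here so that differences in capitalization don’t let a bot slip through:

[code]
RewriteEngine On
# One condition covering all four example bots, matched case-insensitively
RewriteCond %{HTTP_USER_AGENT} ^(BlackWidow|GetWeb!|Go!Zilla|ChinaClaw) [NC]
RewriteRule .* - [F]
[/code]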
7. Redirects
If you remove pages that have already been indexed, you may want to redirect them rather than letting them go to a 404, in order to provide a new path for any incoming links or traffic. The Redirect directive isn’t part of the rewrite engine, so no RewriteEngine line is needed here:
[code]
Redirect 301 /oldpage.htm http://www.mydomain.com/newpage.htm
Redirect 301 /myfolder/oldpage.htm http://www.mydomain.com/newfolder/newpage.htm[/code]
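Redirect matches on the start of the path, so an entire folder can be moved with one line instead of listing every page. A short sketch, reusing the example folder names from above (both are placeholders):

[code]
# Anything under /myfolder/ is sent to the same file name under /newfolder/
Redirect 301 /myfolder/ http://www.mydomain.com/newfolder/
[/code]

With this in place, a request for /myfolder/oldpage.htm would be redirected to /newfolder/oldpage.htm, so it suits cases where the file names stay the same and only the folder changes.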
8. Prevent Viewing of Your .htaccess
As a general rule, it’s always a good idea to keep your .htaccess from being viewed. This is done with a simple deny directive:
[code]
<Files .htaccess>
order allow,deny
deny from all
</Files>
[/code]
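On Apache 2.4, where the Order and Deny directives are deprecated, the same protection would be written along these lines:

[code]
# Apache 2.4+ equivalent of the block above
<Files ".htaccess">
    Require all denied
</Files>
[/code]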
There are other functions that can be handled in your .htaccess, but the above will handle most of the common requirements. As in most coding, it’s always prudent to save a backup copy of your existing file before you upload the new one, just in case of a problem. An error in your .htaccess can have major repercussions, even making your site inaccessible (hence, invisible) to the search engines, so take care.