Robots.txt Generator

Robots.txt is the short name used by SEOs and tech-savvy webmasters for the robots exclusion standard. The robots.txt file tells search engine spiders and other robots which areas of a website they should not visit. A simple, easy-to-use robots.txt generator can be used to create these instructions for a website.

This standard was proposed in 1994 by Martijn Koster after a web crawler written by Charles Stross played havoc with Martijn’s site. Robots.txt has become the de facto standard that present-day web crawlers follow. However, malicious crawlers that target websites to spread viruses and malware ignore robots.txt and visit the very pages and directories it forbids. That is how they spread malware and ruin sites.

When a search engine’s robot wants to visit a website, say at the URL http://www.examples.com/Greetings.html/, it first checks whether http://www.examples.com/robots.txt exists before it starts evaluating the site. Suppose the file does exist and contains these two lines:

User-agent: *
Disallow: /

In that case, the robot will neither inspect the site nor index it. The first line, ‘User-agent: *’, says that the instructions apply to all robots, and the second line, ‘Disallow: /’, tells them not to visit any page on the site.
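The check a well-behaved robot makes can be sketched with Python's standard urllib.robotparser module. This is only an illustration of the two-line file above; the helper name is_allowed and the robot name ExampleBot are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the example above.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]


def is_allowed(robots_lines, user_agent, url):
    """Return True if robots_lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules held in memory, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" is a hypothetical robot name used only for illustration.
    print(is_allowed(BLOCK_ALL, "ExampleBot",
                     "http://www.examples.com/Greetings.html"))
```

With ‘Disallow: /’ in place, the check reports that the page may not be fetched. An empty ‘Disallow:’ line would allow everything.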

There are two important points you must be aware of:

  • Your robots.txt is publicly visible. Anyone can open it by requesting /robots.txt on your site and see which directories you have instructed search robots not to visit.
  • Web robots may choose to ignore your robots.txt, especially malware robots and email address harvesters. They will look for website vulnerabilities and ignore the robots.txt instructions.

A typical robots.txt instructing search robots not to visit certain directories in a website will look like:

User-agent: *
Disallow: /aaa-bin/
Disallow: /tmp/
Disallow: /~mike/

This robots.txt file instructs search engine robots not to visit the three listed directories. You cannot put two paths on the same Disallow line; for example, you cannot write ‘Disallow: /aaa-bin/ /tmp/’. Each directory you want robots to ignore needs its own Disallow line. The original standard also does not support wildcard patterns such as ‘Disallow: *.gif’, although some major search engines accept wildcards as an extension.
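To see how these rules apply to individual paths, here is a small sketch using Python's standard urllib.robotparser. The helper blocked_paths, the robot name ExampleBot, and the sample paths are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The example file from above, one directory per Disallow line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /aaa-bin/
Disallow: /tmp/
Disallow: /~mike/
"""


def blocked_paths(robots_txt, paths, user_agent="ExampleBot"):
    """Return the subset of paths that robots_txt forbids for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    # Hypothetical paths; only those under a listed directory are blocked.
    sample = ["/index.html", "/tmp/cache.html", "/aaa-bin/run", "/tmpfile.html"]
    print(blocked_paths(ROBOTS_TXT, sample))
```

Rules match by path prefix, so ‘/tmp/’ covers everything inside that directory but not a file such as /tmpfile.html.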

Remember to use lower case for your ‘robots.txt’ file name, not ‘ROBOTS.TXT’.
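Robots look for the file under exactly this lowercase name at the top level of the host, as in the http://www.examples.com/robots.txt check described earlier. A minimal sketch of how that address can be derived from any page URL follows; the function name robots_url is made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Build the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.examples.com/Greetings.html/"))
```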

On a virtual host, several sites share the same IP address and are told apart by their domain names. Each domain needs its own robots.txt, placed in the top-level directory of that domain, where the search robot will read it.

If you are sharing a host with other users, you will have to ask the host administrator to help you.

If you are an SEO or tech-savvy webmaster, you can create the robots.txt file on a Windows machine using notepad.exe, textpad.exe, or even Microsoft Word. Just remember to save it as Plain Text.

On an Apple Macintosh, you can use TextEdit: choose ‘Make Plain Text’ from the Format menu and save the file with Western encoding.

On Linux, you can use vi or emacs.

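Once you have created your robots.txt file, upload it to the top-level (root) directory of your website so that it can be reached at /robots.txt. It is a separate file and does not go inside the header of your HTML pages.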
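If you prefer a script to a text editor, the file can also be written as plain text directly. The sketch below assumes you have a local copy of your site's top-level directory; the function name write_robots_txt and the ‘public_html’ folder are hypothetical.

```python
from pathlib import Path


def write_robots_txt(web_root, rules):
    """Write rules, one per line, to <web_root>/robots.txt as plain text."""
    target = Path(web_root) / "robots.txt"  # lowercase name, top-level directory
    target.write_text("\n".join(rules) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # "public_html" is a hypothetical local copy of the site's root directory.
    Path("public_html").mkdir(exist_ok=True)
    print(write_robots_txt("public_html", ["User-agent: *", "Disallow: /tmp/"]))
```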

If you are an SEO, webmaster, or developer, you can get assistance from the searchenginereports.net site. Visit the website and click on ‘Free SEO Tools.’ Scroll down the list of SEO tools until you reach the Robots.txt generator tool.

Click on this tool’s icon, and it will open a page displaying: Robots.txt Generator.

  • Default - All Robots are: the default is ‘Allowed.’
  • Crawl-Delay: the default is ‘No Delay.’
  • Sitemap: leave blank if you don't have one.
  • Search Robots: each robot is listed on its own line, and each defaults to the same setting as the Default, which is ‘Allowed.’
  • Restricted Directories: specify the directories that you want to keep search robots out of. Remember to list one directory in each box.

After you have entered your restrictions, you can click on ‘Create Robots.txt’ or select ‘Clear.’ If you have made a mistake in entering your requirements, click on ‘Clear’ and re-enter the fields.

If you select the Create Robots.txt option, the system will generate the robots.txt file. You can then copy the generated text, save it as a plain-text file named robots.txt, and upload it to the top-level directory of your website.
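To give a feel for what such a generator produces from the form fields, here is a rough sketch. It is not the tool's actual implementation: the function generate_robots_txt, its parameters, and the sitemap URL are assumptions, and per-robot settings are left out for brevity.

```python
def generate_robots_txt(default_allowed=True, crawl_delay=None,
                        sitemap=None, restricted_dirs=()):
    """Build robots.txt text from generator-style form fields (illustrative)."""
    lines = ["User-agent: *"]
    if not default_allowed:
        lines.append("Disallow: /")       # refuse all robots everywhere
    elif restricted_dirs:
        for directory in restricted_dirs:  # one directory per Disallow line
            lines.append(f"Disallow: {directory}")
    else:
        lines.append("Disallow:")         # empty value: nothing is restricted
    if crawl_delay is not None:
        # Crawl-delay is a non-standard line that only some robots honor.
        lines.append(f"Crawl-delay: {crawl_delay}")
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The sitemap URL is a hypothetical value for illustration.
    print(generate_robots_txt(restricted_dirs=["/tmp/", "/aaa-bin/"],
                              sitemap="http://www.examples.com/sitemap.xml"))
```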

There are no restrictions on the number of times you can use this free tool. If you forgot to restrict a directory or want to add a new one, you can use the robots.txt generator tool to create a new file.

If you only want to add a new directory, list it in the generator’s Restricted Directories. Once the file is generated, copy just the line for the newly restricted directory and paste it into your existing robots.txt file.

Alternatively, you can enter all the restricted directories, old and new, and create a new robots.txt file that replaces the previous one on your site.
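Adding one directory to an existing file can also be done with a few lines of code. The sketch below assumes a simple file with a single ‘User-agent: *’ group; the function name add_restricted_dir and the /private/ directory are made up for the example.

```python
def add_restricted_dir(robots_txt, directory):
    """Return robots_txt with a Disallow line for directory added once."""
    rule = f"Disallow: {directory}"
    lines = robots_txt.splitlines()
    if rule in (line.strip() for line in lines):
        return robots_txt  # already restricted, leave the file untouched
    # Place the new rule right after the last existing Disallow line,
    # or at the end of the file if there is none yet.
    last = max(
        (i for i, line in enumerate(lines)
         if line.strip().lower().startswith("disallow:")),
        default=len(lines) - 1,
    )
    lines.insert(last + 1, rule)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    existing = "User-agent: *\nDisallow: /tmp/\n"
    # "/private/" is a hypothetical new directory to restrict.
    print(add_restricted_dir(existing, "/private/"))
```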

Be very careful when you change your robots.txt, and don’t experiment with it on a live site. A mistake such as a stray ‘Disallow: /’ can inadvertently keep search engines away from your entire site.

If you have developed your website in WordPress, you can get help from a WordPress robots.txt plugin, from guides on how to create robots.txt in WordPress, and from several other sites, including WordPress itself.

For a plain HTML site, you can also get help from a robots.txt example.

Remember that robots.txt is where you tell search engine robots which directories they should not visit. Because its rules work on paths, you can only keep robots away from your pages of external links, using the searchenginereports.net generator, if those pages have been placed in a separate directory that you list as restricted.