A robots.txt file is a plain text file created by webmasters to instruct search engine crawlers. It is placed in the root directory of a website. When crawlers (also called spiders) begin to crawl a website, they first check that site's root directory for a robots.txt file. If they find one, they follow its instructions on how to crawl the site. The robots.txt file tells spiders which pages to crawl and which to skip. You can create a robots.txt file by hand or with a robots.txt file generator.
♦ User-agent: This names the crawler the following rules apply to, e.g. Googlebot; an asterisk (*) applies the rules to all robots.
♦ Disallow: This specifies the pages you want to block bots from accessing.
♦ Noindex: This specifies the pages you want a search engine both to block and not to index.
♦ # (hash symbol): This may be used to add comments within a robots.txt file.
♦ Remember: each User-agent/Disallow group should be separated by a blank line; however, no blank lines should appear within a group (between the User-agent line and the last Disallow line).
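Putting these directives together, a robots.txt file with two rule groups (separated by a blank line, as noted above) might look like this; the crawler name and paths are illustrative:

```
User-agent: Googlebot
Disallow: /private/   # keep Google's crawler out of /private/

User-agent: *
Disallow: /tmp/       # all other crawlers skip /tmp/
```

A crawler reads only the group that matches its own user-agent name, falling back to the * group if no specific group matches.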
Allowing all web crawlers access to all content: the syntax below tells all web crawlers that they may crawl every page.
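The conventional allow-all form is:

```
User-agent: *
Disallow:
```

An empty Disallow value blocks nothing, so every crawler may fetch every page.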
Blocking all web crawlers from all content: the syntax below tells all web crawlers not to crawl any pages.
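The conventional block-all form is:

```
User-agent: *
Disallow: /
```

The lone slash matches the entire site, so no compliant crawler will fetch any page.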
Blocking a specific web crawler from a specific folder: the syntax below tells the named crawler not to crawl any URL within the specified folder.
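For example, using Google's crawler (Googlebot) and an illustrative folder name:

```
User-agent: Googlebot
Disallow: /example-subfolder/
```

The trailing slash means every URL beginning with that path is blocked for Googlebot, while other crawlers are unaffected.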
Blocking a specific web crawler from a specific web page: the syntax below tells Bing's crawler (user-agent name Bingbot) to avoid crawling the specific page at the given URL.
User-agent: Bingbot
Disallow: /example-subfolder/blocked-page.html
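To check how a compliant crawler would interpret such rules, you can use the robots.txt parser in Python's standard library; the sketch below (the rule set and paths are illustrative) parses a small file and queries it:

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt with two User-agent/Disallow groups,
# separated by a blank line as described above.
rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /tmp/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Googlebot is blocked from /private/ but may fetch other paths.
print(parser.can_fetch("Googlebot", "/private/page.html"))  # False
print(parser.can_fetch("Googlebot", "/public/page.html"))   # True
# Every other crawler falls back to the * group and skips /tmp/.
print(parser.can_fetch("SomeBot", "/tmp/file.html"))        # False
```

This mirrors what well-behaved spiders do before fetching a page, which makes it a handy way to test a robots.txt file before deploying it.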