To follow or not to follow? There are valid reasons for both.
By Becky Livingston
SEO for CPAs: The Accountant’s Handbook
Website crawlers – or spiders or robots – index content on web pages in an automated, methodical manner. They often create a copy of every page they visit for later processing by a search engine.
Crawlers are handy programs used to automate maintenance tasks, such as link checking or code validation. They also can be used to gather information from web pages, such as email addresses.
You decide whether Google crawls and indexes all or some of your pages by using a robots.txt file. That file is created by a webmaster to instruct web robots on how to crawl the website. There are two settings: “Disallow” and “Allow.”
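For instance, a minimal robots.txt that uses both directives might look like the sketch below; the bot name and paths here are placeholders, not part of a real site:

```
# Hypothetical example: block an entire folder for all bots, but allow one file inside it
User-agent: *
Disallow: /private/
Allow: /private/annual-report.html
```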
In WordPress, depending on the SEO plugin you’re using, you can simply check a box at the bottom of a page or post to determine its “crawlability.” If you’re using the All in One SEO plugin, for example, each page’s SEO settings include checkboxes for “noindex,” “nofollow” and disabling SEO on that page.

Checking “noindex” means you do not want the search engines to index this page.
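Behind the scenes, that checkbox typically just writes a robots meta tag into the page’s head section. A minimal sketch (the exact markup varies by plugin):

```html
<!-- Hypothetical plugin output: asks crawlers not to add this page to their index -->
<meta name="robots" content="noindex">
```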
Why would you do that? For example, when you have a portal login page, payment page or other sensitive page that you want accessible only to people who have the link.
The “nofollow” setting means you don’t want the search engines to follow the links on this page. You might use that feature when you’re leveraging user-generated content on your site that includes the person’s name or website URL. If you check this box, even if you make the content link active, the search engine will not follow the link from your page and “append” it to your link rank.
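As a rough sketch, the page-level checkbox again comes down to a robots meta tag; the rel="nofollow" attribute shown below it is a link-level equivalent you could apply to a single user-submitted URL instead (the address is a placeholder):

```html
<!-- Hypothetical page-level setting: stay out of the index and ignore this page's outbound links -->
<meta name="robots" content="noindex,nofollow">

<!-- Hypothetical link-level alternative: flag one user-submitted link rather than the whole page -->
<a href="https://example.com/commenter-site" rel="nofollow">Commenter’s website</a>
```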
Site rank takes into account the links your site is connected to when determining your overall ranking, so if you were to link to a bunch of poorly ranked sites, it might negatively affect your own site ranking.
Disabling the page’s SEO (the last checkbox) will prevent this page from being indexed and tied to your site’s overall ranking.
When would you do that?
Whenever you want to hide information that you don’t want the crawlers to list on the search results page, or when you have a page on your site that you don’t want available to the public but do want available for linking purposes. Let’s say you have a seasonal tax associate employment page. You might want to hide that from the search engines so people don’t find it six months down the road when you’re out of tax season.
Another example: you’ve put a slightly sensitive document on your site for a few people to review. You would want to disable that page so the crawlers don’t pick it up – but you have to remember to remove the page once you’re done with it.
Here’s why. Even though you’ve marked a page as disabled, the robots.txt file is still publicly accessible. Anyone with a little programming background can look up those pages and find the ones you’re hiding. The best rule of thumb is to
- post what you need,
- check the boxes so the engines do not crawl/index the page, and then
- remove it when you don’t need it any longer.
Make a backup of the page’s code (and a screenshot so you know what it looked like) before you delete it so you can easily create the page again if needed.
What does this look like if you’re not using a WordPress site?
The web developer would create a robots.txt file that includes two things: the user-agent and the function, either disallowing or allowing. It looks like this:
User-agent: Googlebot
Disallow: /subfolder/webpagename.html
In that instruction, you’re telling the Google bot not to crawl the web page named webpagename.html in the subfolder directory on your site. The robots.txt file is stored at the root level of the site and may include a series of disallows in one file.
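As a sketch of what that can look like, a single file might carry several disallow lines and separate rule sets for different bots; every path below is a placeholder:

```
# Hypothetical robots.txt served at https://www.yoursite.com/robots.txt
User-agent: Googlebot
Disallow: /subfolder/webpagename.html
Disallow: /client-portal/
Disallow: /seasonal-tax-associate.html

# Rules for all other crawlers
User-agent: *
Disallow: /client-portal/
```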