web-site.rst
author Oleksandr Gavenko <gavenkoa@gmail.com>
Tue, 10 Dec 2019 19:31:15 +0200
changeset 2391 aedbd074ec54
parent 2228 837f1337c59b
permissions -rw-r--r--
Added extra examples of queries.
.. -*- coding: utf-8; -*-

==========
 Web site
==========
.. contents::
   :local:

Speeding up web site loading
============================

* http://developer.yahoo.com/performance/rules.html

robots.txt
==========

To exclude all robots from the entire server::

  User-agent: *
  Disallow: /

To exclude all robots from part of the server::

  User-agent: *
  Disallow: /cgi-bin/
  Disallow: /tmp/
  Disallow: /junk/

To allow a single robot::

  User-agent: Google
  Disallow:

  User-agent: *
  Disallow: /

To allow all robots complete access::

  User-agent: *
  Disallow:
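
One more case, sketched here as an assumption rather than taken from the notes
above: to exclude a single robot while leaving all others unrestricted.
``BadBot`` is a hypothetical robot name::

  User-agent: BadBot
  Disallow: /

Robots that match no ``User-agent`` record are not restricted, so no catch-all
record is needed in this case.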

See:

http://www.robotstxt.org/
  Describes common robots.txt practice and discusses possible standardization
  efforts.
http://www.robotstxt.org/robotstxt.html
  About /robots.txt
http://www.robotstxt.org/faq.html
  Frequently Asked Questions.
https://en.wikipedia.org/wiki/Robots_exclusion_standard
  Wikipedia article on robots.txt.
http://googlewebmastercentral.blogspot.com/2008/06/improving-on-robots-exclusion-protocol.html
  Improving on Robots Exclusion Protocol.

Sitemap
=======

The Sitemaps protocol allows a webmaster to inform search engines about URLs
on a website that are available for crawling.
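
A site can also advertise its Sitemap from robots.txt. A minimal sketch,
assuming the Sitemap is published at the hypothetical URL
``http://www.example.com/sitemap.xml``::

  Sitemap: http://www.example.com/sitemap.xml

The ``Sitemap`` line is independent of any ``User-agent`` record, so it can be
placed anywhere in the file.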

http://www.sitemaps.org/protocol.html
  Sitemap protocol.
http://en.wikipedia.org/wiki/Sitemaps
  Wikipedia article.

Web document structure usage
============================

http://dev.opera.com/articles/view/mama/
  Metadata Analysis and Mining Application

Validation
==========

* http://validator.w3.org/

Add search to your site
=======================

http://www.google.com/support/customsearch/
  Custom Search Help
http://help.yahoo.com/l/uk/yahoo/search/basics/basics-13.html
  Can I add a Yahoo! Search box to my site?

Check websites for broken links
===============================

http://linkchecker.sourceforge.net/
  linkchecker home page.
http://arthurdejong.org/webcheck/
  webcheck home page.