.. -*- coding: utf-8 -*-

=============
 WEB search.
=============
.. contents::

Disable page indexing by search engines.
========================================

Add the following code to the head tag of the HTML page::

  <meta name="ROBOTS" content="NOINDEX,NOFOLLOW" />
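
To confirm that a page really carries this tag, the minimal sketch below reports which robots directives it declares, using only Python's standard `html.parser`. The names `RobotsMetaParser` and `robots_directives` are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # The default handle_startendtag() calls this method, so the
        # self-closing form "<meta ... />" is handled here as well.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        # Compare the name case-insensitively: "ROBOTS" and "robots" are the same.
        if attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html: str) -> set[str]:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="ROBOTS" content="NOINDEX,NOFOLLOW" /></head></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```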
       
Dictionary Search.
==================

  http://www.onelook.com/

Google historical corpus statistics.
====================================

  http://ngrams.googlelabs.com/

Google search query syntax.
===========================

  http://www.google.com/support/websearch/bin/answer.py?answer=136861
                Google search basics: More search help
  http://www.google.ru/help/operators.html
                Advanced Operators
  http://code.google.com/intl/ru/apis/soapsearch/reference.html
                Google SOAP Search API Reference
  http://www.google.com/cse/docs/resultsxml.html
                Google WebSearch Protocol Reference for Google Site Search
  http://en.wikipedia.org/wiki/Google_Search

Phrase Search.
--------------

Use double quotes to search for an exact match of a string. Words marked in this
way will appear together in all results exactly as entered::

  "WORD1 WORD2 WORD3"

Note: You may need to use a "+" to force inclusion of common words in a phrase.

Boolean OR Search.
------------------

"OR" must be written in capital letters::

  WORD1 OR WORD2

Exclude a site from the search with "-site:"::

  WORD1 WORD2 -site:ebay.com -site:shopping.com
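
These operators can be combined into one query string. The sketch below uses a hypothetical helper, `google_query`, that assembles an exact phrase, an OR group and "-site:" exclusions, and then URL-encodes the result. The `https://www.google.com/search?q=` endpoint is an assumption and is not documented on this page.

```python
from urllib.parse import urlencode


def google_query(phrase=None, any_of=(), exclude=(), exclude_sites=()):
    """Assemble a query from the operators above (hypothetical helper)."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')                         # exact phrase
    if any_of:
        parts.append(" OR ".join(any_of))                   # "OR" in capitals
    parts.extend(f"-{word}" for word in exclude)            # exclude a term
    parts.extend(f"-site:{site}" for site in exclude_sites)  # exclude a site
    return " ".join(parts)


def google_search_url(query):
    # Assumption: results live at https://www.google.com/search?q=...
    # urlencode() percent-encodes quotes and colons and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


q = google_query(phrase="WORD1 WORD2", any_of=["WORD3", "WORD4"],
                 exclude_sites=["ebay.com", "shopping.com"])
print(q)  # "WORD1 WORD2" WORD3 OR WORD4 -site:ebay.com -site:shopping.com
print(google_search_url(q))
```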
       
Include query term (search exactly as is).
------------------------------------------

If a common word is essential to getting the results you want, you can include
it by putting a "+" sign in front of it::

  +WORD WORD1 WORD2

Exclude query term.
-------------------

You can exclude a word from your search by putting a minus sign ("-")
immediately in front of the term you want to exclude from the search results::

  WORD1 WORD2 -WORD

Fill in the blanks.
-------------------

Use "*" as a placeholder for any unknown words::

  GNU *
  Mozilla *

Site Restricted Search.
-----------------------

Restrict the results to a site or to a top-level domain::

  site:example.com WORD1 WORD2
  site:.gov WORD

Cached Results Page.
--------------------

The query prefix "cache:" returns the cached HTML version of the specified web
document as Google crawled it. Note there can be no space between "cache:" and
the web page URL. If you include other words in the query, Google will
highlight those words within the cached document::

  cache:www.google.com

Use Google as a free proxy (if direct access is blocked): cache:example.com

Title Search.
-------------

The prefix "intitle:" restricts the word that follows it to the page title,
while starting a query with "allintitle:" restricts all of the query words to
the title::

  intitle:WORD1 intitle:WORD2 WORD3
  allintitle:WORD1 WORD2

Note: Putting "intitle:" in front of every word in your query is equivalent to
putting "allintitle:" at the front of your query.

URL Search.
-----------

If you prepend "inurl:" to a query term, Google search restricts the results to
documents containing that word in the result URL. Note there can be no space
between the "inurl:" and the following word.

Starting a query with the term "allinurl:" restricts the results to those with
all of the query words in the URL::

  inurl:WORD1 inurl:WORD2 WORD
  allinurl: WORD1 WORD2

Note: "inurl:" works only on words, not URL components. In particular, it
ignores punctuation and uses only the first word following the "inurl:"
operator. To find multiple words in a result URL, use the "inurl:" operator for
each word.

Note: Putting "inurl:" in front of every word in your query is equivalent to
putting "allinurl:" at the front of your query.
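
Because the per-word form and the "allin..." form are equivalent, a pair of small helpers can generate either one from a list of words. This is a sketch; `per_word` and `all_in` are hypothetical names.

```python
def per_word(operator, words):
    """Prefix every word with the operator, e.g. "inurl:WORD1 inurl:WORD2"."""
    # No space is allowed between the operator and the word it applies to.
    return " ".join(f"{operator}:{word}" for word in words)


def all_in(operator, words):
    """Equivalent single-operator form, e.g. "allinurl: WORD1 WORD2"."""
    return f"{operator}: " + " ".join(words)


words = ["WORD1", "WORD2"]
print(per_word("inurl", words))     # inurl:WORD1 inurl:WORD2
print(all_in("allinurl", words))    # allinurl: WORD1 WORD2
print(per_word("intitle", words))   # intitle:WORD1 intitle:WORD2
print(all_in("allintitle", words))  # allintitle: WORD1 WORD2
```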
       
Link anchor search.
-------------------

Searches for text in a page's link anchors. A link anchor is the descriptive
text of a link::

  inanchor:"WORD1 WORD2"

Text Only Search.
-----------------

Starting a query with the term "allintext:" restricts the results to those with
all of the query words in only the body text, ignoring link, URL, and title
matches. The prefix "intext:" applies the same restriction to the single word
that follows it::

  intext:WORD
  allintext: WORD1 WORD2

File Type Filtering.
--------------------

The query prefix "filetype:" filters the results returned to include only
documents with the extension specified immediately after. Note there can be no
space between "filetype:" and the specified extension::

  WORD filetype:doc OR filetype:pdf

File Type Exclusion.
--------------------

The query prefix "-filetype:" filters the results to exclude documents with the
extension specified immediately after. Note there can be no space between
"-filetype:" and the specified extension::

  WORD -filetype:doc -filetype:pdf

Web Document Info.
------------------

The query prefix "info:" returns a single result for the specified URL if it
exists in the index::

  info:www.google.com

Note: No other query terms can be specified when using this special query term.

Back Links.
-----------

The query prefix "link:" lists web pages that have links to the specified web
page::

  link:www.google.com

Note: There can be no space between "link:" and the web page URL.

Note: No other query terms can be specified when using this special query term.

Related Links.
--------------

Lists web pages that are similar to the specified web page::

  related:www.google.com

Note: There can be no space between "related:" and the web page URL.

Note: No other query terms can be specified when using this special query term.

Word definition.
----------------

The query prefix "define:" will provide a definition of the words listed after
it::

  define:WORD

Google Code.
============

  http://code.google.com/
  http://www.google.com/help/faq_codesearch.html

file:
-----

Restrict the results to files whose name matches a regular expression::

  file:\.(x|abc)$
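
Because the argument is a regular expression, it can be tried locally before searching. This sketch uses Python's `re` module and assumes that the pattern is matched against the file path and that Code Search's regex engine treats this simple pattern the same way.

```python
import re

# Pattern from the file: example above: a path ending in ".x" or ".abc".
# Assumption: Python's re behaves like Code Search's engine for this pattern.
FILE_PATTERN = re.compile(r"\.(x|abc)$")

paths = ["src/main.x", "lib/parser.abc", "notes.abc.txt", "prog.xyz"]
# search() is used because the pattern only constrains the end of the path.
print([p for p in paths if FILE_PATTERN.search(p)])  # ['src/main.x', 'lib/parser.abc']
```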
       
lang:
-----

Restrict the results to a programming language, or exclude one with "-". The
value can be a name or a regular expression::

  lang:"c++", -lang:java
  lang:^(c|c#|c\+\+)$
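
The second example anchors the whole language name with "^" and "$" and escapes "+", which would otherwise be a repetition operator. A sketch with Python's `re`, assuming the pattern is tested against the language name as written:

```python
import re

# Regex from the lang: example above; "\+\+" matches a literal "++".
LANG_PATTERN = re.compile(r"^(c|c#|c\+\+)$")

names = ["c", "c#", "c++", "cc", "objective-c", "java"]
# Anchoring rejects names that merely start with or contain "c".
print([n for n in names if LANG_PATTERN.match(n)])  # ['c', 'c#', 'c++']
```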
       
license:
--------

Restrict the results to files under a given license::

  license:apache,-license:gpl
  license:bsd|mit

package:
--------

Restrict the results to packages (archive name or URL) matching a name or a
regular expression::

  package:"www.kernel.org"
  package:\.tgz$

Yahoo search query syntax.
==========================

  http://help.yahoo.com/l/uk/yahoo/search/basics/index.html
                Yahoo! Search Help Topics
  http://help.yahoo.com/l/uk/yahoo/search/basics/basics-04.html
                Search Tips
  http://help.yahoo.com/l/uk/yahoo/search/basics/basics-08.html
                What is Advanced Search?
  http://help.yahoo.com/l/uk/yahoo/search/basics/basics-19.html
                How do I search for a specific URL, sub-page, or find sites that link to mine?

All of these words.
-------------------

Includes all of the words you typed in the search box. This is similar to
inserting "AND" between words or putting the symbol "+" before each word.

At least one of these words.
----------------------------

Searches for results that match one or more of the words. This is similar to
inserting "OR" between the words.

Exact phrase.
-------------

Searches for the words in exactly the order you enter them. This is similar to
putting quotes (" ") around a set of words.

None of these words.
--------------------

Excludes words from your search. This is similar to inserting "NOT" before the
words or the symbol "-" before a word.
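
The four form fields above correspond roughly to inline query syntax. The hypothetical helper below is a sketch, not a Yahoo! API, and builds equivalent query text from the field values.

```python
def yahoo_query(all_words=(), any_words=(), phrase="", none_words=()):
    """Translate the four Advanced Search fields into inline syntax (sketch)."""
    parts = [f"+{word}" for word in all_words]   # "All of these words"
    if any_words:
        parts.append(" OR ".join(any_words))      # "At least one of these words"
    if phrase:
        parts.append(f'"{phrase}"')               # "Exact phrase"
    parts.extend(f"-{word}" for word in none_words)  # "None of these words"
    return " ".join(parts)


print(yahoo_query(all_words=["WORD1"], any_words=["WORD2", "WORD3"],
                  phrase="WORD4 WORD5", none_words=["WORD6"]))
# +WORD1 WORD2 OR WORD3 "WORD4 WORD5" -WORD6
```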
       
site:
-----

This allows one to find all documents within a particular domain and all its
subdomains.

To exclude DOMAIN from search::

  -site:DOMAIN

hostname:
---------

This allows one to find all documents from a particular host only.

link:
-----

This allows one to find documents that link to a particular URL.

url:
----

This allows one to find a specific document in the Yahoo! index.

inurl:
------

This allows one to find a specific keyword as part of indexed URLs.

intitle:
--------

This allows one to find a specific keyword as part of the indexed titles.

Back links.
-----------

Find pages that link to any page within DOMAIN::

  linkdomain:DOMAIN

Bing search query syntax.
=========================

  http://onlinehelp.microsoft.com/en-WW/bing/ff808535.aspx
                Bing Help
  http://onlinehelp.microsoft.com/en-us/bing/ff808438.aspx
                Advanced search options
  http://onlinehelp.microsoft.com/en-us/bing/ff524480.aspx
                Search effectively
  http://onlinehelp.microsoft.com/en-us/bing/ff808421.aspx
                Advanced search keywords

"+"
---

Finds webpages that contain all the terms that are preceded by the + symbol.
Also allows you to include terms that are usually ignored.

" "
---

Finds the exact words in a phrase.

"()"
----

Finds or excludes webpages that contain a group of words.

AND or &.
---------

Finds webpages that contain all the terms or phrases.

NOT or -.
---------

Excludes webpages that contain a term or phrase.

OR or |.
--------

Finds webpages that contain either of the terms or phrases.

contains:
---------

Keeps results focused on sites that have links to the file types that you
specify::

  contains:wma

filetype:
---------

Returns only webpages created in the file type that you specify::

  filetype:pdf

inanchor: or inbody: or intitle:
--------------------------------

These keywords return webpages that contain the specified term in the metadata,
such as the anchor, body, or title of the site, respectively. Specify only one
term per keyword. You can string multiple keyword entries as needed.

ip:
---

Finds sites that are hosted by a specific IP address. The IP address must be a
dotted quad address. Type the ip: keyword, followed by the IP address of the
website.
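
Since ip: expects a dotted quad, the address can be validated before the query is built. A minimal sketch using Python's `ipaddress` module; `bing_ip_query` is a hypothetical helper and 192.0.2.10 is a documentation-only address.

```python
import ipaddress


def bing_ip_query(address, *terms):
    """Build an "ip:" query, rejecting strings that are not IPv4 dotted quads."""
    # IPv4Address raises AddressValueError (a ValueError) for input such as
    # "example.com" or "1.2.3"; str() gives back the normalized dotted quad.
    ip = ipaddress.IPv4Address(address)
    return " ".join([f"ip:{ip}", *terms])


print(bing_ip_query("192.0.2.10", "WORD"))  # ip:192.0.2.10 WORD
```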
       
language:
---------

Returns webpages for a specific language. Specify the language code directly
after the language: keyword. You can also access this function using the Search
Builder Language function. For more information about using Search Builder, see
"Use advanced search" in Bing Help.

loc: or location:
-----------------

Returns webpages from a specific country or region. Specify the country or
region code directly after the loc: keyword. To focus on two or more countries
or regions, use a logical OR to group them::

  WORD1 WORD2 (loc:US OR loc:GB)
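
The grouping rule can be expressed as a small hypothetical helper that wraps several loc: filters in parentheses joined by OR. This is a sketch; the region codes are simply passed through as given.

```python
def bing_loc_query(terms, regions):
    """Append loc: filters to the terms, OR-grouping them when there are several."""
    if not regions:
        return " ".join(terms)
    group = " OR ".join(f"loc:{code}" for code in regions)
    if len(regions) > 1:
        group = f"({group})"  # parentheses keep OR from binding to the terms
    return " ".join([*terms, group])


print(bing_loc_query(["WORD1", "WORD2"], ["US", "GB"]))
# WORD1 WORD2 (loc:US OR loc:GB)
```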
       
prefer:
-------

Adds emphasis to a search term or another operator to help focus the search
results.

site:
-----

Returns webpages that belong to the specified site. To focus on two or more
domains, use a logical OR to group the domains. You can use site: to search for
web domains, top level domains, and directories that are not more than two
levels deep. You can also search for webpages that contain a specific search
word on a site.

feed:
-----

Finds RSS or Atom feeds on a website for the terms you search for.

hasfeed:
--------

Finds webpages that contain an RSS or Atom feed on a website for the terms you
search for::

  site:www.nytimes.com hasfeed:football

url:
----

Checks whether the listed domain or web address is in the Bing index.