www/HACKING.rst
changeset 642 c1032aea6265
parent 636 bc521aba85bc
child 647 6ae5399c8087
641:a49a091d8231 642:c1032aea6265
   267   OANC download page.
   267   OANC download page.
   268 
   268 
   269 http://www.anc.org/data/oanc/
   269 http://www.anc.org/data/oanc/
   270   OANC home page.
   270   OANC home page.
   271 
   271 
       
   272 https://en.wikipedia.org/wiki/Word_lists_by_frequency
       
   273 
       
   274 Useful word lists:
       
   275 
       
   276 
       
   277 https://en.wikipedia.org/wiki/Academic_Word_List
       
   278   Academic Word List at Wikipedia.
       
   279 https://web.archive.org/web/20080212073904/http://language.massey.ac.nz/staff/awl/headwords.shtml
       
   280   Academic Word List by Averil Coxhead, created in 2000 as an addition to the
       
   281   GSL; it has 570 headwords.
       
   282 
       
   283 Obsolete or proprietary word list:
       
   284 
       
   285 https://en.wikipedia.org/wiki/Basic_English
       
   286   An 850-headword list created in 1930.
       
   287 
       
   288 General Service List
       
   289 --------------------
       
   290 
       
   291 The updated GSL (General Service List) was obtained from:
       
   292 
       
   293 http://jbauman.com/gsl.html
       
   294   A 1995 revised version of the GSL with minor changes by John Bauman. He added
       
   295   284 new headwords to the original 2000-word list created by Michael West in 1953.
       
   296 
       
   297 The first column gives the number of occurrences per 1,000,000 words of the
       
   298 Brown corpus, counted by word family.
       
   299 
       
   300 https://en.wikipedia.org/wiki/General_Service_List
       
   301   General Service List at Wikipedia.
       
   302 http://jbauman.com/aboutgsl.html
       
   303   About the General Service List by John Bauman.
       
   304 
       
   305 New General Service List
       
   306 ------------------------
       
   307 
       
   308 The NGSL was obtained from:
       
   309 
       
   310 http://www.newgeneralservicelist.org/s/NGSL-101-by-band-qq9o.xlsx
       
   311   Microsoft Excel (XLSX) file with headword, frequency and SFI.
       
   312 
       
   313 The first column gives the adjusted frequency per 1,000,000 words, counted by
       
   314 base word family.
       
   315 
       
   316 Academic Word List
       
   317 ------------------
       
   318 
       
   319 The Academic Word List (AWL) was published in the Summer 2000 issue of
       
   320 TESOL Quarterly (v. 34, no. 2). It was developed by Averil Coxhead, of Victoria
       
   321 University of Wellington, in New Zealand. The AWL is a replacement for the
       
   322 University Word List (published by Paul Nation in 1984).
       
   323 
       
   324 The AWL (Academic Word List) was obtained from:
       
   325 
       
   326 https://web.archive.org/web/20081014065815/http://language.massey.ac.nz/staff/awl/download/awlheadwords.rtf
       
   327   Original Academic Word List in RTF format.
       
   328 
       
   329 Its structure is a headword followed by its frequency level (from 1 as most
       
   330 frequent to 10 as least frequent).
       
   331 
       
   332 New Academic Word List
       
   333 ----------------------
       
   334 
       
   335 The frequency word list was obtained from:
       
   336 
       
   337 http://www.newacademicwordlist.org/s/NAWL_SFI.csv
       
   338   CSV with columns ``Word,SFI,U,D``.
       
   339 
       
   340 The ``SFI`` and ``D`` columns were deleted and the ``U`` and ``Word`` columns were swapped.
       
   341 The data was sorted by the ``U`` column (adjusted frequency per 1,000,000 words).
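
The column edits described above can be sketched in Python. This is only an
illustration, not the procedure that was actually used: the function name
``reshape_nawl`` is hypothetical, the input is assumed to start with a
``Word,SFI,U,D`` header row, and descending order (most frequent first) is an
assumption because the sort direction is not stated.

```python
import csv
import io


def reshape_nawl(src_text, descending=True):
    """Turn ``Word,SFI,U,D`` CSV text into (U, Word) pairs sorted by U.

    Assumptions: the text begins with a header row naming the four columns,
    and ``U`` parses as a number. The sort direction is a guess.
    """
    reader = csv.DictReader(io.StringIO(src_text))
    # Drop SFI and D, and put U before Word (the swap described above).
    # U is kept as the original string so its formatting is not altered.
    rows = [(row["U"], row["Word"]) for row in reader]
    rows.sort(key=lambda pair: float(pair[0]), reverse=descending)
    return rows


def to_csv_text(rows):
    """Serialize (U, Word) pairs back to CSV text without a header row."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Calling ``to_csv_text(reshape_nawl(text))`` on the downloaded file contents
would give a two-column list with the frequency first.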
       
   342 
       
   343 The NAWL headword list with word variations was obtained from:
       
   344 
       
   345 http://www.laurenceanthony.net/software/antwordprofiler/
       
   346   Laurence Anthony's AntWordProfiler home page.
       
   347 
       
   348 It was encoded in ``latin-1`` and has been recoded into ``utf-8`` (because of the
       
   349 ``É`` character).
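
A minimal sketch of such a recoding step in Python, assuming the whole list
fits in memory. The helper name ``recode_latin1_to_utf8`` and the sample bytes
are hypothetical; the tool actually used for the conversion is not recorded
here.

```python
def recode_latin1_to_utf8(data: bytes) -> bytes:
    """Decode latin-1 bytes and re-encode the same text as utf-8."""
    # Every byte value is valid latin-1, so decoding cannot fail;
    # bytes above 0x7F (such as 0xC9 for É) become two-byte utf-8 sequences.
    return data.decode("latin-1").encode("utf-8")


# Hypothetical sample: a word starting with É, as stored in latin-1.
sample = b"\xc9COLE"
converted = recode_latin1_to_utf8(sample)
```

For a file, the same function can be applied to the bytes read in binary mode
and the result written back in binary mode.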
       
   350 
       
   351 See also:
       
   352 
       
   353 http://www.newacademicwordlist.org/
       
   354   Home page.
       
   355 
       
   356 Special English word list
       
   357 -------------------------
       
   358 
       
   359 https://en.wikipedia.org/wiki/Special_English
       
   360   Special English is a controlled version of the English language used by the
       
   361   United States broadcasting service Voice of America (VOA). 1557 headwords.
       
   362 
       
   363 BNC+COCA wordlist
       
   364 -----------------
       
   365 
       
   366 Paul Nation prepared a frequency wordlist from the combined BNC and COCA corpora:
       
   367 
       
   368 http://www.victoria.ac.nz/lals/about/staff/paul-nation
       
   369   Paul Nation's home page and list download page.
       
   370 https://simple.wiktionary.org/wiki/Wiktionary:BNC_spoken_freq
       
   371   About the list on Simple English Wiktionary.
       
   372 
       
   373 It has 25000 basewords (each with its variations), split into chunks of
       
   374 1000 words.
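
The chunk layout can be illustrated with a short Python sketch. It assumes the
basewords are available as one list ordered by frequency; ``split_into_bands``
is a hypothetical helper, not part of any tool mentioned here.

```python
def split_into_bands(basewords, size=1000):
    """Split an ordered baseword list into consecutive chunks of ``size``."""
    # The last chunk is shorter when the length is not a multiple of ``size``.
    return [basewords[i:i + size] for i in range(0, len(basewords), size)]


# Hypothetical stand-in data: 25000 placeholder basewords.
bands = split_into_bands([f"w{i}" for i in range(25000)])
```

With 25000 basewords and a chunk size of 1000 this gives 25 chunks.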
       
   375 
       
   376 I got the list from:
       
   377 
       
   378 http://www.laurenceanthony.net/software/antwordprofiler/
       
   379   Laurence Anthony's AntWordProfiler home page.
   272 
   380 
   273 Register gadict dictionaries for dictd under Debian
   381 Register gadict dictionaries for dictd under Debian
   274 ===================================================
   382 ===================================================
   275 ::
   383 ::
   276 
   384