r.rst
author Oleksandr Gavenko <gavenkoa@gmail.com>
Thu, 03 Jan 2019 22:13:18 +0200

.. -*- coding: utf-8 -*-

===
 R
===
.. contents::
   :local:

Inspecting objects
==================

Info about object dimensions::

  length(c(1,2,3))
  dim(matrix(1:6, 2, 3))
  ncol(matrix(1:6, 2, 3))
  nrow(matrix(1:6, 2, 3))

Brief info about any object::

  typeof(str)
  class(str)
  unclass(str)
  str(c(1, 2))
  str(summary)

Column names of datasets::

  names(...)
  names(list(colA=1, colB=2))

Column/row names of matrices::

  colnames(matrix(...))
  rownames(matrix(...))

List objects in the global environment: ``ls()``.

Object size in memory: ``object.size(1:2)``.
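
As a small illustration, the inspection functions can be combined on a data
frame. This is only a sketch; the ``people`` data frame and its columns are made
up for the example::

  people <- data.frame(name = c("Ann", "Bob", "Eve"),
                       age = c(31, 25, 47))
  dim(people)          ## number of rows and columns
  names(people)        ## column names
  class(people)        ## "data.frame"
  typeof(people)       ## "list": a data frame is a list of columns
  str(people)          ## compact overview of the structure
  print(object.size(people), units = "Kb")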

Interactive session
===================

Controlling output precision::

  options(digits=3)

List of all options::

  str(options())
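
``options`` returns the previous values of the options it changes, so a setting
can be restored later. A minimal sketch (the variable name ``old`` is
arbitrary)::

  old <- options(digits = 3)   ## remember the previous setting
  print(pi)                    ## printed with 3 significant digits
  options(old)                 ## restore the previous precision
  getOption("digits")          ## query a single option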

Debugging
=========

To mark a function for debugging, call::

  debug(fun, text = "", condition = NULL)
  debugonce(fun, text = "", condition = NULL)

To return a function to normal execution, or to check whether it is marked::

  undebug(fun)
  isdebugged(fun)

You can enter debug mode in any piece of code by calling ``browser()``.

``traceback`` prints out the function call stack after an error occurs; does
nothing if there's no error.

``trace`` allows you to insert debugging code into a function at specific places.

``recover`` allows you to modify the error behavior so that you can browse the
function call stack.
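
A possible non-interactive use of ``trace``, sketched with a made-up function
``f`` defined at top level: the tracer expression is evaluated on entry to the
function, so the argument can be printed without editing the function body::

  f <- function(x) {
    y <- x * 2
    y + 1
  }

  trace(f, tracer = quote(cat("f called with x =", x, "\n")), print = FALSE)
  result <- f(10)   ## runs the tracer, then the original body
  untrace(f)        ## remove the inserted code

  ## in an interactive session one could instead try:
  ##   debugonce(f); f(10)
  ##   options(error = recover)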

Profiling
=========

How long the evaluation of an expression takes (coarse resolution, seconds down
to milliseconds)::

  system.time(expr, gcFirst = TRUE)
  unix.time(expr, gcFirst = TRUE)  ## deprecated alias of system.time() in newer R

``Rprof`` enables global profiling; ``summaryRprof`` summarizes the collected
profiling data::

  Rprof()       ## start profiling
  Rprof(NULL)   ## suspend profiling
  Rprof(append = TRUE)  ## resume profiling
  Rprof(NULL)   ## end profiling
  summaryRprof() ## investigate profiling report
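
A rough end-to-end sketch of both approaches. The function ``slow_sum`` and the
iteration counts are invented for the example, and it assumes an R build with
profiling support and a run long enough to collect at least a few samples::

  slow_sum <- function(n) {
    total <- 0
    for (i in seq_len(n)) total <- total + sqrt(i)
    total
  }

  timing <- system.time(slow_sum(1e6))
  timing["elapsed"]                  ## wall-clock seconds

  prof_file <- tempfile(fileext = ".out")
  Rprof(prof_file)                   ## write samples to a temporary file
  invisible(slow_sum(5e6))
  Rprof(NULL)                        ## stop profiling
  head(summaryRprof(prof_file)$by.self)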

Generating random numbers
=========================

For each distribution there is a corresponding generator function, named with
the prefix ``r``::

  rnorm(n, mean = 0, sd = 1)
  rt(n, df, ncp)
  rbinom(n, size, prob)
  rpois(n, lambda)
  runif(n, min = 0, max = 1)
  rexp
  rchisq
  rgamma

To generate reproducible sequences, set the seed first::

  set.seed(seed, kind = NULL, normal.kind = NULL)

Sampling from a vector::

  sample(x, size, replace = FALSE, prob = NULL)
  sample.int(n, size = n, replace = FALSE, prob = NULL)

  sample(1:10, 10)  ## permutation!!
  sample(1:10, 100, replace=TRUE)
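
A short check of both ideas, as a sketch (the seed value 42 is arbitrary):
resetting the seed reproduces the same draws, and ``sample`` without ``size``
returns a permutation::

  set.seed(42)
  first <- rnorm(5)
  set.seed(42)
  second <- rnorm(5)
  identical(first, second)         ## TRUE: same seed, same sequence

  shuffled <- sample(1:10)         ## size defaults to length(x)
  identical(sort(shuffled), 1:10)  ## TRUE: every element appears exactly once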

Looping over data
=================

``lapply`` iterates over the elements of its argument and returns a list with
the result of applying the function to each element::

  lapply(1:5, function(x) x^2)
  lapply(matrix(rnorm(20*10),20,10), mean)  ## per element (200 results), not per column

Usually you want a vector rather than a list. ``sapply`` works like ``lapply``
but also tries to convert the result to a vector or matrix if the dimensions and
element types permit this::

  lapply(list(1:5), mean)
  [[1]]
  [1] 3
  sapply(list(1:5), mean)
  [1] 3
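
How the simplification behaves depends on the shape of the individual results.
A small sketch with made-up inputs::

  squares <- sapply(1:5, function(x) x^2)
  squares                 ## numeric vector: 1 4 9 16 25

  ranges <- sapply(list(a = 1:5, b = c(10, 20)), range)
  dim(ranges)             ## 2 2: one column per list element

  ragged <- sapply(list(1:2, 1:3), function(v) v * 2)
  class(ragged)           ## "list": lengths differ, nothing to simplify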

``apply`` works along a given dimension of the data, so it is useful for
matrices and data frames::

  apply(matrix(1:6, 2, 3), 1, min)
  [1] 1 2

  apply(matrix(1:6, 2, 3), 2, max)
  [1] 2 4 6

  apply(array(rnorm(2*2*10), c(2, 2, 10)), c(1, 2), mean)
             [,1]       [,2]
  [1,] -0.2733804  0.3154234
  [2,]  0.1830982 -0.5889010

``colSums``, ``rowSums``, ``colMeans``, ``rowMeans`` are optimized equivalents
of::

  rowSums(x)    ## apply(x, 1, sum)
  colSums(x)    ## apply(x, 2, sum)
  rowMeans(x)   ## apply(x, 1, mean)
  colMeans(x)   ## apply(x, 2, mean)
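
The equivalence can be checked on a small matrix. ``all.equal`` is used rather
than ``identical`` because ``rowSums`` returns doubles while ``sum`` over
integers returns an integer::

  m <- matrix(1:6, 2, 3)
  rowSums(m)                                 ## 9 12
  colSums(m)                                 ## 3 7 11
  all.equal(rowSums(m), apply(m, 1, sum))    ## TRUE
  all.equal(colMeans(m), apply(m, 2, mean))  ## TRUE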

``split`` partitions data by a factor (an analog of SQL ``group by``)::

  data<-data.frame(rnorm(10),rbinom(10,1,prob=.7))
  sdata<-split(data[,1],data[,2])
  lapply(sdata,mean)
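
The same grouping with named columns, as a sketch. The ``scores`` data frame is
invented for the example; ``tapply`` is shown as a one-step alternative::

  scores <- data.frame(group = c("a", "b", "a", "b", "a"),
                       value = c(1, 2, 3, 4, 8))
  by_group <- split(scores$value, scores$group)
  group_means <- sapply(by_group, mean)
  group_means                                ## a = 4, b = 3

  tapply(scores$value, scores$group, mean)   ## same means without split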

Exploring data
==============

Check `Inspecting objects`_ section.

Investigating unique values::

  sapply(data, unique)
  sapply(data$col, unique)
  sapply(data[,c("col1","col2")], unique)
  sapply(data[,5:10], unique)

  table(data$col)

  tapply(data$what, data$by, unique)
  tapply(data$what, data$by, summary)
  tapply(data$what, data$by, range)
  tapply(data$what, data$by, mean)
  tapply(data$what, data$by, sd)
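
For a runnable variant, the same calls can be tried on the ``mtcars`` data set
that ships with R (chosen here only as a convenient example)::

  sort(unique(mtcars$cyl))                ## distinct cylinder counts
  cyl_counts <- table(mtcars$cyl)         ## number of cars per cylinder count
  cyl_counts
  tapply(mtcars$mpg, mtcars$cyl, mean)    ## mean mpg per cylinder count
  tapply(mtcars$mpg, mtcars$cyl, range)   ## one (min, max) pair per group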

Brief info about vectors and matrices::

  summary(1:8)
  summary(matrix(1:20, 4, 5))

Simple plots::

  i<-1:100
  x<-i/10
  y<-x^2
  plot(x,y)

  hist(rpois(100,10))
  hist(rpois(100,10),breaks=20)

Renaming columns
================
::

  names(d)[names(d)=="beta"] <- "two"
  names(d)[2] <- "two"

  library(plyr)
  newd <- rename(d, c("beta"="two", "gamma"="three"))
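
A base R sketch that needs no extra package. The data frame ``d`` and the new
names are made up; ``match`` is used to rename several columns by their old
names::

  d <- data.frame(alpha = 1:3, beta = 4:6, gamma = 7:9)
  names(d)[names(d) == "beta"] <- "two"
  names(d)                     ## "alpha" "two" "gamma"

  old_names <- c("alpha", "gamma")
  new_names <- c("one", "three")
  names(d)[match(old_names, names(d))] <- new_names
  names(d)                     ## "one" "two" "three"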

Removing names for rows and columns
===================================
::

  rownames(dt) <- NULL
  colnames(dt) <- NULL

Filtering rows and columns
==========================
::

  TODO

Dropping rows and columns
=========================

Drop columns from a data frame by position::

  dfnew <- df[-1]         # first
  dfnew <- df[-ncol(df)]  # last
  dfnew <- df[-c(1, 3:4, 7)]  # several positions at once

Drop columns from a data frame by name::

  newdf <- df[ , !(names(df) %in% c("lat", "long"))]

  df <- data.frame( a = 1:10, b = 2:11, c = 3:12 )
  kept <- subset(df, select = c(a,c))      # keep only a and c
  dropped <- subset(df, select = -c(a,c))  # drop a and c, keep b
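
One pitfall with the ``df[ , ...]`` form: when only a single column remains, the
result is simplified to a vector unless ``drop = FALSE`` is given. A small
sketch (variable names are arbitrary)::

  df <- data.frame(a = 1:10, b = 2:11, c = 3:12)
  one_left <- df[ , !(names(df) %in% c("a", "b"))]
  class(one_left)      ## "integer": no longer a data frame

  still_df <- df[ , !(names(df) %in% c("a", "b")), drop = FALSE]
  class(still_df)      ## "data.frame"
  names(still_df)      ## "c"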