.. -*- coding: utf-8 -*-
===
R
===
.. contents::
:local:
Inspecting objects
==================
Info about object dimensions::
length(c(1,2,3))
dim(matrix(1:6, 2, 3))
ncol(matrix(1:6, 2, 3))
nrow(matrix(1:6, 2, 3))
Brief info about any object::
typeof(str)
class(str)
unclass(str)
str(c(1, 2))
str(summary)
Column names of datasets::
names(...)
names(list(colA=1, colB=2))
Column/row names of matrices::
colnames(matrix(...))
rownames(matrix(...))
List objects in the global environment: ``ls()``.
Object size in memory: ``object.size(1:2)``.
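As a rough illustration of how these inspection functions differ, the sketch below applies them to a small data frame. The name ``df_demo`` and its columns are made up for the example.

```r
# Hypothetical data frame used only for illustration.
df_demo <- data.frame(colA = 1:3, colB = c("x", "y", "z"))

typeof(df_demo)       # storage type: a data frame is stored as a list
class(df_demo)        # S3 class used for method dispatch
dim(df_demo)          # number of rows, then number of columns
names(df_demo)        # column names
str(df_demo)          # compact overview of the structure
object.size(df_demo)  # approximate memory used by the object
```

``typeof`` and ``class`` usually answer different questions: the first reports how the object is stored, the second how it behaves.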
Interactive session
===================
Controlling output precision::
options(digits=3)
List of all options::
str(options())
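A minimal sketch of changing an option temporarily. It relies on ``options()`` returning the previous values when it sets new ones; the variable names are arbitrary.

```r
# options() returns the previous values, so they can be restored later.
old <- options(digits = 3)
shown <- format(pi)   # formatted with 3 significant digits
print(shown)
options(old)          # restore the earlier precision
```

Restoring the saved values keeps a precision change from leaking into the rest of the session.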
Debugging
=========
To mark a function for debugging, call::
debug(fun, text = "", condition = NULL)
debugonce(fun, text = "", condition = NULL)
To return a function to normal execution::
undebug(fun)
isdebugged(fun) ## check whether fun is still marked for debugging
You can enter debug mode at any point in your code by calling ``browser()``.
``traceback`` prints out the function call stack after an error occurs; does
nothing if there's no error.
``trace`` allows you to insert debugging code into a function at specific places.
``recover`` allows you to modify the error behavior so that you can browse the
function call stack.
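The debug flag can be set and cleared without ever entering the browser, which the sketch below shows. The function ``square`` is hypothetical and exists only for the example.

```r
# Hypothetical function used only for illustration.
square <- function(x) x^2

debug(square)                   # flag the function; the browser opens on its next call
flagged <- isdebugged(square)   # TRUE while the flag is set
undebug(square)                 # clear the flag again
cleared <- !isdebugged(square)  # TRUE once the flag is gone

square(3)                       # runs normally because the flag was cleared
```

In an interactive session, calling ``square(3)`` between ``debug`` and ``undebug`` would stop inside the function instead.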
Profiling
=========
How long an expression takes to execute (at second/millisecond resolution)::
system.time(expr, gcFirst = TRUE)
unix.time(expr, gcFirst = TRUE) ## older alias of system.time; defunct in recent versions of R
The ``Rprof`` function enables global profiling. The ``summaryRprof`` function
summarizes the collected profiling data::
Rprof() ## start profiling
Rprof(NULL) ## suspend profiling
Rprof(append = TRUE) ## resume profiling
Rprof(NULL) ## end profiling
summaryRprof() ## investigate profiling report
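A small sketch of timing a piece of code with ``system.time``. The function ``slow_sum`` is a made-up workload, not something from these notes.

```r
# Hypothetical workload used only for illustration.
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}

# The assignment inside system.time() is evaluated in the calling environment,
# so `result` is available afterwards.
timing <- system.time(result <- slow_sum(1e5))

timing[["elapsed"]]    # wall-clock seconds
timing[["user.self"]]  # user CPU seconds charged to the R process
```

For very fast expressions the reported times can be zero, so it often helps to repeat the expression in a loop.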
Generating random numbers
=========================
For each distribution there is a corresponding generation function, named
with the prefix ``r``::
rnorm(n, mean = 0, sd = 1)
rt(n, df, ncp)
rbinom(n, size, prob)
rpois(n, lambda)
runif(n, min = 0, max = 1)
rexp
rchisq
rgamma
To generate reproducible sequences, set the seed first::
set.seed(seed, kind = NULL, normal.kind = NULL)
Sampling from a vector::
sample(x, size, replace = FALSE, prob = NULL)
sample.int(n, size = n, replace = FALSE, prob = NULL)
sample(1:10, 10) ## permutation!!
sample(1:10, 100, replace=TRUE)
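The sketch below checks that resetting the seed reproduces the same draws, and that ``sample`` without a ``size`` argument returns a permutation. The seed value 42 is arbitrary.

```r
set.seed(42)
first <- rnorm(5)

set.seed(42)             # same seed again
second <- rnorm(5)

identical(first, second) # the two draws match exactly

set.seed(42)
perm <- sample(1:10)     # size omitted: a random permutation of 1:10
sort(perm)               # every value appears exactly once
```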
Looping over data
=================
``lapply`` iterates over data and returns a list with the result of applying
the function to each element::
lapply(1:5, function(x) x^2)
lapply(matrix(rnorm(20*10),20,10), mean) ## applied to each of the 200 elements, not to columns
Usually you don't need a list but a vector. ``sapply`` works like ``lapply`` but
also tries to convert the result to a vector or matrix if the dimensions and
element types permit this::
lapply(list(1:5), mean)
[[1]]
[1] 3
sapply(list(1:5), mean)
[1] 3
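To make the simplification rule more concrete, here is a sketch of the shapes ``sapply`` produces for different function results. The variable names are arbitrary.

```r
as_list   <- lapply(1:3, function(i) i^2)         # always a list
as_vector <- sapply(1:3, function(i) i^2)         # length-1 results: a numeric vector
as_matrix <- sapply(1:3, function(i) c(i, i^2))   # length-2 results: a 2 x 3 matrix
ragged    <- sapply(1:3, function(i) seq_len(i))  # differing lengths: stays a list
```

Because the return shape depends on the data, code that needs a guaranteed type may be safer with ``lapply`` plus an explicit conversion.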
``apply`` works along a specific dimension of the data, so it is useful for
matrices and data frames::
apply(matrix(1:6, 2, 3), 1, min)
[1] 1 2
apply(matrix(1:6, 2, 3), 2, max)
[1] 2 4 6
apply(array(rnorm(2*2*10), c(2, 2, 10)), c(1, 2), mean)
[,1] [,2]
[1,] -0.2733804 0.3154234
[2,] 0.1830982 -0.5889010
``colSums``, ``rowSums``, ``colMeans`` and ``rowMeans`` are defined as optimized
equivalents of the corresponding ``apply`` calls::
rowSums(x) ## same as apply(x, 1, sum)
colSums(x) ## same as apply(x, 2, sum)
rowMeans(x) ## same as apply(x, 1, mean)
colMeans(x) ## same as apply(x, 2, mean)
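A quick check of that equivalence on a small matrix, as a sketch. ``all.equal`` is used rather than ``identical`` because the optimized functions return doubles while ``apply`` with ``sum`` on integers returns integers.

```r
x <- matrix(1:6, 2, 3)

row_ok  <- isTRUE(all.equal(rowSums(x),  apply(x, 1, sum)))
col_ok  <- isTRUE(all.equal(colSums(x),  apply(x, 2, sum)))
rmean_ok <- isTRUE(all.equal(rowMeans(x), apply(x, 1, mean)))
cmean_ok <- isTRUE(all.equal(colMeans(x), apply(x, 2, mean)))

rowSums(x)  # 9 12
colSums(x)  # 3 7 11
```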
``split`` partitions data by a factor (analogous to SQL ``GROUP BY``)::
data<-data.frame(rnorm(10),rbinom(10,1,prob=.7))
sdata<-split(data[,1],data[,2])
lapply(sdata,mean)
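The same split-then-apply pattern on deterministic data, so the result can be read off directly. The vectors ``scores`` and ``group`` are invented for this sketch.

```r
# Hypothetical data used only for illustration.
scores <- c(10, 20, 30, 40, 50)
group  <- factor(c("a", "a", "b", "b", "b"))

parts <- split(scores, group)  # list with one element per factor level
means <- sapply(parts, mean)   # named vector: a = 15, b = 40
```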
Exploring data
==============
See the `Inspecting objects`_ section.
Investigating unique values::
sapply(data, unique)
unique(data$col)
sapply(data[,c("col1","col2")], unique)
sapply(data[,5:10], unique)
table(data$col)
tapply(data$what, data$by, unique)
tapply(data$what, data$by, summary)
tapply(data$what, data$by, range)
tapply(data$what, data$by, mean)
tapply(data$what, data$by, sd)
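A sketch of the ``table`` and ``tapply`` calls above on a tiny data frame. The name ``survey`` and its values are made up; the column names ``what`` and ``by`` mirror the placeholders used in the listing.

```r
# Hypothetical data used only for illustration.
survey <- data.frame(what = c(2, 2, 5, 4, 8),
                     by   = c("x", "x", "x", "y", "y"))

counts <- table(survey$by)                       # number of rows per group
uniq   <- tapply(survey$what, survey$by, unique) # unique values per group
avg    <- tapply(survey$what, survey$by, mean)   # mean per group
```

``tapply`` simplifies to a named array when the function returns a single value per group, and keeps a list-like array otherwise, as with ``unique`` here.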
Brief info about vectors and matrices::
summary(1:8)
summary(matrix(1:20, 4, 5))
Simple plots::
i<-1:100
x<-i/10
y<-x^2
plot(x,y)
hist(rpois(100,10))
hist(rpois(100,10),breaks=20)
Renaming columns
================
::
names(d)[names(d)=="beta"] <- "two"
names(d)[2] <- "two"
library(plyr)
newd <- rename(d, c("beta"="two", "gamma"="three"))
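A self-contained sketch of the base R variants, which need no extra package. The data frame ``d`` and its ``alpha`` column are assumptions added so that the example runs.

```r
# Hypothetical data frame used only for illustration.
d <- data.frame(alpha = 1:2, beta = 3:4, gamma = 5:6)

names(d)[names(d) == "beta"] <- "two"  # rename by current name
names(d)[3] <- "three"                 # rename by position

names(d)
```

Renaming by name keeps working if the column order changes, which renaming by position does not.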
Removing names for rows and columns
===================================
::
rownames(dt) <- NULL
colnames(dt) <- NULL
Filtering rows and columns
==========================
::
TODO
Dropping rows and columns
=========================
Drop columns from a data frame by position::
dfnew <- df[-1] # first
dfnew <- df[-ncol(df)] # last
dfnew <- df[-c(1, 3:4, 7)] # range
Drop columns from a data frame by name::
newdf <- df[ , !(names(df) %in% c("lat", "long"))]
df <- data.frame( a = 1:10, b = 2:11, c = 3:12 )
kept <- subset(df, select = c(a,c)) # keep only columns a and c
dropped <- subset(df, select = -c(a,c)) # drop columns a and c, keeping b
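The sketch below runs the three approaches on one small data frame and keeps each result under its own name. The result names are invented for the example.

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

by_position <- df[-1]                                         # drops the first column
by_name     <- df[, !(names(df) %in% c("a", "c")), drop = FALSE]  # drops a and c
by_subset   <- subset(df, select = -c(a, c))                  # same result via subset()

names(by_position)
names(by_name)
names(by_subset)
```

``drop = FALSE`` is added here as a precaution: without it, the matrix-style ``df[, cols]`` form returns a plain vector when only one column remains.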