Sunday, January 13, 2013

Introduction to R – Functions

Functions

Functions are just like what you remember from math class. Most function calls take the following form: f(argument1, argument2, ...), where f is the name of the function and argument1, argument2, ... are the arguments to the function. Here are a few examples of built-in functions:

[Screenshot: RGui]
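The original screenshot is not reproduced here, so the calls below are illustrative picks rather than the exact ones it showed. They are a minimal sketch of the f(argument1, argument2, ...) form using functions that ship with base R:

```r
# Calls to built-in functions: the name, then the arguments in parentheses
exp(1)
cos(3.141593)

# The same call with and without argument names
named   <- log(x = 64, base = 4)   # 3
unnamed <- log(64, 4)              # 3, names omitted because the order is the default one
```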
Note in the last example that if you give the arguments in the default order, you can omit their names. Some built-in functions also have an operator form, as in the following examples:
[Screenshot: RGui]
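As a small sketch of the operator form (the numbers are arbitrary), an operator can also be called like any other function by quoting its name with backticks:

```r
sum.op   <- 17 + 2        # operator form
sum.call <- `+`(17, 2)    # the same addition written as an ordinary function call

pow.op   <- 2 ^ 10
pow.call <- `^`(2, 10)
```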
A function in R is just another object that is assigned to a symbol. You can define your own functions in R, assign them a name, and then call them just like the built-in functions. Writing function code directly in the R Console is awkward, so R provides a simple text editor for that. To write your code, go to File >> New Script. This will open the R Editor, where you can enter the following code:
[Screenshot: RGui]
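The code in the screenshot is not available, so here is a stand-in sketch. The name say.hello and its body are hypothetical. The only property carried over from the post is that the function takes no arguments:

```r
# A function is created with function() and assigned to a symbol with <-
say.hello <- function() {
  "Hello, R!"
}
```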

You can select the code, copy it, and paste it into the R console. Now you can call the function to get its result. Note that entering the function name without parentheses and hitting Enter prints the function code, which is a useful trick for viewing a function before using it.
[Screenshot: RGui]
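A minimal sketch of both steps, again using the hypothetical say.hello function rather than the one from the screenshot:

```r
say.hello <- function() {
  "Hello, R!"
}

result <- say.hello()   # with parentheses: the function is called
result

say.hello               # without parentheses: R prints the function's code
```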
Now let’s go back to the editor and save the code in our working directory as MyCode.R (.R is the extension for R code files). To load the code from an R file into the console, call source() and pass it the file name. Then you can use the code from that file in the console. Each time you edit the file, you have to call source() again to load the latest code.
[Screenshot: RGui]
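The sketch below imitates that workflow. As an assumption made only to keep the example self-contained, it writes MyCode.R into a temporary directory instead of the working directory, and the file's contents are hypothetical:

```r
# Stand-in for saving the script from the R Editor
code.file <- file.path(tempdir(), "MyCode.R")
writeLines('say.hello <- function() { "Hello, R!" }', code.file)

source(code.file)        # load the definitions from the file
first <- say.hello()

# After editing the file, source() has to be called again
writeLines('say.hello <- function() { "Hello again, R!" }', code.file)
source(code.file)
second <- say.hello()
```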

Arguments

A function definition in R lists the names of its arguments (the previous example didn’t use any arguments).
[Screenshot: RGui]
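For instance, a definition with two named arguments might look like the sketch below. The function rectangle.area is a hypothetical example, not the one from the screenshot:

```r
# width and height are the argument names declared in the definition
rectangle.area <- function(width, height) {
  width * height
}

area <- rectangle.area(3, 4)   # 12
```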
Optionally, you can include default values for arguments. An argument with a default value is optional, so it can be omitted from the function call. If you provide a value for an argument that has a default, your value overrides the default. Arguments without a default value have to be provided in the function call if the function uses them.
[Screenshot: RGui]
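A minimal sketch of these rules, using a hypothetical function power whose second argument has a default:

```r
power <- function(base, exponent = 2) {
  base ^ exponent
}

default.used <- power(5)       # exponent falls back to 2, giving 25
overridden   <- power(5, 3)    # the supplied value replaces the default, giving 125

# base has no default and is used in the body, so leaving it out is an error
outcome <- tryCatch(power(), error = function(e) "error")
```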

If you want to accept a variable-length argument list, include an ellipsis (...) in the arguments of the function. Everything in the call that does not match a named argument is collected in the ellipsis. You can then convert the ellipsis to a list to work with it.
[Screenshot: RGui]
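As a sketch (count.extras is a hypothetical function), list(...) turns whatever landed in the ellipsis into an ordinary list:

```r
count.extras <- function(first, ...) {
  extras <- list(...)    # everything after the named argument
  length(extras)
}

n.extras <- count.extras("a", 1, 2, 3)   # 3
n.none   <- count.extras("a")            # 0
```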
You can also refer directly to items within the ellipsis using the variables ..1 for the first item, ..2 for the second, and so on. Any argument that appears after the ellipsis in the function definition has to be named explicitly in the call.
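Both points can be seen in one short sketch. The function join.words is hypothetical, and its sep argument is declared after the ellipsis:

```r
join.words <- function(..., sep = " ") {
  paste(..1, ..2, sep = sep)   # first and second items of the ellipsis
}

plain      <- join.words("lexical", "scoping")              # "lexical scoping"
named.sep  <- join.words("lexical", "scoping", sep = "-")   # "lexical-scoping"
# An unnamed third value is swallowed by ... and does not reach sep
positional <- join.words("lexical", "scoping", "-")         # "lexical scoping"
```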
To get the set of arguments accepted by a function, use the args function. The NULL in its output stands in for the function body.
[Screenshot: RGui]
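A short sketch with a hypothetical function f. The same call works on built-in functions such as sapply:

```r
f <- function(x, y = 2, ...) {
  x + y
}

signature <- args(f)   # a function with the same arguments and a NULL body
signature
args(sapply)
```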
You can pass named arguments anywhere in the function call. Unnamed arguments are matched, in order, to the arguments of the function definition that have not been matched by name. The following lm() function calls are equivalent:
lm(data = mydata, y ~ x, model = FALSE, 1:100)
lm(y ~ x, mydata, 1:100, model = FALSE)
Named arguments are helpful when a function has a long argument list and you remember the arguments by name rather than by position.
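To try the two lm() calls above, some data is needed. The data frame below is made up purely for illustration (200 random rows), so the fitted numbers themselves mean nothing. The point is only that both calls produce the same fit:

```r
set.seed(1)
mydata <- data.frame(x = 1:200, y = rnorm(200))   # hypothetical data

fit1 <- lm(data = mydata, y ~ x, model = FALSE, 1:100)
fit2 <- lm(y ~ x, mydata, 1:100, model = FALSE)

same.fit <- all.equal(coef(fit1), coef(fit2))
same.fit
```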

Lazy Evaluation

Arguments to functions are evaluated lazily, so they are evaluated only as needed. The function below never uses the argument b, so calling f(2) will not produce an error because the 2 gets positionally matched to a (the only variable needed).
[Screenshot: RGui]
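A function matching that description can be sketched as follows. The body a^2 is an assumption, since any body that ignores b behaves the same way:

```r
f <- function(a, b) {
  a^2        # b is never touched, so it is never evaluated
}

result <- f(2)   # 4, and no complaint about the missing b
```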
Even if the function does use a missing argument, R will not give an error until the first use of that argument. Everything before that point executes normally.
[Screenshot: RGui]
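In the sketch below (function and values are illustrative), the first print() runs and only the second one fails. The call is wrapped in tryCatch() so that the example can run to the end:

```r
f <- function(a, b) {
  print(a)   # runs normally
  print(b)   # the error happens here, at the first use of b
}

outcome <- tryCatch(f(45), error = function(e) "error raised at first use of b")
outcome
```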

Return Values

You can use the return function to specify the value to be returned by the function. If no return() call is reached, R returns the last evaluated expression as the result of the function.
[Screenshot: RGui]
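Both styles can appear in the same function, as in this sketch (sign.label is a hypothetical name):

```r
sign.label <- function(x) {
  if (x < 0) {
    return("negative")   # explicit return, leaves the function immediately
  }
  "non-negative"         # last evaluated expression becomes the result
}

neg <- sign.label(-3)
pos <- sign.label(5)
```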

Functions as Arguments

Many functions in R can take other functions as arguments. One example is the sapply function, which iterates through each element in a vector, applies another function to each element, and returns the results.
[Screenshot: RGui]
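A minimal sketch, with a hypothetical helper double.it passed to sapply by name:

```r
double.it <- function(x) {
  x * 2
}

doubled <- sapply(1:3, double.it)   # 2 4 6
```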

Anonymous Functions

You can also create functions that do not have names. These are called anonymous functions. Anonymous functions are usually passed as arguments to other functions.
[Screenshot: RGui]
The R interpreter assigns the anonymous function function(x) {x * 7} to the argument f of the function apply.to.three, then assigns 3 to the argument x of the anonymous function. So it ends up evaluating 3 * 7 and returns the result.
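The definition of apply.to.three is only visible in the screenshot, so the one below is reconstructed from the description above and should be read as an assumption:

```r
# Assumed definition: call whatever function is passed in with the value 3
apply.to.three <- function(f) {
  f(3)
}

result <- apply.to.three(function(x) { x * 7 })   # 21
```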
Anonymous functions can also be used with sapply():
[Screenshot: RGui]
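For example (the vector and the function body are illustrative choices):

```r
# The function is defined right where it is passed and never gets a name
incremented <- sapply(1:5, function(x) { x + 1 })   # 2 3 4 5 6
```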
It is also possible to define an anonymous function and apply it directly to an argument.
[Screenshot: RGui]
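A sketch of that pattern: the definition is wrapped in parentheses and followed immediately by the argument list (the body is again illustrative):

```r
result <- (function(x) { x + 1 })(1)   # 2
```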

Scoping Rules

How does R know which value to assign to which symbol? In the example below, how does R know what value to assign to the symbol lm? Why doesn’t it give it the value of lm that is in the stats package?
[Screenshot: RGui]
When R tries to bind a value to a symbol, it searches through a series of environments (sets of symbols and the objects bound to them) to find the appropriate value. When you are working on the command line and need to retrieve the value of an R object, the search begins in the global environment, where R looks for a symbol name matching the one requested. If it is not found there, R searches each of the packages on the search list in order. You can get the search list using the search() function. .GlobalEnv represents your current working environment on the R command line, and it’s always the first element of the search list. The base package is always the last one. The order of the list matters, because R uses the first match it finds.
[Screenshot: RGui]
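The sketch below shows both ideas: the two fixed ends of the search list, and a user-defined lm being found before the one in stats. The body x * x is an arbitrary choice for illustration:

```r
path <- search()
path[1]              # ".GlobalEnv"
path[length(path)]   # "package:base"

lm <- function(x) { x * x }   # a symbol named lm in the current environment
squared <- lm(4)              # 16: this lm is found before stats is searched
fitter  <- stats::lm          # the packaged version is still reachable by its full name
rm(lm)                        # remove the mask again
```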
If you load a package with library(), that package is placed in the 2nd position of the search list, and everything else is shifted down the list.
[Screenshot: RGui]
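A small sketch using splines, chosen here only because it ships with R. It compares the search list before and after the call:

```r
before <- search()
library(splines)
after <- search()

setdiff(after, before)   # the newly attached entry, if splines was not attached already
after[2]                 # in a fresh session this is the new package
```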
You can also load a package from the console window menu by going to Packages >> Load package, then selecting the desired package and clicking OK.
You can configure which packages are loaded automatically on startup. To do that, open C:\Program Files\R\<Your-R-Version>\etc\Rprofile.site using Notepad and append the following to the bottom of the file. You can add whatever packages you want to the vector built with c(), and they will be loaded for you on startup.
local({
  # keep the packages R already loads by default, then add our own
  old <- getOption("defaultPackages")
  options(defaultPackages = c(old, "car", "RODBC", "foreign", "DAAG", "MASS",
                              "lattice", "latticedl", "sciplot", "tree", "lme4"))
})

Lexical scoping rules (or static scoping rules) determine how a value is associated with a free variable in a function. The values of free variables are searched for in the environment in which the function was defined.
So what is a free variable? A free variable is neither a formal argument (an argument declared in the function signature) nor a local variable that is declared and assigned in the function body. In the following example, x and y are formal arguments and z is a free variable:
f <- function(x, y) { x^2 + y / z }
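To see the free variable being resolved, z has to exist somewhere R can find it. In this sketch the value 2 is an arbitrary choice, assigned in the same environment in which f is defined:

```r
f <- function(x, y) { x^2 + y / z }

z <- 2                 # found by lexical scoping when f runs
result <- f(3, 4)      # 3^2 + 4 / 2 = 11
```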
So what is an environment? An environment is a collection of (symbol, value) pairs. Every environment has a parent environment, and it is possible for an environment to have multiple children. A function together with its environment is called a closure, or function closure.
So, the search for the value of a free variable starts in the environment in which the function was defined. If it is not found there, the search continues in the parent environment, and so on until we hit the top-level environment (usually the workspace or the namespace of a package). After that, the search continues down the search list until we hit the empty environment. If the value is still not found, an error is thrown.
You can get the environment of a function using environment() (for functions coded on the command line, that will be the global environment). You can get the parent of an environment using parent.env() (for functions coded on the command line, the parent of their environment is the second item in the search list).
[Screenshot: RGui]
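A short sketch of both calls. The function f is hypothetical, and sd from the stats package is included only to contrast a packaged function with one typed at the console:

```r
f <- function(x) { x }
environment(f)                  # the global environment when f is typed at the console

pkg.env <- environmentName(environment(stats::sd))   # a packaged function lives in its namespace

global.parent <- parent.env(globalenv())
second.item   <- as.environment(2)                   # second entry of the search list
same.env      <- identical(global.parent, second.item)
same.env
```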
Why does knowing the lexical scoping rules matter? Typically, a function is defined in the global environment, so the values of its free variables are found in the user’s workspace (which is usually the behavior you want). However, in R you can define functions inside other functions. In this case, the environment in which the inner function is defined is the one created when the outer function is called, not the global environment.
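A common way to illustrate this is a function that builds other functions. The names make.power, cube and square below are hypothetical, and the sketch only assumes base R:

```r
make.power <- function(n) {
  pow <- function(x) { x^n }   # n is a free variable inside pow
  pow                          # return the inner function
}

cube   <- make.power(3)
square <- make.power(2)

cubed   <- cube(3)      # 27, n is looked up where cube was defined
squared <- square(3)    # 9

inner.names <- ls(environment(cube))         # "n" "pow"
inner.n     <- get("n", environment(cube))   # 3
```

Each call to make.power creates a fresh environment, which is why cube and square remember different values of n.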
In this post we talked about functions and how to use them, whether from the console or from external files, as well as arguments, functions as arguments, anonymous functions, scoping rules, and other low-level details.
Stay tuned for more R notes.