2013-01-07

Comment: Search and Replace

A post on R-bloggers caught my attention this morning. Its main idea was how to change elements of a character vector. For example, given a basket of fruits, we would like to change apples into bananas using R and the ifelse function. There are two main ways to change one element into another:

#Given a basket of fruits
basket = c("apple", "banana", "lemon", "orange", "orange", "pear", "cherry")

basket = ifelse(basket == "apple", "banana" , basket)
basket[basket == "lemon"] = "pear"
#The latter (subset assignment) is usually preferred, but sometimes ifelse cannot be avoided
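Both lines above do the same kind of job. As a quick sanity check, here is a minimal sketch that applies each approach to a fresh copy of the basket and confirms they agree. The variable names with.ifelse and with.subset are only for illustration.

```r
basket = c("apple", "banana", "lemon", "orange", "orange", "pear", "cherry")

# Approach 1: ifelse builds and returns a new vector
with.ifelse = ifelse(basket == "apple", "banana", basket)

# Approach 2: subset assignment overwrites matching elements of a copy
with.subset = basket
with.subset[with.subset == "apple"] = "banana"

identical(with.ifelse, with.subset)  # TRUE
```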


If you want to change several elements, you need as many nested ifelse calls as there are replacements. In the original post, R de jeu presented a function that removes this cumbersome writing. I recommend reading the original post if the idea isn't immediately clear.
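To see why this gets cumbersome, here is a sketch of the hand-written version for just two replacements (the fruit pairs are only an illustration). Every additional search/replace pair adds one more level of nesting.

```r
basket = c("apple", "banana", "lemon", "orange", "orange", "pear", "cherry")

# apple -> banana, lemon -> pear, everything else unchanged:
# one ifelse per replacement, each nested in the "no" branch of the previous one
basket2 = ifelse(basket == "apple", "banana",
                 ifelse(basket == "lemon", "pear", basket))
print(basket2)
# "banana" "banana" "pear" "orange" "orange" "pear" "cherry"
```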

The function given in the post by R de jeu is the following:

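decode <- function(x, search, replace, default = NULL) {

  # build a nested ifelse function by recursion
  decode.fun <- function(search, replace, default = NULL)
    if (length(search) == 0L) {
      # base case: nothing left to search for, so return x as is
      # (or the default value for every remaining element)
      function(x) if (is.null(default)) x else rep(default, length(x))
    } else {
      # replace matches of the first search term; otherwise use the
      # result of the function built from the remaining terms
      function(x) ifelse(x == search[1L], replace[1L],
                         decode.fun(tail(search, -1L),
                                    tail(replace, -1L),
                                    default)(x))
    }

  return(decode.fun(search, replace, default)(x))
}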


I'm not a big fan of recursive functions, as in R they tend to be slower than non-recursive alternatives. So I decided to write my own version of the decode function without recursion.

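#Same input as in the decode function
decode2 = function(x, search, replace, default=NULL){

  out = x

  #this is if we want the "default type" output
  if(!is.null(default)){
    default.bol = !(x %in% search)
    out[default.bol] = default
  }

  #always compare against the original x, so that a replacement value
  #which also appears in search is not replaced a second time
  for(i in seq_along(search)){
    out[x == search[i]] = replace[i]
  }

  return(out)
}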
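A for loop of this kind has one subtle pitfall. If a replacement value also appears in search, and each iteration tests the vector it is already modifying, a later iteration can replace that value a second time. decode's nested ifelse does not do this, because every level compares against the original x. The toy example below (values made up for illustration) contrasts the two loop variants.

```r
x = c("apple", "banana", "cherry")
search = c("apple", "banana")
replace = c("banana", "cherry")

# Variant 1: each iteration tests the vector it is modifying,
# so the "banana" produced in step 1 is turned into "cherry" in step 2
chained = x
for (i in seq_along(search)) {
  chained[chained == search[i]] = replace[i]
}

# Variant 2: each iteration tests the untouched input x,
# which matches what the nested ifelse in decode returns
direct = x
for (i in seq_along(search)) {
  direct[x == search[i]] = replace[i]
}

print(chained)  # "cherry" "cherry" "cherry"
print(direct)   # "banana" "cherry" "cherry"
```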


decode2 is just a standard for loop with a few extra logical operations. Now it's time for some speed tests! Spoiler: in my tests my solution was roughly 20 times faster, although with small vectors the difference hardly matters.

#If it's worth doing, it's worth overdoing

picnic.basket <- c("apple", "banana", "Lightsaber", "pineapple", "strawberry",
  "lemon", "joy", "orange", "genelec speakers", "cat", "pear", "cherry",
  "beer", "sonic screwdriver", "cat with a caption", "knife", "cheddar",
  "The Book of Proofs", "evil ladies", "cheese", "The R inferno", "smoked reindeer meat",
  "salmon", "Tardis", "Book of Death", "Pillows", "Blanket", "Woman")

multi.dimensional.basket = sample(picnic.basket, 1000000, replace=TRUE)
search = sample(picnic.basket, 10)
replace = sample(picnic.basket[!(picnic.basket %in% search)], 10)

#Making sure results match: both lines should print TRUE
all(decode(multi.dimensional.basket, search, replace) ==
    decode2(multi.dimensional.basket, search, replace))
all(decode(multi.dimensional.basket, search, replace, "fig") ==
    decode2(multi.dimensional.basket, search, replace, "fig"))

system.time( decode(multi.dimensional.basket, search, replace) )
system.time( decode2(multi.dimensional.basket, search, replace) )

system.time( decode(multi.dimensional.basket, search, replace, "fig") )
system.time( decode2(multi.dimensional.basket, search, replace, "fig") )
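For comparison, here is a sketch of a loop-free variant built on base R's match(). This is a swapped-in technique, not part of either function above, and decode3 is a hypothetical name. match() looks up every element of x in search in one pass, so all replacements happen in a single indexed assignment. Like nested ifelse, match() uses the first matching search term when search contains duplicates.

```r
# Hypothetical loop-free variant using base R's match()
decode3 = function(x, search, replace, default = NULL) {
  # position of each element of x within search (NA when not found)
  idx = match(x, search)
  hit = !is.na(idx)

  out = x
  # optional "default" value for elements that matched nothing
  if (!is.null(default)) out[!hit] = default
  # swap every matched element for its paired replacement
  out[hit] = replace[idx[hit]]
  out
}

basket = c("apple", "banana", "lemon", "orange", "orange", "pear", "cherry")
decode3(basket, c("apple", "lemon"), c("banana", "pear"))
# "banana" "banana" "pear" "orange" "orange" "pear" "cherry"
decode3(basket, c("apple", "lemon"), c("banana", "pear"), "fig")
# "banana" "fig" "pear" "fig" "fig" "fig" "fig"
```

It takes the same arguments as decode and decode2, so it can be timed with the same system.time() calls; I haven't benchmarked it here.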
