Since the other day, when I fiddled around with sens.slope and got a significant improvement just by pre-allocating the memory, I've wanted to see if I could make any further enhancements. I mean, it got faster, but it was still way too slow (for n = 1000, the improved method takes about 8 seconds). I wanted it to be instant. I wanted to be able to use it for more than n = 1000; I wanted n = 10000. So at first, my instincts as an old Java programmer were to try to flatten the nested loop:

for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    k <- k + 1
    d[k] <- (x[j] - x[i]) / (j - i)
  }
}

With some tinkering around with counters and counter-counters, I was able to flatten it down into a single loop. However, the improvement from this was trivial at best (in some cases, there was no improvement at all).
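For the record, here is a minimal sketch of what that flattening looks like (my own illustration with toy values, not the exact code I used): a single loop over k that recovers i and j from running counters.

```r
# One loop over k = 1..n(n-1)/2, tracking i and j by hand
n <- 5
x <- c(2, 4, 7, 11, 16)
finalk <- n * (n - 1) / 2
d <- numeric(finalk)

i <- 1
j <- 2
for (k in 1:finalk) {
  d[k] <- (x[j] - x[i]) / (j - i)
  j <- j + 1
  if (j > n) {   # exhausted the j's for this i; advance i
    i <- i + 1
    j <- i + 1
  }
}
```

It produces the same d as the double loop, but it still executes one iteration per pair, which is why it buys almost nothing.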

Then I started to remember that I had to speak R with less of an accent. R has its own thing it's good at: vectorisation.

How could I vectorise this loop? Some combination of seq and rep?
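As a quick reminder of the building blocks (toy values of my own): rep with a times vector repeats each element a given number of times, and lapply over a sequence can build the "upper triangle" partners.

```r
n <- 4
# each i value 1..(n-1), repeated once per remaining partner:
rep(1:(n - 1), times = (n - 1):1)     # 1 1 1 2 2 3
# the matching j values:
unlist(lapply(2:n, function(i) i:n))  # 2 3 4 3 4 4
```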

I eventually came to this:

# first, calculate the size of the final vector:
finalk <- 0.5 * n^2 - 0.5 * n

# pre-allocate all vector sizes
d <- vector()
length(d) <- finalk
ivals <- vector()
length(ivals) <- finalk

# calculate ivals with lapply/seq/rep/sapply:
reps <- lapply(n:2, seq, from = 2)
ivals <- rep(seq_along(reps), sapply(reps, length))

# calculate jvals with lapply
jvals <- vector()
length(jvals) <- finalk
jvals <- unlist(lapply(2:n, function(x) x:n))

# calculate d using a native R vectorised way:
d <- (x[jvals] - x[ivals]) / (jvals - ivals)
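As a sanity check (my own toy example, not from the original benchmarks), the vectorised d matches the double-loop d exactly, element for element, because both enumerate the (i, j) pairs in the same order:

```r
set.seed(1)
n <- 6
x <- cumsum(rnorm(n))

# the original double loop
k <- 0
d_loop <- numeric(n * (n - 1) / 2)
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    k <- k + 1
    d_loop[k] <- (x[j] - x[i]) / (j - i)
  }
}

# the vectorised version
reps <- lapply(n:2, seq, from = 2)
ivals <- rep(seq_along(reps), sapply(reps, length))
jvals <- unlist(lapply(2:n, function(x) x:n))
d_vec <- (x[jvals] - x[ivals]) / (jvals - ivals)

all.equal(d_loop, d_vec)  # TRUE
```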

Ok, so how well does it do? Let's look at microbenchmark results for a random data set of n = 300:

> random_ts_data300 <- ts(diffinv(rnorm(300)), frequency = 60)
> microbenchmark(sens.slope.faster(random_ts_data300), sens.slope.fastest(random_ts_data300))


Unit: milliseconds

expr | min | lq | mean | median | uq | max | neval | cld
sens.slope.faster(random_ts_data300) | 966.6235 | 991.2666 | 991.2666 | 1022.864 | 1044.4 | 1229.881 | 100 | b
sens.slope.fastest(random_ts_data300) | 19.07987 | 21.37374 | 21.37374 | 30.60782 | 26.89354 | 126.9601 | 100 | a

Hint: the vectorised method is the "fastest" one!

Wow, what an improvement!

There was one other bottleneck I tackled: the original function called the internal mk.test of the trend package to get varS. That method is fine, but it is written in R. The Kendall package has a much faster routine, written in Fortran, for getting varS.
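For context (an aside of mine; the helper name below is made up for illustration): when x has no tied values, the variance of the Mann-Kendall S statistic has a simple closed form, which is what both implementations compute (with an extra correction term when ties are present):

```r
# var(S) = n * (n - 1) * (2n + 5) / 18 for a series with no ties
mk_varS_noties <- function(n) n * (n - 1) * (2 * n + 5) / 18

mk_varS_noties(10)  # 125
```

The saving from the Fortran routine is not in the formula itself but in the tie-counting pass over the data that precedes it.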

So the final, improved function becomes:

require(Kendall)

sens.slope.fastest <- function(x, level = 0.95)
{
  na.fail(x)
  # the MannKendall test in the Kendall package is written in Fortran.
  # It is very quick.
  varS <- MannKendall(x)$varS[1]
  n <- length(x)
  retval <- list(b.sen = NULL, b.sen.up = NULL, b.sen.lo = NULL,
                 intercept = NULL, nobs = n, method = "SSLP", D = NULL,
                 conf.int = level * 100, varS = NULL)
  # the length of the d variable is easily calculated using
  # a variation on the triangular numbers formula
  finalk <- 0.5 * n^2 - 0.5 * n
  # pre-allocate the size of each vector
  d <- vector()
  length(d) <- finalk
  ivals <- vector()
  length(ivals) <- finalk
  reps <- lapply(n:2, seq, from = 2)
  ivals <- rep(seq_along(reps), sapply(reps, length))
  jvals <- vector()
  length(jvals) <- finalk
  jvals <- unlist(lapply(2:n, function(x) x:n))
  # now d is easily calculated in the vectorised form:
  d <- (x[jvals] - x[ivals]) / (jvals - ivals)
  # I have replaced () with {} where possible, to reduce execution time
  b.sen <- median(d, na.rm = TRUE)
  C <- qnorm(1 - {1 - level} / 2) * sqrt(varS)
  rank.up <- round({finalk + C} / 2 + 1)
  rank.lo <- round({finalk - C} / 2)
  rank.d <- sort(d)
  retval$b.sen.lo <- rank.d[rank.lo]
  retval$b.sen.up <- rank.d[rank.up]
  intercepts <- x - b.sen * (1:n)
  retval$intercept <- median(intercepts)
  retval$b.sen <- b.sen
  retval$D <- d
  retval$varS <- varS
  class(retval) <- "trend.test"
  return(retval)
}

and the final benchmark results:

Unit: milliseconds

expr | min | lq | mean | median | uq | max | neval | cld
sens.slope.faster(random_ts_data300) | 987.3958 | 1017.521 | 1053.604 | 1031.085 | 1070.765 | 1486.412 | 100 | b
sens.slope.fastest(random_ts_data300) | 19.82347 | 22.43609 | 33.55085 | 23.90496 | 26.1712 | 97.73917 | 100 | a
sens.slope(random_ts_data300) | 7263.537 | 7609.489 | 7817.028 | 7809.984 | 7974.151 | 9411.681 | 100 | c

Clearly, the original is the worst performer, and the "fastest" version is the best performer, by a long margin.

Cheers for now

N