Monday, 19 October 2015

SPEEEEEDY......R...

So, since the other day when I fiddled around with sens.slope and got a significant improvement just by pre-allocating the memory, I wanted to see if I could make any further enhancements. I mean, it got faster, but it was still way too slow (for n=1000, about 8 seconds using the improved method). I wanted it to be instant. I wanted to be able to use it for more than n=1000; I wanted it for n=10000. My first instinct, as an old Java programmer, was to try and flatten the nested loop:

  for (i in 1:(n - 1)) {
     for (j in (i + 1):n) {
       k <- k + 1
       d[k] <- (x[j] - x[i])/(j - i)
     }
   }

With some tinkering around with counters and counter counters, I was able to flatten it down into one loop. However, the improvement from this was trivial at best (in some cases, there was no improvement at all).
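
For the record, a single-loop version looks something like this (a minimal sketch of the idea, not my exact code; x and n are as in the original function):

#single loop over all the pairs, with i and j tracked manually
d <- vector("numeric", 0.5 * n^2 - 0.5 * n)
i <- 1
j <- 2
for (k in seq_along(d)) {
  d[k] <- (x[j] - x[i]) / (j - i)
  j <- j + 1
  if (j > n) {   #finished the pairs for this i, move on to the next i
    i <- i + 1
    j <- i + 1
  }
}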

Then I started to remember that I had to speak R with less of an accent. R has its own thing it's good at: vectorisation.

How could I vectorise this loop? Some combination of seq and rep?
I eventually came to this:

  #first, calculate the size of the final vector:
  finalk<- 0.5*n^2-0.5*n

  #pre-allocate all vector sizes
  d<-vector()
  length(d) <- finalk
  
  ivals<-vector()
  length(ivals)<-finalk
  
 #calculate ivals with lapply/seq/rep/sapply: 
  reps = lapply(n:2, seq, from=2)
  ivals<-rep(seq_along(reps), sapply(reps, length))
  
  #calculate jvals with lapply
  jvals<-vector()
  length(jvals)<-finalk
  jvals<-unlist(lapply(2:n,function(x) x:n))
  
  #calculate d using a native R vectorised way:
  d <- (x[jvals]-x[ivals])/(jvals-ivals)
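
A quick way to convince yourself the index vectors are right is to spell out a tiny case by hand. For n = 4 the pairs should be (1,2), (1,3), (1,4), (2,3), (2,4), (3,4), and that is exactly what comes out:

n <- 4
reps <- lapply(n:2, seq, from = 2)
ivals <- rep(seq_along(reps), sapply(reps, length))
jvals <- unlist(lapply(2:n, function(x) x:n))
rbind(ivals, jvals)
#      [,1] [,2] [,3] [,4] [,5] [,6]
#ivals    1    1    1    2    2    3
#jvals    2    3    4    3    4    4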

Ok, so how well does it do? Well, let's look at microbenchmark results for a random data set of n=300
>random_ts_data300 <- ts(diffinv(rnorm(300)),frequency = 60)
> microbenchmark(sens.slope.faster(random_ts_data300), sens.slope.fastest(random_ts_data300))
Unit: milliseconds
                                   expr      min       lq     mean   median       uq      max neval cld
  sens.slope.faster(random_ts_data300) 966.6235 991.2666 991.2666 1022.864   1044.4 1229.881   100   b
 sens.slope.fastest(random_ts_data300) 19.07987 21.37374 21.37374 30.60782 26.89354 126.9601   100  a

Hint: the vectorised method is the one called "fastest"!
Wow, what an improvement!

There was one other bottleneck I tackled: the original function calls mk.test from the trend package to get varS. That method is great, but it is written in pure R. There is a much faster routine, written in Fortran, in the Kendall package that also gives you varS.
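
If you want to satisfy yourself that the two packages agree on varS before swapping one for the other, a quick check along these lines should do it (using nottem as an example series; these should agree, possibly up to small differences in how ties are handled):

library(trend)    #provides mk.test, which is what sens.slope itself calls
library(Kendall)  #provides the Fortran-backed MannKendall
data(nottem)
mk.test(nottem)$Varianz   #varS as used by the original sens.slope
MannKendall(nottem)$varS  #varS from the Kendall package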

So the final, improved function becomes:

require(Kendall)
sens.slope.fastest<-function (x, level = 0.95)
{
  na.fail(x)
  #the MannKendall test in the Kendall package is written in Fortran. 
  #It is very quick.
  varS<- MannKendall(x)$varS[1]
  n <- length(x)
    
  retval <- list(b.sen = NULL, b.sen.up = NULL, b.sen.lo = NULL,
                 intercept = NULL, nobs = n, method = "SSLP", D = NULL,
                 conf.int = level * 100, varS = NULL)

  #the length of the d variable is easily calculated using 
  #a variation on the triangular numbers formula
  finalk<- 0.5*n^2-0.5*n

  #pre-allocate the size of each vector
  d<-vector()
  length(d) <- finalk  
  
  ivals<-vector()
  length(ivals)<-finalk  
  
  reps = lapply(n:2, seq, from=2)
  ivals<-rep(seq_along(reps), sapply(reps, length))  
  
  jvals<-vector()
  length(jvals)<-finalk
  jvals<-unlist(lapply(2:n,function(x) x:n))
  
  #now d is easily calculated in the vectorised form:
  d <- (x[jvals]-x[ivals])/(jvals-ivals)
  
 #I have replaced () with {} where possible, to reduce execution time  
  b.sen <- median(d, na.rm = TRUE)
  C <- qnorm(1 - {1 - level}/2) * sqrt(varS)
  rank.up <- round({finalk + C}/2 + 1)
  rank.lo <- round({finalk - C}/2)
  rank.d <- sort(d)
  retval$b.sen.lo <- rank.d[rank.lo]
  retval$b.sen.up <- rank.d[rank.up]
  intercepts <- x - b.sen * (1:n)
  retval$intercept <- median(intercepts)
  retval$b.sen <- b.sen
  retval$D <- d
  retval$varS <- varS
  class(retval) <- "trend.test"
  return(retval)
}

and the final benchmark results:
Unit: milliseconds
                                   expr      min        lq      mean    median        uq      max neval cld
  sens.slope.faster(random_ts_data300) 987.3958 1017.5210 1053.6040 1031.0850 1070.7650 1486.412   100   b
 sens.slope.fastest(random_ts_data300) 19.82347  22.43609  33.55085  23.90496   26.1712 97.73917   100  a
         sens.slope(random_ts_data300) 7263.537 7609.4890 7817.0280 7809.9840 7974.1510 9411.681   100   c
Clearly, the original is the worst performer, and the "fastest" version is the best, by a long margin.
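
Of course, faster is only worth having if the answers are still the same. A quick sanity check I'd suggest (a sketch, not exhaustive testing): compare the slope and the set of pairwise slopes from the original and "fastest" versions on the same series.

library(trend)
data(nottem)
orig <- sens.slope(nottem)
fast <- sens.slope.fastest(nottem)
all.equal(orig$b.sen, fast$b.sen)      #the Sen slope itself should be identical
all.equal(sort(orig$D), sort(fast$D))  #same set of pairwise slopes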

Cheers for now
N

Friday, 16 October 2015

HardR.BettR.FastR.StrongR.

Hi Folks

This is just to document a quick and easy method to significantly improve the speed and scalability of your R code.

Let's look at the sens.slope function in the trend package.

> sens.slope
function (x, level = 0.95) 
{
    na.fail(x)
    res.mk <- mk.test(x)
    varS <- res.mk$Varianz
    n <- length(x)
    d <- NULL
    retval <- list(b.sen = NULL, b.sen.up = NULL, b.sen.lo = NULL, 
        intercept = NULL, nobs = n, method = "SSLP", D = NULL, 
        conf.int = level * 100, varS = NULL)
    k <- 0
    for (i in 1:(n - 1)) {
        for (j in (i + 1):n) {
            k <- k + 1
            d[k] <- (x[j] - x[i])/(j - i)
        }
    }
    b.sen <- median(d, na.rm = TRUE)
    C <- qnorm(1 - (1 - level)/2) * sqrt(varS)
    rank.up <- round((k + C)/2 + 1)
    rank.lo <- round((k - C)/2)
    rank.d <- sort(d)
    retval$b.sen.lo <- rank.d[rank.lo]
    retval$b.sen.up <- rank.d[rank.up]
    intercepts <- x - b.sen * (1:n)
    retval$intercept <- median(intercepts)
    retval$b.sen <- b.sen
    retval$D <- d
    retval$varS <- varS
    class(retval) <- "trend.test"
    return(retval)
}
<environment: namespace:trend>

Great function; however, it doesn't perform so well when n > 900. In fact, I've never actually let it run long enough to finish on an n=900 dataset. Obviously this is not OK for large datasets.

My first thought to improve things was to try and flatten that nasty O(n^2) nested loop (where d is calculated).

The first step to doing that would be to try and calculate the final value of k (so that we know, at the very least, how big d is going to be).

So let's try and get all the expected values of k, for each value of n, or at least, a reasonable number of n (warning: O(n^3) complexity. This will take a while!)

kvals<-NULL
for(n in 2:1000) {
  k<-0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      k <- k + 1
    }
  }
  kvals[n]<-k
}

Great. Now the n values:
nvals<-1:1000
nvals[1]<-NA

nvals<-na.omit(nvals)
kvals<-na.omit(kvals)

And now we can start fitting some models

model<-lm(kvals~poly(nvals, 2, raw = T))
model
Call:
lm(formula = kvals ~ poly(nvals, 2, raw = T))

Coefficients:
             (Intercept)  poly(nvals, 2, raw = T)1  poly(nvals, 2, raw = T)2  
              -9.712e-11                -5.000e-01                 5.000e-01  

Why did I choose poly? The data looked roughly polynomial. Why second order? Because I am hesitant to go beyond second order for fear of the fit losing meaning.

So how did it perform?
Readers will be familiar with the SSE formula used around here: sum(residuals^2). Well, that's exactly what I used. And isn't R great at it!

SSE<-sum(model$residuals^2)
SST<-sum((mean(kvals)-kvals)^2)
R2<-1-(SSE/SST)
>R2
[1] 1

Looks like it performs great.
But, just to confirm, let's do the traditional method of using summary:
 summary(model)

Call:
lm(formula = kvals ~ poly(nvals, 2, raw = T))

Residuals:
       Min         1Q     Median         3Q        Max 
-6.973e-09 -7.500e-12 -5.000e-13  4.500e-12  8.577e-09 

Coefficients:
                           Estimate Std. Error    t value Pr(>|t|)    
(Intercept)              -9.712e-11  3.347e-11 -2.902e+00  0.00379 ** 
poly(nvals, 2, raw = T)1 -5.000e-01  1.542e-13 -3.242e+12  < 2e-16 ***
poly(nvals, 2, raw = T)2  5.000e-01  1.491e-16  3.354e+15  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.505e-10 on 996 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:      1 
F-statistic: 9.033e+31 on 2 and 996 DF,  p-value: < 2.2e-16

#which gives us the formula:
final_k <- 0.5*n^2-0.5*n-9.712e-11

/EDIT: It has been pointed out to me that this is basically just the triangular numbers formula. I didn't see it before, but now it is obvious, so you can simply derive the nth value of k by 0.5*n^2-0.5*n. 
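
That also makes sense if you think about the loop: the inner loop runs n - i times for each i, so k = (n-1) + (n-2) + ... + 1 = n(n-1)/2. A one-liner against the brute-force kvals computed above confirms it:

all(kvals == nvals * (nvals - 1) / 2)   #should be TRUE for every n we tested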

Great. We can now predict what the final value of k will be given a value of n. So what, I hear you say. Well like I said, at first I was trying to flatten the nested loop. However, this was going to take longer than a few hours, so CBFd. However, I did notice that the original function was doing another serious R no-no: not pre-allocating vector size for d! Given that d grows quadratically with n (as can be seen by the model), this is going to be a big problem. However, we now know what size to set d to, prior to running the loop. 

#simply replacing 
d<-NULL 
#---with---
d<-vector()
length(d)<-0.5*n^2-0.5*n  #(per the edit above, the fitted intercept is effectively zero, so use the exact formula)

and hey presto, we have a dramatic decrease in execution times.
library(microbenchmark)
data(nottem)
#where sens.slope is the original unmodified code, 
#and sens.slope.faster is using the pre-allocated vector size for d

> compare<-microbenchmark(sens.slope(nottem), sens.slope.faster(nottem), times = 10)
> compare
Unit: milliseconds
                      expr       min        lq      mean    median        uq       max neval cld
        sens.slope(nottem) 1701.6792 1738.6495 1776.7413 1751.8925 1800.3433 1926.2313    10   b
 sens.slope.faster(nottem)  633.2217  642.8728  665.1544  654.5556  692.8441  710.0819    10  a 

Ok, so that's a pretty big improvement over the nottem data (median of roughly 1752 ms v 655 ms), but how about a bigger data set?

random_ts_data600 <- ts(diffinv(rnorm(600)),frequency = 60)
> compare<-microbenchmark(sens.slope(random_ts_data600), sens.slope.faster(random_ts_data600), times = 10)
> compare
Unit: seconds
                                  expr       min        lq     mean    median        uq        max neval cld
        sens.slope(random_ts_data600) 74.050463 77.946799 90.54006 80.181241 82.963701 161.454800    10   b
 sens.slope.faster(random_ts_data600)  3.923095  3.956954  4.26138  4.005052  4.458716   5.198488    10  a 

So that's 80 seconds down to 4. Yep, I'd say that's a win!
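
If you want to see how much of that is purely down to growing the vector inside the loop, here's a tiny, self-contained illustration (nothing to do with sens.slope itself, just the grow-versus-preallocate pattern):

library(microbenchmark)
grow <- function(n) { v <- NULL; for (i in 1:n) v[i] <- i; v }          #grows v one element at a time
prealloc <- function(n) { v <- numeric(n); for (i in 1:n) v[i] <- i; v } #allocates v up front
microbenchmark(grow(1e4), prealloc(1e4), times = 10)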

I've submitted this improvement to the trend package maintainer as a suggestion.

Cheers for now 
N

Friday, 2 October 2015

Custom functions and trend analysis in R

Hi Folks

So today, I'm going to create a custom R function, based upon an existing S3 class function in the Kendall package, maintained by A.I. McLeod. The idea behind the regional Kendall function is that it can adjust for slight variations in trend at sample points within a geographic region and detect whether there is a significant overall trend, using methodology identical to the seasonal Kendall test for monotonic trend.

The original seasonal Mann-Kendall function within the Kendall package uses the base time-series object. This isn't really conducive to what we want to do with the regional Kendall test, so that had to change. Actually, there are other problems with using the base time-series object: the test will not work if there are any missing data points, or if the sampling frequency is variable, which is often the case with messy environmental data. To combat that, I developed an xts (eXtensible Time Series) version of both the Kendall and Sen slope tests (the latter from the trend package, maintained by Thorsten Pohlert).

For example, here is the original code for the SeasonalMannKendall function in the Kendall package, modified to accept xts data instead of base time series (i.e. ts):

#Custom SeasonalMannKendall adaptation to enable use of xts and irregular time series. 
SeasonalMannKendall.xts<-function (x, FUN=mean) 
{
    if(!is.xts(x)) {
      if(is.ts(x)) {
        return(SeasonalMannKendall(x))
      } else {
        stop("error: input must be an xts or ts object")
      }
    }
    #aggregate to one value per month, then treat each calendar month as a season
    monthly <- apply.monthly(x, FUN)
    Score <- 0
    Denom <- 0
    varScore <- 0
    for (i in 0:11) {
      y <- monthly[.indexmon(monthly)==i]
      #only include seasons with at least three aggregated observations
      if(length(y)>=3){
        ans <- MannKendall(y)
        Score <- Score + ans$S
        Denom <- Denom + ans$D
        varScore <- varScore + ans$varS
      }
    }
    sl <- 2 * (1 - pnorm(abs(Score/sqrt(varScore))))
    ans <- list(tau = Score/Denom, sl = sl, S = Score, D = Denom, 
                varS = varScore)
    oldClass(ans) <- "Kendall"
    ans
}
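
A quick usage sketch (the series here is entirely made up, purely to show the call):

library(xts)
library(Kendall)
set.seed(1)
#ten years of roughly fortnightly observations
dates <- seq(as.Date("2000-01-05"), by = "15 days", length.out = 240)
dummy <- xts(cumsum(rnorm(240)), order.by = dates)
SeasonalMannKendall.xts(dummy)               #monthly means by default
SeasonalMannKendall.xts(dummy, FUN = median) #or aggregate each month with the median
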
The above function isn't ideal, as Thorsten pointed out when I submitted a similar function for the Sen slope estimate in the trend package:
#Custom sea.sen.slope adaptation to enable use of xts and irregular time series. 
sea.sen.slope.xts<-function (x, FUN=mean) 
{
    na.fail(x)
    if (!is.xts(x)) {
        stop("error: input must be xts object")
    }
  
    #subset to find out how many seasons
    monthly.data<-apply.monthly(x, FUN)
    countSeasons <- 0
    for (i in 0:11) {
       l<-length(monthly.data[.indexmon(monthly.data)==i])
       if(l>0) 
         countSeasons<-countSeasons+1
    }
    
    if (countSeasons < 2) {
        stop("error: minimum of two seasons required.")
    }
    n <- length(monthly.data)
    retval <- list(b.sen = NULL, intercept = NULL, nobs = n, 
        method = "SeaSLP")
    y <- NULL
    d <- NULL
    varS <- 0
    for (i in 0:11) {
        y <- monthly.data[.indexmon(monthly.data)==i]
        res <- sens.slope(ts(y))
        D.m <- res$D
        d <- c(d, D.m)
        varS <- varS + res$varS
    }
    b.sen <- median(d, na.rm = TRUE)
    intercepts <- as.vector(monthly.data) - b.sen * (1:n)
    retval$intercept <- median(intercepts)
    retval$b.sen <- b.sen
    class(retval) <- "trend.test"
    return(retval)
}

Thorsten's comments were as follows:-

  1. The sea.sens.slope.xts should only calculate the seasonal Sen's slope and intercept and should not aggregate data (neither median nor mean). The pre-processing of data should be up to the user. Otherwise two functions (or two functionalities) would be included in one function.
  2. The current function assumes monthly data (i.e. frequency of 12). What about quarterly or half-yearly data? Can you find a way to generalize the function to any seasonal data with the xts data format?


I wrote the following function, addressing Thorsten's concerns, for the Sen slope estimate:-

sea.sens.slope.xts<-function (x, guess_scale=T, scale='monthly')
{
  na.fail(x)
  if (!is.xts(x)) {
    stop("error: input must be xts object")
  }

  if(guess_scale) {
    #get periodicity of object
    p<-periodicity(x)
    pscale<-p$scale
  } else {
    pscale<-scale
  }

  #size of dataset
  n<- length(x)

  if (n < 2) {
    stop("error: minimum of two seasons required.")
  }

  retval <- list(b.sen = NULL, intercept = NULL, nobs = n,
                 method = "SeaSLP")
  y <- NULL
  d <- NULL
  varS <- 0

  #possible scale values are: 'minute', 'hourly', 'daily', 'weekly', 'monthly', 'quarterly', 'calendar-seasons', and 'yearly'
  #(periodicity() never returns 'calendar-seasons', so that one has to be passed explicitly via scale with guess_scale=FALSE)
  if(pscale=='minute') {
    #loop over each distinct minute-of-hour once
    for (i in unique(.indexmin(x))) {
      y <- x[.indexmin(x)==i]
      res <- sens.slope(ts(y))
      D.m <- res$D
      d <- c(d, D.m)
      varS <- varS + res$varS
    }
  } else if(pscale=='hourly') {
    for (i in 0:23) {
      y <- x[.indexhour(x)==i]
      res <- sens.slope(ts(y))
      D.m <- res$D
      d <- c(d, D.m)
      varS <- varS + res$varS
    }
  } else if(pscale=='daily') {
    #treat each day of the week as a season
    for (i in unique(weekdays(index(x)))) {
      y <- x[weekdays(index(x))==i]
      res <- sens.slope(ts(y))
      D.m <- res$D
      d <- c(d, D.m)
      varS <- varS + res$varS
    }
  } else if(pscale=='weekly') {
    #get the week number of each observation (week() is from the lubridate package)
    wk<-week(index(x))

    for (i in 1:52) {
      y<-x[wk==i]
      res <- sens.slope(ts(y))
      D.m <- res$D
      d <- c(d, D.m)
      varS <- varS + res$varS
    }
  } else if(pscale=='monthly') {
    for (i in 0:11) {
      y <- x[.indexmon(x)==i]
      res <- sens.slope(ts(y))
      D.m <- res$D
      d <- c(d, D.m)
      varS <- varS + res$varS
    }
  } else if(pscale=='quarterly') {
    #get the quarter number (1-4) of each observation
    qxstr<-quarters(index(x))
    qnum<-ifelse(qxstr=="Q1",1,ifelse(qxstr=="Q2",2,ifelse(qxstr=="Q3",3,ifelse(qxstr=="Q4",4,NA))))

    for (i in 1:4) {
      y<-x[qnum==i]
      res <- sens.slope(ts(y))
      D.m <- res$D
      d <- c(d, D.m)
      varS <- varS + res$varS
    }
  } else if(pscale=='calendar-seasons') {
    #map each month onto a calendar season (month() is from the lubridate package)
    snum<-ifelse( month(index(x)) %in% c(12, 1, 2), 2,
                  ifelse( month(index(x)) %in% c(3, 4, 5), 3,
                          ifelse( month(index(x)) %in% c(6, 7, 8), 4,
                                  ifelse( month(index(x)) %in% c(9, 10, 11), 1, NA))))

    for (i in 1:4) {
      y<-x[snum==i]
      res <- sens.slope(ts(y))
      D.m <- res$D
      d <- c(d, D.m)
      varS <- varS + res$varS
    }

  } else if(pscale=='yearly') {
    for (i in min(.indexyear(x)):max(.indexyear(x))) {
      y <- x[.indexyear(x)==i]
      res <- sens.slope(ts(y))
      D.m <- res$D
      d <- c(d, D.m)
      varS <- varS + res$varS
    }
  }

  b.sen <- median(d, na.rm = TRUE)
  intercepts <- as.vector(x) - b.sen * (1:n)
  retval$intercept <- median(intercepts)
  retval$b.sen <- b.sen
  class(retval) <- "trend.test"
  return(retval)
}
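
And a quick usage sketch (again, the data here is completely made up, just to show the call):

library(xts)
library(trend)
set.seed(42)
#ten years of monthly data with a gentle upward trend
mdates <- seq(as.Date("2001-01-15"), by = "month", length.out = 120)
dummy_monthly <- xts(0.05 * (1:120) + rnorm(120), order.by = mdates)
sea.sens.slope.xts(dummy_monthly)                                  #periodicity() should detect 'monthly'
sea.sens.slope.xts(dummy_monthly, guess_scale = FALSE, scale = 'monthly')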

With that in mind, I set about implementing the Regional Mann Kendall function, as described in  this presentation by the USGS.

This function is a trivial modification of the original seasonal MannKendall test. Basic usage: x is a data frame, RegionVariable is the name of the column that contains the region for each data point, DataVariable is the column that contains the data for each point, and Regions is a unique list of regions. I suppose I could get rid of that last argument, since it could easily be derived by simply calling
unique on the RegionVariable column, but I will leave that as a homework exercise.

RegionalMannKendall<-function (x,RegionVariable, DataVariable, Regions)
{
  Score <- 0
  Denom <- 0
  varScore <- 0
  numberOfRegions<-length(Regions)
  RegionColNumber<- which(colnames(x)==RegionVariable)
  DataColNumber<- which(colnames(x) == DataVariable)

  for (i in 1:numberOfRegions) {
      ss<- x[which(x[,RegionColNumber]==Regions[i]),]
      ans <- MannKendall(ss[,DataColNumber])
      Score <- Score + ans$S
      Denom <- Denom + ans$D
      varScore <- varScore + ans$varS
  }
  sl <- 2 * (1 - pnorm(abs(Score/sqrt(varScore))))
  ans <- list(tau = Score/Denom, sl = sl, S = Score, D = Denom,
             varS = varScore)
  oldClass(ans) <- "Kendall"
  ans
}
#example usage########################
###random data:
regionaldata<-data.frame(Pollution=rnorm(25), Regions=c(rep("R1",5), rep("R2",5), rep("R3",5), rep("R4",5), rep("R5",5)))
RegionalMannKendall(regionaldata, "Regions", "Pollution",unique(regionaldata$Regions))

Sunday, 6 September 2015

Feed Forward Artificial Neural Networks. In Excel.

TLDR? A few years ago I made a pre-built Excel FFANN where all you have to do is load in the training data: http://www.excelville.com/file/244/FORTUNETXL

Hi Folks

I've been tinkering on and off with feed forward artificial neural networks (FFANNs) for a few years now, and the first time I made a working multi-neuron multi layer network was in Excel. I find Excel is great for trying to visualise algorithms (thus the silly attempt at a Hash Table in Excel I did a while back). 

I am going to treat this like it's the first time you have ever heard of neural networks, and really focus in on some of the minute detail. Please do not be offended if you already understand the concepts. 

This really is more of a story of my own personal discovery of neural networks a few years ago, so it should relate fairly easily to anyone with a similar level of intrigue (some maths and Excel skills will help; for example, you should probably be able to understand and replicate the outcomes of the Multiple Linear Regressions post before kicking on with this one).

There are six basic steps to creating a functioning FFANN:
  1. Specify a neuron
  2. Specify an activation function
  3. Specify a structure
  4. Scale the data
  5. Develop a training technique
  6. Training, Testing and Validation

Step #1: Specify a Neuron

Artificial neurons can come in a lot of different flavours. Today, we are going to specify the most basic of basic. The neuron we use will be able to use a continuous input and produce a continuous output (that is, it is dealing within the realm of numbers, like 1 to 10000, rather than categories, like spam or not-spam).

The anatomy of an artificial neuron

Step #2: Specify an activation function

The activation function is really what the neuron is. We have to combine the input with a weight and produce an output (the weights of the neurons become the "brain" of the network). In order to combine the inputs and the weights, we have to "squash" the result down, and the usual tool for this is a sigmoid (S-shaped) function. The most common choice is the logistic function, f(x) = 1/(1 + e^-x), which in Excel is simply =1/(1+EXP(-x)).

You can use any squashing function you want; another common one is the hyperbolic tangent function (essentially a rescaled version of the logistic function above, which can be useful in certain situations).

The idea behind the sigmoid function is to produce outputs on a continuous, S-shaped curve.
The tanh function returns outputs between -1 and 1, whilst the logistic function returns outputs between 0 and 1.
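
Since most of this blog is R anyway, here's a quick, purely illustrative R sketch of the two squashing functions side by side (nothing in the workbook depends on this):

logistic <- function(x) 1 / (1 + exp(-x))
curve(logistic, from = -6, to = 6, ylim = c(-1, 1), xlab = "weighted input", ylab = "output")
curve(tanh, from = -6, to = 6, add = TRUE, lty = 2)
legend("topleft", legend = c("logistic (0 to 1)", "tanh (-1 to 1)"), lty = c(1, 2))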

Step #3: Specify a Structure

The structure of our network is really simple; the fact that we are building a feed forward network makes it so, and the name says it all. We set up a few neurons to feed the independent variables into, their output feeds into a hidden layer of two neurons, and that output is then combined in the output layer and tested against the actual dependent variable data to determine the error. This is called "the first pass". In jargon terms, the three neurons taking the input are the "input layer", the next two are the "hidden layer", and the last is, predictably, the "output layer". You could have multiple neurons in the output layer if you wanted to predict more than one Yhat simultaneously.


The layout of the basic network. Weights are changed according to the results of the error function (not shown).

Step #4: Scale the data

This is very important: you must scale the data so that it lies in the same range as whatever your activation function is able to produce. In our case, we want 0 to 1, so simply dividing by the maximum of the dataset (including the dependent variable) will give an adequate scaling.
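
As a tiny sketch of what I mean (in R for brevity; in Excel it's just a division by the MAX of the whole range):

#divide the whole dataset, dependent variable included, by its overall maximum
scale01 <- function(dat) dat / max(dat)
#to get predictions back on the original scale later, multiply by that same maximum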

Step #5: Develop a training technique

Most of the literature on feed forward artificial neural networks focuses on one training technique in particular: back-propagation. I, however, am not going to. I am going to focus on a technique I find more reliable and less prone to getting stuck in local minima than back-prop: a very simple evolutionary technique, relying on randomness.

The technique I use is to pass the data through the network, calculate the error of yhat by looking at the sum of squared residuals (SSerr - NB this is very different to the calculation of error used in backprop), keep track of each weight, and then randomly change each weight one by one, testing the SSerr each time to see if there is an improvement. I've found that the more Gaussian the random numbers used, the faster this method works.

This method, however, does require some programming to implement (a simple VBA script to loop through each weight cell: store the current value of SSerr, randomly change the weight cell, look at SSerr again, and revert the change if SSerr got worse).

Something along the lines of the following will do it:

Sub evolve()
Dim PREVSSER As Double
Dim PREVW As Double
Dim c As Range

    'Visit each weight cell, nudge it by a roughly Gaussian random amount,
    'and keep the change only if the sum of squared errors in K1 improves.
    For Each c In Sheets("sheet1").Range("B2:G4")
        PREVSSER = Sheets("SHEET1").Range("K1").Value
        PREVW = c.Value
        
        'Norm_S_Inv turns the uniform Rnd() into a standard-normal step
        c.Value = Application.WorksheetFunction.Norm_S_Inv(Rnd()) + PREVW
        
        'If the error got worse, put the old weight back
        If PREVSSER < Sheets("SHEET1").Range("K1").Value Then
            c.Value = PREVW
        End If
    Next c 
End Sub


Here the weights are in sheet1 in the range B2:G4 (*NB* technically the only w3 values used are in E and F, for the n4 and n5 neurons in the hidden layer, so there could be some more optimisation to improve efficiency, but I haven't bothered including it here), and the SSerr is calculated in cell K1.

Setting up as per the following:


The formulas for SSE, SSTOT and RSqrd are as per what is described in the Multiple Linear Regressions post.

Step #6: Training, Testing and Validation.

The first step, if we were presented with actual data, would be to randomly split the dataset into 3 frames: training (70%), validation (15%) and testing (15%). 

In our case, since I am generating the data, I will leave this as a homework exercise for the reader (If I really wanted to, I could just generate some validation and testing data using the same formula used to generate the training data). The generated training dataset I have used is:
     A      B       Y
 68.00  96.00  356.00
 46.00  20.00  106.00
 50.00  34.00  152.00
 86.00  40.00  206.00
 86.00  91.00  359.00
 71.00  42.00  197.00
 62.00  16.00  110.00
 83.00  53.00  242.00
  1.00  60.00  181.00
 64.00  57.00  235.00
 12.00  97.00  303.00
 24.00  85.00  279.00
  8.00  73.00  227.00
  3.00  23.00   72.00
 52.00  20.00  112.00
 42.00   7.00   63.00
 18.00   3.00   27.00
 67.00  96.00  355.00
 64.00  55.00  229.00
 43.00  79.00  280.00
 84.00  48.00  228.00
 23.00  18.00   77.00
As you can see, the data is clearly of the form A + 3B = Y. We will generate some more random data after training to validate that the training has taken.

But for now, let's perform a training run by running the "evolve" macro. If you haven't already, hit Alt+F11 to open up the VBA editor and insert a new module.

Copy and paste the function above. Place the cursor over the top of the first line of the sub (or select all of the lines of code in the sub) and press F5. 

This will have altered the weights, though it won't have improved the SSerr much (maybe a tiny bit). Hit F5 again (again you will see a slight change). Now hold down F5 for a while and see what the results are (in my case, I pressed F5 around 70 times and got an r-squared of 0.95). 

In fact, a little modification of the original function yields some pretty interesting results:
Sub evolve2()
Dim PREVSSER As Double
Dim RSQRD As Double
Dim PREVW As Double
Dim COUNT As Long
Dim c As Range

'Keep running full passes over the weights until the r-squared in O1 reaches 0.95,
'counting how many passes that takes.
RSQRD = Sheets("SHEET1").Range("O1").Value
Do While RSQRD < 0.95
    COUNT = COUNT + 1
    RSQRD = Sheets("SHEET1").Range("O1").Value
    For Each c In Sheets("sheet1").Range("B2:G4")
        PREVSSER = Sheets("SHEET1").Range("K1").Value
        PREVW = c.Value
        
        c.Value = Application.WorksheetFunction.Norm_S_Inv(Rnd()) + PREVW
        
        'Revert the weight if the error got worse
        If PREVSSER < Sheets("SHEET1").Range("K1").Value Then
            c.Value = PREVW
        End If
    Next c
Loop
Debug.Print COUNT

End Sub

The above changes assume you have set up the Rsqrd calculation in O1, of course. The "count" at the end of this seems to range between 40 and 60 to get to an Rsqrd of >0.95. 

The following charts document one training run:
COUNT=0 (weights zeroed)

COUNT=14
COUNT=17
COUNT = 20

COUNT=66

Because I'm a nice guy, you can download the workbook I used to create this tutorial for free.
The password you're going to need to unlock that archive is "statastizard" (quotes removed).

Well that's about all I've got time for now. Talk to you all next time.

Cheers
N

PS:
Don't forget you can catch me on Linked In as well.