Sunday, 14 June 2015

Banana Plants Do Not Walk

You can download all skepticism posts from iTunes as a podcast - just search for skeptistizard. 

As you wander into a secluded rainforest you see a root creeping over the abandoned trail that you have happened to stumble upon. You look up and see a banana plant staring down at you with a maniacal, menacing, apoplectic face, malignant with ill intent and foreordained to bring its wrath upon you and your soul.

As intimidated as you are by this terrifying sight, your rational mind slowly reveals that your fear is a fruitless vanity - one that, whilst not easily overcome under heavy emotion, is eventually withered away, spectre by imaginary spectre, until the only explanation left for the spellbinding curiosity is natural and menial.

Three Steps to Reason

  1. The first thing your mind implores you to hold evident is that Plato was wrong. There is no higher reality, only reality. Things can only exist and interact with each other within reality itself. [1][4][9][10][13][15]
  2. People are (generally) excellent at pattern matching. Unfortunately, this also means that we see patterns when there really aren't any there at all.[5][6][7]
  3. People are (in general) especially excellent at recognising the features of other people. Again, this means we personify many things that match some of the features we have classified as features of people, even if we only managed a timid, bleary-eyed glimpse of the subject in question[2]. There is a caveat here: whilst this is generally true, some people actually can't recognise faces at all - they have a disorder called prosopagnosia[3] - and facial recognition can be more difficult in people who are diagnosed with autism spectrum disorders[12][14]. Whilst it is not proven in any sense (at least not anywhere I can easily find), to say that people with prosopagnosia or other facial recognition difficulties would not suffer as commonly (or even at all) from the effects of pareidolia is a limb I am willing to go out upon.[11]

Now that you have gradually illuminated the darkest, most tenebrous domains of the irrational mind with some reason, you can see that the face you saw is merely the way in which the light of the sunset was falling on the trunk. You can see that the roots are not moving but stationary, as there are no obvious signs of previous movement.

You conclude (correctly, as it is the simplest explanation) that the plant is growing in such a way as to encourage the optical illusion of pareidolia, and the accompanying illusion of apparent movement only adds to the chimerical optical supposition.

This plant, whilst interesting and useful, is not able to wander the rainforest in search of the vindictive retribution deserved by its defiling adversaries.

It is just a plant - one that can turn earth, air, water and sunshine into glucose and other carbohydrates. The natural explanation isn't actually that menial at all, it's just not that unique.

Thanks for Reading,
 I'm Nick Emblow, and this has been Skeptistizard.


[1] Copleston, Frederick (1993). A History of Philosophy, vol. 1: Greece and Rome. New York: Image Books.

[2] Chambon, Valérian et al. (2007). "Visual Pattern Recognition: What Makes Faces so Special?". In Corrigan, Marsha S. Pattern Recognition in Biology. New York: Nova Science Publishers.

[3] De Renzi E (1986). "Prosopagnosia in two patients with CT scan evidence of damage confined to the right hemisphere". Neuropsychologia 24 (3): 385–9.

[4] Durant, Will (1991). The Story of Philosophy: The Lives and Opinions of the Greater Philosophers of the Western World. Pocket Books.

[5] Eysenck, Michael W.; Keane, Mark T. (2003). Cognitive Psychology: A Student's Handbook (4th ed.). Hove; Philadelphia; New York: Taylor & Francis.

[6] Krumhansl, Carol L. (1990). Cognitive Foundations of Musical Pitch. Oxford Psychology Series No. 17 (2nd ed.). New York: Oxford University Press (published 2001).

[7] Margolis, Howard (1987). Patterns, Thinking, and Cognition: A Theory of Judgement (3rd ed.). Chicago; London: University of Chicago Press (published 1996).

[8] Radford, B. (2012). "Can 'Walking Palm Trees' Really Walk?" [Blog] Live Science. [Accessed 2 Jun. 2015].

[9] Rogers, Arthur Kenyon (1935). A Student's History of Philosophy. New York: The Macmillan Company.

[10] Russell, Bertrand (2004). History of Western Philosophy. London: Routledge Classics.

[11] Sagan, Carl (1995). The Demon-Haunted World – Science as a Candle in the Dark. New York: Random House

[12] Schreibman, Laura (1988). Autism. Newbury Park: Sage Publications

[13] Thilly, Frank (1983). A History of Philosophy. New Delhi: SBE Publishers.

[14] Weigelt, Sarah; Koldewyn, Kami; Kanwisher, Nancy (2012). "Face identity recognition in autism spectrum disorders: A review of behavioral studies". Neuroscience & Biobehavioral Reviews 36 (3): 1060–1084

[15] Zeller, Eduard (1881). A History of Greek Philosophy. London: Longmans, Green and Co.

Extended Warranties are Pretty Much Worthless

Extended warranties are pretty much worthless. I know this, and I know why, and I still bought one for a recent purchase. Why?

Back up - first, let me explain a bit.

All products have an expected lifetime curve, and it will usually follow a bathtub curve. That is to say, for a group of product X, most of the failures will occur either within the first part of its life or within the last. Extended warranties generally only cover the "bottom of the bathtub", where, as you'd expect, the seller's edge over the odds of failure is substantial (meaning, of course, that the company selling you the warranty is almost selling a sure thing - for them). Of course, there will be some failures in the bottom of the bathtub, but in general the person or company selling the warranty will make a profit (with the Kelly Criterion telling them how much of their bankroll they can safely stake on it).

The bathtub curve (courtesy NIST)

As you can plainly see in the above chart, the odds are low that there will be a failure during the extended warranty period. Many places further hedge this by adding replacement-only terms to their extended warranty contracts (because it is assumed that the product you buy will depreciate faster than money itself - a fair call in most instances). What does this mean? It means that, in general and on average, anyone selling extended warranties that cover only the "intrinsic failure period" (ie the bottom of the bath) has a really good chance of growing their bankroll, assuming they stake no more than the fraction of it determined by the Kelly Criterion (and given the almost zero relative chance of a payout, that Kelly fraction would be rather high).
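To make that concrete, here is a rough expected-value and Kelly-fraction sketch in Python. The premium, failure probability and replacement cost are all invented numbers for illustration, and mapping a warranty onto a Kelly-style bet is my own loose framing, not a formal actuarial model:

```python
# Rough expected-value sketch for the seller of an extended warranty.
# All numbers here are hypothetical illustrations, not real warranty figures.
premium = 250.0            # price the customer pays for the warranty
p_fail = 0.05              # assumed chance of failure inside the covered period
replacement_cost = 1000.0  # assumed cost to the seller if the product does fail

# The seller's expected profit per warranty sold:
expected_profit = premium - p_fail * replacement_cost

# One loose way to map the Kelly Criterion onto this: treat each warranty
# as a bet that the product survives.  b is the net "odds" received on a
# win (premium kept, per dollar at risk) and p is the win probability.
b = premium / (replacement_cost - premium)
p = 1 - p_fail
kelly_fraction = p - (1 - p) / b   # classic Kelly: f* = p - q/b
```

With these made-up numbers the seller expects around $200 of profit per warranty and a Kelly fraction of about 0.8, which is the sense in which the warranty is almost a sure thing - for them.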

Ok, so I know this. So why did I pay $250 for a 5 year extended warranty (beyond the 2 year manufacturer's warranty)?

Simply put - because I'm placing the value of my time above the value lost through buying the extended warranty. (It is a product that we've had before, that failed within 2 years, and we had trouble trying to get our money back under the consumer law.)

The Australian Consumer Law should, in theory, cover the intrinsic failure period. However, there is a problem - it is hard to force companies to recognise their obligations under it. The more expensive the product, the more likely you are to experience some sort of problem when you try to invoke it. If paying $250 makes the argument a non-issue and saves me countless hours of arguing, then I think I'll just pay it.

For what it's worth, in general, I still think it's a good idea to avoid extended warranties, unless you have some pretty good intel on the actual odds of failure.


Wednesday, 1 April 2015

Multiple Linear Regression with Excel (part 1)

Hi Folks

Today I am going to walk through multiple linear regression using Excel. A lot has already been published about the regression equations, and they are well understood, so I'm not going to go into great detail. If you are interested in the proofs, simply look at the Wikipedia article on the subject, which is quite complete.

What are we doing when we do a multiple linear regression? Well, the basic idea is that you have several "independent" variables that could contribute to the outcome of some dependent variable. For example, the price of a house might be influenced by the number of rooms, the number of bathrooms, the median house price for the location and current interest rate levels. Assuming there is some linear relationship between those variables and the price of a house (a big assumption!), a multiple linear regression could be used to predict what the price of some house will be.

Let's say we have this (totally imaginary) data:

| number of rooms | number of bathrooms | median house price | interest rate | price of house |
|---|---|---|---|---|
| 6 | 2 | 250001.3659 | 8% | $850,021.37 |
| 4 | 2 | 249985.1355 | 4% | $650,005.14 |
| 2 | 1 | 249991.6254 | 9% | $450,001.63 |
| 3 | 1 | 249985.2125 | 9% | $549,995.22 |
| 4 | 1 | 249999.2385 | 6% | $650,009.24 |
| 5 | 2 | 249995.5378 | 8% | $750,015.54 |
| 3 | 1 | 250011.8438 | 7% | $550,021.85 |
| 2 | 1 | 249995.037 | 4% | $450,005.04 |
| 5 | 1 | 249993.0192 | 6% | $750,003.02 |
| 1 | 1 | 250010.0988 | 1% | $350,020.10 |
| 5 | 1 | 249981.2104 | 4% | $749,991.21 |
| 2 | 1 | 250001.5801 | 5% | $450,011.58 |
| 5 | 1 | 250016.6773 | 7% | $750,026.68 |
| 4 | 2 | 249973.7979 | 6% | $649,993.80 |
| 6 | 1 | 250030.3517 | 7% | $850,040.36 |
| 4 | 1 | 249997.3914 | 4% | $650,007.39 |
| 5 | 2 | 250000.8471 | 10% | $750,020.86 |
| 2 | 1 | 249993.2497 | 8% | $450,003.26 |
| 1 | 1 | 250021.2873 | 6% | $350,031.29 |
| 5 | 2 | 250000.1968 | 10% | $750,020.21 |
| 4 | 1 | 249991.4433 | 6% | $650,001.45 |
| 6 | 2 | 249995.1933 | 4% | $850,015.19 |
| 4 | 1 | 249999.3487 | 4% | $650,009.35 |
| 6 | 3 | 250043.1142 | 5% | $850,073.12 |
| 6 | 3 | 250009.1431 | 7% | $850,039.15 |
Now, the silly formula I used to generate the "price of house" column is this:
=(100000*Number of Rooms + 10*Number of Bathrooms+1*Median House Price+Interest Rate^2)

There is a classic way of attacking this (predicting what the "price of house" column would be for any set of values of X), and it uses matrix algebra.

The basic formula for the "hat" matrix, which holds the predicted values for Y, is given as

Yhat = X(X'X)^(-1)X'Y

So in excel, this is really easy. There is a little trick to remember, however: when dealing with this formula in matrix form, you have to add a column of 1s to the front of the X matrix, which gives the regression its intercept term. (If you also add a column of 1s to the front of the Y matrix, as I do here, it passes straight through the calculation and reappears as a column of 1s in the result.)
In our case, our X matrix is the columns "Number of Rooms", "Number of Bathrooms", "Median House Price" and "Interest Rate", and our Y matrix is simply the "Price of House" column (YES, you CAN predict more than one dependent variable at a time - more on that in a future post).

So the first step is understanding the notation. X' is simply the transpose of the X matrix; the function in excel for this is simply TRANSPOSE.

The ^(-1) indicates that we want the inverse of the matrix product (X'X). The inverse of a matrix is given in excel by the function MINVERSE.

To multiply matrices in excel, you need to use the MMULT function (and remember, in matrix algebra, multiplication isn't commutative - the order matters).
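The same point in a quick numpy sketch, with two toy matrices chosen purely for illustration:

```python
import numpy as np

# Two small matrices to show that order matters in matrix multiplication.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

AB = A @ B  # [[2, 1], [4, 3]] - multiplying by B on the right swaps A's columns
BA = B @ A  # [[3, 4], [1, 2]] - multiplying by B on the left swaps A's rows
```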

So, writing X and Y for the ranges holding the two matrices, the hat matrix can be calculated by the excel formula

=MMULT(MMULT(X,MINVERSE(MMULT(TRANSPOSE(X),X))),MMULT(TRANSPOSE(X),Y))
Now, if you have followed the instructions and added the column of 1s to both the X matrix and the Y matrix, you should get the hat matrix (which will have, not surprisingly, a column of 1s in front of the predicted values).

If you haven't worked it out yet, you need to enter all matrix formulas as array formulas. That is, you hold down Ctrl and Shift and then press Enter in the formula bar.

If you want to eliminate the column of 1s from the hat matrix, simply transpose the result, select only the column of predicted values, and transpose again.
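As a cross-check on the spreadsheet, the same calculation can be sketched in Python with numpy, using the first eight rows of the table above. This is my own sketch, not part of the original spreadsheet; np.linalg.lstsq solves the same least-squares problem as Yhat = X(X'X)^(-1)X'Y, just more stably than forming the explicit inverse:

```python
import numpy as np

# First eight rows of the post's imaginary data set:
# rooms, bathrooms, median house price, interest rate.
X = np.array([
    [6, 2, 250001.3659, 0.08],
    [4, 2, 249985.1355, 0.04],
    [2, 1, 249991.6254, 0.09],
    [3, 1, 249985.2125, 0.09],
    [4, 1, 249999.2385, 0.06],
    [5, 2, 249995.5378, 0.08],
    [3, 1, 250011.8438, 0.07],
    [2, 1, 249995.0370, 0.04],
])
y = np.array([850021.37, 650005.14, 450001.63, 549995.22,
              650009.24, 750015.54, 550021.85, 450005.04])

# Prepend a column of 1s to X so the regression gets an intercept term.
X1 = np.column_stack([np.ones(len(X)), X])

# Least-squares fit; y_hat is the same thing the Excel hat-matrix formula
# produces (the predicted values X(X'X)^-1 X'Y).
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta
```

The fitted values land within about a cent of the actual prices, matching the hat-matrix column computed in the spreadsheet.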

For our random data, here are the results of the hat matrix:
| number of rooms | number of bathrooms | median house price | interest rate | price of house | predicted Y (hat matrix) |
|---|---|---|---|---|---|
| 6 | 2 | 250001.3659 | 8% | $850,021.37 | $850,021.35 |
| 4 | 2 | 249985.1355 | 4% | $650,005.14 | $650,005.12 |
| 2 | 1 | 249991.6254 | 9% | $450,001.63 | $450,001.61 |
| 3 | 1 | 249985.2125 | 9% | $549,995.22 | $549,995.20 |
| 4 | 1 | 249999.2385 | 6% | $650,009.24 | $650,009.22 |
| 5 | 2 | 249995.5378 | 8% | $750,015.54 | $750,015.52 |
| 3 | 1 | 250011.8438 | 7% | $550,021.85 | $550,021.83 |
| 2 | 1 | 249995.037 | 4% | $450,005.04 | $450,005.02 |
| 5 | 1 | 249993.0192 | 6% | $750,003.02 | $750,003.00 |
| 1 | 1 | 250010.0988 | 1% | $350,020.10 | $350,020.08 |
| 5 | 1 | 249981.2104 | 4% | $749,991.21 | $749,991.19 |
| 2 | 1 | 250001.5801 | 5% | $450,011.58 | $450,011.56 |
| 5 | 1 | 250016.6773 | 7% | $750,026.68 | $750,026.66 |
| 4 | 2 | 249973.7979 | 6% | $649,993.80 | $649,993.78 |
| 6 | 1 | 250030.3517 | 7% | $850,040.36 | $850,040.34 |
| 4 | 1 | 249997.3914 | 4% | $650,007.39 | $650,007.37 |
| 5 | 2 | 250000.8471 | 10% | $750,020.86 | $750,020.84 |
| 2 | 1 | 249993.2497 | 8% | $450,003.26 | $450,003.24 |
| 1 | 1 | 250021.2873 | 6% | $350,031.29 | $350,031.27 |
| 5 | 2 | 250000.1968 | 10% | $750,020.21 | $750,020.19 |
| 4 | 1 | 249991.4433 | 6% | $650,001.45 | $650,001.43 |
| 6 | 2 | 249995.1933 | 4% | $850,015.19 | $850,015.17 |
| 4 | 1 | 249999.3487 | 4% | $650,009.35 | $650,009.33 |
| 6 | 3 | 250043.1142 | 5% | $850,073.12 | $850,073.10 |
| 6 | 3 | 250009.1431 | 7% | $850,039.15 | $850,039.13 |

A scatter plot of the hat matrix versus the actual prices shows how well the regression performs:

As you can see from the Excel-added trendline, which calculated an R squared of 1, the fit of the scatter is (very nearly) perfect.

How do we get R squared? It is one minus the ratio of the sum of squares of the residuals to the total sum of squares of the variation.

That is, R squared = 1 - (SSE/SST)
SSE is simply the sum of each residual squared, where the residuals matrix is Y - Yhat.
So SSE is then, by definition, SUM((Y-Yhat)^2), entered as an array formula.

SST is simply the sum of the squared differences of each point in Y from the mean of Y, and can be calculated easily using the array formula =SUM((Y-AVERAGE(Y))^2), or with the built-in function =DEVSQ(Y).

These formulas for the above dataset should produce the following results:

| Measure | Result |
|---|---|
| SSE | 0.0105388561459062 |
| SST | 620,048,335,919.482 |
| R squared | 1 |
What this says is that there is a total of 620,048,335,919.482 worth of variation in the Y matrix, and the hat matrix fails to explain only about 0.01 of it. That is, the SSR - the variation explained by the regression - is SST - SSE = 620,048,335,919.472.
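The same bookkeeping can be sketched in Python; the vectors y and y_hat below are small made-up stand-ins for the actual and predicted columns, not the house-price data:

```python
import numpy as np

# Made-up actual and predicted values, standing in for the Y column and
# the hat-matrix column of a regression.
y = np.array([10.0, 12.0, 9.0, 15.0, 11.0])
y_hat = np.array([10.1, 11.8, 9.2, 14.9, 11.0])

sse = np.sum((y - y_hat) ** 2)        # residual sum of squares
sst = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
ssr = sst - sse                       # variation explained by the regression
r_squared = 1 - sse / sst
```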

This indicates that this particular model explains the data really well - unsurprising, given that the data were generated from an almost perfectly linear formula - and it could reasonably be expected to predict similar future results extremely well.

In the next post of this series, I will go into the analysis of variance and other measures (such as Q-Q plots and residuals interrogation) of regression performance. 

Thanks for reading.

Nick E