[02] I want it to be meaningful and sustained!

But what do these terms mean?

The CPD algorithm identifies statistically significant change points in accordance with the user-defined Confidence Level. This ensures that the changes are legitimate rather than one-off or random deviations. We want to avoid false positives when indicating a change, and the higher the confidence level, the less likely these are to arise. In general, a movement must be observed at a new level around three times in a row to be identified as a change with 95% confidence.
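The post does not describe CPD's internal test, but a textbook two-sided z-test on the mean of the points observed after a suspected change shows why a single reading is rarely enough. This minimal sketch assumes the previous level and the noise standard deviation are known; the function name and all numbers are hypothetical.

```python
from statistics import NormalDist, mean
from typing import Sequence


def is_significant_shift(old_level: float, sigma: float,
                         new_points: Sequence[float],
                         confidence: float = 0.95) -> bool:
    """Two-sided z-test: do the new points sit at a different level?

    Hypothetical illustration, not CPD's algorithm. Assumes the old level
    and the noise standard deviation (sigma) are known in advance.
    """
    # Critical value for a two-sided test (about 1.96 at 95% confidence).
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    # The standard error of the mean shrinks as more points are observed.
    std_err = sigma / len(new_points) ** 0.5
    z = (mean(new_points) - old_level) / std_err
    return abs(z) > z_crit


# One high reading is not enough, but three in a row are.
print(is_significant_shift(100.0, 10.0, [115.0]))                # False
print(is_significant_shift(100.0, 10.0, [115.0, 118.0, 112.0]))  # True
```

With these assumed numbers, a single reading 15 units above the old level gives z = 1.5, while three readings averaging the same amount give z of about 2.6, which clears the 95% threshold.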

The algorithm also retains only changes of sufficient size, as specified by the Minimum Percentage Change parameter. These two requirements combine to ensure that the identified change is what we term meaningful; that is, the change is both statistically significant and sizable.
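The "meaningful" rule reduces to a simple conjunction: a change must pass the significance test and its size must exceed the Minimum Percentage Change. Here is a minimal sketch, assuming the significance result is already available as a boolean and that the percentage is measured relative to the old level; both function names are hypothetical.

```python
def pct_change(old_level: float, new_level: float) -> float:
    """Percentage change from the old level to the new level (assumes old_level != 0)."""
    return (new_level - old_level) / old_level * 100.0


def is_meaningful(significant: bool, old_level: float, new_level: float,
                  min_pct_change: float = 10.0) -> bool:
    """Hypothetical rule: meaningful means significant AND larger than +/- min_pct_change."""
    return significant and abs(pct_change(old_level, new_level)) > min_pct_change


print(is_meaningful(True, 100.0, 115.0))   # True: significant and +15%
print(is_meaningful(True, 100.0, 108.0))   # False: significant but only +8%
print(is_meaningful(False, 100.0, 150.0))  # False: large but not significant
```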

Another useful requirement is that the change is sustained, meaning that it is not cancelled out by a subsequent change in the opposite direction shortly thereafter. In the weekly series example below, there are two change points in quick succession, with the second change essentially cancelling out the first. The movement is more than just a spike, so it has legitimately been picked up as a change. The entire non-sustained period is identified by the dotted orange line.

 

If we were to look at the output CSV file at this point, we would see that the Sustained boolean variable is set to false when the movement is not sustained:

In practice, we want to disregard non-sustained changes in any subsequent analyses, as they do not reflect the overall picture. The easy option would be to delete such change rows when they arise, but we would lose some information in doing so. Instead, CPD has an option to disregard non-sustained changes and recalculate the average quantities around them.
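Assuming that "recalculating the average quantities" means re-averaging the data between whichever change points remain, the idea can be sketched as follows. The function name and the list-of-indices representation are hypothetical, and this only merges and re-averages segments; CPD itself also recalculates the change points, which is not attempted here.

```python
from statistics import mean
from typing import Iterable, List, Sequence, Tuple


def segment_means(series: Sequence[float], change_points: Iterable[int],
                  drop: Iterable[int] = ()) -> List[Tuple[int, int, float]]:
    """Average each segment between the retained change points.

    change_points are indices where a new level starts. Any index in
    `drop` (for example, a non-sustained change) is ignored, so the
    segments on either side of it are merged before re-averaging.
    """
    dropped = set(drop)
    kept = sorted(cp for cp in change_points if cp not in dropped)
    bounds = [0] + kept + [len(series)]
    return [(start, end, mean(series[start:end]))
            for start, end in zip(bounds, bounds[1:])]


# Hypothetical weekly series with a temporary rise that is later reversed.
series = [10.0] * 4 + [16.0] * 4 + [10.0] * 4
print(segment_means(series, [4, 8]))
# [(0, 4, 10.0), (4, 8, 16.0), (8, 12, 10.0)]
print(segment_means(series, [4, 8], drop={4, 8}))
# [(0, 12, 12.0)]
```

Ignoring both ends of the up-and-down pair leaves a single segment whose average (12.0) absorbs the temporary rise, and no rows of the underlying data are deleted.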

I know we haven't got into any of the CPD settings yet, and there are a lot of them (now available at Settings), but in this posting we are going to start with the meaningful and sustained settings, as they are the most interesting. These settings appear on the Change Point tab, as illustrated below.

You can see the Confidence Level is set at 95%. Increasing this value to, say, 99% will reduce the number of change points identified, but it runs the risk of missing some legitimate ones. Alternatively, you can reduce the value so that the algorithm is more sensitive and requires fewer than three observations before a change is detected, but this increases the risk of false positives. Determining the appropriate level requires weighing the cost of false positives against the cost of missing true positives.
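One way to see this trade-off is to ask how many consecutive readings at a new level a simple z-test needs before it declares a shift significant. This is a minimal sketch with hypothetical numbers (a shift of 12 units against noise with a known standard deviation of 10); CPD's own test may behave differently.

```python
import math
from statistics import NormalDist


def min_observations(shift: float, sigma: float, confidence: float) -> int:
    """Smallest n for which a shift of `shift` is significant in a z-test.

    Solves shift * sqrt(n) / sigma > z_crit for n, assuming sigma is known.
    """
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.floor((z_crit * sigma / shift) ** 2) + 1


for conf in (0.90, 0.95, 0.99):
    print(conf, min_observations(shift=12.0, sigma=10.0, confidence=conf))
# 0.9 2
# 0.95 3
# 0.99 5
```

Under these assumptions, moving from 95% to 99% raises the required run from three readings to five, while dropping to 90% lets two readings suffice.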

The Minimum Percentage Change is set at 10%, which means that only changes exceeding +/- 10% are retained. If smaller changes are detected, the algorithm combines change durations until no individual change is too small.
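The combining step could work along the lines of this greedy sketch, which repeatedly merges the pair of neighbouring segments with the smallest change until every remaining change exceeds the threshold. The (length, mean) representation, the greedy order and the function name are all assumptions; the post does not say how CPD chooses which durations to combine.

```python
from typing import List, Tuple

Segment = Tuple[int, float]  # hypothetical: (length in periods, mean level)


def merge_small_changes(segments: List[Segment],
                        min_pct_change: float = 10.0) -> List[Segment]:
    """Greedily merge neighbouring segments whose change is too small.

    Assumes all mean levels are positive so percentages are well defined.
    """
    segs = list(segments)
    while len(segs) > 1:
        # Absolute percentage change between each pair of neighbouring segments.
        changes = [abs(b[1] - a[1]) / a[1] * 100.0 for a, b in zip(segs, segs[1:])]
        i = min(range(len(changes)), key=changes.__getitem__)
        if changes[i] > min_pct_change:
            break  # every remaining change is large enough to keep
        (n1, m1), (n2, m2) = segs[i], segs[i + 1]
        # Replace the two segments with one, using a length-weighted mean.
        segs[i:i + 2] = [(n1 + n2, (n1 * m1 + n2 * m2) / (n1 + n2))]
    return segs


print(merge_small_changes([(4, 100.0), (4, 105.0), (4, 130.0)]))
# [(8, 102.5), (4, 130.0)]
```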

The Minimum Change Interval specifies the shortest duration allowed between changes. The value of 5 in this case means that at least five weeks must elapse before another change in either direction will be retained. This parameter can be used to prevent what is best viewed as a single change from being split into two, say when there is a 40% increase one week and a further 10% increase three weeks later.
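Read literally, the retention rule is a single pass over the change weeks, keeping a change only when enough time has passed since the last one kept. This minimal sketch keeps the earlier change and drops the later one; the function name is hypothetical, and CPD may fold the later movement into the earlier change rather than simply discarding it.

```python
from typing import List, Sequence


def enforce_min_interval(change_weeks: Sequence[int],
                         min_interval: int = 5) -> List[int]:
    """Keep a change only if at least `min_interval` weeks have elapsed
    since the previously retained change."""
    kept: List[int] = []
    for week in sorted(change_weeks):
        if not kept or week - kept[-1] >= min_interval:
            kept.append(week)
    return kept


# A rise in week 10 followed by a further rise in week 13 is treated as one
# change; the next change in week 30 is far enough away to be kept.
print(enforce_min_interval([10, 13, 30]))  # [10, 30]
```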

The Sustained Duration parameter specifies the maximum interval within which non-sustained changes will be identified. Thus, in this case, if there is an increase and then a decrease (or vice versa) within 12 weeks, this will likely be flagged as non-sustained. The actual outcome is also affected by the specified Minimum Change Interval; if this value is close to the Sustained Duration value, there may not be any non-sustained intervals anyway.
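A simplified version of the sustained check flags any two consecutive changes that go in opposite directions within the Sustained Duration window. In this sketch, the (week, direction) representation, the inclusive window and the function name are assumptions, and it ignores magnitude, whereas the post only says such reversals will "likely" be flagged.

```python
from typing import List, Sequence, Tuple

Change = Tuple[int, int]  # hypothetical: (week, +1 for an increase or -1 for a decrease)


def non_sustained_periods(changes: Sequence[Change],
                          sustained_duration: int = 12) -> List[Tuple[int, int]]:
    """Return (start, end) weeks where a change is reversed within the window."""
    ordered = sorted(changes)
    periods = []
    for (week_a, dir_a), (week_b, dir_b) in zip(ordered, ordered[1:]):
        # Opposite directions in quick succession: the first change is undone.
        if dir_a == -dir_b and week_b - week_a <= sustained_duration:
            periods.append((week_a, week_b))
    return periods


print(non_sustained_periods([(10, +1), (16, -1), (40, +1)]))  # [(10, 16)]
print(non_sustained_periods([(10, +1), (30, -1)]))            # []
```

In this sketch, retained changes are already at least the Minimum Change Interval apart, so only reversals with a gap between that interval and the Sustained Duration can be flagged. When the two settings are close, that range is nearly empty, which is consistent with the caveat above.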

And finally, getting back to the option of disregarding non-sustained changes, this is provided by the Disregard non-sustained changes selector. Utilising this option leads to the recalculation of the change points, resulting in the following plot:

and the following output file, in which all changes are now sustained:
