[01] Get with the plot!
Those change point plots are all very nice, but what are they actually telling me?
Indeed, there is a lot going on in the above example, so we'll explain each part in turn. Starting at the top, the main title tells us that we are looking at change points in the Sunspots dataset. The series is identified as S0001, the same series number used in the subsequent results table.
In this instance we are plotting the monthly mean number of sunspots observed from January 2000 through to January 2021. This is the time series shown in blue and repeated in both the upper and lower plots (labelled as Monthly quantity).
The upper plot shows all the change points detected in the series across four iterations of the change point algorithm. The order in which the change points were detected is given by the purple CPx numbers (CP1, CP2, ... CPn). The first change point tends to be the most obvious change. Because this series has been identified as Seasonal, regular seasonal movements are not flagged as changes.
The degree of confidence that the identified changes are actual change points and not just random fluctuations is provided by the percentage values below the corresponding CPx number. Change points that have a confidence level above the user-specified level (95% in this case) are given solid purple lines while those below the specified level are given dashed lines.
Change points are transferred to the lower plot only if they are meaningful, defined here as:
- they are statistically significant, and
- the change in the average value exceeds the user-specified minimum change level (+/- 10% in this case).
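The two conditions above can be sketched as a simple filter. This is only an illustration, not the program's actual code: the `ChangePoint` class, the `is_meaningful` function, and the example values are hypothetical, and treating "above" and "exceeds" as strict comparisons is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ChangePoint:
    label: str          # e.g. "CP1" (hypothetical field names)
    confidence: float   # confidence level, in percent
    pct_change: float   # change in the average value, in percent


def is_meaningful(cp, min_confidence=95.0, min_change=10.0):
    """Return True if a change point passes both tests.

    Assumption: both thresholds are strict, so a change point sitting
    exactly on a threshold is not kept.
    """
    significant = cp.confidence > min_confidence
    large_enough = abs(cp.pct_change) > min_change
    return significant and large_enough


# Made-up values for illustration only.
candidates = [
    ChangePoint("CP1", 99.0, -45.0),   # significant and large: kept
    ChangePoint("CP2", 97.0, 6.0),     # significant but too small: dropped
    ChangePoint("CP3", 80.0, 30.0),    # large but not significant: dropped
]
kept = [cp.label for cp in candidates if is_meaningful(cp)]
```

Using `abs()` reflects the "+/-" in the minimum change level: a fall of more than 10% counts just as much as a rise.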
The solid orange line shows the average level between the changes (labelled as Average quantity) and the size of the changes at each point is shown by the orange percentage values. The actual averages are provided by the sequence of orange numbers at the top of this chart.
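As a rough sketch of the arithmetic behind the orange line and the percentage labels, the snippet below averages a series between change points and then compares consecutive averages. The function names are hypothetical, and it assumes each change point index marks the first observation at the new level and that no segment average is zero.

```python
from statistics import mean


def segment_averages(values, change_idx):
    """Average level of each segment between consecutive change points.

    Assumption: each index in change_idx is the position of the first
    observation belonging to the new segment.
    """
    bounds = [0, *sorted(change_idx), len(values)]
    return [mean(values[a:b]) for a, b in zip(bounds, bounds[1:])]


def pct_changes(averages):
    """Percentage change from each segment average to the next."""
    return [100.0 * (new - old) / old for old, new in zip(averages, averages[1:])]


# Made-up series with changes at positions 3 and 6.
values = [10, 10, 10, 20, 20, 20, 5, 5]
averages = segment_averages(values, [3, 6])
changes = pct_changes(averages)
```

Here the averages come out as 10, 20 and 5, giving changes of +100% and -75%, which correspond to the numbers along the top of the chart and the percentage labels at each change.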
In practice, the plots are mainly a means of confirming that the process has performed as expected, particularly during the exploration phase of your analysis. There are options to view only a sample of plots (they are nice to look at) or none at all during program execution.
As seen below, the key information shown in the plots is summarised in the CSV output table, although the Change_IDs are now in chronological order. This table can be filtered and interrogated to provide information for subsequent analysis and inclusion in other models. Start_Period and End_Period specify the boundaries between each change, whilst Duration is the period of time over which the Avg_Quantity applies (the number of months in this case).
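To give a feel for how such a table might be interrogated, here is a minimal sketch using Python's standard `csv` module. The column names are taken from the description above, but the rows are invented, the `YYYY-MM` period format is an assumption, and the helper names are hypothetical.

```python
import csv
import io

# Invented rows for illustration; the real output will differ.
SAMPLE_CSV = """Change_ID,Start_Period,End_Period,Duration,Avg_Quantity
1,2000-01,2003-12,48,100.0
2,2004-01,2010-06,78,20.0
3,2010-07,2015-12,66,60.0
"""


def months_inclusive(start, end):
    """Number of months from start to end, counting both (YYYY-MM assumed)."""
    sy, sm = map(int, start.split("-"))
    ey, em = map(int, end.split("-"))
    return (ey - sy) * 12 + (em - sm) + 1


def load_segments(text):
    """Read the table and convert the numeric columns."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["Duration"] = int(row["Duration"])
        row["Avg_Quantity"] = float(row["Avg_Quantity"])
    return rows


segments = load_segments(SAMPLE_CSV)

# Example filter: keep only levels that lasted at least five years.
long_segments = [r for r in segments if r["Duration"] >= 60]
```

In this sketch, Duration is consistent with counting every month from Start_Period to End_Period inclusive, which is one reasonable reading of "the number of months" over which each Avg_Quantity applies.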
Although the change occurs between one End_Period and the next Start_Period, our preference is to keep things simple and say that the change occurred at the time of the Start_Period.
As for the last column in the table - hopefully we have sufficiently Sustained your interest and we can pick this up in a later post!