Time Series Analysis#

Agenda#

  • Introduction

  • ARIMA-Family

  • TSA Decomposition

  • Smoothing

  • Fancy models

What is a Time Series?#

A series of data points over time

Q: Examples


Q: Which is a time series?

  • average monthly temperatures 1980-2021

  • average sales of sneakers in January 2021

  • the latest song by Billie Eilish

Some temperature data#

|     | Date       | Mean  | Year | Month |
|-----|------------|-------|------|-------|
| 794 | 1950-10-06 | -0.20 | 1950 | 10    |
| 795 | 1950-09-06 | -0.10 | 1950 | 9     |
| 796 | 1950-08-06 | -0.18 | 1950 | 8     |
| 797 | 1950-07-06 | -0.09 | 1950 | 7     |
| 798 | 1950-06-06 | -0.06 | 1950 | 6     |
| 799 | 1950-05-06 | -0.12 | 1950 | 5     |
| 800 | 1950-04-06 | -0.21 | 1950 | 4     |
| 801 | 1950-03-06 | -0.06 | 1950 | 3     |
| 802 | 1950-02-06 | -0.26 | 1950 | 2     |
| 803 | 1950-01-06 | -0.30 | 1950 | 1     |

What is special in time series data?#

It can (and must) be ordered by time

Actual values depend on historical ones

  • Dependent variable stands on both sides of the equation

  • Even the error terms can depend on historical values

85% of today's temperature can be explained by yesterday's!


Distributions can change over time

  • non-stationarity (hard)

Predicting the future… mhm#

We have no way of knowing (or seriously guessing for that matter) the next outcome of a random experiment. But there is some stuff that we can do!

Pry all information from our data that is not random, and make forecasts from that. Ideally, the remaining error should only be white noise.

Job of a good TSA

  • split the systematic and the unsystematic

  • identify both

  • forecast the systematic

  • characterize the unsystematic (the noise)

EDA for Timeseries#

Visualization#

Visualization is the most important (EDA) tool

  • Gives a good impression of stylized facts

  • Trends and/or cycles?

  • Are there missing values? Are there patterns in missing values?


Cycles - and on and on it goes#

Seasonal patterns (cycles) are mid-/short-term information

  • Recurring value levels

  • Typical example: sales during the year

  • Can show any cyclical behavior (e.g., trigonometric)

Problematic in time series modeling

  • Values in different areas of the cycle are significantly different

  • Different distribution - non-stationary

Question: What are some seasonalities you can think of?

Value distributions look different at different time ranges

Eliminating cycles#

Cycles often contain process-intrinsic information

Eliminate cycles with

  • Model + subtract trends

  • Time differencing: \(\Delta y_t = y_t - y_{t-4}\) (seasonal differencing, here with lag 4, e.g. for quarterly data)

  • Low-pass filtering

  • Fourier series:
    \( s_t = \beta_0 + \sum_{j=1}^{k} (\beta_j \sin(\lambda_j t) + \kappa_j \cos(\lambda_j t))\text{, }\lambda_j = \frac{2\pi}{d_j} \)
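The last two bullets can be sketched in a few lines of Python. A minimal example on synthetic monthly data (the series, seed, and period are all hypothetical):

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend + yearly cycle + noise (hypothetical)
rng = np.random.default_rng(0)
t = np.arange(120)
y = pd.Series(0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, 120))

# 1) Seasonal differencing with lag d = 12 removes the yearly cycle
y_diff = y.diff(12).dropna()

# 2) Fourier terms: a sin/cos pair with period d models the cycle directly
d = 12
F = np.column_stack([np.ones_like(t, dtype=float),
                     np.sin(2 * np.pi * t / d),
                     np.cos(2 * np.pi * t / d)])
beta, *_ = np.linalg.lstsq(F, y.values, rcond=None)
y_deseasoned = y.values - F @ beta  # trend + noise remain
```

Note that differencing shortens the series by one seasonal lag, while the Fourier regression keeps all observations.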


Time series before and after removing trend and seasonal patterns… what a beauty

Recap:#

We talked about timeseries data:

  • Data where the sequence plays a crucial role

  • Typically: data with a fixed interval without missing values

We talked about visual EDA:

  • Trends (where do they come from? how do we deal with them?)

  • Cycles (many examples; what do we do with them?)

Trigger warning#

Timeseries?#

DON’T PANIC#

Common Tools in TSA#

Data#

Note: There should be no gaps in time series data, and the sampling frequency should be consistent.

Imputation#

  1. filling (backfill, forward fill, mean)

  2. interpolate / filtering

  3. Resampling

  4. Predicting
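Options 1–3 can be sketched with pandas, using a hypothetical daily series with two gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with two missing values
idx = pd.date_range("2021-01-01", periods=7, freq="D")
s = pd.Series([1.0, np.nan, 3.0, 4.0, np.nan, 6.0, 7.0], index=idx)

filled_ffill = s.ffill()           # 1) forward fill (use bfill() for backfill)
filled_mean = s.fillna(s.mean())   # 1) fill with the series mean
filled_lin = s.interpolate()       # 2) linear interpolation
coarser = s.resample("2D").mean()  # 3) resample to a coarser, gap-free grid
```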

Train - Test - Split#

Q: How would you split a Timeseries?


Train - Test - Split#

Q: How would you split a Timeseries?

train, test = data[:-10], data[-10:]

Crossvalidation#

(figures: cross-validation schemes for time series)
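One common scheme is expanding-window cross-validation, available in scikit-learn as TimeSeriesSplit. A minimal sketch on hypothetical toy data:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # 20 time-ordered samples (toy data)
tscv = TimeSeriesSplit(n_splits=4)

splits = list(tscv.split(X))
for train_idx, test_idx in splits:
    # the training window always ends before the test window begins,
    # so no information leaks from the future into training
    assert train_idx.max() < test_idx.min()
```

Each fold grows the training window and moves the test window forward, mirroring how the model would be used in production.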

Stationarity#

What does it mean for a TS to be stationary?#

Having a constant mean and an autocovariance that depends only on the lag, not on the point in time - in short: constant moments

Many classical models (e.g. ARMA) assume the time series is stationary in order to make good predictions

How do we know?

  • ADF test - Null Hypothesis: Not Stationary

  • KPSS test - Null Hypothesis: Stationary

We can also look at the autocorrelation function ACF

Rolling Mean#

Careful: despite the name, the rolling mean forecasts from past observations, an AR(p)-style predictor, not from past error terms as an MA(q) model does.
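For reference, a rolling mean in pandas on a hypothetical toy series:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# window=3 averages each value with its two predecessors;
# the first window-1 entries are NaN because the window is incomplete
roll = s.rolling(window=3).mean()
```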

Exponential Smoothing Methods#

Modelling Time Series#

The oldschool stuff - Decomposition#

What makes a time series?#

A simple additive decomposed model

\[x_t = m_t + s_t + e_t\]
  • \(m_t\) is the trend

  • \(s_t\) is the seasonality

  • \(e_t\) is the error or random noise

Trend -> the long-term increase or decrease of the level of the series

Seasonality -> the repeating short-term cycles in the series

Random noise -> the remaining random variation in the series (though it may still contain autocorrelation worth investigating)

Additive vs Multiplicative#

Components of TS#

Signal#

Temperature measurements in Basel


Trend#

What does the trend look like for the temperatures?

  • (strong) upward trend

  • temperature is gradually rising

Note: The trend is computed based on the moving average

Seasonality#

What does the seasonality look like for the temperatures?

  • We see a strong seasonality for summer and winter

  • in business data we face the problem of very long cycles, and thus very long persistence

Note: The seasonality is the mean of the seasonal periods

Residuals#

We want residuals to look random

  • it is random looking

  • it shows the variation in the series

  • unexplained variance happening due to chance

Note: The Residual is what remains after removing trend and seasonality.


The oldschool stuff - ARMA#

Data Generating ARMA process example#

  Store

  Current stock of cake \(\hspace{0.5cm}X_t\)

  Production

  About 100 cakes are produced every day \(\hspace{2cm}\varepsilon_t\)
  (normally distributed)

  Thieves!

  15% of the cakes from stock
  are eaten by Larissa

  Sales

  40% of the production are picked up
  the next day



  20% of the production are picked up
  the day after

\(X_t = \underbrace{0.85 X_{t-1}}_{\text{AR(1)}} \underbrace{- 0.2\varepsilon_{t-2} - 0.4\varepsilon_{t-1}}_{\text{MA(2)}} + \varepsilon_t\)
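The cake story can be simulated directly from this equation. A sketch, assuming production is normally distributed around 100 cakes with a standard deviation of 10 (seed and scale are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
# daily production: assumed ~ N(100, 10^2) for this toy simulation
eps = rng.normal(loc=100.0, scale=10.0, size=n)

X = np.zeros(n)
for t in range(2, n):
    X[t] = (0.85 * X[t - 1]     # AR(1): 85% of yesterday's stock survives Larissa
            - 0.4 * eps[t - 1]  # MA: 40% of yesterday's production picked up
            - 0.2 * eps[t - 2]  # MA: 20% of the production from two days ago
            + eps[t])           # today's production
```

The long-run mean of the stock is \(100 \cdot (1 - 0.4 - 0.2) / (1 - 0.85) \approx 267\) cakes.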

Data Generating ARMA process example#


\(X_t = \underbrace{0.85 X_{t-1}}_{\text{AR(1)}} \underbrace{- 0.2\varepsilon_{t-2} - 0.4\varepsilon_{t-1}}_{\text{MA(2)}} + \varepsilon_t\)

Data Generating ARMA process example#

print_rmse()
Train RMSE of predict_mean: 11.32
Train RMSE of predict_ARMA: 9.88
Test RMSE of predict_mean: 12.58
Test RMSE of predict_ARMA: 12.27

\(X_t = \underbrace{0.85 X_{t-1}}_{\text{AR(1)}} \underbrace{- 0.2\varepsilon_{t-2} - 0.4\varepsilon_{t-1}}_{\text{MA(2)}} + \varepsilon_t\)

Forecasting with ARMA#

Step 1: Estimate ARMA model

  • get parameters

Step 2: Use parameters to forecast

  • calculate innovations recursively

  • compute forecasts with observations and innovations

  • software-implemented (statsmodels)


Other models from ARMA class#

  • ARMAX

    • Adding exogenous variables to the model

  • ARIMA

    • Adding an integrated part for non-stationarity (in the mean)

  • SARIMA

    • Adding a further seasonal part to the model

  • ARFIMA

    • Models with long memory

  • VAR

    • Multivariate (vector) autoregressive models

  • Time-Varying coefficients

The “normal” stuff#

With a little trick you can also use many of the models we have covered so far (XGBoost often works very well)

y
0 0.496714
1 0.060421
2 0.324157
3 1.660069
4 0.209006
5 -1.280167
6 1.086848
7 2.119192
8 -0.510608
9 -1.036433
y y_t1 y_t2 y_t3 y_t4 y_t5 y_t6
0 0.496714 NaN NaN NaN NaN NaN NaN
1 0.060421 0.496714 NaN NaN NaN NaN NaN
2 0.324157 0.060421 0.496714 NaN NaN NaN NaN
3 1.660069 0.324157 0.060421 0.496714 NaN NaN NaN
4 0.209006 1.660069 0.324157 0.060421 0.496714 NaN NaN
5 -1.280167 0.209006 1.660069 0.324157 0.060421 0.496714 NaN
6 1.086848 -1.280167 0.209006 1.660069 0.324157 0.060421 0.496714
7 2.119192 1.086848 -1.280167 0.209006 1.660069 0.324157 0.060421
8 -0.510608 2.119192 1.086848 -1.280167 0.209006 1.660069 0.324157
9 -1.036433 -0.510608 2.119192 1.086848 -1.280167 0.209006 1.660069
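The lag table above can be built with pandas' shift; the data here are hypothetical random draws, not the exact values shown above:

```python
import numpy as np
import pandas as pd

# Hypothetical series of 10 observations
rng = np.random.default_rng(42)
df = pd.DataFrame({"y": rng.normal(size=10)})

# Turn the series into a supervised-learning table: one column per lag
for lag in range(1, 7):
    df[f"y_t{lag}"] = df["y"].shift(lag)

# Drop rows with incomplete history before fitting e.g. XGBoost
train = df.dropna()
```

Any standard regressor can now be trained on the lag columns to predict y, which is the "little trick" mentioned above.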

The fancy stuff#

Facebook’s Prophet#

Harvey, A.C. and Peters, S. (1990), “Estimation Procedures for Structural Time Series Models”
Taylor, S.J. and Letham, B. (2017), “Forecasting at Scale”

  • Library open-sourced for automated forecasting

    • Based on decomposable time series model
      \(x_t = g(t) + s(t) + h(t) + \varepsilon_t\)
      \(g(t)\) : trend function
      \(s(t)\) : periodic changes
      \(h(t)\) : holiday effects

  • Time is the only feature

    • several linear and non-linear functions of time

  • Data need not be regularly spaced

  • Easily interpretable


Rocket#

  • Open-source library for time series classification

  • Uses >10,000 random convolutional kernels (CNN-style, but the kernels themselves are not trained)

    • then Logistic Regression on the resulting features

  • Exceptionally fast

    • 1h 15min vs. 16h of alternative methods

Dempster, A. et al. (2019): “ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels”

Convolutional neural networks#

  • As with images slide the kernel over the sequence data to detect patterns

    • If the data pattern matches the filter we see spikes

  • Later layers in CNNs respond to more complicated patterns, i.e. higher specialization

    • In 1D-convolutions we can use longer filters or add layers
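The pattern-matching intuition can be shown with a single 1-D cross-correlation in NumPy (the toy sequence and kernel are made up):

```python
import numpy as np

# A toy sequence containing the pattern [1, -1, 1] starting at index 2
x = np.array([0.0, 0.1, 1.0, -1.0, 1.0, 0.2, 0.0])
kernel = np.array([1.0, -1.0, 1.0])

# Slide the kernel over the sequence; the response spikes where it matches
response = np.correlate(x, kernel, mode="valid")
match_at = int(np.argmax(response))  # position where the pattern starts
```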

Long Short-Term Memory (LSTM) networks#

  • LSTMs are special network layers that carry memory for short-term and long-term information

  • Appropriateness ambiguous

    • Often do not beat exponential smoothing

    • Sometimes very good results

  • Very data hungry

Transformer models#

  • A very promising model is Autoformer

  • Good long-term predictions

  • Very data hungry, not much experience yet

  • Also, hard to predict when they will work well

[medium article](https://medium.com/mlearning-ai/transformer-implementation-for-time-series-forecasting-a9db2db5c820) [tds article](https://towardsdatascience.com/how-to-use-transformer-networks-to-build-a-forecasting-model-297f9270e630)

Conclusion#

Many models - where to start#

As usual: with a good EDA

  • Find what is characteristic

Start with simple models

  • Exponential smoothers are very good for forecasting

  • ARMA models are easily interpretable and allow simulation and control

Only go to more complex or specific models, if you have to

  • and if the amount of data allows it

Resources#

ACF / PACF#

acf_pacf(df_acf_pacf)

Both start at 1 for lag 0, the autocorrelation without any lag ("what is the temperature now compared with the temperature now? The same, hence 1"), and they share the same value at the first lag. After that, they differ.

ACF / PACF#


Step-by-step visualization to aid explaining the difference between ACF and PACF: for the PACF, each step removes what has already been explained by the shorter-lag autocorrelations.