# ARIMA & SARIMA: Real-World Time Series Forecasting (Advanced Guide)

ARIMA is one of the most popular and widely used statistical methods for time series forecasting. Before we take a deep dive into ARIMA, we need to discuss a couple of concepts.

What if the times series model is non-stationary? How do we make it stationary?

To know more about what exactly is stationary in time series, I wrote an article explaining this in detail. Please read it if you want to understand time series forecasting even better.

1. Methods used for converting non-stationary data into stationary data,
2. The ARIMA model,
3. The SARIMA model,
4. A real-world example of predicting the stock price of Microsoft,
5. Some hyper-parameter tuning to make the model more robust.

So, in machine learning, when the data is not in a Gaussian distribution we typically employ transformations like BOX-COX, or LOG. Similarly, when we have non-stationary time series data, there are two types of techniques to convert into stationary time series:

1. Differencing
2. Transformations

## Differencing

Differencing is one of the most important strategies to make a time series stationary. How does it work?

Let me give you the intuition: 1st order Differentiation

Differencing says instead of predicting yt  directly try to predict the gap between  y and yt-1:

Differencing is very similar to differentiation. Yt’ is nothing but yt  – yt-1. If the time series is non-stationary, taking the difference is a great way. Now, instead of predicting yt  predict Yt

Because we can predict  Yt’ we can compute yt as:

yt  =  Yt’ + yt-1   (Reconstruction)

Research has shown that Yt’ is typically more likely to be stationary than yt itself. Given

y1 , y2, y3, y4, . . . . . . . . . . . yt, yt – 1, yt + 1  —-> Non stationary

Now we take the difference between y1 and y2 let’s call it y1.

Similarly, we take the difference between y2 and y3 let’s call it y2.

So on yt.

y1’, y2’, y3’, . . . . . . . . . . . . yt’   —->  1st order difference (more likely to be stationary)

Instead of using 1st  order differentiation values, why not use 2nd order differentiation values?

Y1’’ , y2’’, y3’’, . . . . . . . . . . . . yt’’ 2nd  order differentiation.

If we plugin the formula for  ytin 2nd order differentiation, we get:

Now we can construct y1 from the above equation as: If your time series data is non-stationary, use differencing. It’s an extremely powerful technique. If the 1st order differentiation doesn’t work, you can use 2nd order differencing.

We can do Dth order differencing. This is the hyper-parameter here. We need to experiment with  1st, 2nd, 3rd so on to find the best value.

## Log-Transform

Differencing works very well with practical time series data. But it can’t be applied to all time series. There are other methods, like Log transform, just like there are different methods to convert data into a Gaussian distribution. Log transforms are some of the most popular transforms used.

In Log transforms, instead of modeling yt points, try to apply the logarithmic function to a point, and model the time series with the new log-transformed data. Let’s called the new point t(y Tilda)

All of these methods are from the 1970s, before modern computers took off or modern machine learning came into existence. The above two are the most popular methods to convert time series into stationary data. Of course, there are plenty of other methods that you can employ depending on your use case.

## ARIMA(p,q,d)

ARIMA is an acronym that stands for AutoRegressive Integrated Moving Average. It’s a class of models that captures a suite of different standard temporal structures in time series data. It explicitly caters to a suite of standard structures in time-series data, and as such provides a simple, powerful method for making skillful time-series forecasts. It’s a generalization of the simpler AutoRegressive Moving Average, with the added notion of integration.

• AR: Autoregression. A model that uses the dependent relationship between observation and some number of lagged observations.
• I: Integrated. The use of differencing of raw observations (e.g. subtracting an observation from observation at the previous time step) to make the time series stationary.
• MA: Moving Average. A model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.

We’ve covered the pieces, now we’ll combine them to form the ARIMA model.

Each of these components is explicitly specified in the model as a parameter. Standard notation is used of ARIMA(p,d,q), where the parameters are substituted with integer values to quickly indicate the specific ARIMA model being used.

The parameters of the ARIMA model are defined as follows:

• p: AR. The number of lag observations included in the model also called the lag order.
• d: I. The number of times that the raw observations are differenced, also called the degree of difference.
• q: MA. The size of the moving average window is also called the order of moving average.

I have explained in detail what Auto Regression and Moving Averages are in this blog post, if you need a refresher.

### How does the ARIMA model work?

Simply put, we have 3 parameters in ARIMA(p,q,d).

p is from Auto Regression, q is from Moving Averages, and d is from differencing.

d can be any order of differencing.

All three parameters are hyper-parameters that need to experiment and figure out which fits best, just like K in K-NN. If d = 2 instead of predicting yt we will use yt’’ to model.

Now we have ARMA(p,q). What is ARMA? ARMA is ARIMA without the I, the Integrating part.

p corresponding to AR and q corresponding to MA. The model looks like this:

Here, μ is some constant + linear combination of the previous p + linear combination of the previous q errored terms + the error this time(𝜖t).

What happens if we take 𝜖t to the other side of the equation? This becomes:

This equation is the same as the previous equation. We have a constant + linear combination of previous P values + linear combination of previous errored terms.

Imagine how we can model this into Linear Regression? We’ll take the previous p values as features, and previous q errors as features, 𝝰i and 𝞱j will be linear regression. This is also a linear regression problem. We can say ARIMA is nothing but a linear regression.

In a nutshell, ARIMA is:

• P, q, d are hyper-parameters,
• ARIMA(p, q, d) is a linear regression model on previous p values and previous q errors post differencing d times,
• Also know as the Box-Jenkins model(1976).

## SARIMA(p, q, d)(P, Q, D)m

We know the ARIMA model. AR is auto regressive, which says we want to predict the time series values based on some periods in the past. I is integrating, which is an upward or downward trend and to get rid of it, we use differencing. MA is moving average, which is informing the errors from the previous period to the next period. The new thing we see here is S – seasonality.

Remember Seasonality? If not, I’ve explained it in detail in part 1 of the time-series forecasting series. You’ll need to know this before getting started with SARIMA.

Let’s incorporate seasonality into the ARIMA model.

How do we know we should use the seasonal ARIMA(SARIMA) model? The above is drawn to show the seasonality. We see a very clear W-type pattern repeating, so we clearly have seasonality.

In SARIMA(P, Q, D)m: m is the seasonal factor. It’s the number of time steps for a single seasonal period.

In the above graph, consider each year has 4 quarters. Now we’ll have m value equal to 4.

The (P, D, Q) are the analogs of (p, q, d), except for the seasonal components.

To derive the equation for the SARIMA model is slightly math-intensive. If you want to try, check this out –> https://stats.stackexchange.com/questions/129901/sarima-model-equation.

Now let’s apply what we’ve discussed in a real-world scenario. Let’s do Microsoft stock prediction, and apply all the methods and techniques we discussed. We’ll generate the data using Google Finance.

STEP 1: Open Google Sheets and type:

`=GOOGLEFINANCE("MSFT", "all", "04/01/2015", "04/01/2021")`

It may take a couple of seconds to get the data. Once the data is generated, we can see 6 columns (Date, Open, High, Low, Close, Volume).

Save the file in .CSV format, and integrate it with Google Colab (you can also run it on your local machine).

Let’s try to predict the closing price. We can check the graph of close with Date:

The price in 2021 is almost 5 times the price from 2015.

Now, let’s use PROPHET to fit a forecasting model to the data. But first, let’s explore what the Prophet is.

Prophet is a procedure for forecasting time series data based on an additive model, where non-linear trends are fit with yearly, weekly, and daily seasonality plus holiday effects. It works best with time series that have strong seasonal effects, and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.

Prophet is open-source software released by Facebook’s Core Data Science team. It’s available for download on CRAN and PyPI.

In our data, there are some missing values. For some days, we don’t know what the closing stock price was. Prophet is very robust to outliers, and handles them very well along with missing data.

We need to do one thing – check if the data follows a normal Gaussian distribution. This is useful, because normality is one of the theoretical assumptions when the FB Prophet builds a regression model. I’m going to apply the Power transformation, Box_Cox, which transforms the data into a normal curve. To know more about box-cox, please refer to this link.

One key takeaway is that it estimates the exponent called λ, which is used to transform our dataset. We’ll keep track of λ for later use, when we want to get our original un-transformed data.

The BOX_COX model is implemented in python in the statsmodels library:

```from statsmodels.base.transform import BoxCox
bc = BoxCox()
df['Close'], lmbda = bc.transform_boxcox(df['Close'])```

Now it gets interesting. We’ll create a simple modeling prophet. Prophet requires renaming. The Date column should be renamed as ds, and the target column – Close – here should be renamed to y. These are the standard names in all the prophet models.

```data = df[['Date' , 'Close']]
data.columns = ['ds' , 'y']
data = df[['Date' , 'Close']]
data.columns = ['ds' , 'y']```

Let’s add the parameters to our model.

```model_params = {
"daily_seasonality": False,
"weekly_seasonality": False,
"yearly_seasonality": True,
"seasonality_mode": "multiplicative",
"growth": "logistic"
}```

Finally let’s import the Prophet. Because of the logistic growth, let’s set up the upper limit of the forecast using the standard deviation of 5%.

```model = Prophet(**model_params)
data['cap'] = data['y'].max() + data['y'].std() * 0.05```

At this point, we can fit the model to the data. Let’s create a new variable to hold the future predictions with a period of 365 days.

```model.fit(data)

future = model.make_future_dataframe(periods=365)
future['cap'] = data['cap'].max()

forecast = model.predict(future)```

Let’s plot the data:

This is the overall trend of the stock price, we can see it is raising and also we can see the yearly seasonality:

Finally, we have our forecasting model. The black dots are the actual values, the blue line is our forecast. This is the general forecasting procedure.

Now, to make the model more robust let’s add a Monthly trend, Quarterly trend, and Holidays.

```model = Prophet(**model_params)
model = model.add_seasonality(name = 'monthly' , period = 30 , fourier_order= 10)
model = model.add_seasonality(name = 'quarterly' , period =  92.25, fourier_order= 10)

model.fit(data)

future = model.make_future_dataframe(periods = 365)
future['cap'] = data['cap'].max()

forecast = model.predict(future)```

Here we can see the extra generated graphs which are quarterly trends, monthly trends, and holidays. During the holiday times, the price has increased between 2015 and 2016 during the October season.

Now let’s see the final predicted trend. The black dots are the actual values, and the blue line is our forecast, which has changed when compared with the previous model. The dotted line is the cap.

Now, let’s take one more step to do hyper-parameter tuning.

### Related

The changepoint_prior_scale parameter controls the flexibility of the trend in the time series. Its value ranges from 0.001 to 0.5. Higher values mean higher importance of the trend. So, the trend gets more influence in the forecasting model. Usually, this value is set at 0.05, but we can do hyper-parameter tuning to see which value fits best.

The seasonality_prior_scale parameter controls the flexibility of the seasonality of the component. Its value ranges from 0.01 to 10.0. Higher values mean higher importance of seasonality to influence the forecasting model.

```param_grid = {
"daily_seasonality" : [False],
"weekly_seasonality" : [False],
"yearly_seasonality" : [True],
"growth" : ['logistic'],
'changepoint_prior_scale': [0.001,0.01,0.1,0.5],
'seasonality_prior_scale': [0.01,0.1,0.5,10.0]

}

#generate all combiations of parameters
all_params = [
dict(zip(param_grid.keys(), v))
for v in itertools.product(*param_grid.values())
]

print(all_params)```

In the hyper-parameter tuning, I changed only the changepoint_prior_scale and seasonality_prior_scale to the influence in the forecasting model. The other parameters are kept constant, as we did in the earlier model.

```#foarecast with the best parameters
best_model = Prophet(**best_params)

best_model = best_model.add_seasonality(name = 'monthly', period = 30, fourier_order = 5)
best_model = best_model.add_seasonality(name = 'quarterly', period = 92.25, fourier_order = 10)

best_model.fit(data)

future = best_model.make_future_dataframe(periods=365, freq = 'D')

future['cap'] = data['cap'].max()

forecast = best_model.predict(future)```

After the hyper-parameter tuning, we can compare the graphs to check for trends for monthly, quarterly, and how holidays impact the stock price.

Let’s see the final predicted trend. The black dots are the actual values, and the blue line is our forecast, which has changed when compared with the previous model. The dotted line is the cap.

To get the real values back, we should un-transform the data. For that, let’s again use the stats.model library.

```forecast['yhat'] = bc.untransform_boxcox(x = forecast['yhat'], lmbda = lmbda)
forecast['yhat_lower'] = bc.untransform_boxcox(x = forecast['yhat_lower'], lmbda = lmbda)
forecast['yhat_upper'] = bc.untransform_boxcox(x = forecast['yhat_upper'], lmbda = lmbda)
forecast.plot(x = 'ds', y = ['yhat_lower', 'yhat', 'yhat_upper'])```

Finally, we get the below graph predicted for the next 2 years, where yhat_lower has a 5% lower confidence interval, and yhat_upper has a 5% upper confidence interval.

## To summarize

We built a regression model in Prophet to predict the closing stock price of Microsoft Corp. Specifically, we:

• Created a stock price dataset in Google Sheets,
• Created a simple model in Prophet,
• Added custom seasonalities and holiday effects,
• Performed hyperparameter tuning.

For the entire code for Microsoft stock price prediction, I have added the Google Colab file. Please check the complete code and dataset.

References:

Note: All images were created by the author.

## ML Experiment Tracking: What It Is, Why It Matters, and How to Implement It

10 mins read | Author Jakub Czakon | Updated July 14th, 2021

Let me share a story that I’ve heard too many times.

”… We were developing an ML model with my team, we ran a lot of experiments and got promising results…

…unfortunately, we couldn’t tell exactly what performed best because we forgot to save some model parameters and dataset versions…

…after a few weeks, we weren’t even sure what we have actually tried and we needed to re-run pretty much everything”

– unfortunate ML researcher.

And the truth is, when you develop ML models you will run a lot of experiments.

Those experiments may:

• use different models and model hyperparameters
• use different training or evaluation data,
• run different code (including this small change that you wanted to test quickly)
• run the same code in a different environment (not knowing which PyTorch or Tensorflow version was installed)

And as a result, they can produce completely different evaluation metrics.

Keeping track of all that information can very quickly become really hard. Especially if you want to organize and compare those experiments and feel confident that you know which setup produced the best result.

This is where ML experiment tracking comes in. ### Time Series Forecasting – Data, Analysis, and Practice ### Anomaly Detection in Time Series: 2021  