N-BEATS: NEURAL BASIS EXPANSION ANALYSIS FOR INTERPRETABLE TIME SERIES FORECASTING

Keshav G
May 17, 2020

N-BEATS is a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties: it is interpretable, applicable without modification to a wide array of target domains, and fast to train. Two configurations of N-BEATS demonstrate state-of-the-art performance on the M3, M4 and TOURISM competition datasets containing time series from diverse domains, improving forecast accuracy by 11% over a statistical benchmark and by 3% over the winner of the M4 competition.

N-BEATS Architecture

Figure 1: N-BEATS Architecture

1) Time Series Input

Figure 2: Time series under consideration

The model takes time series data up to xₜ (the data points up to time t) as its input and predicts the future values xₜ₊₁, …, xₜ₊ₕ (H = forecast window length). The size of the input is n*H (n generally ranging from 2 to 7), also called the Lookback Period; the model learns the behaviour of the time series over the lookback period and tries to predict the behaviour of the data up to H future points, also called the Forecast Period.
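To make the windowing concrete, here is a minimal Python sketch that slices a 1-D series into lookback/forecast training pairs (the function make_windows and its defaults are my own illustration, not code from the paper), with n = 3 and H = 5 giving a 15-point lookback:

```python
import numpy as np

def make_windows(series: np.ndarray, horizon: int = 5, n: int = 3):
    # Slice a 1-D series into (lookback, forecast) training pairs.
    lookback = n * horizon                        # e.g. 3 * 5 = 15 points
    inputs, targets = [], []
    for t in range(lookback, len(series) - horizon + 1):
        inputs.append(series[t - lookback:t])     # the n*H points up to time t
        targets.append(series[t:t + horizon])     # the next H future points
    return np.stack(inputs), np.stack(targets)

x, y = make_windows(np.sin(np.arange(200) / 5.0))  # toy sine-wave series
print(x.shape, y.shape)                            # (181, 15) (181, 5)
```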

2) Architecture Blocks:

Time series data over the Lookback Period serves as input to Stack 1, which in turn is made up of multiple Blocks arranged in a Doubly Residual Stacking manner. To understand how the architecture works, we first have to understand the Basic Block.

i) Basic Block

Figure 3: Generic Basic Block
Figure 4: Internal structure of Generic Basic Block

The input to Stack 1 passes through a Block that looks exactly as represented in Figure 3; Figure 4 shows the internal layer configuration of the Basic Block for clarity.

We have set the length of the Forecast Period to 5 data points and the Lookback Period to 15 data points.

The 15-dimensional input from the Lookback Period is first passed through a four-layer [FC + ReLU] stack and is then split into two branches. Each branch is passed through another FC layer, and we finally get two outputs: a 15-dim vector called the Backcast and a 5-dim vector called the Forecast. The Basic Block thus learns not only to predict the future data points (the Forecast) but also to reconstruct its own input (the Backcast).
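A minimal PyTorch sketch of this generic Basic Block follows (the class name, hidden width of 256 and exact head layout are illustrative assumptions, not the paper's reference implementation, which projects expansion coefficients onto basis functions):

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, lookback: int = 15, horizon: int = 5, hidden: int = 256):
        super().__init__()
        # Four-layer FC + ReLU trunk shared by both heads
        self.trunk = nn.Sequential(
            nn.Linear(lookback, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.backcast_head = nn.Linear(hidden, lookback)  # reconstructs the input
        self.forecast_head = nn.Linear(hidden, horizon)   # predicts the future

    def forward(self, x):
        h = self.trunk(x)
        return self.backcast_head(h), self.forecast_head(h)
```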

ii) Doubly Residual Stacking Of Blocks:

Figure 5: Internal Structure of a Stack

A single Stack consists of multiple Basic Blocks arranged according to the Doubly Residual Stacking principle. Two arithmetic operations are applied to the outputs of each Basic Block (the Backcast and the Forecast), hence the term Doubly Residual Stacking.

The 15-dim input of the Lookback Period (Lookback_inp) is passed through Block 1, which gives us two outputs, Backcast_1 and Forecast_1. The 15-dim input to Block 2 is the element-wise subtraction of Backcast_1 from Lookback_inp (Lookback_inp − Backcast_1). By subtracting Backcast_1 from Lookback_inp, a vector that carries only what Block 1 has not yet explained is passed as input to Block 2.

Following this logic, the input to every Block is a 15-dim vector formed by element-wise subtraction of the previous Block's Backcast output from the previous Block's input. The residual left after the last Block in a Stack is called the Stack Backcast Output.

The Forecast outputs of all the Blocks in the Stack are summed element-wise to yield a 5-dim vector, which serves as the Stack Forecast Output.
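The wiring described above can be sketched in a few lines, reusing the hypothetical BasicBlock from the previous snippet:

```python
class Stack(nn.Module):
    # Doubly Residual Stacking: each Block sees the running residual of the
    # input, and the Block forecasts are summed into the Stack forecast.
    def __init__(self, n_blocks: int = 3, lookback: int = 15, horizon: int = 5):
        super().__init__()
        self.blocks = nn.ModuleList(
            BasicBlock(lookback, horizon) for _ in range(n_blocks)
        )

    def forward(self, x):
        residual, stack_forecast = x, 0.0
        for block in self.blocks:
            backcast, forecast = block(residual)
            residual = residual - backcast     # what this Block failed to explain
            stack_forecast = stack_forecast + forecast
        # residual = Stack Backcast Output, stack_forecast = Stack Forecast Output
        return residual, stack_forecast
```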

iii) Combining the Stacks

Figure 6: Stacks arrangement

As the image above shows, the 15-dim Lookback_inp vector, when passed through Stack 1, yields two outputs: a 15-dim Stack Backcast Output and a 5-dim Stack Forecast Output. The Stack Backcast Output serves as input to Stack 2 (this vector represents what Stack 1 has not explained), and likewise each Stack's Backcast Output acts as input to the next Stack down the line. The Stack Forecast Outputs of all the Stacks are summed element-wise to yield the final 5-dim Global Forecast Vector.
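Continuing the same sketch, chaining the hypothetical Stack modules gives the full model:

```python
class NBeats(nn.Module):
    # Each Stack consumes the previous Stack's Backcast Output, and all
    # Stack Forecast Outputs are summed into the Global Forecast Vector.
    def __init__(self, n_stacks: int = 2, lookback: int = 15, horizon: int = 5):
        super().__init__()
        self.stacks = nn.ModuleList(
            Stack(3, lookback, horizon) for _ in range(n_stacks)
        )

    def forward(self, x):
        residual, global_forecast = x, 0.0
        for stack in self.stacks:
            residual, stack_forecast = stack(residual)
            global_forecast = global_forecast + stack_forecast
        return global_forecast
```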

The MSE (mean squared error) loss is calculated between this predicted Global Forecast Vector and the ground-truth data.

All the weights in the architecture are updated by backpropagating this loss.
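A toy training step under the same assumptions (random tensors stand in for real lookback windows and targets):

```python
model = NBeats()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, 15)        # a batch of 15-point lookback windows
y = torch.randn(32, 5)         # the 5 ground-truth future points per window

loss = criterion(model(x), y)  # MSE between Global Forecast and ground truth
optimizer.zero_grad()
loss.backward()                # gradients flow through every Stack and Block
optimizer.step()
```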

Learning Trend:

Figure 7: Trend Block

In order to learn the trend over the lookback period, we remove the last layer of the Basic Block and multiply the outputs X and Y with two matrices, as shown in Figure 7. This change is incorporated only in the Basic Block; the authors of the paper refer to this new Block as the Trend Block. These matrices are formed according to the logic described in the N-BEATS paper.

Figure 8: Equation for Trend Block
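In the paper, the Trend Block constrains the output to a polynomial of small degree in a normalized time variable. A sketch of how such a polynomial basis matrix could be built (the function name and the exact normalization are my assumptions):

```python
import torch

def trend_basis(degree: int, length: int) -> torch.Tensor:
    # Rows are powers of a normalized time vector t in [0, 1):
    # T = [1, t, t^2, ..., t^degree], shape (degree + 1, length)
    t = torch.arange(length, dtype=torch.float32) / length
    return torch.stack([t ** i for i in range(degree + 1)])

# A block's forecast is then theta_f @ trend_basis(degree, H), so the
# network only has to learn the few polynomial coefficients theta_f.
```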

Learning Seasonality:

Figure 9: Seasonality Block

In order to learn seasonality over the lookback period, we remove the last layer of the Basic Block and multiply the outputs X and Y with two matrices, as shown in Figure 9. This change is incorporated only in the Basic Block; the authors of the paper refer to this new Block as the Seasonality Block. These matrices are formed according to the logic described in the N-BEATS paper.

Figure 10: Equation for Seasonality Block
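The Seasonality Block instead uses a Fourier basis of sines and cosines, so the output is forced to be periodic. A sketch under the same caveats (the function name and harmonic count are illustrative):

```python
import math
import torch

def seasonality_basis(n_harmonics: int, length: int) -> torch.Tensor:
    # Stacks cosine and sine waveforms of increasing frequency,
    # shape (2 * n_harmonics, length).
    t = torch.arange(length, dtype=torch.float32) / length
    k = torch.arange(n_harmonics, dtype=torch.float32)[:, None]
    return torch.cat([torch.cos(2 * math.pi * k * t),
                      torch.sin(2 * math.pi * k * t)])

# As with the trend, the forecast is theta_f @ seasonality_basis(K, H),
# with theta_f the learned Fourier coefficients.
```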

Resources:

Paper: https://arxiv.org/pdf/1905.10437.pdf

GitHub implementation: https://github.com/philipperemy/n-beats/blob/master/nbeats_pytorch/model.py

About Me:

I work as an ML Engineer at a fast-growing startup, Greendeck. We, at Greendeck, help retailers with pricing intelligence. We have offices in both London and Indore, India. If you are passionate about deep learning, or simply want to say hi, please drop me a line at keshav.gupta@greendeck.co. Any suggestions regarding the blog are welcome as well. www.linkedin.com/in/kshavG
