Getting Started with Time Series Analysis in Python

Gain a comprehensive understanding of time series analysis using Python, including data visualization, statistical modeling, and forecasting.


Time series analysis is a crucial technique in data science, particularly in finance, economics, and econometrics. This tutorial walks through the basics of analyzing data that change over time and extracting insights from them.

This guide covers the core steps of a time series workflow in Python: visualizing the data, checking for stationarity, decomposing the series, fitting a statistical model, evaluating it, and producing a forecast. By the end, you should have a solid foundation for taking on real-world time series problems.

Prerequisites:

  • Familiarity with the Python programming language
  • Knowledge of the Pandas library for data manipulation and analysis
  • Basic understanding of statistics

Step 1: Importing Required Libraries

We will begin by importing the necessary libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Step 2: Loading and Exploring the Time Series Data

Next, we will load the time series data into a Pandas DataFrame. The data can be loaded from a file or obtained directly from an API. In this example, we will use the classic "Air Passengers" dataset, which the Seaborn library ships under the name "flights".

import seaborn as sns
df = sns.load_dataset('flights')

We can explore the first few rows of the data by using the head() method. Depending on the Seaborn version, the month names may appear abbreviated (Jan, Feb, and so on).

print(df.head())
	OUTPUT:

       year     month  passengers 
    0  1949   January         112 
    1  1949  February         118 
    2  1949     March         132 
    3  1949     April         129 
    4  1949       May         121   

Step 3: Preprocessing the Time Series Data

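The next step is to preprocess the data. Time series tools generally expect a date-time index, so we will combine the "year" and "month" columns into a single timestamp with the pd.to_datetime() function, then set it as the index of the DataFrame. Parsing the month name on its own would discard the year and leave every row dated 1900.

# Build a first-of-month timestamp from the year and the first three letters of the month name
df['month'] = pd.to_datetime(df['year'].astype(str) + ' ' + df['month'].astype(str).str[:3], format='%Y %b')
df = df.set_index('month').asfreq('MS')  # 'MS' marks the index as month-start frequency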
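
A month name alone carries no year, so the year has to be part of the string being parsed. The following is a minimal sketch of that idea on a small hand-made frame; the `toy` frame and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy frame mimicking the year/month layout of the dataset.
toy = pd.DataFrame({
    "year": [1949, 1949, 1950],
    "month": ["November", "December", "January"],
    "passengers": [10, 20, 30],
})

# Combine year and abbreviated month name into one string, then parse it.
stamp = toy["year"].astype(str) + " " + toy["month"].str[:3]
toy.index = pd.to_datetime(stamp, format="%Y %b")

print(toy.index)
```

Because the year is included, the rows stay in chronological order across the December to January boundary.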

Step 4: Visualizing the Time Series Data

We can visualize the time series data by plotting it as a line chart. This will give us an idea of the trend and seasonality in the data.

plt.figure(figsize=(12, 6))  # set the figure size
plt.plot(df['passengers'])
plt.xlabel('Year')  # label the x-axis
plt.ylabel('Passengers')  # label the y-axis
plt.title('Time Series Plot of Air Passengers')  # give the plot a title
plt.show()

Step 5: Stationarity and Differencing

A time series is considered stationary if its mean, variance, and autocovariance do not change over time. Most time series models assume that the data is stationary, so it is important to check for stationarity and make any necessary transformations.

One way to check for stationarity is to plot the rolling statistics, such as mean and variance. We can use the rolling().mean() and rolling().var() methods to calculate the rolling mean and variance, respectively.


rolling_mean = df['passengers'].rolling(12).mean() 
rolling_var = df['passengers'].rolling(12).var()

Next, we can plot the rolling mean and variance. If the data is stationary, both should stay roughly constant over time.


plt.figure(figsize=(12,6)) 
plt.plot(rolling_mean, label='Rolling Mean') 
plt.plot(rolling_var, label='Rolling Variance') 
plt.legend() 
plt.show()
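
To make the idea concrete without a plot, here is a rough sketch that compares two synthetic series: pure noise and the same noise with a linear trend added. The series, the seed, and the helper `rolling_mean_spread` are assumptions made for this illustration; the spread of the rolling mean is only a crude indicator, not a formal stationarity test.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 120

# Two hypothetical series: pure noise (stationary) and noise plus a linear trend.
noise = pd.Series(rng.normal(0, 1, n))
trending = noise + 0.5 * np.arange(n)

def rolling_mean_spread(series, window=12):
    """Return max minus min of the rolling mean, a crude drift indicator."""
    rolled = series.rolling(window).mean().dropna()
    return rolled.max() - rolled.min()

noise_spread = rolling_mean_spread(noise)
trend_spread = rolling_mean_spread(trending)
print(f"noise: {noise_spread:.2f}, trending: {trend_spread:.2f}")
```

The rolling mean of the noise series barely moves, while the rolling mean of the trending series drifts by tens of units, which is the pattern to look for in the plot above.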

If the data is not stationary, we can often make it stationary by differencing the time series. This involves subtracting the previous (lagged) value from each observation, so the new series holds the change from one period to the next. We can use the diff() method to perform this transformation.

 
df['diff'] = df['passengers'].diff()
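
As a small worked example of what diff() computes, the sketch below differences a made-up five-point series once and then twice. The numbers are hypothetical and chosen so the arithmetic is easy to follow.

```python
import pandas as pd

# Hypothetical series whose increments grow by one each step.
s = pd.Series([10, 12, 15, 19, 24])

first_diff = s.diff()          # s[t] - s[t-1]; the first value is NaN
second_diff = s.diff().diff()  # difference of the differences

print(first_diff.dropna().tolist())   # [2.0, 3.0, 4.0, 5.0]
print(second_diff.dropna().tolist())  # [1.0, 1.0, 1.0]
```

The first difference still trends upward, while the second difference is constant. For monthly data with a yearly pattern, a seasonal difference such as diff(periods=12) is another common option.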
 

Step 6: Decomposing the Time Series Data

The next step is to decompose the time series into its constituent parts: trend, seasonality, and residuals. We can use the seasonal_decompose() method from the statsmodels library to perform this transformation.


from statsmodels.tsa.seasonal import seasonal_decompose 

result = seasonal_decompose(df['passengers'], model='multiplicative', period=12)  # 12 observations per yearly cycle

result.plot() 

plt.show()
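
For intuition about what a multiplicative decomposition does, the sketch below performs a simplified version by hand on synthetic data: a centered moving average estimates the trend, and averaging the detrended ratios by calendar month estimates the seasonal factors. This is an illustrative approximation under the stated assumptions (made-up series, hypothetical variable names), not a reproduction of the statsmodels implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic series: linear trend times a fixed 12-month pattern.
true_seasonal = np.array([0.8, 0.85, 0.95, 1.0, 1.05, 1.15,
                          1.25, 1.2, 1.05, 0.95, 0.85, 0.9])
true_seasonal = true_seasonal / true_seasonal.mean()  # factors average to 1
t = np.arange(48)
index = pd.date_range("2000-01-01", periods=48, freq="MS")
y = pd.Series((100 + 2 * t) * true_seasonal[t % 12], index=index)

# Trend: centered 2x12 moving average (mean of two adjacent 12-month means).
ma12 = y.rolling(12).mean()
trend = (ma12.shift(-5) + ma12.shift(-6)) / 2

# Seasonal factors: average the detrended ratio for each calendar month.
ratio = y / trend
seasonal = ratio.groupby(ratio.index.month).mean()
seasonal = seasonal / seasonal.mean()

# Residual: what remains after dividing out trend and seasonality.
seasonal_full = pd.Series(seasonal.loc[y.index.month].to_numpy(), index=y.index)
resid = y / (trend * seasonal_full)

print(seasonal.round(2).tolist())
```

On this noise-free series the estimated factors land close to the true pattern and the residuals stay near 1, which is what a well-behaved multiplicative decomposition should show.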

Step 7: Model Selection and Fitting

We can use several time series models to forecast the future values of a time series, such as ARIMA, SARIMA, and Exponential Smoothing. In this example, we will use the SARIMA model.

We can use the SARIMAX class from the statsmodels library to fit the SARIMA model.

SARIMAX takes two tuples of parameters. The first, order=(p, d, q), gives the order of the autoregressive term (p), the order of differencing (d), and the order of the moving average term (q). The second, seasonal_order=(P, D, Q, m), gives the seasonal counterparts of those three values plus the number of observations per seasonal cycle (m), which is 12 for monthly data with a yearly pattern.

These parameters can be determined using a combination of domain knowledge and statistical methods, such as the ACF and PACF plots.
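
To show the kind of signal an ACF plot surfaces, here is a minimal sketch that computes the sample autocorrelation by hand with NumPy. The helper `sample_acf` and the synthetic series are assumptions made for illustration; in practice, statsmodels provides plot_acf and plot_pacf in statsmodels.graphics.tsaplots.

```python
import numpy as np

def sample_acf(x, lag):
    """Sample autocorrelation of a 1-D array at a given positive lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.sum(centered[lag:] * centered[:-lag]) / np.sum(centered ** 2)

rng = np.random.default_rng(1)
t = np.arange(144)
# Hypothetical series: a 12-step cycle plus a little noise.
x = np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.1, t.size)

acf_6 = sample_acf(x, 6)
acf_12 = sample_acf(x, 12)
print(f"lag 6: {acf_6:.2f}, lag 12: {acf_12:.2f}")
```

A strong positive autocorrelation at lag 12 (and a strong negative one at lag 6) is the signature of a 12-period cycle, which is one reason m = 12 is a natural choice for monthly data.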


from statsmodels.tsa.statespace.sarimax import SARIMAX

# order=(p, d, q); seasonal_order=(P, D, Q, m) with m=12 for monthly data
model = SARIMAX(df['passengers'], order=(1, 1, 1),
                seasonal_order=(1, 1, 1, 12))

results = model.fit()

Step 8: Model Evaluation and Forecasting

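Once the model is fit, we can evaluate its performance by comparing its predictions with the actual values. We can use the predict() method to generate predictions and the mean_squared_error() function from the scikit-learn library to calculate the mean squared error. Note that this is an in-sample error, measured on the same data the model was fit to, so it tends to be more optimistic than the error on unseen data.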

from sklearn.metrics import mean_squared_error 

predictions = results.predict(start=pd.to_datetime('1949-01-01'),
                              end=pd.to_datetime('1960-12-01'),
                              dynamic=False)

mse = mean_squared_error(df['passengers'], predictions) 

print('Mean Squared Error:', mse)
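
A common complement to in-sample error is a holdout check: set aside the last year, forecast it, and compare against a simple baseline. The sketch below illustrates the split and a seasonal-naive baseline on a synthetic series; the data and variable names are hypothetical, and the baseline stands in for a fitted model so the example does not depend on any modeling library.

```python
import numpy as np
import pandas as pd

t = np.arange(60)
index = pd.date_range("2000-01-01", periods=60, freq="MS")
# Hypothetical series: linear trend plus a 12-month cycle.
y = pd.Series(100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12), index=index)

# Hold out the final 12 months; nothing fit on `train` gets to see them.
train, test = y.iloc[:-12], y.iloc[-12:]

# Seasonal-naive baseline: repeat the last observed year.
baseline = pd.Series(train.iloc[-12:].to_numpy(), index=test.index)

rmse = float(np.sqrt(np.mean((test - baseline) ** 2)))
print(f"Seasonal-naive RMSE: {rmse:.2f}")
```

A model worth keeping should achieve a lower error than this baseline on the same held-out months.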

We can also visualize the predictions by plotting them against the actual values.

plt.figure(figsize=(12,6)) 
plt.plot(df['passengers'], label='Actual') 
plt.plot(predictions, label='Predicted') 
plt.legend() 
plt.show()        

Step 9: Making Final Forecast

Finally, we can use the fitted model to forecast future values. We can use the get_forecast() method to forecast future values for a specified number of steps.

forecast = results.get_forecast(steps=12) 

We can also plot the forecast along with the actual values and the confidence intervals for the forecast.


forecast_mean = forecast.predicted_mean 
forecast_conf = forecast.conf_int()   

plt.figure(figsize=(12,6)) 

plt.plot(df['passengers'], label='Actual')

plt.plot(forecast_mean, label='Forecast')

plt.fill_between(forecast_conf.index, forecast_conf.iloc[:, 0],
                 forecast_conf.iloc[:, 1], color='gray', alpha=0.5)

plt.legend() 
plt.show()   
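
To give a feel for what the shaded band represents, the sketch below builds 95% intervals from a point forecast and its standard error, assuming normally distributed forecast errors. The forecast means and standard errors are made-up numbers for illustration and do not come from the fitted model.

```python
from statistics import NormalDist

# Hypothetical forecast means and standard errors for three future steps.
means = [450.0, 430.0, 470.0]
std_errors = [10.0, 12.0, 15.0]

alpha = 0.05  # 95% interval
z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

# Each interval is the mean plus or minus z standard errors.
intervals = [(m - z * se, m + z * se) for m, se in zip(means, std_errors)]
for step, (low, high) in enumerate(intervals, start=1):
    print(f"step {step}: [{low:.1f}, {high:.1f}]")
```

The intervals widen as the standard error grows, which is why forecast bands typically fan out the further ahead they reach.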

And that's it! We have successfully performed time series analysis and forecasting in Python. This tutorial provides a starting point for further exploration of time series analysis and forecasting with Python.

Time series analysis is a fundamental technique in data science, and the workflow shown here (exploring, transforming, decomposing, modeling, evaluating, and forecasting) carries over to many real-world datasets. Keep in mind that practice makes perfect, so continue to explore and experiment with different datasets and model settings to build on these skills.