As the historic coronavirus pandemic continues to unfold, governments, businesses, and individuals around the world are making unprecedented changes. Many countries have implemented policies, such as mandated lockdowns and emergency income support, that might have been politically unthinkable only a few months ago. Meanwhile, to confront emerging economic and health challenges, citizens have had to transform the way they work and live almost completely.

All these unpredictable shifts in human and institutional behavior have serious ramifications for businesses, extending far beyond the direct impact of the virus itself. Many organizations have invested millions of dollars in AI and machine learning (ML) systems, which they depend on to inform their strategic decision making. The abrupt changes in consumer and corporate behavior brought on by COVID-19 will likely have a major impact on the accuracy of forecasting models that rely on historical data to inform their predictions.

It’s an extreme case of “concept drift,” which is one of the core problems at the heart of data science and machine learning: namely, dealing with data whose fundamental nature changes over time. And concept drift can have a major impact on the long-term success of ML projects and, ultimately, businesses’ ability to make smart, data-driven decisions.

In this paper we’ll explore:

- Understanding concept drift
- Testing for stationary vs. non-stationary data
- Combating and remediating drift
- Detecting drift and automating ML pipelines

## Concept Drift in the Coronavirus Era

The radical shifts in human behavior due to the COVID-19 pandemic — many of which occurred almost overnight — are a textbook example of concept drift in action. In response to unpredictable fluctuations in local COVID-19 cases, as well as public health policy, people are refraining from certain long-standing behaviors, such as driving into the office or going out to eat, while abruptly re-engaging in others. The ensuing months of recovery and possible long-term recession are sure to bring more unforeseen twists and turns.

As a result, the accuracy of ML predictive models that assume a fixed relationship between input and output variables will suffer, due to the continual shifts occurring in the underlying data patterns; additionally, the relationships in the data may abruptly change in real time, further hindering the predictive performance.

To illustrate how high levels of concept drift such as those caused by the pandemic might impact an organization’s predictive capabilities, let’s look at an example data set from New York City, where a severe COVID-19 outbreak has had implications across many areas, such as shopping behavior, electricity usage, motor vehicle collisions, etc.

Specifically, let’s imagine a use case where we need to predict the number of future motor vehicle collisions. The NYPD is required to report all collisions where someone is injured or killed, or where damages amount to at least $1,000 (see Figure 1).

Figure 1: New York City motor vehicle collisions from 2017 to 2020. The dataset is publicly available and can be obtained from here.

We can clearly see a much lower annual collision count in 2020 than in previous years (the lowest in a decade), a clear consequence of the lockdowns and business closures resulting from COVID-19. In general, however, our time series (Figure 2) shows no sustained upward or downward trend: it rose significantly through 2015, held roughly steady until the end of 2019, and then plunged to record lows in the first and second quarters of 2020.

Figure 2: Collisions resampled over days, weeks, months, quarters, and years.

### Testing for Concept Drift

The sudden plunge in collisions is a red flag that our NYC data set may be undergoing concept drift; it’s important, then, to confirm that a fundamental shift in the underlying data pattern has indeed occurred. This can be accomplished via the Dickey-Fuller test, a statistical test used to determine if a time series is either non-stationary or stationary (or at least trend-stationary). If the time series is non-stationary, the change it’s undergoing may then be attributable to concept drift — particularly if it’s out of line with expected seasonal shifts and trends.

The Dickey-Fuller test is a hypothesis test; it compares a null hypothesis that the data is non-stationary against an alternate hypothesis that the data is stationary. Depending on the test statistic or p-value, we then decide whether the null hypothesis should be rejected.

Null Hypothesis (H0): the time series has a unit root, indicating that it is non-stationary. It has some time-dependent structure.

Alternate Hypothesis (H1): the time series does not have a unit root, indicating that it is stationary. It does not have time-dependent structure.

- p-value > 0.05: Fail to reject the null hypothesis (H0); the data has a unit root and is non-stationary.
- p-value <= 0.05: Reject the null hypothesis (H0); the data does not have a unit root and is stationary.

Returning now to our example, let’s use the Dickey-Fuller test to determine whether our NYC collisions data set is stationary, or whether it’s undergoing change that might be attributable to concept drift:

Figure 3: Rolling mean and standard deviation from 2017 to 2019.

Table 1: The Dickey-Fuller test statistic before the pandemic.

### Pre-Drift Forecasting

To begin with, we used only data from 2017 to 2019 (Table 1), in other words, before the pandemic. The outcome is to reject the null hypothesis H0: the data does not have a unit root and is stationary. These results are reflected in our forecasting model as well:

Figure 4: The 22-day forecast under normal conditions.

The graph shows the forecast *before* COVID-19 struck New York in early 2020 (Figure 4). We use a DeepAREstimator (an LSTM-based recurrent neural network) to forecast 22 days into the future, using 1,214 days of historical vehicle collisions. As you can see, the blue line, which represents the actual collisions observed, aligns closely with our model’s predictions, represented by the areas shaded in green. Performance is measured by Root Mean Square Error (RMSE): 0.076.
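For reference, RMSE is simply the square root of the mean squared difference between observed and predicted values. A minimal NumPy version (the function name is ours, not part of the original pipeline):

```python
import numpy as np

def rmse(observed, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

# Perfect predictions give an RMSE of 0; larger errors grow the score.
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```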

Because our pre-pandemic time series was stationary — indicating there was no concept drift, as per the test (Table 1) — there was no need to do any special data preparation.

### Post-Drift Forecasting

Now let’s examine the full data set through 2020 to see what kind of impact COVID-19 has had on both the Dickey-Fuller test results and our data science models.

Figure 5: Rolling mean and standard deviation from 2017 to 2020.

Table 2: The Dickey-Fuller test statistic with the pandemic.

First, we can see the abrupt change in early 2020 in Figure 5, and in Table 2 the p-value (highlighted in red) is well above 0.05, meaning the data has a unit root and is non-stationary. The distribution has changed and the time series is no longer stationary, suggesting that the data set may be experiencing concept drift. And when we run the DeepAREstimator, we see how dramatically the drift impacts forecast accuracy:

Figure 6: The 22-day forecast under the COVID-19 pandemic.

Now the observed and predicted collisions deviate significantly (Figure 6). The RMSE is 0.303, roughly 75% worse than before the pandemic.

The bottom line? In the wake of the pandemic, sweeping behavioral changes — veering far from the historical norms our ML models rely on — have severely undermined our forecast’s predictive ability. Without addressing the influence of concept drift, we no longer have usable intelligence to guide future decision making.

## Addressing Concept Drift

Once you understand how to identify concept drift and how it might impact your business intelligence systems, the next step is to plan how to account for it. At a high level, teams looking to remedy or avoid drift will need to handle the following:

- Detecting significant changes to the data distribution or to the input-output relationship
- Deciding which input data should be discarded and which are still valid for training
- Revising the current models when significant change has been detected

Let’s examine other common remediation strategies in further depth:

### Common Offline Strategies

- Keep existing models and monitor their performance over time, intervening once performance drops below a prespecified threshold.
- Update models by periodically refitting (i.e., creating a new model) on a small portion of recent historical data that best captures the new behavior. This is referred to as a “sliding window.”
- Train new models that capture the most recent events, and combine their predictions with those of the prior static model (an ensemble), weighting the recently learned models more heavily than the historically trained ones.
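To illustrate the sliding-window idea, here is a deliberately simplified sketch that periodically refits a naive model (a windowed-mean forecast) on only the most recent observations; the class name and parameters are our own illustrative choices:

```python
class SlidingWindowMean:
    """Naive forecaster that periodically refits on a sliding window of recent data."""

    def __init__(self, window=20, refit_every=10):
        self.window = window            # how much recent history to train on
        self.refit_every = refit_every  # refit cadence, in observations
        self.history = []
        self.mean = None                # the "model": a windowed-mean forecast

    def observe(self, y):
        self.history.append(y)
        # Periodically refit on the most recent `window` observations only,
        # so older behavior is gradually forgotten.
        if len(self.history) >= self.window and len(self.history) % self.refit_every == 0:
            recent = self.history[-self.window:]
            self.mean = sum(recent) / len(recent)

    def predict(self):
        return self.mean

model = SlidingWindowMean()
for y in [10.0] * 40 + [20.0] * 40:  # regime shift halfway through the stream
    model.observe(y)
print(model.predict())  # the training window now contains only post-shift data
```

A real implementation would refit a proper forecasting model rather than a mean, but the mechanics of restricting training to a recent window are the same.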

### Online Strategies

- Leverage machine learning models, such as regression and neural network algorithms, whose coefficients or weights can be updated incrementally with the most recent historical data. These algorithms use their internal state to learn incrementally, without completely discarding already-learned historical patterns.
- Use data stream algorithms, which can incrementally update their models without refitting and are therefore able to learn new behavior much faster, in an online manner. A streaming algorithm pays more attention to recent data by increasing its weight, and less attention to historical data by lowering its weight.
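To sketch the incremental-update idea, here is a toy online linear model (one weight, plain SGD) that adjusts itself one observation at a time rather than being refit from scratch; the class name and learning rate are illustrative:

```python
class OnlineLinearModel:
    """A one-weight linear model updated incrementally, one observation at a time."""

    def __init__(self, lr=0.05):
        self.lr = lr   # learning rate: how strongly each new sample moves the weight
        self.w = 0.0

    def predict(self, x):
        return self.w * x

    def update(self, x, y):
        # One SGD step on squared error: the model keeps its learned state
        # and nudges it toward each new observation as it arrives.
        error = y - self.predict(x)
        self.w += self.lr * error * x

model = OnlineLinearModel()
for _ in range(300):
    for x in (1.0, 2.0, 3.0):
        model.update(x, 2.0 * x)   # true relationship: y = 2x
print(round(model.w, 3))  # converges toward 2.0
```

If the underlying relationship drifts (say, to y = 3x), the same update rule gradually pulls the weight toward the new value without a full retrain.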

### Data Preparation

- In some domains, such as time series problems, the data may be expected to change over time. This issue can be addressed by removing the systematic changes to the data over time, such as trends and seasonality, by differencing or decomposition.

## The Use of Differencing to Handle Concept Drift

In our NYC motor collisions example, because we’re working with a deep neural network, we can update or refit the static model with the latest historical data. However, one of the most common methods for dealing with concept drift in time series is differencing. Differencing is a technique that deals with both trend and seasonality: it stabilizes the mean by removing changes in the level of the observations. The method takes the difference between the observation at a particular instant and the observation at a previous instant. We perform the following transformation (see Figure 7):

*diff*[*t*] = *y*[*t*] – *y*[*t*–*lag*]

where *y*[*t*] is the observation at time *t* and *y*[*t*–*lag*] is the observation *lag* steps earlier; a lag of 1 gives first-order differencing.
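In pandas this transformation is a one-liner via `Series.diff`; a minimal sketch with toy data standing in for the collision counts:

```python
import pandas as pd

y = pd.Series([1.0, 3.0, 6.0, 10.0, 15.0])

# First-order differencing: diff[t] = y[t] - y[t-1]
diff1 = y.diff(periods=1).dropna()
print(diff1.tolist())  # [2.0, 3.0, 4.0, 5.0]

# To get forecasts back on the original scale, invert the transformation:
# cumulatively sum the differences and add back the first observation.
restored = diff1.cumsum() + y.iloc[0]
```

Remember that any forecast made on the differenced series must be inverted this way before it can be compared with the raw observations.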

Figure 7: Time series after performing 1st order differencing.

Figure 8: Rolling mean and standard deviation from 2017 to 2020 after performing differencing.

Table 3: The Dickey-Fuller test statistic with the pandemic after performing differencing.

We can see that the mean and standard deviation now vary only slightly over time. The Dickey-Fuller test statistic is also below the 1% critical value; thus we can reject the null hypothesis and treat the differenced series as stationary with better than 99% confidence (see Figure 8 and Table 3). We can also take second- or third-order differences, which might yield even better results in certain applications (you can explore the subject further here).

Figure 9: The 22-day forecast under the COVID-19 pandemic after performing differencing.

As can be observed from Figure 9, after performing differencing the observed and predicted collisions fit adequately. The RMSE is 0.0523, roughly 83% better than the undifferenced post-drift forecast.

### Detecting Concept Drift in Data Streams

One key thing that all of the above techniques have in common is the need to monitor for and detect concept drift in its infancy. For many companies, *that’s* the critical, make-or-break factor in the current situation: the ability to identify and respond to drift in real time. Data stream algorithms can do just that.

The strategy here is to use a series of sliding windows maintaining the most recent data in real time, while allowing older data to be slowly forgotten as time elapses. Drift will be detected once two subwindows are “distinct enough”.
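A deliberately simplified sketch of this two-subwindow idea (real detectors such as ADWIN use adaptive window sizes and statistical bounds; the fixed window size and threshold here are our own illustrative choices):

```python
from collections import deque

class WindowDriftDetector:
    """Flags drift when the means of two adjacent subwindows diverge."""

    def __init__(self, subwindow=10, threshold=1.0):
        self.subwindow = subwindow
        self.threshold = threshold
        self.buffer = deque(maxlen=2 * subwindow)  # old data slides out automatically

    def update(self, x):
        self.buffer.append(x)
        if len(self.buffer) < self.buffer.maxlen:
            return False  # not enough data yet
        values = list(self.buffer)
        old_mean = sum(values[: self.subwindow]) / self.subwindow
        new_mean = sum(values[self.subwindow :]) / self.subwindow
        # Drift: the recent subwindow is "distinct enough" from the older one.
        return abs(new_mean - old_mean) > self.threshold

detector = WindowDriftDetector()
stream = [0.0] * 50 + [5.0] * 50  # abrupt shift at index 50
alarms = [i for i, x in enumerate(stream) if detector.update(x)]
print(alarms[0])  # first alarm shortly after the shift
```

Production-grade detectors add statistical guarantees on the comparison, but the core loop of maintaining recent subwindows and comparing them is the same.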

Whereas offline algorithms need to be updated to detect drift, or in the worst case may even require refitting the model, data stream algorithms can automatically continue monitoring the incoming data and raise an alarm once an anomaly is detected. In our example, we successfully detected the drift related to the COVID-19 pandemic at its earliest stage (highlighted as red dots in Figure 10).

Figure 10: Time series from 2017 to 2020 and an online based concept drift detection model.

## Adapting at Scale with MLOps and Automation

Dealing with concept drift is always a key priority for data scientists and ML engineers; and in the wake of events like the current coronavirus pandemic that trigger widespread and unpredictable changes in behavior, it’s absolutely critical.

When AI and ML models are unable to adapt to fundamental shifts in underlying data patterns, accuracy is bound to suffer. This not only undermines businesses’ investments in those technologies; it leaves them flying blind — unable to make informed, data-driven decisions in response to a chaotic and fast-evolving situation.

Because these pattern shifts tend to occur in real time, even as data continues to pile up, preparing for concept drift can be a major challenge. That’s why it’s so important to ensure ML data pipelines and models are built with automation and visibility in mind. Data science teams need a programmatic MLOps framework in order to sustainably deploy and maintain ML models into production; to handle model versions and model selections; to monitor for concept drift and other performance issues; and most importantly, to manage the dynamic nature of ML applications, with the ability to refit, update, and prepare data.

All of that needs to happen on the fly, while continuing to provide the highest quality of service for business applications. That means you need both the data science expertise *and* the data engineering know-how to automate and monitor pipelines in a sustainable, systematic way. Only then can you be confident in your ability to deal with concept drift in the wake of a seismic event like the COVID-19 pandemic, and continue getting value from your data.

## Are your systems resilient when faced with disruptive events or shifting data?

At phData, we have experience solving tough machine learning problems and putting robust solutions into production. Whether you need to develop a novel proof of concept, see an enterprise-level project through from inception to deployment, or you just need an expert partner that can evaluate your current systems to determine your potential risk in the event of an unforeseen disruptive event, phData’s Machine Learning practice is here to help. Get in touch with us today!