DATA 330 | Spring 2021
Questions?
Why do we care about time series in data science?
But why?
Modeling.
Why?
I can think of two cases:
First, there’s big data.
Neither “resolution” nor “number of samples” is “big data.”
The four V’s of big data: volume, velocity, variety, and veracity.
Focus on the “hows” and “whys,”
rather than the results, to explain what we depict.
For example:
“Massive Dataset Analysis for Geoscience Data”
“One approach is to reduce data in a way that preserves spatial, temporal, and inter-scale structures via discrete probability distribution estimates associated with cells of space-time grids at different resolutions. It is then possible to study relationships between cells at different scales. Data are stratified […] to form subsets. Each subset is reduced using a clustering algorithm […]. The clusters’ centroids and populations define a set of discrete probability distributions, which become the fundamental units for data analysis.” – AJ Braverman, Jet Propulsion Laboratory, CA
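To make the quote concrete, here is a minimal sketch of the "reduce each subset with a clustering algorithm" step. It assumes a plain k-means written in NumPy; the quote does not say which clustering algorithm was used, and the function names `reduce_cell` and `nearest_centroid` and the synthetic data are hypothetical.

```python
import numpy as np

def nearest_centroid(points, centroids):
    """Index of the closest centroid for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def reduce_cell(points, k, n_iter=20, seed=0):
    """Summarize one grid cell as k centroids plus their population shares.

    Assumption: plain k-means stands in for the unnamed clustering algorithm.
    """
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest_centroid(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster where it is
                centroids[j] = members.mean(axis=0)
    labels = nearest_centroid(points, centroids)
    counts = np.bincount(labels, minlength=k)
    weights = counts / len(points)  # cluster populations as probabilities
    return centroids, weights

# Synthetic "cell": 1000 two-dimensional observations from two groups.
rng = np.random.default_rng(1)
cell = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(8, 1, (500, 2))])
centroids, weights = reduce_cell(cell, k=2)
print(centroids, weights)
```

The 1000 observations are replaced by two centroids and two weights, which together form a discrete probability distribution for that cell.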
We see evidence of time series + data science.
Imagine having to store only data models
instead of the big data itself.
There’s a popular belief that all data is a mixture of parametric structures and stochastic noise.
When the index set of the stochastic process is time,
we refer to this data as a time series.
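A rough sketch of the "parametric structure plus stochastic noise" idea, using synthetic data. Everything here is an assumption for illustration: the linear trend, the 12-step seasonal cycle, the Gaussian noise, and the least-squares fit are one possible choice, not the course's prescribed method.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(120)  # time index, e.g. 120 monthly observations

# Parametric structure: level + linear trend + a 12-step seasonal cycle.
structure = 10 + 0.5 * t + 3 * np.sin(2 * np.pi * t / 12)
# Stochastic noise: independent Gaussian draws.
noise = rng.normal(0, 1, size=t.size)
series = structure + noise

# Recover the structure by least squares on a small design matrix.
X = np.column_stack([
    np.ones(t.size),               # intercept
    t,                             # linear trend
    np.sin(2 * np.pi * t / 12),    # seasonal sine term
    np.cos(2 * np.pi * t / 12),    # seasonal cosine term
])
coef, *_ = np.linalg.lstsq(X, series, rcond=None)
fitted = X @ coef
residual = series - fitted  # what the model leaves unexplained

print("estimated coefficients:", coef)
print("residual standard deviation:", residual.std())
```

Four coefficients now stand in for 120 observations, which is the sense in which a model can be stored in place of the data.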
Which brings us to my second case …
forecasting.
If there truly are patterns in data and we know what happened in the past, can we predict the future?
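One way to test that question on a toy example: fit a pattern to the past only, extrapolate, and compare against values held back. This is a sketch under stated assumptions (synthetic data, a straight-line trend, `np.polyfit` as the fitting tool); real series rarely behave this well.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(100)
series = 2.0 + 0.3 * t + rng.normal(0, 1, size=t.size)  # synthetic data

# Pretend the last 20 points have not happened yet.
past_t, future_t = t[:80], t[80:]
past_y, future_y = series[:80], series[80:]

# Fit a straight line to the past; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(past_t, past_y, deg=1)

# Extrapolate the fitted line into the held-out period.
forecast = intercept + slope * future_t

# Mean absolute error of the forecast against what "actually" happened.
mae = np.mean(np.abs(forecast - future_y))
print("slope:", slope, "intercept:", intercept, "MAE:", mae)
```

Holding back the most recent observations, rather than a random subset, mirrors how a forecast is actually used: the model never sees the period it is asked to predict.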
What’s so great about predictions?
Restate my assumptions:
Evidence:
So what about the stock market?
We’re not there yet.
Over the next 15 weeks, I’d like to
WARNING: There’s a lot of math in our textbook.
We’ll have opportunities for discussions.
Sorry about the pacing. It’s going to be off.
Okay. Let’s begin.