April 29, 2015 · R

plot large time series with R

Trying to plot a huge time series in R is messy: it takes ages to render, and the saved PDF eats my hard disk.

As our screen has a limited number of pixels, we do not really need to plot all the data points. However, simple downsampling (keeping one point every n points) does not fit my requirements, as it removes high-frequency content from the signal.

A solution is to transform the large univariate time series into a much shorter series made of the min and the max over successive blocks. Thus, instead of plotting N densely overlapping points, the idea is to plot only the min and the max of each block of overlapping points.
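To see the difference on a toy example (a hypothetical 12-point signal with a single spike, not from the post itself): naive decimation can miss the spike entirely, while the block min/max keeps it.

```r
# Toy signal: flat except for one spike at position 6
x <- c(0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0)

# Naive downsampling (every 3rd point) misses the spike
x[seq(1, length(x), by = 3)]
# 0 0 0 0

# Min/max over blocks of 3 keeps it (one block per column)
y <- matrix(x, nrow = 3)
as.vector(rbind(apply(y, 2, min), apply(y, 2, max)))
# 0 0 0 10 0 0 0 0
```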

Here is the R snippet:

# Compress a large time series for plotting: keep the min and the max
# over successive blocks, so spikes survive the downsampling.
# n is the number of blocks; each block holds floor(length(x)/n) points.
ts.compress <- function(x, start=NULL, n=2^10)
{
  p <- length(x)
  l <- floor(p/n)                 # block length
  if (p < n) {
    return(x)                     # nothing to compress
  } else {
    # one block per row; the trailing p - l*n points are dropped
    y <- matrix(as.numeric(x[1:(l*n)]), n, l, byrow=TRUE)
    y.min <- apply(y, 1, min)
    y.max <- apply(y, 1, max)
    # interleave as min1, max1, min2, max2, ...
    y.minmax <- matrix(rbind(y.min, y.max), 2*n, 1, byrow=FALSE)
    # each block of l points becomes 2 points, hence the new frequency
    return(ts(y.minmax, frequency=frequency(x)*2/l, start=start))
  }
}

So let's try it:

# Simulate a large time series (2^20 points of a random walk)
set.seed(120)  
data <- ts(cumsum(rnorm(2**20)), start=c(1946,1), frequency = 24*60*60)  
data.compress <- ts.compress(data, start=c(1946,1))

# The plot
pdf('~/tmp/rplot.pdf',width = 8, height = 4)  
plot(data)  
dev.off()  
pdf('~/tmp/rplot-compress.pdf',width = 8, height = 4)  
plot(data.compress)  
dev.off()  

This results in rplot.pdf (2.1 MB) and rplot-compress.pdf (16 KB).
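As a sanity check on the example above (assuming `ts.compress` and `data` from the snippets are still in the workspace), the compressed series spans the same range as the original while holding only 2*n = 2048 points. Note that the ranges match exactly here only because 2^20 is a multiple of n = 2^10; otherwise `ts.compress` drops the trailing points.

```r
length(data.compress)                           # 2048 instead of 2^20
stopifnot(all(range(data.compress) == range(data)))
```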
