File talk:Correlation examples2.svg

Data is not random

I corrected one line of the R source code accompanying the image. The examples in the last row are not iid observations, except for the last one. In the first lines of the R function "Others", which generates all of the samples, the x variable is a regular, increasing sequence from -1 to 1, so x is effectively a time series rather than a random sample. For example, the first example generates both x and y as serially dependent data:

  x = seq(-1, 1, length.out = n)
  y = 4 * (x^2 - 1/2)^2 + runif(n, -1, 1)/3

The x data is nonrandom, and the y data has strong autocorrelation:

  > print(acf(y))
  Autocorrelations of series ‘y’, by lag
      0     1     2     3     4     5     6     7     8     9    10    11    12 
  1.000 0.767 0.773 0.773 0.769 0.762 0.759 0.760 0.751 0.749 0.767 0.746 0.745 

Tests of independence or linear dependence are typically designed for random samples from the bivariate distribution of (x, y), and in general are not valid for time series like these examples.

The R code was modified as follows, at the beginning of the "Others" function:

 replaced:  x = seq(-1, 1, length.out = n)
 with:      x = runif(n, -1, 1)
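The effect of this one-line change can be checked with a minimal sketch along the following lines. The sample size `n`, the seed, and the helper `lag1` are assumptions made for illustration and are not taken from the image's source code; only the two ways of generating x and the formula for y come from the code quoted above.

```r
set.seed(1)   # arbitrary seed, used only for reproducibility (assumption)
n <- 800      # assumed sample size; not stated on this page

# Original version: x is an ordered grid, so y follows a smooth trend in index order
x_grid <- seq(-1, 1, length.out = n)
y_grid <- 4 * (x_grid^2 - 1/2)^2 + runif(n, -1, 1) / 3

# Modified version: x is an iid uniform sample on (-1, 1)
x_iid <- runif(n, -1, 1)
y_iid <- 4 * (x_iid^2 - 1/2)^2 + runif(n, -1, 1) / 3

# Hypothetical helper: lag-1 sample autocorrelation (element 1 of $acf is lag 0)
lag1 <- function(v) acf(v, plot = FALSE)$acf[2]

lag1_grid <- lag1(y_grid)
lag1_iid  <- lag1(y_iid)
print(c(grid = lag1_grid, iid = lag1_iid))
```

With the grid version the lag-1 autocorrelation of y should be large, in line with the acf output above, while with the runif version it should be close to zero, since consecutive observations no longer share neighbouring x values.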

This correction was also made on wiki by another editor. Mathstat (talk) 00:24, 3 March 2012 (UTC)