Friedman 1994 MARS: variances in simulation 2 and 3


In Friedman (1994) p.41 & 42, the following random variables (RVs) are simulated (given below in a different parametrization):

  1. $$ y_1 = (x_1^2 + (x_2x_3 - \frac{1}{x_2x_4})^2 )^{1/2} + \sigma_1\epsilon_1, $$
  2. $$ y_2 = \tan^{-1}\left( \frac{x_2x_3 - \frac{1}{x_2x_4}}{x_1} \right) + \sigma_2\epsilon_2. $$

All RVs (except $y_1$ and $y_2$) are independent. The $x_i$ variables are uniformly distributed on:

$$ 0 \leq x_1 \leq 100, $$ $$ 20 \leq x_2/2\pi \leq 280, $$ $$ 0 \leq x_3 \leq 1, $$ $$ 1 \leq x_4 \leq 11. $$

The noise terms $\epsilon_1$ and $\epsilon_2$ are standard normal, $N(0,1)$.

Friedman (1994) continues:

"The variance of the noise was chosen to give a 3 to 1 signal-to-noise ratio for both [...]"

My questions are:

  • What is the variance of each noiseless part?
  • How can those variances be derived analytically?

There is 1 best solution below

I will take the lazy path and resist deriving the distributions and their variances, leaving it to someone who is more eager to do multiple integrations than I to give a 'real' Answer. If I were to do this, I might start by finding the distribution of $Q = X_2 X_3 - (X_2 X_4)^{-1},$ which appears in your definitions of both $Y_1$ and $Y_2.$ I ignore the normal noise component of each.
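
One piece that does come out in closed form is the variance of $Q$ itself, since it needs only moments of the independent uniforms rather than the full distribution. The following is a sketch under the ranges stated in the question; the shorthand $a = 40\pi$ and $b = 560\pi$ for the endpoints of $X_2$ is my own notation, and the numerical values are rounded.

$$ E[Q] = E[X_2]E[X_3] - E[X_2^{-1}]E[X_4^{-1}] = \frac{a+b}{4} - \frac{\ln(b/a)}{b-a}\cdot\frac{\ln 11}{10} \approx 471.24, $$

$$ E[Q^2] = E[X_2^2]E[X_3^2] - 2E[X_3]E[X_4^{-1}] + E[X_2^{-2}]E[X_4^{-2}] = \frac{a^2+ab+b^2}{9} - \frac{\ln 11}{10} + \frac{1}{11ab} \approx 370{,}219.6, $$

$$ \mathrm{Var}(Q) = E[Q^2] - (E[Q])^2 \approx 148{,}154, \qquad \mathrm{SD}(Q) \approx 384.9. $$

The same shortcut does not carry over directly to $Y_1$ and $Y_2$, because the square root and the arctangent are not polynomial in the $X_i$. For $Y_1$, the second moment is still easy, $E[Y_1^2] = E[X_1^2] + E[Q^2] = 10^4/3 + E[Q^2]$, but $E[Y_1]$ requires an actual integration.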

In case it is of any help, here are simulations in R statistical software of $Q$, $Y_1$, and $Y_2$. I simulated 100,000 realizations of each. The SDs should be accurate to a couple of significant digits. Histograms suggest the shapes of the densities. (The short bar at the right side of the histogram for $Y_2$ seems to be an artifact of the binning.)

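m = 10^5                       # number of simulated realizations
x1 = runif(m, 0, 100)
x2 = runif(m, 40*pi, 560*pi)   # 20 <= x2/(2*pi) <= 280
x3 = runif(m)                  # uniform on [0, 1]
x4 = runif(m, 1, 11)

q = x2*x3 - 1/(x2*x4)          # term shared by y1 and y2
y1 = sqrt(x1^2 + q^2)          # noiseless part of y1
y2 = atan(q/x1)                # noiseless part of y2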

var(y1);  var(y2);  var(q)
## 144027.1
## 0.09909435
## 148589.8

sd(y1); sd(y2); sd(q)
## 379.509
## 0.3147925
## 385.4735
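
To connect these SDs back to the quoted "3 to 1 signal-to-noise ratio", here is a minimal sketch of how $\sigma_1$ and $\sigma_2$ could be set. It assumes, without confirmation from the quoted passage, that the ratio is between the standard deviation of the noiseless part and the noise SD; if it is instead a ratio of variances, divide by `sqrt(3)` rather than 3. The seed and the names `snr`, `sigma1`, `sigma2`, `y1_noisy`, and `y2_noisy` are mine, chosen for illustration.

```r
# Assumption (not confirmed by the source): "3 to 1 signal-to-noise ratio"
# means sd(noiseless part) / sigma = 3.
set.seed(1)                      # arbitrary seed, only for reproducibility
m  <- 10^5
x1 <- runif(m, 0, 100)
x2 <- runif(m, 40*pi, 560*pi)
x3 <- runif(m)
x4 <- runif(m, 1, 11)

q  <- x2*x3 - 1/(x2*x4)
y1 <- sqrt(x1^2 + q^2)           # noiseless part of y1
y2 <- atan(q/x1)                 # noiseless part of y2

snr    <- 3
sigma1 <- sd(y1) / snr           # noise SD for y1 under the assumed reading
sigma2 <- sd(y2) / snr           # noise SD for y2 under the assumed reading

# Noisy responses under this assumed reading
y1_noisy <- y1 + sigma1 * rnorm(m)
y2_noisy <- y2 + sigma2 * rnorm(m)

# Share of the response variance due to the noiseless part
var(y1) / var(y1_noisy)
var(y2) / var(y2_noisy)
```

Under this reading the noiseless part accounts for about 9/(9 + 1) = 90% of the variance of each response, which the last two lines check empirically.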

[Figure: histograms of the simulated values]

A plot of 30,000 simulated $(Y_1, Y_2)$ pairs hints at the bivariate distribution (and its support).
