Hoeffding's type inequality for isotonic regression

77 Views Asked by At

Let $n \in \mathbb{N}$ and let $0 \le y_1 \le \dots \le y_n \le 1$. Suppose that $(Y_{k,t})_{k\in\{1,\dots,n\}, t\in\mathbb{N}}$ is a family of independent random variables such that, for each $k\in\{1,\dots,n\}$, $(Y_{k,t})_{t\in\mathbb{N}}$ is a sequence of Bernoulli random variables of parameter $y_k$. Let $T_1,\dots,T_n\in\mathbb{N}$.

Let $(\hat{Y}_1,\dots,\hat{Y}_n)$ be the solution of the quadratic constrained minimization problem \begin{equation*} \min_{0\le\hat{y}_1\le\dots\le\hat{y}_n\le1} \sum_{k=1}^n \sum_{t=1}^{T_k}(\hat{y_k}-Y_{k,t})^2 \end{equation*}

I'm looking for Hoeffding's inequality type guarantees for these estimators. Precisely, can we found (sharp, or nearly sharp, and hopefully better than the ones that hold for Hoeffding's inequality) constants $c_1,c_2>0$ such that, for each $n \in \mathbb{N}$, each $0\le y_1 \le \dots\le y_n \le 1$, each $T_1,\dots, T_n\in\mathbb{N}$, each $a_1,\dots,a_n \in \mathbb{R}$ and each $\delta\in (0,1)$ it holds

\begin{equation*} \mathbb{P}\bigg[ \bigg| \sum_{k=1}^n a_k\hat{Y}_k- \sum_{k=1}^n a_ky_k\bigg|\ge \sqrt{c_1 \sum_{k=1}^n\frac{a_k^2}{T_k} \log \Big(\frac{c_2}{\delta}\Big)} \bigg] \le \delta \;? \end{equation*}

(Notice that this inequality would hold with $c_1 = \frac{1}{2}$ and $c_2 = 2$ if we replace each estimator $\hat{Y}_k$ with $\frac{1}{T_k} \sum_{t=1}^{T_k} Y_{k,t}$, due to Hoeffding's inequality).

Even the cases where

  1. all the $a_k$ are $0$ except one whose value is $1$, or
  2. all the $a_k$ are $0$ except two whose value is $-1$ and $1$ respectively, or
  3. all the $a_k$ are equal to $1/n$,

would be interesting for me.

I've seen that the solution to the previous minimization problem goes under the name isotonic regression, but I realized that the literature is extremely vast and found myself quickly lost. I would very much appreciate even a pointer to some reference where this problem is properly addressed.