I want to pose this question as generally as possible and ask for references on what to do in similar situations. I'll incrementally add details to narrow down the problem.
I want to derive a large deviation bound for $\| X + Y \|_2^2$, where $X$ and $Y$ are $p$-dimensional dependent vectors; I believe that in my situation it is impossible to compute the quantiles of $Z = \| X + Y \|_2^2$. Formally, I want to get something like $$ \mathbb{P}( \| X + Y \|_2^2 > \lambda(x)) \le e^{-x}, $$ where $\lambda(x)$ is a deterministic function of $x$ and of characteristics of $X$ and $Y$.
Are there any general techniques, approaches, or ways of thinking in this most general setting?
Detail 1. I know that $Y \sim \mathcal{N}(0, \mathbf\Sigma)$.
Detail 2. I know that $\mathbb{E} \| X \|_2^2 \le \Delta$, for fixed $\Delta$.
Detail 3. I know that the magnitude of $Y$ is much higher than that of $X$, but I cannot formally reduce the above-mentioned inequality to something like $$ \mathbb{P}(\| Y \|_2^2 \ge \lambda_Y(x)) \le e^{-x}. $$ However, I know that $\lambda(x)$ should be large only because of $Y$ and not $X$.
Notice that if $\lambda(x)$ is allowed to be random, then the choice $\lambda(x) = \lambda_Y(x) + \| X \|_2^2 + 2X^\top Y$ would do the job just fine, since $\| X + Y \|_2^2 = \| X \|_2^2 + 2X^\top Y + \| Y \|_2^2$.
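To make the random-threshold remark concrete, here is a quick numerical sanity check (not part of the question; the dimension and seed are arbitrary) of the expansion that makes it work: the event $\{\| X + Y \|_2^2 > \lambda_Y(x) + \| X \|_2^2 + 2X^\top Y\}$ is exactly the event $\{\| Y \|_2^2 > \lambda_Y(x)\}$.

```python
import numpy as np

# Sanity check of the identity ||X + Y||^2 = ||X||^2 + 2 X^T Y + ||Y||^2,
# which is why the random threshold lam_Y(x) + ||X||^2 + 2 X^T Y works:
# subtracting ||X||^2 + 2 X^T Y from both sides of the tail event leaves
# exactly the event {||Y||^2 > lam_Y(x)}.
rng = np.random.default_rng(1)
p = 7  # illustrative dimension
X = rng.standard_normal(p)
Y = rng.standard_normal(p)

lhs = np.sum((X + Y) ** 2)
rhs = np.sum(X**2) + 2 * X @ Y + np.sum(Y**2)
print(lhs, rhs)
```

Of course this is just the polarization identity; the difficulty in the question is that the cross term $2X^\top Y$ is random, so a deterministic $\lambda(x)$ must absorb it.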
I'd appreciate any ideas, suggestions and/or comments.
This was too long for a comment, but is not an answer.
First, unless you have some LLN scaling for $\lambda(x)^{-1}\|X+Y\|^2$, large deviations theory seems to me only tangentially related, through the techniques it uses. If there is a legitimate LDP here, then an obvious way to proceed is via the contraction principle, using something like the transformation $T(x,y) = \|x+y\|^2$.
Next, as PhoemueX points out, if you know something like $\|Y\| \geq \|X\|$ a.s., then the answer is fairly straightforward. Using Cauchy–Schwarz, $2X^\top Y \leq 2\|X\|\|Y\| \leq 2\|Y\|^2$ and $\|X\|^2 \leq \|Y\|^2$, so for any $t>0$, \begin{align*} P(\|X+Y\|^2 \geq \lambda (x)) &= P(e^{t\|X+Y\|^2} \geq e^{t\lambda(x)}) \\ &\leq e^{-t\lambda(x)} E[e^{t(\|X\|^2+\|Y\|^2+2 X^\top Y)}]\\ &\leq e^{-t\lambda(x)} E[e^{t(\|Y\|^2+\|Y\|^2+2 \|Y\|^2)}]\\ &= e^{-t\lambda(x)} E[e^{4t \|Y\|^2}]. \end{align*} Since $Y$ is a mean-zero Gaussian vector with covariance matrix $\Sigma$, we can identify the distribution of the sum of squares. If, for instance, $\Sigma = I$, then $\|Y\|^2$ is $\chi^2_p$, and you can evaluate the moment generating function to be $E[e^{t\|Y\|^2}] = (1-2t)^{-p/2}$ for $t < 1/2$. Optimizing over $0 < t < 1/8$, we would get \begin{align*} P(\|X+Y\|^2 \geq \lambda (x)) &\leq \inf_{0<t<1/8} e^{-t\lambda(x)-(p/2)\log(1-8t)}, \end{align*} which amounts to computing $$\sup_{0<t<1/8} \bigg(t\lambda(x) + \frac{p}{2} \log(1-8t)\bigg).$$

The notation $\asymp$ means different things depending on the context, so you have to clarify what "much higher magnitude" means. If you do not have the almost sure inequality written above, then you can probably use some combination of Jensen and Hölder to get a crude inequality of the kind you want.
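For what it's worth, here is a quick numerical sketch of this Chernoff bound in the $\Sigma = I$ case (the function name and the choice $p = 5$, $\lambda = 60$ are purely illustrative). The exponent $-t\lambda - \frac{p}{2}\log(1-8t)$ is minimized in closed form at $t^* = \frac{1}{8}(1 - 4p/\lambda)$, valid when $\lambda > 4p$, and a Monte Carlo estimate confirms the bound dominates the tail of $4\|Y\|^2$, which in turn dominates the tail of $\|X+Y\|^2$ under the a.s. inequality.

```python
import numpy as np

def chernoff_bound(lam, p):
    """Chernoff bound on P(4*||Y||^2 >= lam) for Y ~ N(0, I_p).

    The bound is inf_{0<t<1/8} exp(-t*lam - (p/2)*log(1 - 8t));
    setting the derivative lam - 4p/(1-8t) to zero gives the
    optimizer t* = (1 - 4p/lam)/8, valid only when lam > 4p.
    """
    if lam <= 4 * p:  # below the mean of 4*chi^2_p the bound is trivial
        return 1.0
    t = (1.0 - 4.0 * p / lam) / 8.0
    return float(np.exp(-t * lam - (p / 2.0) * np.log(1.0 - 8.0 * t)))

# Monte Carlo check: the bound should dominate the empirical tail of 4*||Y||^2.
rng = np.random.default_rng(0)
p, lam, n = 5, 60.0, 200_000          # illustrative values
Y = rng.standard_normal((n, p))
tail = np.mean(4.0 * np.sum(Y**2, axis=1) >= lam)
bound = chernoff_bound(lam, p)
print(tail, bound)
```

As usual with Chernoff bounds on chi-squared tails, the bound is loose by a polynomial factor (here roughly an order of magnitude at these parameter values), but it has the requested $e^{-x}$ form once you invert $\lambda(x)$.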