I know how to use least square for estimating a constant value given a bunch of measurements. It is the average assuming measurements have same weight of variance. $$ \hat{x} = (H^{T}H)^{-1} H^{T}z $$
where $H = 1$ in the case of estimating a constant value and $z$ are a bunch of measurements. Now I would like to simulate a sensor that provides the range and the angle to a point $<x,y>$ with some Gaussian noise. From the sensor, we can get $<x,y>$ as follows $$ z_{1} = r*cos\theta \\ z_{2} = r*sin\theta $$
How can I apply least square to filter $z_{1}$ and $z_{2}$. Is $H$ as follows $$ H = \begin{bmatrix} cos \theta & r*sin \theta \\ sin \theta & r*cos \theta \end{bmatrix} $$
then I apply the least square formula since I have $z$ and $H$? This gives me wrong results.
The linear least squares estimation formula you consider implies that the observation $z \in \mathbb{R}^m$ and the parameter of interest $x \in \mathbb{R}^n$, ($n\leq m$), are linearly related as \begin{equation} z=Hx+n \;(1) \end{equation} where $H \in \mathbb{R}^{m\times n}$, and $n \in \mathbb{R}^m$ is a noise vector with uncorellated elements of zero mean and equal variance.
For the sensor problem, since the sensors provide the cartesian coordinates of the point (if I understood correct), it is convenient to consider as parameter of interest the cartesian coordinates $x\in \mathbb{R}^2$ of the point. Noting that the $i$-th sensor provides a measurement $z_i=x+n_i \in \mathbb{R}^2$, stacking all, say, $N$ measurements to a single observation vector $z\triangleq [z_1^T, z_2^T,\ldots,z_N^T]^T \in \mathbb{R}^{2N}$ results in the linear model of (1). I will leave it to you to determine the form of $H \in \mathbb{R}^{2N\times 2}$. You may then proceed in determining the LS estimate of $x$, from which you can then obtain an estimate of the polar coordinates by the well-known transformation formulas.
Note that the above approach estimates the polar coordinates implicitly (as a by-product of the cartesian coordinates estimate). The reason is that the polar coordinates are not linearly related to the observations (i.e., for $x=[r,\theta]$, you cannot find a matrix $H$ so that (1) holds). If you wish to obtain a direct estimate of the polar coordinates from the observations you would have to perform a non-linear parameter estimation procedure.
EDIT: Since your question considers only $N=1$ sensor, the estimate of the cartesian coordinates is simply $z=(z_1,z_2)$ and the corresponding polar coordinates estimate is $(\hat{r},\hat{\theta})=\left(\sqrt{z_1^2+z_2^2},\tan^{-1}(z_2/z_1)\right)$