It seems natural to start with a probability distribution and derive the log-likelihood (and hence the penalty term) from it. Why is it a bad idea to go the other way, starting with a regularizer and mapping it onto a prior? For example, if I were walking around an ideal city grid, would distances not follow a Laplace distribution, given that L1 is the natural distance there?
Priors for more general Lp norms seem to take the form $e^{-|x|^p}$, up to normalizing constants. This does not seem to be well covered in the literature. Are such priors not used often?
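To make the correspondence I have in mind concrete, here is a minimal sketch, assuming unit scale and a one-dimensional parameter. The function names (`lp_prior_density`, `neg_log_prior`, `integrate`) are hypothetical helpers, not from any library. The normalizing constant $p / (2\Gamma(1/p))$ comes from $\int_{-\infty}^{\infty} e^{-|x|^p}\,dx = 2\Gamma(1/p)/p$.

```python
import math


def lp_prior_density(x, p):
    """Density proportional to exp(-|x|^p), normalized over the real line.

    Assumes unit scale. p = 1 gives a Laplace density; p = 2 gives a
    Gaussian with variance 1/2.
    """
    norm = p / (2.0 * math.gamma(1.0 / p))
    return norm * math.exp(-abs(x) ** p)


def neg_log_prior(x, p):
    """Negative log of the density: |x|^p plus a constant independent of x."""
    return -math.log(lp_prior_density(x, p))


def integrate(f, lo=-30.0, hi=30.0, n=60001):
    """Plain trapezoidal rule on a uniform grid (stdlib only)."""
    h = (hi - lo) / (n - 1)
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n - 1):
        total += f(lo + i * h)
    return total * h


if __name__ == "__main__":
    for p in (1.0, 2.0, 3.0):
        mass = integrate(lambda x: lp_prior_density(x, p))
        # Penalty recovered from the prior, with the constant removed.
        penalty = neg_log_prior(1.5, p) - neg_log_prior(0.0, p)
        print(p, round(mass, 4), round(penalty, 4))
```

Dropping the additive constant, the negative log-prior is exactly the $|x|^p$ penalty, so a MAP estimate under this prior would match an Lp-regularized fit (with the regularization weight playing the role of the scale, which is fixed to 1 here).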