From a longitudinal cohort study I have two data points per patient on arterial calcification ($x_{new}$ and $x_{old}$). To identify risk factors for change in calcification I want to use a linear regression model. To quantify the change I have calculated the actual change in mass ($d_{actual} = x_{new} - x_{old}$) and the relative change ($d_{relative} = (x_{new} - x_{old}) / x_{old} \times 100\%$). Both change measures are important for my analyses.
However, to deal with the high skewness of $x_{new}$, $x_{old}$, $d_{actual}$, and $d_{relative}$, I want to transform these variables. I've done this by taking the natural log of the original variables (as that produced the most nearly normal distribution), first adding 1 to the scores because many patients have a calcification score of 0. So: $x_{new,transf} = \log(x_{new}+1)$ and $x_{old,transf} = \log(x_{old}+1)$. In these transformed scores, patients who had no calcification still have a score of 0, as $\log(1)=0$.
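To make the transformation concrete, here is a minimal sketch in Python with NumPy. The scores are made-up illustrative values, not data from the study, and the variable names are only assumptions chosen to mirror the notation above.

```python
import numpy as np

# Hypothetical calcification scores (illustrative only, not study data).
# Zeros stand for patients without calcification.
x_old = np.array([0.0, 5.0, 40.0, 300.0])
x_new = np.array([0.0, 12.0, 35.0, 900.0])

# np.log1p(x) computes the natural log of (x + 1),
# so a raw score of 0 stays 0 after the transformation.
x_old_transf = np.log1p(x_old)
x_new_transf = np.log1p(x_new)
```

`np.log1p` is used here instead of writing `np.log(x + 1)` by hand; both give the same values for scores like these.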
Subsequently, I calculated the relative change score for the transformed variables using the formula above, $d_{relative,transf} = (x_{new,transf} - x_{old,transf}) / x_{old,transf} \times 100\%$, but first adding an arbitrary constant (1 in this case) to $x_{new,transf}$ and $x_{old,transf}$.
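As a sketch of this step, assuming the constant is added to both transformed scores before the percentage is taken: the function name `relative_change_transformed`, the parameter `c`, and the input values are hypothetical and only serve the illustration.

```python
import numpy as np

def relative_change_transformed(x_old, x_new, c=1.0):
    """Percent change between log(x + 1)-transformed scores,
    with the constant c added to both transformed scores first."""
    old_t = np.log1p(np.asarray(x_old, dtype=float)) + c
    new_t = np.log1p(np.asarray(x_new, dtype=float)) + c
    return (new_t - old_t) / old_t * 100.0

# Hypothetical raw scores (illustrative only, not study data).
x_old = [0.0, 5.0, 40.0]
x_new = [0.0, 12.0, 35.0]

d_c1 = relative_change_transformed(x_old, x_new, c=1.0)
d_c2 = relative_change_transformed(x_old, x_new, c=2.0)
```

In this formula the constant cancels in the numerator but stays in the denominator, so `d_c1` and `d_c2` differ wherever the score changed. With `c=0` the denominator would be 0 for every patient whose old score is 0.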
I've got a few questions about this:
- For the transformed variables, doesn't adding the constant 1 after the transformation alter the relative changes?
- What would be the difference between the transformed change variables as described above (first transform, then calculate change) and simply taking the log of the change scores (first calculate change, then transform)?
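To illustrate what the second question is comparing, here is a small numeric sketch of the two orders of operation applied to the actual change. The values are hypothetical and not taken from the study.

```python
import numpy as np

# Hypothetical raw scores (illustrative only, not study data).
x_old = np.array([5.0, 40.0, 300.0])
x_new = np.array([12.0, 35.0, 900.0])

# Order 1: transform first, then take the difference.
# This equals log((x_new + 1) / (x_old + 1)), i.e. a log ratio.
diff_of_logs = np.log1p(x_new) - np.log1p(x_old)

# Order 2: take the difference first, then the log.
# The log is undefined for zero or negative changes, so NumPy returns
# nan (or -inf for a change of exactly 0); warnings are silenced here.
with np.errstate(invalid="ignore", divide="ignore"):
    log_of_diff = np.log(x_new - x_old)
```

In this sketch the second patient's score decreases, so `log_of_diff` is `nan` for that patient, while `diff_of_logs` is defined for all three and is simply negative for a decrease.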
Any help would be greatly appreciated!