If I cross-correlate an audio file with itself (so there is no delay or difference), should the peak be at zero or at the center of the cross-correlation?
That is really confusing me.
Consider [5,123,4] as a signal. At n=-3, multiplying and accumulating [5,123,4,0,0,0] and [0,0,0,5,123,4] gives zero, because the two copies do not overlap.
At n=0, we get 5*5+123*123+4*4 = 15170.
At n=3, we again get zero.
So the peak will be at zero displacement. In the graph you have given, there could be an inherent shift in one of the signals, which would cause that offset. Can you check that?
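To make the "zero or center" point concrete, here is a minimal sketch in Python. It assumes NumPy and its np.correlate function in "full" mode, which is my choice for illustration and not necessarily the tool that produced your graph. The names x, r, lags, peak_index and peak_lag are just for this example. In "full" mode the output covers lags from -(N-1) to N-1, so zero lag lands on the middle sample of the output array: the peak is at lag zero and at the center index at the same time.

```python
import numpy as np

# Same toy signal as in the answer above.
x = np.array([5, 123, 4])

# Full autocorrelation: output length is 2*N - 1 (here 5 samples).
r = np.correlate(x, x, mode="full")

# Lag axis matching the output samples: -(N-1) ... (N-1), here -2..2.
lags = np.arange(-(len(x) - 1), len(x))

peak_index = int(np.argmax(r))    # position in the output array
peak_lag = int(lags[peak_index])  # the displacement that position represents

print("autocorrelation:", r)
print("peak index:", peak_index, "peak lag:", peak_lag)
```

If your plot labels the x axis by array index instead of by lag, an unshifted signal will show its peak in the middle of the plot. Subtracting N-1 from the index converts it to a lag.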
Maybe this would be better posted on dsp.SE?