Resample data onto a given set of sample points (Python, NumPy)


I have two data sets D1 = (x1, y1) and D2 = (x2, y2), where x1, x2, y1 and y2 are arrays, so that D1 and D2 each describe a curve. Problem: x1 and x2 are different, which makes it difficult to compare the data point by point using analytical methods.

Example:

x1 = np.linspace(0,10,1001)
x2 = np.linspace(0.4,11.2,1234)
y1 = 20 * np.exp(-x1/100) * np.cos(np.pi*x1)
y2 = 18 * np.exp(-x2/120) * np.cos(np.pi*x2)

Accordingly, in order to use analytical methods, I need to do two things:

  1. Determine the interval in which both data sets exist

  2. Bring both data sets onto the same x points
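As a rough illustration of these two steps, here is a minimal sketch. The names `lo`, `hi` and `x_common` are made up for this illustration, and the choice of 1001 points is arbitrary; it only assumes that both x arrays are sorted in increasing order.

```python
import numpy as np

x1 = np.linspace(0, 10, 1001)
x2 = np.linspace(0.4, 11.2, 1234)

# Step 1: the overlap starts at the larger of the two first values
# and ends at the smaller of the two last values.
lo = max(x1[0], x2[0])
hi = min(x1[-1], x2[-1])

# Step 2: one possible common grid inside the overlap
# (1001 points is an arbitrary choice for this sketch).
x_common = np.linspace(lo, hi, 1001)

print(lo, hi, x_common.shape)
```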

The goal is to get the following data at the end:

x1_new = np.linspace(0.4,10,1001)
x2_new = np.linspace(0.4,10,1001)
y1_new = 20 * np.exp(-x1_new/100) * np.cos(np.pi*x1_new)
y2_new = 18 * np.exp(-x2_new/120) * np.cos(np.pi*x2_new)

My solution:

  1. Determine the interval by a simple if-else query.

  2. Linear Interpolation:

In my example, x2 has more data points in the interval [0.4,10] than x1. Therefore, we will "downsample" D2, i.e. evaluate y2 at the points of x1 that lie in this interval.

We want to determine a new value for y2 at the location x1[j], which I will call new_y2[j]. Here x1[j] lies between the neighbouring points (x2[i],y2[i]) and (x2[i+1],y2[i+1]), i.e. x2[i] <= x1[j] <= x2[i+1]. To make this possible, we construct the following straight line through these two points:

f_i(x) = ( y2[i+1] - y2[i] ) / ( x2[i+1] - x2[i] ) * ( x - x2[i] ) + y2[i]

$$f_i(x) = \frac{y_{2,i+1}-y_{2,i}}{x_{2,i+1}-x_{2,i}} (x-x_{2,i}) + y_{2,i}$$

We then finally obtain the following value for new_y2[j]:

new_y2[j] = f_i(x1[j]) = ( y2[i+1] - y2[i] ) / ( x2[i+1] - x2[i] ) * ( x1[j] - x2[i] ) + y2[i]

$$y_{2[new],j} = f_i(x_{1,j}) = \frac{y_{2,i+1}-y_{2,i}}{x_{2,i+1}-x_{2,i}} (x_{1,j}-x_{2,i}) + y_{2,i}$$
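To make the formula concrete, here is a small sketch that evaluates it at a single location. The value `x_query` is a made-up stand-in for x1[j], and locating the index i with `np.searchsorted` is an assumption of this sketch (it requires x2 to be sorted); it is not how the implementation below finds i.

```python
import numpy as np

x2 = np.linspace(0.4, 11.2, 1234)
y2 = 18 * np.exp(-x2 / 120) * np.cos(np.pi * x2)

x_query = 3.14159  # hypothetical location, standing in for x1[j]

# Index i of the left neighbour, so that x2[i] <= x_query < x2[i+1].
i = np.searchsorted(x2, x_query, side="right") - 1

# Straight line through (x2[i], y2[i]) and (x2[i+1], y2[i+1]), evaluated at x_query.
slope = (y2[i + 1] - y2[i]) / (x2[i + 1] - x2[i])
y_manual = slope * (x_query - x2[i]) + y2[i]

# Analytic value at the same location, for comparison only.
y_exact = 18 * np.exp(-x_query / 120) * np.cos(np.pi * x_query)
print(y_manual, y_exact)
```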

Implementation as Python code:

import numpy as np
import matplotlib.pyplot as plt

def resize_data(x1, x2, y1, y2):
    if len(x1) != len(y1) or len(x2) != len(y2):
        print("Data sets don't match!")
        return x1, x2, y1, y2
    
    N1 = len(x1)
    N2 = len(x2)
    
    # Grid spacing (assumes uniformly spaced x; N points span N-1 intervals)
    dx1 = (x1[N1-1] - x1[0])/(N1-1)
    dx2 = (x2[N2-1] - x2[0])/(N2-1)
    
    # Left and Right of intervals
    x1_idxL = 0
    x1_idxR = N1-1
    
    x2_idxL = 0 
    x2_idxR = N2-1
    
    # Find Most Left and Most Right
    
    if x1[0] <= x2[0]:
        x1_idxL = np.min(np.where(x1 >= x2[0]))
    else:
        x2_idxL = np.min(np.where(x2 >= x1[0]))
    
    # Right edge: last index that still lies inside the range of the other data set
    if x1[N1-1] <= x2[N2-1]:
        x2_idxR = np.max(np.where(x2 <= x1[N1-1]))
    else:
        x1_idxR = np.max(np.where(x1 <= x2[N2-1]))
       
    # Resize Data  
       
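    if x1_idxR - x1_idxL < x2_idxR - x2_idxL:
        # x1 is coarser in the overlap: interpolate y2 onto the x1 points
        new_y2 = np.zeros(len(y1))
        for j in range(x1_idxL,x1_idxR+1):
            # Index of the x2 point left of x1[j], clamped so that i+1 stays in range
            i = min(max(int((x1[j]-x2[0])/dx2), 0), N2-2)
            new_y2[j] = (y2[i+1]-y2[i])/(x2[i+1]-x2[i])*(x1[j] - x2[i]) + y2[i]
        return x1[x1_idxL:x1_idxR+1], x1[x1_idxL:x1_idxR+1], y1[x1_idxL:x1_idxR+1], new_y2[x1_idxL:x1_idxR+1]
    
    else:
        # x2 is coarser in the overlap: interpolate y1 onto the x2 points
        new_y1 = np.zeros(len(y2))
        for j in range(x2_idxL,x2_idxR+1):
            # Index of the x1 point left of x2[j], clamped so that i+1 stays in range
            i = min(max(int((x2[j]-x1[0])/dx1), 0), N1-2)
            new_y1[j] = (y1[i+1]-y1[i])/(x1[i+1]-x1[i])*(x2[j] - x1[i]) + y1[i]
        return x2[x2_idxL:x2_idxR+1], x2[x2_idxL:x2_idxR+1], new_y1[x2_idxL:x2_idxR+1], y2[x2_idxL:x2_idxR+1]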
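Regarding speed, the per-point loop could presumably be replaced by array operations. The following is only a sketch of the same straight-line formula in vectorized form. The helper name `lerp_onto` and the variables `mask` and `x_common` are hypothetical. It assumes the source x array is strictly increasing and that all target points lie inside its range, but unlike the loop above it does not assume uniform spacing.

```python
import numpy as np

def lerp_onto(x_src, y_src, x_new):
    """Piecewise-linear interpolation of (x_src, y_src) at x_new (hypothetical helper).

    Assumes x_src is strictly increasing and x_new lies within [x_src[0], x_src[-1]].
    """
    # Left-neighbour index for every target point, clipped so that i+1 stays valid.
    i = np.clip(np.searchsorted(x_src, x_new, side="right") - 1, 0, len(x_src) - 2)
    slope = (y_src[i + 1] - y_src[i]) / (x_src[i + 1] - x_src[i])
    return slope * (x_new - x_src[i]) + y_src[i]

x1 = np.linspace(0, 10, 1001)
x2 = np.linspace(0.4, 11.2, 1234)
y2 = 18 * np.exp(-x2 / 120) * np.cos(np.pi * x2)

# Keep only the x1 points inside the range of x2, then evaluate y2 there.
mask = (x1 >= x2[0]) & (x1 <= x2[-1])
x_common = x1[mask]
new_y2 = lerp_onto(x2, y2, x_common)
```

Because the index lookup uses a search rather than a spacing estimate, this variant should also work for non-uniform x arrays.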

Example:


x1 = np.linspace(0,10,1001)
x2 = np.linspace(0.4,11.2,1234)
y1 = 20 * np.exp(-x1/100) * np.cos(np.pi*x1)
y2 = 18 * np.exp(-x2/120) * np.cos(np.pi*x2)
y2_ = 18 * np.exp(-x1/120) * np.cos(np.pi*x1)

nx1, nx2, ny1, ny2 = resize_data(x1, x2, y1, y2)

# For Comparison

x1_idxL = 0
x1_idxR = len(x1)-1    
x2_idxL = 0 
x2_idxR = len(x2)-1

if x1[0] <= x2[0]:
    x1_idxL = np.min(np.where(x1 >= x2[0]))
else:
    x2_idxL = np.min(np.where(x2 >= x1[0]))
    
# Right edge: last index that still lies inside the range of the other data set
if x1[len(x1)-1] <= x2[len(x2)-1]:
    x2_idxR = np.max(np.where(x2 <= x1[len(x1)-1]))
else:
    x1_idxR = np.max(np.where(x1 <= x2[len(x2)-1]))

# Plot
    
fig = plt.figure(figsize = (10, 20))

plt.subplot(4,1,1)
plt.plot(x1,y1, color = 'blue', label = 'y1')
plt.plot(x2,y2, color = 'red', label = 'y2')
plt.legend()

plt.subplot(4,1,2)
plt.plot(x1,y1, color = 'blue', label = 'y1')
plt.plot(nx1,ny1, color = 'green', label = 'new y1')
plt.legend()

plt.subplot(4,1,3)
plt.plot(x2,y2, color = 'red', label = 'y2')
plt.plot(nx2,ny2, color = 'green', label = 'new y2')
plt.legend()

plt.subplot(4,1,4)
# y2_ is defined on x1, so the difference is plotted against the x1 grid
plt.plot(x1[x1_idxL:x1_idxR+1],y2_[x1_idxL:x1_idxR+1]-ny2, color = 'black', label = 'y2 - new y2')
plt.legend()
plt.show()

This works, but it is somewhat unsightly. Also, I think my approach with a hand-written linear interpolation is inelegant. Surely there is already a function in numpy or math exactly for this (PLEASE TELL ME). I would like to get some feedback on how to minimize the error and speed up the process.
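One candidate for such a function appears to be NumPy's `np.interp`, which performs one-dimensional piecewise-linear interpolation on increasing sample points. The following is a minimal sketch, not a tested replacement for `resize_data`. It assumes both x arrays are increasing and takes the x1 points inside the overlap as the common grid. The names `lo`, `hi`, `mask`, `x_common` and `max_err` are illustrative only.

```python
import numpy as np

x1 = np.linspace(0, 10, 1001)
x2 = np.linspace(0.4, 11.2, 1234)
y1 = 20 * np.exp(-x1 / 100) * np.cos(np.pi * x1)
y2 = 18 * np.exp(-x2 / 120) * np.cos(np.pi * x2)

# Overlap of the two x ranges.
lo, hi = max(x1[0], x2[0]), min(x1[-1], x2[-1])

# Use the x1 samples inside the overlap as the common grid
# (an assumption; any grid inside [lo, hi] would do).
mask = (x1 >= lo) & (x1 <= hi)
x_common = x1[mask]
y1_new = y1[mask]                      # y1 is already on the common grid
y2_new = np.interp(x_common, x2, y2)   # linear interpolation of y2

# Compare against the analytic y2 to get a feel for the interpolation error.
y2_exact = 18 * np.exp(-x_common / 120) * np.cos(np.pi * x_common)
max_err = np.max(np.abs(y2_new - y2_exact))
print(max_err)
```

If linear interpolation turns out to be too inaccurate, a denser source grid or a higher-order scheme would be the next things to try.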

Pictures:

[Figure: plots of the example]