Hi Chao,

Matrices `FFTW_X_temp, FFTW_Y_temp, and FFTW_Z_temp`

serve as temporary storage for data in the complex domain at various places of the code.

Any time I want to take FFT of something, I need to store the result somewhere. I usually need to take FFTs of three different spatial components, e.g. ux, uy, uz calculate duxdx, dyuduy, duzdz, so that’s why I need three temporary matrices. In the special case of pressure, I don’t need to take FFT of pressure 3 times, as the result will always be the same. That’s why I do so only once and store the result in `FFT_X_temp`

. That’s the first line of `Compute_ddx_kappa_fft_p()`

.

However, when I multiply the FFT(p) with ddx/ddy/ddz and kappa I get 3 different results. Thus I need two more temporary matrices and store the output in FFT_X_temp, FFT_Y_temp and FFT_Z_temp in order to take the inverse FFT to calculate ux, uy, uz.

Let’s dismantle the Matlab code for this part (the C++ code follows the MATLAB):

```
% calculate ux, uy and uz at the next time step using dp/dx, dp/dy and
% dp/dz at the current time step
ux_sgx = bsxfun(@times, pml_x_sgx, ...
bsxfun(@times, pml_x_sgx, ux_sgx) ...
- dt./rho0_sgx .* real(ifftn( ...
bsxfun(@times, ddx_k_shift_pos, kappa .* p_k) )));
uy_sgy = bsxfun(@times, pml_y_sgy, ...
bsxfun(@times, pml_y_sgy, uy_sgy) ...
- dt./rho0_sgy .* real(ifftn( ...
bsxfun(@times, ddy_k_shift_pos, kappa .* p_k) )));
uz_sgz = bsxfun(@times, pml_z_sgz, ...
bsxfun(@times, pml_z_sgz, uz_sgz) ...
- dt./rho0_sgz .* real(ifftn( ...
bsxfun(@times, ddz_k_shift_pos, kappa .* p_k) )));
```

In order to save computational resources I calculate all of these statements simultaneously (kappa and p_k are the same think in all 3 statements).

1) We take fft(p) and store the result in `p_k`

. (in the C code this is`FFT_X_temp`

)

2) We calculate

a. `bsxfun(@times, ddx_k_shift_pos, kappa .* p_k)`

b. `bsxfun(@times, ddy_k_shift_pos, kappa .* p_k)`

c. `bsxfun(@times, ddz_k_shift_pos, kappa .* p_k)`

We’ll do this at once meaning taking an element from p_k, multiplying it with kappa (which is same in all statements) and store this in a single float variable say `p_k_element`

. Then we carry out bsxfun on this element and ddx/ddy/ddz and store three results into target elements in `FFT_X_temp, FFT_Y_temp, FFT_Z_temp`

3) Now, we will take inverse FFTs and we need some place where to store it. We can’t override `ux_sgx`

or `uy_sgy`

or `uz_sgz`

because we will need them. So we have to do

a. `Temp_1_RS3D = real(ifft(FFT_X_temp));`

b. `Temp_2_RS3D = real(ifft(FFT_Y_temp));`

c. `Temp_3_RS3D = real(ifft(FFT_Z_temp));`

4) Now we finish the rest of the statements:

a. `ux_sgx = bsxfun(@times, pml_x_sgx, bsxfun(@times, pml_x_sgx, ux_sgx) .- dt./rho0_sgx .* Temp_1_RS3D );`

b. `uy_sgy = bsxfun(@times, pml_y_sgy, bsxfun(@times, pml_y_sgy, uy_sgy) .- dt./rho0_sgy .* Temp_2_RS3D );`

c. `uz_sgz = bsxfun(@times, pml_z_sgz, bsxfun(@times, pml_z_sgz, uz_sgz) .- dt./rho0_sgz .* Temp_3_RS3D );`

Please note that, the `Temp_1_RS3D, Temp_2_RS3D and Temp_3_RS3D`

matrices are used at different places in the code in different meanings. Simply said, always I need a memory buffer I use one of these matrices since dynamic allocation could be very very very expensive!

The code makes a lot of compromises to the readability in order to squeeze every possible FLOP and B/s from the HW :-)

Jiri