k-Wave Toolbox
This example demonstrates how to improve the computational performance of k-Wave using optional input parameters and data casting. A separate standardised benchmarking script, benchmark, is also included within the k-Wave toolbox so that computational times can be compared across different computers and GPUs.
To investigate where the computational effort is spent during a k-Wave simulation, it is useful to run the inbuilt MATLAB profiler, which reports the execution times of the various k-Wave and inbuilt functions. Running the profiler on a typical forward simulation using kspaceFirstOrder2D with a Cartesian sensor mask and no optional inputs gives the following command line output (set example_number = 1 within the example m-file):
Running k-Wave simulation...
  start time: 19-Dec-2011 12:02:43
  reference sound speed: 1500m/s
  dt: 3.9063ns, t_end: 9.4258us, time steps: 2414
  input grid size: 512 by 512 grid points (10 by 10mm)
  maximum supported frequency: 38.4MHz
  smoothing p0 distribution...
  calculating Delaunay triangulation (TriScatteredInterp)...
  precomputation completed in 1.9364s
  starting time loop...
  estimated simulation time 2min 27.5386s...
  memory used: 1.5539 GB (of 11.9985 GB)
  simulation completed in 2min 43.9508s
  total computation time 2min 45.907s
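The header of this log is internally consistent, and a short worked calculation shows how the time step, the number of time steps and the maximum supported frequency follow from the grid. The sketch below is written in Python rather than MATLAB and rests on two assumptions: k-Wave's default CFL number of 0.3, and a target simulation length equal to the time taken to cross the grid diagonal at the reference sound speed. Under those assumptions it reproduces the dt, t_end, time steps and 38.4 MHz values printed above. Since every time step processes the whole 512 by 512 grid, the run time scales with both the number of time steps and the number of grid points.

```python
import math

# Values taken from the log: a 512 x 512 grid covering 10 x 10 mm, c = 1500 m/s.
N = 512                 # grid points in each dimension
WIDTH = 10e-3           # grid size in each dimension [m]
C_REF = 1500.0          # reference sound speed [m/s]
CFL = 0.3               # ASSUMPTION: k-Wave's default CFL number

dx = WIDTH / N                        # grid spacing [m]
dt = CFL * dx / C_REF                 # time step [s]
f_max = C_REF / (2 * dx)              # two grid points per wavelength [Hz]

# ASSUMPTION: the target end time is the time needed to cross the grid diagonal.
t_end_target = math.hypot(WIDTH, WIDTH) / C_REF
Nt = math.floor(t_end_target / dt) + 1   # time samples 0, dt, 2*dt, ...
t_end = (Nt - 1) * dt                    # time of the last sample [s]

print(f"dt = {dt * 1e9:.3f} ns, t_end = {t_end * 1e6:.3f} us, time steps = {Nt}")
print(f"maximum supported frequency = {f_max / 1e6:.1f} MHz")
print(f"grid points x time steps = {Nt * N * N:,}")
```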
The corresponding profiler output is given below.
Aside from computations within the parent functions, it is clear that the majority of the time is spent running ifft2 and fft2. Several seconds are also spent computing the Delaunay triangulation used to calculate the pressure over the Cartesian sensor mask by interpolation. The triangulation is calculated once during the precomputations, so this time is included in the precomputation time printed to the command line. The Delaunay triangulation can be avoided by using a binary sensor mask, or by setting the optional input 'CartInterp' to 'nearest'. Several seconds are also spent running the various functions associated with the animated visualisation (imagesc, newplot, cla, etc.). This visualisation can be switched off by setting the optional input 'PlotSim' to false. Re-running the profiler with these two changes gives the following command line output (set example_number = 2 within the example m-file):
Running k-Wave simulation...
  start time: 19-Dec-2011 12:09:07
  reference sound speed: 1500m/s
  dt: 3.9063ns, t_end: 9.4258us, time steps: 2414
  input grid size: 512 by 512 grid points (10 by 10mm)
  maximum supported frequency: 38.4MHz
  smoothing p0 distribution...
  precomputation completed in 0.42126s
  starting time loop...
  estimated simulation time 2min 22.7452s...
  memory used: 1.5134 GB (of 11.9985 GB)
  simulation completed in 2min 37.7895s
  reordering Cartesian measurement data...
  total computation time 2min 38.236s
The precomputation time has fallen from 1.94 s to 0.42 s, and the time loop is also about 6 s faster. The corresponding profiler output is given below.
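Nearest-neighbour sampling is cheap because each Cartesian sensor point only needs to be rounded to the closest grid node once; no triangulation has to be built, and each time step simply reads the stored grid values. The trade-off is that each recorded position is effectively moved by up to half a grid spacing in each direction. The Python sketch below illustrates the idea. It is not k-Wave's implementation, the function names are hypothetical, and it assumes a square grid centred on the origin in which the 0-based node i sits at coordinate (i - N/2) * dx.

```python
import math

def nearest_grid_indices(points, n, dx):
    """Map Cartesian (x, y) points in metres to the nearest (ix, iy) grid indices.

    ASSUMPTION: an n-by-n grid with spacing dx, centred on the origin, where the
    0-based index i sits at coordinate (i - n // 2) * dx.
    """
    indices = []
    for x, y in points:
        # Round each coordinate to the closest grid node (ties round up).
        ix = math.floor(x / dx + 0.5) + n // 2
        iy = math.floor(y / dx + 0.5) + n // 2
        # Clamp so that points just outside the grid still map to a valid node.
        ix = min(max(ix, 0), n - 1)
        iy = min(max(iy, 0), n - 1)
        indices.append((ix, iy))
    return indices

def sample_nearest(field, indices):
    """Read one value per sensor point from a 2D field indexed as field[ix][iy]."""
    return [field[ix][iy] for ix, iy in indices]

# Usage: eight hypothetical sensor points on a 4 mm radius circle.
N = 512
dx = 10e-3 / N
points = [(4e-3 * math.cos(2 * math.pi * k / 8), 4e-3 * math.sin(2 * math.pi * k / 8))
          for k in range(8)]
indices = nearest_grid_indices(points, N, dx)   # computed once, before the time loop

field = [[0.0] * N for _ in range(N)]           # placeholder pressure field
sensor_values = sample_nearest(field, indices)  # repeated on every time step
print(indices[:2], len(sensor_values))
```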
Even after the modifications above, the majority of the computational time is still spent computing the FFT and the point-wise multiplication of large matrices (within the function kspaceFirstOrder2D).
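To see why FFTs and point-wise multiplications dominate, note that a pseudospectral code computes each spatial gradient by transforming a field to the wavenumber domain, multiplying it point-wise by i*k, and transforming back, and this is repeated for several fields on every time step. The Python sketch below shows that pattern for a 1D periodic signal. It is a simplified stand-in rather than k-Wave's own operator (which additionally includes a k-space correction and staggered-grid shifts), and a naive O(N^2) DFT replaces fft/ifft so that the example needs only the standard library.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, standing in for fft/ifft."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

def spectral_derivative(f, dx):
    """d/dx of a periodic, uniformly sampled real signal via the FFT pattern."""
    n = len(f)
    # Angular wavenumbers in FFT order: 0, 1, ..., n/2 - 1, -n/2, ..., -1 (times 2*pi/(n*dx)).
    k = [2 * math.pi * (m if m < n // 2 else m - n) / (n * dx) for m in range(n)]
    if n % 2 == 0:
        k[n // 2] = 0.0  # drop the Nyquist term so the result stays real
    f_hat = dft(f)                                         # forward transform
    df_hat = [1j * km * fk for km, fk in zip(k, f_hat)]    # point-wise multiplication
    return [v.real for v in dft(df_hat, inverse=True)]     # inverse transform

# Usage: the derivative of sin(x) on a periodic grid should match cos(x).
n = 32
dx = 2 * math.pi / n
x = [m * dx for m in range(n)]
df = spectral_derivative([math.sin(xm) for xm in x], dx)
max_error = max(abs(d - math.cos(xm)) for d, xm in zip(df, x))
print(f"maximum error: {max_error:.1e}")
```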
It is possible to decrease this burden by capitalising on MATLAB's use of overloaded functions for different data types. For example, computing the FFT of a matrix of type single takes less time than for a matrix of type double (the default data type in MATLAB). For most computations, the loss of precision that results from computing in single precision is negligible.
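The precision trade-off can be quantified: a single value occupies 4 bytes rather than 8, and rounding to single precision introduces a relative error of at most 2^-24 (about 6e-8, or roughly 7 significant decimal digits) for values in the normal range, compared with 2^-53 (about 1.1e-16) for double. The Python sketch below uses the standard struct module as a stand-in for MATLAB's single() cast to show both effects for a 512 by 512 matrix and for one value. It says nothing about FFT speed, which depends on the library and hardware.

```python
import struct

def to_single(x):
    """Round a Python float (an IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

BYTES_SINGLE = struct.calcsize("<f")   # 4 bytes per value
BYTES_DOUBLE = struct.calcsize("<d")   # 8 bytes per value
n = 512 * 512                          # grid points in the example simulation

print(f"512 x 512 real matrix: {n * BYTES_DOUBLE / 2**20:.0f} MiB as double, "
      f"{n * BYTES_SINGLE / 2**20:.0f} MiB as single")

value = 0.1
rel_err = abs(to_single(value) - value) / value
print(f"relative error from storing 0.1 as single: {rel_err:.2e}")  # bounded by 2**-24
```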
Within the kspaceFirstOrder1D, kspaceFirstOrder2D, and kspaceFirstOrder3D codes, the data type used for the variables within the time loop can be controlled via the optional input parameter 'DataCast'.
Re-running the profiler with 'DataCast' set to 'single' gives the following command line output (set example_number = 3 within the example m-file):
Running k-Wave simulation...
  start time: 19-Dec-2011 12:15:06
  reference sound speed: 1500m/s
  dt: 3.9063ns, t_end: 9.4258us, time steps: 2414
  input grid size: 512 by 512 grid points (10 by 10mm)
  maximum supported frequency: 38.4MHz
  smoothing p0 distribution...
  casting variables to single type...
  precomputation completed in 0.43921s
  starting time loop...
  estimated simulation time 1min 38.9719s...
  memory used: 1.5375 GB (of 11.9985 GB)
  simulation completed in 1min 47.4474s
  reordering Cartesian measurement data...
  total computation time 1min 47.89s
The overall computation time has been significantly reduced, in this example by more than 30%. The corresponding profiler output is given below.
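The percentage follows directly from the total computation times in the logs. The Python sketch below parses the durations in the format k-Wave prints them (the helper names are hypothetical) and compares the single-precision run against both earlier runs; both reductions exceed 30%.

```python
import re

def to_seconds(text):
    """Convert a k-Wave style duration such as '2min 45.907s' or '18.782s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)min\s*)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1)) if match.group(1) else 0
    return 60 * minutes + float(match.group(2))

def percent_reduction(before, after):
    """Percentage reduction in run time going from `before` to `after` (both in seconds)."""
    return 100.0 * (before - after) / before

# Total computation times copied from the command line output above.
default_run = to_seconds("2min 45.907s")      # example_number = 1
no_plot_nearest = to_seconds("2min 38.236s")  # example_number = 2
single_cast = to_seconds("1min 47.89s")       # example_number = 3

print(f"{percent_reduction(default_run, single_cast):.1f}%")      # 35.0%
print(f"{percent_reduction(no_plot_nearest, single_cast):.1f}%")  # 31.8%
```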
The computation time can be reduced further by using other data types, in particular those that force program execution on the GPU (graphics processing unit).
There are now several MATLAB toolboxes available that contain overloaded MATLAB functions (such as the FFT) that work with any NVIDIA CUDA-enabled GPU. These toolboxes use an interface developed by NVIDIA called the CUDA SDK, which allows programs written in C to run on the GPU, together with a MEX interface that allows these C programs to be called from MATLAB. Within MATLAB, using the GPU is then as simple as casting the variables to the required data type. For example, Accelereyes has released a comprehensive MATLAB GPU toolbox called Jacket (http://www.accelereyes.com/). To use this toolbox within k-Wave, the optional input parameter 'DataCast' is set to 'gsingle' or 'gdouble'. Note that the latest release of this toolbox also supports OpenCL and GPUs from other manufacturers.
To illustrate, the command line output obtained by setting 'DataCast' to 'gsingle' is given below (set example_number = 4 within the example m-file). The simulation now runs more than 8 times faster than the standard execution, and more than 5 times faster than with 'DataCast' set to 'single'. Note that the interpolation function used within kspaceFirstOrder2D and kspaceFirstOrder3D does not currently support GPU usage, so the optional input parameter 'CartInterp' should be set to 'nearest' when using a Cartesian sensor mask.
Running k-Wave simulation...
  start time: 19-Dec-2011 12:21:00
  reference sound speed: 1500m/s
  dt: 3.9063ns, t_end: 9.4258us, time steps: 2414
  input grid size: 512 by 512 grid points (10 by 10mm)
  maximum supported frequency: 38.4MHz
  smoothing p0 distribution...
  casting variables to gsingle type...
  precomputation completed in 0.43642s
  starting time loop...
  estimated simulation time 21.7058s...
  memory used: 1.5273 GB (of 11.9985 GB)
  GPU memory used: 0.10841 GB (of 5.1745 GB)
  simulation completed in 18.3287s
  reordering Cartesian measurement data...
  total computation time 18.782s
The corresponding profiler output is given below. The majority of the time is now spent computing matrix operations and FFTs on the GPU. Further details on the speed-up obtained with different GPUs are given in benchmark.
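Dividing the time-loop duration (the "simulation completed in" line) by the 2414 time steps gives the cost of one step in each configuration, and the ratio of the total computation times gives the speed-up factors quoted earlier. The Python sketch below, again using a hypothetical duration parser, works through the four runs: it gives about 67.9, 65.4, 44.5 and 7.6 ms per time step, and speed-ups of about 8.8 and 5.7 for 'gsingle' relative to the default and 'single' runs.

```python
import re

TIME_STEPS = 2414  # "time steps" value printed in every log above

def to_seconds(text):
    """Convert a k-Wave style duration such as '2min 43.9508s' or '18.3287s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)min\s*)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1)) if match.group(1) else 0
    return 60 * minutes + float(match.group(2))

# (time loop, total computation time) for each example_number, copied from the logs.
runs = {
    1: ("2min 43.9508s", "2min 45.907s"),   # no optional inputs
    2: ("2min 37.7895s", "2min 38.236s"),   # 'CartInterp' 'nearest', 'PlotSim' false
    3: ("1min 47.4474s", "1min 47.89s"),    # 'DataCast' 'single'
    4: ("18.3287s", "18.782s"),             # 'DataCast' 'gsingle'
}

total = {}
for number, (loop, total_text) in runs.items():
    total[number] = to_seconds(total_text)
    per_step_ms = 1e3 * to_seconds(loop) / TIME_STEPS
    print(f"example {number}: {per_step_ms:5.1f} ms per time step")

print(f"gsingle vs default: {total[1] / total[4]:.1f}x faster")  # 8.8x
print(f"gsingle vs single:  {total[3] / total[4]:.1f}x faster")  # 5.7x
```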
The command line and profiler outputs shown here were generated using MATLAB R2011a. Some earlier MATLAB versions do not include multicore support for parallelisable functions such as the FFT. If you are using one of these earlier versions, simply upgrading MATLAB can give a noticeable increase in computational speed.
© 2009-2012 Bradley Treeby and Ben Cox.