k-Wave Toolbox

Optimising k-Wave Performance Example

Overview

This example demonstrates how to increase the computational performance of k-Wave using optional input parameters and data casting.


Controlling input options

To investigate where the computational effort is spent during a k-Wave simulation, it is useful to run the inbuilt MATLAB profiler, which records the execution time of each k-Wave and built-in function. Running the profiler on a typical forward simulation using kspaceFirstOrder2D with a Cartesian sensor mask and no optional inputs gives the following command line output:

Running k-space simulation...
  dt: 3.9063ns, t_end: 9.4258us, time steps: 2414
  input grid size: 512 by 512 pixels (10 by 10mm)
  smoothing p0 distribution...
  calculating Delaunay triangulation...
  precomputation completed in 13.4419s
  starting time loop...
  computation completed in 5min 16.1545s

The corresponding profiler output is given below.
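A profiler session like the one above can be reproduced along these lines. This is a sketch only: the grid, medium, source, and sensor settings below are illustrative assumptions, not the exact values used to generate the output shown.

```matlab
% illustrative simulation setup (assumed values, chosen to match the
% 512 x 512 pixel, 10 x 10 mm grid reported in the output above)
Nx = 512;                                    % grid points in each direction
dx = 10e-3/Nx;                               % grid spacing [m]
kgrid = makeGrid(Nx, dx, Nx, dx);            % create the computational grid
medium.sound_speed = 1500;                   % homogeneous sound speed [m/s]
source.p0 = makeDisc(Nx, Nx, Nx/2, Nx/2, 10);% initial pressure distribution
sensor.mask = makeCartCircle(4e-3, 50);      % Cartesian sensor mask

% profile a forward simulation with no optional inputs
profile on;
sensor_data = kspaceFirstOrder2D(kgrid, medium, source, sensor);
profile viewer;                              % examine where time is spent
```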

Aside from computations within the parent functions, it is clear that the majority of the time is spent running ifft2 and fft2 (discussed in the next section). Several seconds (in this example 12.326s) are also spent computing the Delaunay triangulation used to interpolate the pressure over the Cartesian sensor mask. The triangulation is run once during the precomputations, and this time is included in the precomputation time printed to the command line. It can be avoided by using a binary sensor mask, or by setting the optional input 'CartInterp' to 'nearest' (the default and only option in 3D). Several seconds more are spent running the various functions associated with the animated visualisation (imagesc, newplot, cla, etc.). This visualisation can be switched off by setting the optional input 'PlotSim' to false. Re-running the profiler with these two changes gives the following command line output:

Running k-space simulation...
  dt: 3.9063ns, t_end: 9.4258us, time steps: 2414
  input grid size: 512 by 512 pixels (10 by 10mm)
  smoothing p0 distribution...
  precomputation completed in 0.4761s
  starting time loop...
  reordering Cartesian measurement data...
  computation completed in 5min 10.0251s

The precomputation time has been significantly reduced, and the time spent in the time loop has also dropped by several seconds. The corresponding profiler output is given below.
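The two changes above are applied by passing optional input parameters as string-value pairs. A minimal sketch, assuming the kgrid, medium, source, and sensor structures for the simulation have already been created:

```matlab
% avoid the Delaunay triangulation and switch off the animated
% visualisation using optional inputs
sensor_data = kspaceFirstOrder2D(kgrid, medium, source, sensor, ...
    'CartInterp', 'nearest', 'PlotSim', false);
```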

Data casting

Even after the modifications above, the majority of the computational time is still spent computing the FFT and the point-wise multiplication of large matrices (within the function kspaceFirstOrder2D). This burden can be decreased by capitalising on MATLAB's use of overloaded functions for different data types. For example, computing the FFT of a matrix of single type takes less time than for double (the default data format used within MATLAB). For most computations, the loss in precision from working in single type is negligible. Within the kspaceFirstOrder1D, kspaceFirstOrder2D, and kspaceFirstOrder3D codes, the data type used for the variables within the time loop can be controlled via the optional input parameter 'DataCast'. Re-running the profiler with 'DataCast' set to 'single' gives the following command line output:

Running k-space simulation...
  dt: 3.9063ns, t_end: 9.4258us, time steps: 2414
  input grid size: 512 by 512 pixels (10 by 10mm)
  smoothing p0 distribution...
  casting variables to single type...
  precomputation completed in 0.54333s
  starting time loop...
  reordering Cartesian measurement data...
  computation completed in 3min 4.2111s

The overall computational speed has been significantly improved, in this example by around 1.7 times. The corresponding profiler output is given below.
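The data cast is again set via an optional input. A sketch, assuming the same simulation inputs as before:

```matlab
% cast the variables used inside the time loop to single precision
sensor_data = kspaceFirstOrder2D(kgrid, medium, source, sensor, ...
    'CartInterp', 'nearest', 'PlotSim', false, 'DataCast', 'single');
```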


Running k-Wave on the GPU

The computational time can be further improved by using other data types, in particular those which force program execution on the GPU (Graphics Processing Unit). There are now several MATLAB toolboxes available which contain overloaded MATLAB functions (such as the FFT) that run on any NVIDIA CUDA-capable GPU. For example, Accelereyes has released a commercial toolbox called Jacket (http://www.accelereyes.com). This toolbox is expensive, but many MATLAB functions are supported (a free 15-day trial is also available). A similar toolbox called GPUmat has been released by GP-You (http://www.gp-you.org). This toolbox is free but still under active development, so many functions (fftn, for example) are not yet supported. These toolboxes utilise an interface developed by NVIDIA called the CUDA SDK, which allows C programs to run on the GPU. Within MATLAB, execution on the GPU is as simple as casting the variables to the required data type. These toolboxes can be used with the first-order k-space codes by choosing the appropriate setting for the optional input parameter 'DataCast'.

For example, the Jacket toolbox can be used by setting 'DataCast' to 'gsingle'. The command line output is given below. The computational speed has increased by more than 4 times compared to the standard execution, and 2.5 times compared to setting 'DataCast' to 'single'.

Running k-space simulation...
  dt: 3.9063ns, t_end: 9.4258us, time steps: 2414
  input grid size: 512 by 512 pixels (10 by 10mm)
  smoothing p0 distribution...
  casting variables to gsingle type...
  precomputation completed in 0.32643s
  starting time loop...
  reordering Cartesian measurement data...
  computation completed in 1min 12.5302s

The corresponding profiler output is given below. The majority of time is now spent on computing matrix operations and the FFT on the GPU.

Similarly, the GPUmat toolbox can be used by setting 'DataCast' to 'GPUsingle'. A similar performance enhancement is seen.

Running k-space simulation...
  dt: 3.9063ns, t_end: 9.4258us, time steps: 2414
  input grid size: 512 by 512 pixels (10 by 10mm)
  smoothing p0 distribution...
  casting variables to GPUsingle type...
  precomputation completed in 0.86029s
  starting time loop...
  reordering Cartesian measurement data...
  computation completed in 1min 29.4422s

Note that the interpolation function used within kspaceFirstOrder2D does not currently support GPU execution, so the optional input parameter 'CartInterp' should be set to 'nearest' when using a Cartesian sensor mask. Support for the GPUmat toolbox is currently limited to 2D forward simulations, but this will be extended in future releases of k-Wave as the GPUmat toolbox develops. Note also that the GPU computations are performed in single precision (although double precision is now supported by the most recent graphics cards and the latest release of Jacket by casting to 'gdouble').
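Putting these constraints together, a GPU run via the GPUmat toolbox looks like the following sketch (again assuming the simulation inputs have already been created; substitute 'gsingle' for 'GPUsingle' to use Jacket instead):

```matlab
% run the time loop on the GPU via the GPUmat toolbox; 'CartInterp' is
% set to 'nearest' as the interpolation does not support GPU data types
sensor_data = kspaceFirstOrder2D(kgrid, medium, source, sensor, ...
    'CartInterp', 'nearest', 'PlotSim', false, 'DataCast', 'GPUsingle');
```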


Multicore support

The command line and profiler outputs shown here were generated using MATLAB R2009a, the first MATLAB release to include multicore support for parallelisable functions such as the FFT. If using an earlier version of MATLAB, a noticeable increase in computational speed can be obtained simply by upgrading. Note that the speed difference between GPU and CPU computations is even more pronounced with earlier versions of MATLAB.
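To check whether MATLAB is making use of multiple cores, the number of computational threads can be queried (maxNumCompThreads is a standard MATLAB function, though the value it returns will depend on the machine and MATLAB version):

```matlab
% query the number of computational threads MATLAB is currently using
num_threads = maxNumCompThreads
```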



© 2009 Bradley Treeby and Ben Cox.