kspaceFirstOrder3D-CUDA  1.1
The CUDA/C++ implementation of the k-wave toolbox for the time-domain simulation of acoustic wave fields in 3D
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Pages
main.cpp
Go to the documentation of this file.
1 /**
2  * @file main.cpp
3  *
4  * @author Jiri Jaros \n
5  * Faculty of Information Technology \n
6  * Brno University of Technology \n
7  * jarosjir@fit.vutbr.cz
8  *
9  * @brief The main file for the kspaceFirstOrder3D-CUDA.
10  *
11  * @version kspaceFirstOrder3D 3.4
12  *
13  * @date 11 July 2012, 10:57 (created) \n
14  * 10 August 2016, 12:59 (revised)
15  *
16  * @section License
17  * This file is part of the C++ extension of the k-Wave Toolbox
18  * (http://www.k-wave.org).\n Copyright (C) 2016 Jiri Jaros and Bradley Treeby.
19  *
20  * This file is part of the k-Wave. k-Wave is free software: you can redistribute it and/or modify
21  * it under the terms of the GNU Lesser General Public License as published by the Free Software
22  * Foundation, either version 3 of the License, or (at your option) any later version.
23  *
24  * k-Wave is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even
25  * the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser
26  * General Public License for more details.
27  *
28  * You should have received a copy of the GNU Lesser General Public License along with k-Wave.
29  * If not, see http://www.gnu.org/licenses/.
30  *
31  *
32  * @mainpage kspaceFirstOrder3D-CUDA
33  *
34  * @section Overview 1 Overview
35  *
36  * k-Wave is an open source MATLAB toolbox designed for the time-domain simulation of propagating
37  * acoustic waves in 1D, 2D, or 3D. The toolbox has a wide range of functionality, but at its heart
38  * is an advanced numerical model that can account for both linear or nonlinear wave propagation,
39  * an arbitrary distribution of weakly heterogeneous material parameters, and power law acoustic
40  * absorption (http://www.k-wave.org).
41  *
42  * This project is a part of the k-Wave toolbox accelerating 3D simulations using an optimized
43  * CUDA/C++ implementation to run small to moderate grid sizes (64x64x64 to 512x512x512). This
44  * code uses a single NVIDIA GPU to accelerate the simulations (AMD GPUs are not supported).
45  *
46  * Compiled binaries of the CUDA/C++ code are available from (http://www.k-wave.org/download).
47  * Both 64-bit Linux (Ubuntu / Debian) and 64-bit Windows versions are provided. This Doxygen
48  * documentation was created based on the Linux version and provides details on the
49  * implementation of the CUDA/C++ code.
50  *
51  *
52  *
53  * @section Compilation 2 Compilation
54  *
55  * The source codes of <tt>kpsaceFirstOrder3D-CUDA</tt> are written using the C++11 standard
56  * (optional OpenMP 2.0 library), NVIDIA CUDA 7.5 library and HDF5 1.8.x.
57  *
58  * There are variety of different C++ compilers that can be used to compile the source codes.
59  * We recommend using the GNU C++ compiler (gcc/g++) version 4.8/4.9, the Intel C++ compiler
60  * version 15.0, or Visual Studio 2013. The version of the compiler is limited by the CUDA
61  * architecture version. The code was tested with CUDA 6.5, 7.0 and 7.5.
62  * The codes can be compiled on 64-bit Linux and Windows. 32-bit systems are not supported due to
63  * the the memory requirements even for small simulations.
64  *
65  * Before compiling the code, it is necessary to install CUDA, C++ compiler and the HDF5 library.
66  * The GNU compiler is usually part of Linux distributions and distributed as open source.
67  * It can be downloaded from http://gcc.gnu.org/ if necessary. The Intel compiler can be downloaded
68  * from http://software.intel.com/en-us/intel-composer-xe/. The Intel compiler is only free for
69  * non-commercial use.
70 
71  * The CUDA library can be downloaded from https://developer.nvidia.com/cuda-toolkit-archive.
72  * The supported versions are 6.5, 7.0 and 7.5. We cannot guarantee the code can be compiled with
73  * later versions of CUDA.
74  *
75  * <b> 2.1 The HDF5 library installation procedure </b>
76 
77  * 1. Download the 64-bit HDF5 library package for your platform
78  * (http://www.hdfgroup.org/HDF5/release/obtain5.html). Please use version 1.8.x, the version 1.10.x
79  * is not compatible with MATLAB yet.
80  *
81  * 2. Configure the HDF5 distribution. Enable the high-level library and specify
82  * an installation folder by typing:
83 \verbatim ./configure --enable-hl --prefix=folder_to_install \endverbatim
84  * 3. Make the HDF library by typing:
85 \verbatim make -j \endverbatim
86  * 4. Install the HDF library by typing:
87 \verbatim make install \endverbatim
88  *
89  *
90  * <b> 2.2 The CUDA installation procedure </b>
91  *
92  * 1. Download CUDA version 7.5 https://developer.nvidia.com/cuda-toolkit-archive
93  * 2. Follow the NVIDIA official installation guide for Windows and Linux
94  * http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-microsoft-windows
95  * and http://docs.nvidia.com/cuda/cuda-installation-guide-linux/.
96  *
97  *
98  * <b> 2.3 Compiling the CUDA/C++ code </b>
99  *
100  * When the libraries and the compiler have been installed, you are ready to compile the
101  * <tt>kspaceFirstOrder3D-CUDA </tt> code. The Makefile only supports code compilation under CUDA/g++
102  * compiler, however using different compilers would be analogous (<tt>-ccbin</tt> parameter for
103  * the <tt>nvcc </tt>compiler)
104  *
105 
106  * 1. Download the <tt>kspaceFirstOrder3D-CUDA </tt> source codes.
107  *
108  * 2. Open the \c Makefile file.
109  * First, set the paths to CUDA and HDF5 libraries, then select how to link the code. Dynamic
110  * lining is preferred since it does not require reloading the CUDA-GPU driver, however the code
111  * will very likely only run on the machine where compiled. The static linking allows you you
112  * create a self-consistent binary with all libraries linked in.
113  *
114  * 3. Compile the source code by typing:
115 \verbatim make -j \endverbatim If you want to clean the distribution, type:
116 \verbatim make clean \endverbatim
117  *
118  *
119  * @section Parameters 3 Command Line Parameters
120  * The CUDA/C++ code requires two mandatory parameters and accepts a few optional parameters and
121  * flags. Ill parameters, bad simulation files, and runtime errors such as out-of-memory problems,
122  * lead to an exception followed by an error message shown and execution termination.
123  *
124  * The mandatory parameters \c -i and \c -o specify the input and output file. The file names
125  * respect the path conventions for particular operating system. If any of the files is not
126  * specified, cannot be found or created, an error message is shown and the code terminates.
127  *
128  * The \c -t parameter sets the number of threads used, which defaults the system maximum. For the
129  * CUDA version, the number of threads is not critical, as the CPU only preprocess data. When
130  * running on desktops with multiple GPUs, or clusters with per-GPU resource allocations, it may
131  * be useful to limit the number of CPU threads.
132  *
133  * The \c -g parameter allows to explicitly select a GPU for the execution. The CUDA capable GPUs
134  * can be listed by the system command \c nvidia-smi. If the parameter is not specified, the code
135  * uses the first free GPU. If the GPUs are set in the CUDA DEFAULT mode, the first CUDA device
136  * is selected. In order to get the automatic round-robin GPU selection working (to e.g. execute
137  * multiple instances of the code on distinct GPUs), please set the GPUs into PROCESS_EXCLUSIVE mode.
138  * On clusters with a PBS scheduler, this is usually done automatically, so no need to change it
139  * by user.
140  *
141  * The \c -r parameter specifies how often information about the simulation progress is printed out
142  * to the command line. By default, the CUDA/C++ code prints out the progress of the simulation,
143  * the elapsed time, and the estimated time of completion in intervals corresponding to 5% of
144  * the total number of times steps.
145  *
146  * The \c -c parameter specifies the compression level used by the ZIP library to reduce the size of
147  * the output file. The actual compression rate is highly dependent on the shape of the sensor mask
148  * and the range of stored quantities and may be computationally expensive. In general, the output
149  * data is very hard to compress, and using higher compression levels can greatly increase the
150  * time to save data while not having a large impact on the final file size. That's why we decided
151  * to disable compression in default settings.
152  *
153  * The \c <tt>\--benchmark</tt> parameter enables the total length of simulation (i.e., the number
154  * of time steps) to be overridden by setting a new number of time steps to simulate. This is
155  * particularly useful for performance evaluation and benchmarking. As the code performance is
156  * relatively stable, 50-100 time steps is usually enough to predict the simulation duration.
157  * This parameter can also be used to quickly check the simulation is set up correctly.
158  *
159  * The \c <tt>\--verbose</tt> parameter enables to select between three levels of verbosity. For
160  * routine simulations, the verbose level of 0 (the default one) is usually sufficient. For more
161  * information about the simulation, checking the parameters of the simulation, code version,
162  * GPU used, file paths, and debugging running scripts, verbose levels 1 and 2 may be very useful.
163  *
164  * The \c -h and <tt>\--help</tt> parameters print all the parameters of the C++ code. The
165  * <tt>\--version </tt>parameter reports detail information about the code useful for debugging and
166  * bug reports. It prints out the internal version, the build date and time, the git hash allowing
167  * us to track the version of the source code, the operating system, the compiler name and version
168  * and the instruction set used.
169  *
170  * For jobs that are expected to run for a very long time, it may be useful to checkpoint and
171  * restart the execution. One motivation is the wall clock limit per task on clusters where jobs
172  * must fit within a given time span (e.g. 24 hours). The second motivation is a level of
173  * fault-tolerance, where you can back up the state of the simulation after a predefined period.
174  * To enable checkpoint-restart, the user is asked to specify a file to store the actual state of
175  * the simulation by <tt>\--checkpoint_file</tt> and the period in seconds after which the
176  * simulation will be interrupted by <tt>\--checkpoint_interval</tt>. When running on a cluster,
177  * please allocate enough time for the checkpoint procedure that can take a non-negligible amount
178  * of time (7 matrices have to be stored in the checkpoint file and all aggregated quantities are
179  * flushed into the output file). Please note, that the checkpoint file name and path is not checked
180  * at the beginning of the simulation, but at the time the code starts checkpointing. Thus make sure
181  * the file path was correctly specified ((otherwise you will not find out the simulation crashed
182  * until the first leg of the simulation finishes)). The rationale behind this is that to keep as
183  * high level of fault tolerance as possible, the checkpoint file should be touched even when really
184  * necessary.
185  *
186  * When controlling a multi-leg simulation by a script loop, the parameters of the code remains the
187  * same in all legs. The first leg of the simulation creates a checkpoint file while the last one
188  * deletes it. If the checkpoint file is not found the simulation starts from the beginning. In
189  * order to find out how many steps have been finished, please open the output file and read
190  * the variable <tt>t_index</tt> and compare it with <tt>Nt</tt> (e.g. by the h5dump command).
191  *
192  *
193  * The remaining flags specify the output quantities to be recorded during the simulation and
194  * stored on disk analogous to the sensor.record input. If the \c -p or <tt>\--p\_raw</tt> flags
195  * are set (these are equivalent), a time series of the acoustic pressure at the grid points
196  * specified by the sensor mask is recorded. If the <tt>\--p_rms</tt>, <tt>\--p_max</tt>,
197  * <tt>\--p_min</tt> flags are set, the root mean square and/or maximum and/or minimum values of
198  * the pressure at the grid points specified by the sensor mask are recorded. If the
199  * <tt>\--p_final</tt> flag is set, the values for the entire acoustic pressure field in the final
200  * time step of the simulation is stored (this will always include the PML, regardless of the
201  * setting for <tt> `PMLInside'</tt>).
202  * The flags <tt>\--p_max_all</tt> and <tt>\--p_min_all</tt> allow to calculate the maximum and
203  * minimum values over the entire acoustic pressure field, regardless on the shape of the sensor
204  * mask. Flags to record the acoustic particle velocity are defined in an analogous fashion. For
205  * proper calculation of acoustic intensity, the particle velocity has to be shifted onto the same
206  * grid as the acoustic pressure. This can be done by setting <tt>\--u_non_staggered_raw</tt> flag,
207  * that first shifts the particle velocity and then samples the grid points specified by the sensor
208  * mask. Since the shift operation requires additional FFTs, the impact on the simulation time may
209  * be significant.
210  *
211  * Any combination of <tt>p</tt> and <tt>u</tt> flags is admissible. If no output flag is set,
212  * a time-series for the acoustic pressure is recorded. If it is not necessary to collect the output
213  * quantities over the entire simulation, the starting time step when the collection begins can
214  * be specified using the -s parameter. Note, the index for the first time step is 1 (this follows
215  * the MATLAB indexing convention).
216  *
217  * The <tt>\--copy_sensor_mask</tt> will copy the sensor from the input file to the output one at
218  * the end of the simulation. This helps in post-processing and visualisation of the outputs.
219  *
220 \verbatim
221 ┌───────────────────────────────────────────────────────────────┐
222 │ kspaceFirstOrder3D-CUDA v1.1 │
223 ├───────────────────────────────────────────────────────────────┤
224 │ Usage │
225 ├───────────────────────────────────────────────────────────────┤
226 │ Mandatory parameters │
227 ├───────────────────────────────────────────────────────────────┤
228 │ -i <file_name> │ HDF5 input file │
229 │ -o <file_name> │ HDF5 output file │
230 ├───────────────────────────────┴───────────────────────────────┤
231 │ Optional parameters │
232 ├───────────────────────────────┬───────────────────────────────┤
233 │ -t <num_threads> │ Number of CPU threads │
234 │ │ (default = 4) │
235 │ -g <device_number> │ GPU device to run on │
236 │ │ (default = the first free) │
237 │ -r <interval_in_%> │ Progress print interval │
238 │ │ (default = 5%) │
239 │ -c <compression_level> │ Compression level <0,9> │
240 │ │ (default = 0) │
241 │ --benchmark <time_steps> │ Run only a specified number │
242 │ │ of time steps │
243 │ --verbose <level> │ Level of verbosity <0,2> │
244 │ │ 0 - basic, 1 - advanced, │
245 │ │ 2 - full │
246 │ │ (default = basic) │
247 │ -h, --help │ Print help │
248 │ --version │ Print version and build info │
249 ├───────────────────────────────┼───────────────────────────────┤
250 │ --checkpoint_file <file_name> │ HDF5 Checkpoint file │
251 │ --checkpoint_interval <sec> │ Checkpoint after a given │
252 │ │ number of seconds │
253 ├───────────────────────────────┴───────────────────────────────┤
254 │ Output flags │
255 ├───────────────────────────────┬───────────────────────────────┤
256 │ -p │ Store acoustic pressure │
257 │ │ (default output flag) │
258 │ │ (the same as --p_raw) │
259 │ --p_raw │ Store raw time series of p │
260 │ --p_rms │ Store rms of p │
261 │ --p_max │ Store max of p │
262 │ --p_min │ Store min of p │
263 │ --p_max_all │ Store max of p (whole domain) │
264 │ --p_min_all │ Store min of p (whole domain) │
265 │ --p_final │ Store final pressure field │
266 ├───────────────────────────────┼───────────────────────────────┤
267 │ -u │ Store ux, uy, uz │
268 │ │ (the same as --u_raw) │
269 │ --u_raw │ Store raw time series of │
270 │ │ ux, uy, uz │
271 │ --u_non_staggered_raw │ Store non-staggered raw time │
272 │ │ series of ux, uy, uz │
273 │ --u_rms │ Store rms of ux, uy, uz │
274 │ --u_max │ Store max of ux, uy, uz │
275 │ --u_min │ Store min of ux, uy, uz │
276 │ --u_max_all │ Store max of ux, uy, uz │
277 │ │ (whole domain) │
278 │ --u_min_all │ Store min of ux, uy, uz │
279 │ │ (whole domain) │
280 │ --u_final │ Store final acoustic velocity │
281 ├───────────────────────────────┼───────────────────────────────┤
282 │ -s <time_step> │ When data collection begins │
283 │ │ (default = 1) │
284 └───────────────────────────────┴───────────────────────────────┘
285 \endverbatim
286  *
287  *
288  *
289  *
290 * @section HDF HDF5 File Structure
291  *
292  * The CUDA/C++ code has been designed as a standalone application which is not dependent on MATLAB
293  * libraries or a MEX interface. This is of particular importance when using servers and
294  * supercomputers without MATLAB support. For this reason, simulation data must be transferred
295  * between the CUDA/C++ code and MATLAB using external input and output files. These files are
296  * stored using the Hierarchical Data Format HDF5 (http://www.hdfgroup.org/HDF5/). This is a data
297  * model, library, and file format for storing and managing data. It supports a variety of
298  * datatypes, and is designed for flexible and efficient I/O and for high volume and complex data.
299  * The HDF5 technology suite includes tools and applications for managing, manipulating, viewing,
300  * and analysing data in the HDF5 format.
301  *
302  *
303  * Each HDF5 file is a container for storing a variety of scientific data and is composed of two
304  * primary types of objects: groups and datasets. An HDF5 group is a structure containing zero or
305  * more HDF5 objects, together with supporting metadata. A HDF5 group can be seen as a disk folder.
306  * An HDF5 dataset is a multidimensional array of data elements, together with supporting metadata.
307  * A HDF5 dataset can be seen as a disk file. Any HDF5 group or dataset may also have an associated
308  * attribute list. An HDF5 attribute is a user-defined HDF5 structure that provides extra
309  * information about an HDF5 object. More information can be obtained from the HDF5 documentation
310  * (http://www.hdfgroup.org/HDF5/doc/index.html).
311  *
312  *
313  * kspaceFirstOrder3D-CUDA v1.1 introduces a new version of the HDF5 input and output file format.
314  * The code is happy to work with both versions (1.0 and 1.1), however when working with an input
315  * file of version 1.0, some features are not supported, namely the cuboid sensor mask, and
316  * u_non_staggered_raw. When running from within the actual MATLAB K-Wave Toolbox, the files will
317  * always be generated in version 1.1
318  *
319  * The HDF5 input file for the CUDA/C++ simulation code contains a file header with brief
320  * description of the simulation stored in string attributes, and the root group `/' which stores
321  * all the simulation properties in the form of 3D datasets (a complete list of input datasets is
322  * given bellow).
323  * The HDF5 checkpoint file contains the same file header as the input file and the root group `/'
324  * with a few datasets keeping the actual simulation state.
325  * The HDF5 output file contains a file header with the simulation description as well as
326  * performance statistics, such as the simulation time and memory consumption, stored in string
327  * attributes.
328 
329  * The results of the simulation are stored in the root group `/' in the form of 3D datasets. If the
330  * linear sensor mask is used, all output quantities are stored as datasets in the root group. If
331  * the cuboid corners sensor mask is used, the sampled quantities form private groups containing
332  * datasets on per cuboid basis.
333  *
334  *
335  * \verbatim
336 ==============================================================================================================
337  Input File/Checkpoint File Header
338 =============================================================================================================
339 created_by Short description of the tool that created this file
340 creation_date Date when the file was created
341 file_description Short description of the content of the file (e.g. simulation name)
342 file_type Type of the file (input)
343 major_version Major version of the file definition (1)
344 minor_version Minor version of the file definition (1)
345 ==============================================================================================================
346  \endverbatim
347  *
348  * \verbatim
349 ==============================================================================================================
350  Output File Header
351 ==============================================================================================================
352 created_by Short description of the tool that created this file
353 creation_date Date when the file was created
354 file_description Short description of the content of the file (e.g. simulation name)
355 file_type Type of the file (output)
356 major_version Major version of the file definition (1)
357 minor_version Minor version of the file definition (1)
358 -------------------------------------------------------------------------------------------------------------
359 host_names List of hosts (computer names) the simulation was executed on
360 number_of_cpu_cores Number of CPU cores used for the simulation
361 data_loading_phase_execution_time Time taken to load data from the file
362 pre-processing_phase_execution_time Time taken to pre-process data
363 simulation_phase_execution_time Time taken to run the simulation
364 post-processing_phase_execution_time Time taken to complete the post-processing phase
365 total_execution_time Total execution time
366 peak_core_memory_in_use Peak memory required per core during the simulation
367 total_memory_in_use Total Peak memory in use
368 ==============================================================================================================
369  \endverbatim
370  *
371  *
372  *
373  * The input and checkpoint file stores all quantities as three dimensional datasets with dimension
374  * sizes designed by <tt>(Nx, Ny, Nz)</tt>. In order to support scalars and 1D and 2D arrays, the
375  * unused dimensions are set to 1. For example, scalar variables are stored with a dimension size of
376  * <tt>(1,1,1)</tt>, 1D vectors oriented in y-direction are stored with a dimension size of
377  * <tt>(1, Ny, 1)</tt>, and so on. If the dataset stores a complex variable, the real and imaginary
378  * parts are stored in an interleaved layout and the lowest used dimension size is doubled (i.e.,
379  * Nx for a 3D matrix, Ny for a 1D vector oriented in the y-direction). The datasets are physically
380  * stored in row-major order (in contrast to column-major order used by MATLAB) using either the
381  * <tt>`H5T_IEEE_F32LE'</tt> data type for floating point datasets or <tt>`H5T_STD_U64LE'</tt> for
382  * integer based datasets. All the datasets are store under the root group.
383  *
384  *
385  * The output file of version 1.0 could only store recorded quantities as 3D datasets under the
386  * root group. However, with version 1.1 and the new cuboid corner sensor mask, the sampled
387  * quantities may be laid out as 4D quantities stored under specific groups. The dimensions are
388  * always <tt>(Nx, Ny, Nz, Nt)</tt>, every sampled cuboid is stored as a distinct dataset and the
389  * datasets are grouped under a group named by the quantity stored. This makes the file
390  * clearly readable and easy to parse.
391  *
392  *
393  * In order to enable compression and more efficient data processing, big datasets are not stored as
394  * monolithic blocks but broken into chunks that may be compressed by the ZIP library and stored
395  * separately. The chunk size is defined as follows:
396  *
397  * \li <tt> (1M elements, 1, 1) </tt> in the case of 1D variables - index sensor mask (8MB blocks).
398  * \li <tt> (Nx, Ny, 1) </tt> in the case of 3D variables (one 2D slab).
399  * \li <tt> (Nx, Ny, Nz, 1) </tt> in the case of 4D variables (one time step).
400  * \li <tt> (N_sensor_points, 1, 1) </tt> in the case of the output time series (one time step of
401  * the simulation).
402  *
403  *
404  * All datasets have two attributes that specify the content of the dataset. The \c `data_type'
405  * attribute specifies the data type of the dataset. The admissible values are either \c `float' or
406  * \c `long'. The \c `domain_type' attribute specifies the domain of the dataset. The admissible
407  * values are either \c `real' for the real domain or \c `complex' for the complex domain. The
408  * C++ code reads these attributes and checks their values.
409  *
410  *
411  *
412  * \verbatim
413 ==============================================================================================================
414  Input File Datasets
415 ==============================================================================================================
416 Name Size Data type Domain Type Condition of Presence
417 ==============================================================================================================
418  1. Simulation Flags
419 --------------------------------------------------------------------------------------------------------------
420  ux_source_flag (1, 1, 1) long real
421  uy_source_flag (1, 1, 1) long real
422  uz_source_flag (1, 1, 1) long real
423  p_source_flag (1, 1, 1) long real
424  p0_source_flag (1, 1, 1) long real
425  transducer_source_flag (1, 1, 1) long real
426  nonuniform_grid_flag (1, 1, 1) long real must be set to 0
427  nonlinear_flag (1, 1, 1) long real
428  absorbing_flag (1, 1, 1) long real
429 --------------------------------------------------------------------------------------------------------------
430  2. Grid Properties
431 --------------------------------------------------------------------------------------------------------------
432  Nx (1, 1, 1) long real
433  Ny (1, 1, 1) long real
434  Nz (1, 1, 1) long real
435  Nt (1, 1, 1) long real
436  dt (1, 1, 1) float real
437  dx (1, 1, 1) float real
438  dy (1, 1, 1) float real
439  dz (1, 1, 1) float real
440 --------------------------------------------------------------------------------------------------------------
441  3 Medium Properties
442 --------------------------------------------------------------------------------------------------------------
443  3.1 Regular Medium Properties
444  rho0 (Nx, Ny, Nz) float real heterogenous
445  (1, 1, 1) float real homogenous
446  rho0_sgx (Nx, Ny, Nz) float real heterogenous
447  (1, 1, 1) float real homogenous
448  rho0_sgy (Nx, Ny, Nz) float real heterogenous
449  (1, 1, 1) float real homogenous
450  rho0_sgz (Nx, Ny, Nz) float real heterogenous
451  (1, 1, 1) float real homogenous
452  c0 (Nx, Ny, Nz) float real heterogenous
453  (1, 1, 1) float real homogenous
454  c_ref (1, 1, 1) float real
455 
456  3.2 Nonlinear Medium Properties (defined if (nonlinear_flag == 1))
457  BonA (Nx, Ny, Nz) float real heterogenous
458  (1, 1, 1) float real homogenous
459 
460  3.3 Absorbing Medium Properties (defined if (absorbing_flag == 1))
461  alpha_coef (Nx, Ny, Nz) float real heterogenous
462  (1, 1, 1) float real homogenous
463  alpha_power (1, 1, 1) float real
464 --------------------------------------------------------------------------------------------------------------
465  4. Sensor Variables
466 --------------------------------------------------------------------------------------------------------------
467  sensor_mask_type (1, 1, 1) long real file version 1.1 (0 = index, 1 = corners)
468  sensor_mask_index (Nsens, 1, 1) long real file version 1.0 always, File version 1.1 if sensor_mask_type == 0
469  sensor_mask_corners (Ncubes, 6, 1) long real file version 1.1, if sensor_mask_type == 1
470 --------------------------------------------------------------------------------------------------------------
471  5 Source Properties
472 --------------------------------------------------------------------------------------------------------------
473  5.1 Velocity Source Terms (defined if (ux_source_flag == 1 || uy_source_flag == 1 || uz_source_flag == 1))
474  u_source_mode (1, 1, 1) long real
475  u_source_many (1, 1, 1) long real
476  u_source_index (Nsrc, 1, 1) long real
477  ux_source_input (1, Nt_src, 1) float real u_source_many == 0
478  (Nsrc, Nt_src, 1) float real u_source_many == 1
479  uy_source_input (1, Nt_src, 1) float real u_source_many == 0
480  (Nsrc, Nt_src, 1) float real u_source_many == 1
481  uz_source_input (1, Nt_src, 1) float real u_source_many == 0
482  (Nt_src, Nsrc, 1) float real u_source_many == 1
483 
484  5.2 Pressure Source Terms (defined if (p_source_flag == 1))
485  p_source_mode (1, 1, 1) long real
486  p_source_many (1, 1, 1) long real
487  p_source_index (Nsrc, 1, 1) long real
488  p_source_input (Nsrc, Nt_src, 1) float real p_source_many == 1
489  (1, Nt_src, 1) float real p_source_many == 0
490 
491  5.3 Transducer Source Terms (defined if (transducer_source_flag == 1))
492  u_source_index (Nsrc, 1, 1) long real
493  transducer_source_input (Nt_src, 1, 1) float real
494  delay_mask (Nsrc, 1, 1) float real
495 
496  5.4 IVP Source Terms (defined if ( p0_source_flag ==1))
497  p0_source_input (Nx, Ny, Nz) float real
498 --------------------------------------------------------------------------------------------------------------
499  6. K-space and Shift Variables
500 --------------------------------------------------------------------------------------------------------------
501  ddx_k_shift_pos_r (Nx/2 + 1, 1, 1) float complex
502  ddx_k_shift_neg_r (Nx/2 + 1, 1, 1) float complex
503  ddy_k_shift_pos (1, Ny, 1) float complex
504  ddy_k_shift_neg (1, Ny, 1) float complex
505  ddz_k_shift_pos (1, 1, Nz) float complex
506  ddz_k_shift_neg (1, 1, Nz) float complex
507  x_shift_neg_r (Nx/2 + 1, 1, 1) float complex file version 1.1
508  y_shift_neg_r (1, Ny/2 + 1, 1) float complex file version 1.1
509  z_shift_neg_r (1, 1, Nz/2) float complex file version 1.1
510 --------------------------------------------------------------------------------------------------------------
511  7. PML Variables
512 --------------------------------------------------------------------------------------------------------------
513  pml_x_size (1, 1, 1) long real
514  pml_y_size (1, 1, 1) long real
515  pml_z_size (1, 1, 1) long real
516  pml_x_alpha (1, 1, 1) float real
517  pml_y_alpha (1, 1, 1) float real
518  pml_z_alpha (1, 1, 1) float real
519 
520  pml_x (Nx, 1, 1) float real
521  pml_x_sgx (Nx, 1, 1) float real
522  pml_y (1, Ny, 1) float real
523  pml_y_sgy (1, Ny, 1) float real
524  pml_z (1, 1, Nz) float real
525  pml_z_sgz (1, 1, Nz) float real
526 ==============================================================================================================
527  \endverbatim
528 
529 \verbatim
530 ==============================================================================================================
531  Checkpoint File Datasets
532 ==============================================================================================================
533 Name Size Data type Domain Type Condition of Presence
534 ==============================================================================================================
535  1. Grid Properties
536 --------------------------------------------------------------------------------------------------------------
537  Nx (1, 1, 1) long real
538  Ny (1, 1, 1) long real
539  Nz (1, 1, 1) long real
540  Nt (1, 1, 1) long real
541  t_index (1, 1, 1) long real
542 --------------------------------------------------------------------------------------------------------------
543  2. Simulation State
544 --------------------------------------------------------------------------------------------------------------
545  p (Nx, Ny, Nz) float real
546  ux_sgx (Nx, Ny, Nz) float real
547  uy_sgy (Nx, Ny, Nz) float real
548  uz_sgz (Nx, Ny, Nz) float real
549  rhox (Nx, Ny, Nz) float real
550  rhoy (Nx, Ny, Nz) float real
551  rhoz (Nx, Ny, Nz) float real
552 --------------------------------------------------------------------------------------------------------------
553 \endverbatim
554 
555 
556  \verbatim
557 ==============================================================================================================
558  Output File Datasets
559 ==============================================================================================================
560 Name Size Data type Domain Type Condition of Presence
561 ==============================================================================================================
562  1. Simulation Flags
563 --------------------------------------------------------------------------------------------------------------
564  ux_source_flag (1, 1, 1) long real
565  uy_source_flag (1, 1, 1) long real
566  uz_source_flag (1, 1, 1) long real
567  p_source_flag (1, 1, 1) long real
568  p0_source_flag (1, 1, 1) long real
569  transducer_source_flag (1, 1, 1) long real
570  nonuniform_grid_flag (1, 1, 1) long real
571  nonlinear_flag (1, 1, 1) long real
572  absorbing_flag (1, 1, 1) long real
573  u_source_mode (1, 1, 1) long real if u_source
574  u_source_many (1, 1, 1) long real if u_source
575  p_source_mode (1, 1, 1) long real if p_source
576  p_source_many (1, 1, 1) long real if p_source
577 --------------------------------------------------------------------------------------------------------------
578  2. Grid Properties
579 --------------------------------------------------------------------------------------------------------------
580  Nx (1, 1, 1) long real
581  Ny (1, 1, 1) long real
582  Nz (1, 1, 1) long real
583  Nt (1, 1, 1) long real
584  dt (1, 1, 1) float real
585  dx (1, 1, 1) float real
586  dy (1, 1, 1) float real
587  dz (1, 1, 1) float real
588 -------------------------------------------------------------------------------------------------------------
589  3. PML Variables
590 --------------------------------------------------------------------------------------------------------------
591  pml_x_size (1, 1, 1) long real
592  pml_y_size (1, 1, 1) long real
593  pml_z_size (1, 1, 1) long real
594  pml_x_alpha (1, 1, 1) float real
595  pml_y_alpha (1, 1, 1) float real
596  pml_z_alpha (1, 1, 1) float real
597 
598  pml_x (Nx, 1, 1) float real
599  pml_x_sgx (Nx, 1, 1) float real
600  pml_y (1, Ny, 1) float real
601  pml_y_sgy (1, Ny, 1) float real
602  pml_z (1, 1, Nz) float real
603  pml_z_sgz (1, 1, Nz) float real
604 --------------------------------------------------------------------------------------------------------------
605  4. Sensor Variables (present if --copy_sensor_mask)
606 --------------------------------------------------------------------------------------------------------------
607  sensor_mask_type (1, 1, 1) long real file version 1.1 and --copy_sensor_mask
608  sensor_mask_index (Nsens, 1, 1) long real file version 1.1 and if sensor_mask_type == 0
609  sensor_mask_corners (Ncubes, 6, 1) long real file version 1.1 and if sensor_mask_type == 1
610 --------------------------------------------------------------------------------------------------------------
611  5a. Simulation Results: if sensor_mask_type == 0 (index), or File version == 1.0
612 --------------------------------------------------------------------------------------------------------------
613  p (Nsens, Nt - s, 1) float real -p or --p_raw
614  p_rms (Nsens, 1, 1) float real --p_rms
615  p_max (Nsens, 1, 1) float real --p_max
616  p_min (Nsens, 1, 1) float real --p_min
617  p_max_all (Nx, Ny, Nz) float real --p_max_all
618  p_min_all (Nx, Ny, Nz) float real --p_min_all
619  p_final (Nx, Ny, Nz) float real --p_final
620 
621 
622  ux (Nsens, Nt - s, 1) float real -u or --u_raw
623  uy (Nsens, Nt - s, 1) float real -u or --u_raw
624  uz (Nsens, Nt - s, 1) float real -u or --u_raw
625 
626  ux_non_staggered (Nsens, Nt - s, 1) float real --u_non_staggered_raw (File version ==1.1)
627  uy_non_staggered (Nsens, Nt - s, 1) float real --u_non_staggered_raw (File version ==1.1)
628  uz_non_staggered (Nsens, Nt - s, 1) float real --u_non_staggered_raw (File version ==1.1)
629 
630  ux_rms (Nsens, 1, 1) float real --u_rms
631  uy_rms (Nsens, 1, 1) float real --u_rms
632  uz_rms (Nsens, 1, 1) float real --u_rms
633 
634  ux_max (Nsens, 1, 1) float real --u_max
635  uy_max (Nsens, 1, 1) float real --u_max
636  uz_max (Nsens, 1, 1) float real --u_max
637 
638  ux_min (Nsens, 1, 1) float real --u_min
639  uy_min (Nsens, 1, 1) float real --u_min
640  uz_min (Nsens, 1, 1) float real --u_min
641 
642  ux_max_all (Nx, Ny, Nz) float real --u_max_all
643  uy_max_all (Nx, Ny, Nz) float real --u_max_all
644  uz_max_all (Nx, Ny, Nz) float real --u_max_all
645 
646  ux_min_all (Nx, Ny, Nz) float real --u_min_all
647  uy_min_all (Nx, Ny, Nz) float real --u_min_all
648  uz_min_all (Nx, Ny, Nz) float real --u_min_all
649 
650  ux_final (Nx, Ny, Nz) float real --u_final
651  uy_final (Nx, Ny, Nz) float real --u_final
652  uz_final (Nx, Ny, Nz) float real --u_final
653 --------------------------------------------------------------------------------------------------------------
654  5b. Simulation Results: if sensor_mask_type == 1 (corners) and file version == 1.1
655 --------------------------------------------------------------------------------------------------------------
656  /p group of datasets, one per cuboid -p or --p_raw
657  /p/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
658  /p/2 (Cx, Cy, Cz, Nt-s) float real 2nd sampled cuboid, etc.
659 
660  /p_rms group of datasets, one per cuboid --p_rms
661  /p_rms/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
662 
663  /p_max group of datasets, one per cuboid --p_max
664  /p_max/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
665 
666  /p_min group of datasets, one per cuboid --p_min
667  /p_min/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
668 
669  p_max_all (Nx, Ny, Nz) float real --p_max_all
670  p_min_all (Nx, Ny, Nz) float real --p_min_all
671  p_final (Nx, Ny, Nz) float real --p_final
672 
673 
674  /ux group of datasets, one per cuboid -u or --u_raw
675  /ux/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
676  /uy group of datasets, one per cuboid -u or --u_raw
677  /uy/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
678  /uz group of datasets, one per cuboid -u or --u_raw
679  /uz/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
680 
681  /ux_non_staggered group of datasets, one per cuboid --u_non_staggered_raw
682  /ux_non_staggered/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
683  /uy_non_staggered group of datasets, one per cuboid --u_non_staggered_raw
684  /uy_non_staggered/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
685  /uz_non_staggered group of datasets, one per cuboid --u_non_staggered_raw
686  /uz_non_staggered/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
687 
688  /ux_rms group of datasets, one per cuboid --u_rms
689  /ux_rms/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
690  /uy_rms group of datasets, one per cuboid --u_rms
691  /uy_rms/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
692  /uz_rms group of datasets, one per cuboid --u_rms
693  /uy_rms/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
694 
695  /ux_max group of datasets, one per cuboid --u_max
696  /ux_max/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
697  /uy_max group of datasets, one per cuboid --u_max
698  /ux_max/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
699  /uz_max group of datasets, one per cuboid --u_max
700  /ux_max/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
701 
702  /ux_min group of datasets, one per cuboid --u_min
703  /ux_min/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
704  /uy_min group of datasets, one per cuboid --u_min
705  /ux_min/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
706  /uz_min group of datasets, one per cuboid --u_min
707  /ux_min/1 (Cx, Cy, Cz, Nt-s) float real 1st sampled cuboid
708 
709  ux_max_all (Nx, Ny, Nz) float real --u_max_all
710  uy_max_all (Nx, Ny, Nz) float real --u_max_all
711  uz_max_all (Nx, Ny, Nz) float real --u_max_all
712 
713  ux_min_all (Nx, Ny, Nz) float real --u_min_all
714  uy_min_all (Nx, Ny, Nz) float real --u_min_all
715  uz_min_all (Nx, Ny, Nz) float real --u_min_all
716 
717  ux_final (Nx, Ny, Nz) float real --u_final
718  uy_final (Nx, Ny, Nz) float real --u_final
719  uz_final (Nx, Ny, Nz) float real --u_final
720 ==============================================================================================================
721 \endverbatim
722  *
723  *
724 */
725 
726 #include <exception>
727 
728 #ifdef _OPENMP
729  #include <omp.h>
730 #endif
731 
733 #include <Logger/Logger.h>
734 
735 
736 using std::string;
737 
738 /**
739  * The main function of the kspaceFirstOrder3D-CUDA
740  * @param [in] argc
741  * @param [in] argv
742  * @return
743  */
744 int main(int argc, char** argv)
745 {
746  // Create k-Space solver
747  TKSpaceFirstOrder3DSolver KSpaceSolver;
748 
749  // print header
751  TLogger::Log(TLogger::BASIC, OUT_FMT_CODE_NAME, KSpaceSolver.GetCodeName().c_str());
753 
754  // Create parameters and parse command line
756 
757  //-------------- Init simulation ----------------//
758  try
759  {
760  // Initialise Parameters by parsing the command line and reading input file scalars
761  params.Init(argc, argv);
762  // Select GPU
763  params.SelectDevice();
764 
765  // When we know the GPU, we can print out the code version
766  if (params.IsVersion())
767  {
768  KSpaceSolver.PrintFullNameCodeAndLicense();
769  return EXIT_SUCCESS;
770  }
771  }
772  catch (const std::exception &e)
773  {
775  // must be repeated in case the GPU we want to printout the code version
776  // and all GPUs are busy
777  if (params.IsVersion())
778  {
779  KSpaceSolver.PrintFullNameCodeAndLicense();
780  }
781 
782  if (!params.IsVersion())
783  {
785  }
787  }
788 
789  // set number of threads and bind them to cores
790  #ifdef _OPENMP
791  omp_set_num_threads(params.GetNumberOfThreads());
792  #endif
793 
794  // Print simulation setup
795  params.PrintSimulatoinSetup();
796 
798 
799  //-------------- Allocate memory----------------//
800  try
801  {
802  KSpaceSolver.AllocateMemory();
803  }
804  catch (const std::bad_alloc& e)
805  {
809  }
810  catch (const std::exception& e)
811  {
816  13).c_str());
817  }
818 
819  //-------------- Load input data ----------------//
820  try
821  {
822  KSpaceSolver.LoadInputData();
823  }
824  catch (const std::ios::failure& e)
825  {
830  9).c_str());
831  }
832  catch (const std::exception& e)
833  {
836 
837  const string ErrorMessage = string(ERR_FMT_UNKNOWN_ERROR) + e.what();
840  13).c_str());
841  }
842 
845  KSpaceSolver.GetDataLoadTime());
846 
847 
848  if (params.Get_t_index() > 0)
849  {
853  params.Get_t_index());
854  }
855 
856 
857  // start computation
859  // exception are caught inside due to different log formats
860  KSpaceSolver.Compute();
861 
862 
863 
864  // summary
866 
869  KSpaceSolver.GetHostMemoryUsageInMB());
870 
873  KSpaceSolver.GetDeviceMemoryUsageInMB());
874 
876 
877 // Elapsed Time time
878 if (KSpaceSolver.GetCumulatedTotalTime() != KSpaceSolver.GetTotalTime())
879  {
882  KSpaceSolver.GetTotalTime());
883 
884  }
887  KSpaceSolver.GetCumulatedTotalTime());
888 
889 
890  // end of computation
892 
893  return EXIT_SUCCESS;
894 }// end of main
895 //--------------------------------------------------------------------------------------------------
size_t GetDeviceMemoryUsageInMB()
Get memory usage in MB on the GPU side.
TOutputMessage OUT_FMT_LEG_EXECUTION_TIME
Output message.
TOutputMessage OUT_FMT_TOTAL_EXECUTION_TIME
Output message.
double GetTotalTime() const
Get total simulation time.
static std::string WordWrapString(const std::string &inputString, const std::string &delimiters, const int indentation=0, const int lineSize=65)
Wrap the line based on logger conventions.
Definition: Logger.cpp:130
TOutputMessage OUT_FMT_SUMMARY_HEADER
Output message.
TErrorMessage ERR_FMT_PATH_DELIMITERS
delimiters for linux paths
Definition: ErrorMessages.h:52
size_t GetNumberOfThreads() const
Get number of threads.
Definition: Parameters.h:220
TOutputMessage OUT_FMT_HOST_MEMORY_USAGE
Output message.
int main(int argc, char **argv)
Definition: main.cpp:744
TOutputMessage OUT_FMT_FIRST_SEPARATOR
Output message - first separator.
static TParameters & GetInstance()
Get instance of the singleton class.
Definition: Parameters.cpp:70
bool IsVersion() const
Is –version specified at the command line?
Definition: Parameters.h:233
static void ErrorAndTerminate(const std::string &errorMessage)
Log an error and terminate the execution.
Definition: Logger.cpp:94
TErrorMessage ERR_FMT_OUT_OF_MEMORY
error message - out of memory
Definition: ErrorMessages.h:55
TOutputMessage OUT_FMT_RECOVER_FROM
Output message.
void PrintFullNameCodeAndLicense() const
Print the code name and license.
The header file containing the main class of the project responsible for the entire 3D fluid simulati...
const std::string GetCodeName() const
Get code name - release code version.
TOutputMessage OUT_FMT_END_OF_SIMULATION
Output message.
size_t Get_t_index() const
Get simulation time step.
Definition: Parameters.h:102
The header file containing a class responsible for printing out info and error messages (stdout...
TOutputMessage OUT_FMT_DEVICE_MEMORY_USAGE
Output message.
TOutputMessage OUT_FMT_ELAPSED_TIME
Output message.
void SelectDevice()
Select cuda device for execution.
Definition: Parameters.cpp:144
Class storing all parameters of the simulation.
Definition: Parameters.h:49
void PrintSimulatoinSetup()
Print the simulation setup (all parameters)
Definition: Parameters.cpp:167
double GetCumulatedTotalTime() const
Get total simulation time cumulated over all legs.
virtual void LoadInputData()
Load simulation data from the input file.
Basic (default) level of verbosity.
Definition: Logger.h:63
TOutputMessage OUT_FMT_LAST_SEPARATOR
Output message -last separator.
void Init(int argc, char **argv)
Parse command line and read scalar values to init the class.
Definition: Parameters.cpp:106
TOutputMessage OUT_FMT_CODE_NAME
Output message.
virtual void Compute()
Compute the k-space simulation.
virtual void AllocateMemory()
Memory allocation.
double GetDataLoadTime() const
Get data load time.
static void Log(const TLogLevel queryLevel, const std::string &format, Args...args)
Log desired activity for a given log level, version with string format.
Definition: Logger.h:85
TOutputMessage OUT_FMT_SEPARATOR
Output message - separator.
TOutputMessage OUT_FMT_INIT_HEADER
Output message.
TErrorMessage ERR_FMT_UNKNOWN_ERROR
Unknown error - unknown error.
Definition: ErrorMessages.h:59
Class responsible for running the k-space first order 3D method.
TOutputMessage OUT_FMT_FAILED
Output message - failed message.
size_t GetHostMemoryUsageInMB()
Get memory usage in MB on the CPU side.