<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.2" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>k-Wave User Forum &#187; Topic: Performance of running GPU simulations</title>
		<link>http://www.k-wave.org/forum/topic/performance-of-runing-gpu-simulations</link>
		<description>Support for the k-Wave MATLAB toolbox</description>
		<language>en-US</language>
		<pubDate>Wed, 13 May 2026 00:42:17 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.2</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.k-wave.org/forum/search.php</link>
		</textInput>
		<atom:link href="http://www.k-wave.org/forum/rss/topic/performance-of-runing-gpu-simulations" rel="self" type="application/rss+xml" />

		<item>
			<title>rainbowwhale on "Performance of running GPU simulations"</title>
			<link>http://www.k-wave.org/forum/topic/performance-of-runing-gpu-simulations#post-8029</link>
			<pubDate>Wed, 27 Jan 2021 02:11:19 +0000</pubDate>
			<dc:creator>rainbowwhale</dc:creator>
			<guid isPermaLink="false">8029@http://www.k-wave.org/forum/</guid>
			<description>&#60;p&#62;&#34;WARNING: Highest prime factors in each dimension are 97 97&#60;br /&#62;
Use dimension sizes with lower prime factors to improve speed&#34;&#60;br /&#62;
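This warning is the key point: your computational grid size (3880 by 3880) does not factor into small primes, since 3880 = 2^3 x 5 x 97, and FFTs are slower for sizes with large prime factors.&#60;br /&#62;
A 4096 by 4096 grid (4096 = 2^12) could be faster than the current configuration, even though it is larger.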
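&#60;/p&#62;
&#60;p&#62;To make the prime factor check concrete, here is a minimal Python sketch (not part of k-Wave). The helper names largest_prime_factor and next_smooth_size, and the cut-off of 7 for a "small" prime, are my own assumptions for illustration.&#60;/p&#62;

```python
from math import isqrt


def largest_prime_factor(n):
    """Return the largest prime factor of an integer n (n at least 2)."""
    largest = 1
    for p in range(2, isqrt(n) + 1):
        # Divide out every power of p; only prime p can still divide n here.
        while n % p == 0:
            largest = p
            n //= p
    # Whatever remains (if not 1) is a single prime above the square root.
    if n != 1:
        largest = n
    return largest


def next_smooth_size(n, max_prime=7):
    """Smallest size not below n whose prime factors are all at most max_prime.

    max_prime=7 is an illustrative threshold, not a k-Wave setting.
    """
    size = n
    while largest_prime_factor(size) not in range(2, max_prime + 1):
        size += 1
    return size


if __name__ == "__main__":
    for size in (3840, 3880, 4096):
        print(size, "largest prime factor:", largest_prime_factor(size))
    print("next small-factor size from 3880:", next_smooth_size(3880))
```

&#60;p&#62;For the sizes in this thread, largest_prime_factor(3880) returns 97, matching the warning, while next_smooth_size(3880) returns 3888 (2^4 x 3^5). If, as the log suggests, the 40 extra points are the absorbing layer added on both sides of the 3840 point input grid, then 24 points per side instead of 20 would land on 3888, which is a much smaller change than going to 4096.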
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Mengjie_shi on "Performance of running GPU simulations"</title>
			<link>http://www.k-wave.org/forum/topic/performance-of-runing-gpu-simulations#post-8028</link>
			<pubDate>Tue, 26 Jan 2021 11:40:53 +0000</pubDate>
			<dc:creator>Mengjie_shi</dc:creator>
			<guid isPermaLink="false">8028@http://www.k-wave.org/forum/</guid>
			<description>&#60;p&#62;Hi everyone, &#60;/p&#62;
&#60;p&#62;I tried to run a 2D k-Wave simulation of the line sensor using the GPU version. Here is the output. &#60;/p&#62;
&#60;p&#62;Resizing matrix...&#60;br /&#62;
  input grid size: 384 by 384 elements&#60;br /&#62;
  output grid size: 3840 by 3840 elements&#60;br /&#62;
  completed in 0.080413s&#60;br /&#62;
Running k-Wave simulation...&#60;br /&#62;
  start time: 26-Jan-2021 09:55:04&#60;br /&#62;
  reference sound speed: 1500m/s&#60;br /&#62;
  dt: 2ns, t_end: 36.202us, time steps: 18102&#60;br /&#62;
  input grid size: 3840 by 3840 grid points (38.4 by 38.4mm)&#60;br /&#62;
  maximum supported frequency: 75MHz&#60;br /&#62;
  expanding computational grid...&#60;br /&#62;
  computational grid size: 3880 by 3880 grid points&#60;br /&#62;
  WARNING: Highest prime factors in each dimension are 97  97&#60;br /&#62;
           Use dimension sizes with lower prime factors to improve speed&#60;br /&#62;
  precomputation completed in 2.6354s&#60;br /&#62;
  saving input files to disk...&#60;br /&#62;
  completed in 1.6728s&#60;br /&#62;
+---------------------------------------------------------------+&#60;br /&#62;
&#124;                  kspaceFirstOrder-CUDA v1.3                   &#124;&#60;br /&#62;
+---------------------------------------------------------------+&#60;br /&#62;
&#124; Reading simulation configuration:                        Done &#124;&#60;br /&#62;
&#124; Selected GPU device id:                                     0 &#124;&#60;br /&#62;
&#124; GPU device name:           GeForce RTX 2070 with Max-Q Design &#124;&#60;br /&#62;
&#124; Number of CPU threads:                                      1 &#124;&#60;br /&#62;
&#124; Processor name:      Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz &#124;&#60;br /&#62;
+---------------------------------------------------------------+&#60;br /&#62;
&#124;                      Simulation details                       &#124;&#60;br /&#62;
+---------------------------------------------------------------+&#60;br /&#62;
&#124; Domain dimensions:                                3880 x 3880 &#124;&#60;br /&#62;
&#124; Medium type:                                               2D &#124;&#60;br /&#62;
&#124; Simulation time steps:                                  18102 &#124;&#60;br /&#62;
+---------------------------------------------------------------+&#60;br /&#62;
&#124;                        Initialization                         &#124;&#60;br /&#62;
+---------------------------------------------------------------+&#60;br /&#62;
&#124; Memory allocation:                                       Done &#124;&#60;br /&#62;
&#124; Data loading:                                            Done &#124;&#60;br /&#62;
&#124; Elapsed time:                                           0.24s &#124;&#60;br /&#62;
+---------------------------------------------------------------+&#60;br /&#62;
&#124; FFT plans creation:                                      Done &#124;&#60;br /&#62;
&#124; Pre-processing phase:                                    Done &#124;&#60;br /&#62;
&#124; Elapsed time:                                           1.68s &#124;&#60;br /&#62;
+---------------------------------------------------------------+&#60;br /&#62;
&#124;                    Computational resources                    &#124;&#60;br /&#62;
+---------------------------------------------------------------+&#60;br /&#62;
&#124; Current host memory in use:                            1521MB &#124;&#60;br /&#62;
&#124; Current device memory in use:                          2917MB &#124;&#60;br /&#62;
&#124; Expected output file size:                                8MB &#124;&#60;br /&#62;
+---------------------------------------------------------------+&#60;br /&#62;
&#124;                          Simulation                           &#124;&#60;br /&#62;
+----------+----------------+--------------+--------------------+&#60;br /&#62;
&#124; Progress &#124;  Elapsed time  &#124;  Time to go  &#124;  Est. finish time  &#124;&#60;br /&#62;
+----------+----------------+--------------+--------------------+&#60;br /&#62;
&#124;     0%   &#124;        0.086s  &#124;    778.300s  &#124;  26/01/21 10:08:10 &#124;&#60;br /&#62;
&#124;     5%   &#124;       58.048s  &#124;   1100.480s  &#124;  26/01/21 10:14:30 &#124;&#60;br /&#62;
&#124;    10%   &#124;      127.455s  &#124;   1145.829s  &#124;  26/01/21 10:16:24 &#124;&#60;br /&#62;
&#124;    15%   &#124;      200.249s  &#124;   1133.909s  &#124;  26/01/21 10:17:25 &#124;&#60;br /&#62;
&#124;    20%   &#124;      273.220s  &#124;   1092.277s  &#124;  26/01/21 10:17:57 &#124;&#60;br /&#62;
&#124;    25%   &#124;      346.869s  &#124;   1040.147s  &#124;  26/01/21 10:18:19 &#124;&#60;br /&#62;
&#124;    30%   &#124;      419.282s  &#124;    977.964s  &#124;  26/01/21 10:18:28 &#124;&#60;br /&#62;
&#124;    35%   &#124;      492.980s  &#124;    915.245s  &#124;  26/01/21 10:18:40 &#124;&#60;br /&#62;
&#124;    40%   &#124;      567.184s  &#124;    850.541s  &#124;  26/01/21 10:18:49 &#124;&#60;br /&#62;
&#124;    45%   &#124;      640.790s  &#124;    782.996s  &#124;  26/01/21 10:18:55 &#124;&#60;br /&#62;
&#124;    50%   &#124;      714.071s  &#124;    713.755s  &#124;  26/01/21 10:18:59 &#124;&#60;br /&#62;
&#124;    55%   &#124;      787.334s  &#124;    643.909s  &#124;  26/01/21 10:19:02 &#124;&#60;br /&#62;
&#124;    60%   &#124;      862.208s  &#124;    574.567s  &#124;  26/01/21 10:19:08 &#124;&#60;br /&#62;
&#124;    65%   &#124;      935.525s  &#124;    503.536s  &#124;  26/01/21 10:19:10 &#124;&#60;br /&#62;
&#124;    70%   &#124;     1009.156s  &#124;    432.313s  &#124;  26/01/21 10:19:13 &#124;&#60;br /&#62;
&#124;    75%   &#124;     1082.477s  &#124;    360.666s  &#124;  26/01/21 10:19:14 &#124;&#60;br /&#62;
&#124;    80%   &#124;     1155.983s  &#124;    288.856s  &#124;  26/01/21 10:19:16 &#124;&#60;br /&#62;
&#124;    85%   &#124;     1229.012s  &#124;    216.762s  &#124;  26/01/21 10:19:17 &#124;&#60;br /&#62;
&#124;    90%   &#124;     1301.022s  &#124;    144.452s  &#124;  26/01/21 10:19:17 &#124;&#60;br /&#62;
&#124;    95%   &#124;     1373.348s  &#124;     72.189s  &#124;  26/01/21 10:19:17 &#124;&#60;br /&#62;
+----------+----------------+--------------+--------------------+&#60;br /&#62;
&#124; Elapsed time:                                        1446.09s &#124;&#60;br /&#62;
+---------------------------------------------------------------+&#60;br /&#62;
&#124; Sampled data post-processing:                            Done &#124;&#60;br /&#62;
&#124; Elapsed time:                                           0.01s &#124;&#60;br /&#62;
+---------------------------------------------------------------+&#60;br /&#62;
&#124;                            Summary                            &#124;&#60;br /&#62;
+---------------------------------------------------------------+&#60;br /&#62;
&#124; Peak host memory in use:                               1521MB &#124;&#60;br /&#62;
&#124; Peak device memory in use:                             2917MB &#124;&#60;br /&#62;
+---------------------------------------------------------------+&#60;br /&#62;
&#124; Total execution time:                                1448.90s &#124;&#60;br /&#62;
+---------------------------------------------------------------+&#60;br /&#62;
&#124;                       End of computation                      &#124;&#60;br /&#62;
+---------------------------------------------------------------+ &#60;/p&#62;
&#60;p&#62;The GPU sped up the simulation, but the improvement does not seem significant. Is the total time of 1448.90s the shortest I can expect given my small spatial grid step? If not, could you please tell me how I can improve it? By the way, this is my first time using a GPU, so any suggestions and learning materials are very welcome. Thank you.
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
