<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.2" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>k-Wave User Forum &#187; Topic: C++ FFT plans creation really slow</title>
		<link>http://www.k-wave.org/forum/topic/c-fft-plans-creation-really-slow</link>
		<description>Support for the k-Wave MATLAB toolbox</description>
		<language>en-US</language>
		<pubDate>Tue, 12 May 2026 22:25:12 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.2</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.k-wave.org/forum/search.php</link>
		</textInput>
		<atom:link href="http://www.k-wave.org/forum/rss/topic/c-fft-plans-creation-really-slow" rel="self" type="application/rss+xml" />

		<item>
			<title>zak morgan on "C++ FFT plans creation really slow"</title>
			<link>http://www.k-wave.org/forum/topic/c-fft-plans-creation-really-slow#post-9092</link>
			<pubDate>Thu, 09 May 2024 16:08:22 +0000</pubDate>
			<dc:creator>zak morgan</dc:creator>
			<guid isPermaLink="false">9092@http://www.k-wave.org/forum/</guid>
			<description>&#60;p&#62;Never mind, I was experimenting with the PML and had set it to be outside the grid, which grew the grid to a non-power-of-2 size and explains the slow-down!
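&#60;/p&#62;
&#60;p&#62;A minimal Python sketch of that size check, assuming (as above) that putting the PML outside the grid grows each dimension by twice the PML size. The helper names are hypothetical and not part of k-Wave, and the 20-point PML is only an example value. FFT libraries such as FFTW and cuFFT are generally fastest when every dimension factors into small primes (2, 3, 5, 7), so the script reports the largest prime factor of the expanded size and searches a range of PML sizes for the one that keeps it smallest.&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;
```python
# Hypothetical helpers (not part of k-Wave). Assumption: with the PML
# outside the grid, each dimension grows by 2 * pml_size points.


def largest_prime_factor(n):
    """Return the largest prime factor of n (n itself if n is prime)."""
    largest = 1
    factor = 2
    while n >= factor * factor:
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    # Whatever is left above 1 is a prime larger than every factor seen.
    return max(largest, n)


def expanded_size(n, pml_size, pml_inside):
    """Grid points along one axis once the PML has been added."""
    return n if pml_inside else n + 2 * pml_size


def best_pml_size(n, pml_range=range(10, 41)):
    """Smallest PML size in pml_range whose expanded axis has the smallest largest prime factor."""
    return min(pml_range, key=lambda p: (largest_prime_factor(n + 2 * p), p))


if __name__ == "__main__":
    for n in (128, 108):
        grown = expanded_size(n, 20, pml_inside=False)
        print(n, "->", grown, "largest prime factor:", largest_prime_factor(grown))
    print("best PML size for 108:", best_pml_size(108))
```
&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;Run as a script, it reports that a 108-point axis grows to 148 = 2 x 2 x 37 with a 20-point PML, whereas a 10-point PML gives 128 = 2^7. Keeping the PML inside the grid, or choosing a PML size whose expanded grid has only small prime factors, avoids the slow FFT sizes.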
&#60;/p&#62;</description>
		</item>
		<item>
			<title>zak morgan on "C++ FFT plans creation really slow"</title>
			<link>http://www.k-wave.org/forum/topic/c-fft-plans-creation-really-slow#post-9091</link>
			<pubDate>Thu, 09 May 2024 00:09:21 +0000</pubDate>
			<dc:creator>zak morgan</dc:creator>
			<guid isPermaLink="false">9091@http://www.k-wave.org/forum/</guid>
			<description>&#60;p&#62;I'm also seeing something similar. When running the CPU C++ code, pre-processing is instant and the simulation takes about 47 seconds. When running the CUDA code, the simulation takes about 10 seconds, but about 80 seconds are spent in the FFT plan creation and pre-processing phase. It seems odd that pre-processing should be so much slower on the GPU than on the CPU.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Jiri Jaros on "C++ FFT plans creation really slow"</title>
			<link>http://www.k-wave.org/forum/topic/c-fft-plans-creation-really-slow#post-8794</link>
			<pubDate>Tue, 06 Jun 2023 12:58:24 +0000</pubDate>
			<dc:creator>Jiri Jaros</dc:creator>
			<guid isPermaLink="false">8794@http://www.k-wave.org/forum/</guid>
			<description>&#60;p&#62;Is the GPU set to persistence mode (PM 1)?&#60;/p&#62;
&#60;p&#62;&#60;code&#62;sudo nvidia-smi -pm 1&#60;/code&#62;&#60;/p&#62;
&#60;p&#62;It is also an extremely small simulation, so I'm not sure what is taking so long. Something odd may also be happening in the pre-processing phase.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>tnie on "C++ FFT plans creation really slow"</title>
			<link>http://www.k-wave.org/forum/topic/c-fft-plans-creation-really-slow#post-8560</link>
			<pubDate>Tue, 05 Jul 2022 22:54:06 +0000</pubDate>
			<dc:creator>tnie</dc:creator>
			<guid isPermaLink="false">8560@http://www.k-wave.org/forum/</guid>
			<description>&#60;p&#62;When simulating a trivial example, the &#60;code&#62;FFT plans creation&#60;/code&#62; and pre-processing phase takes 126.52 seconds, while the simulation itself takes less than a second. That can't be right. The same simulation runs in under a second in total in MATLAB. &#60;/p&#62;
&#60;p&#62;I get the following output:&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;+---------------------------------------------------------------+
&#124;                  kspaceFirstOrder-CUDA v1.3                   &#124;
+---------------------------------------------------------------+
&#124; Reading simulation configuration:                        Done &#124;
&#124; Selected GPU device id:                                     0 &#124;
&#124; GPU device name:                      NVIDIA GeForce RTX 3090 &#124;
&#124; Number of CPU threads:                                      1 &#124;
&#124; Processor name:     Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz &#124;
+---------------------------------------------------------------+
&#124;                      Simulation details                       &#124;
+---------------------------------------------------------------+
&#124; Domain dimensions:                                  128 x 108 &#124;
&#124; Medium type:                                               2D &#124;
&#124; Simulation time steps:                                    793 &#124;
+---------------------------------------------------------------+
&#124;                        Initialization                         &#124;
+---------------------------------------------------------------+
&#124; Memory allocation:                                       Done &#124;
&#124; Data loading:                                            Done &#124;
&#124; Elapsed time:                                           0.01s &#124;
+---------------------------------------------------------------+
&#124; FFT plans creation:                                      Done &#124;
&#124; Pre-processing phase:                                    Done &#124;
&#124; Elapsed time:                                         126.52s &#124;
+---------------------------------------------------------------+
&#124;                    Computational resources                    &#124;
+---------------------------------------------------------------+
&#124; Current host memory in use:                             408MB &#124;
&#124; Current device memory in use:                          1300MB &#124;
&#124; Expected output file size:                                0MB &#124;
+---------------------------------------------------------------+
&#124;                          Simulation                           &#124;
+----------+----------------+--------------+--------------------+
&#124; Progress &#124;  Elapsed time  &#124;  Time to go  &#124;  Est. finish time  &#124;
+----------+----------------+--------------+--------------------+
&#124;     0%   &#124;        0.001s  &#124;      0.396s  &#124;  05/07/22 16:52:22 &#124;
&#124;     5%   &#124;        0.014s  &#124;      0.257s  &#124;  05/07/22 16:52:22 &#124;
&#124;    10%   &#124;        0.029s  &#124;      0.255s  &#124;  05/07/22 16:52:22 &#124;
&#124;    15%   &#124;        0.044s  &#124;      0.247s  &#124;  05/07/22 16:52:22 &#124;
&#124;    20%   &#124;        0.058s  &#124;      0.229s  &#124;  05/07/22 16:52:22 &#124;
&#124;    25%   &#124;        0.071s  &#124;      0.211s  &#124;  05/07/22 16:52:22 &#124;
&#124;    30%   &#124;        0.086s  &#124;      0.199s  &#124;  05/07/22 16:52:22 &#124;
&#124;    35%   &#124;        0.100s  &#124;      0.184s  &#124;  05/07/22 16:52:22 &#124;
&#124;    40%   &#124;        0.114s  &#124;      0.169s  &#124;  05/07/22 16:52:22 &#124;
&#124;    45%   &#124;        0.130s  &#124;      0.158s  &#124;  05/07/22 16:52:22 &#124;
&#124;    50%   &#124;        0.143s  &#124;      0.142s  &#124;  05/07/22 16:52:22 &#124;
&#124;    55%   &#124;        0.159s  &#124;      0.129s  &#124;  05/07/22 16:52:22 &#124;
&#124;    60%   &#124;        0.173s  &#124;      0.115s  &#124;  05/07/22 16:52:22 &#124;
&#124;    65%   &#124;        0.185s  &#124;      0.099s  &#124;  05/07/22 16:52:22 &#124;
&#124;    70%   &#124;        0.203s  &#124;      0.086s  &#124;  05/07/22 16:52:22 &#124;
&#124;    75%   &#124;        0.217s  &#124;      0.072s  &#124;  05/07/22 16:52:22 &#124;
&#124;    80%   &#124;        0.231s  &#124;      0.057s  &#124;  05/07/22 16:52:22 &#124;
&#124;    85%   &#124;        0.245s  &#124;      0.042s  &#124;  05/07/22 16:52:22 &#124;
&#124;    90%   &#124;        0.259s  &#124;      0.028s  &#124;  05/07/22 16:52:22 &#124;
&#124;    95%   &#124;        0.273s  &#124;      0.014s  &#124;  05/07/22 16:52:22 &#124;
+----------+----------------+--------------+--------------------+
&#124; Elapsed time:                                           0.29s &#124;
+---------------------------------------------------------------+
&#124; Sampled data post-processing:                            Done &#124;
&#124; Elapsed time:                                           0.01s &#124;
+---------------------------------------------------------------+
&#124;                            Summary                            &#124;
+---------------------------------------------------------------+
&#124; Peak host memory in use:                                408MB &#124;
&#124; Peak device memory in use:                             1300MB &#124;
+---------------------------------------------------------------+
&#124; Total execution time:                                 128.07s &#124;
+---------------------------------------------------------------+
&#124;                       End of computation                      &#124;
+---------------------------------------------------------------+&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;Any idea how to make the C++ binary faster?
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
