Split formula group for OpenCL up into smaller bits when necessary

This makes formula group calculation less demanding on low-end hardware, where the device driver stays unresponsive for too long while an OpenCL kernel handling lots of data is executing. Windows then restarts the driver, which is problematic.

I tried several approaches to splitting, both at higher levels in sc and at the lowest level, just before creating and executing the OpenCL kernel(s). The approach taken here seems to be the most minimal and local one. Doing it at the lower level would have required too much poking into our obscure OpenCL code, like passing an offset parameter to every kernel.

Use a simple heuristic to decide whether to split: on the problematic low-end devices, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT is 4, while on more performant devices it is 1 or 8.
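
The heuristic described above could be sketched roughly as follows. This is an illustration only, not the code from this commit: the helper names shouldSplitFormulaGroup and splitRows and the maxRows parameter are hypothetical, and the sketch assumes the preferred vector width has already been read from the device (for instance with clGetDeviceInfo and CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT) and stored in something like GPUEnv::mnPreferredVectorWidthFloat.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns true when the device reports the preferred
// float vector width that the commit message associates with problematic
// low-end hardware (4), and false for other values such as 1 or 8.
bool shouldSplitFormulaGroup(unsigned preferredVectorWidthFloat)
{
    return preferredVectorWidthFloat == 4;
}

// Hypothetical helper: splits the row range [0, length) into consecutive
// (offset, count) pieces of at most maxRows rows each. The piece size is an
// assumption of this sketch; the commit message does not state one.
std::vector<std::pair<std::size_t, std::size_t>>
splitRows(std::size_t length, std::size_t maxRows)
{
    assert(maxRows > 0);
    std::vector<std::pair<std::size_t, std::size_t>> pieces;
    for (std::size_t offset = 0; offset < length; offset += maxRows)
        pieces.emplace_back(offset, std::min(maxRows, length - offset));
    return pieces;
}
```

With these helpers, a caller would compute each piece as its own smaller formula group when shouldSplitFormulaGroup returns true, and otherwise process the whole group in one go, so no kernel needs an extra offset parameter.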
author: Tor Lillqvist <tml@collabora.com> 2015-09-10 21:58:28 +0300
committer: Tor Lillqvist <tml@collabora.com> 2015-09-15 18:43:33 +0300
commit: d6a5aac0f903b292db57bb4a613e603aa029c78b (patch)
tree: 59fc0174303dc12faecb98fa916486f119e963ca /include/opencl
parent: 75bde904d5b4f756037889f2b2ddee3e34dd81b8 (diff)
1 files changed, 1 insertions, 0 deletions
diff --git a/include/opencl/openclwrapper.hxx b/include/opencl/openclwrapper.hxx
index 04fe1e3725e7..fe677729460b 100644
--- a/include/opencl/openclwrapper.hxx
+++ b/include/opencl/openclwrapper.hxx
@@ -51,6 +51,7 @@ struct GPUEnv
     int mnCmdQueuePos;
     bool mnKhrFp64Flag;
     bool mnAmdFp64Flag;
+    cl_uint mnPreferredVectorWidthFloat;
 };
 
 extern OPENCL_DLLPUBLIC GPUEnv gpuEnv;