author | Tor Lillqvist <tml@collabora.com> | 2015-09-10 21:58:28 +0300 |
---|---|---|
committer | Tor Lillqvist <tml@collabora.com> | 2015-09-15 18:43:33 +0300 |
commit | d6a5aac0f903b292db57bb4a613e603aa029c78b (patch) | |
tree | 59fc0174303dc12faecb98fa916486f119e963ca /include | |
parent | 75bde904d5b4f756037889f2b2ddee3e34dd81b8 (diff) |
Split formula group for OpenCL up into smaller bits when necessary
This makes formula group calculation less demanding on low-end
hardware, where the device driver stays unresponsive for too long
while an OpenCL kernel handling lots of data is executing. Windows
then restarts the driver, which is problematic.
I tried several approaches to splitting, both at higher levels in sc
and at the lowest level just before creating and executing the OpenCL
kernel(s). The approach taken here seems to be the most minimal and
local one. Doing it at the lowest level would have required too much
poking into our obscure OpenCL code, like passing an offset parameter
to every kernel.
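To illustrate the kind of splitting described above, here is a minimal sketch that cuts a row range into consecutive pieces of bounded size. It is not the code from this commit, which is not visible in the diff below because the diff is limited to 'include'. The helper name `splitRows` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the commit: split the row range
// [0, nTotalRows) into consecutive (start, length) pieces of at most
// nMaxRows rows each, so each piece can be handed to a separate
// kernel invocation.
std::vector<std::pair<std::size_t, std::size_t>>
splitRows(std::size_t nTotalRows, std::size_t nMaxRows)
{
    std::vector<std::pair<std::size_t, std::size_t>> aPieces;
    if (nMaxRows == 0)
        nMaxRows = nTotalRows; // assumption: 0 means "do not split"
    for (std::size_t nStart = 0; nStart < nTotalRows; nStart += nMaxRows)
        aPieces.emplace_back(nStart, std::min(nMaxRows, nTotalRows - nStart));
    return aPieces;
}
```

For example, `splitRows(10, 4)` yields three pieces covering rows 0-3, 4-7 and 8-9. A caller would process the pieces one after another, so that no single kernel run occupies the device for too long.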
Use a simple heuristic to find out whether to split. On the
problematic low-end devices, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT is
4, while for more performant devices it is 1 or 8.
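The heuristic above could be sketched roughly as follows. This is an assumption-laden illustration, not the commit's actual check: the function name `shouldSplitFormulaGroup` and the `VectorWidth` alias are hypothetical, and the alias stands in for `cl_uint` so the sketch builds without OpenCL headers.

```cpp
#include <cstdint>

// Stand-in for cl_uint (an assumption made to keep the sketch
// independent of the OpenCL headers).
using VectorWidth = std::uint32_t;

// In real code the value would be obtained from the device, e.g. via
// clGetDeviceInfo(device, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT,
//                 sizeof(cl_uint), &value, nullptr),
// and cached, for instance in GPUEnv::mnPreferredVectorWidthFloat.
//
// Hypothetical predicate: per the commit message, the problematic
// low-end devices report 4, more performant ones report 1 or 8.
bool shouldSplitFormulaGroup(VectorWidth nPreferredVectorWidthFloat)
{
    return nPreferredVectorWidthFloat == 4;
}
```

With this predicate, a device reporting 4 would have its formula groups split, while devices reporting 1 or 8 would run them unsplit.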
Diffstat (limited to 'include')
-rw-r--r-- | include/clew/clew.h | 1 |
-rw-r--r-- | include/opencl/openclwrapper.hxx | 1 |
2 files changed, 2 insertions, 0 deletions
diff --git a/include/clew/clew.h b/include/clew/clew.h
index 94b6c29d9262..e5cfaf0836be 100644
--- a/include/clew/clew.h
+++ b/include/clew/clew.h
@@ -416,6 +416,7 @@ typedef struct _cl_image_format {
 
 // cl_device_info
 #define CL_DEVICE_MAX_COMPUTE_UNITS 0x1002
+#define CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT 0x100A
 #define CL_DEVICE_MAX_CLOCK_FREQUENCY 0x100C
 #define CL_DEVICE_GLOBAL_MEM_SIZE 0x101F
 #define CL_DEVICE_NAME 0x102B
diff --git a/include/opencl/openclwrapper.hxx b/include/opencl/openclwrapper.hxx
index 04fe1e3725e7..fe677729460b 100644
--- a/include/opencl/openclwrapper.hxx
+++ b/include/opencl/openclwrapper.hxx
@@ -51,6 +51,7 @@ struct GPUEnv
     int mnCmdQueuePos;
     bool mnKhrFp64Flag;
     bool mnAmdFp64Flag;
+    cl_uint mnPreferredVectorWidthFloat;
 };
 
 extern OPENCL_DLLPUBLIC GPUEnv gpuEnv;