OpenCL OpenGL Memory Sharing

LWJGL is not limited to graphics. It also contains utilities for running calculations in a system-independent way with the aid of the Open Computing Language (OpenCL). OpenGL and OpenCL have more in common than just their names. Since both support video cards as one of their target platforms, the libraries were developed to allow memory to be shared between them. This page provides an example of how OpenGL memory objects can be shared with OpenCL, and how OpenCL can be used to update OpenGL memory objects without the cost of streaming data from system memory.

OpenCL/OpenGL Interoperability
OpenGL stores data in a number of object types, including textures and vertex buffers. OpenCL stores data in two major types of objects: buffers and image objects. It is important to understand that when OpenGL and OpenCL share memory, the memory is not copied between the OpenGL and OpenCL contexts. Shared memory objects share the same memory and modifications to the data will be reflected in both the shared OpenGL and OpenCL objects.
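
The "same memory, not a copy" behavior can be illustrated without a GPU. The sketch below is only an analogy: it uses plain java.nio buffers, not OpenGL or OpenCL objects. A buffer created with duplicate() plays the role of the shared OpenCL object, while a separately allocated copy shows what a non-shared object would see. The class and method names are made up for illustration.

```java
import java.nio.FloatBuffer;

final class SharedViewAnalogy {
    /** Returns {value seen through the original, value seen through the copy}. */
    static float[] writeThroughView() {
        // Stands in for the storage behind an OpenGL buffer.
        FloatBuffer glSide = FloatBuffer.wrap(new float[] {1f, 2f, 3f});

        // Stands in for the shared OpenCL object: a second handle to the SAME storage.
        FloatBuffer clSide = glSide.duplicate();

        // Stands in for a non-shared object: an independent copy of the data.
        FloatBuffer copy = FloatBuffer.allocate(glSide.capacity());
        copy.put(glSide.duplicate());
        copy.flip();

        // "OpenCL" modifies element 0 through its own handle.
        clSide.put(0, 42f);

        return new float[] {glSide.get(0), copy.get(0)};
    }

    public static void main(String[] args) {
        float[] seen = writeThroughView();
        System.out.println("original sees " + seen[0] + ", copy sees " + seen[1]);
    }
}
```

The write made through the second handle is visible through the first one, while the copy keeps the old value. Shared OpenGL/OpenCL objects behave like the two handles, not like the copy.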

To allow OpenGL and OpenCL to share memory, a few setup steps are required.

First, when the OpenCL context is initialized, it must be given a reference to a valid OpenGL context, which means that OpenGL must be initialized before OpenCL.

 // Create the LWJGL OpenGL context first
 Display.create();
 // etc...
 // Initialize LWJGL's OpenCL bindings before making any other OpenCL call
 CL.create();
 // Then the drawable that OpenCL needs is obtained like so
 Drawable drawable = Display.getDrawable();
 // The platform is an LWJGL CLPlatform object
 CLPlatform platform = CLPlatform.getPlatforms().get(0);
 // The devices are a List of LWJGL CLDevice objects representing the hardware or software devices that OpenCL can use
 List<CLDevice> devices = platform.getDevices(CL10.CL_DEVICE_TYPE_GPU);
 // Create the OpenCL context using the platform, devices, and the OpenGL drawable
 CLContext context = CLContext.create(platform, devices, null, drawable, null);

Now OpenCL is initialized and ready to share memory objects with OpenGL. However, it is important to remember that shared memory objects must not be modified by OpenCL and OpenGL at the same time. If a memory object is modified by both contexts at the same time, the resulting application behavior is undefined.

Creating Shared OpenCL Memory Objects
OpenCL shared memory objects can be created from OpenGL buffers, renderbuffers, and 2D or 3D textures. Methods for creating shared OpenCL objects reside in the CL10GL and CL12GL classes. Shared objects must always follow this pattern:
 * Create OpenGL buffer, texture, etc
 * Create OpenCL shared object from the OpenGL object
 * Synchronize access to the object so that OpenGL and OpenCL take turns using the object

For example, an OpenCL CLMem object could be created from an OpenGL vertex buffer like so:

 // Create an OpenGL buffer
 int glBufId = GL15.glGenBuffers();
 // Load the buffer with data using glBufferData
 // etc...
 // Create the shared OpenCL memory object from the OpenGL buffer
 CLMem glMem = CL10GL.clCreateFromGLBuffer(context, CL10.CL_MEM_READ_WRITE, glBufId, null);

Synchronize Access to Memory Objects
OpenGL and OpenCL can share data, but only one of the two contexts can access a shared memory object at a time. Typically this can be ensured by separating OpenGL and OpenCL code in an application and ensuring that the OpenCL code only runs before or after rendering an OpenGL frame, not while an OpenGL frame is being rendered. Pending OpenGL commands that use the object should also have completed (for example, after a call to glFinish) before OpenCL acquires it.

First we need to create a CLCommandQueue, to which we can send OpenCL commands such as uploading data to a memory object, downloading data from it, or running an OpenCL kernel.

 // Create a command queue using our OpenCL context and the first device in our list of devices
 CLCommandQueue queue = CL10.clCreateCommandQueue(context, devices.get(0), CL10.CL_QUEUE_PROFILING_ENABLE, null);

If you have worked with Java threads, you have probably used the 'synchronized' keyword, which ensures that a method or block of code cannot be executed by more than one thread at the same time. Similarly, OpenCL and OpenGL require that access to shared memory objects be synchronized. In the case of a shared OpenGL/OpenCL memory object, OpenCL must 'lock' the memory object before using it. This is accomplished by the 'clEnqueueAcquireGLObjects' method. A similar 'clEnqueueReleaseGLObjects' method allows OpenCL to release the 'lock' on a shared memory object.

Here is an example of 'locking' a shared OpenCL memory object, using the memory object as one of the parameters for a kernel, running the kernel, and then 'unlocking' the memory object.

 // Acquire the lock for the 'glMem' memory object
 int error = CL10GL.clEnqueueAcquireGLObjects(queue, glMem, null, null);
 // Remember to check for errors
 if (error != CL10.CL_SUCCESS) {
     Util.checkCLError(error);
 }
 // Now execute an OpenCL command using the shared memory object,
 // such as uploading data to the memory object using 'CL10.clEnqueueWriteBuffer'
 // or running a kernel using 'CL10.clEnqueueNDRangeKernel' with the correct parameters
 // ...
 // Release the lock on the 'glMem' memory object
 error = CL10GL.clEnqueueReleaseGLObjects(queue, glMem, null, null);
 if (error != CL10.CL_SUCCESS) {
     Util.checkCLError(error);
 }
 // Remember to finish the command queue when you are done.
 // clFinish blocks until all of the OpenCL commands sent to the
 // queue have completed, so the program only continues afterwards.
 CL10.clFinish(queue);
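
Because overlapping access is undefined behavior and not a reported error, it can help during development to track who currently "owns" a shared object. The following is a minimal sketch of such a tracker in plain Java. It makes no OpenGL or OpenCL calls, and the SharedObjectGuard class and its method names are hypothetical, not part of LWJGL. The idea is to call its methods next to the real acquire, release, kernel, and draw calls.

```java
final class SharedObjectGuard {
    enum Owner { OPENGL, OPENCL }

    // The OpenGL object is created first, so OpenGL owns it initially.
    private Owner owner = Owner.OPENGL;

    /** Call next to clEnqueueAcquireGLObjects: hands the object to OpenCL. */
    void acquireForCL() {
        require(Owner.OPENGL, "acquire");
        owner = Owner.OPENCL;
    }

    /** Call next to clEnqueueReleaseGLObjects: hands the object back to OpenGL. */
    void releaseToGL() {
        require(Owner.OPENCL, "release");
        owner = Owner.OPENGL;
    }

    /** Call before enqueueing a kernel or buffer write that touches the object. */
    void useFromCL() { require(Owner.OPENCL, "OpenCL use"); }

    /** Call before drawing with the object. */
    void useFromGL() { require(Owner.OPENGL, "OpenGL use"); }

    private void require(Owner expected, String action) {
        if (owner != expected) {
            throw new IllegalStateException(action + " while owned by " + owner);
        }
    }

    public static void main(String[] args) {
        SharedObjectGuard guard = new SharedObjectGuard();
        // One frame: OpenCL updates the buffer, then OpenGL draws it.
        guard.acquireForCL();
        guard.useFromCL();
        guard.releaseToGL();
        guard.useFromGL();
        System.out.println("frame completed without overlapping access");
    }
}
```

A draw call placed between the acquire and the release would trip the guard with an IllegalStateException instead of silently producing undefined results. The guard only checks the order of calls on the CPU side; it does not replace finishing the command queue.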

When your program is done, you should delete the shared memory objects by calling:

 // Delete/release an OpenCL shared memory object
 CL10.clReleaseMemObject(glMem);
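
The order of cleanup matters: the OpenCL object should be released before the OpenGL object it was created from is deleted, since using an OpenCL object whose OpenGL object is gone is undefined. One way to get that order automatically is Java's try-with-resources, which closes resources in reverse declaration order. The sketch below only demonstrates the ordering. The Handle wrapper is hypothetical and records a log entry where a real wrapper would call CL10.clReleaseMemObject or GL15.glDeleteBuffers.

```java
import java.util.ArrayList;
import java.util.List;

final class CleanupOrder {
    /** Hypothetical wrapper that records when its underlying handle is freed. */
    static final class Handle implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Handle(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            // A real wrapper would free the OpenGL or OpenCL object here.
            log.add("release " + name);
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        // Declared in creation order: the GL buffer first, then the CL object built on it.
        try (Handle glBuffer = new Handle("GL buffer", log);
             Handle clMem = new Handle("CL mem", log)) {
            log.add("use shared object");
        }
        return log;
    }

    public static void main(String[] args) {
        for (String entry : run()) {
            System.out.println(entry);
        }
    }
}
```

The log shows the OpenCL handle being released before the OpenGL buffer, which is the reverse of the order in which they were created.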

The exciting part about this memory sharing is that an OpenCL kernel can contain routines that modify the data in an OpenGL vertex buffer. This could be used to create real-time effects like waves on an ocean or a flag blowing in the wind. It is an exceptionally good use for OpenCL because almost no data needs to be sent over the hardware bus between the CPU and GPU: the OpenCL kernel runs directly on the video card, has access to the video card's memory, and can carry out tasks that modify that memory. OpenGL shaders, by contrast, can only modify a vertex or pixel for the current frame.

This means that OpenCL kernels can modify vertex buffers more efficiently than traditional CPU streaming techniques because very little data has to be sent to the video card and because OpenCL can use the parallel computing performance of the video card to speed up simple tasks like recalculating the positions of vertices.
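
As a concrete picture of "recalculating the positions of vertices", here is a minimal sketch of a wave effect. It assumes a vertex buffer of tightly packed x, y, z floats. The kernel name, the amplitude of 0.25, and the data layout are illustrative assumptions and do not come from LWJGL. The Java method is a CPU reference for the calculation, and the string holds an equivalent OpenCL C kernel in which each work-item handles one vertex in place of one loop iteration.

```java
final class WaveReference {
    /** OpenCL C source for the same update, one work-item per vertex (hypothetical kernel). */
    static final String KERNEL_SOURCE =
        "__kernel void wave(__global float* pos, const float time, const int count) {\n" +
        "    int i = get_global_id(0);\n" +
        "    if (i >= count) return;\n" +
        "    pos[3 * i + 1] = 0.25f * sin(pos[3 * i] + time);\n" +
        "}\n";

    /** CPU reference: sets y = 0.25 * sin(x + time) for every (x, y, z) vertex. */
    static void updateWave(float[] pos, float time) {
        int count = pos.length / 3;
        for (int i = 0; i < count; i++) {
            pos[3 * i + 1] = 0.25f * (float) Math.sin(pos[3 * i] + time);
        }
    }

    public static void main(String[] args) {
        float[] pos = {0f, 0f, 0f, 1f, 0f, 0f, 2f, 0f, 0f};
        updateWave(pos, 0.5f);
        for (int i = 0; i < pos.length / 3; i++) {
            System.out.println("vertex " + i + " y = " + pos[3 * i + 1]);
        }
    }
}
```

With the CPU version, the updated array would have to be uploaded to the video card every frame. With the kernel running on the shared buffer, only the time value changes between frames.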

Further Resources
For examples of how to create an OpenCL kernel, see OpenCL_in_LWJGL.

In-depth article about OpenGL and OpenCL interoperability.