Using Vertex Buffer Objects (VBO)

What is a Vertex Buffer Object
A Vertex Buffer Object (VBO for short) is an OpenGL buffer object to buffer vertex data to VRAM so as to make rendering faster. This is similar to display lists in which the display list is cached in VRAM and executed from there, yielding a higher FPS.

Using VBOs, not only is the data already in VRAM, but generally you can set up the VBO for rendering with a few OpenGL calls, then render one or several primitives with a single OpenGL call. This vastly decreases the necessary communication between CPU and GPU, which is so often a major cause of slow programs.

This tutorial is meant to provide a simple introduction to using VBOs.

Vertex buffer objects differ from display lists in that a display list cannot be modified once it has been compiled: to change its data you have to generate a new display list and delete the old one so that the new data can be sent to the card. VBOs come in three flavours: static, dynamic and stream. The latter two are designed to handle dynamic data such as animation. Various reports and benchmarks on the internet suggest that display lists are faster than static VBOs by a small margin, but the advantages of using VBOs definitely outweigh the disadvantages.

The Extension
VBOs used to be an ARB extension; however, as of OpenGL 1.5 they are a core feature. LWJGL, being the library it is, provides access to the extension on top of the core features. Since this tutorial is in the legacy section, code utilizing both the core and the extension will be shown. The official name for the extension is GL_ARB_vertex_buffer_object and the class that represents that extension is org.lwjgl.opengl.ARBVertexBufferObject. It is relatively easy to use and should give your game a solid FPS increase if that extension is supported by the vendor's card. This extension relies heavily on vertex arrays, and you should be familiar with vertex arrays before reading this tutorial.

Generating a VBO id and buffering
It is relatively easy to generate any OpenGL buffer id, much like a texture id. The command is glGenBuffers, and LWJGL has a convenience form of it as well as the original.

    import java.nio.IntBuffer;
    import org.lwjgl.BufferUtils;
    import org.lwjgl.opengl.ARBVertexBufferObject;
    import org.lwjgl.opengl.GL15;
    import org.lwjgl.opengl.GLContext;

    public static int createVBOID() {
        IntBuffer buffer = BufferUtils.createIntBuffer(1);
        GL15.glGenBuffers(buffer);
        return buffer.get(0);
        // Or alternatively you can simply use the convenience method,
        // which can only supply you with a single id:
        // return GL15.glGenBuffers();
    }

ARB Extension format.

    public static int createVBOID() {
        if (GLContext.getCapabilities().GL_ARB_vertex_buffer_object) {
            IntBuffer buffer = BufferUtils.createIntBuffer(1);
            ARBVertexBufferObject.glGenBuffersARB(buffer);
            return buffer.get(0);
        }
        return 0;
    }

That id is important: you're going to need it when rendering (again, much like a texture id).

The next step is the equivalent of initializing the buffer. You must specify the type of the buffer (GL_ARRAY_BUFFER for vertex data such as positions, normals, colours and tex coords, and GL_ELEMENT_ARRAY_BUFFER for index data) and send up the initial data this VBO will contain. We tend to call index buffers IBOs; you don't need to use IBOs, but generally you will, and I will explain them later. If you don't want to send any data for any reason, you can use an empty java Buffer argument, but make sure it is the right size.

As with any OpenGL object, you must first bind it before you can do anything with it: glBindBuffer(int target, int id). Then we use glBufferData(int target, (Float|Int|Byte etc)Buffer data, int usage). In LWJGL, you can use any of the java.nio buffers (provided of course that they are direct). Usage is an OpenGL enum which hints to the implementation how this VBO will be used. Enums are: GL_(STATIC | STREAM | DYNAMIC)_(DRAW | READ | COPY). From the OpenGL docs:

STATIC - The data store contents will be modified once and used many times.

STREAM - The data store contents will be modified once and used at most a few times.

DYNAMIC - The data store contents will be modified repeatedly and used many times.

DRAW - The data store contents are modified by the application, and used as the source for GL drawing and image specification commands.

READ - The data store contents are modified by reading data from the GL, and used to return that data when queried by the application.

COPY - The data store contents are modified by reading data from the GL, and used as the source for GL drawing and image specification commands.

If you are unsure about what these mean, then for now just use GL_STATIC_DRAW. Likelihood is you'll use this most of the time anyway. Bear in mind it is just a hint and you won't be stopped from doing other things, it just might be slightly slower.

    public static void vertexBufferData(int id, FloatBuffer buffer) { // Not restricted to FloatBuffer
        GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, id); // Bind buffer (also specifies type of buffer)
        GL15.glBufferData(GL15.GL_ARRAY_BUFFER, buffer, GL15.GL_STATIC_DRAW); // Send up the data and specify usage hint.
    }

    public static void indexBufferData(int id, IntBuffer buffer) { // Not restricted to IntBuffer
        GL15.glBindBuffer(GL15.GL_ELEMENT_ARRAY_BUFFER, id);
        GL15.glBufferData(GL15.GL_ELEMENT_ARRAY_BUFFER, buffer, GL15.GL_STATIC_DRAW);
    }

ARB Extension format.

    public static void bufferData(int id, FloatBuffer buffer) {
        if (GLContext.getCapabilities().GL_ARB_vertex_buffer_object) {
            ARBVertexBufferObject.glBindBufferARB(ARBVertexBufferObject.GL_ARRAY_BUFFER_ARB, id);
            ARBVertexBufferObject.glBufferDataARB(ARBVertexBufferObject.GL_ARRAY_BUFFER_ARB, buffer, ARBVertexBufferObject.GL_STATIC_DRAW_ARB);
        }
    }
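The requirement that buffers be direct can be illustrated without any OpenGL at all. The following is a minimal sketch using only java.nio; the class and method names are hypothetical, and it assumes you would rather build the buffer by hand than call LWJGL's BufferUtils. Flipping the buffer matters because, as far as I know, LWJGL uploads the elements between the buffer's position and limit.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper, not part of LWJGL.
class DirectBufferSketch {

    /** Copies the given floats into a new direct, native-order FloatBuffer that is ready to be read. */
    static FloatBuffer toDirectBuffer(float[] values) {
        FloatBuffer buffer = ByteBuffer
                .allocateDirect(values.length * Float.BYTES) // off-heap memory, 4 bytes per float
                .order(ByteOrder.nativeOrder())              // match the platform's byte order
                .asFloatBuffer();
        buffer.put(values);
        buffer.flip(); // position = 0, limit = values.length
        return buffer;
    }

    public static void main(String[] args) {
        FloatBuffer triangle = toDirectBuffer(new float[] {
                0f, 0f, 0f,   1f, 0f, 0f,   0f, 1f, 0f });
        System.out.println("direct: " + triangle.isDirect() + ", floats: " + triangle.remaining());
    }
}
```

A buffer built this way could then be passed to vertexBufferData above.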

Rendering
Now that you have generated and stored the VBO id, you can render. There are a few steps you must take before saying render.

Firstly you must enable any arrays you are using. The command is glEnableClientState(int param), to be called with GL_VERTEX_ARRAY, GL_COLOR_ARRAY etc. Alternatively, if you are using shaders and attribute variables, use glEnableVertexAttribArray(int index), to be called with the index of your attribute. Then, as ever, you must bind the VBO as already shown.

Finally (and this is the thing you may have trouble with) you must assign a pointer which tells OpenGL where in the VBO to look for the data: gl(Vertex|Color|Normal)Pointer(int size, int type, int stride, long offset).

Size is the number of pieces of data for each vertex. Position (x, y, z) would be 3, colour (r, g, b, a) would be 4 etc. NB glNormalPointer does not take this parameter, because a normal always has three values.

Type is an OpenGL enum specifying the form the data takes (GL_FLOAT, GL_BYTE etc.).

Stride is a difficult one. It is the offset in bytes between the first element of this data for the nth vertex and the first element of this data for the (n+1)th vertex. A stride of 0 is a special case that tells OpenGL the data is tightly packed. An example is given below.

Offset is the offset in bytes into this VBO of the first element of this data for the first vertex. Again, an example is given below.

If you are using shader attribute variables, the command you want is glVertexAttribPointer(int index, int size, int type, boolean normalize, int stride, long offset). The extra parameters are index, the index of the attribute, and normalize, whether to normalize the data as it gets sent up to the GPU.

With all that out of the way, you can render the vertices, but there are three different commands for this. The simplest is glDrawArrays(int mode, int first, int count). This will send out count sequential vertices starting from first using the mode. (Mode is GL_TRIANGLES, GL_POLYGON etc.) This command is what you will use if you are not using IBOs.
The next is glDrawElements(int mode, int count, int type, long offset). This command draws vertices as specified by the elements in the bound IBO. Offset is the offset IN BYTES of the first index in the IBO. Type is the form of the indices (MUST BE GL_UNSIGNED_INT, GL_UNSIGNED_BYTE or GL_UNSIGNED_SHORT). Count is the number of indices to read and mode is just mode.

The next one I won't explain as it is extremely similar to glDrawElements but with two extra parameters. It is glDrawRangeElements; see its specification for the details. Also see glMultiDrawArrays and glMultiDrawElements.

    public static void render() {
        GL11.glEnableClientState(GL11.GL_VERTEX_ARRAY);
        GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, vertexBufferID);
        GL11.glVertexPointer(3, GL11.GL_FLOAT, 0, 0);

        GL11.glEnableClientState(GL11.GL_COLOR_ARRAY);
        GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, colourBufferID);
        GL11.glColorPointer(4, GL11.GL_FLOAT, 0, 0);

        // Use ONE of the following draw calls.

        // If you are not using IBOs (count is the number of vertices):
        GL11.glDrawArrays(GL11.GL_TRIANGLES, 0, numberVertices);

        // If you are using IBOs:
        GL15.glBindBuffer(GL15.GL_ELEMENT_ARRAY_BUFFER, indexBufferID);
        GL11.glDrawElements(GL11.GL_TRIANGLES, numberIndices, GL11.GL_UNSIGNED_INT, 0);

        // The alternate glDrawElements.
        GL12.glDrawRangeElements(GL11.GL_TRIANGLES, 0, maxIndex, numberIndices, GL11.GL_UNSIGNED_INT, 0);
    }

ARB Extension format.

    public static void render() {
        GL11.glEnableClientState(GL11.GL_VERTEX_ARRAY);
        ARBVertexBufferObject.glBindBufferARB(ARBVertexBufferObject.GL_ARRAY_BUFFER_ARB, vertexBufferID);
        GL11.glVertexPointer(3, GL11.GL_FLOAT, 0, 0);

        GL11.glEnableClientState(GL11.GL_COLOR_ARRAY);
        ARBVertexBufferObject.glBindBufferARB(ARBVertexBufferObject.GL_ARRAY_BUFFER_ARB, colourBufferID);
        GL11.glColorPointer(4, GL11.GL_FLOAT, 0, 0);

        ARBVertexBufferObject.glBindBufferARB(ARBVertexBufferObject.GL_ELEMENT_ARRAY_BUFFER_ARB, indexBufferID);
        GL12.glDrawRangeElements(GL11.GL_TRIANGLES, 0, maxIndex, numberIndices, GL11.GL_UNSIGNED_INT, 0);
    }
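To make the role of the index data more concrete, here is a small sketch in plain Java (no OpenGL calls) of a quad described by four vertices and six indices. The class, the data and the helper names are made up for illustration. The expand method shows which vertices an indexed draw would visit, which is also the flat list you would need for glDrawArrays; indexByteOffset shows how the byte offset for glDrawElements could be worked out, assuming GL_UNSIGNED_INT indices.

```java
// Hypothetical illustration of indexed vertex data; no OpenGL calls are made.
class IndexedQuadSketch {

    // Four unique corners of a quad, 3 floats (x, y, z) each.
    static final float[] POSITIONS = {
            0f, 0f, 0f,   // vertex 0
            1f, 0f, 0f,   // vertex 1
            1f, 1f, 0f,   // vertex 2
            0f, 1f, 0f    // vertex 3
    };

    // Two triangles that share vertices 0 and 2.
    static final int[] INDICES = { 0, 1, 2,   2, 3, 0 };

    /** Expands indexed positions into the flat, unindexed list of positions. */
    static float[] expand(float[] positions, int[] indices) {
        float[] out = new float[indices.length * 3];
        for (int i = 0; i < indices.length; i++) {
            System.arraycopy(positions, indices[i] * 3, out, i * 3, 3);
        }
        return out;
    }

    /** Byte offset of the index at position firstIndex, assuming 4-byte (GL_UNSIGNED_INT) indices. */
    static long indexByteOffset(int firstIndex) {
        return (long) firstIndex * Integer.BYTES;
    }

    public static void main(String[] args) {
        System.out.println("indexed floats:   " + POSITIONS.length);
        System.out.println("unindexed floats: " + expand(POSITIONS, INDICES).length);
        System.out.println("byte offset of the second triangle: " + indexByteOffset(3));
    }
}
```

Even for a single quad the indexed form stores fewer vertices, and the saving grows with meshes where many triangles share corners.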

Examples

Given a position VBO: { x1, y1, z1, x2, y2, z2 ... }. All values are floats (4 bytes per float). You have 3 pieces of data, so size is 3. Type is clearly GL_FLOAT. Now stride is the byte distance between the start of x1 and the start of x2. Between these you have x1 itself, y1 and z1. So the stride is 3 floats = 3 * 4 = 12. Offset is the byte distance to the start of x1, which is of course 0 in this instance. So the necessary pointer call is:

glVertexPointer(3, GL11.GL_FLOAT, 12, 0);

For a colour VBO: { r1, g1, b1, a1, r2, g2, b2, a2 ... }. All values are unsigned shorts (2 bytes per short). Stride: between r1 and r2 there is r1, g1, b1 and a1. So stride is 4 shorts = 4 * 2 = 8. Offset: clearly 0. (This only really comes into play when you use interleaved data.)

glColorPointer(4, GL11.GL_UNSIGNED_SHORT, 8, 0);
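The arithmetic in these two examples can be written down as a tiny helper. This is only a sketch in plain Java with hypothetical names; it assumes a VBO that holds a single kind of data, tightly packed.

```java
// Hypothetical helper for a VBO that holds one tightly packed attribute.
class PackedStrideSketch {

    /** Stride in bytes: components per vertex times bytes per component. */
    static int stride(int componentsPerVertex, int bytesPerComponent) {
        return componentsPerVertex * bytesPerComponent;
    }

    public static void main(String[] args) {
        System.out.println("position stride: " + stride(3, Float.BYTES)); // x, y, z as floats
        System.out.println("colour stride:   " + stride(4, Short.BYTES)); // r, g, b, a as shorts
    }
}
```

Because OpenGL treats a stride of 0 as "tightly packed", passing 0 instead of the computed value gives the same result for buffers like these.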

Experimenting
The above code is optimal for static meshes (hence the GL_STATIC_DRAW_ARB), so for animated models you might want to investigate GL_DYNAMIC_DRAW_ARB or GL_STREAM_DRAW_ARB. There are also methods for reading back the data in the VBO.

Speedy VBOs
The rest of this tutorial will focus on how to get better performance out of your VBOs, so please make sure you read and understand the above before continuing. For the sake of clarity, all the VBOs below are of FLOAT type and of STATIC_DRAW_ARB nature. Also, a FLOAT is considered to be 4 bytes.

Interleaving a VBO
Consider making your data interleaved. Interleaving is the process of putting several pieces of data (i.e. position AND colour AND normals) into the same VBO and using the strides and offsets to point to their location. With interleaved data you can get maximum VBO speed, as interleaved data only needs one VBO id and thus can be managed more easily by the GL implementation.

So without interleaving you have one VBO { x1, y1, z1, x2, y2, z2 ... }, one VBO { nx1, ny1, nz1, nx2, ny2, nz2 ... } and one VBO { r1, g1, b1, a1, r2, g2, b2, a2 ... }. You must bind the first, assign the pointer, bind the second, assign the pointer, bind the third, assign the pointer. Aside from the overhead of having 3 VBOs instead of one, you are calling 6 OpenGL functions instead of 4 at the start of every single primitive draw in every single frame. It adds up. With interleaving you have one VBO { x1, y1, z1, nx1, ny1, nz1, r1, g1, b1, a1, x2, y2, z2, nx2, ny2, nz2 ... }. Trust me that this is hugely more efficient.
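Building such an interleaved buffer from separate arrays could look something like the sketch below. It uses only java.nio, the class and method names are hypothetical, and it assumes three floats per position, three per normal and four (RGBA) per colour, matching the pointer examples in this tutorial.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helper, not part of LWJGL.
class InterleaveSketch {

    static final int FLOATS_PER_VERTEX = 3 + 3 + 4; // position + normal + RGBA colour

    /** Interleaves per-vertex positions, normals and colours into one direct FloatBuffer. */
    static FloatBuffer interleave(float[] positions, float[] normals, float[] colours) {
        int vertexCount = positions.length / 3;
        if (normals.length != vertexCount * 3 || colours.length != vertexCount * 4) {
            throw new IllegalArgumentException("arrays describe different numbers of vertices");
        }
        FloatBuffer out = ByteBuffer
                .allocateDirect(vertexCount * FLOATS_PER_VERTEX * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (int v = 0; v < vertexCount; v++) {
            out.put(positions, v * 3, 3); // x, y, z
            out.put(normals, v * 3, 3);   // nx, ny, nz
            out.put(colours, v * 4, 4);   // r, g, b, a
        }
        out.flip(); // ready to be read from the start
        return out;
    }
}
```

The resulting buffer could be uploaded once with glBufferData, after which a single bind serves all three pointer calls.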

Pointer Examples with Interleaved Data

Given the interleaved VBO: { x1, y1, z1, nx1, ny1, nz1, r1, g1, b1, a1, x2, y2, z2, nx2, ny2, nz2 ... } with all values as floats. Position - Stride is the distance between x1 and x2. So there is x1, y1, z1, nx1, ny1, nz1, r1, g1, b1 and a1. That is ( 3 + 3 + 4 ) floats = ( 3 + 3 + 4 ) * 4 = 10 * 4 = 40. Offset = 0 clearly.

glVertexPointer(3, GL11.GL_FLOAT, 40, 0);

Normals - Stride is the same as before, which you will find for any interleaved buffer ( between nx1 and nx2 there are 3 normals, 4 colours and 3 positions = 10 * 4 = 40). Offset is the distance to nx1. So you have x1, y1 and z1 = 3 floats = 3 * 4 = 12.

glNormalPointer(GL11.GL_FLOAT, 40, 12); //Remember there is no size parameter.

Colours - Stride as before. Offset is distance to r1. There is x1, y1, z1, nx1, ny1, nz1 = 6 floats = 6 * 4 = 24.

glColorPointer(4, GL11.GL_FLOAT, 40, 24);
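The workings above follow a simple pattern: the stride is the sum of all the attribute sizes, and each offset is the sum of the sizes that come before that attribute. As a sketch in plain Java (hypothetical names, all-float data assumed):

```java
// Hypothetical calculator for an interleaved, all-float vertex layout.
class InterleavedLayoutSketch {

    /** Stride in bytes for attributes with the given component counts, e.g. (3, 3, 4). */
    static int stride(int... componentCounts) {
        int floats = 0;
        for (int count : componentCounts) {
            floats += count;
        }
        return floats * Float.BYTES;
    }

    /** Byte offset of each attribute, in the order the attributes appear within a vertex. */
    static int[] offsets(int... componentCounts) {
        int[] result = new int[componentCounts.length];
        int running = 0;
        for (int i = 0; i < componentCounts.length; i++) {
            result[i] = running; // everything before this attribute
            running += componentCounts[i] * Float.BYTES;
        }
        return result;
    }

    public static void main(String[] args) {
        // position (3), normal (3), colour (4)
        System.out.println("stride:  " + stride(3, 3, 4));
        System.out.println("offsets: " + java.util.Arrays.toString(offsets(3, 3, 4)));
    }
}
```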

Now let's get really complicated.

Interleaved VBO: { x1, y1, z1, nx1, ny1, nz1, r1, g1, b1, a1, x2, y2, z2, nx2, ny2, nz2 ... }. Looks the same BUT positions are doubles (8 bytes per double), normals are floats (4 bytes) and colours are unsigned shorts (2 bytes).

Position: Stride is 3 positions, 3 normals and 4 colours, which gives you: ( 3 * 8 ) + ( 3 * 4 ) + ( 4 * 2 ) = 24 + 12 + 8 = 44. Offset clearly 0 as ever.

glVertexPointer(3, GL11.GL_DOUBLE, 44, 0);

Normals: Stride as before. Offset is 3 positions = ( 3 * 8 ) = 24.

glNormalPointer(GL11.GL_FLOAT, 44, 24); //No size parameter

Colours: Stride as before. Offset is 3 positions and 3 normals = ( 3 * 8 ) + ( 3 * 4 ) = 24 + 12 = 36.

glColorPointer(4, GL11.GL_UNSIGNED_SHORT, 44, 36);

OK, that wasn't so hard then. This is a point where I often make errors however, so I always comment the above workings next to my pointer calls. If you have a rendering bug that you don't understand, look at your pointer calls. Get some paper (yes I mean paper) and draw out what I have drawn above, then rework it out without looking at the results you got last time. Then compare.
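One way to keep such workings next to the code is to let constants do the sums. The sketch below packs mixed-type vertices into a ByteBuffer using only java.nio; the names are hypothetical, and it assumes three doubles for the position, three floats for the normal and four shorts for the colour, which comes to 44 bytes per vertex.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical packer for a mixed-type interleaved vertex; no OpenGL calls are made.
class MixedVertexSketch {

    static final int POSITION_OFFSET = 0;
    static final int NORMAL_OFFSET = POSITION_OFFSET + 3 * Double.BYTES; // 3 * 8 = 24
    static final int COLOUR_OFFSET = NORMAL_OFFSET + 3 * Float.BYTES;    // 24 + 12 = 36
    static final int STRIDE = COLOUR_OFFSET + 4 * Short.BYTES;           // 36 + 8 = 44

    /** Packs positions (doubles), normals (floats) and RGBA colours (shorts) vertex by vertex. */
    static ByteBuffer pack(double[] positions, float[] normals, short[] colours) {
        int vertexCount = positions.length / 3;
        ByteBuffer out = ByteBuffer
                .allocateDirect(vertexCount * STRIDE)
                .order(ByteOrder.nativeOrder());
        for (int v = 0; v < vertexCount; v++) {
            for (int i = 0; i < 3; i++) out.putDouble(positions[v * 3 + i]);
            for (int i = 0; i < 3; i++) out.putFloat(normals[v * 3 + i]);
            for (int i = 0; i < 4; i++) out.putShort(colours[v * 4 + i]);
        }
        out.flip(); // ready to be read from the start
        return out;
    }
}
```

The same constants could then be passed as the stride and offset arguments of the pointer calls, so the buffer layout and the pointers cannot drift apart.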

GL12.GL_MAX_ELEMENTS_VERTICES
GL12.GL_MAX_ELEMENTS_VERTICES is a magical number: if the number of vertices in your VBO is equal to that number, you'll get speedy VBOs. (It is the implementation's recommended maximum number of vertices for a glDrawRangeElements call.) Warning: Speculation follows. This magical number, I assume, is the size of the FIFO buffers; anything above that will be stored in either VRAM or RAM, causing unnecessary fetches which only stall the pipeline. The magical number on most NVIDIA cards is 4096. So if you're doing some terrain stuff, cut it up into 64×64 pieces. Speedy VBOs.

There is also GL12.GL_MAX_ELEMENTS_INDICES; I'll leave that to you.
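Cutting a square terrain grid into pieces of that size is simple arithmetic. The sketch below is plain Java with hypothetical names. It takes the limit as a parameter, on the assumption that the real value would be queried at run time (for example with GL11.glGetInteger(GL12.GL_MAX_ELEMENTS_VERTICES)), and it ignores the fact that neighbouring pieces usually need to share their edge vertices.

```java
// Hypothetical helper for splitting a square grid of vertices into square pieces.
class TerrainChunkSketch {

    /** Largest side length whose square does not exceed maxVertices. */
    static int chunkSide(int maxVertices) {
        return (int) Math.floor(Math.sqrt(maxVertices));
    }

    /** Number of pieces needed along one axis of a gridSide x gridSide grid (rounded up). */
    static int chunksPerAxis(int gridSide, int chunkSide) {
        return (gridSide + chunkSide - 1) / chunkSide;
    }

    public static void main(String[] args) {
        int side = chunkSide(4096); // 4096 is the value the tutorial quotes for most NVIDIA cards
        int perAxis = chunksPerAxis(256, side);
        System.out.println(side + "x" + side + " vertices per piece, "
                + (perAxis * perAxis) + " pieces for a 256x256 grid");
    }
}
```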

Conclusion
There is still the matter of GL15.glMapBuffer and GL15.glBufferSubData, which can speed up VBO updating (especially on specified regions within the VBO), but for now, see the specifications for glMapBuffer and glBufferSubData.
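As a starting point for experimenting with partial updates, the sketch below works out what a region update would need: the byte offset of the first vertex to replace and a view of just the new data. It is plain java.nio with hypothetical names and assumes the interleaved all-float layout of 10 floats per vertex used earlier. The resulting offset and buffer are the kind of arguments GL15.glBufferSubData(target, offset, data) expects.

```java
import java.nio.FloatBuffer;

// Hypothetical helper for preparing a partial VBO update; no OpenGL calls are made.
class SubRangeSketch {

    static final int FLOATS_PER_VERTEX = 10; // 3 position + 3 normal + 4 colour

    /** Byte offset into the VBO at which the given vertex starts. */
    static long byteOffset(int firstVertex) {
        return (long) firstVertex * FLOATS_PER_VERTEX * Float.BYTES;
    }

    /** A view containing only the floats of vertices firstVertex .. firstVertex + vertexCount - 1. */
    static FloatBuffer region(FloatBuffer all, int firstVertex, int vertexCount) {
        FloatBuffer view = all.duplicate(); // leaves the caller's position and limit untouched
        view.limit((firstVertex + vertexCount) * FLOATS_PER_VERTEX);
        view.position(firstVertex * FLOATS_PER_VERTEX);
        return view.slice();
    }
}
```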

Enjoy your Speedy VBOs