Accelerated blitting in SDL and framebuffer access
Posted: Tue Nov 28, 2023 9:33 pm
Hi there!
I'm playing with libSDL for the Wii/GC with the goal of implementing accelerated blitting. While reading the GX documentation, I'm making some changes to the pipeline (such as removing the flipping thread and using a pair of XFBs, like GRRLIB does), and I've reached a point where I have to choose between two approaches and I'm not sure which one is better.
The generic drawing pipeline goes as follows:
3D drawings (Texture mappings) ---GX_DrawDone--> EFB ---GX_CopyDisp--> XFB ---VIDEO_Flush--> Screen
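With a pair of XFBs, the end of each frame in libogc is typically GX_DrawDone(), then GX_CopyDisp(xfb[fb], GX_TRUE), VIDEO_SetNextFramebuffer(xfb[fb]), VIDEO_Flush() and VIDEO_WaitVSync(), after which the index fb is toggled. Since that code only runs on the console, below is a host-side C sketch that models the same ordering with plain arrays. Everything in it (the tiny W x H screen, efb, xfb, scanout, draw_pixel, end_frame) is hypothetical, and it ignores the fact that the real GX_CopyDisp also converts the pixels to the XFB's YUV format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tiny screen so the model is easy to inspect. */
#define W 8
#define H 4

static uint32_t efb[W * H];            /* models the EFB that GX renders into */
static uint32_t xfb[2][W * H];         /* models the pair of XFBs */
static int fb = 0;                     /* XFB that the next copy will target */
static const uint32_t *scanout = NULL; /* models the XFB the VI is showing */

/* Models a draw command that has already completed into the EFB. */
static void draw_pixel(int x, int y, uint32_t argb) {
    assert(x >= 0 && x < W && y >= 0 && y < H);
    efb[y * W + x] = argb;
}

/* Models the end-of-frame sequence of a two-XFB setup. */
static void end_frame(void) {
    /* GX_DrawDone() would go here: wait for all queued drawing to reach the EFB. */
    memcpy(xfb[fb], efb, sizeof efb);  /* GX_CopyDisp(xfb[fb], GX_TRUE) copies... */
    memset(efb, 0, sizeof efb);        /* ...and clears the EFB (to 0 in this model) */
    scanout = xfb[fb];                 /* VIDEO_SetNextFramebuffer + VIDEO_Flush */
    fb ^= 1;                           /* the next frame is copied into the other XFB */
}
```

The point of the pair is that the copy for the next frame never targets the XFB that is currently being scanned out.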
Now, the question is: which framebuffer should the client be given access to when it calls SDL_LockSurface and then accesses the surface pixels directly? Currently, libSDL uses a screen-sized texture into which all drawing is done without acceleration; once the frame is ready, the 3D engine maps this texture onto the screen. If we introduce accelerated blitting, the blits will also happen in the "3D drawings" stage of the pipeline, which means that the texture we use as our screen texture will not "see" all the blits until we call GX_DrawDone. So, I see two options:
1) Render the 3D scene to a texture: add an extra step to the pipeline above that copies the EFB contents into main memory, at which point this memory can be used as a texture again.
2) Just provide the EFB as surface->pixels: when the client calls SDL_LockSurface, we call GX_DrawDone to ensure that all the blitting operations issued so far have been rendered into the EFB, and then we expose the EFB directly. The manual says that framebuffer access has to go through the GX_PeekARGB and GX_PokeARGB functions, but the libogc implementation of these functions just performs plain memory reads and writes without any additional function calls, and indeed I verified that the EFB pixels can easily be accessed with the usual pointer arithmetic (one just has to set the stride to 1024 bytes).
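To make the difference between the two options concrete, here is a host-side C sketch; it does not use libogc and does not run on the console. The EFB is modeled as a plain array with a padded row pitch, accelerated blits sit in a queue until draw_done() (standing in for GX_DrawDone) flushes them, option 1 copies the EFB rows into a packed texture, and option 2 hands out the EFB pointer and pitch. All names and sizes (efb, EFB_PITCH, gpu_fill_pixel, lock_efb_surface, and so on) are hypothetical, and 32-bit ARGB pixels are assumed, as with GX_PeekARGB/GX_PokeARGB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes: a SCREEN_W x SCREEN_H visible area inside a
   framebuffer whose rows are EFB_PITCH bytes apart (wider than SCREEN_W * 4). */
#define SCREEN_W 4
#define SCREEN_H 3
#define EFB_PITCH 32   /* bytes per row: room for 8 pixels of 4 bytes */
#define FIFO_CAP 16

static uint32_t efb[(EFB_PITCH / 4) * SCREEN_H];  /* models the EFB */

/* Models a blit that has been queued for the GPU but not executed yet. */
typedef struct { int x, y; uint32_t argb; } PendingFill;
static PendingFill fifo[FIFO_CAP];
static int fifo_len = 0;

/* Address of pixel (x, y): plain pointer arithmetic with a byte pitch. */
static uint32_t *pixel_at(uint8_t *base, size_t pitch, int x, int y) {
    assert(x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H);
    return (uint32_t *)(base + (size_t)y * pitch + (size_t)x * 4);
}

/* Queue an "accelerated" fill: it does not touch the EFB yet. */
static void gpu_fill_pixel(int x, int y, uint32_t argb) {
    assert(fifo_len < FIFO_CAP);
    fifo[fifo_len++] = (PendingFill){ x, y, argb };
}

/* Models GX_DrawDone(): return only once every queued command is in the EFB. */
static void draw_done(void) {
    for (int i = 0; i < fifo_len; i++)
        *pixel_at((uint8_t *)efb, EFB_PITCH, fifo[i].x, fifo[i].y) = fifo[i].argb;
    fifo_len = 0;
}

/* Option 1: sync, then copy the EFB into a tightly packed texture. */
static void copy_efb_to_texture(uint32_t *tex) {
    draw_done();
    for (int y = 0; y < SCREEN_H; y++)
        memcpy(tex + (size_t)y * SCREEN_W,
               (const uint8_t *)efb + (size_t)y * EFB_PITCH,
               SCREEN_W * 4);
}

/* Option 2: sync, then hand out the EFB itself as the surface pixels. */
typedef struct { void *pixels; int pitch; } LockedSurface;
static LockedSurface lock_efb_surface(void) {
    draw_done();
    return (LockedSurface){ efb, EFB_PITCH };
}
```

In both paths the essential step is the sync before the CPU looks at the pixels; option 1 additionally performs a full-frame copy each time it runs. On real hardware option 1 would go through GX_SetTexCopySrc, GX_SetTexCopyDst and GX_CopyTex, and GX_CopyTex writes textures in GX's tiled layout rather than the linear rows modeled here.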
While option 1 is more generic and is indeed the only solution that would work in the presence of real 3D scenes, it seems to me that option 2 does the job well for 2D operations and should be faster (less copying). So far I haven't found any issue with it in my tests, but it's also possible that I'm missing something.
Are there any GX experts here who could advise me on which route to take?