My project involves interfacing an AMD Radeon HD 2400 graphics card with an STM32 Discovery evaluation board. Using AMD's publicly available documentation together with MMIO traces of the proprietary Catalyst driver, I have managed to upload and execute code on the GPU's internal hardware video decoder (UVD), which features an Xtensa 32-bit CPU.
I have written a demo application running on the Xtensa CPU which uses the hardware 3D engine to draw a Z-buffered, Gouraud-shaded spinning cube on the screen. AMD's Hierarchical-Z feature is enabled to improve performance. The 3D engine is configured using code taken from AMD's open-source Linux driver.
Photos of the project have been uploaded to Flickr.
Many PCI Express devices feature a debug interface that is used during ASIC development to allow hardware registers to be modified should the PCIe link fail for some reason (e.g. due to incompatibility with the system's northbridge). The HD2400's debug interface uses an I2C bus, which can be overclocked to run at 1.5 MHz when required, so it can easily be interfaced to practically any microcontroller for hacking.
The hardware side involves modifying the HD2400 to enable and bring out the I2C debug interface, and supplying the PCIe clock signal that the board requires to operate correctly.
The firmware I have written comes in two parts. The STM32 side is responsible for setting up the GPU using the BIOS and uploading the Xtensa firmware to the GPU's video RAM, which the Xtensa CPU then executes. GCC is used together with the "newlib" and "libopencm3" libraries to build the firmware for the ARM STM32 architecture.
The Xtensa side runs on the UVD hardware and again was built using GCC, "newlib" and the open-source FreeRTOS real-time operating system to provide preemptive multitasking.
I have written code to configure the various internal hardware blocks inside the GPU to enable output to an 8.9-inch 1024x600 netbook LCD directly, using "LVDS"/FPDLink signalling from the DVI connector on the card. Normally it is not possible to interface a laptop LCD to a DVI port directly, as the data encoding is different. However, as these GPUs are also used in laptop computers, they may be switched into LVDS mode, allowing the display to be attached directly.
Two graphics 'planes' are supported: the background, which is the 3D rendered output, and the alpha-blended overlay, which provides a text console using the VGA 16x8 font. The two planes occupy separate addresses in memory and are blended together on-the-fly by the video overlay block as the screen is refreshed.
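To illustrate what the overlay block does in hardware, here is a minimal software model of per-pixel alpha blending in C. It is only a sketch, not the project code: the ARGB8888/XRGB8888 pixel layouts and the names `blend_channel`, `blend_pixel` and `blend_scanline` are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blend one 8-bit channel: (fg*alpha + bg*(255-alpha)) / 255, rounded to nearest. */
static uint8_t blend_channel(uint8_t fg, uint8_t bg, uint8_t alpha)
{
    uint32_t v = (uint32_t)fg * alpha + (uint32_t)bg * (255u - alpha);
    return (uint8_t)((v + 127u) / 255u);
}

/* Composite one overlay pixel (assumed ARGB8888) over one background pixel
 * (assumed XRGB8888). The result is returned as an opaque ARGB8888 pixel. */
uint32_t blend_pixel(uint32_t overlay, uint32_t background)
{
    uint8_t a = (uint8_t)(overlay >> 24);
    uint8_t r = blend_channel((uint8_t)(overlay >> 16), (uint8_t)(background >> 16), a);
    uint8_t g = blend_channel((uint8_t)(overlay >> 8), (uint8_t)(background >> 8), a);
    uint8_t b = blend_channel((uint8_t)overlay, (uint8_t)background, a);
    return 0xFF000000u | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}

/* Composite one scanline of the console overlay over the 3D background. */
void blend_scanline(const uint32_t *overlay, const uint32_t *background,
                    uint32_t *out, size_t width)
{
    assert(overlay != NULL && background != NULL && out != NULL);
    for (size_t x = 0; x < width; x++)
        out[x] = blend_pixel(overlay[x], background[x]);
}
```

With an overlay alpha of 0 the background shows through unchanged, and with an alpha of 255 the console pixel fully replaces it; values in between give a translucent console over the spinning cube.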
The GPU's highly complex 3D hardware features several distinct blocks which operate together to render the 3D scene into video RAM. These are configured using open-source code provided by AMD, including that taken from the r600_demo sample code.
A vertex buffer located in video RAM holds the vertices of the cube to be rendered in 3D space. These are transformed using a 4x4 model-view-projection matrix, implemented as a vertex shader that uses the DOT4 instruction to carry out the matrix multiplication. The matrix controls translation, rotation, scaling and projection into 2D viewport coordinates, and is calculated once per frame in the demo application.
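The matrix multiplication amounts to four 4-component dot products per vertex, one for each row of the matrix. The following C sketch shows the same arithmetic on the CPU; it is not the shader itself, and the `vec4`/`mat4` types, the row-major layout and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } vec4;
typedef struct { vec4 row[4]; } mat4;   /* assumed row-major storage */

/* The equivalent of one DOT4 operation: a 4-component dot product. */
static float dot4(vec4 a, vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Transform one vertex by the model-view-projection matrix.
 * Each output component is the dot product of one matrix row with the vertex. */
vec4 transform_vertex(const mat4 *mvp, vec4 v)
{
    assert(mvp != NULL);
    vec4 out;
    out.x = dot4(mvp->row[0], v);
    out.y = dot4(mvp->row[1], v);
    out.z = dot4(mvp->row[2], v);
    out.w = dot4(mvp->row[3], v);
    return out;
}
```

For example, a matrix whose last column holds (2, 3, 4, 1) on top of an identity moves the point (1, 1, 1, 1) to (3, 4, 5, 1), which is a pure translation.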
Triangles are then rendered with the assistance of the pixel shader which (in this case) simply passes data through unmodified. A Z-buffer check is performed to ensure that only visible surfaces are rendered. AMD's hardware supports Hierarchical-Z, which reduces the memory bandwidth required when performing Z-buffer tests.
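The general idea behind a hierarchical Z test can be modelled in software: keep a coarse per-tile summary of the depth buffer and reject fragments against it before touching per-pixel depth. The C sketch below is a simplified model only and does not describe how AMD's hardware is implemented. The tile size, the "smaller is nearer" depth convention, and all names (`depth_buffer`, `depth_clear`, `depth_test_and_write`, `fine_reads`) are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define WIDTH   16
#define HEIGHT  16
#define TILE    8
#define TILES_X (WIDTH / TILE)
#define TILES_Y (HEIGHT / TILE)

typedef struct {
    float z[HEIGHT][WIDTH];            /* per-pixel depth, smaller = nearer */
    float tile_max[TILES_Y][TILES_X];  /* farthest depth stored in each tile */
    unsigned fine_reads;               /* number of per-pixel depth reads */
} depth_buffer;

void depth_clear(depth_buffer *db, float far_z)
{
    for (int y = 0; y < HEIGHT; y++)
        for (int x = 0; x < WIDTH; x++)
            db->z[y][x] = far_z;
    for (int ty = 0; ty < TILES_Y; ty++)
        for (int tx = 0; tx < TILES_X; tx++)
            db->tile_max[ty][tx] = far_z;
    db->fine_reads = 0;
}

/* Returns true if the fragment is visible and was written to the buffer. */
bool depth_test_and_write(depth_buffer *db, int x, int y, float frag_z)
{
    assert(x >= 0 && x < WIDTH && y >= 0 && y < HEIGHT);
    int tx = x / TILE, ty = y / TILE;

    /* Coarse test: no pixel in this tile is farther than tile_max, so a
     * fragment at or beyond it cannot pass the per-pixel test. */
    if (frag_z >= db->tile_max[ty][tx])
        return false;

    /* Fine test: read the per-pixel depth value. */
    db->fine_reads++;
    if (frag_z >= db->z[y][x])
        return false;
    db->z[y][x] = frag_z;

    /* Refresh the tile summary by rescanning the tile. This keeps the model
     * simple; it is not how real hardware maintains the summary. */
    float m = db->z[ty * TILE][tx * TILE];
    for (int j = ty * TILE; j < (ty + 1) * TILE; j++)
        for (int i = tx * TILE; i < (tx + 1) * TILE; i++)
            if (db->z[j][i] > m)
                m = db->z[j][i];
    db->tile_max[ty][tx] = m;
    return true;
}
```

Once a tile has been covered by a near surface, any fragment behind it is rejected by the coarse test alone and `fine_reads` does not increase, which is where the bandwidth saving comes from in this model.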
Double buffering is used to eliminate flicker when drawing; one buffer is used for drawing and another is used for display and the two swap places after each frame has been rendered. At most one frame can be rendered per vertical sync event; this is to prevent 'tearing' of the display.
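The buffer swap logic can be sketched as a small state machine in C. This is an illustrative model rather than the project code: the `swap_chain` structure, the function names and the example VRAM addresses are all hypothetical, and programming the display controller with the returned address is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t base[2];    /* addresses of the two framebuffers (hypothetical) */
    int front;           /* index of the buffer currently being displayed */
    bool frame_ready;    /* back buffer is complete and waiting for a flip */
} swap_chain;

/* Address the renderer should draw into: always the buffer not on screen.
 * Only valid while no completed frame is waiting for a flip. */
uint32_t draw_target(const swap_chain *sc)
{
    assert(sc != NULL);
    return sc->base[sc->front ^ 1];
}

/* Called by the renderer when the back buffer is complete. Returns false if
 * an earlier frame is still waiting for its flip, in which case the renderer
 * must wait. This is what limits rendering to one frame per vertical sync. */
bool submit_frame(swap_chain *sc)
{
    assert(sc != NULL);
    if (sc->frame_ready)
        return false;
    sc->frame_ready = true;
    return true;
}

/* Called once per vertical sync. Swaps the buffers if a new frame is ready
 * and returns the address the display should scan out next. */
uint32_t on_vsync(swap_chain *sc)
{
    assert(sc != NULL);
    if (sc->frame_ready) {
        sc->front ^= 1;
        sc->frame_ready = false;
    }
    return sc->base[sc->front];
}
```

Because the swap only happens inside `on_vsync`, the displayed buffer never changes in the middle of a refresh, which is what avoids tearing. Under an RTOS the renderer would typically block on a semaphore signalled from the vertical sync interrupt rather than polling `submit_frame`.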
I will be releasing the source code within a week at most; work remaining involves fixing I2C bus routines in the STM32 code, full SD card support and cleanup of the hardware register database and associated scripts.
Support for AMD's newer 'CAICOS' (HD6350 IIRC) ASIC is also planned. However, I am particularly interested in AMD's next-generation (HD7xxx) architecture; the moment public documentation is available I will cease work on HD2xxx support.