One of the more controversial aspects of AMD’s GPU design in recent years has been the focus on math processing over texture address and filtering capabilities. Many enthusiasts, for example, felt that a part like the Radeon 2900 XT simply lacked the texturing rates to match its shading performance, and as such was an imbalanced part whose potential performance was often bottlenecked by the small number of texture units available. This design imbalance also carried over to the RV670. AMD’s engineers evidently came to the same conclusion, because as noted above the RV770 now sports 4 texture units per SIMD array, giving the chip a total of 40 texture units. That is two and a half times the 16 units found in the R600 and RV670 chips, and AMD also claims that the units themselves have been redesigned for improved performance.
Another sore point with previous AMD DX10 GPUs has been the performance hit incurred when using anti-aliasing, and the company has also worked to address this weakness. The number of ROPs, 16, remains the same as in the R600 and RV670, but the depth and stencil operations per ROP per clock cycle have been doubled, which effectively doubles the theoretical fill rate of the new chip with 2x or 4x AA enabled (16 pixels per clock compared to 8 for the RV670). Stepping up to 8x AA drops the pixel output to 8 per clock, though this is again double the fill rate of the R600 or RV670 (4 pixels per clock at 8x AA for the older chips). These improvements to the ROPs should go a long way toward evening out the anti-aliasing performance gap between competing AMD and NVIDIA GPUs.
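To put the per-clock figures above into absolute terms, the short Python sketch below multiplies them by the core clocks of the two Radeon boards tested in this article (775MHz for the RV670-based HD 3870, 625MHz for the RV770-based HD 4850, as listed in the specification table). The function and dictionary names are our own, and the results are theoretical peaks only, assuming the chips sustain the quoted pixels per clock.

```python
# Pixels written per clock with AA enabled, as quoted in the text above.
PIXELS_PER_CLOCK = {
    "RV670": {"2x/4x": 8, "8x": 4},
    "RV770": {"2x/4x": 16, "8x": 8},
}

# Core clocks in MHz of the boards tested in this article
# (HD 3870 for the RV670, HD 4850 for the RV770).
CORE_CLOCK_MHZ = {"RV670": 775, "RV770": 625}


def aa_fill_rate_gpixels(chip: str, aa_mode: str) -> float:
    """Theoretical peak AA fill rate in gigapixels per second."""
    return PIXELS_PER_CLOCK[chip][aa_mode] * CORE_CLOCK_MHZ[chip] / 1000.0


for mode in ("2x/4x", "8x"):
    old = aa_fill_rate_gpixels("RV670", mode)
    new = aa_fill_rate_gpixels("RV770", mode)
    print(f"{mode} AA: RV670 {old:.1f} Gpix/s, RV770 {new:.1f} Gpix/s, "
          f"ratio {new / old:.2f}x")
```

Because the HD 4850 runs at a lower core clock than the HD 3870, the doubled per-clock output works out to roughly a 1.6x advantage in theoretical AA fill rate between these two specific boards, rather than a full 2x.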
The Radeon HD 4800 series also continues AMD’s VPU efforts, accelerating video content using VC-1, H.264, and MPEG-2. The RV770 also includes DirectX 10.1 support, something even NVIDIA’s new GTX 280/260 GPUs cannot fully claim, though as we discussed recently, developer adoption of this minor update to SM4.0 remains to be seen. The Assassin’s Creed PC port, however, included DX10.1 support out of the box (it was removed in a subsequent patch) and saw fantastic anti-aliasing performance on the Radeon HD 3870, so we would certainly like to see a stronger uptake among game developers.
AMD has sent us a pair of Diamond’s Radeon HD 4850s, which we tested singly and in CrossFire mode against the RV670-based Radeon HD 3870. This allows us to analyze how the GPU design improvements detailed above play out, performance-wise, in the applications SimHQ currently uses in our graphics testing. We also decided to include an XFX XXX edition GeForce 8800 GTX, a factory-overclocked board, since it was the only GeForce 8/9 product available to us at the time of testing.
The core specifications of the tested boards are listed below.
| | ATI HD 3870 | ATI HD 4850 | NVIDIA 8800 GTX |
|---|---|---|---|
| Core speed | 775MHz | 625MHz | 630MHz |
| Shader Units | 320 | 800 | 128 |
| Shader Units Speed | 775MHz | 625MHz | 1.35GHz |
| Onboard Memory | 512MB | 512MB | 768MB |
| Memory Speed | 2.25GHz | 2.0GHz | 2.0GHz |
| Memory Interface | 256-bit | 256-bit | 384-bit |
| Memory Bandwidth | 72GB/sec | 64GB/sec | 96GB/sec |
| Texture Units | 16 | 40 | 32 |
| ROPs (render back end) | 16 | 16 | 24 |
| Price | ~$165 | ~$200 | ~$370 |

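The memory bandwidth figures in the table follow directly from the memory speed and interface width. The sketch below reproduces them, assuming the listed memory speeds are effective data rates (one transfer per pin per listed clock); the function name and the `BOARDS` dictionary are ours, used for illustration only.

```python
def memory_bandwidth_gb_s(effective_clock_ghz: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/sec.

    Each pin of the bus moves one bit per effective clock, so the total is
    clock (GHz) * bus width (bits) / 8 bits per byte.
    """
    return effective_clock_ghz * bus_width_bits / 8


# (memory speed in GHz, memory interface width in bits) from the table above.
BOARDS = {
    "ATI HD 3870": (2.25, 256),
    "ATI HD 4850": (2.0, 256),
    "NVIDIA 8800 GTX": (2.0, 384),
}

for name, (clock_ghz, width_bits) in BOARDS.items():
    print(f"{name}: {memory_bandwidth_gb_s(clock_ghz, width_bits):.0f}GB/sec")
```

The results match the table: the GeForce board's wider 384-bit interface gives it the highest bandwidth of the three despite sharing the HD 4850's 2.0GHz memory speed.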
The Radeon HD 4850 is a single-slot board like the previous generation’s 3850 and also requires only a single 6-pin power connector. The Radeon HD 3870 and GeForce 8800 GTX are both dual-slot boards, and unlike the other two, the 8800 requires two 6-pin power plugs. The Radeon HD 4850 itself is 9.25” long, a 0.25” increase over the board length of the 3850 and 3870, and the GeForce board is 10.5” long. While not included in this article’s testing, the HD 4870 is clocked at 750 MHz and will ship with 512MB of 3.6 GHz GDDR5 memory, but is otherwise identical to the 4850 in terms of its overall design and its number of functional units.
Worth noting is that a direct comparison between the number of stream processors in the competing architectures shown above isn’t the best approach to understanding the shader performance each design offers. While even the Radeon HD 3870 boasts a far greater shader unit count than the 8800 GTX’s 128 units, these units are not identical in the operations they can perform per clock cycle, and as shown above NVIDIA has implemented a separate clock domain for its shader units that allows them to run at a much higher frequency than the rest of the chip. This design has also been carried over to NVIDIA’s new GTX 280/260 GPUs. We would also like to stress that the GeForce 8800 GTX is included only for comparison purposes against the Radeon HD 4850s in CrossFire mode. We’re certainly not suggesting that our readers compare a graphics board like an overclocked GTX, which still sells for over $350 at the time of writing, to a $200 part; yet since it’s the only NVIDIA board available to us at the time of testing, we thought it would provide an interesting data point against $400 worth of HD 4850s in CrossFire mode.
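As a rough illustration of why raw unit counts mislead, the sketch below weights each board's shader unit count by its shader clock, using only the figures from the specification table. The names are ours, and the metric is deliberately crude: it accounts for NVIDIA's separate shader clock domain but not for the differences in what each unit can do per clock, so it should not be read as a performance prediction.

```python
# (shader units, shader clock in MHz) from the specification table.
BOARDS = {
    "ATI HD 3870": (320, 775),
    "ATI HD 4850": (800, 625),
    "NVIDIA 8800 GTX": (128, 1350),
}


def unit_cycles_per_second(units: int, clock_mhz: int) -> int:
    """Shader unit count multiplied by shader clock, in unit-cycles per second."""
    return units * clock_mhz * 1_000_000


baseline_units, baseline_clock = BOARDS["NVIDIA 8800 GTX"]
baseline = unit_cycles_per_second(baseline_units, baseline_clock)

for name, (units, clock_mhz) in BOARDS.items():
    count_ratio = units / baseline_units
    weighted_ratio = unit_cycles_per_second(units, clock_mhz) / baseline
    print(f"{name}: {count_ratio:.2f}x the unit count, "
          f"{weighted_ratio:.2f}x when weighted by shader clock")
```

By unit count alone the HD 4850 appears to have 6.25 times the shader resources of the 8800 GTX; weighting by shader clock shrinks that to under 3 times, and the remaining gap still says nothing about per-unit capability.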