ATI Radeon X1900 XTX

by John Reynolds

 

Introduction

The graphics market has traditionally seen product refreshes occur at a fairly steady pace. A new architecture, upon which a generation of parts will be based, is introduced, generally for the spring or fall OEM cycle, with a new part derived from the existing architecture hitting the market just in time for the following refresh. This schedule, however, is an ideal that can’t always be met, and in ATI’s case it was completely blown out of the water last year by the oft-mentioned delay of the company’s first Shader Model 3.0 graphics chip, the R520. Due last summer, the R520 and its Radeon 1000 family of parts weren’t released until very late in the year, which forced ATI to considerably shorten the time the Radeon 1800s would spend as the company’s high-end offerings. This decision was made because a normal product lifespan for the Radeon 1800s would’ve pushed the subsequent refresh uncomfortably close to the expected release of Windows Vista. And with its inclusion of DX10, Vista represents a significant inflection point for the graphics industry that no company would willingly choose to miss. Thus, only a few months after releasing its Radeon 1000 family, ATI introduced its refresh chip, the R580, and its board lineup, the Radeon 1900s.

Radeon X1900s – Tripping the ALUs Fantastic

SimHQ’s review of the Radeon X1800 XT offered a somewhat in-depth look at the ultra-threaded design of the new architecture. We won’t repeat those details here, choosing instead to focus on the changes and improvements the new Radeon 1900s bring. But first, a brief glimpse of ATI’s new board lineup is in order. The 1900s come in three flavors of reference specification, all based on the R580 chip. The Radeon X1800 XT is also listed below to contrast the clock speed and architecture changes in the new refresh parts.

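|                  | X1800 XT    | X1900 AIW   | X1900 XT    | X1900 XTX   |
|------------------|-------------|-------------|-------------|-------------|
| Core speed       | 625 MHz     | 500 MHz     | 625 MHz     | 650 MHz     |
| Memory speed     | 750 MHz     | 480 MHz     | 725 MHz     | 775 MHz     |
| Onboard RAM      | 512 MB      | 256 MB      | 512 MB      | 512 MB      |
| Memory bandwidth | 48 GB/sec   | 30.7 GB/sec | 46.4 GB/sec | 49.6 GB/sec |
| Pixel shaders    | 16          | 48          | 48          | 48          |
| Vertex shaders   | 8           | 8           | 8           | 8           |
| Texture units    | 16          | 16          | 16          | 16          |
| ROP units        | 16          | 16          | 16          | 16          |
| Transistors      | 321 million | 384 million | 384 million | 384 million |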

Looking at the above table, we see a strong resemblance to the X1800, with both chips including eight vertex cores and 16 texture address and ROP units. The 1900s also retain the major architectural design features of the Radeon 1000 family, with the dispatch processor, ring bus memory controller, and decoupled texture address units all present. The primary architectural difference between the 1800 and 1900s, however, is in the number of pixel shader cores, a change that will be discussed at greater length below. Worth noting is that the 1900 XT has an identical core speed of 625 MHz to that of the 1800 XT, yet ships with a slight reduction in memory speed and bandwidth. The 1900 XT and XTX boards themselves are differentiated by minor clock speed variances of the graphics chip and onboard memory, with the XTX replacing the PE (platinum edition) of previous generations. This SKU name had become somewhat tarnished with an interpretation of Phantom Edition due to poor availability of these parts in the past, which may have prompted ATI to rebrand it with a new designation. The new XTX also initially hit the market priced substantially higher than the XT board. And while it is not listed above, there is also a Radeon 1900 Crossfire ‘master’ board that ships with core and memory speeds identical to those of the regular 1900 XT.
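For readers who want to check the memory bandwidth row against the memory clocks, the figures can be reproduced with a few lines of arithmetic. This is only a sketch: the 256-bit bus width and the two transfers per clock are assumptions about these boards' memory interface that the table itself does not state, and the function name is ours.

```python
# Assumptions not stated in the table: a 256-bit memory bus and
# double-data-rate memory (two transfers per memory clock).
BUS_WIDTH_BITS = 256
TRANSFERS_PER_CLOCK = 2

# Memory clocks copied from the table above.
MEMORY_CLOCK_MHZ = {
    "X1800 XT": 750,
    "X1900 AIW": 480,
    "X1900 XT": 725,
    "X1900 XTX": 775,
}


def memory_bandwidth_gb_per_sec(clock_mhz,
                                bus_width_bits=BUS_WIDTH_BITS,
                                transfers_per_clock=TRANSFERS_PER_CLOCK):
    """Theoretical peak bandwidth: clock x transfers per clock x bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


for board, clock in MEMORY_CLOCK_MHZ.items():
    print(f"{board}: {memory_bandwidth_gb_per_sec(clock):.1f} GB/sec")
```

Under those assumptions the results line up with the table, including the 1900 XT's slightly lower bandwidth relative to the 1800 XT.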

ATI’s decision to triple the number of pixel shader cores can be read as the company’s expectation of what it will take to render future game engines competitively. The pixel shader pipelines or cores of the 1900s, however, remain identical to those of the rest of the Radeon 1000 family, each consisting of two ALUs (arithmetic logic units) and a branch execution unit. Each ALU is itself composed of a vector unit and a scalar unit, though these units are not identical in their capabilities between the two ALUs, a difference in functionality that has resulted in one ALU being generally regarded as a ‘mini’ unit. The branch execution unit handles flow control instructions, improving branching performance in complex shaders with longer instruction counts.

So why would ATI decide to triple the number of shader pipelines with the R580, a change that has no impact on the chip’s overall raster capabilities and yet increased its transistor count by almost 20%? The means by which the PC hardware market has traditionally measured the theoretical performance of graphics chips (fill rate and texturing rate, both derived by multiplying the number of pixel pipelines or texture units by clock speed) is becoming increasingly antiquated. Modern graphics boards have become quite capable of ‘filling’ a display device’s resolution with texture-filtered pixels, so graphics performance measurement is shifting toward analyzing the arithmetic processing capabilities of a part. Pixel shaders were introduced to the market with the release of DX8 hardware years ago, enabling developers to code mathematical operations that began replacing traditional texturing in the work required to output finished, rendered pixels. And as subsequent API and hardware generations have been released, these operations have increased in their capabilities and flexibility. Modern, more graphically advanced titles are said to have a math operation to texture instruction ratio of 5:1 in an average scene. And this ratio is widely expected to grow as developers continue increasing the length and complexity of shader instructions with the arrival of more powerful hardware.
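To put numbers on that argument, the traditional metrics can be worked out from the specification table earlier in this article. The sketch below uses only figures from that table; the function names are ours, and treating fill rate as ROP units times core clock is the usual back-of-the-envelope simplification rather than a measured result.

```python
# Figures copied from the specification table in this review.
BOARDS = {
    "X1800 XT": {"core_mhz": 625, "pixel_shaders": 16,
                 "texture_units": 16, "rops": 16, "transistors_m": 321},
    "X1900 XTX": {"core_mhz": 650, "pixel_shaders": 48,
                  "texture_units": 16, "rops": 16, "transistors_m": 384},
}


def pixel_fill_rate(board):
    """Theoretical pixel fill rate in Mpixels/sec: ROP units x core clock."""
    return board["rops"] * board["core_mhz"]


def texel_rate(board):
    """Theoretical texturing rate in Mtexels/sec: texture units x core clock."""
    return board["texture_units"] * board["core_mhz"]


def shader_to_texture_ratio(board):
    """Pixel shader cores available per texture unit."""
    return board["pixel_shaders"] / board["texture_units"]


def percent_increase(old_value, new_value):
    return (new_value - old_value) / old_value * 100


old, new = BOARDS["X1800 XT"], BOARDS["X1900 XTX"]
for name, board in BOARDS.items():
    print(f"{name}: {pixel_fill_rate(board)} Mpixels/sec, "
          f"{texel_rate(board)} Mtexels/sec, "
          f"{shader_to_texture_ratio(board):.0f}:1 shader cores per texture unit")
print(f"Fill rate change: {percent_increase(pixel_fill_rate(old), pixel_fill_rate(new)):.1f}%")
print(f"Transistor change: {percent_increase(old['transistors_m'], new['transistors_m']):.1f}%")
```

By the old yardsticks the two chips differ only by their clock speeds, while the ratio of shader cores to texture units moves from 1:1 to 3:1, which is where the extra transistors went.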

The ATI X1900 Structure

While not a major architectural change from the 1800s, another area of improvement in the 1900s is that the texture address units now support a feature known as Fetch4. This functionality was present in the X1300 and X1600s, though lacking in the 1800s. Texture address units are generally designed to read four components of color data (red, green, blue, and alpha) at one location within a texture. Yet if the texture at hand has only one component, the unit wastes a large portion of its theoretical performance by reading only a single component of data in that clock cycle. Fetch4 allows the improved texture address units to avoid this situation by reading a single-channel piece of data from four texture locations at a time. Areas of a rendered scene that do not rely on color data, such as shadows, could therefore see a marked performance increase. Fetch4 does require developer support, though its implementation is purportedly rather simple.
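The saving described above can be illustrated with a toy model. The sketch below is not ATI's hardware behavior or any real graphics API: the `SingleChannelTexture` class, its `fetch` and `fetch4` methods, and the simple 2x2 shadow test are all hypothetical names invented here to count how many fetches a single-channel lookup needs with and without a Fetch4-style read.

```python
class SingleChannelTexture:
    """Toy single-channel texture (for example, a shadow depth map)."""

    def __init__(self, texels):
        self.texels = texels      # 2D list of depth values
        self.fetch_count = 0      # number of fetch operations issued

    def _texel(self, x, y):
        # Clamp coordinates to the texture edges.
        height, width = len(self.texels), len(self.texels[0])
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        return self.texels[y][x]

    def fetch(self, x, y):
        """Conventional fetch: one location per operation, so three of the
        four component slots carry nothing useful for this texture."""
        self.fetch_count += 1
        return self._texel(x, y)

    def fetch4(self, x, y):
        """Fetch4-style read: four neighbouring single-channel texels
        returned together by one operation."""
        self.fetch_count += 1
        return (self._texel(x, y), self._texel(x + 1, y),
                self._texel(x, y + 1), self._texel(x + 1, y + 1))


def shadow_fraction_conventional(texture, x, y, depth):
    """Fraction of a 2x2 neighbourhood that shadows a point at `depth`."""
    samples = [texture.fetch(x + dx, y + dy) for dy in (0, 1) for dx in (0, 1)]
    return sum(depth > sample for sample in samples) / 4


def shadow_fraction_fetch4(texture, x, y, depth):
    """Same test, with all four samples gathered by a single operation."""
    samples = texture.fetch4(x, y)
    return sum(depth > sample for sample in samples) / 4


shadow_map = [[0.3, 0.9], [0.4, 0.8]]
conventional = SingleChannelTexture(shadow_map)
gathered = SingleChannelTexture(shadow_map)
print(shadow_fraction_conventional(conventional, 0, 0, 0.5), conventional.fetch_count)
print(shadow_fraction_fetch4(gathered, 0, 0, 0.5), gathered.fetch_count)
```

Both paths produce the same shadow result; the difference in this model is that the conventional path issues four fetches where the Fetch4-style path issues one.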

One last architectural change ATI made to R580 that deserves a quick mention is the size of the hierarchical Z-buffer cache. The hierarchical Z feature has existed in modern GPUs for years and allows occluded pixels to be detected and removed very early in the raster pipeline, thus saving the graphics chip the work of rendering hidden surfaces. To work efficiently, however, the feature requires an on-chip cache, and ATI has increased the 1900’s Z-buffer cache by 50% over that found in the 1800s. Since the space required within the cache grows with resolution, this increase was undoubtedly included to help performance at higher resolutions.
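The general idea behind hierarchical Z, and why its storage needs grow with resolution, can be sketched in a few lines. This is a simplified illustration of the technique under stated assumptions, not ATI's implementation: the 8x8 tile size, the single-level layout, the 'smaller depth is closer' convention, and all function names are hypothetical.

```python
import math

TILE_SIZE = 8  # hypothetical tile size; the real hardware layout is not described here


def hiz_entry_count(width, height, tile_size=TILE_SIZE):
    """Number of coarse depth entries needed to cover a render target."""
    return math.ceil(width / tile_size) * math.ceil(height / tile_size)


def build_hiz(depth_buffer, tile_size=TILE_SIZE):
    """Store the farthest depth of each tile (smaller depth = closer)."""
    height, width = len(depth_buffer), len(depth_buffer[0])
    hiz = []
    for tile_y in range(0, height, tile_size):
        row = []
        for tile_x in range(0, width, tile_size):
            farthest = max(
                depth_buffer[y][x]
                for y in range(tile_y, min(tile_y + tile_size, height))
                for x in range(tile_x, min(tile_x + tile_size, width))
            )
            row.append(farthest)
        hiz.append(row)
    return hiz


def tile_is_occluded(hiz, tile_x, tile_y, nearest_incoming_depth):
    """If even the nearest incoming pixel lies at or behind the farthest
    stored depth, every incoming pixel in the tile fails a 'less than'
    depth test and the whole tile can be skipped."""
    return nearest_incoming_depth >= hiz[tile_y][tile_x]


# Left tile already holds a near wall (0.2); right tile is empty (1.0).
depth_buffer = [[0.2] * 8 + [1.0] * 8 for _ in range(8)]
hiz = build_hiz(depth_buffer)
print(tile_is_occluded(hiz, 0, 0, 0.5), tile_is_occluded(hiz, 1, 0, 0.5))

for width, height in [(1280, 1024), (1600, 1200), (2048, 1536)]:
    print(f"{width}x{height}: {hiz_entry_count(width, height)} entries")
```

In this model the number of coarse entries scales directly with the pixel count, which is consistent with the point above that a larger cache mainly pays off at higher resolutions.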

The Radeon X1900 XTX had an MSRP of $650 at launch, $100 more than the XT, though as of late April its real-world price has trickled down much closer to that of the XT. Worth noting for the overclocking crowd is that despite the relatively minor clock speed differences between the XT and XTX boards, the latter ships with Samsung’s 1.1ns memory modules rather than the 1.2ns memory found on the XT boards; whether or not this justifies the added cost of the XTX is up to the individual. The X1800 XT can now be found in the mid-$300 range, so we’ll see whether or not the X1900 XTX merits the 50% cost increase.
