Review: Radeon X1600 XT and Radeon X1800 XT
Shader effects are all the rage these
days, and there's little doubt that the bottleneck in graphically intensive
titles is increasingly the amount of math the graphics chip
must perform to render these effects. More sophisticated
effects require more shaders with longer instruction sequences, so
making efficient use of the available shader engines is
where ATI's new architecture hopes to shine. One way to approach
this problem would be to increase the number of ALUs per pixel
shader pipeline, yet ATI has obviously not gone this route
for these initial Radeon X1000 parts, choosing instead to keep
the same number of units as in its previous generation.
Keeping a GPU's shader units fed, rather than idle and wasting
clock cycles, is paramount to achieving the kind of efficiency
ATI hopes for with the Radeon X1000 chips, which explains the
prominence of the Ultra-Threading Dispatch Processor in the
above diagram.
Pixel Shader Engine

Each pixel shader pipeline in the
Radeon X1000 boards comprises two ALUs and a branch execution
unit; these pipelines are arranged in groups of four, known
as quads (the de facto design choice for years, which is why
past and current GPUs come with 4, 8, 16, or 24 "pipelines").
The X1800 XT and X800 XT boast four quads apiece, and the
X1600 XT has three. Note, however, that these ALUs are
not co-equal: each pipeline has a "full" and a "mini" unit,
and the latter cannot execute every instruction the full ALU
can. It is also worth noting that the internal precision of
the new architecture has been increased from the FP24 (24-bit
floating point) of previous generations to FP32 (32-bit
floating point), again with no reduced- or partial-precision
modes.
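The quad arithmetic and the precision jump can both be put in numbers. The sketch below is only illustrative: it derives pipeline counts from the quad counts given above, and it models FP24 versus FP32 purely by significand width, assuming the commonly described s16e7 layout for ATI's FP24 (16 fraction bits) against IEEE single precision's 23. Exponent range and special values are deliberately ignored, and the helper names are hypothetical.

```python
import math

# Pipeline counts implied by the quad arrangement described in the review.
PIXELS_PER_QUAD = 4
QUADS = {"X1800 XT": 4, "X800 XT": 4, "X1600 XT": 3}

def pixel_pipelines(board: str) -> int:
    """Pixel shader pipelines = quads x 4."""
    return QUADS[board] * PIXELS_PER_QUAD

# Explicit fraction (mantissa) bits. FP32 is IEEE single precision; the
# 16-bit figure for FP24 assumes the commonly cited s16e7 layout.
FP32_MANTISSA_BITS = 23
FP24_MANTISSA_BITS = 16

def round_mantissa(x: float, mantissa_bits: int) -> float:
    """Round x to the nearest value with `mantissa_bits` fraction bits.

    Only significand precision is modelled; exponent range, denormals and
    special values are ignored. Python's round() is round-half-to-even.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)                # x == m * 2**e, 0.5 <= |m| < 1
    scale = 2.0 ** (mantissa_bits + 1)  # +1 for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)

if __name__ == "__main__":
    for board in QUADS:
        print(board, pixel_pipelines(board), "pipelines")
    x = 1.0 / 3.0
    for name, bits in (("FP24", FP24_MANTISSA_BITS), ("FP32", FP32_MANTISSA_BITS)):
        err = abs(round_mantissa(x, bits) - x)
        print(f"{name}: 1/3 stored with absolute error {err:.2e}")
```

Each extra fraction bit halves the rounding step, so the seven additional bits of FP32 shrink the worst-case relative error by a factor of 128, which matters most in long shaders where rounding errors accumulate.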
Looking back at the architecture
diagram, we see that the Ultra-Threading Dispatch unit feeds
these quad pixel shader blocks. This processor can be thought
of as an integrated hardware scheduler, distributing shader
work evenly among the pipelines to keep the ALUs busy. The
dispatch unit also manages the shader workload in smaller
pieces known as threads, which are aided by a new branch
execution unit in each pipeline capable of executing one flow
control instruction per clock cycle. This design improves
efficiency with dynamic branching, an important aspect of any
forward-looking SM3.0 architecture. The dispatch processor
also feeds the texture address units, which are no longer part
of each pixel pipeline as in previous designs. Decoupling the
texture units from the pipelines lets the dispatch unit keep
texture fetches from stalling the pixel shader pipelines,
another design decision aimed at efficiency. ATI claims these
changes should let its pixel shader engine achieve 90%
efficiency regardless of the shader being processed.
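Why a dispatcher with decoupled texture units helps can be shown with a toy model. The sketch below is a hypothetical simplification, not ATI's actual scheduling logic: one ALU, a fixed texture latency, and every thread running the same short program of ALU and texture ops. Ready threads hand texture fetches to the texture unit without costing an ALU cycle, and the first ready thread (in a fixed, oldest-first order) with an ALU op gets the ALU that cycle. With only one thread, the ALU sits idle waiting on every fetch; with more threads in flight, other work fills those gaps.

```python
from collections import deque

def simulate(num_threads: int, program: list[str], tex_latency: int) -> tuple[int, int]:
    """Toy model: one ALU fed by a dispatcher with decoupled texture units.

    Every thread runs the same `program`, a list of "alu" or "tex" ops.
    Each cycle the dispatcher scans threads in a fixed priority order:
    a ready thread whose next op is "tex" hands it to the texture unit at
    no ALU cost and sleeps for `tex_latency` cycles; the first ready
    thread whose next op is "alu" gets the single ALU for that cycle.
    Returns (total_cycles, alu_busy_cycles).
    """
    assert program, "program must contain at least one op"
    pc = [0] * num_threads          # next op index per thread
    ready_at = [0] * num_threads    # cycle at which each thread may issue again
    queue = deque(range(num_threads))
    cycle = alu_busy = 0
    while queue:
        alu_taken = False
        for _ in range(len(queue)):
            t = queue.popleft()
            if ready_at[t] <= cycle:
                op = program[pc[t]]
                if op == "tex":
                    pc[t] += 1
                    ready_at[t] = cycle + tex_latency
                elif not alu_taken:
                    pc[t] += 1
                    alu_taken = True
                    alu_busy += 1
                # else: an ALU op is ready, but the ALU is busy this cycle
            if pc[t] < len(program):
                queue.append(t)     # unfinished threads stay in the pool
        cycle += 1
    return cycle, alu_busy

if __name__ == "__main__":
    prog = ["alu", "tex"] * 4 + ["alu"]   # 5 ALU ops, 4 texture fetches
    for n in (1, 8, 16):
        cycles, busy = simulate(n, prog, tex_latency=10)
        print(f"{n:2d} threads: ALU busy {busy}/{cycles} cycles ({busy / cycles:.0%})")
```

In this model a single thread keeps the ALU busy only 5 of 45 cycles, while eight threads raise that to 40 of 52. The latency and program mix are made-up parameters chosen only to make the effect visible.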
Memory Controller

Memory access has traditionally been
one of the bottlenecks in graphics processing, particularly
for bandwidth-hungry features such as anti-aliasing.
To improve performance in these areas and, again, overall
efficiency, ATI has engineered a new memory controller
for the Radeon X1000 family. The R300 and R400 chips had
256-bit controllers composed of four 64-bit channels;
the new controller instead consists of eight 32-bit channels
that feed a bidirectional ring bus, with the ring stops
arbitrating memory access requests. The X1600 XT, being a
smaller mainstream chip, has a memory controller half as wide
as the X1800s' (128-bit rather than 256-bit).
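The channel and bus-width figures translate directly into peak bandwidth, since each transfer moves bus_width / 8 bytes. The sketch below assumes the 128-bit X1600 XT controller is built from the same 32-bit channels, uses a hypothetical 1.5 GT/s effective data rate purely for illustration (not a quoted board spec), and uses a made-up 256-byte interleave to show how consecutive blocks of addresses could be spread across channels.

```python
CHANNEL_WIDTH_BITS = 32  # the new controller uses 32-bit channels

def num_channels(bus_width_bits: int) -> int:
    """Number of 32-bit channels making up a controller of this width."""
    return bus_width_bits // CHANNEL_WIDTH_BITS

def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gt_s: float) -> float:
    """Peak bandwidth = bytes per transfer x billions of transfers per second."""
    return bus_width_bits / 8 * data_rate_gt_s

def channel_for_address(addr: int, channels: int, interleave_bytes: int = 256) -> int:
    """Hypothetical interleave: consecutive `interleave_bytes` blocks rotate
    across channels so that streaming accesses keep every channel busy."""
    return (addr // interleave_bytes) % channels

if __name__ == "__main__":
    RATE = 1.5  # hypothetical effective data rate in GT/s, for illustration only
    for name, width in (("X1800-class, 256-bit", 256), ("X1600-class, 128-bit", 128)):
        print(f"{name}: {num_channels(width)} x 32-bit channels, "
              f"{peak_bandwidth_gb_s(width, RATE):.1f} GB/s at {RATE} GT/s")
```

At any given memory speed, halving the bus width halves peak bandwidth, which is why the X1600 XT is more exposed to bandwidth-heavy settings such as anti-aliasing. Narrower channels also let more independent requests be serviced at once than four wide ones would.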
This new controller circles the die
of the chip, theoretically helping to hide latency by virtue
of its topology; furthermore, because it sits along the outer
edges of the chip, where less heat is produced, ATI claims
the new controller can be clocked higher to accommodate
faster RAM as it becomes available. The new controller is
also programmable, allowing ATI to analyze memory access
patterns and optimize the controller's operation on a
per-application basis. This controller, along with the
Ultra-Threading Dispatch unit, probably accounts for most of
the R520's transistor increase over previous chips (321
million compared to the X800 XT's 160 million). Later in this
review we'll examine how well the X1800 XT performs at high
resolutions with anti-aliasing compared to the older X800 XT.
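One way to see what a bidirectional ring buys is to count hops between ring stops: data can travel whichever way round is shorter, so the worst case is half the ring rather than nearly all of it. The sketch below is a hypothetical model; the stop counts are illustrative assumptions, not a description of the actual R520 layout.

```python
def ring_hops(src: int, dst: int, num_stops: int, bidirectional: bool = True) -> int:
    """Hops from ring stop `src` to `dst`, taking the shorter way if bidirectional."""
    forward = (dst - src) % num_stops
    return min(forward, num_stops - forward) if bidirectional else forward

def average_hops(num_stops: int, bidirectional: bool = True) -> float:
    """Mean hop count over all (source, destination) pairs, including src == dst."""
    total = sum(ring_hops(s, d, num_stops, bidirectional)
                for s in range(num_stops) for d in range(num_stops))
    return total / num_stops ** 2

if __name__ == "__main__":
    for stops in (4, 8):  # hypothetical ring sizes
        print(f"{stops} stops: one-way avg {average_hops(stops, False):.2f} hops, "
              f"bidirectional avg {average_hops(stops):.2f} hops")
```

With four stops the average trip drops from 1.5 hops to 1.0, and with eight from 3.5 to 2.0. Fewer hops per request is one concrete sense in which the ring's topology can help with latency.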
A few other details of the new
architecture are also worth mentioning. Aside from VS3.0
support, the vertex shader engine remains relatively
unchanged from the previous generation, though the number
of vertex units has been increased from six to eight on the
high-end boards. This increase, along with the higher clock
speed, should allow faster vertex processing on the X1800 XT
than on previous generations. ATI also now uses
floating-point calculations for early Z and occlusion
detection, which the company claims improves hidden surface
removal by 60%. Removing such occluded pixels early in the
rasterization process is yet another step toward improving
efficiency. And, lest we forget, Radeon X1000 parts such as
the X1600 and X1800 can be paired with CrossFire master
boards for a dual-PEG configuration; in fact, SimHQ will be
reviewing a Radeon X1800 XT CrossFire solution in the near
future.
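The payoff of early Z is simply that occluded pixels never reach the pixel shader. The per-pixel sketch below is a simplification that does not model ATI's floating-point occlusion hardware or its 60% claim: it counts shader invocations for a list of (x, y, depth) fragments with and without a depth test ahead of shading. The helper `layer` is hypothetical, used only to build full-screen test surfaces.

```python
import math

def shaded_fragments(fragments, width: int, height: int, early_z: bool = True) -> int:
    """Count pixel-shader invocations for (x, y, depth) fragments; smaller depth = nearer.

    With early_z, a fragment that fails the depth test is discarded before
    shading. Without it, every fragment is shaded and the depth test runs
    afterwards. Either way the final depth buffer is the same.
    """
    depth = [[math.inf] * width for _ in range(height)]
    shaded = 0
    for x, y, z in fragments:
        if early_z and z >= depth[y][x]:
            continue            # occluded: rejected before the pixel shader runs
        shaded += 1             # pixel shader executes for this fragment
        if z < depth[y][x]:
            depth[y][x] = z     # depth test passes: update the Z buffer
    return shaded

def layer(depth_value: float, width: int, height: int):
    """Hypothetical helper: one full-screen surface at a constant depth."""
    return [(x, y, depth_value) for y in range(height) for x in range(width)]

if __name__ == "__main__":
    near, far = layer(0.2, 640, 480), layer(0.8, 640, 480)
    print("front-to-back, early Z:", shaded_fragments(near + far, 640, 480))
    print("front-to-back, no early Z:", shaded_fragments(near + far, 640, 480, early_z=False))
    print("back-to-front, early Z:", shaded_fragments(far + near, 640, 480))
```

Early Z only saves work when nearer surfaces arrive first; drawn back to front, every layer still gets shaded, which is why rejecting occluded pixels as early and as accurately as possible matters.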
Copyright 2008, SimHQ.com. All Rights Reserved. Contact the webmaster.