3DNews Vendor Reference English Resource -
All you need to know about your products!
Digital-Daily.com

ATI Radeon X800 XT (R420): Extreme force

Author: Andrey Kuzin
Date: 09/05/2004



Introduction

ATI Radeon X800 XT
GPU chip ATI R420
Memory 256 MB; GDDR3 1.6 ns
Core/Mem: 520/1120 MHz
Category: Hi-End
Price: $500

Do you remember how many points the fastest video card scored on the day 3DMark2003 was released? Neither do I. What I do remember is that it was not much at all. In fact, the score was so low that gamers all over the world were ready to anathematize the benchmark. At the very least, it failed to become a competitive testing discipline. Now it will: both R420 and NV40 cleared the important 10000-point bar at one stroke.

Having run all the tests and got round to writing this review, I was well aware that, on clicking the intriguing link "ATI Radeon X800 XT (R420): secret weapon", the reader would have only one goal: to find out "who on earth is the winner this time?" And you are certainly unwilling to browse through ten pages and 50 KB of text to get there. So let's oblige the most impatient and, breaking the rules of the tragedy genre, lay our cards on the table in the second paragraph:


Is it any easier now? :-) Perhaps, but not for everyone. It should especially frustrate those who recently bought a Radeon 9800XT or GeForce FX5950Ultra. As promised, today's newcomers do demonstrate a two-fold lead over yesterday's champions at the same $500 price.

ATI R420 Graphic Chip

R420, the first in ATI's new line of Hi-End chips, is manufactured on the 0.13-micron process at the lithographic production lines of TSMC (Taiwan Semiconductor Manufacturing Company). Before that, ATI had practiced extensively on the much cheaper RV360 while waiting for production costs to come down.


R420
ATI R420
Manufactured on the first week of April 2004.


ATI X800 review:
(published 4 May '04)
[H]ard OCP.com
iXBT
TechReport
Hardware Analysis
NeoSeeker
Driver Heaven
Ascully
Anandtech
HotHardware
PC Perspective
Bjorn3D
Lost Circuits
Beyond3D
Gamers-Depot

Physically, it is impossible to fit the 160 mln transistors that make up R420 on a 0.15-micron process without enlarging the chip area, so the transition to 0.13 micron was inevitable. For ATI, this transition was in many ways paid for by the sweat and blood of its competitor, NVIDIA, which had started experimenting with 0.13-micron chips by releasing NV30 at the same TSMC facilities as early as the summer of 2002, once the first Taiwanese production line for that process had been built. The new process was the seven circles of hell that NVIDIA had to go through; when its efforts were completely exhausted, the company moved production of its chips over to IBM facilities and started all over again (currently, it is moving everything back to TSMC).

For ATI, such an urgent transition was not so critical. The R300 core (which served as the basis for the later R360 and the current R420) had fewer transistors (115 mln) owing to its efficient architecture. Those transistors fitted quite easily within reasonable die dimensions on the 0.15-micron process. NVIDIA, by contrast, persistently failed to fit within those dimensions starting with the first FX chip, which is why it suffered so much in its urgent search for the newest process technology.

The video chip architecture rarely undergoes a "radical" change, so it is more than naive to expect an entirely new chip every 9 months. It is also naive to assume that a team of engineers takes on a new chip only after finishing the previous project. NOTHING OF THE KIND. Both ATI and NVIDIA have at least three such teams: one is developing the current chip declared in the roadmaps, the second is working on the next one, and the third is pursuing forward-looking goals to be achieved many years from now ...

Therefore, having completed R300 (Radeon 9700 Pro), the company's teams immediately started work on doubling the number of pixel and vertex pipelines on the basis of that successful core. Why is R300 mentioned all the time? It's simple: before it emerged, ATI was hanging by a thread. In performance, the hi-end ATI Radeon 8500 could oppose only the cheapest Ti4200 ($200); sales dropped "below sea level"; there were no profits, only losses; the number of low-end chips exceeded all reasonable limits, and every card needed its own special driver; the marketing division seemed to be headed by rookies; and top management had nothing better to do than watch the company's account balance from day to day. The money was melting away.

Everything was staked on the last resort: the development of a new chip, R300. If Intel had to develop a new processor architecture for the mainstream market today, the company's well-being would be badly shaken. The development costs of every new product from Intel, AMD, ATI and NVIDIA are now beyond any reasonable limits. If we quoted these figures to our (Russia's) government, they would turn green with greed and drown themselves in sorrow.

So, when we say that ATI and NVIDIA have "released new chips", we have to be clear about what is really new in them ... Novelty comes in different kinds. Currently, no one would dare change a finely woven chip architecture: it's too costly. Therefore, the only possible options are adding some new functionality, raising the core clock to some extent, optimizing the pipelines, and... the most radical thing: increasing their number.

It is precisely the number of pixel and vertex pipelines that has been doubled in today's R420/NV40, which does not happen often these days.



R420 Block Diagram

All the blocks that have changed are highlighted in red. Six vertex shader units (instead of three before) theoretically process 750 mln polygons per second. For comparison, NV40 manages 600 mln polygons per second because of its lower operating frequency.

The 16 pixel pipelines can handle 80 shader instructions per cycle in parallel at 192 GigaFLOPs, with a theoretical fill rate of about 8.3 gigapixels per second (16 pipelines x 520 MHz).
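The fill-rate figure can be sanity-checked with simple arithmetic. The sketch below assumes that each pipeline writes one pixel per clock (an assumption of this example, not a statement from ATI's documents) and uses the pipeline counts and core clocks quoted in this review; the function name is purely illustrative.

```python
# Theoretical pixel fill rate = pixel pipelines x core clock.
# Pipeline counts and clocks come from the review's spec table;
# "one pixel per pipeline per clock" is an assumption of this sketch.

def fill_rate_gpix(pipelines: int, core_mhz: float) -> float:
    """Return the peak fill rate in gigapixels per second."""
    return pipelines * core_mhz / 1000.0

cards = {
    "Radeon X800 Pro": (12, 475),
    "Radeon X800 XT": (16, 520),
}

for name, (pipes, mhz) in cards.items():
    print(f"{name}: {fill_rate_gpix(pipes, mhz):.2f} Gpixels/s")
```

With these inputs the XT works out to 8.32 Gpixels/s and the Pro to 5.70 Gpixels/s. Real-world throughput will be lower, since memory bandwidth and shader length also limit it.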



Documentation (PDF):
Architecture_Final.zip
Memory_Final.zip
3Dc_Presentation_Final.zip
EvaluatingPerfomrance.zip
PCI_Express_Final.zip

The pixel shader pipeline is made up of two vector and two scalar ALUs, plus one address ALU and a special F-Buffer unit responsible for handling longer OpenGL shaders.

The highlight of the R420 technology is its full scalability: 16-, 8- and 4-pipeline configurations of the pixel pipeline can be built for processors aimed at every market niche.

The complete description of the processor architecture takes up tens of pages and merits separate consideration, but those who are curious may download the PDF document Architecture_Final.zip (1 MB).


ATI Radeon X800 XT Video Card Features

Two video cards, the ATI Radeon X800 XT and X800 Pro, are planned on the basis of the R420 chip. The ATI Radeon X800 Pro with 12 pixel pipelines is released on May 4, and the X800 XT with 16 pixel pipelines will be officially launched on May 21. Their differences are summarized in the following table:

ATI                       Radeon X800 Pro    Radeon X800 XT
Q-ty of transistors       160 mln            160 mln
Core                      475 MHz            520 MHz
Mem                       900 MHz (GDDR3)    1120 MHz (GDDR3)
Pixel pipelines (pcs.)    12                 16
Launch date               May 4              May 21
Price                     $400               $500

Design

Although ATI presented the 12-pipeline Radeon X800 Pro on May 4, we got its elder brother, the 16-pipeline Radeon X800 XT, shown in the photos:



Front ATI Radeon X800 XT. *2400x1539; 611Kb


Front ATI Radeon X800 XT without a cooler. *1600x1087; 400Kb


Back ATI Radeon X800 XT. *2200x1424; 741Kb

As we can see, the card closely resembles its predecessor, the ATI Radeon 9800XT. The cooling system has kept its shape and has even grown smaller! Is ATI poking fun at its competitor's massive coolers?

Still, such "similarity" is not ideal for an entirely new product. I wish there were more striking exterior differences, e.g. a white PCB. The missing X800 XT logo on the board is not a mistake: this card is an engineering sample. Cards shipped to retail will bear a logo like this:


logo Radeon X800XT

By the way, would you like to have a look at the person who does the PCB wiring for all ATI cards?


Meet her, a graduate of the Bauman Higher Technical College (Moscow) ;-). Sometimes she reads Russian-language reviews of her company's products, but most of the time she is glued to her huge monitor, trying to route yet another thousand pins on a freshly baked chip. It turns out that TSMC loves to swap the positions of the processor's signal pins, and every new sample or revision requires the board's wiring to be redesigned.

I am not telling you her name and email - it's of more use to me :).

Power consumption

One of the most serious problems of video cards is the heat emission of core and memory chips.

The ATI Radeon X800XT reference board consumes a mere ~70W, roughly one and a half times less than the 110W required by the GeForce 6800 Ultra. Where does this striking difference, by now traditional between competing models from the two companies, come from? The answer is evident. First, 160 mln transistors using a Low-k dielectric versus 222 mln transistors without Low-k in the NVIDIA chip. Second, ATI's more up-to-date core, descended from the Radeon 9700 Pro, which was tuned almost by hand. Third, all of ATI's power-saving technologies for mobile solutions have been carried over to its regular graphics processors: pipelines are simply switched off once they go idle.

NVIDIA's current core design is two generations older and has accumulated add-on units, which inevitably reduces overall efficiency. This first became evident when NV30 was released, and it cost NVIDIA a great deal of effort to save face and urgently modernize and polish the chip, which in the end turned out not bad at all. But history repeats itself: once again a two-slot cooling system was needed (originally, the card was designed to have a low-profile cooler), and this time the result is two mandatory molex power connectors that have to be plugged into two different power cables. The recommended PSU rating for a system with an NV40 installed is 480W. Such PSUs are extremely hard to find in Russia, and they cost ~$120-150.



GeForce 6800Ultra

The number of transistors in a chip is the key factor affecting several of its other characteristics. The more transistors there are, the lower the chip yield and the lower the achievable clock frequency, and the harder it becomes to deliver power to the chip (because of the greater number of zones).

The ATI Radeon X800XT has one molex power connector. The yellow connector of unknown purpose is present only on reference boards handed over to test labs and on those built at ATI's own small production facilities for internal testing. It won't be there on retail cards.



Note how much simpler the voltage regulator unit has become.

A few more words about the power supply of modern cards: 70W is more than the AGP bus can deliver, but less(!) than a PCI-Express x16 slot (~75W) can handle. As a result, once new motherboards appear, we can expect R420 cards without any extra power connector. Even NV40 will most likely lose one of its two connectors in its PCI-E version. In any case, the AGP versions of both the Radeon X800XT and the GeForce 6800Ultra will be on sale for only a few months, after which they will become a rarity. Once Intel announces Alderwood and Grantsdale, the computer industry will quickly move over to PCI-E.
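The power argument boils down to two comparisons: how the two boards relate to each other, and whether each fits within what the slot alone can supply. Here is a minimal sketch using only the wattages quoted in this review; the helper name is hypothetical, and the ~75W slot budget is the approximate figure given above, not a value taken from the PCI-Express specification.

```python
# Board power figures quoted in the review (watts).
X800_XT_W = 70
GF6800_ULTRA_W = 110
PCIE_X16_SLOT_W = 75  # approximate slot budget as quoted in the review

def needs_aux_power(board_w: float, slot_w: float) -> bool:
    """True if the board draws more than the slot alone can deliver."""
    return board_w > slot_w

ratio = GF6800_ULTRA_W / X800_XT_W
print(f"6800 Ultra draws {ratio:.2f}x the power of the X800 XT")

for name, watts in [("X800 XT", X800_XT_W), ("6800 Ultra", GF6800_ULTRA_W)]:
    shortfall = max(0, watts - PCIE_X16_SLOT_W)
    print(f"{name}: aux connector needed = "
          f"{needs_aux_power(watts, PCIE_X16_SLOT_W)}, shortfall {shortfall} W")
```

On these numbers the X800 XT fits inside the slot budget with 5W to spare, while the 6800 Ultra still has to draw 35W from an external connector.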

VIVO

Both Radeon X800 cards carry an onboard VIVO chip (the old RAGE THEATER). Of the previous 9800XT series, only the Asus board was equipped with the Theater 200 chip.


Traditionally, we review the memory in the "features" section, but this time the topic is so big that we had to give it a separate page.

GDDR3 Memory

Radeon X800 cards carry 256 MB of GDDR3 SDRAM. This is an entirely new graphics memory architecture proposed by ATI engineers and approved by JEDEC as an open standard.

JEDEC stands for Joint Electron Device Engineering Council. The Council was established in 1958 by EIA (Electronic Industries Alliance) and NEMA (National Electrical Manufacturers Association) to develop standards for electronic devices jointly and to reduce the cost of creating new devices through mutual exchange of data. Today, most manufacturers and developers of electronic components worldwide are members of JEDEC.

It was Joe Macri, ATI's chief technical executive in Santa Clara, who led the definition of the GDDR3 industry standard later approved by JEDEC; he currently chairs the JEDEC committee defining the next standard, GDDR4.


Joe Macri (ATI)
Joe Macri (ATI's chief technical executive)

The goals set for the development of GDDR3 are predictable enough: to raise memory efficiency while remaining compatible with the original DDR architecture:

GDDR2 vs GDDR3:

Criterion                                                                      GDDR2   GDDR3
Improve on DDR's imperfections and limitations                                 +       ++
Remain backwards compatible with original DDR                                  ++      ++
Architecture that initially supports UHE desktop and workstation markets       ++      ++
To be also applicable in mobile and mainstream desktop (within 6-9 months)     +       ++
Simplify DRAM Design as well as System Design or DRAM Use where possible       +       ++
Architecture to invest in for next few years                                   +       ++
Frequency support greater than 600 MHz                                         -       ++
Industry consensus, i.e. everyone building to the same specification           -       ++

++ task fully completed
+ task completed with some reservations
- failed to complete

As you can see from this table, this time they successfully avoided the major error of the previous architecture: ambiguous interpretations of the standard.

GDDR3 was first demonstrated at last autumn's IDF. But by a quirk of fate, the first company to implement GDDR3 in a mainstream commercial product was NVIDIA, with its FX5700Ultra DDR3 video card. The experiment proved successful: the card consumed less power and ran cooler, albeit with only a small performance boost (for test results, see the VGA Roundup).

What is GDDR3?

Feature               DDR            GDDR2                      GDDR3
I/O                   SSTL-2         SSTL-18 with ODT           POD-18
Clocking interface    DQS            Differential DQS or DQS    Unidirectional DQS
Frequency             166-450 MHz    400-500 MHz                500-800 MHz

To avoid ambiguity, we have left the table as is. Specialists will find it more appropriate to read it in the original, and everyone else would not be interested anyway :). But it does make sense to explain what POD and Unidirectional DQS are.

POD (Pseudo Open Drain):

  • Voltage-based open drain rather than current-based (less area needed to implement the driver);
  • Reference voltage Vref = 1.26V vs. 0.9V on DDR2;
  • Allows a controller that can support DDR, GDDR2 and GDDR3;
  • Simplified DRAM design (allows use of NMOS transistors to build the DRAM receiver);
  • Reduced power and simplified MC/IO design;
  • Idle state of strobe signals = VDDQ for simplified clocking.

Unidirectional DQS

GDDR3 transmits unidirectional DQS strobes, which is an advantage over the bi-directional differential DQS strobes of DDR2 in P2P applications:


Differential strobes:
  • 4 strobes per byte
  • Requires 8 pins
  • No benefit on architectural side
  • Minimal improvement to system timings
  • Improved ISI and SSO (tDS/DH)

Unidirectional strobes:
  • 4 Read and 4 Write strobes per byte
  • Requires 8 pins
  • Benefits on architectural side
  • Eases system timings
  • Preamble pulse for improved ISI
  • Strobes always valid for simplified clocking

Other areas remain compatible with DDR and DDR2

  • Data as in DDR (Centered for writes; Edge aligned for reads);
  • CLK pulses as in DDR

I don't want to turn this article into an electronics textbook, so simply put: all the excess resistors have been removed, and the voltage regulator unit on the card has become about twice as simple, which is clearly visible in the photos:

GDDR3's low power consumption means the chips produce less heat while allowing a substantial boost in clock speed. The chips simply stay cold, and on Radeon X800 cards there aren't even heatsinks on the memory chips. This should appeal to overclockers very much :). Nothing prevents them from fitting heatsinks on their own. But remember: 105 C is the critical point beyond which data starts leaking away. The GDDR3 I/O drivers are constantly calibrated to provide consistent characteristics across temperature and voltage.

And the last nice improvement - simplified clocking. As a result, there is more natural headroom available in the I/O system, with the critical paths minimized.

Whoosh... That's it!

It's a rare occasion when we are loaded with so much technical material, so I'd better unload all the fresh info onto the readers. Our next lecture, this time on GDDR4, will be held in 2005, because its standardization is scheduled to be completed at the end of this year.


GDDR3 memory on X800 series cards

Nowadays, GDDR3 memory is produced by all the leading players in the memory market: Samsung, Hynix, Micron, Infineon and Elpida.

The Radeon X800 XT carries 256 MB of GDDR3 SDRAM made up of eight Samsung K4J55323QF-GC16 chips rated for a maximum frequency of 600 MHz (1.2 GHz effective). In the video card's BIOS the memory frequency is set to 560 MHz (1120 MHz effective), i.e. with some margin. The chips are placed four on each side of the board and do not require any cooling. The supply voltage is 2.0V (compare that to 2.5-2.8V in the popular K4D263238E-GC series).

In its GDDR3 line, Samsung offers even faster chips, GC14 and GC12 (700 and 800 MHz, respectively), so more advanced and expensive R420-based video cards may well appear from manufacturers who prefer to do their own VGA designs, e.g. ASUS. In that case, however, the memory chips would have to be cooled, which means revamping the whole cooling system; for some manufacturers that will not be much trouble.


Samsung K4J55323QF-GC16
GDDR3 Samsung K4J55323QF-GC16 (for Radeon X800XT video cards)

The Radeon X800 Pro uses the low-end chip of the same series, the Samsung K4J55323QF-GC20, with a maximum clock speed of 500 MHz (1 GHz effective). In the video card's BIOS the memory frequency is set to 450 MHz (900 MHz effective), i.e. again with some overclocking margin. For now, though, there are no tools to overclock with.



GDDR3 Samsung K4J55323QF-GC20 (for Radeon X800Pro video cards)
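To put a number on the "margin" mentioned for both cards, the rated chip clocks can be compared with the BIOS settings. This is only a back-of-the-envelope sketch built from the frequencies quoted in this review; the function names are illustrative, and the result says nothing about how far a particular sample will actually overclock.

```python
# Memory clocks quoted in the review, in MHz before the DDR doubling.
MEMORY = {
    "X800 XT (K4J55323QF-GC16)": {"rated": 600, "bios": 560},
    "X800 Pro (K4J55323QF-GC20)": {"rated": 500, "bios": 450},
}

def effective_mhz(clock_mhz: float) -> float:
    """DDR-type memory transfers data twice per clock."""
    return clock_mhz * 2

def headroom_pct(rated_mhz: float, bios_mhz: float) -> float:
    """Margin between BIOS setting and chip rating, as a % of the BIOS clock."""
    return (rated_mhz - bios_mhz) / bios_mhz * 100.0

for name, c in MEMORY.items():
    print(f"{name}: BIOS {c['bios']} MHz "
          f"({effective_mhz(c['bios'])} MHz effective), "
          f"headroom {headroom_pct(c['rated'], c['bios']):.1f}%")
```

On paper the Pro has the larger margin (about 11% against about 7% for the XT), simply because its BIOS clock sits further below the chip rating.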

Technologies. Demo "Ruby: The DoubleCross"

Ever since the first 3DFX card was released, the video industry has had a good tradition: to prepare a good techno demo by the time a new chip is launched, demonstrating new graphics horizons, because it sometimes takes too long to wait for new technologies to be implemented in real games.

The most successful demos turn into cult symbols of their products; just remember NVIDIA's "butterfly". It's high time that butterfly was included in every marketing textbook as a fantastic example of pure image-building in promoting a high-tech product.

With the release of "Dusk", the conceptual approach to techno demos changed seriously, and now their major and only goal is ADVERTISING, with the demonstration of new chip functionality as secondary. Nowadays "demos" are used like TV commercials aimed at surprising, overwhelming and holding the eye. Amplified by the Internet, the advertising persuades the target audience to download the demo video first and then the executable file (if the manufacturer makes it available for download).


ATI Ruby
"Agent RUBY is authorized in use of extreme force and has global diplomatic immunity"
The most extreme weapon of Ruby :-)

Those inveterate techno geeks at ATI have understood that at last. This time, to present the X800 cards to the public, ATI avoided ogres and other evil spirits and used the pretty girl "Ruby" on a top-secret mission as the attraction, and fully dressed at that! Have you ever seen fully dressed lady spies??? I haven't.

Ruby: The DoubleCross Ruby: The DoubleCross
*2400x2400; 450 kb
*2400x2400; 440 kb

The "Ruby: The DoubleCross" demo was created jointly by ATI and RhinoFX. RhinoFX is far from a minor player in the 3D animation market and takes orders from clients like General Motors, Sony, Motorola, BMW, Procter & Gamble, Ford etc...


RhinoFX

In our case, the guys from RhinoFX underwent technological training together with ATI and mastered all the new functionality of the X800 3D accelerators. In particular, the demo makes active use of complex shaders (up to 512 instructions) as well as the new normal map compression dubbed 3Dc.


3Dc Technology



Techno demos:
Ruby 3Dc demo 1 (- 3,9Mb)
Ruby 3Dc demo 2 (- 3,1Mb)
SSam 2 demo (- 6,0Mb)

Today's 3D images in games are still built from primitive triangles with 2D textures stretched over them. The more triangles there are, the more precisely the shape and volume of an object are rendered. But computing millions of coordinates to model a 3D object precisely is too wasteful. A simple and effective alternative was found: the normal map. The technique applies a special texture that governs the lighting of an object built from a limited number of elements, so that the surface relief is displayed correctly.

Normal maps are already used quite heavily in the most recently released games (e.g. Far Cry and Lord of the Rings), as well as in those coming soon, like Half Life 2, Doom 3 and Serious Sam 2.


A "normal" is like an arrow pointing perpendicularly away from a complex surface at a given point. This data is needed to compute how light is reflected from that point, and normal maps are ordinary textures in which all the necessary information is stored digitally.

In case that was not clear, let me put it in simpler terms:

  1. Both a complex and a simple model of the object are created.
  2. From them, the difference in the directions of the surface perpendiculars is calculated.
  3. The resulting data is used to create a "normal map" texture.
  4. Information from that texture allows the lighting of the simple object to be computed as if the game were using the original complex (correct) object.
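The last step of the list above can be sketched in a few lines. The example uses plain Lambertian (cosine) diffuse lighting, which is a deliberately simple stand-in rather than the lighting model of any particular game, and it assumes the common convention of packing each normal component from the -1..1 range into an 8-bit colour channel. The texel and light values are made up for illustration.

```python
import math

def normalize(v):
    """Scale a 3-component vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def decode_normal(rgb):
    """Map an 8-bit RGB texel (0..255) to a unit normal (components in -1..1).
    The 0..255 -> -1..1 packing is an assumed, commonly used convention."""
    return normalize(tuple(c / 255.0 * 2.0 - 1.0 for c in rgb))

def diffuse(normal, light_dir):
    """Lambertian term: cosine between normal and light direction, clamped at 0."""
    n = normalize(normal)
    l = normalize(light_dir)
    return max(0.0, sum(a * b for a, b in zip(n, l)))

# The simple model: a flat face whose geometric normal points along +Z.
flat_normal = (0.0, 0.0, 1.0)
# A normal-map texel recording a tilted surface taken from the complex model.
texel = (200, 128, 220)   # hypothetical value
light = (1.0, 0.0, 1.0)   # hypothetical light direction

print("flat shading  :", round(diffuse(flat_normal, light), 3))
print("normal-mapped :", round(diffuse(decode_normal(texel), light), 3))
```

The geometry is the same flat face in both cases; only the normal fed into the lighting term changes, which is why the technique costs a texture fetch instead of extra triangles.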

To illustrate it, let's use the following picture taken from ATI's presentation:


Like any texture, a normal map is limited in how much information it can hold. To reduce the size of textures, the DXTC compression technique is used:


As we see, once the existing compression techniques have been applied, the final result is far from the original. ATI offers developers a new tool for improving the quality of normal map compression: the 3Dc technology, implemented in hardware in the R420 chip.


We will not go deep into the mathematical model of the process; experts can get a detailed view from the technical documentation available for download at ATI's website. Note only the result: 4:1 compression (four-to-one) while staying faithful to the original, which saves memory bandwidth and improves realism in displaying 3D models.

We have repeatedly seen bravura announcements followed by quiet funerals of proprietary initiatives; just remember TruForm (ATI). Hopefully, this time developers will make use of the technology, because it is very simple to implement. Croteam added 3Dc support to their "Serious Sam 2" demo in barely two hours, which is possible provided detailed models are available.
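For readers who do want a taste of the idea, here is a rough sketch. It rests on how two-channel normal compression of this kind is commonly described, not on anything stated in this review: only the X and Y components of each unit normal are stored, in 4x4 texel blocks with two 8-bit endpoints and a 3-bit index per texel for each channel, and Z is rebuilt in the shader. The 32-bit uncompressed baseline and all names below are assumptions of the example.

```python
import math

def reconstruct_z(x: float, y: float) -> float:
    """Recover the third component of a unit normal from the two stored ones."""
    return math.sqrt(max(0.0, 1.0 - x * x - y * y))

def block_bits(channels: int = 2, endpoint_bits: int = 8,
               index_bits: int = 3, texels: int = 16) -> int:
    """Bits per 4x4 block: two endpoints plus one index per texel, per channel."""
    return channels * (2 * endpoint_bits + texels * index_bits)

uncompressed_bits = 16 * 32      # 4x4 texels at 32 bits each (assumed baseline)
compressed_bits = block_bits()

print("compressed block :", compressed_bits, "bits")
print("compression ratio:", uncompressed_bits / compressed_bits)
print("z for x=0.6, y=0 :", round(reconstruct_z(0.6, 0.0), 3))
```

Under these assumptions a block shrinks from 512 to 128 bits, which matches the four-to-one figure; dropping Z costs one square root per pixel but leaves all the stored bits for the two components that matter.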



Serious Sam 2 techdemo (*.avi 6,0mb)

To date, 3Dc support has been announced for such eagerly awaited, cult projects as Serious Sam 2, Half Life 2, Sid Meier's Pirates!, Tribes Vengeance and Dark Sector.


Pixel Shader 3.0: to be or not to be?

The competing NV40 implements support for the next version of pixel shaders. R420 lacks this technology. So in the near future we are in for a fierce marketing war: NVIDIA will be persuading everyone that version 3.0 pixel shaders are absolutely necessary already in today's hardware, and ATI will earnestly object, pointing out that even today's version 2.0 shaders have been implemented in all their glory in only one game, "Far Cry". But how well they are done! While protracted projects like "DOOM3" and "Half-Life2" are still unreleased, the project of the little-known Crytek has already become a smash hit.

I must admit that game developers introduce new technologies into their projects with a serious time lag. There are objective reasons for that: creating a game may take as long as 3-5 years, while generations of graphics chips change every 9 months. Recall that version 2.0 shaders were first demonstrated in the revolutionary R300 core (Radeon 9700), which was released 20 months ago. Only now can we finally feast our eyes on them.

Coming back to Pixel Shader 3.0, a few fine points are worth mentioning:

  1. Version 3.0 pixel shaders will be supported in the next version of Microsoft's DirectX API (DirectX 9.0c), due in about early June;
  2. As many game developers say, version 3.0 shaders won't make games look any better: there are no effects achievable with version 3.0 shaders that can't be implemented with version 2.0 shaders;
  3. It's a big question whether switching to version 3.0 shaders can boost performance. They differ from the previous versions mainly in their complete support for conditional branching. The problem is that the result of a branch is very hard to predict, and if the prediction is wrong, the whole GPU pipeline has to be restarted with a substantial loss in performance;
  4. The first game to support it will most likely be "DOOM3";
  5. Ideologically, much depends on the well-known Futuremark, and more precisely on whether a "Pixel Shader 3.0" benchmark will be integrated into 3DMark2004. If yes, it will be possible to measure how effectively the technology is implemented in NV40 (it is unlikely to be high or even acceptable). If not, ATI will gain one more argument that this technology is premature.

So little time is left before 3DMark2004 is released. Let's wait and see.

Problems of tests: "cheating" and "optimizations"

The chronic lack of adequate tools and the rapid retirement of old tests is the favorite topic of conversation among test lab employees at leisure. With the release of R420, two more tests from the good old arsenal have been written off. For example, this is how the ATI Radeon X800XT results look in Comanche 4:


ATI Radeon X800XT : Comanche 4

The video card's performance shows up only at 1920x1440 with the most aggressive quality settings. All the other figures characterize only the system CPU and the limits of the game's engine.

Another ancient test, Village Mark (which evaluates hidden surface removal efficiency), does not register results above 200 fps, and the Radeon X800XT easily clears those 200 fps even at 1600x1200.

We keep trying to convince developers that including a benchmarking kit guarantees, at the very least, that the game will be optimized for in NVIDIA's drivers. More importantly, the game will immediately gain immense popularity and absolutely free "citations" on hundreds and thousands of Internet resources that badly need new tests.

The official attitudes of ATI and NVIDIA to the problems of testing video cards also differ. ATI recommends using FRAPS when a game lacks a console or an integrated benchmark, whereas for NVIDIA the release of every new game with a built-in benchmark is another headache: it has to be added to the list of "first-priority optimization items". Neither is inspiring. FRAPS is good only when the performance difference between video cards is very large, e.g. like this:

ATI Radeon 9800XT
ATI Radeon X800XT

FRAPS is absolutely useless for comparing video cards of approximately the same class. After all, who can reproduce exactly the same run while driving in NFS:Underground? (Basically, it is possible if you choose the 'Free' and 'no traffic on the road' options).

And the most painful thing: cheating and optimizations by NVIDIA. This deserves a more detailed look, because "cheating" and "optimizations" are entirely different notions. So what is it all about, and where is the problem rooted? Let's try to explain it as simply as possible.

The pipeline architectures of ATI and NVIDIA graphics chips differ, of course. In short, NVIDIA's pipeline is longer and in theory can handle (push through) more data per cycle than the shorter ATI pipeline. But that holds only on one condition: the input data has to be optimized (ordered) for sequential processing by that particular pipeline. One error, and everything fails: the data is sent back to the beginning, sampled again and then sent for processing once more. Idle cycles cannot be ruled out completely, but their share can and must be reduced. There are two ways to do this: either by generally improving the branch handling in the compiler, or by hard-coding the name of each particular game and thereby telling the driver how to process its code in the most efficient way. It is the second technique that is called "optimization", and it is absolutely legitimate. Another matter is that this sore point costs NVIDIA dearly. They have a whole team (rumor has it, between 10 and 20 employees) who receive luxurious salaries just for analyzing other people's code (in all the new games) while keeping the specifics of their own compiler firmly in mind. It is an endless, thankless job that requires exceptional skills. But they have to do it and ALWAYS will, despite any attempts to make them stop. And in that, NVIDIA is absolutely right.

If all the data passes through the pipeline properly, without being sent back to the beginning and sampled again, then NVIDIA should potentially take a convincing lead over ATI's pipelines. But that is only in theory. It never happens in the real world. It is like the crooked teeth of a watch gear made by hand in the 17th century: the probability of a perfect mesh drops by an order of magnitude as the meshing radius increases.

As an illustration, we invite the readers to estimate the probability of successful outcome for the left-hand and right-hand cases.

All the endless new versions of Detonator/ForceWare driver just include more and more new optimizations for NEW GAMES.

To make their lives easier, NVIDIA invented the Cg language, but that tool-set turned out to be so intricate and complicated that most developers simply don't use it. But even if they decide to learn it, then financial and technological aid from NVIDIA is needed (just remember the marketing initiative called "The way it's meant to be played!" :)

But what about ATI? For ATI, there is no need to do that - the company has taken a different path. They use a shorter pipeline, and due to the fewer number of transistors per chip they were able to increase the chip operating frequency even in R420 as compared to NV40 (525 MHz versus 400). On the one hand, the shorter pipeline handles less data per cycle, but it gets fewer idle cycles and higher operating frequency as compared to the competitor, which makes optimizations for all with their own efforts a useless job.
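The trade-off described here, more work per cycle against a higher clock and fewer idle cycles, can be illustrated with a toy model. Only the two clock speeds come from the review; the per-cycle work figures and stall fractions below are entirely hypothetical and are not measurements of either chip.

```python
def effective_throughput(ops_per_cycle: float, clock_mhz: float,
                         stall_fraction: float) -> float:
    """Useful operations per second (in millions) after discounting idle cycles."""
    return ops_per_cycle * clock_mhz * (1.0 - stall_fraction)

def break_even_stall(long_pipe: dict, short_pipe: dict, short_stall: float) -> float:
    """Stall fraction at which the long pipeline drops to the short one's level."""
    target = effective_throughput(stall_fraction=short_stall, **short_pipe)
    peak = long_pipe["ops_per_cycle"] * long_pipe["clock_mhz"]
    return 1.0 - target / peak

# Clocks are quoted in the review; ops_per_cycle values are hypothetical.
long_pipe = {"ops_per_cycle": 1.5, "clock_mhz": 400}
short_pipe = {"ops_per_cycle": 1.0, "clock_mhz": 525}
SHORT_STALL = 0.05   # hypothetical idle share for the short pipeline

for stall in (0.0, 0.1, 0.2, 0.3):
    a = effective_throughput(stall_fraction=stall, **long_pipe)
    b = effective_throughput(stall_fraction=SHORT_STALL, **short_pipe)
    print(f"long-pipe stalls {stall:.0%}: long={a:.1f}  short={b:.1f}")

print(f"break-even stall share: {break_even_stall(long_pipe, short_pipe, SHORT_STALL):.1%}")
```

In this toy setup the longer pipeline wins as long as its idle share stays small and loses once it grows past the break-even point, which is the economic reason for maintaining per-game tuning in the first place.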

ATI's compiler stands out for the higher predictability of its results and offers more stability. Even now, with the beta version of Catalyst 4.5 (for the X800 series), we found no problems. The driver is predictable in its performance and, unlike ForceWare, never shows unexplainable performance boosts or strange drops.



Terry Makedon
"Mr. Catalyst" - head of the
Catalyst development department

Therefore, optimization 'a la NVIDIA' is a forced measure for the company, otherwise the result would be catastrophic. As a typical example, see the test results in "3DMark2003 build 320 vs 340: a rare moment of truth?", which describes how Futuremark, at the culmination of the intrigue, simply blocked all NVIDIA's optimizations for 3DMark2003.

Why was that done? Here we arrive at the problem of "cheating" which, unlike "optimization", is no longer a legitimate method of competition.

Futuremark has a special version of the 3DMark2003 benchmark that shows a fly-by of the map, and with one version of the Detonator/ForceWare drivers it was found that a whole piece of the map had simply been removed in one of the tests. Admittedly, that part of the map is not visible to the eye, but the fact remains. Futuremark decided not to sort out which was which and simply blocked all the optimizations and cheats, legitimate or not, at one stroke. Both companies were enraged. Futuremark values its reputation and has no intention of losing it - it has had enough criticism even without that...

The parties agreed on the following: NVIDIA undertook not to overstep the "critical threshold" in Futuremark products, and Futuremark undertook not to make this story public :-).

Test configuration

CPU P4 3.2 GHz 800FSB (Northwood D1)
Mb Epox 4PDA2+ (i865PE)
Memory PC2700 2x256 MB 400 MHz in the dual-channel mode
Latency timings - 2:6:3:3
OS WinXP + SP1
Drivers Forceware 56.72 for FX5950Ultra & Forceware 56.72
Catalyst 4.5 BETA for testers

Test software:

    Synthetic benchmarks:
  1. 3DMark2003 v340;
  2. Codecreatures v1.0.0 (a DirectX 8.1 application, shaders on, Hardware T&L);
  3. AquaMark 3 (DirectX 9.0, Vertex Shaders 1.1/1.4/2.0, Pixel Shaders 1.1/1.4/2.0, Hardware T&L, AquaMark3 Triscore mode);

    Gaming applications:

  4. Unreal Tournament 2003 Demo (Direct3D, Hardware T&L, vertex shaders, Dot3, cube texturing);
  5. Gun Metal Benchmark 2 v1.20s (a DirectX 9.0 benchmark, Vertex Shaders 2.0, Pixel Shaders 1.1, Hardware T&L);
  6. X2: The Threat Demo (Direct3D, multitexturing, Dot3, running in the benchmark mode embedded in the demo version);
  7. Final Fantasy XI Official Benchmark 2 (a benchmark for assessing the performance in the future game Final Fantasy XI. The developers haven't presented any data on the gaming engine);
  8. Core Design / Eidos Interactive Tomb Raider: Angel of Darkness v49 (DirectX 9.0, Vertex Shaders 2.0, Pixel Shaders 2.0);
  9. Valve Software/Vivendi Universal Games Half-life 2 leaked beta (DirectX 9.0, Vertex Shaders 2.0, Pixel Shaders 2.0);
  10. GSC GameWorld / Russobit-M FireStarter (DirectX 8.1/DirectX 9.0, pixel and vertex shaders, particle system, dynamic lights, projected textures);
  11. Crytek / UbiSoft FarCry (DirectX 9.0, Pixel Shaders 2.0, quality settings set to the maximum, our own "3Dnews001" demo was used).

The tests involved three Hi-End ATI video cards of the previous generation: Radeon 9700Pro/9800Pro/9800XT. NVIDIA's standing is represented by the FX5950Ultra video card (dotted line on the graphs).

Purpose of the tests: to reveal the performance difference between ATI video cards of different generations.

3DMark 2003 Pro ver340











Multitexturing and version 2.0 pixel shader processing speed have honestly risen 2.5 times. All the rest (except Wings of Fury, the oldest of all the tests) has risen two-fold. It is impossible to imagine a new Pentium processor delivering a performance boost like that. The video industry is still able to work miracles!


3DMark 2001SE

Fans of this benchmark will be disappointed - all the juice has been squeezed out of it already.



Codecreatures

First off, the general score with the standard image quality (AA and AF disabled):


This benchmark uses version 1.1 pixel shaders, and NVIDIA cards have always felt more comfortable with those. The boost for Radeon X800 XT over Radeon 9800XT is of course not two-fold, but it is really impressive anyway.

Here is the alignment of forces for resolutions up to the super-high:



Codecreatures 4xAA / 8xAF

Now switch on the quality settings and enjoy the result:

That's really incredible! The rise is more than two-fold (two and a half times, even). Radeon X800XT makes a fantastic leap forward, leaving all the previous-generation cards well behind! Its 'strength margin' (we use this term to denote the difference in performance between standard settings and aggressive image quality settings) is beyond competition.
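For readers who want to redo this arithmetic on their own readings, here is a minimal sketch of the two figures used throughout the review: the generation-to-generation speedup and the 'strength margin' expressed as a relative drop. The function names are ours, and the sample numbers in the usage lines are made up, not taken from the graphs.

```python
def speedup(new_fps, old_fps):
    """How many times faster the new card is than the old one."""
    if old_fps <= 0:
        raise ValueError("old_fps must be positive")
    return new_fps / old_fps


def quality_drop(standard_fps, quality_fps):
    """Share of performance lost when AA/AF quality settings are switched on.

    The smaller the drop, the bigger the 'strength margin' of the card.
    """
    if standard_fps <= 0:
        raise ValueError("standard_fps must be positive")
    return (standard_fps - quality_fps) / standard_fps


# Hypothetical readings, for illustration only.
print(speedup(new_fps=50.0, old_fps=20.0))
print(quality_drop(standard_fps=100.0, quality_fps=80.0))
```

A card with a smaller drop keeps more of its performance when 4xAA/8xAF are enabled, which is what the term is meant to capture.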

Here is the alignment of forces for various resolutions:


Radeon 9700Pro failed to stand the torture at 2048x1536 and honestly gave up - it simply ran out of memory.


Aquamark 3

This is the last "conditionally synthetic" benchmark. Still, it is built on a real game engine.


Here we see a substantial boost, albeit not as impressive as in Codecreatures.

Moving on to real-world games...


UT 2003

Let's start with the good old UT2003, since little has changed in the "new" UT2004. Without quality settings the game hit its technological limit on all the top-end cards, so we ran the tests solely in the quality mode.





HALO

This is the same case as with Unreal Tournament 2003. We had to enable all options to the full right away and test all the cards solely in the 4xAA/8xAF mode.
All the old cards start from the point where X800 XT finishes.


Final Fantasy XI Official Benchmark 2

The game has already been released and is available at retail worldwide - a thick box styled after Japanese anime, which kept us from a premature purchase. Our accounts department would never believe it was "test software" and would refuse to pay a $60 bill. Well, I don't know what is in the game itself, because I am bored to death with the test - in my dreams at night I see crowds of Eskimo children in fur coats riding peacocks through the Amazon jungle, so no wonder my beloved complains that I kick.. :) Others make coffee just in case .. :)


All in all, no details have arrived so far .. I have no idea what is measured there, in what units the score is given, or what was used - absolutely no idea.


GunMetal Benchmark

A DirectX 9.0 benchmark, albeit only nominally. It makes active use of version 2.0 vertex shaders and version 1.1 pixel shaders.



The picture is the usual one - the game is too light for new-generation cards. In the second test, X800 XT again draws the same straight line.

But the overall leap in performance is well seen from this graph:



X2-The Threat Rolling Demo

Multitexturing and Dot3 - without any shaders.



ATI cards dislike the stencil shadow technique, but nevertheless, with AA and shadows disabled, it is possible to play at 2048x1536 at 82 fps! Sheer madness.


Half Life 2 Leaked Alpha


It looks like the demo version of Half Life 2 has become playable at 1600x1200, showing 82 fps on X800 XT. That was unthinkable just yesterday. At 1024x768 we clearly see a definite limit.


With our second recorded Half-Life 2 demo, there is absolutely no drop at any resolution. For X800 XT, Half-Life 2 and Comanche 4 are birds of a feather ;-) We strongly hope that the final version of the game will allow sky-high image quality settings.

To give you a bird's-eye view, here is a test of all the video cards at 1024x768 with our first recorded HL2 demo:



Tomb Raider: Angel of Darkness


"Tomb Raider: Angel of Darkness" is a heavy DX9 game - and again we see a distinct two-fold performance boost at high resolutions. Isn't it high time we switched to 23" monitors? :-)
Video cards of the previous generation are playable only at 800x600 and 1024x768.



With the paris2g and paris1c demos, all the cards start from the same point, and in paris1c Radeon X800XT stays stuck at the same level at every resolution. This is impossible to believe! Well, when will Intel release a 4 GHz processor at last?

FireStarter

This is an easy enough game even for the previous-generation video cards, so to get adequate results we loaded it with 4xAA and 8xAF settings right away.


Another shortcoming of the benchmark is that it allows taking results at only two resolutions. Super-high resolutions remain unavailable, and at 1280x1024 our guinea pig was unable to show itself in full glory.


FarCry

The highlight of the show - Far Cry (you may well have skipped all the previous results and gone straight to the FarCry figures :). At least, that was the reaction of ATI's top management when they looked at the results of these tests.

We used the full version of the game (by the way, purchased in Toronto). It already had the version 1.1 patch, which did a good enough job of fixing the performance of FX video cards with version 2.0 pixel shaders.

All the settings were set to Very High Quality, except Water Quality, which was set to Ultra High Quality. We ran two sessions of tests - in the AF level 1 and AF level 4 modes.


Holding back a storm of joyful emotions, I can say that even Far Cry is a piece of cake for Radeon X800 XT!

Lastly, we enable the AF level 4:


If there were a processor at least one and a half times more powerful than today's, Radeon X800 XT would have shown a smooth curve like all the other cards, but alas ... today's breakthrough has pushed the video industry to heights no one ever expected.
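The "straight line" effect mentioned in several tests above is easy to check for in your own results: if the frame rate barely moves as the resolution grows, the video card is waiting for the CPU. Below is a rough sketch of such a check; the function name, the 5% tolerance and the sample readings are our assumptions for illustration, not figures from our graphs.

```python
def looks_cpu_bound(fps_by_resolution, tolerance=0.05):
    """Return True if fps stays within `tolerance` (relative) at all resolutions.

    fps_by_resolution - mapping such as {"1024x768": 82.0, "1600x1200": 81.0}
    tolerance         - allowed relative spread between best and worst reading
    """
    values = list(fps_by_resolution.values())
    if len(values) < 2:
        raise ValueError("need readings for at least two resolutions")
    highest, lowest = max(values), min(values)
    return (highest - lowest) / highest <= tolerance


# Hypothetical readings, for illustration only.
flat_line = {"1024x768": 82.0, "1280x1024": 81.5, "1600x1200": 80.9}
smooth_curve = {"1024x768": 60.0, "1280x1024": 45.0, "1600x1200": 31.0}

print(looks_cpu_bound(flat_line))
print(looks_cpu_bound(smooth_curve))
```

The tolerance is a matter of taste: too tight, and ordinary run-to-run noise will hide a CPU limit; too loose, and a mild slope will be mistaken for one.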

To run FarCry, we used our own demo recorded on the "research6" map. We got numerous requests to share our testing technique for this game. It is all simple.

FarCry Testing Manual:

  • Install the full version of the game and add the 1.1 patch to fix shaders for NV3x
  • Copy our demo (a mere 200 KB) into the directory game_installation_path/levels/research
  • In the shortcut properties, add the -DEVMODE command-line option
  • Start the game and set the graphics options via the menu
  • Then open the console by pressing the (~) key and enter "map research 6". The map loads
  • Back in the console, enter "demo 3dnews001" - the demo starts running, so quickly press (~) to close the console
  • Note: don't type the quotation marks around the commands in the console
  • Note for cheat fanciers: start the game with the -DEVMODE command-line option, load any map, press P to get all the weapons and O to get 999 ammo units (I found the cheat myself and absolutely by accident :-)
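The file-copying step of the manual can be scripted if you test often. The sketch below is only a convenience helper of our own: the function names are hypothetical, the paths are whatever you pass in, and it does not start the game or touch the shortcut. It copies the demo into levels/research and prints the two console lines from the manual so they can be typed in without quotation marks.

```python
import shutil
from pathlib import Path


def install_demo(demo_file, game_dir):
    """Copy a recorded demo into <game_dir>/levels/research; return the new path."""
    target_dir = Path(game_dir) / "levels" / "research"
    if not target_dir.is_dir():
        raise FileNotFoundError(f"no such directory: {target_dir}")
    # shutil.copy2 returns the path of the copied file when dst is a directory
    return Path(shutil.copy2(demo_file, target_dir))


def console_commands(map_name="research 6", demo_name="3dnews001"):
    """Console lines from the manual, to be typed without quotation marks."""
    return [f"map {map_name}", f"demo {demo_name}"]


if __name__ == "__main__":
    # Reminder of what to type after starting the game with -DEVMODE.
    for line in console_commands():
        print(line)
```

Call install_demo() with the path to the downloaded demo file and your own installation directory; the -DEVMODE option still has to be added to the shortcut by hand.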


R420 vs NV40: first glance

GeForce 6800Ultra (NV40) spent merely a few hours on our test bench, and thorough testing is impossible in such a short time, but we still managed to take readings and form first impressions.

This of course does not relieve us of the sacred duty of running complete tests, but for now - just a first glance. Needless to say, ForceWare 60.72 behaved so erratically that some tests (e.g. GunMetal and UT2003) simply wouldn't start. Yesterday the fixed ForceWare 61.11 was released, so we had better run new and more comprehensive tests.

These are just dry figures, and for now - no comments.







Final Words

All the previous Hi-End cards are rubbish compared to the new generation of graphics chips. Against the two-fold performance rise in modern demanding games, all the tests of previous-generation products with their minor gains look meager and petty. We were earnestly promised a two-fold performance rise, and we got it. For the very first time in many years, the consumer has not been deceived.

We have to be aware that the new cards are aimed solely at super-powerful CPUs and top-quality monitors, which can show the smashing image quality at super-high resolutions with the most aggressive quality settings and without hurting the gameplay. Game developers have gained new horizons for all sorts of exotic experiments with far-fetched effects.

The very first Radeon X800 Pro video cards will ship to retail under the ATI brand in North America. As ATI's Taiwanese partners assured us, their versions will reach Russia in about two weeks. Two weeks after that, Radeon X800 XT is to arrive in retail stores.

Remember - the Pro version costs $400, and we have a whole two weeks to find a decent monitor and an Athlon64/P4 EE...



Toronto, the capital of ATI :-)
(Photo taken from CN Tower)

The Radeon X800 XT video card was provided by ATI's Moscow representative office.

Our undying gratitude to:

  • Aleksandr Zhavoronkov and Nikolay Radovskiy (ATI Russia) - for arranging the trip to Toronto to take part in "ATI Technology Day"
  • Rick Bergman - ATI's Chief Vice President for Marketing, Terry Makedon - Sr. Product Manager - Software Desktop Discrete Graphics, Joe Macri (ATI USA) - for the superb seminar on GDDR3, Chris Hooke (ATI Europe)
  • Andrey Vorobyov (iXBT) and Pavel Pilarczyk (PCLab.pl) - for the good company in Canada
  • Gennadiy Riger, Irina Kramtsova and Roman Kirichinskiy (ATI employees) - for meeting me at the airport and seeing me off, fantastic excursions, care and attention ;-)

Read more on this topic:

MSI FX 5700 Ultra/NonUltra
Gigabyte GV-N595U-GT (NVIDIA FX5950 Ultra)
GeXcube Radeon 9600XT Extreme
ASUS Radeon 9600XT
NVIDIA FX5700 Ultra
MSI GeForce FX 5950 Ultra
3DMark2003 build 320 vs 340: A rare moment of truth?
Tests of ForceWare 52.16: FX 5900 versus Radeon 9800Pro
ASUS RADEON 9800 XT: a turning point
Tests of ATI Radeon 9800 PRO video cards
FX 5900 versus Radeon 9800Pro
FX 5600Ultra versus Radeon 9600Pro

Copyright © 2005 Digital-Daily. All Rights Reserved.
contact - info@digital-daily.com