3DNews Vendor Reference English Resource -
All you need to know about your products!
Digital-Daily.com

ForceWare 52.16: NVIDIA's retaliation

Date: 14/12/2003

By: Aleksey Burdyko

Introduction

It's no longer a secret that the pixel and vertex shader performance of NVIDIA's products, starting with the NV30, has been unacceptably low compared to that of ATI's products in the same class. In synthetic benchmarks like 3DMark 2003, the problems were "solved" quite well by new drivers with embedded "optimizations", which often degraded the image quality or used openly fraudulent techniques to push up the scores. But with the release of new benchmarks and, most importantly, DirectX 9.0 games that support version 2.0 pixel and vertex shaders, the situation for NVIDIA chips grew increasingly lamentable. We'll also look at 3DMark 2003 in today's testing session; its results are especially interesting in view of the newly released patch (v340).

Let's recall the new games released so far, and the tests conducted by numerous publications that revealed a hard-to-imagine slump in scenes that make intensive use of version 2.0 pixel and vertex shaders.

"Tomb Raider: The Angel of Darkness", "Halo: Combat Evolved", "AquaMark 3", and a beta version of "Half-Life 2" which "escaped" online just in time: in all these games and benchmarks built on real gaming engines, NVIDIA boards, to put it mildly, did not shine, and even flagship models produced results comparable to ATI's mid-range boards (this refers to the tests of the Half-Life 2 beta). That's why NVIDIA boards were, with good reason, dubbed "slow cards" in DirectX 9.0. We'll try to find out the cause of these results in the theoretical section of the review.

Of course, NVIDIA is not sitting idle, and the programmers of this Californian company have released the new ForceWare drivers, which should put an end to the complaints about low shader performance in the NVIDIA NV3x family (by the way, nine months ago, at the last CeBIT, Alan Tike claimed the Detonator name would live forever).

In the new driver, along with the traditional bug fixes and new features, the vertex and pixel shader compilers have been significantly revised, which should increase rendering speed without affecting image quality. It is this new algorithm for handling pixel and vertex programs that prompted the change of the driver name. So you can forget about Detonator, so often criticized recently =). The new drivers are dubbed ForceWare, and currently the only WHQL-certified version available is 52.16. In the theoretical part of our review, we'll also consider what NVIDIA programmers actually revised in the driver to warrant the new name. The renaming is partly a marketing move, of course, but as practice shows, NVIDIA has every reason for it.


The NVIDIA ForceWare v52.16 driver

The situation with shaders in the NV3x is not as straightforward as it may seem at first glance. It would be a mistake to assume that the root of the problem lies solely in raw hardware performance. It also lies in an architectural feature of NVIDIA cards: their use of 32-bit floating-point precision. ATI boards, in turn, perform all computations at 24-bit precision, the minimum allowed by the DirectX 9.0 specifications. But it would also be wrong to claim that NVIDIA cards built on the CineFX architecture perform floating-point operations only at 32-bit precision. In fact, the architecture of the FX family is more flexible than that of ATI's DirectX 9.0 chips and allows switching between the resource-intensive 32-bit precision and the less demanding "half" precision, i.e. 16-bit. A 12-bit (integer) mode is available as well.

Of course, 32-bit precision requires much more computational work than 24-bit or, even more so, 16-bit floating-point precision. And it was 32-bit precision that NVIDIA drivers used in most cases. Naturally, ATI's boards, with their strictly fixed 24-bit precision, were able to show much higher performance in applications that make intensive use of version 2.0 pixel programs. What do the DirectX 9.0 specifications say about that? They say the floating-point precision should be at least 24 bits per color channel. That is, ATI chips fully meet the DirectX 9.0 specifications, while NVIDIA exceeds them by using 32-bit precision, at the expense of those precious benchmark points and FPS in games. The company's attempt to switch the chip into the 16-bit mode received a mixed reception and ended in accusations of fraud.
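To get a feeling for what the three precision levels mean in practice, here is a minimal Python sketch that stores the same value at 16-bit, 24-bit and 32-bit floating-point precision and compares the rounding error. The 16-bit and 32-bit cases use the standard IEEE 754 formats. The 24-bit case is only an approximation under an assumed layout (1 sign, 7 exponent and 16 mantissa bits), and the sample value is arbitrary. This says nothing about how either chip computes internally; it merely illustrates the size of the error involved.

```python
import struct

def round_to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def round_to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (32-bit) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def truncate_to_fp24(x: float) -> float:
    """Approximate a 24-bit float by keeping 16 of the 23 fp32 mantissa bits.

    Assumption: 1 sign, 7 exponent and 16 mantissa bits. The narrower
    exponent range is ignored, so this only models mantissa precision.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFFF80  # clear the 7 lowest mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

value = 0.1  # arbitrary sample value, e.g. a color intensity
errors = {
    "fp16": abs(round_to_fp16(value) - value),
    "fp24": abs(truncate_to_fp24(value) - value),
    "fp32": abs(round_to_fp32(value) - value),
}
for name, err in errors.items():
    print(f"{name}: absolute error {err:.3e}")
```

For this sample value the error shrinks with every step up in precision. Whether the larger 16-bit error is visible on screen depends on the particular shader, which is exactly why the choice of precision is a judgment call.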

However, in our view, the moves of the Californian company are quite understandable here: as things stand, the odds favor ATI cards. But why did NVIDIA make its FX chips the way we see them today instead of following ATI's route with a rigidly fixed 24-bit precision? In that case, the companies would have been competing by pushing up the clock speeds of their chips and memory and producing various modifications of architecturally identical boards (which, in fact, they are doing successfully anyway, but then the cards would indeed have been on par). The thing is, NVIDIA, confident in its strength, developed the GeForce FX architecture expecting programmers to write shader programs in its proprietary Cg language (better adapted to NVIDIA cards). There's nothing surprising about that, actually: at the time, the company was strong enough and had every reason for such confidence.

And, as we remember quite well, the advertising of the chips' potential was accompanied by hype about an unprecedented level of programmability and freedom for programmers writing shader code for CineFX cards. However, practice showed that programming FX chips was quite a complicated and laborious task. ATI chips turned out to cope much more easily with code produced by the Microsoft HLSL (High Level Shader Language) compiler at 24-bit precision. That is, ATI chips run faster partly because most shader programs are written with the standard Microsoft compiler, whereas NVIDIA staked on its own jointly developed language and paid for it in the end, as we can all see. Of course, GeForce FX chips have purely hardware-related problems as well, but the issue of floating-point precision adds to NVIDIA's headaches.

So, what really matters to the end user? The end user who simply plays the most recent DirectX 9.0 games (there aren't many of them yet, but things are improving, which is nice) doesn't care at all how the code is compiled, what floating-point precision it uses, etc. =). What really matters is the quality of the image displayed. If the quality doesn't drop and the speed goes up, why not use 16-bit precision? But here the subjective matter of assessing quality arises, so it should be up to the users themselves to decide.


NVIDIA ForceWare

Clearly, NVIDIA had to do something about the drivers. The company's "energetic moves" in introducing 16-bit floating-point precision were met by the public, to put it mildly, without enthusiasm, which is quite logical and predictable, so other ways to solve the problem had to be found. The first step was the release of the new ForceWare driver series. We'll discuss the optimizations that don't degrade the image quality in more detail below.

First off, let's talk about the purely "cosmetic" modifications.

ForceWare v52.16 ForceWare v52.16 ForceWare v52.16

At first glance, they are not very noticeable. The design of the menus hasn't changed much since the 40-series drivers. There is no conceptually new approach to the menus, and none is really needed, because the author of this review is entirely content with the menu layout and GUI of the 40 series.

ForceWare v52.16 ForceWare v52.16 ForceWare v52.16

The antialiasing and anisotropic filtering settings are gathered in the same section and offer 3 quality modes:

  • "High Performance";
  • "Performance";
  • "Quality".
This is not new, actually.

ForceWare v52.16 ForceWare v52.16 ForceWare v52.16

The standard sections haven't been amended.

ForceWare v52.16 ForceWare v52.16 ForceWare v52.16

Now you can set any desired resolution rather than just select from ready presets. This option is of real value.

ForceWare v52.16 ForceWare v52.16 ForceWare v52.16

But the nView section does offer quite substantial changes. The major innovation of nView 3.0 is the "gridlines" feature, which allows partitioning the screen more effectively and conveniently into several independent zones. If you are the lucky owner of a Quadro professional video card, the number of such zones can be up to 9, while GeForce family cards offer only 4, which in our view is more than enough.

Also, official support for the most recent chips, GeForce FX 5700, GeForce FX 5700 Ultra and GeForce FX 5950 Ultra, has been introduced.

ForceWare v52.16
ForceWare v52.16

Of course, we were more interested in the driver's other innovations, namely the revised "unified" compiler for DX 9.0 code. The idea behind it is that the compiler receives instructions in the form of plain DirectX 9.0 code, interprets them for the chip, and rebuilds the order and structure of the commands in real time, so that the GeForce FX chip gets reworked code that executes faster than if the commands were fed to the graphics chip "as is". This is illustrated in the following diagram:


NVIDIA Unified

Potentially, the compiler can reduce the number of passes required by the code that comes directly from the API. In the end, this may improve the accelerator's performance in handling pixel and vertex programs. Note also that the image quality does not suffer, since the optimizations do not touch the floating-point precision settings: they simply rebuild the order and structure of the commands, which logically can't degrade the image quality because the requested shader is executed anyway. What changes is that the accelerator always gets code better suited to the architecture of FX chips.
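The principle that reordering commands need not change the result can be shown with a toy example. The Python sketch below is purely illustrative: the three-operation "instruction set", the register names and the heuristic of issuing texture fetches first are all hypothetical and have nothing to do with NVIDIA's actual compiler. It only demonstrates that a scheduler which respects data dependencies produces a differently ordered program with the same output.

```python
from typing import Callable, Dict, List, Tuple

# One instruction: (destination register, operation, source registers).
# Each register is written only once, so only true dependencies exist.
Instr = Tuple[str, str, Tuple[str, ...]]

# Hypothetical operations; "tex" is a stand-in for a texture fetch.
OPS: Dict[str, Callable[..., float]] = {
    "tex": lambda u: u * 0.5,
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
}

def run(program: List[Instr], inputs: Dict[str, float]) -> Dict[str, float]:
    """Execute the instructions in order and return all register values."""
    regs = dict(inputs)
    for dest, op, srcs in program:
        regs[dest] = OPS[op](*(regs[s] for s in srcs))
    return regs

def reorder(program: List[Instr], inputs: Dict[str, float]) -> List[Instr]:
    """Greedy scheduling: among instructions whose sources are already
    available, issue texture fetches first (an arbitrary heuristic)."""
    available = set(inputs)
    pending = list(program)
    scheduled: List[Instr] = []
    while pending:
        ready = [i for i in pending if all(s in available for s in i[2])]
        if not ready:
            raise ValueError("program reads a register that is never written")
        ready.sort(key=lambda i: i[1] != "tex")  # stable sort, "tex" first
        chosen = ready[0]
        scheduled.append(chosen)
        pending.remove(chosen)
        available.add(chosen[0])
    return scheduled

program: List[Instr] = [
    ("t0", "tex", ("uv0",)),
    ("r0", "mul", ("t0", "light")),
    ("t1", "tex", ("uv1",)),
    ("r1", "mul", ("t1", "light")),
    ("out", "add", ("r0", "r1")),
]
inputs = {"uv0": 0.25, "uv1": 0.75, "light": 2.0}

reordered = reorder(program, inputs)
print([dest for dest, _, _ in reordered])
print(run(program, inputs)["out"], run(reordered, inputs)["out"])
```

Both versions compute the same value for `out`; only the instruction order differs. Which order actually runs faster depends on the hardware, and that is the knowledge a driver-level compiler can apply where a generic one cannot.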

Don't think that this idea of optimization never occurred to the programmers at NVIDIA before. The foundations of a "unified compiler" were laid back in Detonator 44.12, but the idea was not brought to perfection then, so the polishing of a really working technology was left to later drivers of the ForceWare series.

Optimizing the code that enters the GPU is no doubt a good idea, and NVIDIA programmers deserve real credit for it, but the matter of floating-point precision is still open. Since the most recent version of the Microsoft High Level Shader Language allows programmers to choose the floating-point precision when writing code, NVIDIA positions the ability of the GeForce FX architecture to choose among three precision modes (the already mentioned 32-bit and 16-bit floating-point modes and the 12-bit integer mode) as an advantage of its chips. This is indeed hard to deny: why should 32-bit or 24-bit precision (in the case of ATI boards) always be used if, for specific tasks that do not require high precision, 16-bit floating-point precision is enough? The catch is that it's not always possible to choose the right precision for a particular task, so programmers are expected to spend much more effort writing and optimizing the code.


ASUS V9950 (NVIDIA GeForce FX 5900)

Package bundle:

The colorful package, a de facto standard for the ASUS FX product line, included the following:

  • the graphics board itself;
  • a DVI-to-D-Sub adapter;
  • an S-Video-to-RCA cable;
  • a User's Manual (in English);
  • a driver installation manual (in 14 languages, including Russian);
  • 3 CDs with games (Gunmetal, Delta Force: Black Hawk Down, Battle Engine);
  • 1 CD with demo versions of games (Warcraft III, Splinter Cell, Big Mutha Truckers, BREED, Colin McRae Rally 3 and TOCA Race Driver);
  • 2 CDs with various ASUS proprietary software.

Once again we see a marvelous package bundle from ASUS, for which the company has always been notable.


Design and board layout

The board's PCB design is completely identical to NVIDIA's reference board. There are no differences in either the component layout or even the positioning of capacitors.


ASUS V9950 (NVIDIA GeForce FX 5900) Front

ASUS V9950 (NVIDIA GeForce FX 5900) Back

The board itself proved to be quite heavy. This is most likely due to the massive cooling system (see below for more details) with copper heatsinks.

The video card has a classic dark green PCB and 128 MB of DDR memory onboard with a 256-bit memory bus (8 chips, 32 bits each, placed on the front side of the PCB). As we see, the board carries half the memory of its elder sister, the GeForce FX 5900 Ultra: only 8 memory chips are installed, so the contact pads on the board's reverse side remain vacant. The video card offers the AGP 2x/4x/8x interface and a standard set of outputs: one DVI-I, one analog, and one TV-out. The signal for digital monitors is generated by the SiI164CT64 TMDS transmitter made by Silicon Image.


There is also a contact pad for a VIVO chip, which is not installed (it is available on the Ultra version of the board). On the front side of the PCB, there is also a connector for the additional power needed by the power-hungry GeForce FX 5900. You don't have to connect additional power to the board, but in that case the card will run at reduced frequencies (250 MHz core, 500 MHz memory). The shortcoming of the additional power connector on the ASUS V9950 is its vertical orientation. First, it's quite difficult to plug in the power with the video card already installed. Second, the connector's retention leaves much to be desired.


The memory chips, located only on the front side, come in advanced BGA packaging. Their access time is 2.2 ns, which corresponds to 454 MHz (908 MHz effective), but the memory runs at 425 MHz (850 MHz) as per NVIDIA's specifications. The GPU runs at 400 MHz, which also matches the frequency recommended by NVIDIA.
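The conversion from access time to rated clock is simple arithmetic: the rated frequency is the reciprocal of the access time, and DDR memory transfers data twice per clock. A small Python sketch of the calculation follows; the function names are our own, and the headroom figure is just the gap between the rated and the actual clock, not a promise of overclocking results.

```python
def rated_clock_mhz(access_time_ns: float) -> float:
    """Rated memory clock in MHz for a given access time in nanoseconds."""
    return 1000.0 / access_time_ns

def ddr_effective_mhz(clock_mhz: float) -> float:
    """Effective DDR data rate: two transfers per clock cycle."""
    return 2.0 * clock_mhz

rated = rated_clock_mhz(2.2)   # chips on the ASUS V9950
actual = 425.0                 # memory clock per NVIDIA's specifications
headroom_percent = (rated - actual) / actual * 100.0

print(f"rated clock: {rated:.1f} MHz ({ddr_effective_mhz(rated):.1f} MHz effective)")
print(f"nominal headroom over {actual:.0f} MHz: {headroom_percent:.1f}%")
```

The exact reciprocal is about 454.5 MHz; the review rounds it down to 454 MHz, hence the 908 MHz effective figure.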

The cooling of the ASUS V9950 is done quite well, and during the tests there were no complaints about overheating. When overclocked, the card ran properly despite the aggressive overclocking settings. The cooling system is a single structure made up of a fairly massive copper heatsink that covers both the GPU and the memory chips (with recesses for the latter, which improves their contact with the heatsink) and two fans that blow over both the graphics chip and the memory chips (their blades glow under ultraviolet light, so the ASUS V9950 is a real treat for modding fans =) ).

On top of it all, I'd like to add that despite the bulk and seeming awkwardness of the cooling system, the adjacent PCI slot is not blocked. Still, it's better not to install any device into the first slot: that would be a real trial for the video card as well, since the airflow would be restricted. Among the advantages of this cooling system is its very low noise level (a far cry from the Flow FX =) ), which is very hard to distinguish from the noise of the processor cooler and the hard disk.


Sapphire Atlantis Radeon 9800

Package bundle:

Sapphire Radeon 9800 Box

The box, with an impressive and smart design, included the following:

  • the board itself with power patch plugs;
  • a CD with the game Tomb Raider: The Angel of Darkness;
  • a DVI-to-D-Sub adapter;
  • a video cable;
  • an S-Video cable;
  • an RCA-to-S-Video adapter;
  • 3 CDs with drivers and software.

What we can note is that ATI's traditional (good old) partners are turning over a new leaf and starting to ship their products with really rich package bundles.


Design and board layout

The board's design is a clone of ATI's reference board. The PCB is designed as per ATI's requirements and no differences are seen.


Sapphire Radeon 9800 front

Sapphire Radeon 9800 back

The board has ATI's traditional bright red PCB, 128 MB of DDR memory onboard, the AGP 2x/4x/8x interface and a standard set of outputs: one analog, one digital, and one S-Video. The good old two-phase Semtech SC1175CSW is used as the voltage regulator.

The video card is equipped with 128 MB of DDR memory in 8 chips (4 on each side of the board) in advanced BGA packaging, on a 256-bit memory bus. The memory is produced by Hynix (HYB25D128323C-3.0) and has a 3.0 ns access time, which corresponds to approximately 333 MHz (666 MHz effective), but it runs at its specified frequency of 290 MHz (580 MHz). That is, there is a small overclocking margin for the memory. The graphics chip likewise runs at 325 MHz as per the specifications.
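For reference, the bus width and the effective memory frequency together determine the theoretical peak memory bandwidth. The Python sketch below works it out for this board; the function name is our own, and the result is an upper bound derived from the specifications above, not a measured value.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, effective_mhz: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    bytes per transfer = bus width / 8
    transfers per second = effective frequency in MHz * 10**6
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9

# Sapphire Atlantis Radeon 9800: 256-bit bus, 290 MHz DDR (580 MHz effective)
radeon_9800 = peak_bandwidth_gb_s(256, 580.0)
print(f"Radeon 9800 peak memory bandwidth: {radeon_9800:.2f} GB/s")
```

The same function can be applied to any other board in the review by substituting its bus width and effective memory frequency.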


There is no cooling at all for the memory chips. The graphics processor is cooled by a low-profile cooling system which can hardly be regarded as effective enough: a small standard reference fan fitted on the heatsink. Nevertheless, in the nominal mode no stability problems were found during the long 3D testing session. At the same time, the heatsink got very hot.


Benchmarking

Test configuration:


Motherboard:  JetWay S446 (SiS 645)
Processor:    P4 Northwood 1.6A @ 2.13 GHz (133x16)
Memory:       256 MB Hynix PC2100 DDR SDRAM (CL=2)
HDD:          Maxtor DiamondMax Plus 8 40 GB
Video cards:  ASUS V9950 128 MB (NVIDIA GeForce FX 5900)
              Sapphire Atlantis Radeon 9800 128 MB (ATI Radeon 9800)
OS:           Microsoft Windows XP SP1 ENG, DirectX 9.0b
Drivers:      Detonator 45.23 WHQL and ForceWare 52.16
              Catalyst 3.9

We removed all the decorative "niceties" from the Windows GUI and set the operating system for maximum performance.

Vsync was forcibly disabled via the drivers in both OpenGL and Direct3D applications. S3TC texture compression was also disabled.

Test software:

Benchmarking Results: Synthetic benchmarks

Since our previous test, we have radically revised our set of synthetic benchmarks. We have given up using the already outdated DirectX 8.1 package MadOnion 3DMark2001SE and, to assess the speed of DirectX 8.1 shader programs (shader versions 1.1 and 1.4), kept the customary Codecreatures benchmark instead. In selecting the DirectX 9.0 benchmarks, the focus was on synthetic programs, so we have two newcomers:

  • ShaderMark v2.0 (DirectX 9 HLSL, a benchmark for pixel shaders);
  • D3D RightMark 1.0.2.7 (Public Beta 1) (a comprehensive DirectX 9.0 synthetic benchmark).

You can read a detailed analysis of the data produced by these benchmarks directly in the review as we proceed with the tests of the video cards.


ShaderMark v2.0

All the video cards were run in the benchmark in the "Anti-Detect Mode". Note also that NVIDIA's GeForce FX 5900 was unable to pass all the tests in this mode, which the benchmark honestly reported. With the ATI Radeon 9800 board, on the other hand, there were no problems: every shader offered by ShaderMark v2.0 ran on the ATI board without issues.


ShaderMark v2.0

So what can be said about the results? Here we can see that the NVIDIA chip was literally crushed by the ATI Radeon 9800. The graphs, which point to ATI's 2-3-fold lead over NVIDIA's chip, are self-explanatory: none of the shaders (!) offered by the program executed faster on the NVIDIA GeForce FX 5900 than on the ATI Radeon 9800. That's what pure HLSL means for NVIDIA chips. To NVIDIA's credit, the new ForceWare 52.16 driver shows increased performance compared to its predecessor, Detonator 45.23, but the gain is small and does not change the balance of power against the ATI Radeon 9800. What matters to us here is something else: ForceWare 52.16 does offer a performance boost precisely in HLSL code. This suggests that the optimizations applied in NVIDIA's new driver really work and give results, albeit not as significant as we would like.


D3D RightMark

This is also a new benchmark in our set of synthetic applications, which in our view allows assessing the performance of the accelerator's subsystems in the most effective and, importantly, impartial way. All the tests were conducted at 1024x768. We didn't run the tests under all possible settings: such a huge number of tests is unlikely to give our readers more information and is more likely to bury them in a heap of diagrams =).

Geometry Processing Speed


D3D RightMark: Geometry Processing Speed

This test assesses the speed at which the accelerator processes geometry. We used the most demanding mode, with three diffuse-specular light sources, in three different operating modes: the traditional TCL (Fixed-Function Pipeline), vertex shaders 1.1 with pixel shaders 1.1, and vertex shaders 2.0 with pixel shaders 2.0.

As we see, with the traditional TCL the NVIDIA card leaves the ATI Radeon 9800 well behind. Remarkably, the new ForceWare 52.16 driver brings a substantial gain here: evidently, the shader responsible for TCL emulation was optimized. But things turn really sad when version 1.1 and 2.0 shaders are used. The performance of the NVIDIA chip drops sharply, and with 2.0 shaders the new driver brings no gain at all. ATI's card, on the other hand, keeps a stiff upper lip and demonstrates identical performance with pixel and vertex programs of both versions 1.1 and 2.0.

Pixel Filling

This test performs a number of different tasks, but we were mostly interested in measuring the frame buffer fill rate.


D3D RightMark: Pixel Filling

As we see, the performance is higher on ATI's chip. NVIDIA's new ForceWare driver improves the situation quite substantially, but it still fails to catch up with the level attained by the ATI Radeon 9800.

Pixel Shading

This test in the D3D RightMark package estimates the performance of executing various version 2.0 pixel shaders. The geometry in this test is substantially simplified to minimize the dependence of the results on the chip's geometry performance and to exercise the pixel pipelines only.


D3D RightMark: Pixel Shading

As we see, the ATI Radeon 9800 chip beats the NVIDIA GeForce FX 5900, and by a factor of three. The reworked compiler of NVIDIA's new driver gives a gain, but it still does not bring the NVIDIA chip any closer to the Radeon 9800. Shaders written in HLSL are really a hard nut to crack for the NVIDIA chip - this seems to be an axiom that remains unshaken.

Point Sprites

This test measures the accelerator's speed at rendering point sprites. In the test settings, we used 2 diffuse light sources.


D3D RightMark: Point Sprites

The ATI chip takes the lead again, although the balance of power is closer to that in the fill rate and geometry tests than in the pixel shader 2.0 tests, which is logical since this test depends directly on those two parameters.

Hidden Surface Removal

This test estimates how efficiently the accelerator removes hidden points and primitives.


D3D RightMark: Hidden Surface Removal

Hidden surface removal works faster, and quite substantially so, on the ATI Radeon 9800 chip than on the NVIDIA GeForce FX 5900, which should show in real-world applications.


3DMark 2001SE

The 3DMark2001SE benchmark is already quite old, but it remains in all our roundups as an honorable veteran =). Moreover, DirectX 8.1 games are very popular these days, so the results of this benchmark can serve as a partial guide to the potential performance of the boards in modern gaming applications.


3DMark 2001SE

As for the test results themselves, the overall winner at all resolutions is the Sapphire board built on the ATI Radeon 9800 chip. We stopped using the detailed results of this test in favor of the other synthetic benchmarks above which, in our opinion, give a more impartial view of the boards' real performance.


3DMark 2003


3DMark 2003 v330

3DMark 2003 v340

In version 330 of 3DMark 2003, the NVIDIA GeForce FX 5900 beats the ATI Radeon 9800 at all resolutions. That is a bit strange, considering that NVIDIA lost all the shader tests in the other synthetic benchmarks. But they do know how to produce the "right" results at NVIDIA, don't they? =)

In this light, the results produced with Futuremark's latest patch, version 340, are telling. Along with the company's new concept of producing and interpreting test results, the patch should restore the cracked (to put it mildly =) ) credibility of 3DMark 2003. Why "should"? We'll explain below.

We install the new patch and run the tests: the ATI Radeon 9800 leads with practically the same score as in version 330, while the NVIDIA GeForce FX 5900 with ForceWare 52.16 is now the loser (data for Detonator 45.23 were unfortunately not obtained for technical reasons, and there wasn't much sense in getting them anyway), having lost many points compared to the version 330 patch. An amusing but typical situation. Soon after that, an unofficial release of the ForceWare drivers appeared online which "fixes" precisely the persistence of 3DMark 2003 in showing the right result with the new Futuremark patch. This driver hasn't yet acquired the "officially approved" status from Futuremark, but we are more than confident that WHQL certification is at hand, followed by the "approval".


Codecreatures


Codecreatures

Codecreatures

In the rather outdated Codecreatures benchmark, NVIDIA beats the ATI Radeon 9800 at all resolutions. NVIDIA's new ForceWare 52.16 driver brought almost no performance boost.


Benchmarking Results: Real gaming applications

From synthetic applications, we now move on to the performance of the graphics boards in real gaming applications. As in the synthetic part of the review, the set of benchmarking applications has undergone some changes. First of all, this applies to the beta version of Half-Life 2 that "leaked" online, whose test results, in our opinion, will be very interesting to our readers. We have also added a nice benchmark based on the engine of the soon-to-be-released Final Fantasy XI game.


Unreal Tournament 2003


Unreal Tournament 2003

To start with, it is interesting to look at the results from applications that do not make active use of shaders, Unreal Tournament 2003 among them. At low resolution with Detonator 45.23, the ASUS board built on the NVIDIA GeForce FX 5900 chip lags behind the Sapphire board built on the ATI Radeon 9800. But installing the NVIDIA ForceWare 52.16 driver pays off, and the boards draw level. At higher resolutions, NVIDIA's board leads with both driver versions. The performance boost from the driver change is much smaller at high resolutions than at low resolution.


X2: The Threat Demo


X2: The Threat Demo

In this benchmark, even without the new driver, the NVIDIA GeForce FX 5900 traditionally beats its ATI counterpart. The thing is, the benchmark uses a huge number of stencil shadows, which on NVIDIA boards require fewer passes and DO NOT use pixel and vertex programs, creating an ideal environment for NVIDIA chips. The performance boost from installing the driver with the revised compiler is quite noticeable and widens the gap over the ATI counterpart.


Unreal II: The Awakening


Unreal II: The Awakening

This newer, revised version of the Unreal Tournament 2003 engine, used in the game Unreal II: The Awakening, demonstrates the leadership of ATI's board. This is most likely caused by the more complex geometry in the game. The new driver does not reverse the situation, but it noticeably narrows the gap, at least at the most "playable" resolution to date. At higher resolutions, ATI's Radeon 9800 board remains out of reach for NVIDIA's GPU.


Final Fantasy XI Official Benchmark 2


Final Fantasy XI Official Benchmark 2

This is a new test in our set of benchmarks. As far as we know, the game engine uses neither pixel nor vertex programs of any version. But since there is no official information on that, our doubts and guesses will remain unresolved, and we'll go on wondering why the ATI Radeon 9800 board beat the NVIDIA GeForce FX 5900 =), even though ForceWare 52.16 added considerably more points to the ASUS V9950's score in this test.


AquaMark 3


AquaMark 3

This is a very telling shader benchmark to date, and one that gave rise to much talk about optimizations by both NVIDIA and (!) ATI.

The results themselves create quite a dramatic situation for the ATI board. With Detonator 45.23, the ASUS V9950 based on the NVIDIA GeForce FX 5900 loses to its Canadian counterpart, but installing the most recent NVIDIA ForceWare 52.16 driver gives NVIDIA's offspring a substantial performance boost, more than enough to leave ATI well behind.

We also present screenshots with the detailed results of this test so that you can analyze them for each of the video cards tested.

FX 5900 - Detonator 45.23 Radeon 9800 - Catalyst 3.9 FX 5900 - ForceWare 52.16

It's interesting to note that one of the most demanding tests in the package, "Large Scale Vegetation Rendering", practically didn't respond to the driver change, nor did the "Massive Overdraw" test. The tests most sensitive to the driver with the revised compiler were "Dynamic Occlusion Culling", "Masked Environment Mapping" and "Large Scale Terrain Rendering", which showed a performance boost as high as 22% on average.


Gun Metal Benchmark 2


Gun Metal Benchmark 2

Gun Metal Benchmark 2

In this pseudo-DirectX 9.0 benchmark, which uses version 2.0 vertex programs along with version 1.1 pixel programs, the balance of power is not that straightforward. At low resolutions, the ATI board outperforms the NVIDIA GeForce FX 5900 with Detonator 45.23 in both gaming tests by a small margin, but ForceWare 52.16 improves the situation. At high resolution, on the other hand, the NVIDIA GeForce FX 5900 outperforms the ATI Radeon 9800 with both the old and the new driver. Note that the performance boost from the driver replacement is negligible, which is a bit strange: the synthetic tests showed practically identical gains for version 1.1 and version 2.0 pixel programs, so we can't explain the small gain from the transition from Detonator 45.23 to ForceWare 52.16 by the fact that the benchmark uses version 1.1 pixel programs rather than 2.0.


HALO: Combat Evolved



We included this regular DirectX 9.0 game in our list of benchmarks solely because it lets us force any version of pixel and vertex programs in the game.

On the whole, the situation for the ATI board is as dramatic as in AquaMark 3. With Detonator 45.23, the GeForce FX 5900 loses to the ATI Radeon 9800, but with ForceWare 52.16 the situation changes radically, both with version 1.1 pixel programs and with version 2.0.

Another fact is also remarkable. If we look at the absolute FPS values for the ATI Radeon 9800 with pixel and vertex programs of versions 2.0 and 1.1, we can see that the performance drop when moving from version 1.1 to version 2.0 is quite insignificant. With the NVIDIA GeForce FX 5900 on Detonator 45.23, the drop is there and it is quite substantial. But with ForceWare 52.16, the GeForce FX 5900 shows exactly the same percentage drop when moving from version 1.1 to version 2.0 of pixel and vertex programs as the ATI Radeon 9800 does, which again points to the excellent job done by NVIDIA's programmers. In fairness, though, it's worth noting that we didn't notice the slightest difference in image quality between the two versions of pixel and vertex programs.
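
The comparison above relies on the relative (percentage) drop rather than on absolute FPS. A minimal sketch of that arithmetic follows; the function name, the card labels and the FPS values are hypothetical placeholders chosen for illustration and are not measurements from this review.

```python
def relative_drop(fps_baseline: float, fps_heavier: float) -> float:
    """Percentage performance drop when moving from a lighter mode
    (e.g. version 1.1 programs) to a heavier one (e.g. version 2.0)."""
    if fps_baseline <= 0:
        raise ValueError("baseline FPS must be positive")
    return 100.0 * (fps_baseline - fps_heavier) / fps_baseline


# Hypothetical (version 1.1 FPS, version 2.0 FPS) pairs.
# These are placeholders, NOT results from the tests in this article.
samples = {
    "Card A": (60.0, 54.0),
    "Card B": (50.0, 45.0),
}

drops = {name: relative_drop(v11, v20) for name, (v11, v20) in samples.items()}

for name, drop in drops.items():
    print(f"{name}: {drop:.1f}% drop from version 1.1 to version 2.0")
```

With these placeholder numbers the two cards differ in absolute FPS yet lose the same share of performance, which is the kind of parity the paragraph above describes for ForceWare 52.16.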


Half-life 2 Leaked Beta



I don't think it's an overstatement to say that we EXPECTED a beta, alpha or whatever build of Half-Life 2 to leak to the Internet =)). It's not our business to comment on how the beta leaked. We are more interested in having a real DirectX 9.0 application that uses the API's potential to the full and is in effect a preview of future DirectX 9.0 games. Commenting on a very raw beta is a thankless job, since everything may (and most likely will) change radically in the final release, but let's get down to it anyway =).

The Half-Life 2 engine is pure HLSL, which doesn't bode well for NVIDIA video cards. As our tests with two demo benchmarks showed (special personal thanks to Andrey Vorobyov, who kindly provided the demos for the tests), the NVIDIA card was simply crushed by its ATI counterpart. Although ForceWare 52.16 brings a large performance boost, it doesn't change the overall picture.


Image quality

Image quality: AntiAliasing 4x


As is easy to see, NVIDIA chips handle antialiasing with ease in both Direct3D and OpenGL applications. The new NVIDIA ForceWare 52.16 merely accentuates that.

Image quality: Anisotropic Filtering 8x



The AF speed in Direct3D (Unreal Tournament 2003) is about the same for the ATI Radeon 9800 and the NVIDIA GeForce FX 5900 with Detonator 45.23. With ForceWare 52.16, however, we got a rather strange result: the performance dropped. Unfortunately, we had no chance to rerun the test, so this result remains unexplained. In OpenGL (Return to Castle Wolfenstein), NVIDIA boards are the traditional leaders.

Image quality: AntiAliasing 4x + Anisotropic Filtering 8x



Overall, we record a clear victory for NVIDIA.

Image quality: AntiAliasing 6x/8x + Anisotropic Filtering 8x/16x



In addition to the traditional image quality tests, we decided to add a so-called "maximum quality" mode in which the maximum possible AF and AA levels were enabled. For ATI chips, the maximum AF level was 16x and the maximum AA level was 6x. For NVIDIA chips, it was 8x for both modes.

Let's see the results. In Unreal Tournament 2003, NVIDIA is the clear leader, while in Return to Castle Wolfenstein ATI takes an equally clear victory. It's also worth noting the absolute FPS values: they stay at a playable level in both games.


Final Words

The release of the new ForceWare 52.16 driver has substantially changed the balance of power in the high-end segment as well as in the mid- and low-end segments of the market. The ASUS V9950 (NVIDIA GeForce FX 5900) graphics board reviewed today is the most telling example.

Synthetic benchmarks that make active use of pixel and vertex shaders unanimously report a substantial rise in shader processing speed for the NVIDIA GeForce FX 5900 (as well as for all the boards of the GeForce FX family). The rise is especially noticeable with version 2.0 shaders, which have always been a bottleneck for boards built on NVIDIA GeForce FX chips. Nevertheless, in synthetic benchmarks GeForce FX 5900 boards can't compete on a par with the ATI Radeon 9800 even with ForceWare 52.16 installed. This balance of power in synthetics arises primarily because the synthetic benchmarks we used are built on Microsoft HLSL (High Level Shader Language). As we mentioned in the theoretical part of our review, ATI boards handle such code much more efficiently than NVIDIA boards, for which the ideal option is shader programs written specifically for the GeForce FX architecture. Boards built on GeForce FX chips handle standard DirectX 9.0 code much worse than ATI Radeon boards do, and the new NVIDIA driver doesn't change the situation radically but merely narrows the gap slightly.

In real-life applications, as our tests showed, the situation for NVIDIA is much more favorable now that the ForceWare 52.16 driver has been released. In some cases, simply installing the new driver allowed the NVIDIA GeForce FX 5900 board to take the crown, leaving the ATI Radeon 9800 somewhere between the GeForce FX 5900's results with Detonator 45.23 and with ForceWare 52.16. For obvious reasons, shader-heavy applications are the most telling here. But we would like to note that the wide popularity of the benchmarks, and of the games used as benchmarks in our research, played one of the leading parts in this.

What matters here is NVIDIA's program of cooperation with game developers (dubbed "The Way it's Meant to be Played"), which is aimed at working closely with developers to make their game engines easier to optimize for the architecture of NVIDIA cards. Is that good or not? There is no single answer, and there can't be one. The end user or gamer who is not into the details absolutely doesn't care how a video card manufacturer attains maximum performance. By and large, it doesn't matter whether it is achieved through tricky optimizations or whether high performance is inherent in the architecture of the GPU. What really matters is the image quality. But here another problem comes up: no matter how energetically NVIDIA works with game developers, the company is unable to reach ALL of them, and the rest will have to resort to writing HLSL code which, as the practice of writing real gaming applications has shown a number of times, executes faster on ATI boards. That is why, in our view, NVIDIA chose a somewhat wrong policy in this case. There are many examples of that.

Take, for instance, Half-Life 2, in which NVIDIA boards demonstrate appallingly low performance and rank on a par with ATI's mid-range solutions. Half-Life 2 is, without overstatement, a framework for future DirectX 9.0 games, and NVIDIA worked in quite close touch with Valve to optimize the game for the GeForce FX architecture. Of course, we can't say to what extent the leaked beta version is optimized for NVIDIA boards, or whether it is optimized at all, but it is a fact that the GeForce FX 5900 fails ignominiously in all the tests based on the Half-Life 2 beta and lags far behind ATI's contender. It is also a fact that optimizing games for NVIDIA video cards is in practice a very troublesome job, which once again confirms the earlier conclusion that NVIDIA can't artificially tune the performance of its cards to "ATI's level".

With the release of new games that make increasingly intense use of version 2.0 pixel and vertex shaders, NVIDIA boards based on GeForce FX chips will look less and less convincing compared to ATI boards (the examples are numerous). In our view, the soundest decision for NVIDIA would be to release a new chip with an architecture designed from the start for Microsoft HLSL (the issue of floating-point precision still remains open). For now, we can say for sure that all owners of NVIDIA GeForce FX video cards should install the new NVIDIA ForceWare 52.16 driver (for the analysis of newer driver versions, including unofficial ones, read our further reviews), since the driver optimizes the compiler itself and does not make optimizations for any particular application, which is confirmed not only by gaming applications but by the synthetic benchmarks as well.


Copyright © 2005 Digital-Daily. All Rights Reserved.
contact - info@digital-daily.com