3DNews Vendor Reference English Resource -
All you need to know about your products!
NVIDIA GeForce GTX 280 - fast and hot
Author: Anton Rachko
In today's official press release, NVIDIA announced a new generation of graphics cards, the GeForce GTX 200 family, based on the second generation of its unified visual computing architecture: the GeForce GTX 280 and GeForce GTX 260. The first sample of a video card built on the GeForce GTX 280 chip has already visited our test lab, and we are looking forward to sharing our first test results and our impressions of NVIDIA's new architecture with readers.
Before moving on to the graphs and findings, we describe the architecture of the GeForce GTX 200 family graphics chips, NVIDIA's new and updated technologies, and a number of new initiatives first announced today. For those who are impatient and can't help scrolling down to the conclusions, we'd like to point out the following: this time, NVIDIA has announced not only an updated architecture but, in a way, a new philosophy of graphics architecture with far-reaching consequences.
First, the technical traits. As a logical extension of the GeForce 8 and GeForce 9 series, which were NVIDIA's first generation of the unified visual computing architecture, the new products of the GeForce GTX 200 family are built on the second generation of this architecture.
NVIDIA's GeForce GTX 280 and 260 graphics processors are the largest and most complex graphics chips known so far: just imagine - 1.4 billion transistors in each! The most powerful solution, the GeForce GTX 280, offers 240 shader processors, 80 texture processors, and support for up to 1 GB of video memory. See the following table for detailed specifications of the GeForce GTX 280 and GeForce GTX 260 chips.
In fact, the modern graphics core of the GeForce GTX 200 family can be seen as a universal chip that supports two different modes: graphics and compute. The architecture of the GeForce 8 and 9 family chips is represented as a Scalable Processor Array (SPA). The architecture of the GeForce GTX 200 family chips is based on an updated and improved SPA, which comprises a number of TPCs: "Texture Processing Clusters" in the graphics mode, or "Thread Processing Clusters" in the parallel computing mode. Each TPC in turn comprises an array of Streaming Multiprocessors (SM), each containing eight processor cores, also referred to as Streaming Processors or Thread Processors. Each SM also includes texture filtering processors for the graphics mode, which are also used for various filtering operations in the compute mode.
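The hierarchy described above (TPCs containing SMs, SMs containing eight cores) can be checked with a little arithmetic. The sketch below is only an illustration: the figures of 10 TPCs and 3 SMs per TPC for the GeForce GTX 280 are assumptions not stated in the text above, while the eight cores per SM and the 240-core total come from the article.

```python
# Illustrative sketch of the SPA hierarchy: TPC -> SM -> processor core.
# TPC_COUNT and SMS_PER_TPC are assumptions (not given in the article text);
# CORES_PER_SM is the value quoted in the article.
TPC_COUNT = 10      # assumption: number of TPCs in the GeForce GTX 280
SMS_PER_TPC = 3     # assumption: streaming multiprocessors per TPC
CORES_PER_SM = 8    # from the article: eight processor cores per SM


def total_cores(tpcs: int, sms_per_tpc: int, cores_per_sm: int) -> int:
    """Return the number of processor cores in the whole processor array."""
    return tpcs * sms_per_tpc * cores_per_sm


gtx280_cores = total_cores(TPC_COUNT, SMS_PER_TPC, CORES_PER_SM)
print(f"GeForce GTX 280 processor cores: {gtx280_cores}")
```

Under these assumptions the product matches the 240 shader processors quoted for the GeForce GTX 280.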
Below is a block diagram of the GeForce GTX 280 in the traditional graphics mode.
On switching to the compute mode, the hardware thread dispatcher (top) controls the TPC threads.
Here is a closer look at a TPC: each SM has its own shared local memory, and each processor core of an SM can exchange data with the other cores of that SM through this memory, without having to access the external memory subsystem.
Thus, NVIDIA's unified shader and compute architecture uses two quite different computational models: MIMD (multiple instruction, multiple data) across the TPCs, and SIMT (single instruction, multiple thread), an advanced version of SIMD (single instruction, multiple data), for computations within an SM.
As for the general specifications, compared with the previous generations of chips, the GeForce GTX 200 family offers the following advantages:
This is how the list of main specifications of the new chips looks:
We note separately that DirectX 10.1 is not supported by the GeForce GTX 200 family. One of the reasons is that, in developing the new family, a decision was made to concentrate not on support for DirectX 10.1, which is still in little demand, but on improving the architecture and performance of the chips.
NVIDIA PhysX technology, built on a package of physics algorithms, is a powerful physics engine for real-time computations. Currently, support for PhysX is implemented in over 150 games. Combined with a powerful GPU, the PhysX engine provides a substantial increase in physics computing power, especially when creating explosions with scattering dust and debris, characters with complex facial expressions, new types of weapons with fantastic effects, fabrics that realistically wear and tear, and fog and smoke that flow dynamically around objects.
There is one more, no less important novelty: new power-saving modes. Thanks to the high-precision 65-nm process technology and new circuitry, it is possible to attain more flexible and dynamic control over power consumption. For instance, the power consumption of the GeForce GTX 200 family graphics chips in stand-by or 2D mode amounts to merely 25 W; while playing a Blu-ray movie it is about 35 W; at full 3D load, the TDP does not exceed 236 W. The GeForce GTX 200 chip can even be disabled completely thanks to support for the HybridPower technology when used with motherboards based on HybridPower nForce chipsets with integrated graphics (e.g., nForce 780a or 790i), while low-intensity graphics workloads are simply handled by the GPU integrated into the motherboard. Apart from that, GPUs of the GeForce GTX 200 family also include special power-management units that disable blocks of the graphics processor not in use at the moment.
The user can configure a system with two or three video cards of the GeForce GTX 200 family in SLI mode when using motherboards built on the respective nForce chipsets. For the traditional Standard SLI mode (with two video cards), a performance gain of about 60-90% in games is claimed; the 3-way SLI mode is aimed at the maximum FPS at the maximum screen resolutions.
Another innovation is support for the new DisplayPort interface with resolutions over 2560 x 1600 and a 10-bit color space (previous generations of GeForce graphics offered support for 10-bit data processing, but only 8-bit composite RGB colors were displayed).
Along with the announcement of the new GeForce GTX 200 family of graphics processors, NVIDIA suggests taking an entirely different look at the roles of the CPU and the GPU in a modern, balanced desktop system. Such an optimized PC, based on the concept of heterogeneous computing (i.e., processing a stream of heterogeneous tasks of multiple types), offers, in the opinion of specialists at NVIDIA, a much more balanced architecture and substantially greater computational capabilities. It means combining a CPU of comparatively moderate performance with the most powerful graphics, or even an SLI system, which allows attaining peak performance in the most demanding games, 3D, and media applications.
In other words, the concept can be briefly formulated as follows: the CPU in a modern system takes on the service functions, while the burden of demanding computations is placed on the graphics system. Approximately the same conclusions (albeit more complex and backed by numbers) follow from our series of reviews investigating how performance depends on the key components of the system - see the reviews CPU-boundedness of the video system. Part I - Analysis; CPU-boundedness of the video system. Part II - Effect of the CPU cache and the speed of the RAM; Bot-dependence, or why 3D games need a powerful CPU; CPU-boundedness of the video system. Transition range. "Critical" point of the CPU clock speed.
Intensive computation on modern graphics cards is no longer a novelty, but it is with the emergence of the GeForce GTX 200 family of graphics processors that NVIDIA expects a substantial rise of interest in the CUDA technology.
CUDA (Compute Unified Device Architecture) is a computational architecture aimed at solving complex tasks in the consumer, business, and technical spheres, in any data-intensive applications, using NVIDIA graphics processors. From the viewpoint of the CUDA technology, the new GeForce GTX 280 graphics chip is nothing more than a powerful multi-core processor (with hundreds of cores!) for parallel computations.
As stated above, the graphics core of the GeForce GTX 200 family can be represented as a chip supporting both the graphics and the computational modes. In one of these modes, the computational one, the GeForce GTX 280 becomes a programmable multiprocessor with 240 cores and 1 GB of dedicated memory, a kind of dedicated supercomputer with performance a bit under a teraflop. This speeds up, many times over, applications that parallelize well, e.g. video encoding and scientific computations.
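The "a bit less than a teraflop" figure can be reproduced with a rough estimate. The sketch below is an assumption-laden illustration rather than NVIDIA's official calculation: it takes the 240 cores from the text, the 1296 MHz shader clock reported in the clock-speed measurements later in this review, and assumes that each core can issue three single-precision floating-point operations per clock (a multiply-add plus a multiply).

```python
# Rough peak single-precision throughput estimate for the GeForce GTX 280.
# FLOPS_PER_CORE_PER_CLOCK is an assumption (multiply-add + multiply = 3 ops);
# the core count and shader clock are the figures quoted in this review.
CORES = 240                     # processor cores, from the article
SHADER_CLOCK_MHZ = 1296         # shader domain clock in 3D mode, from the article
FLOPS_PER_CORE_PER_CLOCK = 3    # assumption: MAD (2 ops) + MUL (1 op)


def peak_gflops(cores: int, clock_mhz: float, flops_per_clock: int) -> float:
    """Return theoretical peak throughput in GFLOPS."""
    return cores * clock_mhz * flops_per_clock / 1000.0


gtx280_peak = peak_gflops(CORES, SHADER_CLOCK_MHZ, FLOPS_PER_CORE_PER_CLOCK)
print(f"Estimated peak: {gtx280_peak:.2f} GFLOPS")
```

This is a theoretical ceiling; real applications reach only a fraction of it, depending on how well they parallelize and how they use memory.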
Graphics processors of the GeForce 8 and 9 families were the first on the market to support the CUDA technology. Over 70 million units have been sold by now, and interest in the CUDA project keeps growing. Details of the project and the downloadable files required to get started are available here. As an example, the screenshots below demonstrate the computational performance gains reported by independent users of the CUDA technology.
Summing up our brief investigation of the architectural and technological improvements implemented in NVIDIA's new generation of graphics processors, let's point out the major aspects. The second generation of the unified visual computing architecture implemented in the GeForce GTX 200 family is a substantial step forward compared with the previous generations, GeForce 8 and 9.
Compared with the previous leader, the GeForce 8800 GTX, the new flagship GeForce GTX 280 offers 1.88 times more processor cores; is capable of processing about 2.5 times more threads per chip; offers a register file of doubled size and support for double-precision floating-point computations; supports 1 GB of memory with a 512-bit interface; is equipped with a more efficient command dispatcher and improved communication among the chip's elements; and offers an improved Z-buffer and compression module, support for a 10-bit color palette, etc.
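The "1.88 times more processor cores" figure follows directly from the core counts. In the sketch below, the 240 cores of the GeForce GTX 280 come from the article, while the 128 cores of the GeForce 8800 GTX are not stated in the text and are taken here as an assumption.

```python
# Core-count ratio between the new flagship and the previous leader.
GTX280_CORES = 240       # from the article
G8800GTX_CORES = 128     # assumption: not stated in the article text


def core_ratio(new_cores: int, old_cores: int) -> float:
    """Return how many times more cores the new chip has."""
    return new_cores / old_cores


ratio = core_ratio(GTX280_CORES, G8800GTX_CORES)
print(f"Core ratio: {ratio:.3f}")
```

The exact quotient is 1.875, which the article rounds to 1.88.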
For the first time, a new generation of GeForce chips, the GTX 200, is positioned from the outset not only as a powerful 3D graphics accelerator but also as a serious computing solution for parallel computations.
GeForce GTX 280 cards with 1 GB of memory are expected to appear in retail at about $649, with the new products based on the GeForce GTX 260, with 896 MB of memory, priced at about $449 (or even $399). Quite soon we'll be able to verify how the recommended prices match the real retail prices. By all accounts, the announcement of the GeForce GTX 200 family is far from being "on paper": solutions based on these chips have been announced by many of NVIDIA's partners, and the new products will appear on retail shelves in the near future.
We now move on to describing the first GeForce GTX 280 which arrived at our test lab, as well as the test results.
The Leadtek 280GTX arrived as an OEM version, i.e. without a box. Externally, the new product looks like something between the GeForce 9800GTX and the GeForce 9800GX2. Both sides of the board are covered with a black housing made of plastic and metal.
The package bundle of the video card includes:
The "dual" SLI connector, which allows combining three video cards, and the connector for an S/PDIF audio signal are covered with black caps. We have already seen such a solution on the GeForce 9800GX2, where this approach gave the video card the look of a standalone device, something more than just a video card.
The new product is equipped with two power connectors, an 8-pin and a 6-pin, which have been left uncovered.
On the reverse side, there is a 3-Way SLI connector. Configurations of two video cards use only one of its subconnectors, whereas configurations of three video cards require a dedicated SLI bridge that uses the whole connector.
The Leadtek GTX280 is equipped with two DVI connectors and one S-video connector. Near the S-video connector there is a power indicator, which may be quite useful in case of power supply issues, which we also came across. But we'll talk about that a bit later.
The cooling system comprises two parts: a radiator with a fan, and a metal plate that removes heat from the memory chips on the reverse side of the board.
The Leadtek GTX280 is very similar to its predecessor, the GeForce 8800GTX/Ultra. The GT200 graphics chip, like the G80, is equipped with a metal lid that protects the chip from mechanical damage. The space for a metal frame around the GPU, as on the predecessor, has been preserved, but the frame itself is missing. The number of memory chips has gone up to 16, placed eight on each side of the board. The memory bus width has gone up to 512 bits.
As in the GeForce 8800GTX/Ultra, the developers at NVIDIA have used a discrete VIVO chip, which is in fact a RAMDAC required to support analog displays (D-sub, S-video). On the earlier card, such an approach was necessitated by interference from the GPU shader unit, and the same holds now. Interestingly, this issue was eliminated in the G92. It is not clear why the engineers at NVIDIA have not carried the experience of designing the G92 chip over to the GT200.
The GT200 chip is of impressive dimensions - the 1.4 billion transistors hidden under the metal lid make themselves felt. The chip is of revision A2.
The memory chips are made by Hynix. The nominal access time of the memory chips is 0.8 ns, which corresponds to a clock speed of 2400 MHz.
The cooling system is based on five heat pipes. Three of them distribute heat from the copper base over the aluminum fins, which are cooled by the fan on the right; the fourth helps distribute heat evenly, and the fifth carries heat from the power supply subsystem over to the main radiator.
Cooling system efficiency; power-saving system
Riva Tuner 2.09 already supports the new GT200 chip.
So we had no issues with measuring the temperature of the video card. As before, we verify the efficiency of the cooling system using the Firefly Forest test from the 3DMark 06 suite. The test conditions are: 1600x1200 resolution, 4x FSAA, and 16x AF. After nine runs of the test we obtained the following results:
The graphics chip heated up to 85 °C, and the rotational speed of the fan went up from 500 RPM to 1100 RPM. Note that despite the substantial rise in the fan's rotational speed, the cooling system of the video card remained really quiet.
Video cards of the GTX200 series can boast an effective power-saving system that keeps power consumption below 70 W when idle or in 2D mode. This is achieved through a substantial reduction in the clock speeds of the video card in 2D mode, to 300/100 MHz for the graphics processor and to 200 "true" MHz for the video memory. In 3D mode, the frequencies of the video card rise to 601/1296 MHz for the GPU and 2214 "true" MHz for the video memory.
Earlier we mentioned power supply issues that we came across. On launching 3D applications, our video card turned off, with its indicator changing color from green to red. As it turned out, the Hiper 880 W PSU was to blame. The issue was solved by replacing the PSU with a Thermaltake Toughpower 750 W.
Benchmarking and conclusions
We'll be testing Leadtek GTX280 on a test bench of the following configuration:
For Leadtek 280GTX, we used drivers of version 177.34. We start examining the test results with 3DMark test suites.
In the 3DMark tests, the new product did not demonstrate its speed capabilities to the full, yielding to the GeForce 9800GX2. However, the gaming tests, which we ran with 4x FSAA and 16x AF enabled, should dot the i's.
In Call of Duty 4, the Leadtek 280GTX demonstrated slightly better results than the GeForce 9800GX2. The results for the other video cards turned out to be lower, but frankly not by much.
In Crysis, on the other hand, the Leadtek GTX280 showed the other participants of the tests what it is worth. The GeForce 8800 Ultra was left well behind, let alone the GeForce 8800 GTS 512. The only real rival for the GeForce 280GTX was the GeForce 9800GX2, but we should not forget that the latter is a combination of two G92 chips, that is, in fact two video cards "all in one".
Finally, we see comfortable frame rates at all three resolutions in Crysis under Windows Vista. The GeForce 9800GX2 was unable to compete against the new product, for which SLI was to blame. Alas, in the new operating system this mode has not yet been brought to perfection.
In Need for Speed Pro Street Racing, the GeForce 9800GX2 took first place. That's how it goes - sometimes even top-end cards yield to their predecessors.
In Call of Juarez, luck again abandoned the new product, and victory went to the GeForce 9800GX2. However, as we stated earlier, the latter uses the SLI mode, which imposes some restrictions on it. These are not only issues of compatibility with some games, which are rare nowadays, but also the limitation of image output to a single monitor.
On switching to Windows Vista, the balance of power in Call of Juarez hardly changed, and the GeForce 9800GX2 kept first place.
In Need for Speed Carbon, the Leadtek 280GTX won back the lead. The GeForce 9800GX2 did not lag far behind, but the GT200 chip still proved stronger than two G92 chips in SLI mode.
In Prey, the Leadtek 280GTX did not lose its lead; however, we expected somewhat better results from it. This game is sensitive to video memory bandwidth, which should have grown in the new product thanks to the memory bus being widened to 512 bits.
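To put the memory bandwidth remark into numbers, here is a minimal sketch of the usual estimate: bus width in bytes multiplied by the effective data rate. It assumes that the 2214 MHz memory frequency measured earlier in this review is the effective (double data rate) figure; that interpretation is an assumption made for this example.

```python
# Theoretical memory bandwidth = (bus width in bytes) x (effective data rate).
BUS_WIDTH_BITS = 512          # from the article: 512-bit memory bus
EFFECTIVE_RATE_MHZ = 2214     # from the article; assumed to be the effective DDR rate


def memory_bandwidth_gb_s(bus_width_bits: int, effective_rate_mhz: float) -> float:
    """Return theoretical bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_mhz * 1e6 / 1e9


gtx280_bandwidth = memory_bandwidth_gb_s(BUS_WIDTH_BITS, EFFECTIVE_RATE_MHZ)
print(f"Theoretical bandwidth: {gtx280_bandwidth:.1f} GB/s")
```

The same function can be applied to any other card once its bus width and effective memory rate are known, which makes it easy to compare generations on paper.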
As we can see, no revolution has happened: the new GT200 GPU and the GeForce 280GTX tested today are a further development of NVIDIA's unified shader architecture. The new GPU offers more functional blocks than its predecessors, which gives it the right to be called the most powerful GPU to date. It should be noted that, apart from high performance in 3D applications, the new chip lays claim to first place in distributed computing, which for now means the Folding@Home project. Nor should we forget the NVIDIA CUDA technology for computations on graphics processors, or the acceleration of "physics" in games. The latter will be fully compatible with the AGEIA PhysX engine; only a special driver will be needed, and its release is already close at hand. It is still unclear how popular the new GT200 chip will be in non-graphics applications, but that will be clarified in our forthcoming reviews once we have all the required drivers and utilities on hand.