CPU-boundedness of the video system Part I - Analysis
How to estimate a video card's performance correctly
This question may seem far-fetched. What could be the problem? Just take a top-end test setup, install a video card, and run it in all the modes; that is how it is normally done. The real issue, however, is how to interpret the results, and that is not as easy as it seems. Let's look at our graph once more for the "pitfalls". So as not to overload the picture, we have left only the lines for three modes.
 Graph 3
The red line depicts the results of the video card tests in the 1024x768 NO AA/AF mode. What are we measuring in this case? The video card's performance? Unlikely. See how strongly the results vary as the CPU clock speed changes. The video card is the same, so its performance cannot have changed. Conclusion: in this mode we are in fact measuring how fast the CPU can prepare frames, because the results depend almost linearly on the CPU clock speed.
Now look at the brown line, which depicts the 1600x1200 NO AA/AF mode. The load on the video card has gone up substantially, and we no longer see a straight line. Nevertheless, the spread of results remains significant. So which value reflects the video card's performance? As we can see, a correct answer must state the test conditions and, even though it does not fully determine the result, the CPU performance (clock speed) at which the result was produced.
Finally, the third (green) line depicts the 1600x1200 4AA/16AF mode. Starting from a CPU clock speed of 1600 MHz, the results shown by the video card no longer depend on the CPU's performance. That is why we can state with confidence that the FPS level of the horizontal "shelf" reflects the performance of the video card itself. Hence, we formulate
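One way to check the "almost linear" claim numerically is an ordinary least-squares fit of FPS against CPU clock speed. The sketch below is only an illustration: the function name `linear_fit` and the sample readings are our own assumptions, not measurements from the graph. A coefficient of determination close to 1 together with a clearly positive slope suggests that the mode is CPU-bound.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A perfectly flat series is fitted exactly by a line with zero slope.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical readings (NOT taken from the article's graph).
clocks_mhz = [1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400]
low_res_fps = [50, 60, 70, 80, 90, 100, 110, 120]   # CPU-bound shape
heavy_fps = [40, 48, 55, 60, 60.5, 60, 61, 60]      # flattens into a "shelf"

for name, series in (("low-res", low_res_fps), ("heavy", heavy_fps)):
    slope, intercept, r2 = linear_fit(clocks_mhz, series)
    print(f"{name}: slope={slope:.4f} fps/MHz, R^2={r2:.3f}")
```

The threshold at which R^2 counts as "practically linear" is a judgment call; the fit only quantifies what the eye already sees on the graph.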
The criterion for the correct comparison of video card performance (other things being equal): it is admissible to compare only those performance values (FPS) that correspond to horizontal levels on the CPU-boundedness graph.
To that end, we need a test setup with a powerful enough CPU and a test mode chosen so that the results form a horizontal "shelf". The resulting "shelf" is the performance level of the video card in the given mode.
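The criterion can be sketched in code as a search for a flat tail in the FPS-versus-clock series. This is a minimal illustration under our own assumptions: the function name `find_shelf`, the 3% flatness tolerance and the sample numbers are hypothetical and are not part of the original test method.

```python
from typing import Optional, Sequence, Tuple


def find_shelf(
    clocks_mhz: Sequence[float],
    fps: Sequence[float],
    tolerance: float = 0.03,  # assumed: 3% relative spread still counts as flat
) -> Optional[Tuple[float, float]]:
    """Return (onset_clock_mhz, shelf_fps) if the curve ends in a flat tail.

    The tail is grown from the highest clock towards lower clocks for as
    long as every point stays within `tolerance` of the tail's mean FPS.
    Returns None when even the last two points are not flat.
    """
    if len(clocks_mhz) != len(fps) or len(fps) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    best = None
    for start in range(len(fps) - 2, -1, -1):
        tail = fps[start:]
        mean = sum(tail) / len(tail)
        if all(abs(v - mean) <= tolerance * mean for v in tail):
            best = (clocks_mhz[start], mean)
        else:
            break
    return best


# Hypothetical readings (NOT taken from the article's graph).
clocks_mhz = [1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400]
heavy_mode_fps = [40, 48, 55, 60, 60.5, 60, 61, 60]
light_mode_fps = [50, 60, 70, 80, 90, 100, 110, 120]

print(find_shelf(clocks_mhz, heavy_mode_fps))  # a shelf is found
print(find_shelf(clocks_mhz, light_mode_fps))  # no shelf: still CPU-bound
```

Only when a shelf is found does the returned FPS value qualify for comparison between video cards under the criterion above.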
Interpretation of results shown by multi-GPU systems
You are probably already familiar with the methods for combining the performance of video cards, SLI and CrossFire, and you know that the point of these technologies is to boost the performance of the video subsystem. In simple terms (looking again at Figure 1), two video cards "paint" the "skeleton" of a frame much faster than a single card. We intentionally avoid the phrase "twice as fast", because ordinary arithmetic does not apply here: "two" is not always twice as fast as "one". Among the hindrances are the overhead of distributing the load between the two cards, the time spent on synchronization, and so on. So a twofold performance boost from combining two video cards is possible only in theory; in practice the maximum boost is limited to approximately 80-90%, that is, 1.8-1.9 times. Even an 80% boost from installing a second video card is by no means always achieved. Using the graphs above, we can now explain why. We take graph 3 and add a few lines to show how.
 Graph 4
As before, the green, brown and red lines depict the results shown by the video card in various graphic modes depending on the CPU clock speed. The blue straight line is "the line of maximum results", the ceiling imposed on performance by the CPU. The grey double-headed arrows depict the theoretically possible boost from building up the performance of the video subsystem. As is easy to see, the "boost margin" is smallest for the red line, because it almost merges with the blue straight line. Hence we see a minimal or zero performance boost even if the power of the video subsystem is increased with SLI or CrossFire. For the brown line, which depicts the more demanding mode, the "boost margin" is somewhat larger, but it is still smaller than the theoretical limit of 80-90% (120 fps + 80% ≈ 220 fps, yet the margin is only about 50 fps). The most favorable situation is seen for the most demanding graphic mode, 4AA/16AF at 1600x1200. In this case the "boost margin" is greater still, so a combination of two video cards can show its full worth. As you can see, to make the most of SLI and CrossFire we need a powerful CPU (the "boost margin" grows along the X axis) as well as tests in demanding graphic modes.
Certainly, all these conclusions were intuitively clear from the day the technologies for combining video cards were announced; we have simply shown graphically where the boost should be looked for.
You might ask: what are those orange dotted lines doing on the graph? Assume that the lower of these dotted lines depicts the results in an even more demanding mode (say, 4AA/16AF at 2048x1536). The upper dotted line runs at a level 80% higher, that is, it depicts the performance of two video cards in SLI or CrossFire (the lower dotted arrow). What, then, does the upper dotted arrow show? Of course, it shows the remaining "boost margin", which could be used, e.g., with … Quad SLI. As you can see, finding a real performance boost in this case requires an even more demanding graphic mode and, of course, a powerful CPU. (Note: the Quad SLI example does not depict real performance values for this configuration; it merely demonstrates that the approach used in this article can be applied to such video solutions as well.)
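The arithmetic behind the "boost margin" can be sketched as follows. This is a simplified model under our own assumptions: the function name `boost_margin` is hypothetical, the dual-card result is taken as the single-card FPS times a scaling factor, and it is capped by the CPU ceiling (the blue line). The 170 fps ceiling in the example is our reading of the "about 50 fps" margin mentioned for the brown line, not a measured value.

```python
from typing import Tuple


def boost_margin(
    single_fps: float,
    cpu_ceiling_fps: float,
    scaling: float = 1.8,  # assumed practical dual-card scaling (80% boost)
) -> Tuple[float, float, float]:
    """Return (achievable_fps, gain_fps, gain_fraction) for a second card.

    The theoretical dual-card result is single_fps * scaling, but it can
    never exceed the CPU ceiling at the given clock speed.
    """
    if single_fps <= 0 or cpu_ceiling_fps <= 0:
        raise ValueError("FPS values must be positive")
    theoretical = single_fps * scaling
    achievable = max(single_fps, min(theoretical, cpu_ceiling_fps))
    gain = achievable - single_fps
    return achievable, gain, gain / single_fps


# Brown-line example: 120 fps on one card, CPU ceiling assumed at ~170 fps.
achievable, gain, fraction = boost_margin(120, 170)
print(f"achievable={achievable:.0f} fps, gain={gain:.0f} fps ({fraction:.0%})")
```

In this model a second card pays off in full only while `single_fps * scaling` stays below the CPU ceiling, which is exactly the situation of the most demanding modes.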
Verification of the findings with other 3D-applications
Up to now, we have built our reasoning on a single 3D application, namely Half-Life 2 with the demo scene "d1_canals_09 3dnews02". How valid are our results for other applications? Let's check. Below are two consolidated graphs similar to graph 2, but for the games DOOM 3 and F.E.A.R., using the demos integrated into these games.
 Graph 5
As you can see, the overall picture is very similar to what we saw in Half-Life 2. Naturally, the absolute FPS values are different, but the overall behavior of the lines is preserved.
 Graph 6
F.E.A.R. is so "heavy" that even with the rather powerful 7800GTX we reach the horizontal "shelves" almost immediately, and already in the NO AA/AF modes, that is, with anti-aliasing and anisotropic filtering disabled. Therefore, to find the "line of maximum possible results" we had to use the lowest available resolution, 640x480 (the dark green line on the graph). As for the much higher resolutions, some "roughness" of the lines comes from the fact that the test integrated into F.E.A.R. outputs integer values, which at small absolute values gives a substantial relative error.
Finally, the most popular synthetic tests: 3DMark. As an example, we took 3DMark’05.
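The effect of integer output on low FPS values is easy to estimate. The sketch below assumes that the benchmark rounds to the nearest integer (if it truncates instead, the worst-case error doubles); the function name is our own and hypothetical.

```python
def max_relative_rounding_error(reported_fps: int, step: float = 1.0) -> float:
    """Worst-case relative error when the true value is rounded to `step`.

    Assumes round-to-nearest, so the absolute error is at most step / 2.
    """
    if reported_fps <= 0:
        raise ValueError("reported FPS must be positive")
    return (step / 2) / reported_fps


for fps in (10, 25, 100):
    print(f"{fps} fps: up to {max_relative_rounding_error(fps):.1%} error")
```

The lower the reported FPS, the larger the share of the result that rounding can hide, which is consistent with the rougher lines at high resolutions.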
 Graph 7
As it turned out, as the CPU clock speed rises, the results for the GeForce 7800GTX (green line) under standard settings in 3DMark’05 level off into a "shelf". According to our criterion for the correct comparison of video card performance, this means the performance of the GeForce 7800GTX was measured correctly in this test. It follows that 3DMark’05 will also compare "weaker" video cards correctly.
I believe it is now clear why we decided not to include results from 3DMark’06. Since this benchmark includes CPU tests, we cannot obtain a horizontal "shelf" of results on the CPU-boundedness graph, so the correctness of a video card comparison would be questionable.
Coming back to graph 7: to find the "line of maximum results" in this test, we used a Radeon X1900XTX (since 3DMark’05 is more favorable to Radeon cards) and tested it at 320x240 (with the other test settings left unchanged). Although the resulting red line is not geometrically straight, it is quite suitable for the role of the "line of maximum results". As you can see, with an Athlon 64 4000+ CPU running at 2400 MHz the maximum number of "marks" is about 12000, or 12500 if we follow the approximating curve. None of the systems we have tested so far (7900GTX-SLI, CrossFire, Quad-SLI) has cleared the 12000 "marks" bar in 3DMark’05 on our test setup, which supports the conclusions we have made.