CPU-boundedness of the video system. Part II – Effect of the CPU cache memory and memory speed
Preface to the second part
Since the publication of the article «CPU-boundedness of the video system. Part I – Analysis» we have received extensive feedback from you, dear readers. Along with questions about whether a second part would be released, there were many remarks about the presented graphs, including doubts about their trustworthiness in some specific cases.
Today we explain some fine points that attracted special interest from readers but were not described in detail in the first part. We also examine the effect of the CPU cache size and memory speed on performance in 3D games, and come close to the question of comparing platforms as a whole. Let's go ahead.
«Non-fitment» No. 1. Or, where the "zero" has gone
Take a look at the graph below.
This graph is taken from the first part of the review. As the CPU clock speed goes down, the lines showing the video card's performance in various modes converge to the same sloping line. The "non-fitment" is this: if we extend this approximating line until it crosses the "FPS" axis, it does not pass through the origin but somewhat above it. So it seems that at a CPU clock speed of zero we could still play at as much as 15 FPS?!
How can that be? If we set aside the test conditions under which the results were obtained, such a situation is possible in theory. Suppose we load some data, together with shader programs for the video chip to execute, into the video card's memory and let everything run on its own; the CPU is not needed here. Examples of using video processors for mathematical computations are known. Under our test conditions, however, such a result is physically impossible: something other than the CPU would have to compute the positions of objects in the game scene! So how should the CPU-boundedness graph behave as the CPU clock speed tends to zero?
Running this experiment on real hardware is complicated by the fact that much lower CPU multiplier values are not available, and if we simply take a much weaker processor, we change the platform, that is, the test conditions, and can no longer compare the results adequately. What can we do then?
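The extrapolation described above amounts to fitting a straight line to the CPU-limited points and reading it off at 0 MHz. A minimal sketch in Python, assuming an ordinary least-squares fit (the article does not say how the approximating line was drawn); `fit_line` is a hypothetical helper, and the data points are hypothetical values placed exactly on a line with a 15 FPS intercept to mirror the figure quoted in the text, not measurements.

```python
# Least-squares straight-line fit of FPS against CPU clock speed (MHz),
# extrapolated back to 0 MHz to read off where it crosses the FPS axis.

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # value of the line at x = 0
    return slope, intercept

# HYPOTHETICAL points on the CPU-limited segment (not measured data),
# chosen to lie exactly on FPS = 0.05 * MHz + 15.
mhz = [1000, 1200, 1400, 1600]
fps = [65.0, 75.0, 85.0, 95.0]

slope, intercept = fit_line(mhz, fps)
print(f"slope = {slope:.3f} FPS/MHz, FPS at 0 MHz = {intercept:.1f}")
```

A non-zero intercept here only says where the straight line goes when extended; it says nothing about how the real curve behaves below the lowest clock speed that was actually tested.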
Let's try to predict the «behavior» of the CPU-boundedness curve using logic and the real-world behavior of a typical personal computer. To do that, we have to look at how operating systems with preemptive multitasking work. Don't be scared by the long term: most likely you use just such an operating system for work and games, namely the well-known Windows XP. Besides Windows XP, Windows 2000 and the various Linux distributions also belong to the group of preemptive multitasking operating systems. The feature of these operating systems that matters for our discussion is how they use hardware resources, namely how they distribute processor time among several tasks running simultaneously.
While sitting at a personal computer, we think that everything runs simultaneously: downloading files from the Internet, playing music, and recording a CD. In reality, things are somewhat different. All the applications you have started run in strict sequence! There is no contradiction here. Since there is only one processor, the applications run one after another, in "pieces". These pieces are so tiny, and the operating system switches between them so fast, that a human cannot perceive it, which creates the illusion of simultaneously running applications. Put simply, all of the CPU's operating time is divided into portions, or time "quanta". These "quanta" are then "issued" to applications, as if to say "here you are, use the processor for a couple of milliseconds". In doing so, the kernel of the multitasking operating system itself consumes part of these "quanta" so that the system services can operate, and it also needs some time to decide which "quantum" to give to which application.
In other words, part of the processor time is "wasted" (from the user application's point of view) on servicing the operating system itself.
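The time-slicing idea can be sketched as a tiny single-CPU simulation. This is a deliberately simplified plain round-robin model with a fixed quantum and a fixed kernel cost per switch; real schedulers in Windows XP or Linux use priorities and are considerably more complex. The function `round_robin`, the task names and all the numbers are hypothetical, chosen only to illustrate the text.

```python
from collections import deque

def round_robin(tasks, quantum_ms, os_overhead_ms):
    """Simplified single-CPU time slicing in fixed quanta (plain round robin).

    tasks maps a task name to the CPU time it still needs, in ms. Before every
    quantum the kernel spends os_overhead_ms choosing the next task; that time
    goes to no application. Returns the slice order as (name, ms) pairs, the
    total kernel overhead and the total elapsed time, all in ms.
    """
    queue = deque(tasks.items())
    timeline = []
    os_time = 0
    elapsed = 0
    while queue:
        name, remaining = queue.popleft()
        os_time += os_overhead_ms          # scheduling decision: the kernel's share
        run = min(quantum_ms, remaining)   # the task gets at most one quantum
        timeline.append((name, run))
        elapsed += os_overhead_ms + run
        if remaining > run:
            queue.append((name, remaining - run))  # preempted: back of the queue

    return timeline, os_time, elapsed

# Hypothetical workload: "download", "music" and "cd_burn" stand for the
# applications mentioned in the text; the numbers are illustrative only.
timeline, os_time, elapsed = round_robin(
    {"download": 40, "music": 20, "cd_burn": 30}, quantum_ms=10, os_overhead_ms=1)
print(" -> ".join(name for name, _ in timeline))
print(f"kernel overhead: {os_time} ms of {elapsed} ms ({os_time / elapsed:.1%})")
```

The applications never actually run at the same time; they simply take turns, and every turn costs the kernel a little CPU time that no application gets.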
All of the above bears directly on our "non-fitment". Here is why. If the operating system needs a fixed number of CPU time "quanta" for its own operation, then as the CPU clock speed decreases, the number of free «quanta» that can be given to the application (in our case, a 3D game) drops faster than the clock speed itself. To put it another way, suppose that at 100 MHz the CPU is only powerful enough to service the operating system. Then, to get the equivalent clock speed, that is, the number of MHz available to the application, we have to subtract those 100 MHz reserved for the operating system from the real CPU clock speed. The "operating system correction factor" then amounts to 10% at 1000 MHz, as much as 50% at 200 MHz, and at 100 MHz we get 0 FPS. The graph below illustrates all of the above.

The red dashed line shows the presumed behavior of the CPU-boundedness curve as the CPU clock speed tends to zero. Attention! This line is drawn arbitrarily and does not reflect any experimental data!
You may wonder why we devote so much time and attention to this matter. Processors with such low clock speeds are no longer used in personal computers, and at first glance such an experiment, even if we managed to perform it, would bring no practical benefit. That is both true and not quite true.
Let us ask ourselves: how can we exclude, or at least minimize, the effect of the operating system on the speed of an application? In other words, is it really possible to obtain a CPU-boundedness graph that passes through the origin? Running ahead, we can say that it is possible if the operating system were run ... on another processor. We'll come back to this point a bit later.
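The "operating system correction factor" arithmetic described above can be checked with a few lines of Python. This is a sketch under the text's own assumption that the OS consumes a fixed 100 MHz worth of CPU capacity regardless of clock speed; `os_correction`, `app_fps`, `OS_RESERVED_MHZ` and `fps_per_mhz` are hypothetical names, and FPS is simply assumed to be proportional to the MHz left over.

```python
OS_RESERVED_MHZ = 100  # assumption taken from the text: 100 MHz is just enough for the OS alone

def os_correction(cpu_mhz, os_mhz=OS_RESERVED_MHZ):
    """Return (MHz left for the application, share of the CPU consumed by the OS)."""
    if cpu_mhz <= 0:
        raise ValueError("clock speed must be positive")
    available = max(cpu_mhz - os_mhz, 0)   # equivalent clock speed seen by the game
    share = min(os_mhz / cpu_mhz, 1.0)     # the "operating system correction factor"
    return available, share

def app_fps(cpu_mhz, fps_per_mhz, os_mhz=OS_RESERVED_MHZ):
    """FPS assumed proportional to the MHz left after the OS takes its share."""
    available, _ = os_correction(cpu_mhz, os_mhz)
    return fps_per_mhz * available

for mhz in (1000, 200, 100):
    available, share = os_correction(mhz)
    print(f"{mhz:>5} MHz: {available:>4} MHz left for the game, correction {share:.0%}")
```

Because the OS share is `os_mhz / cpu_mhz`, it grows as the clock falls, which is why the capacity left for the game shrinks faster than the clock speed and reaches zero at 100 MHz.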
Non-fitment No. 2. Nonlinearity of the «line of maximum possible results»
We have just examined the behavior of "the line of maximum possible results" as the CPU clock speed decreases. Now let's see what happens if we move in the opposite direction and increase the CPU clock speed.
The essence of this "non-fitment" can be seen on the same graph above. No matter at which point of the "line of maximum possible results" we draw a tangent, the subsequent results fall below it as the CPU clock speed goes up. Why doesn't the graph follow a linear law instead of "bending" towards the X axis? Here are a few causes that may explain this phenomenon.
The first cause is the CPU capacity consumed by the operating system, which we have already discussed.
The second cause is the effect of the CPU multiplier. What does the multiplier actually change? When the CPU clock speed is raised by means of the multiplier, we increase the data-processing power of the CPU, but the data still has to be fetched to the CPU core, and the CPU bus speed stays the same. For tasks that process large amounts of data which do not fit into the CPU cache, a moment may come when the CPU has already processed the available data and is still waiting for the next portion. The processor then sits idle, which can be regarded as a reduction of the "effective" clock speed of the CPU.
The third possible cause is how processor time is divided between the graphics driver (executed on the CPU) and the game's own computations (also run on the CPU). The situation is a bit tangled, since both tasks use the CPU, and the graphics driver can be regarded both as a component of the operating system (architecturally) and as an important link in running a 3D application.
Other possible causes include the latency and bandwidth of the memory, the CPU bus, etc.
This list of causes is not exhaustive; if necessary, we could find a few more factors that make the "line of maximum results" deviate from a straight line. Determining how much each of these causes contributes, and searching for the bottlenecks, is an extensive research topic in its own right.
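One way to see why a higher multiplier gives less than a proportional gain is a two-part frame-time model: the computation part scales with the core clock, while the time spent waiting for data over the unchanged bus does not. This is a simplified sketch under that assumption, not a model of any particular CPU; the function names and the per-frame numbers are hypothetical.

```python
def frame_rate(cpu_mhz, core_mcycles_per_frame, stall_ms_per_frame):
    """FPS for a frame made of core computation plus a fixed wait for data.

    core_mcycles_per_frame: millions of core cycles of computation per frame;
        this part gets faster when the multiplier raises the core clock.
    stall_ms_per_frame: time per frame spent waiting for data over the bus;
        the bus clock is unchanged, so this part does not shrink.
    """
    compute_ms = core_mcycles_per_frame * 1000.0 / cpu_mhz  # Mcycles / MHz = seconds
    return 1000.0 / (compute_ms + stall_ms_per_frame)

def effective_mhz(cpu_mhz, core_mcycles_per_frame, stall_ms_per_frame):
    """Clock speed a CPU that never waits for data would need for the same FPS."""
    return core_mcycles_per_frame * frame_rate(cpu_mhz, core_mcycles_per_frame, stall_ms_per_frame)

# Hypothetical workload: 20 million core cycles and 5 ms of memory stalls per frame.
for mhz in (1000, 1500, 2000):
    fps = frame_rate(mhz, 20, 5)
    print(f"{mhz} MHz: {fps:.1f} FPS, effective {effective_mhz(mhz, 20, 5):.0f} MHz")
```

With the stall time set to zero the FPS doubles when the clock doubles; with a fixed stall the curve bends toward the X axis, and the "effective" clock speed lags further behind the nominal one as the multiplier rises.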
Before moving on to specific matters, let us formulate a general postulate:
In a multi-factor environment, a quantity can depend linearly on a given parameter only if none of the other parameters imposes a restriction.
In other words, the parameter against which the graph is plotted must be the most limiting one.
As applied to our research into the CPU-boundedness of the video system, this means that apart from the CPU, all other components must be fast enough not to impose any restrictions. That is, the video card has to be powerful enough and operate in the lightest mode (e.g., 640x480 instead of 1600x1200, other settings being equal), the RAM should run at maximum speed, the effect of the operating system should be minimized, etc.
Be that as it may, in practice the "line of maximum possible results" does rise as the CPU clock speed goes up. Although this growth is not strictly linear, it is adequate for estimating the performance limit of a platform in 3D applications.
Next, we will examine a few factors that affect computer performance in 3D applications. We will focus on the things we can control to some extent through the choice of the CPU and the type of RAM, that is, through the choice of the «platform» for running 3D applications.
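The postulate can be illustrated with a simplified bottleneck model in which the most limiting component sets the frame rate. This is a sketch, assuming the CPU-limited frame rate is exactly linear in clock speed and every other component imposes a fixed FPS ceiling; `observed_fps`, `fps_per_mhz` and the ceiling values are hypothetical.

```python
def observed_fps(cpu_mhz, fps_per_mhz, other_limits):
    """Simplified bottleneck model: the most limiting component sets the frame rate.

    cpu_mhz * fps_per_mhz is the CPU-limited frame rate, taken as linear in the
    clock speed (the idealized "line of maximum possible results"). other_limits
    maps every other component to the FPS ceiling it imposes on its own.
    """
    cpu_limit = cpu_mhz * fps_per_mhz
    return min([cpu_limit, *other_limits.values()])

# Hypothetical ceilings for the same video card in a light and a heavy mode.
light = {"video card, 640x480": 150}
heavy = {"video card, 1600x1200": 60}
for mhz in (1000, 1500, 2000):
    print(mhz, observed_fps(mhz, 0.05, light), observed_fps(mhz, 0.05, heavy))
```

In the light mode the result stays on the CPU line over this range, so the graph against clock speed is linear; in the heavy mode it flattens at 60 FPS as soon as the video card becomes the limiting factor.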