The Video Card and 3D Accelerator Guide
What does a Video Card do, what affects the performance?
We have to realize that the data as soon as it leaves the CPU has to go through 4 steps until it finally reaches the monitor:
1. From the bus into the Video Chipset, where it's processed (digital data)
2. From the Video Chipset into the Video Memory, to store a mirror of the screen picture here (digital data)
3. From the Video Memory into the Digital-to-Analog Converter (= RAM DAC), to read out the screen mirror and convert it for the monitor (digital data)
4. From the Digital-to-Analog Converter to the Monitor (analog data)
As you can see, except for the step from the RAM DAC to the monitor, each step is a potential bottleneck and crucial for the overall performance of the graphical subsystem. The slowest step is the one which determines the overall speed. Let's now discuss what these single steps mean and what actually happens:
The transfer of data between CPU and the Video Chipset
This bottleneck depends mainly on the bus type and speed, the
mainboard and its chipset. The fastest bus system at present is the PCI bus, so you will
have slower performance with VL bus, ISA, EISA and NuBus (only for Macs). The PCI bus
however doesn't always run at its full speed of 33 MHz: with a Pentium 75, P90, P120 or P150
you'll have a PCI bus speed of only 25 MHz (P75) or 30 MHz, which already decreases the
performance of the graphical subsystem. Later chipsets also offer faster PCI performance;
the Intel 430HX chipset, for example, offers faster PCI performance than the Intel Triton 430FX
chipset.
Then there is AGP, the Accelerated Graphics Port. As the name already says, it's not a bus, it's a port. This means you can only
run one device on it, the graphics device. It runs at 66 MHz and can transfer data on both
the rising and the falling edge of a clock cycle (x2 mode). This makes it at least twice as
fast as PCI, but this does not necessarily result in double the performance for AGP graphic
cards, because the data transfer bandwidth is not the limiting factor of current graphic
cards.
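To put rough numbers on these bus speeds, here is a minimal sketch of the theoretical peak transfer rates. It assumes that both PCI and AGP move 32 bits (4 bytes) per transfer, which the text above does not state, and the function name is made up for this illustration. Real-world throughput is always lower than these peaks.

```python
def peak_transfer_rate(clock_mhz, bus_bits=32, transfers_per_clock=1):
    """Theoretical peak in MB/s (1 MB = 10**6 bytes); real throughput is lower."""
    bytes_per_transfer = bus_bits // 8
    return clock_mhz * bytes_per_transfer * transfers_per_clock


# Assumption: 32 bit wide data path for both PCI and AGP.
links = [
    ("PCI at 25 MHz", 25, 1),
    ("PCI at 30 MHz", 30, 1),
    ("PCI at 33 MHz", 33, 1),
    ("AGP at 66 MHz, x1 mode", 66, 1),
    ("AGP at 66 MHz, x2 mode", 66, 2),
]

for name, clock_mhz, per_clock in links:
    rate = peak_transfer_rate(clock_mhz, transfers_per_clock=per_clock)
    print(f"{name}: {rate} MB/s peak")
```

Under these assumptions AGP in x1 mode has twice the peak rate of 33 MHz PCI and x2 mode has four times as much, which is where "at least twice as fast" comes from.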
The transfer of data between Video chipset to Video RAM and from Video RAM to the RAM DAC
I have taken these two steps together because here lies the key to the performance of a video card, as long as you exclude special chipset features. The big problem of a video card is that the poor video memory lies in between two very busy devices and has to serve both of them all the time. Each time the screen has to change, the chipset has to alter the video memory (and it changes continuously, e.g. mouse pointer, cursor blinking, etc.). The RAM DAC also has to read out the video memory continuously to maintain the screen. You can see that the video memory is caught in between them, and here all these smart ideas like using VRAM, WRAM, MDRAM, SGRAM, EDO RAM, or increasing the video bus size to 32 bit, 64 bit and now 128 bit come in.
The higher the screen resolution and the higher the colour resolution,
the more data has to be transferred from the video chipset to the video memory and the
faster the data has to be read by the RAM DAC to be sent to the monitor. You can see that
the video memory has to be accessed all the time by the chipset and the RAM
DAC. Normal dynamic RAM has only one port and can only be accessed at a certain max. frequency, so after the video
chipset has finished accessing (r/w) the video memory, the RAM DAC has to wait until it's
allowed to read and vice versa.
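A small sketch makes the growth in data visible. It computes how many bytes the RAM DAC has to read for one complete screen picture, and how many per second at a given refresh rate. The refresh rates of 60 Hz and 85 Hz are example values picked for this illustration, and the function names are made up.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Bytes the RAM DAC reads for one complete screen picture."""
    return width * height * bits_per_pixel // 8


def ramdac_read_rate(width, height, bits_per_pixel, refresh_hz):
    """Bytes per second the RAM DAC reads to keep the screen refreshed."""
    return frame_bytes(width, height, bits_per_pixel) * refresh_hz


for bpp in (8, 16, 24):
    per_frame = frame_bytes(1024, 768, bpp)
    for hz in (60, 85):  # example refresh rates, not from the text
        per_second = ramdac_read_rate(1024, 768, bpp, hz)
        print(f"1024x768 at {bpp} bit, {hz} Hz: "
              f"{per_frame:,} bytes/frame, {per_second / 1e6:.1f} MB/s")
```

Going from 8 bit to 24 bit colour triples the read traffic, and going from 60 Hz to 85 Hz adds another 40% or so on top, all of it competing with the video chipset for the same memory.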
The Video Card Manufacturers found 3 different ways to fight that problem:
1. The first idea is to make the video RAM dual ported. This means that the video chipset reads from or writes to the video memory via one port, while the RAM DAC reads out the video memory through an independent second port. The video chipset doesn't have to wait for the RAM DAC anymore and the RAM DAC doesn't have to wait for the video chipset anymore. This kind of video memory is called VRAM. Having double the ports obviously makes it more complicated and therefore more expensive to produce. That's the simple reason why VRAM cards are more expensive and also faster. The WRAM used by Matrox and a few other cards is also dual ported, but organised in a smarter way, so that it's faster than VRAM but also 20% cheaper to produce. If you wonder why cards which offer a high refresh rate and high colour depth typically have these two kinds of memory, consider the following. A higher refresh rate means that the RAM DAC feeds the monitor with a complete screen picture more often than at a lower refresh rate. Therefore the RAM DAC has to read out the video memory more often. This can only be achieved either with VRAM/WRAM, by accessing the video memory via the second port, or at the cost of a considerable decrease in video performance on DRAM/EDO cards. If you don't believe it, just run your favourite video benchmark at a low and then at a high refresh rate - you'll see a considerable difference if you've got a DRAM/EDO card. The same is valid for a higher colour depth. At an 8 bit colour resolution (= 256 colours) a 1024x768 screen needs 786,432 bytes to be read by the RAM DAC to send a complete screen picture to the monitor. At 24 bit colour resolution (16,777,216 colours) the same screen needs 2,359,296 bytes to be read by the RAM DAC - and this takes more time. This, by the way, is also the reason why on cheaper cards you often can't have the same high refresh rate at true colour as you had at low colour.
2. The other way to fight this problem is to increase the video memory bus size. Years back everybody was amazed by the new 32 bit video cards. These cards had a 32 bit data path between video chipset, video memory and RAM DAC. With a 32 bit data path you can transfer 4 bytes in one go. Later came the 64 bit video cards (8 bytes in one go), which are the standard at present, and only recently some new chipsets appeared with a 128 bit data path (16 bytes in one go). It's easy to see that video cards with both (VRAM/WRAM and a wide data path) will be the best performers, but with a really wide data path you could get around VRAM/WRAM. Now, while getting completely excited about these wide data paths, we shouldn't forget one very important thing: a normal 8x1Mbit memory chip, as used on most video cards, has a data bus of 32 bits! Therefore even a 128 bit chipset can access this memory chip only 32 bits wide! This is the reason why all 64 bit video cards are a lot slower if only fitted with 1 MB of video RAM. Don't get a 64 bit video card with less than 2 MB! Chipsets with a 128 bit data path usually need at least 4 MB of local memory, otherwise their performance is cut in half. The NVidia Riva chipset, for example, is able to talk to only 2 MB as well, via a 64 bit data path. Riva cards with only 2 MB are therefore crippled. However, due to the architecture of such a card you won't get the 128 bit data path even if you upgrade it to 4 MB, because the data path just stays the same. This is probably the case with many video cards, so be careful not to get a 1 MB 64 bit card or a 2 MB 128 bit card!
3. The third and to us maybe most obvious way to get the video RAM accessed faster is to simply increase the clock speed of the video chipset/video RAM/RAM DAC. Even years back the video chipsets already ran at clock speeds high above the mainboard memory bus speeds. SGRAM is nowadays running at a 100 MHz clock and some graphic chip manufacturers are already talking of a 125 or even 133 MHz video RAM clock using 7 ns SGRAM. SGRAM is nothing but a special graphics version of SDRAM (synchronous DRAM), so we know it is able to run at clock speeds of up to 133 MHz.
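The second and third points can be combined into a rough model. The sketch below assumes that every memory chip holds 1 MB (8 Mbit) and has a 32 bit data bus, as described for the typical chip above, and that one transfer happens per memory clock. Real cards can be wired differently, as the Riva example shows, so take this as an illustration only. All names are made up for this sketch.

```python
CHIP_MBYTES = 1     # assumption: one 8 Mbit chip = 1 MB
CHIP_BUS_BITS = 32  # data bus width of one such chip, as stated in the text


def effective_width_bits(chipset_bits, memory_mb):
    """Usable data path: limited by the chipset or by the number of chips."""
    chips = memory_mb // CHIP_MBYTES
    return min(chipset_bits, chips * CHIP_BUS_BITS)


def peak_memory_bandwidth(width_bits, clock_mhz):
    """Theoretical peak in MB/s, assuming one transfer per memory clock."""
    return width_bits // 8 * clock_mhz


for chipset_bits, memory_mb in [(64, 1), (64, 2), (128, 2), (128, 4)]:
    width = effective_width_bits(chipset_bits, memory_mb)
    bandwidth = peak_memory_bandwidth(width, 100)  # 100 MHz SGRAM clock
    print(f"{chipset_bits} bit chipset with {memory_mb} MB: "
          f"{width} bit effective, {bandwidth} MB/s peak at 100 MHz")
```

In this model a 64 bit chipset with 1 MB and a 128 bit chipset with 2 MB both lose half of their peak bandwidth, which is exactly the warning given above.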
Summarizing all these performance aspects, we learn that for optimal performance we should have an AGP or at least a PCI system with the latest chipset and 33 MHz PCI bus speed, and a video card with a high performance chip and either SGRAM or WRAM, a wide data path or a high clock frequency of the video chipset or, best of all, all three of these things together!
Now which parts of this video card/monitor-combo play which role?
The Monitor plays a
crucial role in terms of sharpness, brightness, stability and max. screen resolution of
the picture. If you want to have a high quality picture you're asking for a high quality
monitor with a big screen, at least 17". Your video card can be as good as it wants,
as long as the monitor is crap the screen will still look horrible.
On the video card side, the RAM DAC is the part that is responsible for sending the data for a decent picture to the monitor. Two factors are important: the quality of the RAM DAC, e.g. whether it is stand alone or integrated into the video chipset, and the max. pixel frequency, measured in MHz. A 220 MHz RAM DAC is not necessarily but most likely better than a 135 MHz one and it certainly offers higher refresh rates - I will tell you why further down on this page. RAM DACs tend to be included into the graphic chips more and more now, since this can decrease the cost of graphic cards considerably and the quality of modern internal RAM DACs is coming close to that of the external ones.
The Amount of Video RAM
is responsible for the colour resolution in combination with the screen resolution in
2D. In 3D, which is getting more and more important, the amount of local card memory
also determines the maximum 3D resolution. 3D needs much more local
memory than 2D for the same resolution. This is due to the fact that 3D needs a front, a
back and a Z-buffer. The front buffer holds what you see, the back buffer holds the next
picture while it's being processed and the Z-buffer holds the 3rd dimension value
(z-value, as x and y make two dimensions, z holds the third). That is the reason why a
card with 4 MB local memory can offer a resolution of 1600x1200 at high color (16 bit) in
2D, because it needs 1600x1200x2 byte = 3.7 MB. However games that are using z-buffer
information (and the good ones do, offering you real 3D) can only run at 800x600 x 16 bit
color x 16 bit z-buffer, 800x600x6 byte (2 byte color front buffer, 2 byte color back
buffer, 2 byte 16 bit z-buffer) = 2.74 MB. 3D at 1024x768 would require 4.5 MB and can't
be displayed by a 4 MB 3D card.
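The arithmetic above is easy to check with a few lines of code. This is only a sketch with made-up function names. It counts 1 MB as 1,048,576 bytes and ignores texture memory, which a real 3D card needs on top of the buffers.

```python
MB = 1024 * 1024


def buffer_bytes(width, height, color_bytes, z_bytes=0, back_buffer=False):
    """Memory for the front buffer plus optional back buffer and Z-buffer."""
    per_pixel = color_bytes * (2 if back_buffer else 1) + z_bytes
    return width * height * per_pixel


def fits(needed_bytes, card_mb):
    """True if the buffers fit into the card's local memory."""
    return needed_bytes <= card_mb * MB


modes = [
    ("2D 1600x1200, 16 bit", buffer_bytes(1600, 1200, 2)),
    ("3D 800x600, 16 bit + 16 bit Z", buffer_bytes(800, 600, 2, 2, True)),
    ("3D 1024x768, 16 bit + 16 bit Z", buffer_bytes(1024, 768, 2, 2, True)),
]

for name, needed in modes:
    verdict = "fits" if fits(needed, 4) else "does not fit"
    print(f"{name}: {needed / MB:.2f} MB, {verdict} into 4 MB")
```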
The Type of Video RAM in combination with the Video Chipset is responsible for all performance aspects of the video card/monitor-combo. However we shouldn't forget that the bus system (PCI/VL/ISA/EISA/MCA/NuBus) and therefore also the mainboard and the mainboard chipset are responsible for how fast the data reaches the video card. AGP, the Accelerated Graphics Port, can offer much higher transfer bandwidth than PCI.
Basics and 2D Considerations
The video card, which is indispensable in every computer system, is responsible for processing the video data received from the CPU into a format that a monitor can understand, so that it can build a rastered picture on the screen out of it. The monitor screen is still the main output device of a computer system; it's the most important port through which we humans get data transferred from the computer. Therefore the Video Card/Monitor-Combo is one of the most important parts of our computer and we should take very good care of it.
Now what do we have to ask of this Video Card/Monitor-Combo?
Picture Quality is very important, so that it's not gonna be a pain for our eyes to sit and look at the screen. Here are the factors for picture quality:
Speed of the video device. Due to the work we do with the computer, the picture changes ever so often and these changes should take place as fast as possible. It depends on what in particular we do on the computer, so there are also some categories:
2D performance, also
called GUI or Windows performance, due to Windows being the most popular GUI
OS. This used to be the most important kind of performance so far; it determines how fast your
office applications perform, e.g. how fast you can scroll text/graphics or how fast you
can open and close new windows. Since the days of the Matrox Millennium 2D performance of
graphic cards got pretty close to the limit and nowadays the latest graphic cards don't
differ much in 2D performance anymore, most of them are pretty fast, faster than the old
standard of the Matrox Millennium.
3D performance is the
most important topic to distinguish between different graphic cards today. Cards without
3D acceleration will soon disappear from the market and Matrox had to learn this the hard
way when releasing the Millennium II with hardly any 3D features. They lost their market
leader position in an instant. S3 used to be a big player in the graphic chip market, but
the mediocre 3D performance of their chip got them almost completely out of business.
Similar things seem to happen to Cirrus Logic and others.
DOS performance, which
nowadays can be equated with performance in DOS based games, since all
professional applications today are running under a graphical user interface operating
system. DOS based games are disappearing as well, so that this performance is getting less
and less important.
Video Display performance in my eyes is still not that important to most of us, but whoever likes to watch and process videos on his computer will have to look for a fast video processing card. DVD will probably bring a significant change here.
Considerations for Gamers
If gaming is most of what you do on your system and you couldn't care less about Windows NT, true color and OpenGL, you want to go for a pure 3D gaming card or get an add-on card.
Direct3D or Proprietary 3D
You'll now have to decide what kind of games are important to you. Currently the graphically best games are often designed for one particular graphic chip, or at least they look best with this one chip. The number one supported 3D graphic chip is nowadays the 3Dfx Voodoo, found on add-on cards like the Orchid Righteous3D, Diamond's Monster3D and several others. It looks as if upcoming games will still support this particular chip, and since the Voodoo 2 is already on the horizon, you can expect 3Dfx's 'Glide' engine to stay supported by many games for a long time. As an alternative to supporting a special 3D chip, many new games are using Direct3D's new features quite heavily, so that it depends on how well the 3D card's driver translates Direct3D to the chip's proprietary engine. PowerVR's PCX1 and PCX2 chips are quite powerful 3D chips, but the cards that use them are highly incompatible. I've seen only very few games that run on this chip properly. If the PCX engine is used directly, the games look awesome though. The only 3D chip to my knowledge that doesn't have a dedicated 3D engine, but is using Direct3D as its API directly, is NVidia's RIVA 128 chip, currently the fastest Direct3D chip available on the market. The RIVA 128 is wonderful for Direct3D games, but games that only support a handful of proprietary 3D engines will not run on the RIVA 128. The future will bring almost every game in Direct3D, which will help NVidia's RIVA a lot.
It is not easy to measure pure 3D performance, because there are so many different ways a 3D engine can be used. Most official benchmarks are using the Direct3D engine of DirectX, e.g. ZD's 3D Winbench or VNU's Final Reality. These benchmarks can only show you the card's Direct3D performance, hence how well the driver translates Direct3D into the chip's own 3D engine. NVidia's RIVA 128 doesn't need this 'translator', it uses Direct3D as its own API. This is only one reason why the RIVA scores by far the best in Direct3D benchmarks. However, some games written for the specific 3D engine of a chip can run much faster than the 3D Winbench score would lead you to expect. VQuake for Rendition's Verite 1000 is one good old example. The Verite 1000 never scored well in 3D Winbench, but VQuake looked good and ran fast.
Now 3D performance is only one thing, 3D quality is another. There are a lot of 3D features used nowadays, most of them supported and used by DirectX 5, but there will be even more 3D features implemented in DirectX 6. A 3D chip can only support a certain number of 3D features; others are either not supported at all, or special drivers are used that emulate these features. In my latest test I came across only one chip that supports virtually every current 3D feature properly and this is 3Dfx's Voodoo chip. The big letdown of the Voodoo chip leads to the other aspect of quality, the 3D screen resolution. The Voodoo chip can only do 640x480 with 2 MB of frame buffer memory (4 MB cards), as in most of the Voodoo cards, or at most 800x600 if the card comes with 6 MB RAM (e.g. Quantum 3D Obsidian 100SB), 4 MB hereof as frame buffer. NVidia's RIVA 128 chip has got a similar problem: it can't support more than 4 MB onboard memory, only good for a 3D resolution of at most 800x600. Now it doesn't have to be that bad, since we are quite pleased with our good old television as well, which has a lower resolution than 800x600. The 3D chip and the system CPU have to be powerful enough to run smoothly at this resolution as well. However, I've seen 'Forsaken' at 1024x768 on a PII 300 with an ATI XPERT card and it looks pretty awesome.
How Powerful is Your CPU?
Some 3D chips are taking a lot of workload from the CPU, others want decent CPU performance for their operation. PowerVR's PCX chips want at least a Pentium MMX 166 for decent quality, 3Dfx's Voodoo lets games run fast even on systems with weak CPUs and Rendition's new Verite 2100/2200 chip gives a huge improvement to slow CPUs, but fast CPUs are reaching its limit and don't really benefit from this chip anymore. NVidia's RIVA chip seems to scale linearly from 6x86 CPUs up to Pentium II CPUs. Under Direct3D it's always the fastest chip.
Another thing you obviously want to take into consideration is the price you've got to pay for the card. Many cards that have good 2D performance as well are pretty expensive. This is often due to the more expensive memory they are using, but it could also be due to additional features like TV out or video compression. Cards with more memory are also more expensive, but they offer higher resolutions in 3D, higher color depth and higher resolutions in 2D. Make sure you don't pay for something you won't need.
Considerations for Professionals
If you are working on your computer professionally, one of the most important things is the picture quality. This is achieved by a high quality and highly clocked RAMDAC. Most of the new graphic chips have included the RAMDAC internally, thus saving cost, but the best picture quality is still produced by an external RAMDAC. The most popular cards with external RAMDACs are Matrox Millennium I and II and Number Nine's Revolution 3D. These cards are still offering you the sharpest and cleanest picture on the screen. If you have got an expensive monitor, you want to use the high refresh rates your monitor supports. As a simple rule you should have a refresh rate of at least 85 Hz available for all the resolutions you want to use. Refresh rates of 120 Hz and more sound nice, but they won't give you much of an advantage anymore. Responsible for this is again the RAMDAC. The higher its clock rate, the higher the possible refresh rates.
The 2D performance was what used to determine the quality of a graphic card in the past. Now 2D acceleration seems pretty close to the limit and almost all cards are offering good 2D performance, at least in 16 bit color modes. Good 2D performance in true color is still a pretty rare virtue though. Matrox and Number Nine always used to fight about the 2D performance crown and that hasn't changed much. If you are working really professionally at your computer, you cannot possibly use Microsoft's mouse driver collection called Windows 95. Hence you are either using Windows NT or some really good OS that's not from the monopolist. NT drivers are very important for professional cards and the NT performance should be more important than the Windows 95 performance. There is often quite a bit of a difference between NT and 95 performance.
For people that use a real graphic workstation with CAD and/or 3D rendering, SGI's OpenGL as well as Heidi are of major importance. Nowadays if you hear 'PC' and 'OpenGL' one company comes to mind ... 3DLabs. I will not discuss the real high end chips of 3DLabs, since this is off topic on this website, but 3DLabs' new Permedia 2 chip is one of the most impressive graphic chips on the market today in my opinion. For 3DLabs the Permedia 2 is nothing but a low end chip for the mass market, but amongst its competitors it's quite a gem in terms of professional work. In mid to higher priced systems, cards with the Permedia 2 will offer you the best OpenGL performance combined with good 2D and fairly impressive Direct3D performance.
The introduction of real 3D-accelerators started with the 3Dfx Voodoo chip, and that was the time when the first hard core 3D gamers started to evolve, producing high expectations as well as crazy hypes and some pathological fanaticism too. Voodoo (1) and 3Dfx remained at the top of the 3D graphics scene for more than a year, completely underestimated by the big players in the graphics scene at this time, S3 and Matrox. 3Dfx won the second round also, the Voodoo2 was a worthy successor of Voodoo (1). The idea of Voodoo2 was the same as for Voodoo (1), offer an add-on 3D-card to the normal 2D card, use a pass-through cable and offer a lot more power than Voodoo1. Voodoo2 was using two parallel texture units, a lot more graphics memory and the user had the chance to run two of those cards in parallel, called 'SLI'-mode, offering at that time mind blowing 3D performance. This was certainly no cheap solution, but the hard core gamers jumped on this bandwagon right away, making Voodoo2 another huge success. It took six more months until the first decent 2D/3D-cards became available, NVIDIA shipped RIVA TNT and 3Dfx tried their slightly castrated version called 'Voodoo Banshee'. People who believed in the marketing hype, saying that those two chips were a really new generation of 3D chips, were disappointed, since Voodoo2 SLI was still offering the best performance, Banshee and TNT had only the chance to come close in some areas.
Voodoo3, TNT2, Permedia3 and what not are the first 3D-chips that show a real further development of the technologies implemented into Voodoo2. What took no less than six chips in the days of Voodoo2 SLI is now condensed into a TNT2-chip, beefed up with a faster RAMDAC, good AGP-support, support of larger texture-sizes and several other goodies, running at a much higher clock speed than Voodoo2 as well. Really new concepts are still not to be found though and can also not be expected before the second half of 1999. We are still working with the second generation of 3D-chips, but even this generation has still got a lot to offer and you can be sure that it will be a hot summer in the 3D-arena this year.
We can expect the first announcements of third-generation 3D-chips within the next couple of months and there's supposed to be a new quantum leap in 3D-performance and quality. Fill rates of 500-600 Mpixel/s, triangle rates of 20 million/s and more as well as the support of new quality enhancements will ask a lot more from game developers, CPU, memory, RAMDAC and even display technologies than we can imagine right now.
The Second Generation of 3D Chips
Let's focus on those chips that provide today's top performance: TNT2, Voodoo3, Rage128, G400, PowerVR SG, Permedia3, etc. Voodoo3 was the first of those chips and the expectations were high when Voodoo3 was announced for the first time, since 3Dfx had a history of supplying two top performers in the past, the Voodoo in 1997 and the Voodoo2 in 1998. At Comdex 1998 the journalists and card makers weren't too impressed with the Voodoo3 features though. Even the following marketing campaign wasn't able to convince more than a few fanatic 3Dfx followers, when 3Dfx threw in all of their arrogance and ignorance towards the criticism of 3D analysts and journalists about missing features of Voodoo3. What is sold as 'Voodoo3' today should rather carry the name 'Banshee2', since it's a straightforward development of Banshee with performance measures that are close to Voodoo2. We certainly don't want to forget improvements such as digital flat panel support, higher resolutions and a better RAMDAC, and the price/performance ratio of Voodoo3 is also way better than that of Voodoo2 SLI. However, the absolute performance as well as the long list of missing features has disappointed a lot of hard core gamers.
3D Card Requirements for Power Gamers
OpenGL (ICD) and DirectX (6.x) Support
The support of those two is of the highest importance for optimal 3D-game support. The majority of 1st person 3D-action shooters are based on OpenGL; Quake2, Quake III Arena, Unreal, Unreal Tournament, Half-Life and many more are the typical examples. The majority of all other games are covered by DirectX; flight simulators and racing games in particular are mainly based on this 3D-interface from Microsoft. A 3D-card that doesn't support those two 3D-platforms is pretty much out of the question for any gamer.
Fill Rates in excess of 300 Mpixel/s or Mtexel/s
The fill rates quoted by the chip vendors should be taken with a decent grain of salt. Firstly those data are 'synthetic', comparable to MIPS or MFLOPS of CPUs. The fill rate can be calculated from the 3D-chip clock, the number of independent pipelines of the chip, the graphics memory bandwidth and of course the hardwired features of the chip. Secondly the color depth, the Z-buffer and particularly the rendering quality vary significantly between the vendors. A good example of this is TNT2, which has a lower theoretical fill rate than Voodoo3, but scores higher frame rates in complex 3D scenes. Still the fill rate can give you some kind of idea of what kind of 3D-performance you can expect from a 3D-chip.
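To show how 'synthetic' such numbers are, here is a minimal sketch of the simplest possible fill rate calculation: clock times pipelines. It ignores memory bandwidth and hardwired features completely. The two chips in the example are hypothetical and not vendor specifications, and the overdraw factor (how often each pixel gets drawn per frame) is an assumption of this illustration.

```python
def pixel_fill_rate(core_clock_mhz, pipelines):
    """Theoretical Mpixel/s: one pixel per pipeline per clock."""
    return core_clock_mhz * pipelines


def texel_fill_rate(core_clock_mhz, pipelines, texture_units_per_pipeline):
    """Theoretical Mtexel/s: one texel per texture unit per clock."""
    return core_clock_mhz * pipelines * texture_units_per_pipeline


def frame_rate_ceiling(mpixels_per_s, width, height, overdraw):
    """Upper bound on frames/s if the fill rate were the only limit."""
    return mpixels_per_s * 1_000_000 / (width * height * overdraw)


# Hypothetical chips for illustration only.
chips = [
    ("chip A (125 MHz, 2 pipelines, 1 texture unit each)", 125, 2, 1),
    ("chip B (166 MHz, 1 pipeline, 2 texture units)", 166, 1, 2),
]

for name, clock, pipes, tmus in chips:
    pixels = pixel_fill_rate(clock, pipes)
    texels = texel_fill_rate(clock, pipes, tmus)
    ceiling = frame_rate_ceiling(pixels, 1024, 768, overdraw=3)
    print(f"{name}: {pixels} Mpixel/s, {texels} Mtexel/s, "
          f"at most {ceiling:.0f} fps at 1024x768")
```

Note how the same chip can be quoted with two quite different figures depending on whether pixels or texels are counted, which is one more reason to compare frame rates in real games instead.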
Up to 32 MB of Onboard Memory at Clock Speeds of Way over 100 MHz
The onboard memory of each 3D-card has a very high impact on the overall 3D-performance. 16 MB is enough to run 3D applications at up to 1024x768 in 32 bit color depth (this is not valid for PVRSG, which needs less memory for that). Realistic 3D scenes at 1280x1024x32 bit color require 32 MB and that's what is state-of-the-art today. The memory clock still varies a lot between the different chips and so does the width of the memory interface, where currently 128 bit is state-of-the-art, moving towards 'dual-256 bit' = 2 interfaces with 128 bit or real 256 bit soon. The graphics memory bandwidth is an important limiting factor of the 3D-performance and you can say that a vendor who can safely handle highest memory clocks is well prepared for future developments.
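The memory figures can be made plausible with a rough sketch. It assumes a front buffer, a back buffer and a Z-buffer at 32 bit (4 bytes) each, with the remaining onboard memory left for textures. Real chips may organise their memory differently (PVRSG being the obvious exception mentioned above), and the function names are made up for this illustration.

```python
MB = 1024 * 1024


def buffer_bytes_32bit(width, height):
    """Front + back color buffer + Z-buffer, 4 bytes per pixel each."""
    return width * height * (4 + 4 + 4)


def texture_budget_bytes(card_mb, width, height):
    """Onboard memory left for textures; negative means it doesn't fit."""
    return card_mb * MB - buffer_bytes_32bit(width, height)


for card_mb in (16, 32):
    for width, height in ((1024, 768), (1280, 1024)):
        left = texture_budget_bytes(card_mb, width, height)
        print(f"{card_mb} MB card at {width}x{height}x32: "
              f"{left / MB:.0f} MB left for textures")
```

Under these assumptions the buffers for 1280x1024x32 just about fit into 16 MB, but there is hardly anything left for textures, which is why realistic scenes at that resolution call for 32 MB.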
More and more games are offering a true-color mode for quality reasons, meaning that the displayed frames are using a 24-bit color depth (RGB, 16.7 million colors). This means that a huge amount of data needs to be computed for each frame, but the latest 3D-chips are starting to be capable of supplying this kind of computing power. A 3D-architecture that is limited to only 16 bit, as in the case of the Voodoo3, requires a reduction of color information and thus decreases the quality and the level of realism. 32-bit color rendering with 32-bit Z-buffering will stay the optimal implementation for quite some time to come.
Fast RAMDACs with at least 250 MHz or Digital Flat Panel Support
The RAMDAC is the link between the digital display information stored in the frame buffer and the analog CRT. The higher the bandwidth of the RAMDAC, the higher the number of pixels that can be displayed on the CRT each second. This means that a faster RAMDAC is crucial for decent refresh rates, its quality is crucial for the quality of the displayed picture on the screen and only fast RAMDACs can supply really high resolutions of more than 2000x1500, which is most important for 2D applications. If the user prefers a flat panel screen, he should make sure that there is no digital-to-analog converter (DAC) followed by an analog-to-digital converter (ADC) between the graphics memory (or frame buffer) and the flat panel, because this is not only a waste of silicon, but also decreases the display quality by a serious amount. A flat panel requires digital data and should thus be supplied with nothing else than that. That requires a digital output on the graphics card, where DFP is one of the standards. For optimal 2D-quality you should therefore either look for a fast RAMDAC for a CRT or a digital output for a flat screen.
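The relation between RAMDAC clock, resolution and refresh rate can be estimated with a short sketch. The RAMDAC has to deliver one pixel per clock, plus some extra time for the horizontal and vertical blanking intervals of the CRT. The blanking overhead of 1.4 used here is only a rough assumption of this illustration; the real value depends on the exact monitor timing. The function names are made up.

```python
def required_pixel_clock_mhz(width, height, refresh_hz, blanking_overhead=1.4):
    """Approximate RAMDAC clock needed for a mode (overhead is an assumption)."""
    return width * height * refresh_hz * blanking_overhead / 1_000_000


def max_refresh_hz(ramdac_mhz, width, height, blanking_overhead=1.4):
    """Approximate highest refresh rate a RAMDAC can drive at a resolution."""
    return ramdac_mhz * 1_000_000 / (width * height * blanking_overhead)


for ramdac_mhz in (135, 220, 250):
    for width, height in ((1024, 768), (1280, 1024), (1600, 1200)):
        hz = max_refresh_hz(ramdac_mhz, width, height)
        print(f"{ramdac_mhz} MHz RAMDAC at {width}x{height}: "
              f"roughly {hz:.0f} Hz max")
```

With this estimate a 135 MHz RAMDAC stays below 85 Hz at 1280x1024, while a 220 MHz one clears it comfortably. The monitor of course has to support these rates as well.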
AGP vs. PCI is a question that doesn't need to be asked anymore today. AGP succeeded in the upper performance area quite a while ago already. When AGP 4x is released later this year, we may finally see the visible advantage that many of us are still missing. A current 3D-card should support AGP 2x to be at least able to handle the data bandwidth that's required for 'several million polygons/s'. Games with large textures are coming up, FlightSim 99 is only one example, so the 3D-chip should also be able to do AGP-texturing, which is one of the things that Voodoo3 is not able to do.
So much for the requirements of a well performing 3D-chip or 3D-card. Of course there are a lot of other effects that 3D-chips can do today as well, like fog, transparency or different kinds of shading, but those 'common' features don't really need to be mentioned again. However, there are still cases where features are implemented at a low quality level, when 3D-chip makers rate frame rates over image quality.
Realism is what we really want, and the users and game developers have to tell the 3D-chip makers what they expect. The hardware developers have made another leap forward in terms of realism with the latest generation of 3D-chips, and 3D-games should follow soon to take advantage of it. A look at the CPU-roadmaps shows that we can expect a lot more performance in a very short time.