RAM Guide
Due to cost considerations, all but the very high-end (and very expensive) computers have utilized DRAM for main memory. Originally, these were asynchronous, single-bank designs because the processors were relatively slow. Most recently, synchronous interfaces have been produced with many advanced features. Though these high-performance DRAMs have been available for only a few years, it is apparent that they will soon be replaced by at least one of the protocol-based designs, such as SyncLink or the DRDRAM design from Rambus, Inc. and Intel.
Basic DRAM operation
A DRAM memory array can be thought of as a table of cells. These cells are composed of capacitors, and contain one or more ‘bits’ of data, depending upon the chip configuration. This table is addressed via row and column decoders, which in turn receive their signals from the RAS and CAS clock generators. In order to minimize the package size, the row and column addresses are multiplexed onto the same address pins and latched into separate row and column address buffers. For example, if there are 11 address lines, there will be 11 row and 11 column address buffers. Amplifier circuits called ‘sense amps’ are connected to each column and provide the read and restore operations of the chip. Since the cells are capacitors that discharge for each read operation, the sense amp must restore the data before the end of the access cycle.
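As a rough illustration of the multiplexing described above, the sketch below splits one flat cell address into the row half and the column half that would be sent over the same 11 pins. The function name and the choice to treat the upper bits as the row are assumptions made for the example, not something a particular chip is known to do.

```python
ADDRESS_LINES = 11  # the 11 address lines from the example in the text


def split_address(cell_address, lines=ADDRESS_LINES):
    """Split a flat cell address into (row, column) halves.

    Assumption for illustration: the upper bits form the row address
    and the lower bits form the column address.
    """
    if not 0 <= cell_address < (1 << (2 * lines)):
        raise ValueError("address out of range for this array")
    mask = (1 << lines) - 1
    row = cell_address >> lines     # sent first, latched into the row buffer
    column = cell_address & mask    # sent second, latched into the column buffer
    return row, column


# 11 lines used twice address 2048 rows by 2048 columns.
print(1 << ADDRESS_LINES, "rows x", 1 << ADDRESS_LINES, "columns")
print(split_address((5 << ADDRESS_LINES) | 7))
```

With 11 pins used twice, the array has 2048 rows and 2048 columns, where a non-multiplexed design would need 22 address pins.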
The capacitors used for data cells tend to bleed off their charge, and therefore require a periodic refresh cycle or data will be lost. A refresh controller determines the time between refresh cycles, and a refresh counter ensures that the entire array (all rows) is refreshed. Of course, this means that some cycles are used for refresh operations, which has some impact on performance.
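To get a feel for how large that impact might be, the following sketch estimates the fraction of time spent refreshing. The formula is a simplification, and the sample numbers (row count, time per row refresh, refresh period) are illustrative assumptions rather than figures from any data sheet.

```python
def refresh_overhead(rows, refresh_cycle_ns, refresh_period_ms):
    """Fraction of time the array is busy refreshing instead of serving accesses.

    Simplified model: every row is refreshed once per refresh period,
    and each row refresh occupies the array for refresh_cycle_ns.
    """
    busy_ns = rows * refresh_cycle_ns
    period_ns = refresh_period_ms * 1_000_000
    return busy_ns / period_ns


# Illustrative values only: 2048 rows, 110 ns per row refresh,
# every row refreshed once every 32 ms.
print(f"{refresh_overhead(2048, 110, 32):.2%} of the time is spent on refresh")
```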
A typical memory access would occur as follows. First, the row address bits are placed onto the address pins. After a period of time the RAS\ signal falls, which activates the sense amps and causes the row address to be latched into the row address buffer. When the RAS\ signal stabilizes, the selected row is transferred onto the sense amps. Next, the column address bits are set up, and then latched into the column address buffer when CAS\ falls, at which time the output buffer is also turned on. When CAS\ stabilizes, the selected sense amp feeds its data onto the output buffer.
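The sequence above can be sketched as a toy model. This is only a behavioural illustration under simplifying assumptions (no timing, one bit per cell), and the class and method names are invented for the example.

```python
class ToyDram:
    """Toy model of a DRAM read: latch the row, then latch the column."""

    def __init__(self, rows, columns):
        self.cells = [[0] * columns for _ in range(rows)]
        self.open_row = None     # row address buffer
        self.sense_amps = None   # contents of the open row, if any

    def ras_falls(self, row):
        # Row address is latched and the whole row moves onto the sense amps.
        self.open_row = row
        self.sense_amps = list(self.cells[row])

    def cas_falls(self, column):
        # Column address is latched; the selected sense amp drives the output.
        if self.sense_amps is None:
            raise RuntimeError("no row is open; RAS must fall before CAS")
        return self.sense_amps[column]

    def ras_rises(self):
        # End of the cycle: the sense amps restore the row, then release it.
        if self.open_row is None:
            return
        self.cells[self.open_row] = list(self.sense_amps)
        self.open_row = None
        self.sense_amps = None


dram = ToyDram(rows=4, columns=8)
dram.cells[2][5] = 1
dram.ras_falls(2)          # step 1: row address
bit = dram.cas_falls(5)    # step 2: column address, data appears
dram.ras_rises()           # step 3: restore and close the row
print(bit)
```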
Page Mode Access
By implementing special access modes, designers were able to eliminate some of the internal operations for certain types of access. The first significant implementation was called Page Mode access.
Using this method, the RAS\ signal is held active so that an entire ‘page’ of data is held on the sense amps. New column addresses can then be clocked in repeatedly by cycling only CAS\. This provides much faster random access reads within the page, since the row address setup and hold times are eliminated.
While some applications benefit greatly from this type of access, there are others that do not benefit at all. The original Page Mode was improved upon and replaced very quickly, so you will likely never see any memory of this type. Even if you do, it would not be worth taking even for free, considering the advantages of later access modes.
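A simple way to see the saving is to count address latch operations for a run of reads that all fall in the same row. The function below is a hypothetical counting sketch, not a timing model.

```python
def address_latch_events(reads_in_same_row, page_mode):
    """Count (row latches, column latches) for reads that all hit one row."""
    # Without page mode every read repeats the full RAS/CAS sequence.
    # With page mode the row is latched once and only CAS is cycled.
    row_latches = 1 if page_mode else reads_in_same_row
    column_latches = reads_in_same_row
    return row_latches, column_latches


for mode in (False, True):
    row_count, column_count = address_latch_events(4, page_mode=mode)
    print(f"page_mode={mode}: {row_count} row latches, {column_count} column latches")
```

If every read lands in a different row, the two counts are identical, which is why some applications see no benefit.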
Fast Page Mode
Fast Page mode improved upon the original page mode by eliminating the column address setup time during the page cycle. This was accomplished by activating the column address buffers on the falling edge of RAS\ (rather than CAS\). Since RAS\ remains low for the entire page cycle, this acts as a transparent latch when CAS\ is high, and allows address setup to occur as soon as the column address is valid, rather than waiting for CAS\ to fall.
Fast Page mode became the most widely used access method for DRAMs, and is still used on many systems. The benefit of FPM memory is reduced power consumption, mainly because sense and restore current is not necessary during page mode access. Though FPM was a major innovation, there are still some drawbacks. The most significant is that the output buffers turn off when CAS\ goes high. The minimum cycle time is 5ns before the output buffers turn off, which essentially adds at least 5ns to the cycle time.
Today, FPM memory is the least desirable of all available DRAM memory. You should only consider using it if it is either free, or your system does not support any of the later memory types (such as a 486 based system). Typical timings are 6-3-3-3 (initial latency of 6 clocks, with a 3-clock page access). Due to the limited demand, FPM is actually more expensive than most of the faster memories now available.
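The x-y-y-y notation lists the bus clocks needed for each of four consecutive transfers: the first number covers the initial access and the remaining three cover the page accesses that follow. A minimal sketch that totals the clocks for one such burst (the helper name is made up for this example):

```python
def burst_clocks(timing):
    """Total bus clocks for a four-transfer burst written as 'x-y-y-y'."""
    return sum(int(part) for part in timing.split("-"))


print(burst_clocks("6-3-3-3"))  # 6 + 3 + 3 + 3 = 15 clocks
```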
HyperPage Mode (EDO)
The last major improvement to asynchronous DRAMs came with Hyperpage mode, or Extended Data Out (EDO). The innovation was simply to stop turning off the output buffers on the rising edge of CAS\. In essence, this eliminates the column precharge time while latching the data out. This allows the minimum time for CAS\ to be low to be reduced, and the rising edge can come earlier.
In addition to a 40% or greater improvement in access times, EDO uses the same amount of silicon and the same package size. EDO has been shown to work well with memory bus speeds up to 83MHz with little or no performance penalty. If the chips are sufficiently fast (55ns or faster), EDO can be used even with a 100MHz memory bus. One of the best reasons to use EDO is that all of the current motherboard chipsets support it with no compatibility problems, unlike much of the synchronous memory now being used.
Even with all the stated advantages, EDO is no longer considered mainstream. Most manufacturers no longer produce it, or have limited production. It is only a matter of time before the prices begin to rise, and the equivalent size SDRAM module will be less expensive.
If you already own EDO memory, there is no real reason to jump to SDRAM unless you require bus speeds above 83MHz. With a typical EDO timing of 5-2-2-2 at 66MHz, there is almost no noticeable improvement with SDRAM over EDO, and at 83MHz it is still negligible. If you require 100MHz bus operation, EDO will lag far behind current SDRAM in performance even if it does operate at that speed due to the need for 6-3-3-3 timings. On the other hand, with EDO being phased out, you will likely find SDRAM to be equal to or even lower in price.
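To put these timings on a common footing, the sketch below converts a four-transfer burst into nanoseconds at a given bus speed. The EDO figures are the ones quoted above; the SDRAM line uses the 4-1-1-1 target mentioned in the PC100 section and is included only for comparison. The function name is an assumption for the example.

```python
def burst_time_ns(timing, bus_mhz):
    """Time in ns for a four-transfer burst 'x-y-y-y' at the given bus speed."""
    clocks = sum(int(part) for part in timing.split("-"))
    clock_period_ns = 1000.0 / bus_mhz
    return clocks * clock_period_ns


cases = [
    ("EDO   5-2-2-2 @  66MHz", "5-2-2-2", 66),
    ("EDO   6-3-3-3 @ 100MHz", "6-3-3-3", 100),
    ("SDRAM 4-1-1-1 @ 100MHz", "4-1-1-1", 100),  # target timing, for comparison
]
for label, timing, bus_mhz in cases:
    print(f"{label}: {burst_time_ns(timing, bus_mhz):.1f} ns")
```

Under these assumptions, EDO at 100MHz finishes a burst only slightly sooner than it does at 66MHz, while SDRAM at 100MHz takes less than half the time.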
Burst EDO (BEDO)
Burst EDO, while a good idea, was dead before it was ever born. The addition of a burst mode, along with a dual bank architecture, would have provided the 4-1-1-1 access times at 66MHz that many expected with SDRAM. Burst mode is an advancement over page mode, in that after the first address input, the next 3 addresses are generated internally, thereby eliminating the time necessary to input a new column address. Unfortunately, Intel decided that EDO was no longer viable and that SDRAM was their preferred memory architecture, so they did not implement support for BEDO in their chipsets. In fact, several large memory manufacturers had put considerable time and money into the development of SDRAM over the past decade, and were not very happy with the BEDO design.
Except for support of bus speeds of 100MHz and faster, BEDO would probably have been a much faster and more stable memory than SDRAM. Essentially, BEDO lost support as much for political and economic reasons as for technical ones, it seems.
Synchronous Operation
Once it became apparent that bus speeds would need to run faster than 66MHz, DRAM designers needed to find a way to overcome the significant latency issues that still existed. By implementing a synchronous interface, they were able to do this and gain some additional advantages as well.
With an asynchronous interface, the processor must wait idly for the DRAM to complete its internal operations, which typically takes about 60ns. With synchronous control, the DRAM latches information from the processor under control of the system clock. These latches store the addresses, data and control signals, which allows the processor to handle other tasks. After a specific number of clock cycles the data becomes available and the processor can read it from the output lines.
Another advantage of a synchronous interface is that the system clock is the only timing edge that needs to be provided to the DRAM. This eliminates the need for multiple timing strobes to be propagated. The inputs are simplified as well, since the control signals, addresses and data can all be latched in without the processor monitoring setup and hold timings. Similar benefits are realized for output operations as well.
JEDEC SDRAM
All DRAMs that have a synchronous interface are known generically as SDRAM. This includes CDRAM (Cache DRAM), RDRAM (Rambus DRAM), ESDRAM (Enhanced SDRAM) and others. However, the type most often called SDRAM is the JEDEC standard synchronous DRAM.
JEDEC SDRAM not only has a synchronous interface controlled by the system clock, it also includes a dual-bank architecture and burst mode (1-bit, 2-bit, 4-bit, 8-bit and full page). A ‘mode register’ that can be set at power-on and changed during operation controls the burst mode, burst type (sequential or interleave), burst length and CAS latency (1, 2 or 3).
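The two burst types differ only in the order in which the chip steps through the columns of a burst. The sketch below shows the commonly documented ordering: a sequential burst counts upward and wraps within the aligned block, while an interleaved burst XORs the starting offset with a counter. The function name is hypothetical, and the full-page burst length is left out for simplicity.

```python
def burst_order(start_column, burst_length, burst_type="sequential"):
    """Column addresses generated internally for one burst.

    Sketch only: handles burst lengths 1, 2, 4 and 8, not full page.
    """
    if burst_length not in (1, 2, 4, 8):
        raise ValueError("unsupported burst length")
    base = start_column & ~(burst_length - 1)    # aligned block holding the start
    offset = start_column & (burst_length - 1)   # position inside that block
    if burst_type == "sequential":
        return [base | ((offset + i) % burst_length) for i in range(burst_length)]
    if burst_type == "interleave":
        return [base | (offset ^ i) for i in range(burst_length)]
    raise ValueError("burst_type must be 'sequential' or 'interleave'")


print(burst_order(5, 4, "sequential"))  # [5, 6, 7, 4]
print(burst_order(5, 4, "interleave"))  # [5, 4, 7, 6]
```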
CAS Latency is one of several performance related timings for SDRAM. It is the number of clock cycles between the column address being strobed in and the first data becoming available on the output. When a burst read cycle is initiated, the addresses are set up and RAS\ and CS\ (chip select) are held low on the next clock cycle (rising edge of CLK), thereby strobing in the row address and activating the sense amplifiers on the bank. A period of time equal to tRCD (RAS\ to CAS\ delay) must pass, after which CAS\ and CS\ are held low (again, at the next clock cycle). After the time period for tCAC (column access time) has passed, the first bit of data is on the output line and can be retrieved (at the next clock cycle). The basic rule is that CAS latency times the clock period (tCLK) must be equal to or greater than tCAC (or CL x tCLK >= tCAC). This means that the column access time is the limiting factor for CAS Latency.
SDRAM was initially introduced as the answer to all performance problems; however, it quickly became apparent that there was little performance benefit and a lot of compatibility problems. The first SDRAM modules contained only two clock lines, but it was soon determined that this was insufficient. This created two different module designs (2-clock and 4-clock), and you needed to know which your motherboard required. Though the timings were theoretically supposed to be 5-1-1-1 @ 66MHz, many of the original SDRAM modules would only run at 6-2-2-2 when run in pairs, mostly because the chipsets (i430VX, SiS5571) had trouble with the speed and with coordinating the accesses between modules. The i430TX chipset and later non-Intel chipsets improved upon this, and the SPD chip (serial presence detect) was added to the standard so chipsets could read the timings from the module. Unfortunately, for quite some time the SPD EEPROM was either not included on many modules, or not read by the motherboards.
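The rule CL x tCLK >= tCAC can be turned into a small check that picks the lowest usable CAS latency for a given bus speed. The tCAC values below are hypothetical, chosen only to exercise the rule, and the function name is an assumption for this sketch.

```python
def min_cas_latency(tcac_ns, bus_mhz, supported=(1, 2, 3)):
    """Smallest supported CL with CL * tCLK >= tCAC, or None if none fits."""
    tclk_ns = 1000.0 / bus_mhz   # clock period in nanoseconds
    for cl in sorted(supported):
        if cl * tclk_ns >= tcac_ns:
            return cl
    return None


# Hypothetical column access times, not taken from any data sheet.
for tcac_ns, bus_mhz in [(20, 100), (25, 100), (25, 66)]:
    cl = min_cas_latency(tcac_ns, bus_mhz)
    print(f"tCAC={tcac_ns}ns @ {bus_mhz}MHz -> CL {cl}")
```

The same chip can therefore run at a lower CAS latency on a slower bus, because each clock period covers more of the fixed column access time.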
SDRAM chips are officially rated in MHz, rather than nanoseconds (ns), so that there is a common denominator between the bus speed and the chip speed. The equivalent nanosecond rating is the clock period, found by dividing 1 second (1 billion ns) by the clock frequency of the chip. For example, a 67MHz SDRAM chip is rated as 15ns. Note that this nanosecond rating is not measuring the same timing as an asynchronous DRAM chip. Remember, internally all DRAM operates in a very similar manner, and most performance gains are achieved by ‘hiding’ the internal operations in various ways.
The original SDRAM modules used either 83MHz chips (12ns) or 100MHz chips (10ns); however, these were only rated for 66MHz bus operation. Due to some of the delays introduced when having to deal with the synchronization of the various signals, the 100MHz chips will, in many cases, produce a module that operates reliably at about 83MHz. These SDRAM modules are now called PC66, to differentiate them from those conforming to Intel’s PC100 specification.
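The conversion between the two ratings is a single division. A minimal sketch, using speed grades mentioned in this guide (the function name is made up for the example):

```python
def mhz_to_ns(mhz):
    """Clock period in nanoseconds for a chip rated at the given MHz."""
    return 1000.0 / mhz   # 1 second = 1,000,000,000 ns; 1 MHz = 1,000,000 Hz


for mhz in (67, 83, 100):
    print(f"{mhz}MHz -> about {round(mhz_to_ns(mhz))}ns")
```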
PC100 SDRAM
When Intel decided to officially implement a 100MHz system bus speed, they understood that most of the SDRAM modules available at that time would not operate properly above 83MHz. In order to bring some semblance of order to the marketplace, Intel introduced the PC100 specification as a guideline to manufacturers for building modules that would function properly on their upcoming i440BX. With the PC100 specification, Intel laid out a number of guidelines for trace lengths, trace widths and spacing, number of PCB layers, EEPROM programming specs, etc.
There is still quite a bit of confusion regarding what a ‘true’ PC100 module actually consists of. Unfortunately, there are quite a few modules being sold today as PC100 that do not operate reliably at 100MHz. While the chip speed rating is used most often to determine the overall performance of the chip, a number of other timings are very important. tRCD (RAS to CAS Delay), tRP (RAS precharge time) and CAS Latency all play a role in determining the fastest bus speed at which the module will still achieve a 4-1-1-1 timing.
PC100 SDRAM on a 100MHz (or faster) system bus will provide a performance boost for Socket 7 systems of between 10% and 15%, since the L2 cache is running at system bus speed. Pentium II systems will not see as big a boost, because the L2 cache is running at ½ processor speed anyway, with the exception of the cacheless Celeron chips of course.
DDR SDRAM
One limitation of JEDEC SDRAM is that the theoretical maximum speed of the design is 125MHz, though technology advances may allow up to 133MHz operation. It is obvious that bus speeds will need to increase well beyond that in order for memory bandwidth to keep up with future processors. There are several competing new standards on the horizon that are very promising; however, most of them require special pinouts, smaller bus widths, or other design considerations. In the short term, Double Data Rate SDRAM looks very appealing. Essentially, this design allows the activation of output operations on the chip to occur on both the rising and falling edges of the clock. Currently, only the rising edge signals an event to occur, so the DDR SDRAM design can effectively double the speed of operation up to at least 200MHz.
There is already one Socket 7 chipset that has support for DDR SDRAM, and more will certainly follow if manufacturers decide to make this memory available. In this industry, many times it is the first to market that gains the support, rather than the best technology.
Enhanced SDRAM (ESDRAM)
In order to overcome some of the inherent latency problems with standard DRAM memory modules, several manufacturers have included a small amount of SRAM directly into the chip, effectively creating an on-chip cache. One such design that is gaining some attention is ESDRAM from Ramtron International Corporation.
ESDRAM is essentially SDRAM, plus a small amount of SRAM cache which allows for lower latency times and burst operations up to 200MHz. Just as with external cache memory, the goal of a cache DRAM is to hold the most frequently used data in the SRAM cache to minimize accesses to the slower DRAM. One advantage to the on-chip SRAM is that a wider bus can be used between the SRAM and DRAM, effectively increasing the bandwidth and increasing the speed of the DRAM even when there is a cache miss.
As with DDR SDRAM, there is currently at least one Socket 7 chipset with support for ESDRAM. The deciding factor in determining which of these solutions will succeed will likely be the initial cost of the modules. Current estimates show the cost of ESDRAM at about 4 times that of existing DRAM solutions, which will likely not go over well with most users.
Protocol Based DRAM
All of the previously discussed DRAM have separate address, data and control lines which limits the speed at which the device can operate with current technology. In order to overcome this limitation, several designs implement all of these signals on the same bus. The two protocol based designs currently getting the most attention are SyncLink DRAM (now called SLDRAM due to trademark issues) and Direct Rambus DRAM (DRDRAM) licensed by Rambus, Inc.
DRDRAM
Intel has placed their money on the proprietary memory design developed by Rambus, Inc. On the surface, this looks to be a very fast solution for system memory due to its fast operation (up to 800MHz). The reality is, however, that the design is only up to twice as fast as current SDRAM operation due to the smaller bus width (16 bits vs. 64 bits).
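The arithmetic behind the "only up to twice as fast" remark can be sketched as follows. The calculation assumes that "current SDRAM operation" means a 100MHz, 64-bit bus and that peak bandwidth is simply the transfer rate times the bus width; both are simplifying assumptions, and the function name is made up for the example.

```python
def peak_bandwidth_mb_s(transfer_mhz, bus_bits):
    """Peak bandwidth in MB/s: million transfers per second times bytes per transfer."""
    return transfer_mhz * bus_bits / 8


sdram = peak_bandwidth_mb_s(100, 64)    # assumed: 100MHz SDRAM on a 64-bit bus
drdram = peak_bandwidth_mb_s(800, 16)   # DRDRAM at 800MHz on a 16-bit bus
print(f"SDRAM:  {sdram:.0f} MB/s")
print(f"DRDRAM: {drdram:.0f} MB/s ({drdram / sdram:.0f}x SDRAM)")
```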
Despite the claims from Intel and Rambus, Inc., there are some potentially serious issues which need to be addressed with this technology. The higher speeds require short wire lengths and additional shielding to prevent problems with EMI. In addition, latency times are actually worse than currently available fast SDRAM. Since most of today’s applications do not actually utilize the full bandwidth of the memory bus, simply increasing the bandwidth while ignoring latency issues will likely not provide any real performance improvements. In addition, processors operating with 800MHz bus speeds will certainly require more than double the current memory bandwidth.
While these issues are serious enough, the biggest drawback to the technology is that it is proprietary technology. Manufacturers wishing to implement a solution with DRDRAM will be required to pay a royalty to Intel and Rambus, Inc., and will also have no real control over the technology. This is not an attractive outlook for most memory manufacturers who have no desire to essentially become chip foundries.
SLDRAM
Many memory manufacturers are putting their support behind SLDRAM as the long-term solution for system performance. While SLDRAM is a protocol-based design, just as DRDRAM is, it is an open industry standard that requires no royalty payments. This alone should allow for lower cost. Another cost advantage for the SLDRAM design is that it does not require a redesign of the RAM chips.
Due to the use of packets for address, data and control signals, SLDRAM can operate on a faster bus than standard SDRAM – up to at least 200MHz. Just as DDR SDRAM operates the output signal at twice the clock rate, so can SLDRAM. This puts the output operation as high as 400MHz, with some engineers claiming it can reach 800MHz in the near future.
Compared to DRDRAM, it seems that SLDRAM is a much better solution due to the lower actual clock speed (reducing signal problems), lower latency timings and lower cost due to the royalty-free design and operation on current bus designs. It appears that even the bandwidth of SLDRAM is much higher than that of DRDRAM, at 3.2GB/s vs. 1.6GB/s.
Though Intel initially intended to support only DRDRAM in future chipsets, competing chipset manufacturers, memory manufacturers and pressure from end users may force them to include support for SLDRAM as well. If the marketplace can successfully influence Intel to provide this support, we may actually see a situation where the best technology wins over marketing hype.