Memory technology evolution: an overview of system memory technologies
Technology brief, 8th edition

Contents
Abstract
Introduction
Basic DRAM operation
DRAM storage density and power consumption
Memory access time
Chipsets and system bus timing
Memory bus speed
Abstract
The widening performance gap between processors and memory, along with the growth of memory-intensive business applications, is driving the need for better memory technologies for servers and workstations. Consequently, there are several memory technologies on the market at any given time. HP evaluates developing memory technologies in terms of price, performance, and backward compatibility and implements the most promising technologies in ProLiant servers.
Each DRAM chip contains millions of memory locations, or cells, which are arranged in a matrix of rows and columns (Figure 1). On the periphery of the array of memory cells are transistors that read, amplify, and transfer the data from the memory cells to the memory bus. Each DRAM row, called a page, consists of several DRAM cells.
are many mechanisms to refresh DRAM, including RAS-only refresh, CAS-before-RAS (CBR) refresh, and hidden refresh. CBR, which involves driving CAS active before driving RAS active, is used most often.

Figure 2. Representation of a write operation for FPM or EDO RAM

DRAM storage density and power consumption
The storage capacity (density) of DRAM is inversely proportional to the cell geometry.
Figure 3. Representation of a bus clock signal

Over the years, some computer components have gained speed faster than others. For this reason, the components in a typical server are controlled by different clocks that run at different, but related, speeds.
additional data sections are accessed with every clock cycle after the first access (6-1-1-1) before the memory controller has to send another CAS.

Figure 4. Burst-mode access (clock, command, address, and data signals). NOP is a “No Operation” instruction.
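The savings from burst mode can be illustrated with a small sketch. This is a minimal illustration using the 6-1-1-1 pattern described above; the function names are ours, and the timings are the example figures, not a model of any specific chip:

```python
# Illustrative sketch of burst-mode timing (6-1-1-1) versus
# four independent accesses that each pay the full latency.

def burst_cycles(first_access, subsequent, beats):
    """Total bus clocks for one burst: the first beat pays the full
    access latency, each later beat arrives on the next clock."""
    return first_access + subsequent * (beats - 1)

def non_burst_cycles(first_access, beats):
    """Without burst mode, every beat pays the full access latency."""
    return first_access * beats

print(burst_cycles(6, 1, 4))   # 6-1-1-1 -> 9 clocks for 4 data transfers
print(non_burst_cycles(6, 4))  # 6-6-6-6 -> 24 clocks for the same data
```

The comparison shows why bursting matters: the controller issues one CAS and harvests four data transfers in 9 clocks instead of 24.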
Bank interleaving
SDRAM divides memory into two to four banks for simultaneous access to more data. This division and simultaneous access is known as interleaving. Using a notebook analogy, two-way interleaving is like dividing each page in a notebook into two parts and having two assistants each retrieve a different part of the page.
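The notebook analogy can be sketched as an address-to-bank mapping. This is an assumed low-order interleaving scheme, shown only to illustrate the idea; real controllers may use other address bits to select the bank:

```python
# Sketch of two-way low-order interleaving: consecutive memory words
# map to alternating banks, so sequential accesses keep both banks busy.

def bank_of(address, num_banks=2):
    """Low-order interleaving: the bank is the address modulo bank count."""
    return address % num_banks

# Consecutive addresses alternate between bank 0 and bank 1.
print([bank_of(a) for a in range(8)])  # [0, 1, 0, 1, 0, 1, 0, 1]
```

While one bank is busy completing an access, the controller can already open the next address in the other bank, hiding part of the access latency.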
DIMM Configurations
Single-sided and double-sided DIMMs
Each DRAM chip on a DIMM provides either 4 bits or 8 bits of a 64-bit data word. Chips that provide 4 bits are called x4 (by 4), and chips that provide 8 bits are called x8 (by 8). It takes eight x8 chips or sixteen x4 chips to make a 64-bit word, so at least eight chips are located on one or both sides of a DIMM.
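The chip-count arithmetic above is simple enough to verify directly (a minimal sketch; the function name is ours):

```python
# Chips needed to assemble a 64-bit data word from xN DRAM chips.

WORD_BITS = 64  # width of the data word on the memory bus

def chips_needed(bits_per_chip):
    """Number of xN chips required to supply one full 64-bit word."""
    return WORD_BITS // bits_per_chip

print(chips_needed(8))  # eight x8 chips make a 64-bit word
print(chips_needed(4))  # sixteen x4 chips make a 64-bit word
```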
Parity and ECC DIMMs
The ninth DRAM chip on one side of a DIMM is used to store parity or ECC bits. With parity, the memory controller is capable of detecting single-bit errors, but it is unable to correct any errors. Also, it cannot consistently detect multiple-bit errors.
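The parity limitation described above can be demonstrated in a few lines. This is a minimal sketch of even parity over a 64-bit word; the variable names and the example bit positions are ours:

```python
# Even parity over a 64-bit word: one extra bit detects any single-bit
# error, but two flipped bits cancel out and go undetected.

def parity(word):
    """Even-parity bit: 1 if the word contains an odd number of 1 bits."""
    return bin(word).count("1") % 2

word = 0x00FF00FF00FF00FF
stored_parity = parity(word)            # the bit held in the ninth chip

single_flip = word ^ (1 << 10)              # one bit corrupted
double_flip = word ^ (1 << 10) ^ (1 << 42)  # two bits corrupted

print(parity(single_flip) != stored_parity)  # True  -> error detected
print(parity(double_flip) != stored_parity)  # False -> error missed
```

This is exactly why parity can detect but not correct single-bit errors, and why it cannot consistently detect multiple-bit errors: an even number of flips leaves the parity unchanged.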
Memory channel interleaving
Multi-core processors running multi-threaded applications pose a significant challenge to the memory subsystem. The processor cores share the bandwidth of the memory bus; therefore, the multi-core processor’s performance is limited by the memory bus bandwidth. Even with sufficient memory bus bandwidth, the actual throughput of a single memory controller can create a bottleneck as it handles memory requests from multiple cores.
Advanced memory technologies
Despite the overall system performance improvement achieved with SDRAM, the growing performance gap between the memory and the processor must be closed by more advanced memory technologies. These technologies, which are described on the following pages, boost the overall performance of systems using the latest high-speed processors (Figure 9).
Double transition clocking
Standard SDRAM transfers data to the bus only on the rising edge of the bus clock signal, while DDR-1 uses both the rising and falling edges of the clock to trigger data transfers to the bus (Figure 10).
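The effect of double transition clocking on transfer rate can be sketched as follows (an illustrative calculation; the 100 MHz clock is an assumed example value, not a figure from the text):

```python
# Sketch: transfers per second for single data rate (SDR) versus
# double data rate (DDR) signaling on the same bus clock.

def transfers_per_second(clock_hz, edges_per_clock):
    """Data transfers per second: one per triggering clock edge."""
    return clock_hz * edges_per_clock

sdr = transfers_per_second(100e6, 1)  # standard SDRAM: rising edge only
ddr = transfers_per_second(100e6, 2)  # DDR-1: rising and falling edges

print(ddr / sdr)  # 2.0 -- same clock, twice the data transfers
```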
DDR-1 DIMMs
DDR-1 DIMMs require 184 pins instead of the 168 pins used by standard SDRAM DIMMs. DDR-1 is versatile enough to be used in desktop PCs or servers. To vary the cost of DDR-1 DIMMs for these different markets, memory manufacturers provide unbuffered and registered versions. Unbuffered DDR-1 DIMMs place the load of all the DDR modules on the system memory bus, but they can be used in systems that do not require high memory capacity.
DDR-3
DDR-3, the third generation of DDR SDRAM technology, will make further improvements in bandwidth and power consumption. Manufacturers of DDR-3 will initially use 90 nm fabrication technology and move toward 70 nm as production volumes increase. DDR-3 will operate at clock rates from 400 MHz to 800 MHz with theoretical peak bandwidths ranging from 6.4 GB/s to 12.8 GB/s.
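The peak-bandwidth figures quoted above follow directly from the bus clock, double transition clocking, and a 64-bit (8-byte) data bus. A small sketch of the calculation (function name is ours):

```python
# Peak bandwidth = bus clock x 2 transfers/clock (DDR) x 8 bytes (64-bit bus).

BUS_BYTES = 8  # 64-bit data bus width in bytes

def peak_bandwidth_gbs(clock_mhz):
    """Theoretical peak bandwidth in GB/s for a DDR bus at clock_mhz."""
    return clock_mhz * 1e6 * 2 * BUS_BYTES / 1e9

print(peak_bandwidth_gbs(400))  # 6.4 GB/s at a 400 MHz bus clock
print(peak_bandwidth_gbs(800))  # 12.8 GB/s at an 800 MHz bus clock
```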
Fully-Buffered DIMMs
Traditional DIMM architectures use a stub-bus topology with parallel branches (stubs) that connect to a shared memory bus (Figure 13). Each DIMM connects to the data bus using a set of pin connectors. In order for the electrical signals from the memory controller to reach the DIMM bus-pin connections at the same time, all the traces have to be the same length.
Consequently, JEDEC developed the Fully-Buffered DIMM (FB-DIMM) specification, a serial interface that eliminates the parallel stub-bus topology and allows higher memory bandwidth while maintaining or increasing memory capacity.

FB-DIMM architecture
The FB-DIMM architecture has serial links between the memory controller and the FB-DIMMs, which are connected in a daisy-chain configuration (Figure 15).
Challenges
The challenges for the FB-DIMM architecture include latency and power use (thermal load). Memory latency is the delay from the time the data is requested to the time when the data is available from the memory controller. The FB-DIMM architecture increases this latency in two ways: serialization and transmission.
Rambus DRAM
Rambus DRAM (RDRAM) allows data transfer through a bus operating in a higher frequency range than DDR SDRAM. In essence, Rambus moves small amounts of data very fast, whereas DDR SDRAM moves large amounts of data more slowly. The Rambus design consists of three key elements: RDRAMs, Rambus application-specific integrated circuits, and an interconnect called the Rambus Channel.
With the high data rate of Rambus, maintaining signal integrity is difficult. System boards must be designed to accommodate the extremely stringent timing of Rambus, which increases product time to market. Additionally, each Rambus channel is limited to 32 devices, imposing an upper limit on the memory capacity supported by a single bus.
For more information
For additional information, refer to the resources listed below.

JEDEC Web site: http://www.jedec.org
HP Advanced Memory Protection technology: http://h18004.www1.hp.com/products/servers/technology/whitepapers/adv-technology.html#mem
Fully-Buffered DIMM technology in HP ProLiant servers: http://h18004.www1.hp.com/products/servers/technology/whitepapers/adv-technology.html#mem

Call to action
Send comments about this paper to TechCom@HP.com