ECE 4100/6100 Advanced Computer Architecture
Lecture 11: DRAM
Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering, Georgia Institute of Technology
Reading: Section 5.3
Suggested Readings
Main Memory Storage Technologies
DRAM: "Dynamic" Random Access Memory
- Highest densities
- Optimized for cost/bit → main memory
SRAM: "Static" Random Access Memory
- Densities 1/4 to 1/8 of DRAM
- Speeds 8-16x faster than DRAM
- Costs 8-16x more per bit
- Optimized for speed → caches
The DRAM Cell
Why DRAMs?
- Higher density than SRAMs
Disadvantages:
- Longer access times
- Leaky; needs to be refreshed
- Cannot be easily integrated with standard CMOS logic processes
SRAM Cell
Bit is stored in a latch using 6 transistors
To read:
- Precharge bitlines to 2.5 V
- Drive the wordline; bitlines settle to 0 V / 5 V
To write:
- Set bitlines to 0 V / 5 V
- Drive the wordline; bitlines "overpower" the latch transistors
One DRAM Bank
Example: 512Mb 4-bank DRAM (x4)
DRAM Cell Array
DRAM Basics
Address multiplexing
- Send row address when RAS asserted
- Send column address when CAS asserted
DRAM reads are self-destructive: reading a row discharges its cells, so the data must be written back
Memory array
- All bits within an array work in unison
Memory bank
- Different banks can operate independently
DRAM rank
- Chips inside the same rank are accessed simultaneously
Examples of DRAM DIMM Standards
DRAM Ranks
DRAM Ranks
DRAM Organization
Organization of DRAM Modules
DRAM Configuration Example
Memory Read Timing: Conventional
Memory Read Timing: Fast Page Mode
Memory Read Timing: Burst
Memory Controller
Consider all of the steps a LD instruction must go through!
- Virtual address → physical address → rank/bank
Scheduling policies are increasingly important
- Give preference to references in the same page?
Integrated Memory Controllers
DRAM Refresh
Periodic refresh across DRAM rows
Rows are unaccessible while being refreshed
Refresh = read a row and write the same data back
Example:
- 4k rows in a DRAM
- 100 ns read cycle
- Decay in 64 ms
- 4096 × 100 ns = 409.6 µs to refresh all rows once
- 409.6 µs / 64 ms ≈ 0.64% unavailability
DRAM Refresh Styles
DRAM Refresh Policies
- RAS-Only Refresh
- CAS-Before-RAS (CBR) Refresh
Types of DRAM
Asynchronous DRAM
- Normal: Responds to RAS and CAS signals (no clock)
- Fast Page Mode (FPM): Row remains open after RAS for multiple CAS commands
- Extended Data Out (EDO): Changes output drivers to latches; data can be held on the bus for a longer time
- Burst Extended Data Out (BEDO): An internal counter drives the address latch, so data can be provided in burst mode
Synchronous DRAM
- SDRAM: All of the above, with a clock; adds predictability to DRAM operation
- DDR, DDR2, DDR3: Transfer data on both edges of the clock
- FB-DIMM: DIMMs connected using point-to-point links instead of a shared bus; allows more DIMMs to be incorporated in server systems
RDRAM
Main Memory Organizations
The processor-memory bus may have a width of one or more memory words
Multiple memory banks can operate in parallel
- Transfer from memory to the cache is limited by the width of the processor-memory bus
Wide memory comes with constraints on expansion
- Use of error-correcting codes requires the complete "width" to be read to recompute the codes on writes
- The minimum expansion unit size is increased
Word Level Interleaved Memory
Memory is organized into multiple, concurrent banks
A single address generates multiple, concurrent accesses
Well matched to cache line access patterns
Assuming a word-wide bus, the cache miss penalty is
- Taddress + Tmem_access + #words × Ttransfer cycles
Sequential Bank Operation
Concurrent Bank Operation
Concurrent Bank Operation
Each bank can be addressed independently (the key difference from interleaved memory)
Supports non-blocking caches with multiple outstanding misses
Data Skewing for Concurrent Access
How can we guarantee that data can be accessed in parallel?
Storage scheme:
- A set of rules that determine, for each array element, the module address and the location within the module
- Design a storage scheme to ensure concurrent access
- d-ordered n-vector: the ith element is in module (d·i + C) mod M
Conflict-Free Access
Conflict-free access to the elements of a d-ordered vector if M ≥ N · gcd(M, d)
Multi-dimensional arrays are treated as arrays of 1-D vectors
Conflict-free access for various patterns in a matrix requires
- M ≥ N · gcd(M, δ1) for columns
- M ≥ N · gcd(M, δ2) for rows
- M ≥ N · gcd(M, δ1 + δ2) for forward diagonals
- M ≥ N · gcd(M, δ1 − δ2) for backward diagonals
Conflict-Free Access
Implications for M = N = an even number?
For non-power-of-two values of M, indexing and address computation must be efficient
Vectors that are accessed are scrambled
- Unscrambling the vectors is a non-trivial performance issue
Data dependencies can still reduce bandwidth far below O(M)
Avoiding Bank Conflicts: Compiler Techniques
Many banks:
int x[256][512];
for (j = 0; j < 512; j = j+1)
  for (i = 0; i < 256; i = i+1)
    x[i][j] = 2 * x[i][j];
Even with 128 banks, since 512 is a multiple of 128, word accesses conflict
Solutions:
- Software: loop interchange
- Software: adjust array size to a prime number ("array padding")
- Hardware: prime number of banks (e.g. 17)
- Data skewing
Study Guide: Glossary
Asynchronous DRAM, bank and rank, bit line, conflict-free access, data skewing, DRAM, high-order and low-order interleaving, leaky transistors, memory controller, page-mode access, RAS and CAS, refresh, RDRAM, SRAM, synchronous DRAM, word interleaving, word line
Study Guide
Differences between SRAM and DRAM in operation and performance
Given a memory organization, determine the miss penalty in cycles
Cache basics
- Mappings from main memory to locations in the cache hierarchy
- Computation of the CPI impact of miss penalties, miss rates, and hit times
- Computation of the CPI impact of update strategies
Find a skewing scheme for concurrent accesses to a given data structure
- For example, the diagonals of a matrix
- Sub-blocks of a matrix
Evaluate the CPI impact of various optimizations
Relate the mapping of data structures (such as matrices) to main memory to cache behavior and the behavior of optimizations