ECE 4100/6100 Advanced Computer Architecture Lecture 11 DRAM

  • Prof. Hsien-Hsin Sean Lee

  • School of Electrical and Computer Engineering

  • Georgia Institute of Technology


Reading

  • Section 5.3

  • Suggested Readings



Main Memory Storage Technologies

  • DRAM: “Dynamic” Random Access Memory

    • Highest densities
    • Optimized for cost/bit → main memory
  • SRAM: “Static” Random Access Memory

    • Densities ¼ to 1/8 of DRAM
    • Speeds 8-16x faster than DRAM
    • Cost 8-16x more per bit
    • Optimized for speed → caches


The DRAM Cell

  • Why DRAMs

    • Higher density than SRAMs
  • Disadvantages

    • Longer access times
    • Leaky, needs to be refreshed
    • Cannot be easily integrated with standard CMOS logic processes


SRAM Cell

  • Bit is stored in a latch using 6 transistors

  • To read:

    • set bitlines to 2.5 V
    • drive wordline, bitlines settle to 0 V / 5 V
  • To write:

    • set bitlines to 0 V / 5 V
    • drive wordline, bitlines “overpower” the latch transistors


One DRAM Bank



Example: 512Mb 4-bank DRAM (x4)



DRAM Cell Array



DRAM Basics

  • Address multiplexing

    • Send row address when RAS asserted
    • Send column address when CAS asserted
  • DRAM reads are self-destructive (the sensed row must be written back)

  • Memory array

    • All bits within an array work in unison
  • Memory bank

    • Different banks can operate independently
  • DRAM rank

    • Chips inside the same rank are accessed simultaneously


Examples of DRAM DIMM Standards



DRAM Ranks



DRAM Ranks



DRAM Organization



Organization of DRAM Modules



DRAM Configuration Example



Memory Read Timing: Conventional



Memory Read Timing: Fast Page Mode



Memory Read Timing: Burst



Memory Controller

  • Consider all of the steps a LD instruction must go through!

    • Virtual → physical → rank/bank
  • Scheduling policies are increasingly important

    • Give preference to references in the same page?


Integrated Memory Controllers



DRAM Refresh

  • Leaky storage

  • Periodic Refresh across DRAM rows

  • Rows are inaccessible while being refreshed

  • Read each row and write the same data back

  • Example:

    • 4k rows in a DRAM
    • 100ns read cycle
    • Decay in 64ms
    • 4096 × 100 ns ≈ 410 µs to refresh once
    • 410 µs / 64 ms ≈ 0.64% unavailability


DRAM Refresh Styles

  • Bursty: refresh all rows back to back, then resume normal accesses

  • Distributed: spread single-row refreshes evenly across the retention period



DRAM Refresh Policies

  • RAS-Only Refresh

  • CAS-Before-RAS (CBR) Refresh



Types of DRAM

  • Asynchronous DRAM

    • Normal: Responds to RAS and CAS signals (no clock)
    • Fast Page Mode (FPM): Row remains open after RAS for multiple CAS commands
    • Extended Data Out (EDO): Change output drivers to latches. Data can be held on bus for longer time
    • Burst Extended Data Out: Internal counter drives address latch. Able to provide data in burst mode.
  • Synchronous DRAM

    • SDRAM: All of the above with clock. Adds predictability to DRAM operation
    • DDR, DDR2, DDR3: Transfer data on both edges of the clock
    • FB-DIMM: DIMMs connected using point-to-point links instead of a shared bus. Allows more DIMMs to be incorporated in server-based systems
  • RDRAM

    • Low pin count


Main Memory Organizations

  • The processor-memory bus may have width of one or more memory words

  • Multiple memory banks can operate in parallel

    • Transfer from memory to the cache is subject to the width of the processor-memory bus
  • Wide memory comes with constraints on expansion

    • Use of error correcting codes requires the complete “width” to be read to recompute the code on writes
    • The minimum expansion unit size is increased


Word Level Interleaved Memory

  • Memory is organized into multiple, concurrent, banks

  • Word-level interleaving across banks

  • Single address generates multiple, concurrent accesses

  • Well matched to cache line access patterns

  • Assuming a word-wide bus, cache miss penalty is

      • T_address + T_mem_access + #words × T_transfer cycles


Sequential Bank Operation



Concurrent Bank Operation



Concurrent Bank Operation

  • Each bank can be addressed independently

    • Sequence of addresses
  • Difference with interleaved memory

  • Support for non-blocking caches with multiple outstanding misses



Data Skewing for Concurrent Access

  • How can we guarantee that data can be accessed in parallel?

    • Avoid bank conflicts
  • Storage Scheme:

    • A set of rules that determine for each array element, the address of the module and the location within a module
    • Design a storage scheme to ensure concurrent access
    • d-ordered n-vector: the i-th element is in module (d·i + C) mod M


Conflict-Free Access

  • Conflict free access to elements of the vector if 

    • M ≥ N
    • M ≥ N · gcd(M, d)
  • Multi-dimensional arrays treated as arrays of 1-d vectors

  • Conflict free access for various patterns in a matrix requires

    • M ≥ N · gcd(M, δ1) for columns
    • M ≥ N · gcd(M, δ2) for rows
    • M ≥ N · gcd(M, δ1 + δ2) for forward diagonals
    • M ≥ N · gcd(M, δ1 − δ2) for backward diagonals


Conflict-Free Access

  • Implications for M = N = even number?

  • For non-power-of-two values of M, indexing and address computation must be efficient

  • Vectors that are accessed are scrambled

    • Unscrambling of vectors is a non-trivial performance issue
  • Data dependencies can still reduce bandwidth far below O(M)



Avoiding Bank Conflicts: Compiler Techniques

  • Many banks

      int x[256][512];
      for (j = 0; j < 512; j = j+1)
        for (i = 0; i < 256; i = i+1)
          x[i][j] = 2 * x[i][j];

  • Even with 128 banks, since 512 is a multiple of 128, every element of a column maps to the same bank, so word accesses conflict

  • Solutions:

    • Software: loop interchange
    • Software: adjust array size to a prime # (“array padding”)
    • Hardware: prime number of banks (e.g. 17)
    • Data skewing


Study Guide: Glossary

  • Asynchronous DRAM

  • Bank and rank

  • Bit line

  • Burst mode access

  • Conflict free access

  • Data skewing

  • DRAM

  • High-order and low-order interleaving

  • Leaky transistors

  • Memory controller

  • Page mode access

  • RAS and CAS

  • Refresh

  • RDRAM

  • SRAM

  • Synchronous DRAM

  • Word interleaving

  • Word line



Study Guide

  • Differences between SRAM/DRAM in operation and performance

  • Given a memory organization determine the miss penalty in cycles

  • Cache basics

    • Mappings from main memory to locations in the cache hierarchy
    • Computation of CPI impact of miss penalties, miss rate, and hit times
    • Computation of the CPI impact of update strategies
  • Find a skewing scheme for concurrent accesses to a given data structure

    • For example, diagonals of a matrix
    • Sub-blocks of a matrix
  • Evaluate the CPI impact of various optimizations

  • Relate mapping of data structures to main memory (such as matrices) to cache behavior and the behavior of optimizations



