Can DDR RAM read 128 bits that are not in sequence?



What if I want to read only every tenth 32-bit integer of a large array?
Or one 32-bit member of every 200-bit object in a large array of
200-bit objects? Can I still benefit from the 128-bit bandwidth?

Re: Can DDR RAM read 128 bits that are not in sequence?

"DDR" has nothing to do with it -- the answer lies in the memory
controller, the memory configuration, and how the data is stored. If
the compiler is optimizing properly, it will keep each integer variable
from straddling the memory chunk size (in this case 128 bits), so
it will place four of the 32-bit (4-byte) integers within one 128-bit
(16-byte) memory chunk. If the compiler is not optimizing, then you
might find that reading any given integer (about 1/4 of the time?)
involves two 128-bit memory chunks. If you are dealing with 200-bit
objects, then there is no way that a single memory operation can
read or store the whole object. I don't really know how modern
languages handle picking a 32-bit member out of 200 bits, but I suspect
that it again falls to the compiler's optimization strategy. Until I
knew better, based on testing, I'd assume that two reads/writes would
always be involved in dealing with that 200-bitter, even if only a
small portion of it was the real target of the operation.

So much for my SWAG...

John McGaw
[Knoxville, TN, USA]

Re: Can DDR RAM read 128 bits that are not in sequence?


You should be looking for a "software optimization guide" on the Intel and
AMD web sites -- perhaps a section like "Cache and Memory Optimization".
Different processor families may have different optimizations.

Software Optimization Guide for the AMD64 Processors

AMD Athlon Processor x86 Code Optimization Guide

There may be more on the Intel site, but I didn't search further than
the first hit I got:

   "IA-32 Intel Architecture Optimization Reference Manual"

When writing software, you have to be really aware of how caches work.
Hinting that a prefetch is required (via a pragma? I'm not a software
guy) can make your data available from the cache a number of
instruction times after you deliver the hint. This can help hide
reads from memory, up to the point where the stride of the data is
wider than a cache line. (Right off hand, I don't know
how you would optimize for more widely spaced data, or the best way
to pull in larger chunks of data than a single cache line.)

As a hardware guy, I notice that processors do a lot more with
bursted data; I don't know how frequently they do single-cycle
style accesses any more. There is a lot of overhead associated with
doing a bus transaction, so between that and the use of cached data,
pulling in data a cache line at a time is not much more expensive than
trying to surgically extract 32 bits with a single access to memory.

A dual-channel chipset can have memory installed in a single-channel
configuration or in a dual-channel configuration. Some
BIOSes offer the option of two different burst sizes (say 4 and 8).
One value helps you fill a cache line in single-channel mode;
the other helps you fill a cache line in dual-channel mode.
Selecting the correct value should help performance, as the burst
length times the width should match the quanta used by the L1 and
L2 caches on the processor.

An architecture group or a software development group may have
individuals who know how to do optimizations at the machine level.
In any case, careful testing is the best teacher if you cannot get
really good details on how the hardware works. (Translation: hardware
documentation sucks these days :-))

Good luck,
