Simple Hardware Clock question

Ok, so I just read that the clock synchronizes other hardware
components of a computer system; meaning that, because the processor
is faster than the RAM for instance, or the hard disk, the next CPU
instruction is delayed till the next clock tick, in order to ensure
that each component completes its operation before the next phase. And
for this reason, the clocks often run at relatively slow speeds such as
333MHz - much slower than the 3GHz CPUs that we have now.

If this is so, one may say that this CPU executes no more than
333,000,000 instructions per second!

Is this so?

Thanks for helping the Noob


Re: Simple Hardware Clock question


Yes, if you write lousy software.

There are a number of techniques that help to improve performance. To
name a few:
- on-chip CPU memory cache
- off-chip/external CPU memory cache
- interrupts (as opposed to continuous polling/busy-waiting)
- DMA (direct memory access) for device I/O
- code optimization to remove any redundancy in both calculations and
memory/device accesses

Got the idea? :)


Re: Simple Hardware Clock question

Thanks Alexi!

I understand how numbers (1) and (5) can help, but not the others.
Putting your answer together with Maxim's, is it correct to say all
these techniques do NOT require the external interface?

- Olumide

Re: Simple Hardware Clock question


2 (off-CPU memory cache) helps just like the other cache. It's basically a
hierarchy of caches, each working at its own speed, and the closer the cache
is to the CPU, the faster the data retrieval. But if the cache does not
contain the information the CPU needs, the dirty work will have to be done
anyway, i.e. the data has to be read from main memory.
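To make the hit/miss point concrete, here's a small C sketch (the 2048x2048
array size is just an illustrative number). Both functions do exactly the
same additions, but the row-major walk touches memory sequentially and mostly
hits the caches, while the column-major walk strides across whole rows and
mostly misses, so the "dirty work" of going to main memory happens far more
often and it runs noticeably slower:

#include <stdio.h>

#define N 2048
static int a[N][N];               /* ~16 MB, far larger than any cache level */

long sum_row_major(void)          /* sequential walk: mostly cache hits */
{
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

long sum_column_major(void)       /* strided walk: mostly cache misses */
{
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

int main(void)
{
    printf("%ld %ld\n", sum_row_major(), sum_column_major());
    return 0;
}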

3 (interrupts and multithreading in general): suppose you're waiting for a
key in your application, and all your system and application software is
single-threaded, i.e. no multiprocessing of any kind, no parallelism. The
easiest and least efficient approach is a loop like this:
while (!kbhit()) ; // <conio.h> used
This simply wastes CPU time, which could have been used for something
more useful, like parallel calculations in some background activity,
whatever. This is where interrupts help -- instead of waiting in an infinite
loop and doing nothing, you set up a keyboard interrupt routine that is
called once per key hit/release, as opposed to some millions of calls to
kbhit() in a loop. You advance your state machine upon the keyboard event,
using as little CPU time as needed, with no excessive overhead.
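Here's a minimal sketch of the difference (it uses an ISO C signal handler as
a stand-in for a real keyboard interrupt, which is OS- and hardware-specific).
The handler runs only when the event actually happens, and main() spends the
time in between on useful work instead of hammering kbhit():

#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t key_event = 0;

static void on_interrupt(int sig)   /* plays the role of the keyboard ISR */
{
    (void)sig;
    key_event = 1;                  /* just record the event, keep the handler short */
}

int main(void)
{
    signal(SIGINT, on_interrupt);   /* Ctrl-C stands in for a key press here */

    unsigned long useful_work = 0;
    while (!key_event)
        useful_work++;              /* background computation instead of polling a device */

    printf("event arrived after %lu units of useful work\n", useful_work);
    return 0;
}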

4 (DMA): this tiny bit of circuitry does memory-to-device I/O transparently
to the CPU; it goes without much CPU time overhead because the CPU is
interrupted only when there is data ready for it or data can be taken from
it. Just that, no loops like the above. Also, DMA usually works with blocks
of data, which again helps to minimize the overhead (you get one interrupt
per block of bytes as opposed to getting one per byte).
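A tiny illustration of that interrupt-count difference (BLOCK_SIZE, the
buffers and the function names below are made up for the sketch, not a real
driver API):

#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 512

static unsigned char device_data[BLOCK_SIZE];   /* pretend device buffer */
static unsigned char ram[BLOCK_SIZE];

/* Programmed I/O: the CPU moves every byte itself. */
static int pio_read_block(void)
{
    for (int i = 0; i < BLOCK_SIZE; i++)
        ram[i] = device_data[i];    /* one CPU-driven transfer per byte */
    return BLOCK_SIZE;              /* the CPU is involved this many times */
}

/* DMA-style: the controller copies the whole block, the CPU is told once. */
static int dma_read_block(void)
{
    memcpy(ram, device_data, BLOCK_SIZE);   /* stands in for the DMA engine's work */
    return 1;                               /* a single completion interrupt */
}

int main(void)
{
    printf("PIO: CPU involved %d times per block\n", pio_read_block());
    printf("DMA: CPU involved %d time per block\n", dma_read_block());
    return 0;
}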

Read some computer architecture book, like Tanenbaum's...


Re: Simple Hardware Clock question


Thanks Alexei,

I know about all this - trust me, but I fail to see how the external
cache, or the use of interrupts, or DMA can cause the CPU to execute
more than 1 instruction in a hardware clock cycle. What I'm trying to
say is that I fail to see how external cache, or the use of interrupts,
or DMA constitute an internal interface for/of the CPU. (I really like
Maxim's answer ;-) . Are you there Maxim?)

- Olumide

Re: Simple Hardware Clock question


Right, and now you may have CPUs with several cores or that hyperthreading
feature, so you can effectively have more than 1 instruction per clock due
to the parallelism. Intel x86 CPUs probably don't have many useful
instructions that take just 1 clock :)

What I was trying to say in my previous posts is that even though the
circuitry that is connected to the CPU can be rather slow (effectively
running with slower clocks than that of the CPU), it just doesn't mean the
CPU itself starts running as slow as they do.


Re: Simple Hardware Clock question

Alexei A. Frounze wrote:

All modern CPUs (since about 1980) are pipelined in some form, meaning
that the work of an individual instruction is broken up into several
stages, each taking a clock cycle.

A common analogy is doing laundry: there is a washer and a dryer. When
the first load A finishes washing, we can put it in the dryer, but
while A is drying, we can start the next load B in the washer. Then A
finishes drying and B finishes washing. A is now done, and B moves to
drying while the next load C starts washing. If washing and drying each
take time T, then we achieve a throughput of one load per T, while each
load actually takes 2T to complete.

In modern CPUs like the Athlon or Pentium 4, the pipeline can be as
long as 10 or 20 stages. Therefore even though each instruction takes
10 or 20 cycles, they are pipelined so that we can achieve 1
instr/cycle throughput.
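A quick back-of-the-envelope check (the 20-stage depth and instruction count
below are illustrative numbers, not measurements): n instructions through a
k-stage pipeline finish in roughly k + (n - 1) cycles, so throughput
approaches one instruction per cycle even though each instruction's latency
is k cycles:

#include <stdio.h>

int main(void)
{
    long k = 20;                    /* pipeline depth (Pentium 4 class) */
    long n = 1000000;               /* instructions to execute */

    long unpipelined = n * k;       /* each instruction waits for the previous one */
    long pipelined   = k + (n - 1); /* one completes per cycle once the pipe is full */

    printf("unpipelined: %ld cycles\n", unpipelined);
    printf("pipelined:   %ld cycles (~%.3f instr/cycle)\n",
           pipelined, (double)n / pipelined);
    return 0;
}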

For more information, see:

Computer Architecture: A Quantitative Approach, John Hennessy and David
Patterson.

Re: Simple Hardware Clock question


More so.

Even the P5 Pentium was capable of running 2 instructions in parallel,
provided they do not depend on one another (the operands of the second are
not altered by the first).
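A C-level picture of what "do not depend on one another" means (the variables
are just illustrative): a superscalar CPU can issue the two independent
additions in the same clock, but the dependent pair has to go one after the
other, because the second needs the first one's result:

#include <stdio.h>

int main(void)
{
    int a = 1, b = 2, c = 3, d = 4;

    /* Independent: neither result is an operand of the other -> can be paired. */
    int x = a + b;
    int y = c + d;

    /* Dependent: the second addition needs p, so it cannot issue in the same clock. */
    int p = a + b;
    int q = p + c;

    printf("%d %d %d %d\n", x, y, p, q);
    return 0;
}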

This feature is called "superscalar". The SPARC CPU is even better at this
kind of parallelism.

The weak point of superscalar is that the decision on parallelizing is made
at runtime by the CPU hardware, which cannot keep a large context.

A Very Long Instruction Word (VLIW) CPU like IA-64 shifts this burden to the
compiler. The compiler (which can keep a huge context) decides how to
parallelize the operations across the CPU's execution units.

The downsides are the huge complexity of the compiler and of the assembly
language (it is nearly impossible to write assembly by hand - too much
context to keep in your head).

Yet another approach to fast CPUs: throw away any complexity, use the saved
silicon space for cache, and raise the frequency as high as possible. The
Pentium 4 and the Alpha went this way (the Alpha even stripped complexity
from its assembly language - it only has 64-bit arithmetic; if you want byte
arithmetic, write a subroutine).

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation

Re: Simple Hardware Clock question

On Wed, 23 Feb 2005 05:34:41 +0300, Maxim S. Shatskih

In a former job, I was working with (among other things) TI-'c6x DSPs,
with an open pipeline VLIW architecture.

A TON of effort went into hand optimizing inner loops that exactly fit
inside the on-chip memory.  TI provided an intermediate form of assembly
language that helped quite a bit--you could define your independent
instruction sequences, which the optimizer could USUALLY arrange in
parallel on its own.

But we had abundant war stories about squeezing the last smidgeon of
performance out of ten instructions.

Re: Simple Hardware Clock question


There are many clocks in a PC.  For example, the CPU's clock might be
running seventeen times faster than the motherboard's main bus.

It is quite normal for the CPU to do many things between motherboard
clock ticks.  Hence the need for the main cache.  
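As a rough example of the numbers involved (the bus speed and multiplier
below are made-up figures): the CPU clock is derived from the bus clock by a
multiplier, so many CPU cycles pass between two bus ticks, and the cache is
what keeps the CPU fed during that time:

#include <stdio.h>

int main(void)
{
    double bus_mhz    = 200.0;   /* motherboard/front-side bus clock */
    double multiplier = 17.0;    /* CPU clock multiplier */

    printf("CPU clock: %.0f MHz -> %.0f CPU cycles per bus tick\n",
           bus_mhz * multiplier, multiplier);
    return 0;
}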

Peter D.
Sig goes here...
