Archived from groups: rec.audio.pro
> Join mailto:CDAW-subscribe@yahoogroups.com and search the recent archives
> for posts by Dave Haynie.
Since I made the original post, I'll provide Haynie's response:
> What's the consensus on Wintel based dual CPU machines, performance
> wise? Apple seems to be going this way.
Intel-based SMP systems offer the smallest per-processor performance
gain (at least with similar cache sizes), as they're all based on
conventional shared-bus technology -- every CPU is always potentially in
contention for the main memory pool. Apple's "G4" machines are of a
similar design, though they have been made with L3 caches (as, I suppose,
are some flavors of Intel Xeon high-end P4s), which helps to an extent.
Apple's G5 (IBM's PowerPC 970) is roughly comparable to AMD's Athlon in
terms of architecture. In each case, the CPU presents an independent,
point-to-point high speed link to the "memory hub" chip, which
internally routes access to I/O and memory. This moves the burden of SMP
performance away from the CPU and into the system, where, at least
potentially, performance can scale beyond a uniprocessor. In the AMDs,
for example, access to the main memory is buffered on writes, one CPU's
access of I/O doesn't block the other's access of main memory, etc.
AMD took this a step further in the Opteron systems, in which there are
still high-speed, point to point links... only several of them. They've
also moved the memory controller onto the CPU, which basically gives
each CPU its own pool of memory and thus much less contention for
memory. Obviously, this results in a NUMA system (non-uniform memory
access), not a fully symmetric system, but the basic effect is that a
CPU's local memory is accessed dramatically faster than in
Apple/K7/Intel systems, while access between CPUs is roughly as fast as
in any conventional machine. With a NUMA-optimized OS, this is the
fastest kind of multiprocessor machine going for under $100K or so
(where they move to ultra-fast main shared buses behind huge caches,
memory-tagged coherency rather than snooping, and other really expensive
ideas).
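To make the NUMA point concrete (my addition, not Haynie's): on Linux you
can ask for memory from a specific node's local pool via libnuma. A rough
sketch, with the node number and buffer size as arbitrary placeholders:

/* NUMA-local allocation sketch using Linux libnuma (link with -lnuma).
 * Illustrative only -- node 0 and the 64 MB size are arbitrary. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "No NUMA support on this machine\n");
        return 1;
    }

    size_t bytes = 64 * 1024 * 1024;   /* 64 MB working buffer */
    int    node  = 0;                  /* allocate on node 0 */

    /* Memory placed here lives in node 0's local pool, so a CPU on that
     * node avoids the cross-link hop described above. */
    void *buf = numa_alloc_onnode(bytes, node);
    if (!buf) {
        perror("numa_alloc_onnode");
        return 1;
    }
    memset(buf, 0, bytes);             /* touch the pages to place them */

    printf("allocated %zu bytes on node %d (of %d nodes)\n",
           bytes, node, numa_max_node() + 1);

    numa_free(buf, bytes);
    return 0;
}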
> Hyper-threading doesn't seem to cut it completely.
It's a good use of a relatively small number of transistors. What it
winds up doing is finding useful work for otherwise lost time in the
single-threaded CPU's work -- filling in those pipeline stalls by
shifting to the other thread. The obvious downside is that, since you're
just adding a second set of registers, you effectively devote half the
L1/L2 cache to each thread's work (at least in the worst case). This is the
main reason HT-enabled is sometimes slower than HT-disabled.
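As a practical aside (again mine, not Haynie's): on a Linux box you can
see which logical CPUs are HT siblings sharing one physical core -- and
therefore sharing its caches -- by reading the sysfs topology files, as in
this small sketch (assumes a 2.6-era or newer kernel):

/* Print the HT sibling list for each logical CPU (Linux sysfs). */
#include <stdio.h>

int main(void)
{
    char path[128], line[128];

    for (int cpu = 0; ; cpu++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list",
                 cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                 /* no more logical CPUs */
        if (fgets(line, sizeof line, f))
            printf("cpu%d siblings: %s", cpu, line);
        fclose(f);
    }
    return 0;
}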
As well, this is threading only; it doesn't accelerate the execution of
unrelated programs, as SMP does, but only threads within the same memory
context. Well written code is multithreaded in general, but it hasn't
been the rule that Windows programmers (or Linux programmers, for that
matter) know how to properly thread code. They have been getting better
at it, at least (in the latter case, it helped to actually have threads
added to the Linux OS).
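For what it's worth, "properly threaded" here just means splitting the
work inside one process so that both logical (or physical) CPUs have
something to chew on. A toy POSIX-threads sketch (mine, not from the
post) that splits a gain change across two halves of an audio buffer:

/* Toy pthreads example: apply a gain to an audio buffer with two worker
 * threads sharing one memory context (compile with -pthread).
 * Buffer size, gain and thread count are arbitrary placeholders. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NFRAMES  (1 << 20)
#define NTHREADS 2

static float buf[NFRAMES];

struct slice { size_t start, end; float gain; };

static void *apply_gain(void *arg)
{
    struct slice *s = arg;
    for (size_t i = s->start; i < s->end; i++)
        buf[i] *= s->gain;             /* each thread works its own half */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    struct slice work[NTHREADS];

    for (size_t i = 0; i < NFRAMES; i++)
        buf[i] = (float)i / NFRAMES;   /* dummy signal */

    for (int t = 0; t < NTHREADS; t++) {
        work[t].start = t * (NFRAMES / NTHREADS);
        work[t].end   = (t + 1) * (NFRAMES / NTHREADS);
        work[t].gain  = 0.5f;
        pthread_create(&tid[t], NULL, apply_gain, &work[t]);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);

    printf("done: buf[NFRAMES-1] = %f\n", buf[NFRAMES - 1]);
    return 0;
}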
Mike
http://www.MusicIsLove.com