can IBM actually deliver a teraflop part? My answer is yes. Based
on 0.10 micron technology, EE3 will be able to clock at 3 GHz. This means EE3 has to
deliver about 333 flops per cycle to reach a teraflop. Presuming that the fundamental data size of
a 128-bit vector does not change (there is no reason it should), each VU with 4 FMACs and 2
dividers will deliver 10 flops per cycle (8 from the FMACs, 2 from the dividers), and you will
need 32 VUs plus the CPU FPU to reach a teraflop. This is technically feasible, since you have
100~200 million transistors to play with at the 0.10 micron level.
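For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope sketch in Python. It assumes each FMAC counts as 2 flops per cycle (one multiply plus one add) and each divider as 1, which is the only counting that gives the 10 flops per VU quoted above; the constant and function names are mine, not anything from IBM or Sony.

```python
# Back-of-the-envelope check of the teraflop figures in this post.
# Assumption: an FMAC counts as 2 flops per cycle (multiply + add),
# a divider counts as 1 flop per cycle.

CLOCK_HZ = 3e9          # speculated EE3 clock, 3 GHz
TARGET_FLOPS = 1e12     # one teraflop
FMACS_PER_VU = 4
DIVIDERS_PER_VU = 2
NUM_VUS = 32


def flops_per_cycle_needed(target_flops: float, clock_hz: float) -> float:
    """Flops the whole chip must retire every cycle to hit the target."""
    return target_flops / clock_hz


def vu_flops_per_cycle(fmacs: int, dividers: int) -> int:
    """Peak flops per cycle for one VU under the counting assumption above."""
    return 2 * fmacs + dividers


needed = flops_per_cycle_needed(TARGET_FLOPS, CLOCK_HZ)
per_vu = vu_flops_per_cycle(FMACS_PER_VU, DIVIDERS_PER_VU)
bank_total = NUM_VUS * per_vu
# Whatever the VU bank cannot cover has to come from the CPU FPU.
shortfall = needed - bank_total

print(f"needed per cycle : {needed:.1f}")
print(f"per VU           : {per_vu}")
print(f"32-VU bank total : {bank_total}")
print(f"left for CPU FPU : {shortfall:.1f}")
```

So the 32 VUs alone give 320 flops per cycle at peak, and the CPU FPU has to supply the remaining 13 or so.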
But how would programmers be able to manage 32 VUs, when they were unable to
cope with only the 2 VUs of EE1? Recall that the EE1 programming headache arose because of
direct VU visibility, and the problem goes away if the multiple units are properly
shadowed, much like how programmers only see one pixel shader even though four are
actually present in GF3. Likewise, IBM could arrange the 32 VUs in a bank and keep only
one input to make the 32 VUs appear as one. Under this programming mode, a programmer
would simply dump his data packet into the VU bank's input port; the VU manager would read the
packet from the input port and send it to an idle VU along with the tagged script code. The
results are then sent to the destination indicated by the script, be it the CPU, memory, or
the rasterizer.
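To make the idea concrete, here is a toy software model of such a VU bank in Python. This is only a sketch of the scheme as I described it, not how any real hardware works: the names `Packet`, `VU`, `VUBank`, `submit` and `step` are all made up for illustration, the "tagged script code" is stood in for by a plain Python function, and everything runs synchronously.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Packet:
    """Hypothetical data packet: payload, tagged script, and destination."""
    data: List[float]
    script: Callable[[List[float]], List[float]]  # stand-in for the script code
    destination: str  # "cpu", "memory", or "rasterizer"


@dataclass
class VU:
    """One vector unit; the programmer never addresses it directly."""
    vu_id: int
    busy: bool = False

    def run(self, packet: Packet) -> List[float]:
        return packet.script(packet.data)


class VUBank:
    """Many VUs hidden behind a single input port."""

    def __init__(self, num_vus: int = 32) -> None:
        self.vus = [VU(i) for i in range(num_vus)]
        self.input_port: Deque[Packet] = deque()
        self.outputs: Dict[str, List[List[float]]] = {
            "cpu": [], "memory": [], "rasterizer": []
        }

    def submit(self, packet: Packet) -> None:
        """The only thing the programmer sees: dump a packet into the port."""
        self.input_port.append(packet)

    def step(self) -> int:
        """One manager pass: give each idle VU at most one queued packet,
        then route every result to the destination named in its packet.
        Returns the number of packets processed in this pass."""
        assigned = []
        for vu in self.vus:
            if not self.input_port:
                break
            if vu.busy:
                continue
            vu.busy = True
            assigned.append((vu, self.input_port.popleft()))
        for vu, packet in assigned:
            self.outputs[packet.destination].append(vu.run(packet))
            vu.busy = False
        return len(assigned)


# Usage: a small 4-VU bank fed 6 packets needs two manager passes.
bank = VUBank(num_vus=4)
for i in range(6):
    bank.submit(Packet([float(i)] * 4, lambda v: [2.0 * x for x in v], "rasterizer"))
first = bank.step()
second = bank.step()
```

The point of the design is that the caller only ever touches `submit`; which VU picks up which packet is the manager's business.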
We will have to wait until the EE3 unveiling to find out the exact details, but I have confidence in IBM and Sony.
Speaking of EE3, it is quite evident that GS3 will not have T&L, since so much
computational power is focused on EE3. From Sony's ISSCC2001 presentation, the GS3
will have 32~64 MB of eDRAM and clock at 714 MHz. Sony's continued focus on
eDRAM makes GS3 T&L unlikely due to low clock speed and die-size limitations. ***3 will have a very good CPU but the GPU could stay underwhelming.----DM
I think this is too complicated for most of you, so this will be my last post in this thread.