Note that nVIDIA are now listing 'AA Samples' rather than fillrate in pixels/texels per second, so I've supplied what I assume the relevant pixel/texel fillrates would be (for reference, the specs of the highest GeForce3 are included).
The NV2A is more like a GeForce4 (NV25) than a GF3, hence the name NV2A.
It has most of the traits found in the final NV25, and it was finished 2-3 months before the NV25. Both the NV2A and the NV25 have:
1. nfiniteFX II engine (dual fully programmable vertex shaders). The GeForce3 has the original nfiniteFX engine.
2. Improved Z-Occlusion Culling: Z-occlusion culling was already present in the GeForce3, but in the NV25 it has been improved to cull more pixels while using less memory bandwidth. Most of the culling is now done in an on-chip cache to avoid off-chip memory accesses. The NV2A is able to cull 3.7 gigapixels per second
(I think that figure is only theoretical, which is why you won't see 3.7 Gpixel/sec in a game, but that doesn't mean you won't see more than 0.932 Gpixel/sec. Occlusion-detection circuitry can increase fill rate by up to 4X; the effect varies depending on whether pixels are occluded when they're drawn, but it tends to be greatest exactly when it's needed most, when there's a lot of overdraw. So you'll see it to different degrees).
This spec puts it right up there with the chips in the NV25 family.
3. Vertex Cache, Dual Texture Cache, and Pixel Cache
4. Accuview
5. 4xS FSAA (confirmed by Xbox developers)
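To put rough numbers on the occlusion culling point in item 2, here is a minimal sketch. It assumes a very simple model that is mine, not nVidia's: visible pixels are drawn at the 0.932 Gpixel/sec base rate, and occluded pixels are rejected at up to 4X that rate. The function name and the model itself are hypothetical; only the 0.932 and 4X figures come from the post.

```python
# Figures quoted in the post; everything else here is an illustrative assumption.
BASE_FILL_GPIX = 0.932    # Gpixel/sec without any culling benefit
MAX_CULL_SPEEDUP = 4.0    # "up to 4X" for pixels that get culled


def effective_fill_rate(occluded_fraction, base=BASE_FILL_GPIX,
                        speedup=MAX_CULL_SPEEDUP):
    """Estimate the pixels/sec processed when part of the scene is occluded.

    Assumed model: a visible pixel costs 1/base seconds, an occluded
    pixel is rejected in 1/(base * speedup) seconds.
    """
    if not 0.0 <= occluded_fraction <= 1.0:
        raise ValueError("occluded_fraction must be between 0 and 1")
    time_per_pixel = ((1.0 - occluded_fraction) / base
                      + occluded_fraction / (base * speedup))
    return 1.0 / time_per_pixel


if __name__ == "__main__":
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
        rate = effective_fill_rate(fraction)
        print(f"{fraction:.0%} occluded: {rate:.3f} Gpixel/sec")
```

Under this model the 3.7 Gpixel/sec figure only shows up when every pixel is occluded, which fits the idea that it is a theoretical peak and that real scenes land somewhere between the two numbers.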
Note that the GeForce3 doesn't have the Accuview and 4xS FSAA features. Now the proof:
GeForce4's features (according to Tom's Hardware):
Vertex Cache: stores vertices after they have been sent across the AGP. It's used to make the AGP more efficient by avoiding multiple transmissions of the same vertices (e.g. primitives that share edges).
Primitive Cache: assembles vertices after processing (after vertex shader) into fundamental primitives to pass onto triangle setup.
Dual Texture Caches: these were already found in the GeForce3. The new cache algorithms have been improved to "look ahead" more efficiently in cases of multi-texturing or higher quality filtering. This contributes to the significantly improved 3 and 4 texture performance of the GeForce4 Ti.
Pixel Cache: This cache at the end of the rendering pipeline is a coalescing cache, very similar to the "write combining" feature of Intel and AMD processors. It waits until a certain number of pixels have been drawn before writing them to memory in burst mode.
Improved Z-occlusion culling: This feature was also found in the GeForce3 already, but for the NV25 it has been tuned to cull more pixels while using less memory bandwidth. The culling is now done in an on-chip culling surface cache to avoid off-chip memory accesses.
Accuview: makes AA look better and run faster
4xS FSAA: this mode is supposed to look a lot better than 4x AA mode, due to a 50% increase in subpixel coverage.
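As a rough illustration of the vertex cache idea described above (not sending the same vertex across the AGP twice when primitives share edges), here is a small sketch. It assumes a plain FIFO cache and an indexed triangle list; the cache size, the replacement policy and the function name are all hypothetical, since nothing published gives those details.

```python
from collections import deque


def vertices_transferred(indices, cache_size):
    """Count how many vertices must cross the bus for an indexed triangle list.

    Assumed model: a FIFO cache holding `cache_size` vertex indices.
    A vertex already in the cache is reused and costs no transfer.
    """
    cache = deque(maxlen=cache_size)  # maxlen=0 behaves like "no cache"
    transfers = 0
    for index in indices:
        if index in cache:
            continue              # cache hit: nothing sent across the bus
        transfers += 1            # cache miss: vertex has to be sent
        cache.append(index)       # oldest entry drops out when full
    return transfers


if __name__ == "__main__":
    # Two triangles sharing the edge between vertices 1 and 2.
    quad = [0, 1, 2, 2, 1, 3]
    print("no cache:   ", vertices_transferred(quad, 0))
    print("4-entry FIFO:", vertices_transferred(quad, 4))
```

With the two triangles above, the shared edge means two of the six vertex references are served from the cache instead of being sent again.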
NV2A's features (according to ExtremeTech):
Vertex Cache/Dual Texture Caches/Pixel Cache: Other plumbing of interest is the XGPU's texture caching scheme, which is configured in a kind of L1/L2 layout. According to Microsoft's Seamus Blackley, textures are decompressed between the two caches. nVidia was willing to state that the XGPU has vertex, texture and pixel caches, though declined to detail their respective sizes.
Improved Z-occlusion culling: There are however both L1 and L2 texture caches on the XGPU, and while nVidia was unwilling to disclose their sizes, developers will likely look to tune their engines to get as many cache hits as possible to minimize memory touches. According to Microsoft, many of the Z-occlusion tests occur on-chip and don't even touch the caches, never mind system memory.
Accuview: nVidia provided some additional detail regarding how they architected the XGPU to be as miserly in its memory usage as possible. For instance, with its 4X multisampling antialiasing enabled, the XGPU gains what can be thought of as a four-fold increase in performance. That's not a typo, here's how it works: the XGPU has dedicated multisampling hardware that can generate up to four sub-samples per pixel per clock cycle, meaning the "rest" of the XGPU doesn't have to spend pixel processing power to generate these samples. In addition, nVidia states that this multisampling hardware doesn't have to fetch textures from memory to generate its sub-samples, which is a huge relief for beleaguered memory bandwidth that traditionally gets hammered when super-sampling FSAA is enabled. nVidia likely does this by generating the sub-samples when a texture has already been fetched for some other operation, so as not to duplicate the accesses.
4xS FSAA: There is a 4-sample 9X multi-sampling AA mode on the Xbox that looks even more amazing than 4x FSAA (leaked info from developers)!
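To show why the multisampling point in the Accuview paragraph matters for memory bandwidth, here is a back-of-the-envelope sketch. It assumes one texture fetch per texture layer per shaded sample and ignores caches, filtering and overdraw; the 640x480 resolution, the two texture layers and the function name are hypothetical, picked only for illustration.

```python
def texture_fetches(width, height, samples, textures_per_pixel, multisampled):
    """Estimate texture fetches for one frame under a simplified model.

    Supersampling shades (and textures) every sub-sample.
    Multisampling textures once per pixel and reuses the result
    for all of that pixel's sub-samples.
    """
    pixels = width * height
    shaded_samples = pixels if multisampled else pixels * samples
    return shaded_samples * textures_per_pixel


if __name__ == "__main__":
    # Hypothetical frame: 640x480, 4 sub-samples, 2 texture layers.
    supersampled = texture_fetches(640, 480, 4, 2, multisampled=False)
    multisampled = texture_fetches(640, 480, 4, 2, multisampled=True)
    print("4x supersampling fetches:", supersampled)
    print("4x multisampling fetches:", multisampled)
    print("ratio:", supersampled / multisampled)
```

In this model the texture traffic for 4x multisampling stays the same as with no AA at all, while 4x supersampling multiplies it by four, which is the "huge relief" for memory bandwidth that the ExtremeTech quote describes.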
Please, seven, the Xbox is not a GF3...