Nvidia considers TU116 the boundary where shading horsepower drops low enough to preclude Turing’s future-looking capabilities from serving much purpose. After stripping away the RT and Tensor cores, we’re left with a 284mm² chip composed of 6.6 billion transistors manufactured using TSMC’s 12nm FinFET process. But despite its smaller transistors, TU116 is still 42 percent larger than the GP106 processor that preceded it.

Some of the growth is attributable to Turing’s more sophisticated shaders. Like the higher-end GeForce RTX 20-series cards, GeForce GTX 1660 Ti supports simultaneous execution of FP32 arithmetic instructions, which constitute most shader workloads, and INT32 operations (for addressing/fetching data, floating-point min/max, compare, etc.). When you hear about Turing cores achieving better performance than Pascal at a given clock rate, this capability largely explains why.

Turing’s Streaming Multiprocessors are composed of fewer CUDA cores than Pascal’s, but the design compensates in part by spreading more SMs across each GPU. The newer architecture assigns one scheduler to each set of 16 CUDA cores (2x Pascal), along with one dispatch unit per 16 CUDA cores (same as Pascal). Four of those 16-core groupings comprise the SM, along with 96KB of cache that can be configured as 64KB L1/32KB shared memory or vice versa, and four texture units. Because Turing doubles up on schedulers, it only needs to issue an instruction to the CUDA cores every other clock cycle to keep them full. In between, it's free to issue a different instruction to any other unit, including the INT32 cores.

In addition to the Turing architecture’s shaders and unified cache, TU116 also supports a pair of algorithms called Content Adaptive Shading and Motion Adaptive Shading, together referred to as Variable Rate Shading. We covered this technology in Nvidia’s Turing Architecture Explored: Inside the GeForce RTX 2080. That story also introduced Turing's accelerated video encode and decode capabilities, which carry over to GeForce GTX 1660 Ti as well.

Nvidia packs 24 SMs into TU116, splitting them between three Graphics Processing Clusters. With 64 FP32 cores per SM, that’s 1,536 CUDA cores and 96 texture units across the entire GPU.

Board partners will undoubtedly target a range of frequencies to fill the gap between GTX 1660 Ti and RTX 2060. However, the official base clock rate is 1,500 MHz with a GPU Boost specification of 1,770 MHz. Our EVGA GeForce GTX 1660 Ti XC Black Gaming sample topped out around 1,845 MHz through three runs of Metro: Last Light, while other cards we’ve seen readily exceed 2,000 MHz.
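For readers who want to check the arithmetic behind the TU116 figures quoted in this piece, here is a minimal Python sketch. It uses only numbers stated in the text. The function name `derive_tu116`, the constant names, and the assumption that the 24 SMs divide evenly across the three GPCs are illustrative choices of ours, not anything published by Nvidia.

```python
# Figures quoted in the article for TU116 (GeForce GTX 1660 Ti).
SM_COUNT = 24                  # SMs in TU116
GPC_COUNT = 3                  # Graphics Processing Clusters
CORES_PER_SCHEDULER = 16       # CUDA cores served by one scheduler
SCHEDULER_GROUPS_PER_SM = 4    # 16-core groupings per SM
TEXTURE_UNITS_PER_SM = 4
DIE_AREA_MM2 = 284             # TU116 die size
GROWTH_OVER_GP106 = 0.42       # "42 percent larger than GP106"
BOOST_MHZ = 1770               # official GPU Boost specification
OBSERVED_MHZ = 1845            # peak seen on the EVGA sample


def derive_tu116():
    """Return the totals that follow from the per-SM figures (hypothetical helper)."""
    fp32_per_sm = CORES_PER_SCHEDULER * SCHEDULER_GROUPS_PER_SM
    return {
        "fp32_per_sm": fp32_per_sm,
        "cuda_cores": SM_COUNT * fp32_per_sm,
        "texture_units": SM_COUNT * TEXTURE_UNITS_PER_SM,
        # Assumption: the SMs are split evenly between the GPCs.
        "sms_per_gpc": SM_COUNT // GPC_COUNT,
        # GP106 area implied by the 42 percent figure, not a quoted spec.
        "implied_gp106_area_mm2": round(DIE_AREA_MM2 / (1 + GROWTH_OVER_GP106)),
        # How far the observed clock sits above the rated boost, in percent.
        "observed_over_boost_pct": round(100 * (OBSERVED_MHZ / BOOST_MHZ - 1), 1),
    }


if __name__ == "__main__":
    for name, value in derive_tu116().items():
        print(f"{name}: {value}")
```

The per-SM count of 64 FP32 cores falls out of four 16-core groupings, and multiplying by 24 SMs reproduces the 1,536 CUDA cores and 96 texture units cited for the whole GPU.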