The 32-bit Tegra K1 mobile processor has been widely praised for its performance and for bringing true console-quality graphics to mobile devices. On GPU performance benchmarks, it “handily beats every other ARM SoC,” as reported by Anandtech. And as PC Perspective puts it, “GPU performance is what stands out with the Tegra K1, nothing else on the market today is really able to get even close.”
Eight months after unveiling the 32-bit Tegra K1, Nvidia is sharing further architectural details of the chip’s 64-bit version. Most of these details will be presented at Hot Chips, a technical conference on high-performance chips. Nvidia has already announced that the 64-bit Tegra K1 pairs the 192-core Kepler-based GPU with a custom-designed, 64-bit, dual-core “Project Denver” CPU. Denver is compatible with the ARMv8 architecture, and the 64-bit Tegra K1 is pin-compatible with the 32-bit Tegra K1, which eases implementation and shortens time to market.
The 64-bit Tegra K1 delivers outstanding performance and is the world’s first 64-bit ARM processor for Android, outpacing other ARM-based mobile processors. Denver is designed for the highest single-core CPU throughput and delivers industry-leading dual-core performance. Each Denver core implements a 7-way superscalar microarchitecture with a 128KB 4-way L1 instruction cache and a 64KB 4-way L1 data cache, backed by a 2MB 16-way L2 cache. Denver also introduces an innovative process called Dynamic Code Optimization, which optimizes frequently used software routines at runtime into dense, highly tuned microcode-equivalent routines. These routines are stored in a dedicated 128MB optimization cache located in main memory.
Dynamic optimization carries some overhead, but the cost is repaid whenever a routine is reused, because the optimized code is already at hand and ready to execute. For code that is not reused often enough to benefit, Denver processes the ARM instructions directly, bypassing the dynamic optimization process altogether. The result is the best of both worlds.
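To make the trade-off concrete, here is a toy model of the idea in Python. It is a minimal sketch under stated assumptions, not Nvidia's implementation: the `("add", k)`/`("mul", k)` instruction format, the `HOT_THRESHOLD` value, the `optimize` folding pass, and the `DynamicCodeOptimizer` class are all hypothetical stand-ins. Cold routines run directly; once a routine has run often enough, it pays a one-time optimization cost, and later calls reuse the denser cached version.

```python
from typing import Dict, List, Tuple

# Hypothetical toy instruction: ("add", k) or ("mul", k), standing in for ARM code.
Op = Tuple[str, int]

# Hypothetical: number of executions before a routine counts as "hot".
HOT_THRESHOLD = 3


def execute(ops: List[Op], x: int) -> int:
    """Run a routine one op at a time."""
    for kind, k in ops:
        x = x + k if kind == "add" else x * k
    return x


def optimize(ops: List[Op]) -> List[Op]:
    """Toy optimization pass: fold runs of adds (or of muls) into one op."""
    dense: List[Op] = []
    for kind, k in ops:
        if dense and dense[-1][0] == kind:
            prev = dense[-1][1]
            dense[-1] = (kind, prev + k if kind == "add" else prev * k)
        else:
            dense.append((kind, k))
    return dense


class DynamicCodeOptimizer:
    """Hypothetical model: direct execution for cold code, cached optimized code for hot code."""

    def __init__(self, hot_threshold: int = HOT_THRESHOLD) -> None:
        self.hot_threshold = hot_threshold
        self.counts: Dict[str, int] = {}
        # Stand-in for the optimization cache (128MB of main memory on Denver).
        self.cache: Dict[str, List[Op]] = {}
        self.ops_executed = 0  # rough cost metric: ops actually run

    def run(self, name: str, ops: List[Op], x: int) -> int:
        if name not in self.cache:
            self.counts[name] = self.counts.get(name, 0) + 1
            if self.counts[name] >= self.hot_threshold:
                # One-time overhead: optimize the hot routine and keep it for reuse.
                self.cache[name] = optimize(ops)
        # Cold routines fall back to the original, unoptimized ops.
        code = self.cache.get(name, ops)
        self.ops_executed += len(code)
        return execute(code, x)


if __name__ == "__main__":
    routine = [("add", 1), ("add", 2), ("add", 3), ("mul", 2), ("mul", 5)]
    dco = DynamicCodeOptimizer()
    results = [dco.run("hot_loop", routine, 10) for _ in range(5)]
    print(results)           # [160, 160, 160, 160, 160]
    print(dco.ops_executed)  # 16: two direct runs of 5 ops, then three runs of 2 ops
```

The first two calls run all five ops directly; from the third call on, the folded two-op version is used, so the results are identical while fewer ops execute. A routine called only once never reaches the threshold and never pays the optimization cost, which mirrors the direct-execution path described above.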