Is Apple’s A7 chip twice as fast at processing and graphics, as Apple promised when announcing the new iPhone 5S? At least for Ben Weiss, the answer is yes — but his app is unusually well suited to take advantage of the new processor’s features.
The app in question is Iter9’s new Frax, which generates lavishly detailed fractal imagery by running mathematical calculations on both the CPU and GPU. It gets a major speed boost on the iPhone 5S, Weiss said, evident in faster rendering times for the swirling psychedelic images laboriously calculated one pixel at a time.
Frax is currently a 32-bit app, Weiss said, and it runs 50 percent faster on the A7-powered iPhone 5S compared with the A6-powered iPhone 5. And then came the second speed boost: When he compiled the first 64-bit version, Frax ran 25 percent faster than the 32-bit version on the iPhone 5S. Together, that means the 64-bit app runs nearly 90 percent faster on an iPhone 5S than the 32-bit version on the iPhone 5.
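The "nearly 90 percent" figure follows from multiplying the two gains rather than adding them. Here is a minimal sketch of that arithmetic, assuming "faster" refers to rendering speed so that successive gains compound; the function name is hypothetical and used only for illustration.

```python
def combined_speedup(*gains_percent):
    """Combine successive percentage speed gains by multiplying their factors."""
    factor = 1.0
    for gain in gains_percent:
        factor *= 1.0 + gain / 100.0  # a 50 percent gain is a factor of 1.5
    return (factor - 1.0) * 100.0     # convert back to a percentage gain

# 50 percent from the A7 hardware, then 25 percent from the 64-bit build
total = combined_speedup(50, 25)
print(total)  # 87.5
```

Simply adding the two numbers would give 75 percent, which understates the combined effect.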
“We’ll have this ready in a dot release soon,” so iPhone 5S customers can take advantage of the 64-bit version, Weiss told CNET. And he expects further performance gains from software optimization.
Other developers probably shouldn’t expect the same speedup, though. Calculating fractal imagery, while taxing for a processor, also happens to be the sort of work that it can do very predictably and efficiently without a lot of the fits and starts common in mainstream software, said David Kanter, principal analyst at Real World Tech.
“Fractalization is the nicest kind of workload,” Kanter said. “Whatever speedup they’re seeing is probably the best case.”
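To see why this kind of work is so predictable, consider a classic escape-time fractal. The article doesn't describe Frax's actual algorithms, so the sketch below uses the well-known Mandelbrot iteration as a stand-in; the function names, image bounds, and iteration limit are all assumptions chosen for illustration.

```python
def escape_time(cx, cy, max_iter=100):
    """Count iterations of z = z*z + c before the point leaves the radius-2 circle."""
    x = y = 0.0
    for i in range(max_iter):
        if x * x + y * y > 4.0:
            return i
        # Real and imaginary parts of z*z + c, computed from the old x and y
        x, y = x * x - y * y + cx, 2.0 * x * y + cy
    return max_iter

def render(width, height, max_iter=100):
    """Compute every pixel independently over an assumed region of the plane."""
    rows = []
    for py in range(height):
        cy = -1.5 + 3.0 * py / (height - 1)
        row = []
        for px in range(width):
            cx = -2.0 + 3.0 * px / (width - 1)
            row.append(escape_time(cx, cy, max_iter))
        rows.append(row)
    return rows

image = render(16, 16)
```

Each pixel runs the same tight loop of arithmetic on a handful of numbers and never depends on its neighbors, which is the regular, self-contained pattern Kanter describes.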
When it announced the A7’s better performance, Apple didn’t say which speed tests it used or how broadly the results apply. But one thing is sure: Benchmarks are always an imperfect measure of real-world performance, even when there are no shenanigans involved.
Speed tests let people compare software performance on different hardware, operating systems, and configurations. But it’s hard to find benchmarks that directly predict how well the tremendous variety of real-world apps will do. Different software stresses different aspects of computing performance — memory access, graphics operations, running a single sequence of steps, running multiple sequences that can be spread across multiple processor cores, and so on.
64-bit boost? Yep
There are some points of clarity, though. One in Frax’s case is that the 64-bit nature of the A7, temporarily derided as a gimmick by an executive at rival chipmaker Qualcomm, does actually help performance today. A prime reason for 64-bit chips is easy support for more than 4GB of memory, and iOS devices aren’t yet at that limit, but the 64-bit design brings plenty of other improvements that do help performance.
The Frax boost instead comes chiefly from two changes, Weiss said: the A7’s larger number of storage slots, called registers, and the fact that it can perform high-precision calculations faster on numbers stored in floating-point formats. In his words:
“There are two main reasons the 64-bit version tends to be much faster. First is that the number of hardware registers on the CPU is doubled, from 16 to 32. Frax has some fairly complicated inner loops that keep track of more than 16 numbers at a time, which means that some values are constantly shuffled back and forth between registers and memory to make room. But with 32 registers, there’s plenty of space for all the numbers we need, so the code runs much more efficiently.
“The second reason is that the 64-bit chip can perform two double-precision operations in parallel, whereas previous chips could only perform one at a time. This needs to be specially coded for, but in theory it can result in a doubling of speed…
“The instructions to do two double-precision operations at once with SIMD [single instruction, multiple data] were added as part of the overall 64-bit design changes, though it’s not related to the 64-bitness of the chip per se.”
He also said the ARMv8 chip architecture that the A7 uses gives Frax a boost because it can perform a combination of multiplication and addition in one step instead of two.
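What a combined multiply-and-add computes can be sketched in software, though a script can't reproduce the hardware's timing advantage. The helper below is a hypothetical emulation that uses exact fractions to mimic a fused operation, which rounds once at the end; it is not how Frax or the A7 implements the step, and the input values are chosen only to make the difference visible.

```python
from fractions import Fraction

def fused_multiply_add(a, b, c):
    """Hypothetical emulation of a fused multiply-add: a*b + c with one rounding."""
    # Fractions hold the exact product and sum; float() rounds once at the end.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def separate_multiply_add(a, b, c):
    """Two steps: the product is rounded to a float before the addition."""
    product = a * b
    return product + c

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fused_multiply_add(a, b, c)        # keeps the tiny exact result, -(2**-60)
separate = separate_multiply_add(a, b, c)  # product rounds to 1.0, so the sum is 0.0
```

Fractal inner loops such as z*z + c are full of multiply-then-add pairs, so collapsing each pair into a single instruction removes a step from the hottest part of the code.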
Real World Tech’s Kanter points to other A7 advantages that will help a wider variety of software: faster retrieval of data from a device’s main memory and from its high-speed cache memory. Specifically, the chip can retrieve data sooner from its level-2 cache (where a processor looks after its smaller level-1 cache and before it goes to main memory), which means the chip wastes less time idling.
“There were some really big improvements in the A7 totally unrelated to the processor core,” Kanter said. “In particular their caches got way, way, way faster. The memory bandwidth is about two times faster, and the L2 cache is about half the latency it was before.”
Revamping software for 64-bit chips
Creating the 64-bit version was “surprisingly easy,” Weiss said, though his software already was largely written to be independent of 32-bit vs. 64-bit issues because of the PC industry’s transition.
“Our code base is about 100,000 lines; it took only about an hour to fix the compilation issues, after which it ran the first time flawlessly,” he said. “Having gone through this transition on desktop machines several years back, my code was written in anticipation of this.”
As processor designers ran into overheating problems that capped clock speeds a decade ago, chipmakers started pushing toward multicore processors that can perform multiple sequences of operations at the same time. The thinking is that if you can’t run the clock speed faster, you do more work by dividing it into parallel tasks.
Unfortunately, though, a lot of software is written to run in a single sequence of operations. Multicore chips can help with multitasking, saving files in the background, and some calculations that can be broken up into independent pieces. Graphics tasks are easily divided among multiple cores, which is why graphics chips rapidly pushed into the multicore realm.
Apple has steadfastly stuck with dual-core processors while some Android rivals have built quad-core and even eight-core devices. Frax is one of those apps, though, that could actually use all those extra cores. Fortunately for Iter9, it also can use the processing power of the graphics processing unit (GPU), Weiss said:
“Frax makes full use of both the CPU and GPU, and scales to absorb as much processing power as is available. We’ve seen nearly linear speedups with the number of cores on earlier chips, so we expect that Frax would run nearly twice as fast on a 4-core chip as on a 2-core chip. GPUs are inherently scalable like this, and the one on the iPhone 5s is a monster! The GPU on the iPhone 5s is about 20 times faster than the one on the iPhone 4.”
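Weiss’s expectation of near-linear scaling can be sanity-checked with Amdahl’s law, a standard rule of thumb that neither he nor Kanter invokes here. In the sketch below, the 95 percent parallel fraction is an assumed illustrative value, not a measurement of Frax.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only part of the work can be split across cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Assumed value: 95 percent of the work divides cleanly among cores.
parallel_fraction = 0.95

two_core = amdahl_speedup(parallel_fraction, 2)
four_core = amdahl_speedup(parallel_fraction, 4)
gain_from_doubling = four_core / two_core  # how much faster 4 cores are than 2
```

Under that assumption, going from two cores to four yields a bit more than 1.8 times the speed, close to the "nearly twice as fast" Weiss anticipates. Software that is mostly a single sequence of steps has a much smaller parallel fraction and gains far less.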
Even if there weren’t a performance boost or critical near-term need to move to a 64-bit architecture, there are reasons for Apple to make the move now, perhaps most notably the improved programming features that come along with the design.
Among other reasons, Apple might want to take care of the software switch before the hardware switch is urgent, Kanter said. There also are limits to using all 4GB of the memory capacity that’s in principle possible with a 32-bit design, something that could steer a company toward the 64-bit switch even if it’s building in only 2GB or 3GB of memory. Last, Apple might want to get the architectural shift done before it considers expanding its manufacturing suppliers from Samsung to include TSMC, too.
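The 4GB ceiling comes straight from the width of a memory address. A minimal sketch of that arithmetic, assuming one address per byte; the function name is hypothetical.

```python
GIB = 2**30  # bytes in one gigabyte, counted in binary units

def addressable_bytes(address_bits):
    """Number of distinct byte addresses a pointer of the given width can name."""
    return 2**address_bits

limit_32 = addressable_bytes(32) // GIB  # 4
limit_64 = addressable_bytes(64) // GIB  # 17179869184
```

Operating systems typically reserve part of that range for their own use, which is one reason a 32-bit design can feel cramped before a device actually carries 4GB of memory.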
So there are real reasons to make the 64-bit move, as Frax can attest. A marketing gimmick the A7 is not.