The First bit…
Jobs’ claim that Apple’s new G5 systems will be the fastest personal computers on the market was reiterated several times during Monday’s keynote.
Historically, Apple has validated such claims by using a very small number of benchmarks, and often just one: a now-ancient benchmark developed by BYTE Magazine, called BYTEmark, and one which even BYTE suggested could not, by itself, accurately describe the performance of the system. Apple, however, used BYTEmark to test its own G3 processors as this Google cache shows.
Apple seemed to break its habit on Monday morning, when Jobs trotted out benchmarks based upon what is considered to be an industry-standard benchmark, SPEC, administered by SPEC.org. Apple’s tests, a copy of which can be found in this PDF document, were performed under contract from Apple by Lionbridge Technologies’ VeriTest testing labs, an independent testing agency.
Tweaks apparently made to optimize Apple’s CPU
Although no benchmark is perfect, SPEC.org posts a list of submitted scores to its web site, and each list includes a detailed description of the system configuration. None of the Apple scores, however, has appeared there yet.
As is often the case, Lionbridge appears to have performed the tests to the letter. Those letters, however, are very telling.
In the system configuration for the G5 system, Apple appears to have asked Lionbridge to do quite a bit of tweaking. According to the section titled “Initial Power Mac G5 Configuration for all SPEC CPU2000 Testing,” the following steps were taken:
"Install the Computer Hardware Understanding Development kit (CHUD) version 3.0.0b19. This tool is designed to simplify performance studies of PowerPC Macintosh systems running Mac OS X by providing a set of tools for developers to analyze their applications. CHUD will be available for download after June 23, 2003 at http://developer.apple.com/tools/performance.
*Using the “Reggie” tool available from CHUD, modify CPU registers to enable memory Read By-pass. As Read requests are speculatively sent to the memory controller, this eliminates the need to wait for the snoop response required in a multiprocessor configuration thus reducing the time required for a read request.
*Used the command “hwprefetch -8” to enable the maximum of eight hardware pre-fetch streams and disable software-based pre-fetching.
*Installed a high performance, single threaded malloc library. This library implementation is geared for speed rather than memory efficiency and is single-threaded which makes it unsuitable for many uses. Special provisions are made for very small allocations (less than 4 bytes). This library is accessed through use of the -lstmalloc flag during program"
And it goes on…
For both the Dell Dimension 8300 and the Dell Precision 650, Apple/VeriTest had hyperthreading ENABLED for the single-processor benchmarks, but DISABLED for the multi-processor “Rate” benchmarks, despite the fact that hyperthreading would have improved the performance of the multi-processor “Rate” benchmarks, while having little or no effect on the single-processor benchmarks. In either case, this performance-enhancing feature of the Intel processors should not have been disabled.
Apple crippled the floating-point performance of the Pentium 4 by setting a compiler option incorrectly. Apple/VeriTest enabled the GCC “-mfpmath=sse” option. A number of people have e-mailed me to say that this option causes GCC to use SSE1 and SSE2 instructions for floating-point; however, this is an experimental feature and it actually DECREASES performance. They say that the regular x87 instructions are faster and should be used for floating-point, not SSE/SSE2.
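Readers who want to check the “-mfpmath” claim for themselves could build the same floating-point program both ways and time the results. What follows is only a rough sketch of such a comparison, not the method VeriTest used: the helper names (gcc_command, best_wall_time, compare) and the source file name fpbench.c are placeholders of mine, and it assumes GCC on an x86 machine. The only compiler options it relies on are -O2, -msse2, -mfpmath=387 and -mfpmath=sse.

```python
import subprocess
import time

# The two floating-point code generation modes under discussion.
FPMATH_MODES = ("387", "sse")


def gcc_command(source, output, fpmath):
    """Build a GCC command line for an x86 target with the given -mfpmath mode.

    Hypothetical helper: the option set here is an assumption for illustration,
    not the flags used in the VeriTest report.
    """
    if fpmath not in FPMATH_MODES:
        raise ValueError(f"unsupported -mfpmath mode: {fpmath!r}")
    cmd = ["gcc", "-O2"]
    if fpmath == "sse":
        # SSE floating-point code generation needs the SSE instruction sets enabled.
        cmd.append("-msse2")
    cmd += [f"-mfpmath={fpmath}", "-o", output, source, "-lm"]
    return cmd


def best_wall_time(binary, runs=5):
    """Run a binary several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([binary], check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(source):
    """Compile `source` once per mode, then time each resulting binary."""
    results = {}
    for mode in FPMATH_MODES:
        binary = f"./fpbench_{mode}"
        subprocess.run(gcc_command(source, binary, mode), check=True)
        results[mode] = best_wall_time(binary)
    return results


# Show the two command lines without actually invoking the compiler.
for mode in FPMATH_MODES:
    print(" ".join(gcc_command("fpbench.c", f"fpbench_{mode}", mode)))
```

Calling compare("fpbench.c") with a floating-point test program of your own would return the best time for each mode. A single small program proves little either way, so treat any result as a hint rather than a verdict.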