Geek Patrol



32-bit vs 64-bit Performance Under Mac OS X

One of the assertions I keep hearing is that, with the transition from 32-bit to 64-bit computing, applications will receive a performance increase “for free,” since processors will be working with 64 bits at a time instead of 32 (provided, of course, that applications are recompiled for 64-bit platforms).

Now that Geekbench is available for both 32- and 64-bit processors on Mac OS X, I thought I’d see if that assertion is correct. I’ve compared performance on the three 64-bit processors currently available for Mac OS X: the PowerPC G5, the Intel Xeon, and the Intel Core 2 Duo.

Setup

Here are the configurations of the test machines:

  • Mac Pro

    • Intel Xeon 5150 @ 2.66GHz
    • 2048MB RAM
    • Mac OS X 10.4.7 (Build 8K1124)
    • Geekbench 2006 (Build 208)
  • iMac (Late 2006)

    • Intel Core 2 Duo @ 2.0GHz
    • 1024MB RAM
    • Mac OS X 10.4.7 (Build 8K1106)
    • Geekbench 2006 (Build 208)
  • Power Mac G5

    • PowerPC G5 @ 2.0GHz (two processors)
    • 1024MB RAM
    • Mac OS X 10.4.7 (Build 8J135)
    • Geekbench 2006 (Build 208)

I’ve only included the scores for single-threaded tests (since I think they’re the most relevant when comparing 32- and 64-bit performance on the same machine). I’m using the baseline score (where a score of 100 is equivalent to the performance of a Power Mac G5 at 1.6GHz), where higher is better. I’ve also computed the 64-bit score for each machine as a percentage of the machine’s 32-bit score.
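As a rough illustration of how those percentages are derived, the sketch below divides a 64-bit score by the matching 32-bit score. The function name `relative_score` is made up for this example and is not part of Geekbench; the sample figures are the overall Mac Pro and Power Mac G5 scores reported in this article.

```python
def relative_score(score_32: float, score_64: float) -> float:
    """Return the 64-bit score as a percentage of the 32-bit score, to one decimal place."""
    return round(score_64 / score_32 * 100, 1)


# Overall scores reported in this article (32-bit score, 64-bit score).
mac_pro = relative_score(344.8, 365.2)
power_mac_g5 = relative_score(154.9, 140.1)

# A value above 100 means the 64-bit build scored higher; below 100 means it scored lower.
print(f"Mac Pro: {mac_pro}%")
print(f"Power Mac G5: {power_mac_g5}%")
```

Since higher baseline scores are better, anything over 100% is a win for the 64-bit build.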

Mac Pro Performance

Overall Score

Metric          Mac Pro (32-bit)   Mac Pro (64-bit)   64-bit / 32-bit
Overall Score   344.8              365.2              105.9%

Integer Performance

Benchmark                                  Mac Pro (32-bit)   Mac Pro (64-bit)   64-bit / 32-bit
Emulate 6502 (single-threaded scalar)      162.8              218.0              133.9%
Blowfish (single-threaded scalar)          232.8              205.2              88.1%
bzip2 Compress (single-threaded scalar)    223.5              277.4              124.1%
bzip2 Decompress (single-threaded scalar)  251.9              300.4              119.3%

Floating Point Performance

Benchmark                                  Mac Pro (32-bit)   Mac Pro (64-bit)   64-bit / 32-bit
Mandelbrot (single-threaded scalar)        179.9              180.0              100.1%
Dot Product (single-threaded scalar)       362.0              364.1              100.6%
Dot Product (single-threaded vector)       153.5              127.2              82.9%
JPEG Compress (single-threaded scalar)     161.0              195.7              121.6%
JPEG Decompress (single-threaded scalar)   154.9              199.9              129.1%

Memory Performance

Benchmark                                  Mac Pro (32-bit)   Mac Pro (64-bit)   64-bit / 32-bit
Read Sequential (single-threaded scalar)   354.3              356.7              100.7%
Write Sequential (single-threaded scalar)  631.3              423.1              67.0%
Stdlib Allocate (single-threaded scalar)   279.0              357.1              128.0%
Stdlib Write (single-threaded scalar)      124.3              116.1              93.4%
Stdlib Copy (single-threaded scalar)       234.9              252.1              107.3%

Stream Performance

Benchmark                                  Mac Pro (32-bit)   Mac Pro (64-bit)   64-bit / 32-bit
Stream Copy (single-threaded scalar)       199.4              204.2              102.4%
Stream Copy (single-threaded vector)       197.1              198.2              100.6%
Stream Scale (single-threaded scalar)      217.6              208.7              95.9%
Stream Scale (single-threaded vector)      196.4              195.6              99.6%
Stream Add (single-threaded scalar)        188.4              210.7              111.8%
Stream Add (single-threaded vector)        184.7              218.2              118.1%
Stream Triad (single-threaded scalar)      147.0              210.0              142.9%
Stream Triad (single-threaded vector)      183.7              173.9              94.7%

Mac Pro Summary

Overall performance in 64-bit mode is 5% higher than overall performance in 32-bit mode. However, a number of benchmarks were slower in 64-bit mode than in 32-bit mode (such as the Blowfish and Write Sequential benchmarks).

iMac Performance

Overall Score

Metric          iMac (32-bit)   iMac (64-bit)   64-bit / 32-bit
Overall Score   205.2           221.5           107.9%

Integer Performance

Benchmark                                  iMac (32-bit)   iMac (64-bit)   64-bit / 32-bit
Emulate 6502 (single-threaded scalar)      122.3           164.0           134.1%
Blowfish (single-threaded scalar)          175.0           153.5           87.7%
bzip2 Compress (single-threaded scalar)    168.5           209.7           124.5%
bzip2 Decompress (single-threaded scalar)  212.9           227.7           107.0%

Floating Point Performance

Benchmark                                  iMac (32-bit)   iMac (64-bit)   64-bit / 32-bit
Mandelbrot (single-threaded scalar)        135.1           135.1           100.0%
Dot Product (single-threaded scalar)       271.5           273.2           100.6%
Dot Product (single-threaded vector)       113.2           115.0           101.6%
JPEG Compress (single-threaded scalar)     120.7           147.2           122.0%
JPEG Decompress (single-threaded scalar)   116.1           154.8           133.3%

Memory Performance

Benchmark                                  iMac (32-bit)   iMac (64-bit)   64-bit / 32-bit
Read Sequential (single-threaded scalar)   308.4           307.5           99.7%
Write Sequential (single-threaded scalar)  416.7           439.5           105.5%
Stdlib Allocate (single-threaded scalar)   208.6           273.4           131.1%
Stdlib Write (single-threaded scalar)      104.3           104.7           100.4%
Stdlib Copy (single-threaded scalar)       218.9           221.1           101.0%

Stream Performance

Benchmark                                  iMac (32-bit)   iMac (64-bit)   64-bit / 32-bit
Stream Copy (single-threaded scalar)       170.7           178.1           104.3%
Stream Copy (single-threaded vector)       161.0           159.1           98.8%
Stream Scale (single-threaded scalar)      183.2           176.0           96.1%
Stream Scale (single-threaded vector)      160.8           160.4           99.8%
Stream Add (single-threaded scalar)        159.0           192.5           121.1%
Stream Add (single-threaded vector)        176.2           179.0           101.6%
Stream Triad (single-threaded scalar)      159.7           187.8           117.6%
Stream Triad (single-threaded vector)      141.7           148.9           105.1%

iMac Summary

Although the Core 2 Duo and the Xeon share the same underlying architecture, the Core 2 Duo gains more from 64-bit mode than the Xeon does; overall performance for the Core 2 Duo is up 7% (compared to 5% for the Xeon). In addition, the only benchmark that was significantly slower in 64-bit mode was the Blowfish benchmark.

Power Mac Performance

Overall Score

Metric          Power Mac G5 (32-bit)   Power Mac G5 (64-bit)   64-bit / 32-bit
Overall Score   154.9                   140.1                   90.4%

Integer Performance

Benchmark                                  Power Mac G5 (32-bit)   Power Mac G5 (64-bit)   64-bit / 32-bit
Emulate 6502 (single-threaded scalar)      125.1                   100.0                   79.9%
Blowfish (single-threaded scalar)          124.7                   89.0                    71.4%
bzip2 Compress (single-threaded scalar)    156.5                   110.8                   70.8%
bzip2 Decompress (single-threaded scalar)  108.7                   106.0                   97.5%

Floating Point Performance

Benchmark                                  Power Mac G5 (32-bit)   Power Mac G5 (64-bit)   64-bit / 32-bit
Mandelbrot (single-threaded scalar)        125.2                   129.8                   103.7%
Dot Product (single-threaded scalar)       112.3                   112.8                   100.4%
Dot Product (single-threaded vector)       125.5                   42.0                    33.5%
JPEG Compress (single-threaded scalar)     122.0                   105.9                   86.8%
JPEG Decompress (single-threaded scalar)   129.6                   107.5                   82.9%

Memory Performance

Benchmark                                  Power Mac G5 (32-bit)   Power Mac G5 (64-bit)   64-bit / 32-bit
Read Sequential (single-threaded scalar)   133.9                   130.1                   97.2%
Write Sequential (single-threaded scalar)  145.9                   161.2                   110.5%
Stdlib Allocate (single-threaded scalar)   101.9                   93.2                    91.5%
Stdlib Write (single-threaded scalar)      129.7                   131.1                   101.1%
Stdlib Copy (single-threaded scalar)       134.2                   124.7                   92.9%

Stream Performance

Benchmark                                  Power Mac G5 (32-bit)   Power Mac G5 (64-bit)   64-bit / 32-bit
Stream Copy (single-threaded scalar)       132.9                   127.9                   96.2%
Stream Copy (single-threaded vector)       129.2                   122.8                   95.0%
Stream Scale (single-threaded scalar)      129.9                   127.5                   98.2%
Stream Scale (single-threaded vector)      129.6                   131.1                   101.2%
Stream Add (single-threaded scalar)        127.4                   129.3                   101.5%
Stream Add (single-threaded vector)        130.8                   140.7                   107.6%
Stream Triad (single-threaded scalar)      134.5                   129.7                   96.4%
Stream Triad (single-threaded vector)      137.1                   139.5                   101.8%

Power Mac Summary

Overall performance is down 10% in 64-bit mode. Hardly any tests are appreciably faster in 64-bit mode, and several are noticeably slower (such as most of the integer tests, as well as the dot product test).

Conclusion

It turns out the assertion that software runs faster in 64-bit mode than 32-bit mode is both correct and incorrect; Geekbench runs faster in 64-bit mode on Intel-based Macs, but slower on PowerPC-based Macs. I find this incredibly surprising.

On Intel-based Macs, most of the benchmarks that are slower in 64-bit mode are benchmarks that perform bit operations on 32-bit integers, where the compiler has to emit extra instructions to preserve the semantics of 32-bit arithmetic while using 64-bit registers.
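To make “preserving the semantics of 32-bit arithmetic” concrete, here is a small sketch in Python, whose unbounded integers stand in for a register wider than 32 bits. The helper names `rotl32` and `add32` are invented for this illustration; this is not Geekbench code, nor what GCC actually emits, and whether a compiler really needs extra instructions depends on the target and the operation.

```python
MASK32 = 0xFFFFFFFF  # keeps only the low 32 bits of a wider value


def add32(a: int, b: int) -> int:
    """Add two 32-bit values with wraparound, as 32-bit hardware would."""
    # In a wider register the carry out of bit 31 survives, so it is masked off.
    return (a + b) & MASK32


def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    r &= 31
    # Bits shifted past bit 31 stay in the wider value, so the final mask is the
    # extra work needed to get a true 32-bit result.
    return ((x << r) | (x >> (32 - r))) & MASK32


print(hex(add32(0xFFFFFFFF, 1)))    # wraps around to 0x0
print(hex(rotl32(0x80000001, 1)))   # top bit rotates into bit 0: 0x3
```

Bit-twiddling code such as a block cipher performs many operations of this kind per byte, which is why a small per-operation cost could add up.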

However, extra instructions don’t explain the surprising performance hit PowerPC-based Macs experience in 64-bit mode. I haven’t had a chance to investigate it, but compiler quality could be a factor; the 64-bit PowerPC is a somewhat exotic platform, and GCC might not be generating great code for it.

I don’t think the performance hit in 64-bit mode on PowerPC-based Macs is really something to be concerned about; I think that when 64-bit applications become mainstream, most users will have switched to Intel-based Macs (where 64-bit performance isn’t a concern).

Update

There’s an interesting comment over on MacSlash suggesting why 64-bit performance (compared to 32-bit performance) is better on x86 than on PPC:

As someone who used to work at AMD which designed the x86-64 architecture:

  • 16 integer pipe registers versus 8 in 32 bit mode (of which 6 get used)
  • Carefully designed CISC so that 64-bit mode takes only 10% more space than 32 bit mode. This is important because the main bottleneck in modern systems is memory speed (hence the constant increase in cache sizes)

PowerPC:

  • no increase in registers
  • much larger code size increase, although I can’t find exact figures.


Comments

  1. stingerman says:

    In 64-bit mode, Intel processors let Apple use more registers, so function parameters are passed in processor registers rather than on the memory stack. That alone is a significant performance boost for latency-bound integer programs with tight loops.

    However 64 bit pointers take up twice the processor cache space over 32 bit pointers. Data should not be declared 64 bit unless double precision is required, as it would also take up more cache space. So the large caches are helpful and programmers need to be mindful of their data declarations.

    Of course double precision calculations are significantly faster.

    Posted September 26, 2006, 10:12 am
  2. Sam says:

    The results are exactly what I would expect. In general, going to 64 bits slows things down, since you have twice as much data to deal with and most of the time that data is unused (most integers used in your average calculation are less than 4,294,967,296). There is an exception for the Intel architecture, though, since going to 64-bits buys you some extra registers because of the different ISA, so you get an effect that is not strictly due to the jump to 64-bit wide registers.

    Posted September 26, 2006, 10:34 am
  3. Onyx says:

    I don’t know how much these tests really matter. OS X isn’t 64-bit yet, so we don’t know how it has been optimized for such processors. Most software isn’t optimized for such a thing either. While I understand the cache issues and such, 64-bit will make your computer more powerful IF the software can take advantage of it. Benchmarking is extremely general and doesn’t give real-world numbers, only predictions.

    Posted September 26, 2006, 5:21 pm
  4. John C. Randolph says:

    Onyx,

    You are mistaken.  OS X has been a 64-bit OS since the G5 shipped.

    -jcr

    Posted September 26, 2006, 10:25 pm
  5. Peter says:

    John C. Randolph, you are mistaken. Mac OS X is not a 64-bit OS.

    Yes, it is able to run 64-bit binaries without a GUI using libSystem.dylib.

    But the kernel and the drivers are not 64-bit (they are universal binaries: “ppc, i386”). So it is not a 64-bit OS.

    The only file in the whole system that is 64-bit is /usr/lib/libSystem.B.dylib (this file is a universal binary: “ppc, ppc64, i386, x86_64”).

    On the PowerPC it makes no difference if the OS is 64 or 32 bit.

    Quote from the IBM PPC970 documentation:

    The kernel can be either 32- or 64-bit oriented.

    The user-level instructions specific to the 64-bit PowerPC processors can be executed in both 32-bit and in 64-bit computation mode. The user-level instructions implemented by the 32-bit PowerPC processor can also be executed in either computation mode.

    But how about Intel?

    Quote from the Intel documentation:

    Note that a new 64-bit operating system and device drivers will be needed to access the 64-bit capabilities of the enhanced Xeon processors. 32-bit applications will need to be recertified for the 64-bit operating environment even if the software remains 32-bit.

    Since the device drivers in Mac OS X Tiger on the Intel Macs with EM64T are unchanged from the ones used on the 32-bit Intel Macs (all are still i386), and the kernel is executed in the same memory space as the drivers (it is also still i386), it is something of a riddle how Mac OS X is executing x86_64 code at all.

    Maybe Apple will someday explain how they manage to run x86_64 binaries on the Mac Pro or iMac Core 2 Duo under Tiger even though the OS is not 64-bit.

    Posted September 27, 2006, 4:45 am
  6. coolfactor says:

    OS X is 64-bit at the kernel level, but wasn’t so at the frameworks level. That is changing with 10.5. Everything will be 64-bit enabled.

    Posted September 27, 2006, 8:19 am
  7. Shaun Wexler says:

    John C. Randolph is a former Apple software engineer, FYI. Although the majority of Apple-supplied GUI libraries and frameworks remain 32-bit prior to Leopard 10.5, he is technically correct in his statement in that G5 can run in 64-bit mode on Tiger. It’s also likely that the kernel will still run in 32-bit mode on Leopard, because the memory management semantics are no different than Tiger’s, for supporting a 64-bit process address space.

    Posted September 27, 2006, 9:13 am
  8. Peter says:

    coolfactor,

    Mac OS X is not 64-bit at the kernel level.

    If it were 64-bit, it wouldn’t be running on Yonah (the Core processor in the first Intel Macs). Yonah only supports the i386 instruction set. A 64-bit kernel would use a different instruction set, x86_64, and Yonah cannot execute the 64-bit (x86_64) instructions the kernel would be using if it were 64-bit.

    Apart from that, the kernel doesn’t even contain any x86_64 code.

    Posted September 27, 2006, 9:34 am
  9. l0ne says:

    Although I don’t know the exact details, my hypothesis is that both the PowerPC and the x86_64 processors can run the code of a single process in 64-bit mode, removing the need for the kernel itself to be 64-bit.
    OS X is a 64-bit OS in that it is 64-bit-aware and capable at the libc layer; being aware of the processor’s 64-bit mode and able to put the processor into it does not necessarily require a fully 64-bit kernel. (This I can say for sure: OS X does it, which means it is possible.)

    Posted September 27, 2006, 1:01 pm
  10. rory says:

    Peter is absolutely right. This ability to switch between 32-bit and 64-bit mode at the CPU level is one of the advantages of the PowerPC 970 architecture.

    And yes, xnu has been 32-bit since its inception. I too would like to know what the final word is on the EM64T Mactels, though. I figured that Apple surely had to be offering a 64-bit kernel on those machines, as I would think that’s required to support 64-bit in userland. Peter seems to be saying, though, that they’re running a 32-bit xnu? Can someone who owns one of these machines verify this? I’ve often wondered if this limitation in IA was one of the reasons Apple decided to delay the move to 64-bit, even though they had 64-bit hardware back in 2003, with the introduction of the G5.

    Posted September 27, 2006, 1:38 pm
  11. Andrew says:

    XNU absolutely has to support EM64T. If this weren’t the case, how could the kernel service system calls from 64-bit processes? Assuming XNU works either with a SYSENTER or an “int x” instruction, how could it access the dataspace of the calling process? How would write() work? How would brk() work? What would happen when a 64-bit process had a page fault at address 00001000_00000000, and data needed to be swapped in?

    Posted September 27, 2006, 7:33 pm
  12. waffffffle says:

    The OS X kernel must be 64-bit. I have 6.5 GB of RAM in my PowerMac G5 (first revision). A 32-bit OS cannot handle more than 2 GB of RAM. The kernel addresses the RAM, therefore the kernel is 64-bit.

    Posted September 27, 2006, 7:57 pm
  13. MaxInux says:

    Uhm there are many methods to address >2gig of ram with a 32bit kernel… NT does it, novell does it….. the method name escapes me.. pae mainly?

    AWE .. address window extensions
    PAE physical address extensions

    Max

    Posted September 27, 2006, 8:41 pm
  14. Jon says:

    Obviously there is something very wrong with the dot product test under ppc64. As well, profiling GeekBench with a tool like Shark will probably let you properly optimize the app for 64-bit systems. But that’s not really necessary, at least on the PPC.

    Posted September 27, 2006, 8:53 pm
  15. Daniel Bisping says:

    Right, the PPC970 can natively handle 32 and 64 bit. It’s the bridge between the POWER 3/4 lines and the PowerPC 7410.

    Wrong, the PPC is not exotic. It’s in every damned new car on the road, minus 2 or 3. It’s a cockroach!

    In a good way, though.

    The thing is you can’t compare apples to oranges, sic. You never could. The Intel designs are not the IBM/Freescale mantra, as it were.

    Please, they don’t sell the Calgon near me, so if you would then geek out better.

    The next thing I don’t want to read is how MySQL is better than PostgreSQL or vice versa.

    The premise of “purpose built” really needs to become the chant. The tools at hand are powerful and elegant even if not in that order at all times.

    I mean really, we are talking about how software performs on hardware.

    Regardless, the conversation is good. I had to look up a few things. I like that.

    Posted September 27, 2006, 10:02 pm
  16. hayden says:

    ahhh so this is where the real geeks hang

    Posted September 28, 2006, 5:10 am
  17. Eric says:

    It is my understanding that the OS X 10.4 kernel is 64-bit but the Aqua user interface is all 32-bit. This means that you can compile and link programs in 64-bit mode only if you are not linking to any of the user interface. Somewhere I saw a discussion of a way to fake a 64-bit program with a GUI by having two separate programs that work together, one 32-bit to do the GUI and one 64-bit for whatever.

    Posted September 29, 2006, 8:24 am
  18. tji says:

    I have seen all kinds of contradictory information on 64 bit mode with Intel processors. I have seen it stated that 10.4.7 added support for 64 bit mode in new Mac Pro and iMac systems. But, I have also seen it stated that pointers are still 4 bytes on the new iMacs. Obviously, these systems are 64 bit capable. But, unless the kernel runs in 64 bit mode, it’s not gonna run 64 bit code.

    Anyone have a new Mac Pro / iMac / or MacBook Pro? What does sizeof() say for long integers and pointers?

    Also, I was surprised to see that the new Core 2 Duo MacBook Pros had absolutely NO mention of 64 bit support in any of the Apple information. I wonder why that is..

    Posted October 28, 2006, 1:52 pm
  19. John Truong says:

    I believe the issue has to do with CISC vs. RISC. In a CISC instruction set with variable-length instructions, you can (as a processor vendor) add a new instruction that loads a 64-bit value into a memory location or register with one instruction, probably in one clock cycle. In RISC, the instruction length is fixed. For PPC, instructions are 32 bits long, and only 16 bits are available for an immediate value.

    To load a 32-bit immediate value into a register, you have to use two instructions to load the 32 bits, 16 bits at a time.

    To load a 64-bit immediate value into a register, you have to use two instructions to load the lower 32 bits of the register (16 bits at a time), then another instruction to move those bits into the upper 32 bits, then two more instructions to load the lower 32 bits again. That’s 5 instructions to make up a 64-bit value in a register, compared to 2 instructions for the 32-bit case.

    It looks to me a case of making 64 bit possible, but optimizing for 32 bit. Perfectly reasonable for a time when the PC world was still trying to get everyone on board with 32 bit software. Now that the industry is shifting to 64 bit, though, it’s a design philosophy that’s come back to haunt them.

    Source: http://www-128.ibm.com/developerworks/linux/library/l-ppc/

    Posted November 2, 2006, 1:49 pm
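The instruction counts in John Truong’s comment can be sketched with a simplified model in which each step supplies at most 16 bits of immediate data. This is only an illustration under that assumption: the function names and step labels are invented, they are not real PowerPC mnemonics, and the model ignores sign extension and any shortcuts a compiler might take for small constants.

```python
MASK16 = 0xFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def load_immediate_32(value: int):
    """Build a 32-bit constant from two 16-bit immediates."""
    steps = []
    reg = ((value >> 16) & MASK16) << 16
    steps.append("load bits 16-31")
    reg |= value & MASK16
    steps.append("OR in bits 0-15")
    return reg, steps


def load_immediate_64(value: int):
    """Build a 64-bit constant: two loads, one shift, two more loads."""
    steps = []
    reg = ((value >> 48) & MASK16) << 16
    steps.append("load bits 48-63 into the low word")
    reg |= (value >> 32) & MASK16
    steps.append("OR in bits 32-47")
    reg = (reg << 32) & MASK64
    steps.append("shift the low word into the high word")
    reg |= ((value >> 16) & MASK16) << 16
    steps.append("OR in bits 16-31")
    reg |= value & MASK16
    steps.append("OR in bits 0-15")
    return reg, steps


reg32, steps32 = load_immediate_32(0xDEADBEEF)
reg64, steps64 = load_immediate_64(0x0123456789ABCDEF)
print(hex(reg32), len(steps32), "steps")
print(hex(reg64), len(steps64), "steps")
```

Under this model a 32-bit constant takes two steps and a 64-bit constant takes five, matching the counts given in the comment.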