PlayStation 3 Performance
Update: Check out the latest PlayStation 3 benchmarks, which use Geekbench 2 (the new hotness) instead of Geekbench 2006.
On Sunday I saw a clip of Fedora Core 5 for PPC running on the PlayStation 3 over at Kotaku; I’d completely forgotten that Sony was going to make it easy to boot other operating systems on the PlayStation 3!
On Monday I started receiving requests for Geekbench for Linux PPC so people could run it on the PlayStation 3. I managed to get a beta version out last night, and while it's not quite ready for public release yet, one beta tester sent in the results for his PlayStation 3, which I thought I'd share. To give the results some context, I'm going to compare the PlayStation 3 results against one of the first Power Mac G5s running at 1.6GHz.
Update: Geekbench 2006 for Linux PPC is available now, if you’re interested in benchmarking your PlayStation 3 (or PPC-based Linux box) yourself.
Setup
PlayStation 3
- Cell Broadband Engine @ 3.2GHz
- 256 MB RAM
- Fedora Core 5
- Geekbench 2006 (Build 243)
Power Mac G5
- PowerPC G5 @ 1.6GHz
- 1280 MB RAM
- Fedora Core 4
- Geekbench 2006 (Build 243)
I'm reporting the baseline score, rather than the raw score, for each test (where 100 is the score a Power Mac G5 1.6GHz running Mac OS X would receive on the same test). As always, higher scores are better.
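As a rough illustration of what a baseline score means, here is a minimal Python sketch. It assumes a test's raw score is a throughput-style number where higher is better, so the baseline score is just the raw score scaled so the Power Mac G5 1.6GHz reference lands at 100. The function name and sample numbers are hypothetical, not Geekbench's actual code or data.

```python
def baseline_score(raw: float, reference_raw: float) -> float:
    """Scale a raw (higher-is-better) score so the reference machine scores 100.

    Assumption: raw scores are throughput-style values, so scaling is a simple
    ratio. This is an illustration, not Geekbench's implementation.
    """
    if reference_raw <= 0:
        raise ValueError("reference_raw must be positive")
    return 100.0 * raw / reference_raw


# Hypothetical raw results for a single test (not real Geekbench numbers).
reference_raw = 40.0  # raw score of the reference Power Mac G5 1.6GHz
tested_raw = 52.0     # raw score of the machine under test

print(round(baseline_score(tested_raw, reference_raw), 1))  # 130.0
print(baseline_score(reference_raw, reference_raw))         # 100.0
```

Under that assumption, a baseline score of 130 means the tested machine did 30% more work per unit time than the reference on that one test.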
Results
[Charts: Overall Score, Integer Performance, Floating Point Performance, Memory Performance, Stream Performance]
Conclusion
There was a comment on Slashdot last year that made the following assertion about the Cell processor:
The problem is that though the main CPU is PowerPC-based like current Apple chips, it is stripped down, and the Altivec support will be much lower than in current G5s. Unoptimized, Apple code would run like a G4 on this hardware.
Turns out the comment was right; Cell processor performance is comparable to low-end PowerPC G5 performance (which in turn is comparable to high-end PowerPC G4 performance). I can’t comment on Altivec performance, unfortunately, since Geekbench for Linux PPC doesn’t measure Altivec performance yet.
Geekbench also isn’t able to exploit the eight vector processors on the Cell processor. Any program designed and optimized for the Cell processor should be a lot faster than one designed for a generic processor (like, say, Geekbench). So while the Geekbench results might seem disappointing, keep in mind that Geekbench can’t exercise the PlayStation 3 to its full potential.
Comments

For a doubling of the clock speed I expected more of an increase compared to the G5. If anything, it shows there's a lot more to the Cell processor than most people think: it's not going to compare well to other processors unless the benchmark leverages the horsepower behind the Cell, which breaks the "objectiveness" of a program that doesn't cater to any particular processor.
It doesn't show that the Cell processor has a lot of potential; it shows that, compared to general-purpose CPUs doing somewhat general tasks, the Cell underperforms greatly. This is not a good thing.
I know the Linux community is all about optimizing to hardware, but I hope people realize what it really takes to optimize for the Cell and its oh-so-special SPEs.
From a gaming standpoint, this really shows how badly unoptimized games run, and I honestly don't think optimizing for the Cell is going to net gains as huge as on a CPU like Conroe. Just my opinion, though.
Would stripping the Mac of 1GB of memory make a difference?
The Cell architecture in the PS3 is designed for a very specific purpose, and that's one reason there's no Altivec. The PS software is written (one would hope) to maximise the use of the multiple vector processors in the Cell. I would both agree and disagree with Jacob, in that poorly optimised code can, and will, cripple almost any system, but it would seem that the PS3 is going to be especially sensitive to bad code. Optimising for the Cell should prove to be a... unique... experience.
If Geekbench were recompiled to exploit this, I would expect significantly higher numbers. I would be curious to see the results of the G5 running the benchmark with an equal amount of RAM...
The 1.6GHz model is special in that it is also one of two G5s that can run with 256MB of RAM; all of the Macs after it, and the single 1.8GHz, needed at least 512MB.
Running the Mac with 256MB of memory (instead of 1.25GB of memory) might make a small difference, but I’ve found that Geekbench results don’t vary with the amount of physical RAM installed provided the amount of free RAM is over 100MB when Geekbench runs.
If I’m feeling adventurous I might try removing 1GB of memory from the Power Mac G5 and re-running the tests, but I’m willing to bet the results will be almost identical.
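For anyone re-running the tests on a smaller-memory Linux box, here is a minimal Python sketch of the "more than 100MB free" check described above. It assumes "free RAM" means the MemFree field of /proc/meminfo (which excludes page cache); the function names and the sample snapshot are hypothetical, and this is not something Geekbench itself does.

```python
def free_ram_mb(meminfo_text: str) -> float:
    """Return the MemFree value from /proc/meminfo-style text, in megabytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemFree:"):
            # Lines look like "MemFree:   131072 kB"; field 1 is the value in kB.
            return int(line.split()[1]) / 1024.0
    raise ValueError("no MemFree line found")


def ok_to_benchmark(meminfo_text: str, threshold_mb: float = 100.0) -> bool:
    """True if free RAM exceeds the threshold (100MB by default, per the comment)."""
    return free_ram_mb(meminfo_text) > threshold_mb


# Hypothetical snapshot; on a Linux box you could pass open("/proc/meminfo").read().
sample = """MemTotal:         197632 kB
MemFree:          131072 kB
Cached:            20480 kB
"""

print(f"{free_ram_mb(sample):.1f} MB free, ok to run: {ok_to_benchmark(sample)}")
```

Parsing text rather than opening the file directly keeps the check testable on machines that don't have /proc/meminfo.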
I'd just like to point out that on the 27th, I believe, Yellow Dog Linux will be released. It's designed to take advantage of the Cell processor architecture. If you read this post, do you think you could try this again with Yellow Dog, or do you think the results would not change?
Did you use a compiler with Cell support to build Geekbench for the PS3? I'm not sure it would make much of a difference, but if you're just running a generic PowerPC build on Cell, you may be suffering performance penalties. A compiler with Cell support will generate code that takes into account its in-order execution, pipeline, and branching characteristics, all of which may be of more than minor relevance.
All of that said, it's not that surprising. The PPE in Cell is not meant to be as robust as a larger core: the Cell programming model going forward will centre mainly on the SPUs, and that requires very focussed and specific work, which would be at odds with an effort to benchmark the system comparatively against others.
It might also have some bearing that:
1.) PPE has 32 KB L1 I-Cache while the PowerPC 970 has 64 KB of L1 I-Cache.
2.) The PowerPC 970 is a dynamically scheduled (out-of-order execution) core, albeit one that tracks instruction groups rather than individual instructions. The PPE is a statically scheduled (in-order execution) core. The G5 can cover L1 cache misses and partly mitigate L2 cache misses; CELL's PPE just stalls.
3.) CELL's PPE execution units: one Load/Store Unit (LSU) and one FXU (integer arithmetic and fixed-point instructions, as well as shifts, rotates, etc.), both fed from the main instruction queue, plus one VSU (Vector/Scalar Unit), which has its own instruction queue feeding the VMX unit and the FPU. So the very best-case scenario would be four decoded instructions executed per clock cycle, given the right mix of FP-heavy and integer-heavy threads.
See http://researchweb.watson.ibm.com/journal/rd/494/kahle.pdf. The CELL BE Programming Handbook,
http://www-306.ibm.com/chips/techlib/techlib.nsf/techdocs/9F820A5FFA3ECE8C8725716A0062585F/$file/BEHandbookv1.0_10May2006.pdf
details the valid dual-issue combinations on page 751. Note, though, that the main Issue Queue can issue two instructions (for example, one to the LSU and one to the FXU), while the Vector/Scalar Issue Queue can, in theory, issue two instructions, one to the VMX unit and one to the FPU block.
To make a long story short, the PowerPC 970 has twice as many FXUs and twice as many LSUs as the PPE, and in the very best-case scenario it can execute 8 decoded instructions (IOPs) in parallel when all the issue queues have something to deliver (not counting the Branch Unit and the Condition Register Unit):
http://arstechnica.com/cpu/02q2/ppc970/images/ppc-970-large-diagram.png
http://arstechnica.com/cpu/02q2/ppc970/screenshot-1.html
The PPE issues 2 instructions per cycle max, not four. The G5 can potentially issue more instructions per cycle.
Also, the G5 has two separate integer units and two FPUs, compared to one integer unit and one FPU on the PPE. This may also help to explain the relative performance in the benchmark code.
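To put the issue-width figures quoted in this thread into per-second terms, here is a small Python sketch that multiplies issue width by clock speed. The widths (2 for the PPE, 8 for the PowerPC 970) are the theoretical best-case figures from these comments, not measurements, and real code sustains far less than either peak.

```python
def peak_instructions_per_second(issue_width: int, clock_hz: float) -> float:
    """Theoretical upper bound: instructions issued per cycle times cycles per second."""
    return issue_width * clock_hz


# Best-case figures quoted in the comments (theoretical peaks, not measurements).
ppe = peak_instructions_per_second(issue_width=2, clock_hz=3.2e9)  # Cell PPE
g5 = peak_instructions_per_second(issue_width=8, clock_hz=1.6e9)   # PowerPC 970

print(f"PPE: {ppe / 1e9:.1f} G instr/s, G5: {g5 / 1e9:.1f} G instr/s")
print(f"G5/PPE peak ratio: {g5 / ppe:.1f}")
```

With the four-instruction figure from the earlier comment instead, the PPE's peak comes out equal to the G5's 12.8 billion per second. Either way, doubling the clock does not double throughput when each cycle can do less work.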
Crazyace said:
Crazy, I did forget to mention the dual FPUs, but I did mention the ALU disparity.
I did note that the G5 can issue up to 8 IOPs to its execution units (from the group of five IOPs it is tracking).
Buddy (I assume you are the B3D and GAF Crazyace), I did not think my post was that messily written. Well, it was a bit.
Also, I am under the impression, given all the IBM literature, that the main Issue Queue and the Vector/Scalar Queue can both dual-issue: the main Issue Queue feeds up to two instructions per cycle to the FXU, the LSU, etc., or to the Vector/Scalar Queue (which allows Vector/Scalar FP instructions to execute out of order relative to the rest of the code, as stated in IBM's manuals). The Vector/Scalar Queue can issue up to two instructions.
Well, I have yet to find a place in the docs where it specifically says that it cannot work like that, and I do not see why there could not be the right mix of instructions to produce a situation where the main Issue Queue has two instructions (not NOPs) ready to issue to, say, the LSU and the FXU, while the Vector/Scalar Queue also has at least one instruction that is really ready to be issued to either the VMX unit or the scalar FPU.
CELL BE Programming Handbook said:
In plain words: as the diagram on page 300 of the Handbook also shows, the "Dual-Instruction Stall and Issue" unit (stages IS1 and IS2) issues up to two decoded instructions, from main pipeline stage IS2, to any of several destinations, such as the FXU, the LSU, or the VSU (our Vector/Scalar Unit and its queue).
What I am missing is why, at stage IS2, there could not be one or two instructions (say, one for the FXU and/or one for the LSU) ready to issue right away while, at the same time, a similar situation exists at the bottom of the Vector/Scalar Queue after stage VQ8 (even though, if we look at the diagrams, it is really after stage RF0 that you go into the VXU (they could not just call it VMX... bah) and/or the FPU pipeline).
I hate to be too critical here, but the truth is that your benchmarks and comparisons are absolutely worthless.
Run some heavy benchmarks that have been optimized to take advantage of the SPEs, and you will find that the G5 you reference will be outdone 6 to 8 times over by the PS3.
Your comparison is just about as crazy as comparing the picture quality of a 1080p HDTV with that of a ten-year-old SD TV when an SD signal is fed into both of them (pretty pointless). But you could conclude that the old SD TV looks better.
Also, as far as the Yellow Dog comment goes: it may turn out to be a great OS, but it won't rewrite your benchmark program for you, so the results will be nearly the same when run under YD Linux.
bigViking,
I’m not sure how this comparison is useless? There was a lot of discussion on the interweb about whether or not Apple should’ve switched to the Cell processor instead of Intel processors. I think this comparison shows how bad a move that would’ve been.
After all, comparing picture quality between an HD and a SD TV isn’t pointless if the signals you’re dealing with are SD signals.
The big question is how much access you have to the hardware when running Linux on the PS3.
For instance, how much RAM can you use? If you're limited to 256MB, a modern Linux desktop is going to suffer. (I believe the memory is split 256/256 between the CPU and GPU with a DMA controller between them; using all that RAM efficiently won't be easy for a desktop OS.)
And do you even get access to any of the SPEs from Linux? I think on PS2 Linux they gave you access to both VUs but not the IOP.
And finally, let's hope we get an accelerated X driver for the RSX. Linux is going to suck without it.
In the end, if MPlayer gets optimized video decoding and rendering for the PS3, I'll be happy.
Interesting info to report!
There are a couple of issues here that should be highlighted. One is that Linux on the PS3 is a work in progress; there are a lot of updates scheduled to go into kernels 2.6.20 and 2.6.22, I believe. So in a sense I believe we are jumping the gun somewhat. I don't expect huge improvements in the future, but for synthetic benchmarks like these there may be some positive results.
This doesn't even take into account maturing libraries and compilers.
In any event, the thing that really scuttles the PS3 in my mind is the puny allotment of RAM. When it comes to running Linux, RAM is everything, at least for general desktop usage. A PS3 with Linux might make a good compute node, but I see it having a hard time competing with my desktop PC running Linux.
Life will be very interesting a year down the road though.
Dave
I agree that the comparison is meaningless, because the benchmark code is not optimized for the Cell processor, so it didn't reveal the Cell's true computational power. Soon there will be new benchmark software available that is optimized for the Cell processor; then we will truly see how well it performs. Other factors, such as the amount of RAM, the interface designs, and the other hardware components on the motherboard, will also affect how well the final system performs.
At this moment, the only meaningful comparison will be to compare the theoretical raw performance of a Cell processor and a G5 Processor, based on the different architecture of two.
HA! … I hope you guys all know you're nerds… I love you, though.
My opinion:
1. PlayStation 3 = gaming console
2. PlayStation 3 = computer
3. I need both.
4. G5 cannot do both.
5. I get PlayStation 3.
Thanks for your time. Back to nerd talk.
ding
Linux on the PS3 is a true disappointment: its in-order PPE is comparable to a G4, less than 190MB of RAM is available, and there's no 3D acceleration.
Can anyone run this test on the latest Intel and AMD processors to make other comparisons?
Here's an interesting comparison with a Xeon; it's from 2-4 to 10-18 times faster than the Cell on these tests.
Mac Pro with Xeon @ 3 GHz vs. PS3

Integer Performance
Benchmark                                  Mac Pro     PS3
Emulate 6502, single-threaded scalar         245.4   105.2
Emulate 6502, multi-threaded scalar          974.6    57.3
Blowfish, single-threaded scalar             231.2   118.7
Blowfish, multi-threaded scalar              920.4   165.6
bzip2 Compress, single-threaded scalar       314.1    89.8
bzip2 Compress, multi-threaded scalar       1194.4   124.1
bzip2 Decompress, single-threaded scalar     289.8    76.6
bzip2 Decompress, multi-threaded scalar     1353.3    99.5

Floating Point Performance
Benchmark                                  Mac Pro     PS3
Mandelbrot, single-threaded scalar           202.5    49.0
Mandelbrot, multi-threaded scalar            807.5    72.1
Dot Product, single-threaded scalar          362.9   120.0
Dot Product, multi-threaded scalar          1134.5     119
Dot Product, single-threaded vector          173.5      70
Dot Product, multi-threaded vector           516.7     119
JPEG Compress, single-threaded scalar        218.9      70
JPEG Compress, multi-threaded scalar         877.6    94.8
JPEG Decompress, single-threaded scalar      230.5    61.6
JPEG Decompress, multi-threaded scalar       788.9    72.9

Memory Performance
Benchmark                                  Mac Pro     PS3
Read Sequential, single-threaded scalar      368.7    51.9
Read Sequential, multi-threaded scalar       183.0    56.9
Write Sequential, single-threaded scalar     573.6     194
Write Sequential, multi-threaded scalar      281.3     191
Stdlib Allocate, single-threaded scalar      405.8    43.4
Stdlib Allocate, multi-threaded scalar        52.1      51
Stdlib Write, single-threaded scalar         121.5     331
Stdlib Write, multi-threaded scalar          204.4     365
Stdlib Copy, single-threaded scalar          243.3    64.5
Stdlib Copy, multi-threaded scalar           441.1     102

Stream Performance
Benchmark                                  Mac Pro     PS3
Stream Copy, single-threaded scalar          213.0    89.7
Stream Copy, multi-threaded scalar           359.7   109.9
Stream Copy, single-threaded vector          209.5   101.4
Stream Copy, multi-threaded vector           336.9    62.6
Stream Scale, single-threaded scalar         226.2    93.2
http://www.geekpatrol.ca/browse/2006/?view&id=10783
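The "2-4 to 10-18 times faster" summary can be checked directly against the Xeon table. This short Python sketch divides the Mac Pro score by the PS3 score for a handful of rows copied from that table; it assumes both columns are Geekbench scores on the same scale, so a ratio above 1 means the Mac Pro is faster.

```python
# A subset of (Mac Pro, PS3) scores copied from the Xeon comparison table.
scores = {
    "Emulate 6502, single-threaded scalar": (245.4, 105.2),
    "Emulate 6502, multi-threaded scalar": (974.6, 57.3),
    "bzip2 Decompress, multi-threaded scalar": (1353.3, 99.5),
    "Mandelbrot, multi-threaded scalar": (807.5, 72.1),
    "Write Sequential, multi-threaded scalar": (281.3, 191.0),
    "Stdlib Write, single-threaded scalar": (121.5, 331.0),
}


def speedups(table):
    """Return {test: Mac Pro score / PS3 score}; > 1 means the Mac Pro is faster."""
    return {name: mac / ps3 for name, (mac, ps3) in table.items()}


# Print the ratios from largest to smallest.
for name, ratio in sorted(speedups(scores).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{ratio:5.2f}x  {name}")
```

On these rows the ratios run from about 17x (Emulate 6502, multi-threaded) down to below 1x for Stdlib Write, where the PS3 scores higher, so the quoted range does not cover every test.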
Personally I’d like to see you do a benchmark comparison with this:
http://www.terrasoftsolutions.com/products/mercury/
I'd have to say that I'm a little surprised by the results. While the PPE is a stripped-down G5, I would still think it would perform better at 3.2 GHz than a G5 at 1.6 GHz. Is there something I'm missing here? I thought it was the difference in RAM, but a comment already said that changing the RAM wouldn't change the results.
While the numbers are comparable most of the time, there is one test where the Cell completely slaughters the G5. Can anyone tell me why the Stdlib Write score is ridiculously better than the G5's?
A lot has been said about how it's not using the SPEs. Personally, I couldn't care less about that for the moment. A lot has also been said about Intel Macs beating PPC Macs, but the comparison has never been fair because it was dual-core vs. single-core. Maybe if they both had dual cores it would be fair.
Oh, and can you comment on what compiler and options you used to compile it, please?
Also keep in mind that G4-type Altivec and G5-type Altivec need to be treated slightly differently to get the best out of them.
The memory performance of the CELL PPC core looks strange: the read performance is very low, while the write performance looks OK. Maybe there is an issue with the silicon or software that is killing the memory read performance, which would affect the other tests as well. Has anyone run a read latency test? Does anyone know what the expected memory performance should be?
Well, I see only one point of that test: it shows what performance we can get from the PS3 with currently existing Linux. The PS3 architecture is very different from currently existing PPC and PC systems.
I don't know if it's fair of me to say “N00bs”, but I'm going to.
Firstly, the CELL running Linux is currently emulating the graphics (acting as a graphics chip; the RSX isn't used) and various other interfaces (as well as managing the system OS SPE). This hugely lowers performance and reduces the 256MB of memory to 192MB in reality (64MB is reserved for graphics). Sony are supposedly going to add more hardware support (especially graphics) later on.
Secondly, the Linux OS uses NONE, and I mean NONE, of the CELL CPU's SPEs, which provide the bulk of its power. This will remain true until programs are coded specifically for Linux running on a CELL chip.
http://patchwork.ozlabs.org/linuxppc/list?q=ps3
tells you what patches have been submitted so far, if you're interested.
Integrating the Altivec-optimised code into the PPC Linux tree would be a very good start for all the PPC/Cell-based machines, never mind the SPEs.
If you're really into helping the PPC projects, then you could help there: there's always libfreevec, http://freevec.org/, which might even out the scores and improve general throughput for all apps, if someone's willing to take the time to include it as a PPC GLIBC replacement.
“libfreevec is a free (LGPL) library with hand-optimized replacement routines for GLIBC, such as memcpy(), strlen(), etc. These routines have been written specifically to take advantage of the AltiVec unit (a.k.a Velocity Engine or VMX), “
“For example, did you know that you can do byte swapping with AltiVec 7 times faster than with scalar code? Or that it is possible to sort integers and floats 4 times faster with the help of AltiVec? Were you aware that it helps to do string searching faster? Memory hashing gets upto 7 times faster. The list could just go on and on… “
http://www.powerdeveloper.org/forums…=asc&start =0
You could even take the libfreevec idea and extend it to include the SPEs as well, if you're clever enough (most are not, so it doesn't happen). Markos is back and working on Altivec again after some time away, so you might help him and help yourself in the process.
I wonder if they could get Geekbench running on a single SPE?
Obviously real-world performance would be much, much less than the sum of the six available SPEs, but it'd be interesting to see nonetheless.
Here’s one figure to help make some wild guesses though: they reserved 2 SPEs in Resistance just for collision detection.
Interesting results though.
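Amdahl's law, which the comment doesn't mention, is one standard way to see why several SPEs deliver much less than their sum. This Python sketch assumes a fixed fraction p of the work parallelizes perfectly across n workers. The p values are hypothetical, and the model ignores costs specific to the SPEs, such as DMA transfers into their local stores, so real code could fall further below these figures.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


# Hypothetical parallel fractions, spread across the six SPEs available under Linux.
for p in (0.5, 0.9, 0.99):
    print(f"p={p:.2f}: {amdahl_speedup(p, 6):.2f}x with 6 SPEs")
```

Even with 90% of the work parallelized, six workers give roughly a 4x speedup, not 6x.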
I'm wondering now when they'll finally get fixes for Yellow Dog Linux on the PS3 (full use of the RSX and SPEs). I'm just itching to see whether performance results under Linux would increase for the PS3! *drools*
By the way, can any of you state, in one sentence, just how powerful the PS3 is when weighed against the 360? Or is that question irrelevant to the topic? Hehe, sorry.
It is hard to judge how powerful the PS3 is in comparison to the 360. The 360 would definitely outperform the PS3 if you were able to run a benchmark similar to this one, simply because it uses a more common multi-core design.
But, if you were to run the benchmark with code designed to take full advantage of the Cell, as well as the 360’s multiple cores, you’d likely see the Cell take the lead.
The problem is that the Cell is simply too new; we just don't have the code necessary to make full use of it... yet.
Add to that the fact that it appears to me, as a novice programmer, quite difficult to write code that would make efficient use of all the SPEs in the Cell.
The real question to me, is whether or not it would even be worth the effort to write software for a linux environment that would effectively unlock the full potential of the Cell; I’m certain the current answer would be “No”.
Could you use ppu-gcc, available from the Barcelona Supercomputing Center at http://www.bsc.es/plantillaH.php?cat_id=252, to compile Geekbench?
In our evaluation, ppu-gcc gives nearly a 20% speed improvement over the current gcc: the current gcc generates pure PowerPC instructions, while ppu-gcc generates a Cell-optimized binary. Sony Computer Entertainment is currently pushing Cell optimization code into mainline gcc, which is expected to be merged around gcc 4.3.
Until the merged gcc becomes the default compiler for common Linux distributions, using ppu-gcc for benchmarking purposes would be greatly appreciated.
For which Geekbench version are these results? At least Geekbench 2006 (build 250) happily runs the vector floating point calculations as well, though on the PS3 they run at about half the speed of the same calculations in scalar. Perhaps build 250, too, does not yet contain Altivec support?
Jan,
The results are from Geekbench 2006 (build 243); SIMD support on Linux PPC was added in Geekbench 2006 (build 250). From what I’ve heard the Cell processor doesn’t execute SIMD instructions very well, plus Geekbench itself isn’t optimized for the Cell processor, so I’m not surprised that SIMD benchmarks are slower than expected on the Cell.