Benchmarking M1 Pro with Blender 3.1a Cycles with Metal Backend

The Blender development team and Apple have worked together to achieve a small wonder: bringing GPU rendering back to Macs. To make things even more interesting, they started with Apple Silicon, no less. I spent some time measuring how much speed can be gained from Metal-enabled GPU and GPU+CPU rendering in Blender 3.1a compared to CPU-only rendering in Blender 3.0.

Below is a video of me doing and explaining the benchmarks using my 14″ M1 Pro MacBook Pro:

Quick explanation

What do all these numbers mean?

Here’s how macOS measures CPU load: it assumes 1 full core as 100% load, so having 8 cores would result in a total of 800% full CPU load. GPU is measured within the 100% limit, so it comes in a “what you see is what you get” fashion.
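To make the per-core convention concrete, here is a minimal sketch that converts a macOS-style reading into a share of the whole CPU. The function name `normalized_cpu_load` is made up for illustration, and the 8-core count is simply the example used above.

```python
def normalized_cpu_load(reported_percent: float, core_count: int) -> float:
    """Convert a reading where 100% = one full core into a share of the whole CPU (in percent)."""
    if core_count <= 0:
        raise ValueError("core_count must be positive")
    return reported_percent / core_count


# With 8 cores, a reading of 800% means the whole CPU is busy.
print(normalized_cpu_load(800, 8))  # 100.0
# A reading of 600% on the same machine is three quarters of the CPU.
print(normalized_cpu_load(600, 8))  # 75.0
```

GPU readings need no such conversion, since they are already reported on a 0-100% scale.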

For each benchmark I list a basic overview of the setup (all defaults, except for Barbershop, where I cut the samples from 800 to 100), the resolution modifier or effective resolution, the time in minutes:seconds, and the overall speedup. So, let’s see the numbers.
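The speedup figure is just the CPU-only time divided by the time of the mode being compared. A small sketch of that arithmetic, assuming times are written as minutes:seconds; the helper names and the sample times below are hypothetical, not measurements from these benchmarks.

```python
def to_seconds(stamp: str) -> int:
    """Parse a 'minutes:seconds' string such as '6:00' into whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(baseline: str, candidate: str) -> float:
    """How many times faster the candidate run is than the baseline run."""
    return to_seconds(baseline) / to_seconds(candidate)


# Hypothetical example: 6:00 on CPU only versus 2:00 with the GPU.
print(round(speedup("6:00", "2:00"), 2))  # 3.0
```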

BMW MBP M1 Pro Blender Benchmark


This is an old benchmark, but it gives insight into brute-force rendering of a scene with little geometry and simple-ish shaders. On the most affordable M1 Pro, Blender with the Metal backend gives an uplift of at least 3X in GPU mode and 3.4X in GPU+CPU mode. Not a bad start.

Classroom MacBook Pro Blender Benchmark

Classroom MacBook Pro M1 Pro Blender Benchmark

A challenging enough scene that relies on GI bounces for light. Rendering a Full HD result at 300 samples takes long enough for the MacBook Pro to utilize all of its cores.

Speaking of utilization, and this will be a recurring result: pure CPU rendering uses about 760%-770% of CPU. Rendering with the Metal backend on the GPU alone uses about 99% or more of the GPU and leaves the CPU at around 3% or less. And remember, that is 3% of a single core, so it uses virtually nothing.

GPU+CPU mode usually uses around 600% of CPU and 97% or so of GPU. The only notable exception is Junkshop; more on that below.

All in all, we get a 2.59X speedup with GPU and 3.0X with GPU+CPU mode.

Barbershop Apple M1 Pro Blender Benchmark


This particular scene is really noisy to render because of how it is set up. The default is 800 samples, but that would take far too long to render, so I cut the samples to 100 to get results in a reasonable time.

Utilization is about the same as in the other scenes. The scene itself loads in 11-12 seconds, which I am counting as render time in the final results.
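For anyone who wants to time a full run, scene loading included, without watching the clock, here is one possible sketch using Blender’s command line. It is not how the numbers in this post were taken. It assumes a `blender` executable on the PATH, a hypothetical file name `barbershop.blend`, and relies on the `--cycles-device` argument, which accepts values such as `CPU`, `METAL`, and `METAL+CPU`.

```python
import subprocess
import time


def build_render_command(blend_file, device, frame=1, blender="blender"):
    """Build a headless render command; arguments after '--' are passed on to Cycles."""
    return [blender, "-b", blend_file, "-f", str(frame), "--", "--cycles-device", device]


def timed_render(blend_file, device, frame=1, blender="blender"):
    """Run one headless render and return wall-clock seconds, scene loading included."""
    start = time.perf_counter()
    subprocess.run(build_render_command(blend_file, device, frame, blender), check=True)
    return time.perf_counter() - start


# Only print the command here; call timed_render(...) on a machine with Blender installed.
# "barbershop.blend" is a placeholder file name.
print(" ".join(build_render_command("barbershop.blend", "METAL+CPU")))
```

Because the timer wraps the whole process, the result includes startup and scene loading, which matches the way load time is counted as render time here.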

The MacBook Pro ends up being 2.08X faster with GPU exclusively, and 2.24X faster using both CPU and GPU.

Spring Apple Silicon Benchmark

Spring Apple Silicon M1 Pro Blender Benchmark

We are finally getting into more modern territory. Spring was the splash screen for 2.80, and it is a great scene. It uses just 300 samples and denoises the result afterwards, producing a clean image without rendering a thousand samples or so. The scene populates in 24-27 seconds, which I am counting as render time.

The “basic” 14″ MacBook Pro on Apple Silicon manages a 1.89X speedup using just the GPU, and 2.34X using both GPU and CPU. That does not sound like much; admittedly, the scene population time keeps the number from being more impressive. Maybe this process will get faster in future iterations of Cycles for macOS.
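To see why a fixed scene population time drags the headline number down, here is a rough sketch that separates the measured speedup from the render-only speedup. The totals are hypothetical and chosen for easy arithmetic; only the 25-second load time is picked from the 24-27 second range quoted above.

```python
def speedups(total_before, total_after, load_before, load_after):
    """Return (measured, render_only) speedups from wall-clock totals in seconds."""
    measured = total_before / total_after
    render_only = (total_before - load_before) / (total_after - load_after)
    return measured, render_only


# Hypothetical totals: 250 s before, 125 s after, with 25 s of scene population in both runs.
measured, render_only = speedups(250, 125, 25, 25)
print(f"measured {measured:.2f}X, render-only {render_only:.2f}X")
# measured 2.00X, render-only 2.25X
```

The shorter the render itself gets, the larger the share of the total that the unchanged load time takes up, so the measured speedup lags behind what the renderer alone achieves.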

Junkshop Apple Silicon M1 Pro Blender Benchmark


This one is really interesting. For starters, it uses only 50 samples and denoises afterwards, giving a great result in the end. It is also the only benchmark that renders faster in GPU mode than in GPU+CPU mode.

I am not sure what exactly is going on here, but I can tell that the CPU was somewhat underutilized, at around 510% usage in the combined mode. The result is a 1.75X speedup in GPU+CPU mode and, interestingly enough, 2.17X in GPU mode. Scene population took 11 seconds in 3.0, and 9 and 8 seconds in 3.1a.
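Putting the Cycles speedups quoted so far side by side makes the Junkshop outlier easy to spot. The sketch below uses only the factors stated in this post (BMW’s GPU figure is given as “at least 3X”, so 3.0 is a lower bound); the function name `hybrid_gain` is made up for illustration.

```python
# Speedups over CPU-only rendering in Blender 3.0, as quoted in the text: (GPU, GPU+CPU).
reported = {
    "BMW": (3.0, 3.4),          # GPU figure is "at least 3X"
    "Classroom": (2.59, 3.0),
    "Barbershop": (2.08, 2.24),
    "Spring": (1.89, 2.34),
    "Junkshop": (2.17, 1.75),
}


def hybrid_gain(gpu: float, hybrid: float) -> float:
    """Percent change in speed when the CPU is added to GPU rendering."""
    return (hybrid / gpu - 1) * 100


for scene, (gpu, hybrid) in reported.items():
    print(f"{scene:<10} GPU {gpu:.2f}X  GPU+CPU {hybrid:.2f}X  change {hybrid_gain(gpu, hybrid):+.1f}%")
```

Junkshop is the only row where the change comes out negative, meaning that adding the CPU slowed the render down.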

PartyTug Apple Silicon EEVEE Benchmark

PartyTug Apple Silicon M1 Pro Blender Benchmark

Finishing off, we have the PartyTug scene, the splash screen from 2.83. It uses the “realtime” EEVEE rendering engine, which benefits little, if at all, from the 3.1a patches. I attribute the 6% increase in speed to the overall Blender speedups being worked on in the new version. Still, it is faster and uses 200MB less memory.

What is interesting about this one, though, is that enabling Screen Space Reflections, Ambient Occlusion, Bloom, Motion Blur and other EEVEE effects did not halt or freeze the M1 Pro’s 14-core GPU. Just a year ago I tried those effects on an M1 Mac Mini, and enabling just SSR would drop the frame rate into single digits. It was the opposite of fun to use. Now it is much improved and really is a good experience.

What about M1 Max MacBook speedups?

Sadly, I do not have an M1 Max machine, so I could not run the benchmarks on one. However, according to people who do have one, the speedups are anywhere from 3X to 4X. Keep in mind that M1 Max machines also have a 10-core CPU, which is significantly faster than my 8-core one to begin with, and would render the scenes at least twice as fast as my machine.

To sum up, it is not as fast as the latest and greatest NVIDIA or AMD machines, but it is significantly faster than it was, say, a week ago. The development team also recently patched the alpha version so it no longer crashes when rendering in the viewport, which is basically a must-have for developing shaders, looks, and light setups. It is already a great start, and it will only get better from here.

— Song

Forum update link: https://devtalk.blender.org/t/cycles-apple-metal-device-feedback/21868 

Blender daily builds: https://builder.blender.org/download/daily/

Intel-based machines will get a similar patch later; read more here: https://developer.blender.org/T92212