Raytracing faster with Rust and Rayon
Being able to quickly turn a single-threaded application into a multi-threaded one is one of the Holy Grails of programming languages these days. Modern pure functional languages offer the most obvious alignment with these goals, but what if you prefer imperative programming?
I’ve been using Rust for a few years now. One of the basic ideas behind Rust is that, by telling the compiler more about your intentions for the code you’re writing, the compiler can enforce them and make sure it’s more correct. Those of you who currently worship at the altar of const-correctness in C++ will likely fit right in.
This has some interesting ramifications:
- You can have the code manage heap allocations automatically without needing a garbage collector, bulky runtime, or explicit reference counting,
- More of your code becomes pure-functional than it would if you were writing in C/C++,
- And for the purposes of this post, multithreading is much easier since the compiler can detect potential race conditions.
Rayon is a cool new library written in Rust. The basic theory behind it is that you can take anywhere you use an iterator - and just turn it into a parallel iterator. That’s it, now your code is multithreaded. Internally, it also has a work-stealing queue, and has deep customization, but I wanted to see how easy the out of the box experience was.
I had the perfect candidate lying around already - a path tracer. A raytracer is a great model for multithreading, because it’s trivially parallelizable. In my model, each pixel of the raytrace job doesn’t care about what the other pixel does.
Let’s take a look at the main loop of my raytracer before it was parallelized with Rayon:
for y in 0..image_height {
for x in 0..image_width {
let mut colour = Vector3::new(0.0, 0.0, 0.0);
for _sample in 0..number_of_samples {
let u = (x as f64 + rng::gen<f64>()) as f32 / image_width as f32;
let v = (y as f64 + rng::gen<f64>()) as f32 / image_height as f32;
let ray = camera.get_ray(u, v);
colour += raycast(&ray, &world, 0);
}
}
}
colour /= number_of_samples as f32;
It’s pretty straightforward. For each pixel of the target image, we dispatch number_of_samples
rays with some random offset, and then combine the result to get a final colour for that pixel.
After adding Rayon, the main loop now looks more like this:
// launch parallel iterator
let mut colour: Vector3<f32> = (0..number_of_samples).into_par_iter()
.map(|_sample| {
// need a new rng for each thread
let mut rng = thread_rng();
let u = (x as f64 + rng::gen<f64>()) as f32 / image_width as f32;
let v = (y as f64 + rng::gen<f64>()) as f32 / image_height as f32;
let ray = camera.get_ray(u, v);
// Return the colour (note lack of semicolon)
raycast(&ray, &world, 0)
})
.sum();
colour /= number_of_samples as f32;
We had to drop the for
-loop and convert it to an iterator/lambda-style syntax, but that’s the big code change.
It wouldn’t compile, complaining that a lot of my raytracer types (materials, geometries, etc.) didn’t have the Sync
trait. In Rust, think of traits like you would interfaces in other languages. In the specific case of Sync
, it’s a “marker” trait, which tells the compiler that it is safe to share a reference to an instance of the type between threads.
That’s as simple as changing something like this:
pub trait Material {
fn scatter(&self, r_in: &Ray, rec: &HitRecord, attenuation: &mut Vector3<f32>, scattered: &mut Ray) -> bool;
}
To something like this:
pub trait Material : Sync {
fn scatter(&self, r_in: &Ray, rec: &HitRecord, attenuation: &mut Vector3<f32>, scattered: &mut Ray) -> bool;
}
The whole commit was only 15 lines of code, and now I have a multithreaded raytracer. Is it any faster? Well, on my mid-2012 Macbook Pro, it took a large scene from 303 seconds to only 93 seconds to render. That’s a pretty good return on investment!
I’ll be continuing with this raytracer in the future, but for now I’m pretty pleased with how Rayon has worked out for me.
And since you can’t be expected to read an entry about a raytracer without seeing a picture: