Mixing C++ and Rust for Fun and Profit: Part 2
Of structs and strings

In the beginning, there was C.

That sentence actually could serve as the introduction to a multitude of blog posts, all of which would come to the conclusion “legacy programming conventions are terrible, but realistically we can’t throw everything out and start over from scratch”. However, today we will merely be looking at two ways C has contributed to making language interoperability difficult.

extern "C", but for structs

In the first installment of this series, I mentioned that one blocker to language interoperability is struct layout. Specifically, different programming languages may organize data in structs in different ways. How can we overcome that on our way to language interoperability?

Layout differences are mostly differences of alignment, which means that data is just located at different offsets from the beginning of the struct. The problem is that there is not necessarily a way to use keywords like align to completely represent a different language’s layout algorithm.

Thankfully, there is a solution. In our previous example, we were using Rust and C++ together. It turns out that Rust's #[repr(C)] attribute overrides the default struct layout so that it follows what C does. Given that C++ lays out simple (standard-layout) structs the same way C does, the following code compiles and runs:

// file: cppmodule.cpp
#include <iostream>
#include <cstdint>

struct Foo
{
    int32_t foo;
    int32_t bar;
    bool baz;
};

void foobar(Foo foo)
{
    std::cout << "foo: " << foo.foo
              << ", bar: " << foo.bar
              << ", baz: " << foo.baz
              << '\n';
}

// Rust consumer (separate file)
extern {
    #[link_name = "_Z6foobar3Foo"] pub fn foobar(foo: Foo);
}

#[repr(C)]
pub struct Foo {
    pub foo: i32,
    pub bar: i32,
    pub baz: bool,
}

fn main() {
    let f = Foo{foo: 0, bar: 42, baz: true};
    unsafe {
        foobar(f);
    }
}

My proof-of-concept project polyglot automatically wraps C++ structs with #[repr(C)] (and also does so for enums).

The one major downside of this approach is that it requires you to mark structs that you created in your Rust code with #[repr(C)]. In an ideal world, there would be a way to leave your Rust code as is; however, there is currently no solution that I am aware of that does not require #[repr(C)].
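To make the effect of #[repr(C)] concrete, here is a minimal sketch that prints the size, alignment, and field offsets of the Foo struct from above. The helper name field_offsets is made up for this illustration, and the numbers in the comments assume a typical target where i32 is four bytes with four-byte alignment.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
pub struct Foo {
    pub foo: i32,
    pub bar: i32,
    pub baz: bool,
}

/// Byte offsets of (foo, bar, baz) from the start of a Foo.
/// Hypothetical helper, only here to inspect the layout.
pub fn field_offsets() -> (usize, usize, usize) {
    let f = Foo { foo: 0, bar: 0, baz: false };
    let base = &f as *const Foo as usize;
    (
        &f.foo as *const i32 as usize - base,
        &f.bar as *const i32 as usize - base,
        &f.baz as *const bool as usize - base,
    )
}

fn main() {
    // With repr(C), fields stay in declaration order and the struct is
    // padded up to a multiple of its alignment, exactly as a C compiler would do.
    println!("size = {}, align = {}", size_of::<Foo>(), align_of::<Foo>());
    println!("offsets = {:?}", field_offsets());
}
```

Without the attribute, Rust is free to reorder fields, so the offsets would not be something the C++ side could rely on.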

Arrays, strings, and buffer overflows

Now that we’ve covered structs in general, we can look at the next bit of C behavior that turned out to be problematic: handling a list of items.

In C, a list of items is represented by an array. An array that has n elements of type T in it really is just a block of memory with a size n * sizeof(T). This means that all you have to do to find the kth object in the array is take the address of the array and add k * sizeof(T). This seemed like a fine idea back in the early days of programming, but eventually people realized there was a problem: it’s easy to accidentally access the seventh element of an array that only has five elements, and if you write something to the seventh element, congratulations, you just corrupted your program’s memory! It’s even more common to perform an out-of-bounds write when dealing with strings (which, after all, are probably the most used type of array). Out-of-bounds accesses like these have led to countless security vulnerabilities, including the famous Heartbleed bug, which was an out-of-bounds read (you can see a good explanation of how Heartbleed works at xkcd 1354).
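As a small illustration of the address arithmetic described above, the following sketch computes an element's address by hand and compares it with what the language reports. It then shows a bounds-checked lookup. The function name element_address is an assumption made for this example.

```rust
use std::mem::size_of;

/// Address of element k computed by hand: base + k * size_of::<T>().
/// Nothing here checks k against the length, which is exactly the C behavior.
pub fn element_address<T>(items: &[T], k: usize) -> usize {
    items.as_ptr() as usize + k * size_of::<T>()
}

fn main() {
    let items: [i32; 5] = [10, 20, 30, 40, 50];

    // The manual arithmetic agrees with the address of items[3].
    assert_eq!(element_address(&items, 3), &items[3] as *const i32 as usize);

    // Index 6 does not exist. A checked lookup reports that instead of
    // reading or writing whatever happens to live past the end of the array.
    assert_eq!(items.get(6), None);
    assert_eq!(items.get(3), Some(&40));
}
```

The unchecked arithmetic happily produces an address for k = 6 as well; it is the missing length check, not the arithmetic, that makes the C approach dangerous.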

Eventually, people started deciding to fix this. In languages like Java, D, and pretty much any other language invented in the last 25 years or so, strings (and arrays) are handled more dynamically: reading from or writing to a string at an invalid location will generally throw an exception; staying in bounds is made easy by the addition of a length or size property, and strings and arrays in many modern languages can be resized in place. Meanwhile, C++, in order to add safer strings while remaining C-compatible, opted to build a class std::string that is used for strings in general (unless you use a framework like Qt that has its own string type).

All of these new string types are nice, but they present a problem for interoperability: how do you pass a string from C++ to Rust (our example languages) and back again?

Wrap all the things!

The answer, unsurprisingly, is “more wrappers”. While I have not built real-life working examples of wrappers for string types, what follows is an example of how seamless string conversion could be achieved.

We start with a C++ function that returns an std::string:

// file: links.cpp
#include <string>

std::string getLink()
{
    return "https://kdab.com";
}

We’ll also go ahead and create our Rust consumer:

// file: main.rs
mod links;

fn main() {
    println!("{} is the best website!", links::getLink());
}

Normally, we would just create a Rust shim around getLink() like so:

// wrapper file: links.rs
extern {
    #[link_name = "_Z7getLinkB5cxx11v"]
    pub fn getLink() -> String; // ???
}

However, this doesn’t work because Rust’s String is different from C++’s std::string. To fix this, we need another layer of wrapping. Let’s add another C++ file:

// wrapper file: links_stringwrapping.cpp
#include "links.h" // assuming we made a header file for links.cpp above

#include <cstring>

const char *getLink_return_cstyle_string()
{
    // we need to call strdup to avoid returning a temporary object
    return strdup(getLink().c_str());
}

Now we have a C-style string. Let’s try consuming it from Rust. We’ll make a new version of links.rs:

// wrapper file: links.rs
#![crate_type = "staticlib"]

use std::ffi::CStr;
use std::os::raw::{c_char, c_void};

extern {
    #[link_name = "_Z28getLink_return_cstyle_stringv"]
    fn getLink_return_cstyle_string() -> *const c_char;

    // strdup allocates with malloc, so the buffer has to be released with C's free
    fn free(ptr: *mut c_void);
}

pub fn getLink() -> String {
    let cpp_string = unsafe { getLink_return_cstyle_string() };
    let rust_string = unsafe { CStr::from_ptr(cpp_string) }
        .to_str()
        .expect("This had better work...")
        .to_string();
    // Note that since we strdup'ed the temporary string in C++, we have to manually free it here!
    unsafe { free(cpp_string as *mut c_void); }
    return rust_string;
}

With these additions, the code now compiles and runs. This all looks very convoluted, but here’s how the program works now:

  1. Rust’s main() calls links::getLink().
  2. links::getLink() calls getLink_return_cstyle_string(), expecting a C-style string in return.
  3. getLink_return_cstyle_string() calls the actual getLink() function, converts the returned std::string into a const char *, and returns the const char *.
  4. Now that links::getLink() has a C-style string, it wraps it in a Rust CStr, which is then converted to an actual String.
  5. The String is returned to main().
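The steps above can also be tried without a C++ toolchain. In the sketch below, get_link_cstyle is a hypothetical Rust stand-in for getLink_return_cstyle_string(); it hands out a buffer allocated by strdup, just like the C++ wrapper does. The sketch assumes a POSIX-like C library that provides strdup and free, and it releases the buffer with free because strdup allocates with malloc.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_void};

extern "C" {
    // Both functions come from the platform C library (assumed POSIX here).
    fn strdup(s: *const c_char) -> *mut c_char;
    fn free(ptr: *mut c_void);
}

/// Hypothetical stand-in for the C++ wrapper: returns a malloc'ed C string.
fn get_link_cstyle() -> *mut c_char {
    let tmp = CString::new("https://kdab.com").unwrap();
    unsafe { strdup(tmp.as_ptr()) }
}

/// Steps 2 to 5: borrow the buffer as a CStr, copy it into an owned String,
/// then give the buffer back to the allocator that created it.
pub fn get_link() -> String {
    let raw = get_link_cstyle();
    assert!(!raw.is_null(), "strdup failed");
    let owned = unsafe { CStr::from_ptr(raw) }
        .to_str()
        .expect("not valid UTF-8")
        .to_owned();
    unsafe { free(raw as *mut c_void) };
    owned
}

fn main() {
    println!("{} is the best website!", get_link());
}
```

The String is copied out before the buffer is freed, so nothing on the Rust side ever points into memory owned by the C allocator.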

There are a few things to take note of here:

  1. This process would be relatively easy to reverse so we could pass a String to a C++ function that expects an std::string or even a const char *.
  2. Rust strings are a bit more complicated because we have to convert from a C-style string to CStr to String, but this is the basic process that will need to be used for any automatic string type conversions.
  3. This basic process could also be used to convert types like std::vector.
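A rough sketch of the reverse direction and of the vector case might look like the following. The two extern "C" functions are hypothetical stand-ins for C++ functions taking a const char * and a pointer-plus-length pair; they are written in Rust here only so that the example runs on its own. The wrapper names link_length and sum are likewise made up.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Hypothetical stand-ins for C++ functions with C-compatible signatures.
extern "C" fn link_length_cstyle(link: *const c_char) -> usize {
    // SAFETY: the caller guarantees a valid NUL-terminated string.
    unsafe { CStr::from_ptr(link) }.to_bytes().len()
}

extern "C" fn sum_cstyle(items: *const i32, len: usize) -> i64 {
    // SAFETY: the caller guarantees `items` points at `len` readable i32 values.
    let items = unsafe { std::slice::from_raw_parts(items, len) };
    items.iter().map(|&x| i64::from(x)).sum()
}

/// Wrapper that lets user code keep passing a native String.
pub fn link_length(link: String) -> usize {
    // CString::new appends the NUL terminator and rejects interior NUL bytes.
    let c_link = CString::new(link).expect("interior NUL byte");
    // c_link outlives the call, so the pointer stays valid throughout.
    link_length_cstyle(c_link.as_ptr())
}

/// Wrapper that passes a Vec as a pointer and a length.
pub fn sum(items: Vec<i32>) -> i64 {
    sum_cstyle(items.as_ptr(), items.len())
}

fn main() {
    println!("{}", link_length(String::from("https://kdab.com")));
    println!("{}", sum(vec![1, 2, 3]));
}
```

In both wrappers the Rust value stays alive until the call returns, which is what keeps the raw pointers valid; a real C++ callee that wants to keep the data would have to copy it.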

Is this ugly? Yes. Does it suffer from performance issues due to all the string conversions? Yes. But I think this is the most user-friendly way to achieve compatible strings because it allows each language to keep using its native string type without requiring any ugly decorations or wrappers in the user code. All conversions are done in the wrappers.

Implementation

Based on the concepts here, I’ve written a (non-optimal) implementation of type proxying in polyglot that supports proxying std::string objects to either Rust or D. In fact, I’ve taken it a bit further and implemented type proxying for function arguments as well. You can see an example project, along with its generated wrappers, here.

Next up

Interoperability requires lots of wrappers, and as I’ve mentioned, polyglot can’t generate wrappers for anything more complex than some basic functions, structs, classes, and enums. In the next installment of this series, we’ll explore some viable binding generation tools that exist today.
