Text search library in Rust, day 2


I’m rewriting my GNETextSearch C library in Rust.


The plan

Taking some inspiration from Michael Bryan’s article “How to not RiiR”, my plan is to first write a Rust wrapper around GNETextSearch. I’ll try to write an ergonomic Rust API for the library, add a bunch of tests, and then start replacing the C implementation with a Rust one.

Creating a Rust workspace

Before getting started on any code, of course, I needed to create a new Rust project. After creating a new GitHub repository at atdrendel/text-search and cloning it, I followed these instructions to create a Rust workspace. Inside of the workspace, I created two library crates (text-search and text-search-sys) to host, respectively, the Rust text search library and the old C one.

$ cargo new text-search --lib
$ cargo new text-search-sys --lib

# Cargo.toml in the root of the workspace

[workspace]

members = [
  "text-search",
  "text-search-sys",
]

Configuring text-search-sys to build GNETextSearch

My first task was to get GNETextSearch building inside of text-search-sys. I wanted to consume GNETextSearch directly, which meant adding it to the text-search-sys crate. Not being a fan of git submodules, I elected to go with git subtree.

# inside of the workspace's root folder
$ git subtree add --prefix text-search-sys/vendor/GNETextSearch git@github.com:atdrendel/GNETextSearch.git master --squash

Since GNETextSearch is a small library without any dependencies, getting it to build was pretty easy. Apparently, the preferred way to build C dependencies in Rust projects is the cc crate (developed in the cc-rs repository). I added cc to text-search-sys’s list of build dependencies by modifying its Cargo.toml file:

# Cargo.toml in text-search-sys

[package]
name = "text-search-sys"
version = "0.1.0"
authors = ["Anthony Drendel"]
edition = "2018"
links = "GNETextSearch"
description = "Raw bindings to the GNETextSearch C library"
license = "BSD-2-Clause"
build = "build.rs"

[dependencies]

[build-dependencies]
cc = "1.0"

The Rust file designated by the build command (relative to the package root) will be compiled and invoked before anything else is compiled in the package, allowing your Rust code to depend on the built or generated artifacts.

The Cargo Book

So, given that, the next thing I did was to create a build.rs file in the text-search-sys folder. After a bit of trial and error, I ended up with the following implementation for build.rs.

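use std::{
  env,
  path::{Path, PathBuf},
};

fn main() {
  // Cargo sets CARGO_MANIFEST_DIR to the folder containing this crate's Cargo.toml.
  let project_dir = PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap())
    .canonicalize()
    .unwrap();
  // The C sources live in the GNETextSearch folder of the vendored repository.
  let src = project_dir
    .join("vendor")
    .join("GNETextSearch")
    .join("GNETextSearch");

  build_gne_text_search(&src);
}

// Compiles the C sources into a static library that gets linked into this crate.
fn build_gne_text_search(src: &Path) {
  cc::Build::new()
    // Every C file to compile has to be listed explicitly.
    .file(src.join("Set/countedset.c"))
    .file(src.join("String/stringbuf.c"))
    .file(src.join("Tree/ternarytree.c"))
    .file(src.join("UTF-8/tokenize.c"))
    // Folders containing the headers that the C files include.
    .include(src)
    .include(src.join("Set"))
    .include(src.join("String"))
    .compile("GNETextSearch");
}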

I had to specify each C file to compile and then include all of the folders containing required header files. After this, running cargo build in the root folder of the workspace succeeded.
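Listing every C file by hand works for a library this small, but the list has to be kept in sync with the vendored sources. As a rough sketch of an alternative, a build script could walk the source folder and collect the .c files itself. The helper names below (collect_c_sources and visit) are made up for illustration and are not part of GNETextSearch or cc.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects every `.c` file under `dir`.
/// The result is sorted so the build order stays stable between runs.
/// (Hypothetical helper, not part of GNETextSearch or the cc crate.)
fn collect_c_sources(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut sources = Vec::new();
    visit(dir, &mut sources)?;
    sources.sort();
    Ok(sources)
}

/// Walks `dir` depth-first and pushes each `.c` file it finds onto `sources`.
fn visit(dir: &Path, sources: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            visit(&path, sources)?;
        } else if path.extension().and_then(|ext| ext.to_str()) == Some("c") {
            sources.push(path);
        }
    }
    Ok(())
}
```

In build.rs, the resulting list could be handed to cc::Build::files in place of the individual .file calls. The catch is that this compiles every .c file under the folder, so it only makes sense if the vendored folder contains nothing but library sources, which is an assumption here. An explicit list is the more predictable choice.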

Next, I’ll need to generate the Rust bindings for GNETextSearch’s C interface.
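To give a rough idea of the split between the two crates, here is a minimal sketch of a raw C declaration (the kind of item a -sys crate exposes) together with a safe wrapper around it (the kind of function the higher-level crate would offer). GNETextSearch's actual functions aren't shown in this post, so the sketch uses strlen from the C standard library as a stand-in, and c_string_length is a hypothetical name.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Raw declaration of a C function. `strlen` belongs to the C standard
// library and is only a stand-in here; it is not part of GNETextSearch.
// On toolchains older than Rust 1.82, write `extern "C"` without `unsafe`.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper (hypothetical name) that hides the pointer handling.
/// Returns `None` if `text` contains an interior NUL byte, because such
/// a string cannot be represented as a C string.
pub fn c_string_length(text: &str) -> Option<usize> {
    let c_text = CString::new(text).ok()?;
    // SAFETY: `c_text` is a valid NUL-terminated string that outlives the call.
    Some(unsafe { strlen(c_text.as_ptr()) })
}
```

The caller never touches a raw pointer or an unsafe block. Keeping the unsafe code behind a small wrapper like this is what makes an ergonomic Rust API on top of a C library possible.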