We have created a Proof Of Concept (POC) to transform DOS-style C programs into web services for the French National Museum of Natural History. This very first article of our new “Case studies” series explains what we did and why.
At TailorDev, we are polyglot developers and scientists: we are comfortable with numerous programming languages, and we use our Le lab sessions to play with new programming languages and alternative paradigms. Our respective backgrounds also help when we talk with researchers, because we understand the underlying science as well as their needs.
In the following, we describe the xper-tools project, carried out in collaboration with a research team (the LIS team) hosted at the French National Museum of Natural History in Paris. The XPER tools are a set of very old programs (~30 tools), written in C, which are used for taxonomy purposes. By old, we mean almost 30 years old. You read that right: these tools are older than the first C standard published by ANSI!
VOID main(argc, argv)
int argc;
char **argv;
{
    VOID loadkb();
    VOID freekb();
    VOID help();
    VOID evalxp();
    char *gets();

    fprintf(stderr, "EVALXP V1.02 (02/08/1987) J. LEBBE & R. VIGNES.\n");
    ...
}
Even though these programs work pretty well and are still used on a daily basis, they have some limitations. First, they were written for MS-DOS and therefore require a rather old computer to run. This leads to two more issues: not everyone can easily use them, and it is nearly impossible to interface them with other software.
We were asked to build a Proof Of Concept (POC) to transform these programs into web services in a week (it was a nice-to-have at the end of a bigger project we will likely present in another blog post). Challenge accepted!
Analysis
We started by analyzing the different programs and picked a first one to use in our POC. The source code of these numerous tools also bundles different Makefiles and some documentation. Luckily, these programs are well written, even though some parts are cryptic. All tools are designed to be run from the command line, use the same set of data as input (knowledge bases), and some of them accept options (flags) with a DOS-like syntax such as /B. In addition, all programs respond to the /H help option, providing useful information about each program:
$ bin/chkbase
CHKBASE V1.06 (22/05/1988) J. LEBBE & R. VIGNES.
Syntax: CHKBASE name-of-base [/H] [/V]
/H Help
/V Verbose mode
Nom de fichier absent
Every time we work for/with a customer, we make sure that what we produce is easily reusable afterwards. In this context, we designed the POC as the foundation of a production-ready software, which could leverage all the existing programs. Hence, we decided to focus on two main tasks:
- being able to compile and run the programs on different platforms;
- proposing a unified solution to expose the programs over HTTP.
Hello Autotools!
Instead of dealing with many Makefiles and other files to build the different tools, why not use a common tool that would do most of the job for us? Wouldn't it be super cool if we only had to run make to build all the tools at once? The Autotools (not to be confused with the Autobots) are the solution!
If you do not know what the Autotools are, you may already have installed software from source with the following commands:
$ ./configure
$ make
$ (sudo) make install
The first line executes a shell script to, first, determine whether all requirements are met to build the software, and second, create a Makefile based on a template (Makefile.in). If a mandatory dependency is missing on your system, the script will abort, forcing you to install that dependency. That is very useful for ensuring reproducibility. The configure script is not written by hand, but generated by autoconf from yet another template file named configure.ac:
AC_INIT([xper-tools], [1.0.0], [author@example.org])
AM_INIT_AUTOMAKE # use `automake` to generate a `Makefile.in`
AC_PROG_CC # require a C compiler
AC_CONFIG_FILES([Makefile]) # create a `Makefile` from `Makefile.in`
AC_OUTPUT # output the script
The Makefile.in template is also generated thanks to automake and a
Makefile.am template. That is also why we had to use the AM_INIT_AUTOMAKE
directive in the configure.ac file above.
A Makefile.am template usually starts by defining the layout of the project, which should be foreign if you are not using the standard layout of a GNU project (which is likely the case). In the example below, we provide global compiler flags with the AM_CFLAGS and AM_LDFLAGS directives. Next, we tell automake that the Makefile should build the different programs using the bin_PROGRAMS directive:
# Makefile.am
AUTOMAKE_OPTIONS = foreign
# Global flags
AM_CFLAGS = -W -Wall -ansi -pedantic
AM_LDFLAGS =
# Target binaries
bin_PROGRAMS = chkbase \
               makey \
               mindescr
...
The bin prefix tells automake to “install” the listed files into the directory defined by the variable bindir, which points to /usr/local/bin by default (/usr/local being the “prefix” directory). The PROGRAMS suffix is called a primary and tells automake what properties the listed files have; for instance, PROGRAMS are compiled. Hence, we must tell automake where to find the source files (we also add per-program compilation flags):
# Makefile.am
...
# -- chkbase
chkbase_CFLAGS = -D LINT_ARGS
chkbase_SOURCES = xper.h det.h loadxp.c detool.c chkbase.c
By adding more similar lines to the Makefile.am, we can support all the
existing programs, leveraging a simple and uniform way to build all the tools.
Now that the configuration templates/files have been written, we can use the Autotools to generate the ready-to-use files. Let’s start with the configure script:
$ autoreconf --verbose --install --force
Various files have been generated, but the most important one is the configure
script, which will be useful to generate the final Makefile. You can pass
some options to this script such as --prefix to specify the prefix directory.
For instance, to install all the files into your current directory, you could
run:
$ ./configure --prefix=$(pwd)
We can run make to compile all the tools at once, and make install to
“install” the binaries into the <PREFIX>/bin folder. But we also get a
distribution solution for free by using make dist. This target builds a
tarball of the project containing all of the files we need to distribute. End
users could download this archive and run the commands below without having to
worry about the Autotools:
$ ./configure
$ make
$ (sudo) make install
After successfully porting one tool to this new(-ish) build system, we wrote a procedure to port the other programs, and we tested it by asking someone else to port another program. Fortunately, compiling these programs was not too difficult once we figured out which encoding was used (hello CP 850), found all the required header files, and performed minor code changes, such as adding proper exit codes and removing a case '/': line used for parsing the (DOS-style) options, because it conflicted with UNIX absolute paths.
Naturally, we added some smoke tests to ensure the compiled binaries behaved correctly (based on the outputs given by the old computer in the research team’s lab), and we automated the building and testing phases with GitLab CI. With little effort, the different XPER tools can now be compiled and executed on any modern system. The first goal is therefore satisfied, and in the next section we present how we designed an API to expose these tools over HTTP.
RPC-style HTTP API
The different source codes are very application-oriented rather than library-oriented, which prevented us from compiling C libraries that we could have imported into Go or Python. Hence, we decided to “wrap” the C tools to integrate them with the API code. We chose the Python programming language, as it is usually a good choice in academia (and also because it is fast to develop with).
We wrote a generic yet smart wrapper that is able to:
- execute any C program and return its output thanks to the Python subprocess module;
- determine the options of any C program it wraps by invoking the program with the help (/H) flag (cf. the Analysis section);
- validate the supplied options. Since the wrapper knows which options a program can accept, it can easily reject invalid options and prevent invalid calls;
- provide a nice and simple programmatic API:
from api.wrappers import ToolWrapper
makey = ToolWrapper('makey')
cp = makey.invoke('/path/to/data', B=True)
# cp.stdout contains the output result
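This is not the project's actual implementation, but a minimal sketch of how such a wrapper could work on top of subprocess; the binary directory and the DOS-like flag syntax are assumptions:

```python
import subprocess


class ToolWrapper:
    """Illustrative sketch of a wrapper around a compiled XPER tool.

    The real wrapper also discovers and validates options; here we
    only show the invocation part. The `bin_dir` default and the
    `/X`-style flag syntax are assumptions for illustration.
    """

    def __init__(self, name, bin_dir='bin'):
        self.path = f'{bin_dir}/{name}'

    def invoke(self, knowledge_base, **options):
        # Translate keyword options (B=True) into DOS-like flags ("/B")
        # and run the tool, capturing its plain-text output.
        args = [self.path, knowledge_base]
        args += [f'/{flag}' for flag, enabled in options.items() if enabled]
        return subprocess.run(args, capture_output=True, text=True)
```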
Hat tip to Julien for this clever wrapper. Once we were able to call an XPER tool from Python, we started to write an HTTP API using a Python web framework such as Flask. At TailorDev, we like to write pragmatic HTTP APIs, and we always adopt a documentation-first approach. Apiary and API Blueprint are our favorite tools for that.

We drafted an HTTP API that speaks JSON and exposes two main endpoints:
- /knowledge-bases to manage the data for the different XPER tools;
- /tools.run to call the XPER tools.
The former responds to the GET and POST methods to return a set of data (a knowledge base) and to create such knowledge bases, respectively. The latter is a Remote Procedure Call (RPC) endpoint, which is perfectly fine for representing what we want to achieve: calling a function (over HTTP).
Each knowledge base is identified by a
UUID, and the bases
are persisted on the filesystem (which may evolve in the future). With both the
tools ready to be executed and the data on the server, we only had to glue them
thanks to the /tools.run endpoint, which can be triggered by the POST
method:
POST /tools.run/chkbase
Content-Type: application/json
Accept: application/json
{
"knowledge_base_id": "27d99b56-9327-4d28-a69c-31229bf971aa"
}
Nevertheless, the different programs do not output JSON but formatted plain text. To achieve interoperability, keeping the output as is was not conceivable, hence the concept of parsers. Each program gets its own parser, which transforms the plain-text output into a Python data structure we can later serialize however we wish. Using this approach, we were able to write a lot of unit tests based on different realistic outputs, and to keep enough flexibility in the application. We then created a configuration file for the supported tools and their associated parsers and options:
from .parsers import MindescrParser, ChkbaseParser

supported_tools = {
    'mindescr': {
        'parser': MindescrParser(),
        'options': []
    },
    'chkbase': {
        'parser': ChkbaseParser(),
        'options': [
            ('verbose', 'V'),
        ]
    }
}
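As an illustration, a parser for chkbase might start by extracting the banner line we saw in the Analysis section. This is a hedged sketch, not the project's actual parser: the real one handles the full report, and the class internals here are assumptions.

```python
import re


class ChkbaseParser:
    """Illustrative sketch of a tool-specific output parser.

    It turns the formatted plain-text output of the tool into a
    Python data structure that the API can serialize as JSON.
    """

    # Matches banners like "CHKBASE V1.06 (22/05/1988) ..."
    BANNER = re.compile(r'^(?P<tool>\w+) V(?P<version>[\d.]+) \((?P<date>[^)]+)\)')

    def parse(self, stdout):
        result = {'raw': stdout}
        match = self.BANNER.match(stdout)
        if match:
            # Promote the banner fields to structured data.
            result.update(match.groupdict())
        return result
```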
The controller logic behind /tools.run/<name> relies on this configuration to determine which tools (and options) are allowed, as well as which parser to use. When all conditions are met, it runs the program with the knowledge base as input thanks to the wrapper, parses the output with the appropriate parser, and returns the result as a JSON response.
Adding support for a new program only requires writing a parser for the output of that program and updating the configuration. As you may have noticed, the options array contains tuples (option_name, tool_option) that map more meaningful option names (e.g., verbose) to their corresponding tool options (e.g., -V). That way, we can completely hide the program details behind the API, which might also be handy in the future.
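Such a controller could be sketched as follows, assuming Flask. This is illustrative only: the real code resolves the knowledge base on disk and calls the wrapper and parser, which are stubbed out here, and all names are assumptions.

```python
from flask import Flask, abort, jsonify, request

# Stubbed-down registry: the real one maps each tool to its parser
# and allowed options (see the configuration file above).
supported_tools = {'chkbase': {'options': [('verbose', 'V')]}}

app = Flask(__name__)


@app.route('/tools.run/<name>', methods=['POST'])
def run_tool(name):
    config = supported_tools.get(name)
    if config is None:
        abort(404)  # unknown tool: reject the call
    payload = request.get_json()
    kb_id = payload.get('knowledge_base_id')
    if not kb_id:
        abort(400)  # the knowledge base identifier is mandatory
    # The real controller would resolve kb_id to a file, invoke the
    # wrapper, and run the tool-specific parser on its output; we
    # return a canned result here.
    return jsonify({'tool': name, 'knowledge_base_id': kb_id})
```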
We ended this part by writing a small Node.js CLI to demonstrate how this API could be used, but also to give non-technical people a way to consume this API and understand what has been done.
Conclusion
Tackling technical challenges is usually not a problem. In this case, the most interesting yet complicated task was to strike a happy medium between a good software architecture and an easy way to upgrade all the existing XPER tools. All in all, it took us 8 days to design, implement, test and document this solution, including the CLI. We ported three programs to the new build system, and exposed two different tools on the HTTP API.
This project was awesome because we felt really proud of giving a second life to these very old C programs. It was challenging to come up with a production-ready Proof Of Concept that could be easily improved in the future, in a short amount of time.
That is the kind of thing we do, and like to do!