Questioning libraries

If there’s one thing that defines modern programming, it’s the reuse of code that’s already written, usually packaged in so-called “libraries”. Besides making life easier by reducing the amount of code one has to write and maintain, this also makes the code that uses libraries easier to understand, since it abstracts over the details and focuses on what the code actually does.

The idea of abstraction has issues of its own, which I might write about later. Here I’ll discuss the most popular mechanism for code reuse today and some of its disadvantages.

What is a library?

In this article, a library is an independent piece of code, defining named functions and other programming elements as its interface, meant to be used by other programs or libraries. Examples range from the standard C library (libc), perl’s Carp module, and the is-even package for javascript, to frameworks like sinatra and react.

To make the idea clearer, here is the procedure by which all libraries are used, modulo the exact naming and the time at which each step is done:

The programmer has the library’s code, whether downloaded from the internet, copied from a CD, or written himself.
He, or some program he runs, installs the library in a place the compiler or interpreter (from now on “the language”) knows to look.
In his program, he tells the language that he uses the library, and then freely uses whatever functions, types, etc. the library defines.
When compiling or interpreting, the language treats the code in the library like any other, doing what it says at appropriate points during the program.
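For a concrete picture, the four steps above can be sketched in python. This is only an illustration under stated assumptions: the library `greetlib` and its `greet` function are made up, and writing the library into a temporary directory stands in for downloading and installing it.

```python
import importlib
import pathlib
import sys
import tempfile

# Step 1: the programmer has the library's code. Here it is an inline
# string; "greetlib" is a hypothetical library used only for illustration.
LIBRARY_SOURCE = '''
def greet(name):
    return "Hello, " + name + "!"
'''

# Step 2: install it somewhere the language knows to look. For python,
# that is any directory listed in sys.path.
install_dir = pathlib.Path(tempfile.mkdtemp())
(install_dir / "greetlib.py").write_text(LIBRARY_SOURCE)
sys.path.insert(0, str(install_dir))
importlib.invalidate_caches()  # make sure the freshly written file is noticed

# Step 3: tell the language that the program uses the library
# (equivalent to writing "import greetlib" at the top of the file).
greetlib = importlib.import_module("greetlib")

# Step 4: the library's code now runs like any other code in the program.
message = greetlib.greet("world")
print(message)  # Hello, world!
```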

The problem

Most criticism of libraries revolves around the first step above: should you use code written by others, and how do you trust its author? That sort of issue, however, isn’t specific to libraries but applies to all kinds of dependencies, which is a topic of its own.

Another issue when making libraries is scope: what should be included, and what should be split into a separate library? This question is as complex as the first, and it applies not only to libraries but also to programs, network services, and even parts of programs. It also has more to do with our understanding of the library than with the actual technology, which makes it especially difficult to answer.

One issue I’m particularly interested in is the fact that libraries are strongly dependent on the language being used. This is especially obvious from the description above, where the language is responsible for finding the library, loading it when it’s necessary, and finally executing the code inside it.

The effect of this is that the library must be written in the same programming language it’s used from. This is quite restrictive and duplicates effort: the same kind of library has to be written again for each language, and the number of programming languages only keeps growing.

Solutions and alternatives

Libraries are pretty much ubiquitous now; it’s hard to imagine that much could be changed, let alone improved, in the standard idea. Yet at one point there were no libraries, or even subroutines, and it took a white lie by Doug McIlroy¹ to get programmers to start using subroutines in the 60s.

The ideas here aren’t in wide use, and I can’t guarantee they’ll hold up in practice, only that they’re different and, in my opinion, worth considering.

Binary libraries

Libraries written in some compiled languages, particularly C, are often distributed not as source code but compiled to a binary form, along with “header files” that define the library’s interface.

For example, the extension language tcl distributes a library file, usually called libtcl8.6.so, as well as a header file, tcl.h. When a C programmer wants to use tcl, he includes tcl.h in his source file, and later tells the compiler to link against libtcl.²

Note that the source language of the library file is irrelevant: it could be any language that compiles down to machine code. The only thing specific to C is the header file, and it would be possible to generate an analogous file for a different programming language, or to devise a new standard that all languages can understand.
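Python’s standard ctypes module shows this in practice: it loads a compiled library directly, and a few declarations written in python take the place of the C header. The sketch below calls strlen from the C standard library; it assumes a POSIX-like system where ctypes can locate libc.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library (e.g. "libc.so.6" on Linux).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# These two declarations play the role of the C header, which declares
#     size_t strlen(const char *s);
# Nothing here depends on the language libc was originally written in.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"libraries")
print(length)  # 9
```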

There are two main issues with this approach, however. First, binary interfaces are by nature much stricter than language interfaces, and breaking changes can cause much more drastic problems, which in the worst case might not even be detected as such. Both sides also have to agree on binary formats for all value types, from integers and floats to strings, arrays, and other complex structures. Most important of all, they have to agree on the calling convention in use, that is, how arguments are passed to a function and how its result is returned.
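To see why agreeing on binary formats matters, consider a plain 32-bit integer. This small python sketch uses the standard struct module to lay out the same number in two byte orders, then reads it back with the wrong assumption. Nothing raises an error; the program just quietly gets a different number, which is the kind of breakage that might not be caught as such.

```python
import struct

# The same integer, 1000, as 4 bytes in little-endian and big-endian order.
little = struct.pack("<i", 1000)
big = struct.pack(">i", 1000)
print(little.hex())  # e8030000
print(big.hex())     # 000003e8

# Decoding little-endian bytes as if they were big-endian does not fail;
# it silently produces an unrelated value.
wrong = struct.unpack(">i", little)[0]
print(wrong)  # -402456576
```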

The second issue is that many languages aren’t compiled, at least not in the way C is. Perl, python, tcl and javascript programs and libraries are normally distributed in source form and expect an interpreter for their language to be present on the system. Some of these languages have tools, like python’s pyinstaller, that bundle a program together with its interpreter into a single executable, but these usually only work for full-fledged programs, and an equivalent isn’t guaranteed to exist for other languages.

One possible solution is to adapt the shebang mechanism and allow it to be used within libraries. The library code would be written in the source language, save for the first line, which could look something like this:

#!/usr/bin/perl --library

This would tell the dynamic linker not to treat the file as binary code, but instead to run the program named after the #!, and perhaps use its output as the library code, among other possibilities.
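As far as I know, no dynamic linker works this way today, so the following python function is purely a hypothetical sketch of the first decision such a loader might make: given the first line of a library file, is it ordinary binary code or a script to hand to an interpreter? The function name and the reliance on the `--library` flag are assumptions taken from the example line above, not from any real linker or any real perl option.

```python
import shlex


def library_interpreter(first_line: bytes):
    """Return the interpreter command for a script library, or None.

    Hypothetical: a file whose first line is "#!<interpreter> ... --library"
    is treated as a script library; anything else (such as an ELF binary)
    would be loaded the usual way.
    """
    if not first_line.startswith(b"#!"):
        return None  # not a script: treat as ordinary binary code
    argv = shlex.split(first_line[2:].decode("utf-8", errors="replace"))
    if "--library" not in argv:
        return None  # a script, but one that doesn't declare itself a library
    # A loader might now run argv plus the file's path and use the output.
    return argv


print(library_interpreter(b"#!/usr/bin/perl --library\n"))
# ['/usr/bin/perl', '--library']
```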

Besides the uncertainty about what the interpreter should do, there are other, deeper problems. In particular, many interpreted languages are “dynamically typed”, meaning that it’s not clear from the source code what types a function expects. Some support more complex types, like objects, that can’t be passed directly to other programming languages. Some require special runtimes to function, which could conflict with the runtime of the host language, or perhaps even with the runtimes of other libraries.

Library programs

An alternative approach to reusing code is to write programs that can be executed by other programs. This is the approach most commonly used in shell scripts. It certainly does its job, and it has some advantages regular libraries lack, like being language agnostic.

The code to be reused is packaged as a program that accepts command-line arguments, and perhaps data on standard input, prints the result on standard output (or otherwise does what is expected of it), and finally exits with an exit code: 0 for success and non-0 for failure. This sounds abstract, but to people who use unix utilities it’s second nature. For example, curl is a command that fetches a resource from the internet, given its url. This is how you’d get my homepage in a shell script:³

webpage=$(curl -sL https://nslisica.neocities.org/sw)

While it’s certainly easiest in shell, it can be done in any other language that can start programs and open pipes.

# In tcl
set webpage [exec curl -sL https://nslisica.neocities.org/sw]

# In python
import subprocess
curlproc = subprocess.run(['curl', '-sL', 'https://nslisica.neocities.org/sw'],
                          capture_output=True)
webpage = curlproc.stdout

-- In lua
local fh = io.popen('curl -sL https://nslisica.neocities.org/sw')
local webpage = fh:read('a')
fh:close()
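The examples above ignore the exit code. A small wrapper can make the convention explicit, so that calling a program behaves much like calling a function that may fail. This is a python sketch; `call_program` is a made-up helper name, and the demonstration runs the python interpreter itself so that it doesn’t depend on curl being installed.

```python
import subprocess
import sys


def call_program(argv, stdin_data=b""):
    """Run a program as if it were a function call.

    Arguments go on the command line, input on standard input, and the
    result is whatever the program prints on standard output. Because of
    check=True, a non-zero exit code raises subprocess.CalledProcessError.
    """
    proc = subprocess.run(argv, input=stdin_data,
                          capture_output=True, check=True)
    return proc.stdout


# A tiny "library program": it upper-cases its standard input.
UPPER = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(call_program(UPPER, b"spam"))  # b'SPAM'

# A program that fails: its non-zero exit code becomes an exception.
try:
    call_program([sys.executable, "-c", "import sys; sys.exit(3)"])
except subprocess.CalledProcessError as err:
    print("failed with exit code", err.returncode)  # failed with exit code 3
```

Used with curl, `call_program(['curl', '-sL', 'https://nslisica.neocities.org/sw'])` returns the page as bytes, and an error that curl reports through its exit code, such as an unreachable host, becomes an exception instead of going unnoticed.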

What language is curl written in? Well, it doesn’t matter, as long as it does what you expect it to. It doesn’t even have to be this particular curl: any program that accepts those options and does what we expect will do.

The two main drawbacks of this method are the performance cost of starting a new process, which is relatively minor, especially with the caching features of modern operating systems, and the difficulty of sandboxing applications that use programs this way. Sandboxing is something I might examine in a future article.

Program services

This is the novel idea that inspired me to write this article in the first place. Instead of running a program for each procedure call, as in the curl examples above, the program is started once, and its procedures are executed on demand: the caller writes a request to the program’s standard input, and the program writes the result to its standard output.

To be specific, in the following examples I’ll imagine that I’m using a library called libwork that has the following functions:

string spam(int n): returns a string containing n lines of Spam!.
int length(list l): returns the length of a list.

A call is a series of bencoded values, starting with a string that identifies the function, and terminated by a newline that’s not part of a string.⁴ A return value is encoded the same way, except that the leading string identifying the procedure is missing, and the line may start with a ! to mark an error result. Here’s an example of using libwork interactively:

$ libwork
4:spami5e
30:Spam!
Spam!
Spam!
Spam!
Spam!

6:lengthl1:I2:am1:a4:liste
i4e
5:donno
!23:donno: no such function
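To make the format concrete, here is a python sketch of both halves: encoding a call and decoding a response. It follows the description above (bencoded values, a terminating newline outside any string, a leading ! for errors); the function names are my own, and dictionary keys are sorted as bencode requires. Strings are decoded as bytes, since bencode lengths count bytes rather than characters.

```python
def bencode(value) -> bytes:
    """Encode an int, str/bytes, list or dict as a bencoded value."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)  # the length counts bytes
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # bencode requires string keys, in sorted order
        pairs = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        pairs.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def encode_call(name, *args) -> bytes:
    """A call: the function name as a string, its arguments, then a newline."""
    return bencode(name) + b"".join(bencode(a) for a in args) + b"\n"


def decode_value(data: bytes, i: int = 0):
    """Decode one bencoded value starting at data[i]; return (value, next index)."""
    kind = data[i:i + 1]
    if kind == b"i":
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if kind == b"l":
        items, i = [], i + 1
        while data[i:i + 1] != b"e":
            item, i = decode_value(data, i)
            items.append(item)
        return items, i + 1
    if kind == b"d":
        result, i = {}, i + 1
        while data[i:i + 1] != b"e":
            key, i = decode_value(data, i)
            value, i = decode_value(data, i)
            result[key] = value
        return result, i + 1
    if kind.isdigit():
        colon = data.index(b":", i)
        start = colon + 1
        end = start + int(data[i:colon])
        if end > len(data):
            raise ValueError("truncated string")
        return data[start:end], end  # newlines inside the string are skipped
    raise ValueError(f"invalid bencode at offset {i}")


def decode_message(data: bytes):
    """Decode one newline-terminated response into (is_error, values)."""
    is_error = data.startswith(b"!")
    i = 1 if is_error else 0
    values = []
    while data[i:i + 1] != b"\n":  # only a newline *between* values ends it
        value, i = decode_value(data, i)
        values.append(value)
    return is_error, values


print(encode_call("spam", 5))  # b'4:spami5e\n'
print(decode_message(b"!23:donno: no such function\n"))
# (True, [b'donno: no such function'])
```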

Of course, libwork isn’t meant to be used interactively; it’s meant to be used by programs. That’s a little bit awkward, since, naturally, no programming language has built-in support for a standard I just made up. I’ll write more here when I’ve figured something out.

I didn’t tell you to drink the Kool-Aid

No, don’t start converting all your libraries into good old unix programs, or into whatever I suggested in the previous section. There are things “normal” libraries are good for. Sometimes you really want something that works only in a single language, or something is so simple that you save no effort by making it language independent.

Having said that, if this article exposed you to some new ideas, then I consider it a great success. Here are some other projects that are related to this idea:

ucspi-tcp: the easiest way to write tcp client-server applications. tcpserver is a program that does all the boring parts (opening sockets and accepting connections) and runs your program with the connection on its stdin and stdout.
Any book on unix philosophy, software tools, etc.
… write more here

I can’t for the life of me find the source, but it’s said that in the 60s, programmers at Bell Labs refused to split programs into subroutines because switching between them was too costly. This made the programs hard to read and maintain, so much so that one day Doug McIlroy told the programmers that subroutines were much faster now and they could start using them. They did start writing subroutines, and when they tested the programs, they found that they did in fact slow down, but by that time they were hooked and continued programming with subroutines.↩︎
This is called “linking” and can be either “static”, where the code of the library is included in the program itself, or “dynamic”, where the program just holds instructions for the “dynamic linker” to execute when the program is being started. Specifics of the mechanism are irrelevant to the topic at hand.↩︎
curl is the name of the program, and https://nslisica.neocities.org/sw is the url. Curl has a great manual page where you can look up what the two switches do, but essentially, -s disables the progress bar it shows by default, and -L tells it to follow http redirects.↩︎
The only way a newline can appear inside a bencoded value is as part of a string, and that won’t cause an issue: the string’s length is known in advance, so the newline is read as part of the string rather than treated as the terminator.↩︎