blog/content/posts/naive_classes.md
Emin Arslan b9cc39ee7a
2025-02-26 18:41:32 +03:00


+++
title = 'Naive Structs in CL'
summary = 'A look into Common Lisp, what I have learnt from SICP, and a naive struct implementation.'
date = 2025-01-24T23:53:38+03:00
draft = false
+++

When I first started learning Common Lisp, I didn't know what I was getting into. I thought it would just be a fun adventure, maybe a couple weeks of fun, in and out.

Alas, it was not so. I was stunned by the sheer amount of power a couple of features - features that aren't even that crazy by themselves, in retrospect - could provide to a programming language. It was unlike anything I'd ever seen.

Experienced lispers, of course, should know exactly what I'm talking about: homoiconicity and true macros.

I was in love. I didn't want to admit it, however. Perhaps this is overly dramatic, but our love was a forbidden one. I didn't want to be seen using a language no-one uses. I knew that for the sake of having a career, it was best that I stick to programming languages everyone else was using: C, C++, Java, and so on. After all, the millions of people using these languages couldn't be wrong, could they?

In the end, I couldn't do it. I was too weak, or perhaps Lisp's allure too strong. I gave in, and installed SBCL, Emacs and SLIME once more. Thus, I was once again in the vicinity of divinity.

Then, I started working through an absolute classic: Structure and Interpretation of Computer Programs. A book about programming, using Scheme (a dialect of Lisp) as its main language. Not Common Lisp, but a Lisp nonetheless. Scheme has its own goodies too, after all: a hygienic macro system, a thinner standard library (although perhaps a little too thin), proper tail calls being required by the standard (though most CL implementations optimize tail calls anyway)...

In this post, I will ramble/talk about data abstraction and the greatness of macros. I will implement a small, very simple, no-inheritance object system built out of nothing but cons cells and a handful of macros. This system will definitely not be as complete as the Common Lisp Object System. Its only purpose is to demonstrate that such a thing is possible.

I will assume that you know a little bit about Lisp code, or - at the very least - you are willing to try to follow along anyway.

I will not be providing a full introduction to Lisp, but please don't be discouraged.

What is a cons?

Simply put, a cons is a pair. Just a pair of two objects. The first element is called the car, second element the cdr (the names are this way purely for historical reasons). In C terms, a cons is effectively equivalent to:

struct cons {
    OBJECT car;
    OBJECT cdr;
};

Except, with Lisp syntax. So we would make a new cons with the cons function, like (cons 1 2) making a "cons cell" that contains 1 and 2.

The important thing here is that this satisfies the closure property: one (or both) of the elements can themselves be cons cells.

So you could do: (cons 1 (cons 2 (cons 3 nil))) (nil denotes an empty list). You may notice that this structure is suspiciously similar to a singly linked list. The car of a cons is the list's first element, and the cdr gets you the rest of the list. Indeed, this is how lists are implemented in Lisp. They are singly linked lists.
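A minimal sketch of the above, meant to be typed at the REPL. The variable names `*pair*` and `*numbers*` are only illustrative and are not used anywhere else in this post:

```lisp
;; A single cons cell: car is the first element, cdr the second.
(defparameter *pair* (cons 1 2))
(car *pair*) ; => 1
(cdr *pair*) ; => 2

;; Chaining conses through the cdr builds a proper list.
(defparameter *numbers* (cons 1 (cons 2 (cons 3 nil))))
(car *numbers*)                ; => 1
(cdr *numbers*)                ; => (2 3)
(equal *numbers* (list 1 2 3)) ; => T
```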

Cons cells are deceptively simple. You can build any number of interesting structures out of them: trees, alists, plists, etc. In theory, we should be able to implement, say, a C-style struct with this as well.
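To make those structures a bit more concrete, here is a small sketch using only standard functions (`assoc`, `getf`). The variable names are made up for the example:

```lisp
;; An alist is a list of (key . value) conses.
(defparameter *alist* (list (cons 'x 1) (cons 'y 2)))
(cdr (assoc 'y *alist*)) ; => 2

;; A plist alternates keys and values in one flat list.
(defparameter *plist* (list :x 1 :y 2))
(getf *plist* :y) ; => 2

;; A tree is just a list whose elements may themselves be lists.
(defparameter *tree* (list 1 (list 2 3) (list 4 (list 5))))
(car (cdr *tree*)) ; => (2 3)
```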

Implementing structures

Think about what an object is, for a bit. An object is an instance of a class, and a class itself is just an interface for accessing parts of that object, and manipulating it in various ways.

This means that, in theory, you can have any kind of representation "under the hood", as long as your language provides uniform ways to access, manipulate and modify these objects. In C, structs are just descriptions of how to extract information from a particular array of bytes. As I said, however, as long as you're consistent about how you store and retrieve the information in a struct, you can implement it however you want.

Notably, since we're using Common Lisp, all accesses to a field of an object always look like function calls anyway. This is useful for a lot of reasons, but in this particular case, it's useful mainly because field accesses aren't (or don't have to be) a special operation provided by the programming language. They absolutely can be defined as regular functions (except for setters, which we will hook into setf with defsetf, but that's not that much different, promise).

That property is exactly what we will rely on here. We can make a macro, say, mydefstruct, that takes a name for our struct and a list of its fields. Then, if this macro defined a function to create that struct, and accessor functions (getter/setter for those of you in the Java world) for all of its fields, that would be a good-enough implementation of structs. Client code does not have to care that your structures are all linked lists under the hood; it behaves as if these structs were just an integral part of the language.

Then, we could implement methods by dispatching on the type of a method's first argument, and calling the appropriate implementation for that type. Voila! Object oriented programming with very little language support. More sophisticated systems can also be built in a similar manner, e.g. read-only fields could be achieved by having the macro skip the setter for certain fields based on its input. But that's beyond the scope of this blog post.
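Just to give a rough idea of what such dispatch could look like, here is a sketch. It assumes every object is a list whose first element is a symbol naming its type. The names `make-circle`, `make-square` and `area` are hypothetical and not part of the struct system built in this post:

```lisp
;; Hypothetical objects: plain lists whose first element names their type.
(defun make-circle (radius) (list 'circle radius))
(defun make-square (side) (list 'square side))

;; A "method": look at the tag, then run the code for that type.
;; ecase signals an error if the tag matches none of the clauses.
(defun area (shape)
  (ecase (car shape)
    (circle (* pi (expt (nth 1 shape) 2)))
    (square (expt (nth 1 shape) 2))))

(area (make-square 3)) ; => 9
(area (make-circle 1)) ; => the value of pi
```

A real system would let each struct register its own implementation instead of listing every type in one function, but the principle is the same.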

First things first

Let's first define a few helper functions for our implementation. For one, we need an easy way to get the symbol for a struct's constructor function:

(defun constructor-name (sym)
    (intern (concatenate 'string "MAKE-" (string sym))))

We need similar helpers for each slot's accessor (which will be used both for getting the value and for setting it with setf) and its setter (which will only be used to implement the setf form with defsetf).

(defun accessor-name (name sym)
  (intern (concatenate 'string (string name) "-" (string sym))))

(defun setter-name (name sym)
  (intern (concatenate 'string "SET-" (string name) "-" (string sym))))
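If the combination of `string`, `concatenate` and `intern` looks opaque, this sketch shows each step in isolation. It assumes the default readtable case, where the reader upcases symbol names:

```lisp
;; The reader upcases symbol names by default, so 'point is named "POINT".
(string 'point) ; => "POINT"

;; Concatenating strings and interning the result yields a symbol
;; in the current package.
(intern (concatenate 'string "MAKE-" (string 'point))) ; => MAKE-POINT

;; This is why the prefixes are written in upper case: a lower-case
;; "make-" would produce a differently named symbol.
(symbol-name (intern (concatenate 'string "make-" (string 'point))))
; => "make-POINT"
```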

Now we can write functions for defining:

  • the constructor, with a function that takes a name, and a list of slots, and returns a form that will define the constructor when evaluated:
(defun constructor (name slots)
  `(defun ,(constructor-name name) ,slots
     (list ,@slots)))
  • the slot accessors. This one will return a list of forms, that will each define an accessor for one of the slots.
(defun accessors (name slots)
  (loop for slot in slots
	for i upfrom 0 collect
	`(defun ,(accessor-name name slot) (obj)
	   (nth ,i obj))))
  • the slot setters. Note that these won't actually be called directly by the users of our library. In Common Lisp, we don't really use separate functions for setters. For example, if you can access a field through (point-x my-point-object), then you usually don't define a set-point-x function, but rather use (setf (point-x my-point-object) some-value) to set it to some-value. setf is another macro, which expands this form into a call to the appropriate setter. This provides a unified interface for accessing fields, no matter what the underlying implementation is. Anyway, here's my function for defining the setters:
(defun setters (name slots)
  (loop for slot in slots
	for i upfrom 0 collect
	`(defun ,(setter-name name slot) (obj val)
	   (setf (nth ,i obj) val))))
  • finally, the aforementioned defsetf forms:
(defun setfers (name slots)
  (loop for slot in slots collect
	`(defsetf ,(accessor-name name slot)
	     ,(setter-name name slot))))
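To see how the short form of defsetf ties a getter and a setter together, here is a small standalone sketch. The names `first-item` and `set-first-item` are hypothetical stand-ins for the generated accessor and setter:

```lisp
;; Hypothetical accessor and setter, shaped like the generated ones.
(defun first-item (obj) (nth 0 obj))
(defun set-first-item (obj val) (setf (nth 0 obj) val))

;; Short form of defsetf: (setf (first-item x) v) now expands
;; into a call to (set-first-item x v).
(defsetf first-item set-first-item)

(defparameter *items* (list 1 2 3))
(setf (first-item *items*) 10) ; => 10
*items*                        ; => (10 2 3)
```

Note that the setter has to return the new value, since that is what a setf form is expected to evaluate to. Ending the setter with a setf of its own takes care of that.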

As Common Lisp is a highly interactive language, we can try each of these functions in the REPL with very little effort:

CL-USER> (constructor 'point '(x y))
(DEFUN MAKE-POINT (X Y) (LIST X Y))

CL-USER> (accessors 'point '(x y))
((DEFUN POINT-X (OBJ) (NTH 0 OBJ)) 
 (DEFUN POINT-Y (OBJ) (NTH 1 OBJ)))

CL-USER> (setters 'point '(x y))
((DEFUN SET-POINT-X (OBJ VAL) (SETF (NTH 0 OBJ) VAL))
 (DEFUN SET-POINT-Y (OBJ VAL) (SETF (NTH 1 OBJ) VAL)))

CL-USER> (setfers 'point '(x y))
((DEFSETF POINT-X SET-POINT-X) 
 (DEFSETF POINT-Y SET-POINT-Y))

Wow, the code generated by our functions looks good! Now we just need a macro to tie it all together, and we will have a pretty good first implementation for structs.

As you can see, there isn't any trick to the macro itself, it just takes its (unevaluated) arguments, and generates the code that will be evaluated by calling the functions we defined earlier. (Note the use of ,@ to splice the lists returned by accessors, setters, and setfers).

(defmacro mydefstruct (name &rest slots)
  `(progn
     ,(constructor name slots)
     ,@ (accessors name slots)
     ,@ (setters name slots)
     ,@ (setfers name slots)))
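The difference between , and ,@ is easiest to see on a tiny example. This is only an illustration; `*forms*` is a made-up variable standing in for the lists our helper functions return:

```lisp
;; A list of forms, like the ones returned by accessors or setters.
(defparameter *forms* '((print 1) (print 2)))

;; A plain comma inserts the whole list as a single element.
`(progn ,*forms*)  ; => (PROGN ((PRINT 1) (PRINT 2)))

;; Comma-at splices the elements of the list into the surrounding form.
`(progn ,@*forms*) ; => (PROGN (PRINT 1) (PRINT 2))
```

Only the second result is a valid progn body, which is why the macro splices everything except the single form returned by constructor.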

As you may have noticed, perhaps the greatest strength of CL macros is that they are, themselves, written in Lisp. That is why we were so easily able to approach the problem of defining structs as a problem of generating code: we wrote regular Lisp code that generates the code we want, and finally put it in a macro to achieve our goal.
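One practical caveat, which only matters if all of this lives in a single file that goes through compile-file rather than being typed into the REPL: the helper functions are called while the macro is being expanded, so they must already be defined at compile time if the same file also uses mydefstruct. A common way to arrange that is eval-when. A sketch with just one of the helpers:

```lisp
;; Make the helper available at compile time as well as load time,
;; so a macro used later in the same file can call it while expanding.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun constructor-name (sym)
    (intern (concatenate 'string "MAKE-" (string sym)))))

(constructor-name 'point) ; => MAKE-POINT
```

The other helpers would need the same treatment. Putting them in a separate file that is loaded first should work just as well.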

A demonstration of how to define a struct with this, and use it:

(mydefstruct point x y)
;; this doesn't interact with CL's actual object system, but is still cool.
(defvar origin (make-point 0 0))
(point-x origin)
;; => 0

;; modifying an object
(defvar point1 (make-point 10 100))
(setf (point-x point1) 100) ;=> 100
(setf (point-y point1) 200) ;=> 200
(point-x point1) ; => 100
(point-y point1) ; => 200

Voila. Very, very simple system to define structs, without needing any primitive for combining objects other than a pair.

These structs can have any number of fields, mind you. I just chose a simple one to demonstrate.

Type tags

One problem with this current implementation is that objects have no type information at all. This means you could pass any list with two elements, or any other struct with two fields, as a point in the above example. This can be useful in some cases, I'm sure. A broken clock is right twice a day after all... but in general, I think it's safe to say that this behaviour is undesirable.
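To see the problem in action, here is the untagged accessor for x written out by hand (it matches the REPL output from earlier), followed by two calls that really should not work:

```lisp
;; The untagged accessor, as generated by the first version of accessors.
(defun point-x (obj) (nth 0 obj))

;; Neither of these is a point, yet both are accepted without complaint.
(point-x (list "apples" "oranges")) ; => "apples"
(point-x (list 1 2 3 4))            ; => 1
```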

Instead, we want our getter and setter functions to give an error when passing a value that is not a struct of the expected type. This will prevent many bugs by making sure all type conversions are explicit, and no type is implicitly cast into another unrelated type without the programmer's knowledge.

Of course, it would also help for a programmer to be able to inspect what type a particular object belongs to. This is helpful because you might need to inspect such an object at the REPL, and it might also be helpful in case you need a function to be able to return several different types of objects, and check which one was actually returned.

There are many ways to implement this. We will be using a very simple solution: type tagging. Essentially, we keep an extra element in the list underlying a struct - a tag that indicates its type. We could store this as a string, or perhaps a unique integer generated every time a struct is defined. However, since we're using Common Lisp, I think it's perfectly appropriate for us to use a symbol as the tag. (Don't worry, unlike string comparisons, this shouldn't incur much of a performance penalty. Reading the same symbol name in the same package always yields the same object, so comparing two tags with eql is just a pointer comparison.)
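A quick illustration of why symbols are cheaper to compare than strings, using only standard functions:

```lisp
;; Two occurrences of the same symbol name in the same package
;; are read as the very same object, so an identity check is enough.
(eq 'point 'point)  ; => T
(eql 'point 'point) ; => T

;; Two separately created strings with equal contents are distinct
;; objects; comparing them means comparing their characters.
(eq (copy-seq "point") (copy-seq "point"))    ; => NIL
(equal (copy-seq "point") (copy-seq "point")) ; => T
```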

So, we just need to modify the code such that the first element of the list is the type tag.

First, the constructor:

(defun constructor (name slots)
  `(defun ,(constructor-name name) ,slots
     (list ',name ,@slots)))

No groundbreaking changes, really. We just add the name of the struct as a symbol to the front of the list. This means that the first field of the struct now begins at index 1, however, so we need to update our accessors and setters to match that. Since we also want our functions to perform type checking at runtime, we should also add code for that into the generated accessors and setters.

Since every struct created with our new constructors contains type information, I think it would be nice to add a helper function to get the type of an object. This way if we change the implementation later we can just change this function without having to change every piece of code that checks for type.

(defun obj-type (obj)
  (car obj))

Since we're adding type checks, we may as well put in a little more effort and give the user a nice error message telling them what type was expected, and what type was given. For that, we'll make another helper:

(defun make-error-message (real expected)
  (format nil "Accessor called on wrong type! Expected ~a but found ~a"
	  expected real))

Then we can add the type checks to our existing functions, like so:

(defun accessors (name slots)
  (loop for slot in slots
	  for i upfrom 1 collect
	  `(defun ,(accessor-name name slot) (obj)
	    (if (eql (obj-type obj) ',name)
	       (nth ,i obj)
	       (error (make-error-message (obj-type obj) ',name))))))
(defun setters (name slots)
  (loop for slot in slots
	  for i upfrom 1 collect
	  `(defun ,(setter-name name slot) (obj val)
	    (if (eql (obj-type obj) ',name)
	       (setf (nth ,i obj) val)
	       (error (make-error-message (obj-type obj) ',name))))))

We don't really need to change anything else.
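To check that the type tags do their job, here is a standalone sketch. It repeats the two helpers from above and writes out by hand what the new accessors function generates for slot x of a struct named point:

```lisp
(defun obj-type (obj)
  (car obj))

(defun make-error-message (real expected)
  (format nil "Accessor called on wrong type! Expected ~a but found ~a"
          expected real))

;; Hand-written copy of the generated, type-checked accessor.
(defun point-x (obj)
  (if (eql (obj-type obj) 'point)
      (nth 1 obj)
      (error (make-error-message (obj-type obj) 'point))))

;; A properly tagged point works as before.
(point-x (list 'point 3 4)) ; => 3

;; A list tagged as something else is rejected; handler-case is used
;; here only to capture the error message as a string.
(handler-case (point-x (list 'circle 5))
  (error (e) (princ-to-string e)))
;; => "Accessor called on wrong type! Expected POINT but found CIRCLE"
```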

With that, our type-checking struct implementation is reasonably usable. At least for a primitive system built out of a macro and some lists, it's actually fairly good.

The only thing left is to wrap it in a package that exports only mydefstruct. These two forms go at the top of the file, before the definitions:

(defpackage :my-structures
  (:use :cl)
  (:export #:mydefstruct))

(in-package :my-structures)

There we go. Now our package is very nicely encapsulated, and only the useful stuff is exported out of our package.

Conclusion

Common Lisp's macros are truly amazing. We just created an entire system for automatically defining new abstractions over data - and it looks, behaves and feels just like it is part of the language, rather than something we added. (Apart from being rather barebones, and not providing much in the form of printing and reading our structures, this is actually fairly similar to Common Lisp's standard defstruct in terms of what it provides. Of course the standard defstruct is much better than this, but that's beside the point.)

Side note: Unfortunately, it doesn't really interact with the Common Lisp Object System at all. This is to be expected, I'm just writing this to prove a point and to demonstrate what I've learnt so far from SICP, not to replace something that needs no replacing.

However, even though this system is not as good as the standard tools for data abstraction, I think it's still a great demonstration of the language's strengths.

The really stunning part for me is that it was so easy to do. Too easy. I actually hesitated to write about it on my blog, because it wasn't really a challenge. I created a replacement system for the language's standard way of creating data structures, and it was so easy to do that I hesitated to write about it. It's an inferior replacement, sure, but it's still perfectly functional.

It amazes me to no end that you can straight-up rewrite a significant portion of the language in itself, and you can just change it however you want to.

I hope you were entertained by this attempt at reinventing the wheel. I certainly enjoyed making it.