diff --git a/content/posts/naive_classes.md b/content/posts/naive_classes.md
new file mode 100644
index 0000000..2f1daf1
--- /dev/null
+++ b/content/posts/naive_classes.md
@@ -0,0 +1,376 @@
++++
+title = 'SICP takeaways'
+summary = "A look into Common Lisp, what I've learnt from SICP, and a naive struct implementation."
+date = 2025-01-24T23:53:38+03:00
+draft = false
++++
+
+When I first started learning Common Lisp, I didn't know what I was getting
+into. I thought it would just be a fun adventure, maybe a couple of weeks of fun,
+in and out.
+
+Alas, it was not so. I was stunned by the sheer amount of power a couple of
+features - features that aren't even *that* crazy by themselves, in retrospect -
+could provide to a programming language. It was unlike anything I'd ever seen.
+
+Experienced lispers, of course, should know exactly what I'm talking about: homoiconicity
+and ***true macros***.
+
+I was in love. I didn't want to admit it, however. Perhaps this is overly dramatic, but
+our love was a forbidden one. I didn't want to be seen using a language no-one uses.
+I knew that for the sake of having a career, it was best that I stick to the programming
+languages everyone else was using: C, C++, Java, and so on. After all, the millions
+of people using these languages couldn't be wrong, could they?
+
+In the end, I couldn't do it. I was too weak, or perhaps Lisp's allure too strong.
+I gave in and installed SBCL, Emacs and SLIME once more. Thus, I was once again
+in the vicinity of divinity.
+
+Then, I started working through an absolute classic: Structure and Interpretation
+of Computer Programs, a book about programming that uses Scheme (a dialect of Lisp)
+as its main language. Not Common Lisp, but a Lisp nonetheless.
+Scheme has its own goodies too, after all: a hygienic macro system,
+a thinner standard library (although perhaps a little *too* thin), proper tail calls
+being required by the standard (though most CL implementations eliminate tail calls
+anyway, depending on optimization settings)...
+
+In this post, I will ramble/talk about data abstraction and the greatness of macros.
+I will implement a small, very simple, no-inheritance object system built out of
+nothing but `cons` cells and a handful of macros. This system will definitely not
+be as complete as the Common Lisp Object System. Its only purpose is to demonstrate
+that such a thing is ***possible***.
+
+I will assume that you know a little bit about Lisp code, or - at the very least -
+that you are willing to try to follow along anyway.
+
+I will not be providing a full introduction to Lisp, but please don't be discouraged.
+
+## What is a cons?
+
+Simply put, a cons is a pair. Just a pair of two objects. The first element
+is called the `car` and the second element the `cdr` (the names are this way purely
+for historical reasons). In C terms, a cons is effectively equivalent to:
+
+```c
+typedef void *OBJECT; /* stand-in for "any Lisp value" */
+
+struct cons {
+  OBJECT car;
+  OBJECT cdr;
+};
+```
+
+Except that in Lisp, we make a new cons with the `cons` function:
+`(cons 1 2)` makes a "cons cell" that contains 1 and 2.
+
+The important thing here is that `cons` satisfies the closure property: one
+(or both) of the elements of a cons can themselves be cons cells.
+
+So you could write `(cons 1 (cons 2 (cons 3 nil)))` (`nil` denotes the empty list).
+You may notice that this structure is suspiciously similar to a singly linked list.
+The `car` of a cons is the list's first element, and the `cdr` gets you the rest of the list.
+Indeed, this is exactly how lists are implemented in Lisp: as singly linked lists of cons cells.
+
+`cons` cells are deceptively simple. You can build any number of interesting structures
+out of them: trees, alists, plists, etc. In theory, we should be able to implement,
+say, a C-style struct with them as well.
+
+## Implementing structures
+
+Think about what an object is, for a bit. An object is an instance of a class,
+and a class itself is just an interface for accessing parts of that object,
+and manipulating it in various ways.
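+
+SICP's very first data-abstraction example (section 2.1.1) shows this idea with
+nothing but a cons cell: a rational number whose "class" is just a constructor
+and two selectors. Here is a rough Common Lisp translation of it, for illustration
+only (like SICP's first version, it doesn't reduce fractions to lowest terms):
+
+```cl
+;; Representation: a single cons cell (numerator . denominator).
+(defun make-rat (n d) (cons n d))
+(defun numer (x) (car x))
+(defun denom (x) (cdr x))
+
+;; Client code only goes through the interface above and never touches car/cdr,
+;; so the representation could change without ADD-RAT noticing.
+(defun add-rat (x y)
+  (make-rat (+ (* (numer x) (denom y))
+               (* (numer y) (denom x)))
+            (* (denom x) (denom y))))
+
+(add-rat (make-rat 1 2) (make-rat 1 3))
+;; => (5 . 6)
+```
+
+Only `make-rat`, `numer` and `denom` know that a cons is involved, which is exactly
+the kind of interface a struct system has to generate for us.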
+
+This means that, in theory, you can have *any* kind of representation "under the hood", as long
+as your language provides uniform ways to access, manipulate and modify these objects.
+In C, structs are just descriptions of how to extract information from a
+particular array of bytes. As I said, however, as long as you're consistent
+about how you store and retrieve the information in a struct, you can implement
+it however you want.
+
+Notably, since we're using Common Lisp, all accesses to a field of an object
+look like function calls anyway. This is useful for a lot of reasons, but in this particular
+case, it's useful mainly because field accesses aren't (or don't have to be) a special
+operation provided by the programming language. They ***absolutely can*** be defined
+as regular functions. (Setters are the exception: we will hook those up with `defsetf`,
+but that's not that much different, I promise.)
+
+That property is exactly what we will rely on here. We can make a macro, say, `mydefstruct`,
+that takes a name for our struct and a list of its fields. If this macro
+defined a function to create that struct, plus accessor functions (getters/setters for those of
+you in the Java world) for all of its fields, that would be a good-enough implementation
+of structs. Client code does not have to care that your structures are all linked lists
+under the hood; it behaves as if these structs were just an integral part of the language.
+
+Then, we could implement methods by switching on the type of the first argument
+passed to a method, and calling the appropriate implementation. Voila! Object-oriented
+programming with very little language support. More sophisticated
+systems can also be built in a similar manner, e.g. read-only fields could
+be achieved by having the macro *not* define setters for fields marked as read-only.
+But that's beyond the scope of this blog post.
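+
+To make the dispatch idea a little more concrete, here is a hand-written sketch,
+assuming each object is a list whose first element is a symbol naming its type.
+The shapes and the `area` function are hypothetical examples, not part of the
+system built below; a macro could generate functions shaped like this one.
+
+```cl
+;; Hypothetical constructors: the first element tags the object's type.
+(defun make-circle (radius) (list 'circle radius))
+(defun make-rect (width height) (list 'rect width height))
+
+;; A "method": switch on the tag of its first argument and run the matching code.
+;; ECASE signals an error if no clause matches the tag.
+(defun area (shape)
+  (ecase (first shape)
+    (circle (let ((r (second shape)))
+              (* pi r r)))
+    (rect (* (second shape) (third shape)))))
+
+(area (make-rect 3 4))   ; => 12
+(area (make-circle 1))   ; => the value of PI
+```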
+
+## First things first
+
+Let's first define a few helper functions for our implementation. For one,
+we need an easy way to get the symbol for a struct's constructor function:
+
+```cl
+(defun constructor-name (sym)
+  (intern (concatenate 'string "MAKE-" (string sym))))
+```
+
+We need similar helpers for the name of each slot's general accessor (used both for
+getting the value and for setting it with `setf`) and of its setter (used only to
+implement the `setf` form with `defsetf`):
+
+```cl
+(defun accessor-name (name sym)
+  (intern (concatenate 'string (string name) "-" (string sym))))
+
+(defun setter-name (name sym)
+  (intern (concatenate 'string "SET-" (string name) "-" (string sym))))
+```
+
+Now we can write functions for defining:
+
+- the constructor, with a function that takes a name
+  and a list of slots, and returns a form that will define
+  the constructor when evaluated:
+  ```cl
+  (defun constructor (name slots)
+    `(defun ,(constructor-name name) ,slots
+       (list ,@slots)))
+  ```
+- the slot accessors. This one returns a list of forms,
+  each of which defines an accessor for one of the slots
+  (note the `,i`, which inserts the slot's index into the generated code):
+  ```cl
+  (defun accessors (name slots)
+    (loop for slot in slots
+          for i upfrom 0
+          collect `(defun ,(accessor-name name slot) (obj)
+                     (nth ,i obj))))
+  ```
+- the slot setters. Note that these won't actually
+  be used directly by the users of our library. In Common Lisp, we don't
+  really use separate functions for setters. For example,
+  if you can access a field through `(point-x my-point-object)`,
+  then you usually don't define a `set-point-x` function, but rather
+  use `(setf (point-x my-point-object) some-value)` to set it to `some-value`.
+  `setf` is another macro, which expands this code into a call to the appropriate
+  setter function. This provides a unified interface for accessing fields,
+  no matter what the underlying implementation is.
+  Anyway, here's my function for defining the setters:
+  ```cl
+  (defun setters (name slots)
+    (loop for slot in slots
+          for i upfrom 0
+          collect `(defun ,(setter-name name slot) (obj val)
+                     (setf (nth ,i obj) val))))
+  ```
+- finally, the aforementioned `defsetf` forms:
+  ```cl
+  (defun setfers (name slots)
+    (loop for slot in slots
+          collect `(defsetf ,(accessor-name name slot)
+                     ,(setter-name name slot))))
+  ```
+
+As Common Lisp is a highly interactive language, we can try each
+of these functions in the REPL with very little effort:
+
+```
+CL-USER> (constructor 'point '(x y))
+(DEFUN MAKE-POINT (X Y) (LIST X Y))
+
+CL-USER> (accessors 'point '(x y))
+((DEFUN POINT-X (OBJ) (NTH 0 OBJ))
+ (DEFUN POINT-Y (OBJ) (NTH 1 OBJ)))
+
+CL-USER> (setters 'point '(x y))
+((DEFUN SET-POINT-X (OBJ VAL) (SETF (NTH 0 OBJ) VAL))
+ (DEFUN SET-POINT-Y (OBJ VAL) (SETF (NTH 1 OBJ) VAL)))
+
+CL-USER> (setfers 'point '(x y))
+((DEFSETF POINT-X SET-POINT-X)
+ (DEFSETF POINT-Y SET-POINT-Y))
+```
+
+Wow, the code generated by our functions looks good! Now
+we just need a macro to tie it all together, and we will have
+a pretty good first implementation of structs.
+
+There isn't any trick to the macro itself: it just
+takes its (unevaluated) arguments and generates the code to
+be evaluated, by calling the functions we defined earlier.
+(Note the use of `,@` to splice the lists returned by `accessors`,
+`setters`, and `setfers` into the `progn`.)
+
+```cl
+(defmacro mydefstruct (name &rest slots)
+  `(progn
+     ,(constructor name slots)
+     ,@(accessors name slots)
+     ,@(setters name slots)
+     ,@(setfers name slots)))
+```
+
+As you may have noticed, perhaps the greatest strength of CL macros
+is that they are, themselves, written *in Lisp*. That is why
+we were so easily able to approach the problem of defining structs
+as a problem of generating code: we wrote regular
+Lisp code that generates the code we want, and finally put it in a
+macro to achieve our goal.
+
+One caveat: the helper functions run when the macro is *expanded*. At the REPL,
+or when loading the source file with `load`, that's no problem. But if you use
+`compile-file` on a file that both defines these helpers and uses `mydefstruct`,
+wrap the helper `defun`s in `(eval-when (:compile-toplevel :load-toplevel :execute) ...)`
+so that they already exist at compile time.
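+
+The `,@` in `mydefstruct` matters because `accessors`, `setters` and `setfers`
+each return a *list* of forms. A small self-contained comparison, using a
+made-up list of forms in place of their output:
+
+```cl
+;; Stand-in for what a helper like ACCESSORS returns: a list of forms.
+(defparameter *forms* '((print 1) (print 2)))
+
+;; Plain unquote inserts the whole list as ONE element, so the forms end up
+;; wrapped in an extra pair of parentheses (not valid code once evaluated).
+`(progn ,*forms*)
+;; => (PROGN ((PRINT 1) (PRINT 2)))
+
+;; Splicing unquote inserts the elements of the list one by one.
+`(progn ,@*forms*)
+;; => (PROGN (PRINT 1) (PRINT 2))
+```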
+
+Here's how to define a struct with this, and use it:
+
+```cl
+(mydefstruct point x y)
+;; this doesn't interact with CL's actual object system, but is still cool.
+(defvar *origin* (make-point 0 0))
+(point-x *origin*)
+;; => 0
+
+;; modifying an object
+(defvar *point1* (make-point 10 100))
+(setf (point-x *point1*) 100) ; => 100
+(setf (point-y *point1*) 200) ; => 200
+(point-x *point1*) ; => 100
+(point-y *point1*) ; => 200
+```
+
+Voila: a very, very simple system for defining structs, without needing
+any primitive for combining objects other than a pair.
+
+These structs can have any number of fields, mind you. I just chose a simple
+one to demonstrate.
+
+## Type tags
+
+One problem with this current implementation is that objects have no type information
+at all. This means you could pass *any* list (say, a struct of a completely different type)
+as a `point` in the above example, and the accessors would happily read from it.
+This can be useful in some cases, I'm sure. A broken clock is right twice a day
+after all... but in general, I think it's safe to say that this behaviour is undesirable.
+
+Instead, we want our getter and setter functions to signal an error when passed a value
+that *is not* a struct of the expected type. This will prevent many bugs by making sure
+all type conversions are explicit, and no type is implicitly cast into another unrelated
+type without the programmer's knowledge.
+
+Of course, it would also help for a programmer to be able to inspect what type a particular
+object belongs to. That's useful when inspecting such an object at the REPL, and when
+a function can return several different types of objects and you need to check which
+one was actually returned.
+
+There are many ways to implement this. We will be using a very simple solution: type tagging.
+Essentially, we keep an extra element in the list underlying a struct - a tag that indicates
+its type.
+We could store this as a string, or perhaps a unique integer generated every time
+a struct is defined. However, since we're using Common Lisp, I think it's perfectly appropriate
+for us to use a symbol as the tag. (Don't worry: unlike string comparison, this shouldn't
+incur much of a performance penalty. Reading the same symbol name in the same package always
+gives back the very same symbol object, so comparing two tags with `eql` is just a pointer comparison.)
+
+So, we just need to modify the code so that the first element of the list is the type tag.
+
+First, the constructor:
+
+```cl
+(defun constructor (name slots)
+  `(defun ,(constructor-name name) ,slots
+     (list ',name ,@slots)))
+```
+
+No groundbreaking changes, really. We just add the name of the struct as a symbol to the front of the list.
+This means that the first field of the struct now lives at index 1, however, so we need to update
+our accessors and setters to match. Since we also want our functions to perform type checking
+at runtime, we should also add code for that to the generated accessors and setters.
+
+Since every struct created with our new constructors contains type information, I think it would be
+nice to add a helper function to get the type of an object. This way, if we change the representation
+later, we only need to change this function, rather than every piece of code that checks
+an object's type.
+
+```cl
+(defun obj-type (obj)
+  (car obj))
+```
+
+Since we're adding type checks, we may as well put in a little more effort
+and give the user a nice error message telling them what type was expected
+and what type was given. For that, we'll make another helper:
+
+```cl
+(defun make-error-message (real expected)
+  (format nil "Accessor called on wrong type! Expected ~a but found ~a"
+          expected real))
+```
+
+Then we can add the type checks to our existing functions, like so (the message goes
+through a `"~a"` format control, so `error` won't misread a stray `~` in it as a directive):
+
+```cl
+(defun accessors (name slots)
+  (loop for slot in slots
+        for i upfrom 1
+        collect `(defun ,(accessor-name name slot) (obj)
+                   (if (eql (obj-type obj) ',name)
+                       (nth ,i obj)
+                       (error "~a" (make-error-message (obj-type obj) ',name))))))
+
+(defun setters (name slots)
+  (loop for slot in slots
+        for i upfrom 1
+        collect `(defun ,(setter-name name slot) (obj val)
+                   (if (eql (obj-type obj) ',name)
+                       (setf (nth ,i obj) val)
+                       (error "~a" (make-error-message (obj-type obj) ',name))))))
+```
+
+We don't really need to change anything else.
+
+With that, our type-checking struct implementation is reasonably usable.
+At least for a primitive system built out of a macro and some lists,
+it's actually fairly good.
+
+The only thing left is to wrap it in a package that exports only `mydefstruct`.
+These two forms go at the very top of the file, before all of the definitions above:
+
+```cl
+(defpackage :my-structures
+  (:use :cl)
+  (:export #:mydefstruct))
+
+(in-package :my-structures)
+```
+
+There we go. Now our package is very nicely encapsulated, and only the useful
+stuff is exported. Client code still gets its own `make-point`, `point-x` and so on,
+because `intern` uses whatever package is current when the macro is expanded.
+
+## Conclusion
+
+Common Lisp's macros are truly amazing. We just created an entire system for
+automatically defining new abstractions over data - and it looks, behaves and
+feels just like it is part of the language, rather than something we added.
+(Apart from being rather barebones, and not providing much in the way
+of printing and reading our structures, this is actually fairly similar to
+Common Lisp's standard `defstruct` in terms of what it provides.
+Of course, the standard `defstruct` is much better than this, but that's
+beside the point.)
+
+Side note: unfortunately, our system doesn't interact with the Common Lisp Object System
+at all. This is to be expected; I'm just writing this to prove a point and to
+demonstrate what I've learnt so far from SICP, not to replace something that
+needs no replacing.
+
+However, even though this system is not as good as the standard tools for
+data abstraction, I think it's still a great demonstration of the language's
+strengths.
+
+The really stunning part for me is that it was so *easy to do*. Too easy.
+I actually hesitated to write about it on my blog, because it wasn't
+really a challenge. I created a replacement system for the language's
+standard way of creating data structures, and *it was so easy to do* that I was
+*hesitating to write about it*. It's an inferior replacement, sure,
+but it's still perfectly functional.
+
+It amazes me to no end that you can straight-up rewrite a significant
+portion of the language in itself, and change it however
+you want to. I couldn't imagine doing anything even remotely similar
+to that in, say, Java or C.
+
+I hope you were entertained by this attempt at reinventing the wheel.
+I certainly enjoyed making it.