jaratec

functional programming, big data and other things of interest

Follow me onGitHub

Write you a Lisp in Scala

Oct 2013

Let’s build ourselves a Lisp language. It will be a small Lisp, a toy Lisp. It may not end up used, but the most important things are the lessons learned during its construction. For Lispers building a Lisp implementation is something of a rite of passage, an excercise meant to test ones level of Lisp comprehension and programming skills. This particular implementation will be built in Scala, and it is influenced by Scheme and Clojure.

A source of inspiration for this endeavour is Write Yourself a Scheme in 48 Hours which is an implementation of Scheme in Haskell. You can compare and contrast. There is also an implementation of a Lisp in Clojure (video to be published soon).

Last but not least, the motivation to learn and practice parsing techniques comes from a blog by Steve Yegge: Rich Programmer Food. I urge you to go read his post, if you haven’t already done so. Steve shows why knowing parsing techniques is vital in some (more than you might think of) of scenarios.

In the software development lore there is comment on the complexity of software, the Greenspun’s tenth rule which goes like this: Any sufficiently complicated C or Fortran (or Java) program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp. Lispers have long time enjoyed programming power in the form of functional programming, macros, continuations, etc. Programmers in under-powered languages faced with mounting complexity were often unknowingly implementing features already available in Lisp dialects. Greenspun’s rule is a scathing remark on the inadequacy of technical choices. Scala is not an under-powered language. We will prove it by showing how “easy” (for some definition of easy) it is to implement a Lisp dialect on top of Scala. We’ll call this dialect SPL (Scala Powered Lisp). Not to be confounded with the action movie Sha Po Lang.

I’ll start with the conclusion: If you want to understand how a technology functions, then the best way to achieve that, is to implement/write/code yourself said technology. So, if you want to understand how a language works and in particular a functional language like Lisp, then the best way to go about it, is to write a Lisp yourself, hence the title of this post.

The plan:

  • DSLs and mini-languages
  • Defining datatypes
  • Parsing code with combinator parsers
  • Defining functions
  • Evaluating expressions
  • Implementing Higher Order Functions, List comprehensions
  • Writing a library of Lisp functions

DSLs and mini-languages

We’ll implement this dialect as an external DSL. Examples of external DSLs are: HTML, SQL, Ant. When used judiciously a DSL offers advantages like expressivity, higher level of abstraction, simplified development.

We can divide the DSLs on complexity (ad-hoc classification): level 1 - text needs to be translated from text to some internal structure: various specialized formats (example: SWIFT - financial messages) level 2 - text needs to be translated (as in level 1) and evaluated on some narrow domain: query languages, rule engines (most domain specific languages are in this category) level 3 - text needs to be translated (as in level 1) and evaluated (as in level 2), but specialized language features are required (higher order functions, list comprehension, pattern matching, constraints, aop). The more mini-languages aquire features, the less narrow becomes their domain. So, they become more general, thus less domain specific.

Our Lisp implementation is obviously on level 3, given the fact that it is a general programming language. So, we’ll need to transform the input text into a tree structure (the abstract syntax tree) and then evaluate it. For the transformation step, we’ll use a specialized library: combinator parsers. There is an implementation in the Scala standard library, the scala.util.parsing.combinator package (and sub-packages). It should be noted that there are alternatives to this approach: ANTLR, XText, bison, yacc. However, before looking at combinator parser, we’ll look at the data structure to use for parsing result.

Datatypes

We will define a trait LispExpr at the top of our hierarchy. This trait represents any Lisp expression in our language. Directly deriving from the LispExpr trait are LispNil (to represent errors), LispAtoms (representing primitives), collections, functions (and closures), function calls, let expresssions, list comprehensions and definitions. Atoms are booleans, characters, strings, numbers (further divided into integers and floats), keywords, identifiers. Collections are lists, sets and dictionaries (maps).

Here are the root of the hierarchy and the basic types + collections (not all methods are shown):

trait LispExpr {
  def isNil = LispBool(false)
  def isAtom = LispBool(false)
  def isNumber = LispBool(false)
}

case class LispNil extends LispExpr {
  override def isNil = LispBool(true)
  override def toString(): String = "nil"
}

trait LispAtom extends LispExpr {
  override def isAtom = LispBool(true)
}

case class LispBool(val b: Boolean) extends LispAtom {
  // method definitions
}

abstract trait LispNumber extends LispAtom with Ordered[LispNumber] {
  override def isNumber = LispBool(true)
  def compare(n: LispNumber): Int
  // method definitions
}

case class LispInt(val i: BigInt) extends LispNumber {
  // method definitions
}

case class LispDecimal(val d: BigDecimal) extends LispNumber {
  // method definitions
}

case class LispChar(val c: Char) extends LispAtom with Ordered[LispChar] {
  // method definitions
}

case class LispString(val s: String) extends LispAtom with Ordered[LispString] {
  // method definitions
}

case class LispKeyword(val key: String) extends LispAtom {
  // method definitions
}

case class LispId(val id: String) extends LispAtom {
  // method definitions
}

case class LispEntry(val key: LispKeyword, val value: LispExpr) extends LispExpr {
  // method definitions
}

trait LispSeq extends LispExpr {
}

case class LispList(val elements: List[LispExpr]) extends LispSeq {
  override def toString(): String = "(" + elements.mkString(" ") + ")"
}

case class LispSet(val elements: Set[LispExpr]) extends LispSeq {
  override def toString(): String = "@(" + elements.mkString(" ") + ")"
}

case class LispDict(val entries: Map[LispKeyword,LispExpr]) extends LispSeq {
  override def toString(): String = "{" + entries.map{_.toString}.mkString(" ") + "}"
}

The primitive types implement some methods (or, and for booleans), basic arithmetic for the numbers, concatenation for strings. All types override the toString method in order to have a correct string representation in the repl. The types look pretty much the same except for the argument passed to the constructor. With the foundation in place, we will built some more sophisticated types. Thses types are function calls (LispCall), functions (LispLambda), definitions, which are named Lisp expressions (LispDef), bindings (LispBinding), let expressions (LispLet), list comprehensions (LispFor)

case class LispCall(val funexp:LispExpr, actuals: List[LispExpr]) extends LispExpr {
  override def toString(): String = "(" + funexp + " " + actuals.mkString(" ") + ")"
}

case class LispLambda(val formals: List[LispId], val body: LispExpr) extends LispExpr {
  override def toString(): String = "(fn [" + formals.mkString(" ") + "] " + body + ")"
}

case class LispDef(val id: LispId, val e: LispExpr) extends LispExpr {
  override def toString(): String = id + " : " + e
}

case class LispBinding(val id: LispId, val e: LispExpr) extends LispExpr {
  override def toString(): String = id + " <- " + e
}

case class LispLet(val bindings: List[LispBinding], val body: LispCall) extends LispExpr {
  override def toString(): String = "(let [" + bindings.mkString(" ") + "] " + body + ")"
}

case class LispFor(val bindings: List[LispBinding], val condition: Option[LispCall], val ret: LispExpr) extends LispExpr {
  override def toString(): String = "(for [" + bindings.mkString(" ") + " :if " + condition + " :yield " + ret + "])"
}

All these types are impelemented as case classes, giving us automatic constructors, equality methods, pattern matching. With the types defined we can turn back to parsing. We’ll see how to transform input text into these types.

Combinator Parsers

Parsing is the technique used to take a string (or a file, or files) and transform it into an internal data structure. What we do with this data structure, we’ll see later. First, let’s look at a particular parsing technique that employs combinators. Definition of combinator (from Wikipedia): A combinator is a higher-order function that uses only function application and earlier defined combinators to define a result from its arguments.

In the Scala standard library, there is an implementation for combinator parsers. We will extend the trait JavaTokenParsers. It provides some basic machinery for writing a parser. The purpose of this trait is to build a parser from some input. It takes care of handling whitespace, by skipping it. It has several methods that return a Parser parametrized with some type T, ParserT. It is up to the programmer to specify the desired type T. Each parser written as a string, like “something”, returns the parsed string itself. The trait has some basic parsing methods for matching numbers (decimalNumber, wholeNumber, floatingPointNumber), for matching literals (literal), for matching regular expressions (regex). And it has methods that allow combining parsers. Sequential composition is performed by the ~ operator, for example we want to express the fact that the beginning of the input will be matched by one parser and the rest of the input will be matched by another parser: parser1 ~ parser2. Alternative composition is expressed with the | operator, which means the input shall be matched by one parser or failing that by an alternative parser: parser1 | parser2. There are other convinient parsers, like rep, which means repatedly match a parser: rep(parser1). For an optional part of an expression, we will use the parser opt, which returns an Option if it succeeded or a None if it failed.

How to run the parser? We will have a class extending the trait JavaTokenParsers, and we will write our parsers as methods of this class (LispParser)

import scala.util.parsing.combinator._

class LispParser extends JavaTokenParsers {
  // ... some methods (parsers) here

  def parse(expr: String): List[LispExpr] = {
    parseAll(lisp, expr) match {
      case Failure(msg, next) => println("Could not parse - " + msg); List()
      case Error(msg, next) => println("Could not parse - " + msg); List()
      case Success(result, next) => result
    }
  }

}

So we run the parser simply by calling the provided method parse. The input is the string we want to parse, the result will be a LispExpr wrapped in a List, since we may have several lisp expression in the input string. The type LispExpr (for Lisp expression) is our tree data type. The result of parsing an input string will be a LispExpr. Notice that in case of an illegal expression in the input string, the result will be an empty string because parsing has failed. We’ll see the lisp parser (first argument of method parseAll) later.

Let’s write a parser that matches identifiers in our language. We will use a regular expression for that.

  def id: Parser[LispId] = """[\p{Alnum}[<=>_!?+*/$#%&|.\-\^]]+""".r ^^ {name => LispId(name)}

Any regular expression can be used as a parser, because of the implicit method regex. Its return is the matched string (wrapped in a Parser). In our case we want a LispId, not just the matched string. For this we use the operator ^^. The ^^ operator transforms the result of a parser. It takes a function as an argument. This function performs the transformation. The return type of the parser is thus transformed. We pass the matched string to the constructor of LispId and we obtain the data structure that we want.

Let’s continue with defining atoms in our language, atoms being identifiers, strings, numbers, booleans, characters and keywords.

  def bool: Parser[LispBool] = ("true" ^^ {_ => LispBool(true)}
                                | "false" ^^ {_ => LispBool(false)}
                              )

  def char: Parser[LispChar] = "'" ~> """\p{Alnum}""".r <~ "'" ^^ {s => LispChar(s.head)}

  def string: Parser[LispString] = stringLiteral ^^ {s => LispString(s.init.tail)}

  def keyword: Parser[LispKeyword] = ":" ~> """\p{Alnum}+""".r ^^ {key => LispKeyword(key)}

  def integer: Parser[LispInt] = wholeNumber ^^ {n => LispInt(BigInt(n))}

  def decimal: Parser[LispDecimal] = floatingPointNumber ^^ {d => LispDecimal(BigDecimal(d))}

  def number: Parser[LispNumber] = decimal | integer

  def atom: Parser[LispAtom] = number | bool | char | string | keyword | id

Note the use of the parsers ~> and <~ . They behave like their counterpart ~ except that they discard the matched part on the left (~>) or on the right (<~).

Particular care should be observed, since combinator parsers are greedy and there is the risk to match to much (some combinators can match allmost everything), therefore the order of calling is important. We can formulate a principle: call narrow-matching combinators before wide-matching combinators. That is the reason to try to parse a number before an identifier (in the definition of the atom parser). The number parser is more specific than the id parser, so it should come before. Otherwise our language would not have numbers. The id parser would match any number before giving a chance to the number parser.

One more note, for the bool parser. The two possible ways to parse a boolean are put on two lines, but in a single expression with paranthesis.

Let’s look at more complex expressions. Here are the collections

  def entry: Parser[LispEntry] = keyword ~ expr ^^ {
    case key ~ value => LispEntry(key, value)
  }

  def list: Parser[LispList] = "'(" ~> rep(expr) <~ ")" ^^ {LispList(_)}

  def set: Parser[LispSet] = "@(" ~> rep(expr) <~ ")" ^^ {elems => LispSet(Set() ++ elems)}

  def dict: Parser[LispDict] = "{" ~> rep(entry) <~ "}" ^^ {elems => LispDict(Map() ++ (for (LispEntry(k,v) <- elems) yield (k,v)))}

  // all collections
  def seq: Parser[LispSeq] = list | set | dict

we can thus define a lisp like this: ‘(1 2 3). And we can define a dictionary(hashmap) like this: {:a 1 :b 2 :c 3}. We haven’t defined a parser for a generic expression, so we’ll do that now.

  def expr: Parser[LispExpr] = atom | seq | lambda | let | gen | funcall

So, we see that an expression can be an atom, a sequence (a collection), a function (lambda), a let expression (let), a list comprehension (for) or a function call (funcall). We’ll define parsers for those.

  def arguments: Parser[List[LispId]] = "[" ~> rep(id) <~ "]"

  def lambda = "(" ~> "fn" ~> arguments ~ expr <~ ")" ^^ {
    case args ~ e => LispLambda(args, e)
  }

  def binding: Parser[LispBinding] = id ~ opt("<-") ~ expr ^^ {
    case name ~ _ ~ e => LispBinding(name, e)
  }

  def let: Parser[LispLet] = "(" ~> "let" ~> "[" ~ rep(binding) ~ "]" ~ funcall <~ ")" ^^ {
    case "[" ~ bindings ~ "]" ~ call => LispLet(bindings, call)
  }

  def gen: Parser[LispFor] = "(" ~> "for" ~> "[" ~ rep(binding) ~ opt(":if" ~> funcall) ~ ":yield" ~ expr ~ "]" <~ ")" ^^ {
    case "[" ~ bindings ~ b ~ ":yield" ~ e ~ "]" => LispFor(bindings, b, e) // b is an Option
  } 

  def funcall: Parser[LispCall] = "(" ~> expr ~ rep(expr) <~ ")" ^^ {
    case fun ~ args => LispCall(fun, args)
  }

A function is written like this: (fn [x y] (+ x y)). We use a reserved word “fn” followed by the function arguments in brackets. The arguments are Lisp identifiers. The function end with the body which can be any expression. Note that the function doesn’t have a name. To name a function we’ll use the def construct. A function call is written like a list (without the quote): (+ 1 2). As for every Lisp, the function call has the function name in the first position (prefix notation) followed by the arguments passed to the function. The let expression is written like this: (let [x 1] (inc x)). We use the reserved word “let”. The bindings for a let expressions are written in brackets. We can have several bindings. A binding is a pair name expression. The let expression is concluded with its body, which is a function call. Bindings are also used in list comprehensions: (for [x <- ‘(1 2 3) :yield (inc x)]). We use the reserved word “for”. Note that for bindings we can use an optional arrow. In list comprehensions it reads better with arrows. We use a reserved keyword “:yield” to define the computation performed on the bound variables. There is also an optional element “:if” used when there is a need to filter the values of the list comprehension.

We round up the parser definitions with well, the definition and the lisp parsers

  def definition: Parser[LispDef] = "(" ~> "def" ~> id ~ expr <~ ")" ^^ {
    case name ~ e => LispDef(name,e)
  }

  def lisp: Parser[List[LispExpr]] = rep(definition | expr)

The defintion is used to name any kind of Lisp expression. We can write a defintion like this: (def primes ‘(2 3 5 7 11)). The lisp parser provides a mean to parse definitions and expressions. With these, our parser is complete.

Evaluation

We have seen how to parse an input string, we have seen how this string is transformed into a datatype (the LispExpr class hierarchy). Now we can move on to doing something usefull with all these lisp expressions. We will evaluate them.

We create a new class, Interpreter.

First, we’ll need a way to store variables, that is values, functions, all kind of lisp expressions. We will use a mutable hashmap. We define a type Env, and a variable of type Env. Initially this hashmap will be empty, but we’ll add variables into this hashmap, variables to be used later. We add some utitlity methods in order to access this global environment.

  type Env = Map[LispId, LispExpr]
  val GLOBAL_ENV: Env = HashMap.empty[LispId, LispExpr]

  def bindToEnv(name: LispId, expr: LispExpr): Unit = {
    GLOBAL_ENV.put(name, expr)
  }

  def isDefined(env: Env, expr: LispExpr) = expr match {
    case id @ LispId(_) => env.contains(id)
    case _ => false
  }

  def lookup(env: Env, s: LispId) = env.getOrElse(s, LispNil())

bindToEnv is used to well, bind a value to the environment. Then, we have a function isDefined to query the existence of a name bound in the given environment and a function lookup to get a value stored in the environment.

And finally, we can move to the evaluation part.

  private case class LispClosure(val env: Env, val lambda: LispLambda) extends LispExpr {
  }

  def eval(expr: LispExpr): LispExpr = eval(GLOBAL_ENV, expr)

  def eval(env: Env, expr: LispExpr): LispExpr = {
    expr match {
      case s @ LispId(name) => if (primitiveFunctions.contains(name)) s else lookup(env,s)
      // other cases
      case LispDef(name,expr) => {
        val e = eval(env,expr)
        e match {
          case LispLambda(_,_) => bindToEnv(name,expr)
          case LispClosure(_,_) => bindToEnv(name,expr)
          case _ => bindToEnv(name,e)
        }
        name
      }
      case _ => expr // catch all clause (atoms, nil evaluate to themselves)
    }
  }

We define the eval funtion that takes an environment and a lisp expression as arguments. The lisp expression is evaluated in the given environment. The environment can be the global one we have just created or a local environment. We’ll see shortly how to build local environments. The eval function is a recursive one, because as you may recall the structure that we have been building in the parsing phase is a tree. We’ll simply walk the tree with this recursive function. We pattern match on the passed expression. We’ll look at three simple cases first. If the expression we are examining is a simple one, an atom, then we just return the expression. A value evalutes to itself. This case is covered by the catch all clause. Next, if the expression is a Lisp definition, then we create a binding in the global environment by adding the name, expression pair to the environment. We evaluate the expression to get the value we want the name to bind to, except when we have a function (LispLambda). In that case we bind the original function (unevaluated) to the name. The reason for that is the fact that a function should be evaluated in a local environment (in a closure) not necessarily in the global environment. Let’s take a look at LispClosure. The closure is built with an environment and a function (LispLambda). A closure is never created in the parsing process. Parsing has no knowledge of an evaluation environment. We’ll come back to this with more explanations. The other case we’re looking at, is the case where the expression is a Lisp identifier. The identifier could be in the environment and we’ll do a simple lookup to get the bound value. Or it is the name of a primitive function. I realize I haven’t introduced primitive functions, so let me do it now. What I call in an ad-hoc manner a primitive function, is a function in the target language (Lisp in our case) implemented in the host language (Scala in our case). So if the expression is the name of a “primitive function” then we return the name because it will get evaluated elsewhere in the eval function.

So we’ll enumerate all these primitive functions (in a list) and provide an implementation for each of them (in the method applyFunction).

  val primitiveFunctions = List("head", "tail", "nil?", "atom?", "number?", "=", "<" , "<=", ">", ">=", "and", "or", "if", "cons", "+", "-", "*", "/", "mod", "rem", "size", "empty?", "list?", "map?", "put", "remove", "get", "keys", "union", "diff", "intersect", "list", "apply", "type")

  def applyFunction(env: Env, fname: String, args: List[LispExpr]): LispExpr = {
    fname match {
      case "head" => if (args.size != 1) LispNil() else eval(env,args.head) match {
        case LispList(elements) => if (elements.size == 0) LispNil() else elements.head
        case LispSet(elements) => if (elements.size == 0) LispNil() else elements.head
        case _ => LispNil()
      }
      case "tail" => if (args.size != 1) LispNil() else eval(env,args.head) match {
        case LispList(elements) => if (elements.size == 0) LispNil() else LispList(elements.tail)
        case LispSet(elements) => if (elements.size == 0) LispNil() else LispSet(elements.tail)
        case _ => LispNil()
      }
      case "=" => if (args.size < 2) LispNil() else {
        val head = eval(env, args.head)
        LispBool(args.tail.map{e => head.equals(eval(env,e))}.foldLeft(true){_ == _})
      }
      case "and" => if (args.size < 2) LispNil() else args.map{eval(env,_)} match {
        case bools: List[LispBool] => bools.foldLeft(LispBool(true)){_ && _}
        case _ => LispNil() // not a list of boolean expressions
      }
      case "or" => if (args.size < 2) LispNil() else args.map{eval(env,_)} match {
        case bools: List[LispBool] => bools.foldLeft(LispBool(false)){_ || _}
        case _ => LispNil() // not a list of boolean expressions
      }
      case "if" => if (args.size < 2 || args.size > 3) LispNil() else eval(env,args.head) match {
        case LispBool(true) => eval(env,args.tail.head)
        case _ => if (args.size == 3) eval(env,args.last) else LispNil()
      }
      case "cons" => if (args.size != 2) LispNil() else eval(env,args.last) match {
        case LispList(elements) => LispList(eval(env,args.head) :: elements)
        case _ => LispNil()
      }
      case "size" => if (args.size != 1) LispNil() else eval(env,args.head) match {
        case LispList(elements) => LispInt(elements.size)
        case LispSet(elements) => LispInt(elements.size)
        case LispDict(entries) => LispInt(entries.size)
        case _ => LispNil()
      }
      case "empty?" => if (args.size != 1) LispNil() else eval(env,args.head) match {
        case LispList(elements) => LispBool(elements.isEmpty)
        case LispSet(elements) => LispBool(elements.isEmpty)
        case LispDict(entries) => LispBool(entries.isEmpty)
        case _ => LispNil()
      }
      case "list?" => if (args.size != 1) LispNil() else eval(env,args.head) match {
        case LispList(_) => LispBool(true)
        case _ => LispBool(false)
      }
      case "list" => LispList(args.map{eval(env,_)})
      case "apply" => if (args.size != 2) LispNil() else eval(env,args.last) match {
        case LispList(elements) => eval(env,LispCall(args.head,elements))
        case _ => LispNil() // not a list
      }
      // other cases not shown here
      case _ => LispNil() // unknown/unimplemented function
    }
  }

Implementing “primitive functions” can be done in different manners. I’ll show a straightforward, “procedural” one at the expense of a more elegant one. The method applyFunction takes as arguments an environment, the function name and a list of LispExpr that are arguments to the function. We pattern match on the function name. For every function in our target language (Lisp) we verify the arity and then we apply the function in Scala with the evaluated arguments. In case of error we return LispNil, meaning a programming erorr. The environment is passed as argument, because the arguments are to be evaluated in the locally scoped environement. So, we have some functions to verify the type of expressions: nil?, number?, atom?, list?, map?. We have functions to implement arithmetic operations: +, -, *, /, mod, rem. Functions for string concatenation: +. Boolean functions: and, or. Comparison functions: =, >, <, >=, <=. Functions for lists: head, tail, cons, list. Functions for sets: union, diff, intersect. Functions for hashmaps: get, keys, put, remove. Function “size” for all sequences. A utility function to help debugging: type. And other functions so important, that we can’t do without: if, apply. Note that arguments are evaluated only when needed (see implementation of “if”). Evaluating a primitive function means identifying which function is called and then letting the Scala code in the corresponding mathing case work on the passed expressions as arguments. So, for example the tail function in the Lisp language is implemented as a Scala call to the tail method of the underlying Scala collection. The return of this type of evaluation is always a lisp expression. The evaluation of the expression (tail ‘(1 2 3)) returns another lisp list ‘(2 3). Noteworthy, is the implementation of “apply”. The lisp expressions are rewritten and subsequently evaluated. Evaluating (apply + ‘(1 2)), would get rewritten to a structure corresponding to expression (+ 1 2) and then that expression is evaluated in turn, returning the number 3.

With the explanation of primitive functions behind us, we can go back to our eval function and see how to put the primitive functions to use.

  def eval(env: Env, expr: LispExpr): LispExpr = {
    expr match {
      case s @ LispId(name) => if (primitiveFunctions.contains(name)) s else lookup(env,s)
      case lambda @ LispLambda(_,_) => LispClosure(env, lambda)
      case LispCall(funexp, actuals) => {
        funexp match {
          case LispId(name) if (primitiveFunctions.contains(name)) => applyFunction(env, name, actuals)
          // ... 2 more cases for LispCall to consider
          }
        }
      }
      // other cases not shown here
    }
  }

So, we have seen that in the case of LispId, when the name is primitive function we return the lisp identifier. Let’s take a look at evaluating lisp calls. A lisp call can be the use of a primitive function like (cons 1 ‘(2 3)). In this case the funexp of the lisp call is the “cons” function. Since cons is a primitive function, the primitiveFunctions list contains it, so simply call the applyFunction method with the appropriate arguments. For LispCall we have two more cases to consider. There could be a function name, but not a primitive one, but a function defined in terms of other primitive functions and bound to a variable in an environment, for example (range 0 5). This function would be implemented using primitive functions and stored in an environment under the name “range”. When evaluating the lisp call, range has to be evaluated, looked up in the environment and the returned function applied in the lisp call. We’ll see that in a moment. The other case is where a lisp call has a lambda expression as a funexp. An example: ((fn [x y] (+ x y)) 1 2). In this case the actuals are the numbers 1 and 2. The funexp is a lambda, so the actual arguments need to be bound first to the formal arguments of the function in a temporary (or locally) environment and then evaluate the body of the lambda using this new environment. This is the reason we need to embed the lambda expression into a closure. So let’s see how to evaluate these two cases.

  def eval(env: Env, expr: LispExpr): LispExpr = {
    expr match {
      case s @ LispId(name) => if (primitiveFunctions.contains(name)) s else lookup(env,s)
      case lambda @ LispLambda(_,_) => LispClosure(env, lambda)
      case LispCall(funexp, actuals) => {
        funexp match {
          case LispId(name) if (primitiveFunctions.contains(name)) => applyFunction(env, name, actuals)
          case _ => {
            // if it is a defined function stored in the environment, evaluate to lookup the underlying expression bound to the id,
            // then evaluate the bound expression
            // else it is a lambda and the evaluation returns a closure, ex: ((fn [x] (...)) arg)
            val e = if (isDefined(env, funexp)) eval(env, eval(env, funexp)) else eval(env, funexp)
            e match {
              case LispClosure(closure_env, LispLambda(formals, body)) => {
                val args = actuals.map{eval(env,_)}
                // arity check -> number of actual arguments should match number of formal arguments
                if (args.size != formals.size) LispNil() else {
                  val new_env = closure_env ++ formals.zip(args).toMap
                  eval(new_env,body)
                }
              }
              // case where the expression e is not a closure, but a primitive function 
              // the case of higher order functions:
              // a primitive function is bound to a variable in the environment and variable is passed around as an argument
              case LispId(name) if (primitiveFunctions.contains(name)) => applyFunction(env, name, actuals)
              case _ => LispNil() // error: neither a closure, nor a primitive function
            }
          }
        }
      }
      // other cases not shown
    }
  }

So, we look in the funexp of the LispCall to see if it is an expression in the environement. If it is, then we evaluate the funexp (which means lookup in the LispId case) and then we evaluate the expression returned from the lookup. If it is not in the environment, the only option left is a lambda, which we evaluate and get back a closure. The second case should return a closure, the first case could return either a closure or a primitive function name. So if we have a primitive function, then we simply call the applyFunction. If we have a closure, then we have a little more work to do. We evaluate actual arguments, we check the arity of actual arguments and the formal arguments of the lambda (which should be the same, the arity that is), then we create a new environment by adding the pairs of formals (names) and actuals (values). Once we finished constructing the new (local) environment, we evaluate the body of the lambda. This is probably the most difficult part.

Next, we’ll look at let expressions.

  def eval(env: Env, expr: LispExpr): LispExpr = {
    expr match {
      case LispLet(bindings, body) => {
        val new_env = bindings.foldLeft(env){(acc_env,bind) => acc_env + (bind.id -> eval(acc_env,bind.e))}
        eval(new_env,body)
      }
      // other cases not shown
    }
  }

They are pretty simple. We evaluate the bindings, and add the pairs to a new environment. Note that each pair is creating a new environment, where the binding is available to the subsequent bindings. Once we get the final environment, we simply evaluate the body of the let expression.

Lastly we look at list comprehensions.

  def eval(env: Env, expr: LispExpr): LispExpr = {
    expr match {
      case LispFor(bindings, condition, ret) => {
        def gen_envs(new_env: Env, bs: List[LispBinding]): List[Env] = {
          if (bs.isEmpty) List(new_env) // finished recursing, return environment
          else {
            val elements = eval(new_env, bs.head.e) match {
              case LispList(es) => es
              case _ => Nil // empty list, if expression evaluates to anything else than LispList
            }
            if (elements.isEmpty) Nil // return empty list
            else elements.flatMap{e:LispExpr => 
              val var_binding = LispBinding(bs.head.id, e)
              gen_envs(new_env + (var_binding.id -> var_binding.e), bs.tail)
            }
          }
        }
        val values = gen_envs(env, bindings).filter{new_env =>
          condition match {
            case Some(c) => eval(new_env, c) match {
              case LispBool(b) => b
              case _ => false
            }
            case None => true
          }
        }.map{new_env =>
          eval(new_env, ret)
        }
        LispList(values)
      }
      // other cases not shown
    }
  }

Let’s break the steps down into simpler steps. “Simpler”, relatively speaking. We will use the bindings to create environments, each environment containing a different combination of bound values. For example, if we have two bindings x <- ‘(1 2) and y <- ‘(3 4), we’ll get four environments, one for each combination of x and y. The construction of these environments is done in the inner function gen_envs. It is a recursive function. We process each binding, and accumulate the possible combination in a new environment. Since, we don’t know beforehand how many bindings we have, we will have to call the recursive function passing a new environment for each value in the list. And since we get a result as a list of lists we have to flatten the result. Now, if you think about it, in Scala a for expression is implemented as a flatMap composed with a map. What we are doing here, is nesting flatMap calls for every binding. Ok, that was the first step, the more difficult one. Next, we have a list of environments, we maybe filter out undesireable combinations by evaluating the optional condition expression, then evaluate the returning expression on each remaining environment and get a list of values, which we wrap in a LispList. This second step is similar to the map (yield) part of a Scala for expression.

Now we are done with the eval function. And we are also done with the Scala code. Our little lisp language has wings now and can take flight.

Functions purely written in Lisp

Next we can write a number of handy functions directly in lisp. We’ll create a library, which we’ll load automatically into the repl once we start the application.

(def not
  (fn [x]
    (if x false true)))

(def id
  (fn [x] x))

(def inc
  (fn [x] (+ x 1)))

(def dec
  (fn [x] (- x 1)))

(def neg
  (fn [x] (* x -1)))

That was easy. Let’s write some more complex functions.

(def flip
  (fn [f]
    (fn [arg1 arg2] (f arg2 arg1))))

(def curry
  (fn [f x]
    (fn [y]
      (apply f '(x y)))))

(def compose
  (fn [f g]
    (fn [x]
      (f (g x)))))

These three functions are somewhat complicated because they return a function. flip returns takes a function of two arguments and inverses the order of the arguments. curry takes a function and its first argument and creates another function taking one argument, the second argument expected by the wrapped function. This version fo curry is not general, it works only on two argument functions. Next is compose which creates a composed function out of two functions. There is a restriction, the functions are only one argument functions. We’ll use curry right away to write some comparison functions.

(def zero?
  (curry = 0))

(def positive?
  (curry < 0))

(def negative?
  (curry > 0))

(def odd?
  (fn [x]
    (= (rem x 2) 1)))

(def even?
  (fn [x]
    (= (rem x 2) 0)))

Now, we’re going to implement some very useful higher order functions: map, filter, fold.

(def map
  (fn [f xs]
    (if (empty? xs) '()
      (cons (f (head xs)) (map f (tail xs))))))

(def filter
  (fn [p xs]
    (if (empty? xs) '()
      (let [x (head xs)
            rest (filter p (tail xs))]
        (if (p x) (cons x rest)
          rest)))))

(def foldl
  (fn [f acc xs]
    (if (empty? xs) acc
      (foldl f (f acc (head xs)) (tail xs)))))

(def foldr
  (fn [f z xs]
    (if (empty? xs) z
      (f (head xs) (foldr f z (tail xs))))))

We could have implemented map and filter by using fold, but it may be more easier to understand this way for the functional programming novice. Note foldl where we accumulate the result of applying the function f to each element going left to right. When we’re done traversing the list, we simply return the accumulated result. For foldr it is somewhat different, we traverse the list first and the computations are left waiting for the evaluation of its arguments, mainly the rest of the list. Once we reach the end of the list we return the initial element z and thus the right hand arguments of the function f get evaluated and f can finally be computed and this time the computation goes in reverse from right to the left of the list.

Let’s put to use our new fold functions.

(def sum
  (fn [xs]
    (foldl + 0 xs)))

(def product
  (fn [xs]
    (foldl * 1 xs)))

(def min
  (fn [xs]
    (foldl (fn [a x] (if (> a x) x a)) (head xs) (tail xs))))

(def max
  (fn [xs]
    (foldl (fn [a x] (if (< a x) x a)) (head xs) (tail xs))))

(def reverse
  (fn [xs]
    (foldl (flip cons) '() xs)))

Note in the implementation of the reverse function, that we are wrapping the cons function with a flip. We do that because in foldl we accumulate the resulting list going left to right, which makes the first argument the (accumulated) list and the second argument the next element in the original list. cons takes as arguments an element and a list, therfore we need to flip the order of the arguments.

We have defined head and tail as primitive functions. Let’s define the counterparts last and init as pure lisp functions.

(def last
  (compose head reverse))

(def init
  (compose reverse (compose tail reverse)))

(def concat
  (fn [xs ys]
    (if (empty? xs) ys
      (concat (init xs) (cons (last xs) ys)))))

(def range
  (fn [m n]
    (if (<= n m) '()
      (cons m (range (inc m) n)))))

(def partition
  (fn [p xs]
    (list (filter p xs) (filter (compose not p) xs))))

(def qsort
  (fn [xs]
    (if (empty? xs) '()
      (let [x (head xs)
            part (partition (curry > x) (tail xs))
            smaller (head part)
            bigger (last part)]
        (concat (qsort smaller) (concat (list x) (qsort bigger)))))))

(def all
  (fn [p xs]
    (apply and (map p xs))))

(def any
  (fn [p xs]
    (apply or (map p xs))))

We also define some convinience functions over lists: concat, partition, qsort, all, any. We also define a range function to build lists.

There are other functions that we have implemented. And of course we could implement many more functions, but we stop here.

Conclusion

We have reached the end of our journey, si I guess I’ll repeat myself: If you want to understand how a technology functions, then the best way to achieve that, is to implement/write/code yourself said technology. So, if you want to understand how a language works and in particular a functional language like lisp, then the best way to go about it, is to write a lisp yourself.

I invite you to implement your own language. You can also use this particular lisp implementation to experiment around. Good luck!

tags: scala combinator_parsers lisp programming_languages

© jaratec - Content is copyrighted and cannot be copied.