Friday 2 September 2011

Rethinking inheritance: promiscuous methods in Minx

TL;DR

Announcing Minx, a statically typed, functional, OOP language which hopes to demonstrate how flexible a static OOP system can be without inheritance or classes. Vaporware alert! The Minx language does not exist in any form at the moment besides this post - it's just a concept, but at the least I hope it will inspire other languages.

  • Design is focused on simplicity and elegance, removing more from OOP than it adds
  • Features are focused on dependency management, with novel systems for isolating and layering functionality
  • Highly decoupled sharing and extension mechanisms
  • All state defaults to being immutable and non-nullable
  • A simple templating system for generic types/functions

About Minx

In a previous post I outlined the features a next generation OOP language would have - that is, a language that includes all the benefits of modern OOP while eliminating a lot of the frustrations. In this post I showcase a candidate language, "Minx", which is designed to meet those goals. Minx is named after my wife, Catherine ;).

Minx (noun): An impudent, cunning, or boldly flirtatious girl or young woman

Minx is for you if:

  • You get frustrated at how easily large applications descend into unmaintainable messes, even with the best of intentions
  • You want the flexibility of a dynamic language, but statically typed
  • You believe dependency management is enough of a concern to be worth building into the language
  • You like simple and elegant solutions, and think modern OOP has too many concepts

Example Code

As an example, here's a Minx implementation of the Wikipedia example code for the "Template Method" design pattern. This pattern always struck me as the best cheerleader for classical inheritance because of how well it leverages the "protected" modifier. However, we can do better :). Because dependency management is such a feature of the language, I've included a multi-file example.

exclude:

#Default rule
*
  exclude Games.Implementation.*

Games.Public.SampleGames.*
  import Games.Implementation.*

Games.Public.minx:

# Standard turn-based games
Game =
  tryInitializeGame {playerCount int} -> bool,
  makePlay {player int} ->,
  endOfGame -> bool,
  printWinner ->

# Games with a concept of multiple rounds of play; The default
# value means any Game can be used as a RoundBasedGame
RoundBasedGame = Game +
  endOfRound : {roundNo int} ->    

RoundBasedGame.playOneGame = {playerCount int} ->
  unless tryInitializeGame(playerCount)
    # report invalid playerCount
    return

  var! playerNo = 0
  var! roundNo = 0
  until endOfGame()
    makePlay playerNo
    playerNo++
    if playerNo == playerCount
      playerNo = 0
      endOfRound roundNo
      roundNo++

  printWinner()

Games.Testing.minx:

newMockGame = {lastMoveNo int, reportError ({error string} -> )} ->

  int!? numPlayers
  int! nextMoveNo = 0

  return {
    tryInitializeGame : {playerCount int} ->
      if numPlayers?
        reportError "initialized game twice"

      numPlayers = playerCount
      return true,

    makePlay : {player int} ->
      unless numPlayers?
        reportError "must initialize game before first move"

      unless nextMoveNo < lastMoveNo
        reportError "moves were attempted after game has ended"

      int nextPlayer = nextMoveNo % numPlayers

      unless nextPlayer == player
        reportError "players called out of sequence"

      nextMoveNo++,

    endOfGame : ->
      unless numPlayers?
        reportError "must initialize game before checking if ended"  

      return nextMoveNo >= lastMoveNo,

    printWinner : ->
      if nextMoveNo < lastMoveNo
        reportError "cannot print winner until game has ended"

    } as Game

Games.Public.Playable.minx:

Playable =
  playOneGame {playerCount int} ->

# helper for next method
{count int}.times = {callback (->)} ->
  callback() for i in [1..count]

Playable.playMultipleGames = {playerCount int, gameCount int} ->
  gameCount.times -> 
    playOneGame playerCount

Playable.playKnockout = {playerCount int} ->
  playOneGame numPlayers for numPlayers in [playerCount..2] if playerCount >= 2

Games.Implementation/Monopoly.minx:

MonopolyGameData = 
  chanceCards ChanceCard[],
  playerPositions int[]! : [],
  isGameOver bool!

MonopolyGameData.tryInitializeGame = {playerCount int} ->
  # Initialize players
  # Initialize money
  return true

MonopolyGameData.makePlay = {player int} ->
  # Process one turn of player

MonopolyGameData.endOfGame = ->
  return isGameOver

MonopolyGameData.printWinner = ->
  # Display who won

newMonopolyGameData = ->
  return { 
    isGameOver bool! : false,
    chanceCards : [...]
  } as MonopolyGameData

Games.Implementation/Chess.minx:

ChessPiece = 
  isBlack -> bool,
  pieceDescription -> string

ChessGameData = 
  board ChessPiece?[]!

ChessGameData.tryInitializeGame = {playerCount int} ->
  return false unless playerCount == 2
  # Initialize players
  # Put the pieces on the board
  return true

ChessGameData.makePlay = {player int} ->
  # Process a turn for the player

ChessGameData.endOfGame = ->
  return isInStaleMate or isInCheckmate

ChessGameData.printWinner = ->
  # Display the winning player

newChessGameData = ->
  return { board : [...] } as ChessGameData

Games.Public.SampleGames.minx:

newChessGame = ->
  return newChessGameData() as Game

newMonopolyGame = ->
  return newMonopolyGameData() as Game

Main.minx:

newMonopolyGame().playMultipleGames {playerCount: 4, gameCount: 2}

Basic Syntax

Improving code readability is an important cause. The most readable language I've come across is CoffeeScript, so I've pretty much gone with its syntax throughout.

The main change I've made is to make it case-insensitive: the developer should not be delayed because they typed "OkButton" rather than "OKButton". In other languages, case-sensitivity is needed to overload the same term to use it as property, field, and parameter all at once - which Minx addresses with its choice of features instead, i.e. no properties or constructor functions. The coding standards, on the other hand, are PascalCase for interface names, camelCase for everything else (after JavaScript). These will be implemented as compiler warnings rather than errors, as the readability gains from consistency are not worth a direct productivity cost.

Data Structures

Traditional class-based OOP has monolithic classes, consisting of private fields, private and public methods that can access these fields, and constructor functions that initialize instances. This mixes concerns for method call semantics:

  • Only methods which are part of the class's "Single Responsibility" should be defined within the class
  • Only methods defined within the class get obj.method() semantics
  • Therefore only methods which are part of the class's Single Responsibility get desirable semantics; other responsibilities either get second-class semantics, or get erroneously lumped into the class definition, making maintenance harder

Minx replaces classes with data structures containing fields and separate methods that operate on them, after Go. Methods can be defined for any data structure or interface, after C# extension methods, and these methods can be used to satisfy interfaces (structural typing - or "static duck typing"), after Go. Minx takes both Go and C#'s features to their logical conclusion, allowing methods to bind to a data structure, satisfy interfaces, and allow new methods to bind. Compilation will not be fast. For a discussion of the effect this has on encapsulation, see the section on the Open/Closed Principle.

JSON is one of JavaScript's great gifts to the world and is fast becoming the serialization format of choice - for good reason, as it is succinct yet readable. Minx data structures are strongly typed JSON. The {name : value, ... } syntax becomes {name type : value, ... }, with types/values optional in some circumstances. Note we have dropped into the convenient JavaScript parlance of calling data structure fields "properties" (with no ambiguity, as we have ruled out OOP properties). Data Structures can be extended/combined with the + operator, providing they share no members with the same name.

A Minx interface is a data structure with missing values. Any values defined in an interface are defaults - for example, {name: "Minx"} as {name string, age : 0} binds the value to the interface to produce {name : "Minx", age: 0}. Note that the C# as keyword has been re-purposed for compile-time binding, rather than run-time. This system is similar to Self's prototypal inheritance: the duality between classes and object instances is eliminated. The difference is that it is statically typed (like Omega), can be used to hide as well as add members, and has no clean separation between the original and "cloned" or bound object - they are instead different views of the same data.
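To make the binding rule concrete, here is a rough Python sketch of what as does with defaults and hidden members. It is only an approximation under stated assumptions: the bind function and the REQUIRED sentinel are invented for this illustration, the check happens at run time rather than compile time, and the result is a read-only copy rather than a second view of the same data.

```python
from types import MappingProxyType

# Hypothetical sentinel marking an interface member that has no default.
REQUIRED = object()


def bind(value: dict, interface: dict) -> MappingProxyType:
    """Return a read-only mapping of `value` restricted to `interface`.

    Members missing from `value` take the interface default; members
    not named by the interface are hidden from the result.
    """
    bound = {}
    for name, default in interface.items():
        if name in value:
            bound[name] = value[name]
        elif default is not REQUIRED:
            bound[name] = default
        else:
            raise TypeError(f"missing required member: {name}")
    return MappingProxyType(bound)


# Roughly: {name: "Minx", nickname: "M"} as {name string, age : 0}
named = bind({"name": "Minx", "nickname": "M"}, {"name": REQUIRED, "age": 0})
print(dict(named))  # {'name': 'Minx', 'age': 0}
```

The nickname member is still present on the original dictionary; it is simply not reachable through the bound result, which is the "hide as well as add" behaviour described above.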

Functions

In Minx, functions are actions (verbs), concerned with 3 categories of data, each represented by an interface:

  • The target of the action (in the grammatical sense - this or Me in other languages). Omit this for standalone functions, rather than methods
  • The parameters of the action. If omitted, this becomes {}, the empty object
  • The result of the action (inferred from the body)

All the properties specified by the target and parameters are loaded into the initial scope when the function is called. To rephrase, we always explode the target into its constituents: there is no equivalent to this for accessing the whole. This means we have to choose an abstraction level to work with at the definition of the function - gone is the carte blanche for mixing field access, low-level method access and high-level method access. This is probably the most opinionated part of Minx: Complex objects should be layered, and every operation should explicitly choose a level to operate at. For instance, the example code is cleanly layered into:

  • Data structure definition
  • Methods which interact directly with the data structure (and together meet the Game interface)
  • Automatic binding to the RoundBasedGame interface
  • playOneGame which orchestrates the elements in the RoundBasedGame interface, meeting the Playable interface
  • Higher-order methods like playKnockout which can act on any Playable

As a result, Minx should hopefully be relatively resistant to big-ball-of-mud disease. If any of these data structures contains a single property, an instance of the type of the property can be supplied instead (assuming no ambiguity). Binding a method to a data structure produces a stand-alone function, with type denoted (parameterType -> resultType). The () can be omitted where unambiguous. Overloading is allowed, but method names within a single namespace must be unique. Minx also has anonymous functions whose "target" is the current scope. These can be used to create an object satisfying an interface on the fly (see newMockGame in the example).

ValidRange =
  min int,
  max int

{value int}.isInRange = ValidRange ->
  return value <= max and value >= min

bool inRange = 5.isInRange {min: 3, max: 7}

# equivalent to:    
bool inRange = {value: 5}.isInRange {min: 3, max: 7}

# also equivalent to
(ValidRange -> bool) isFiveInRange = 5.isInRange
bool inRange = isFiveInRange {min : 3, max : 7}



{title : "", firstName string, lastName string}.getFullName = ->
  if title == ""
    return "#{firstName} #{lastName}" 
  else 
    return "#{title} #{firstName} #{lastName}"

var teacher =
  title : "Miss", 
  firstName : "Agatha", 
  lastName : "Trunchbull", 
  position : "Headmistress"

var student =
  firstName : "Bruce", 
  lastName : "Bogtrotter", 
  likes : "cake"

teacher.getFullName()
student.getFullName()
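The "binding a target produces a stand-alone function" idea has a close cousin in existing languages: partial application. As a loose analogy only (Python has no Minx-style targets, and the names is_in_range, minimum and maximum are chosen for this sketch), the isInRange example might look like this:

```python
from functools import partial


def is_in_range(value: int, *, minimum: int, maximum: int) -> bool:
    """`value` plays the target; the keyword arguments play ValidRange."""
    return minimum <= value <= maximum


# Binding the target yields a stand-alone function of the parameters,
# much like `isFiveInRange = 5.isInRange` in the Minx example.
is_five_in_range = partial(is_in_range, 5)

print(is_in_range(5, minimum=3, maximum=7))    # True
print(is_five_in_range(minimum=3, maximum=7))  # True
```

The difference is that Minx would resolve the binding statically by matching property names, whereas partial simply fixes a positional argument at run time.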

Various languages have over the years taken different approaches to delimiting code blocks - {/}, BEGIN/END, Then/End If etc. Eventually, the creators of the language B2 noted that human-readability required that the block itself be indented, and replaced the delimiters with indentation, codifying what was already common practice but not enforced. This is a very important idea - elevating a best practice into a language feature. Just as B2 (or in the mainstream, Python) exchanged indentation freedom for less error-prone block delimiters, we exchange naming freedom for more flexible method-binding. It is a best practice to name consistently, so method binding is reliant upon it.

Modifiers

Immutability

Consider C++ and Scheme: two polar opposites of language design, united in the significance of immutability for code safety and compiler optimization. All data in Minx defaults to being immutable, to emphasize immutability as a best practice. You must add the ! modifier to a type to create a mutable instance.

Nullability

When attempting to parse an integer from text, what should you return if no integer can be extracted? Fundamentally we need to communicate the presence of a "special value", but many languages' best practices disallow this due to historically poor choices of value (e.g. -1) that were the source of many bugs. IMHO, null or "absence of a value" is the correct special value in the majority of cases. But how do we ensure it's intuitive? In Minx all data defaults to non-nullable. By changing the default, the presence of nullability becomes clear documentation of the author's intention to use null as a special value. Add a ? modifier to the end of types to signify nullability - e.g. when parsing integers, int? allows an integer if one can be parsed, or null if not. Use the variable? operator to check whether a nullable value has a value, and variable| to access the value of a non-null nullable.
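For comparison, the integer-parsing case can be sketched in Python, where Optional[int] plays the role of Minx's int?. This is an analogy rather than Minx semantics: the parse_int helper is invented for this sketch, and Python's type hints are only enforced by an external checker.

```python
from typing import Optional


def parse_int(text: str) -> Optional[int]:
    """Return the integer in `text`, or None when it cannot be parsed."""
    try:
        return int(text.strip())
    except ValueError:
        return None


result = parse_int("42")
if result is not None:   # plays the role of Minx's `result?`
    print(result + 1)    # 43
```

The return type alone tells the caller that "no value" is a possible outcome, which is the documentation effect argued for above.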

Aside: hopefully you - like me - look at the examples below and go "Ugh"! This is the intention - that to introduce mutable or nullable state you must uglify your code very slightly; to make wrong code look wrong. This should provide the correct incentive - to avoid either wherever possible, and so be pushed by the syntax towards more correct and efficient programs.

# use "var" to declare a local where the type can be inferred
var a = "a"

# add modifiers to var
var?! c = "c"
if c?
  var d = c|
c = null

# if the type cannot be inferred it must be specified;
# null can be omitted
string? b 

# immutable
var d = {name : "Minx", age : 0}
# compiler error:
d.name = "Chewbacca"

# we must go out of our way to specify "name" and "e" as 
# mutable properties.
var! e = {name string! : "Minx", age : 0}
e.name = "Chewbacca"

# compiler error as age was not specified mutable.
e.age = 1


Counter = 
  value int,
  increment ->

# callers cannot alter the value property directly
# but have the ability to increment it via the method
getCounter = ->
  var! count =
    value int! : 0

  return count + {
    increment : ->
      count.value++
  } as Counter

Dependency Management via Namespacing

In Minx, the unit of functionality is the namespace, which is taken from the combination of filename and relative path; . or / characters in filenames and paths become . characters in namespaces - so the code with namespace System.StringHandling.Uri could be located in any of the following locations:

  • [ProjectDirectory]/System.StringHandling.Uri.minx
  • [ProjectDirectory]/System.StringHandling/Uri.minx
  • [ProjectDirectory]/System/StringHandling.Uri.minx
  • [ProjectDirectory]/System/StringHandling/Uri.minx

The choice is left to the developer, and allows projects to slowly evolve in size without at any point forcing them into a structure they are too big/small for. As you may have noticed in the example code, Minx code files are completely unaware of the namespaces they are contained within and are dependent upon. This is the fundamental requirement for dependency management - that your code has no hard-coded dependencies to any one thing. Sure, it refers to particular named interfaces and requires certain methods - but if you supplied alternatives with the same signature it would never know the difference, right? Now we can re-wire deep aspects of our application's functionality just by changing which files have other files in scope.
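The path-to-namespace rule is simple enough to sketch in a few lines of Python. The function name namespace_of is made up for this illustration, and it assumes paths are given relative to the project directory with a .minx extension:

```python
def namespace_of(relative_path: str) -> str:
    """Map a project-relative file path to its namespace."""
    path = relative_path
    if path.endswith(".minx"):
        path = path[: -len(".minx")]
    # Directory separators and dots are interchangeable in a namespace.
    return path.replace("/", ".")


for candidate in (
    "System.StringHandling.Uri.minx",
    "System.StringHandling/Uri.minx",
    "System/StringHandling.Uri.minx",
    "System/StringHandling/Uri.minx",
):
    print(namespace_of(candidate))  # System.StringHandling.Uri each time
```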

But how should we specify what files can be in scope? I was inspired by a mailing-list post by Joe Armstrong about the possibilities of a "Key-value database" of Erlang functions. Decomposing classes back into a sea of functions opens the possibility of doing the same with object-oriented code. The "unique names" for each function are the fully qualified (i.e. with the namespace) function names; If we match these names with wildcards we have a basic query system for this database!

import  a.*
exclude a.b.*
import  a.b.c.*
exclude a.b.c.d

This system is very flexible and requires few statements to specify the locus of functionality required. It is a compiler error to specify these in an order where later statements eclipse previous statements (e.g. import a.b.*; exclude a.*;). In the example above, member d in namespace a.b.c is excluded - but the rest of the a.b.c namespace is included. Dependencies are handled by the three configuration files: import, export, and exclude.
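One plausible reading of these rules is "apply them in order; the last matching rule wins". The Python sketch below follows that reading using the standard fnmatch module. The function name is_in_scope and the default of "not in scope until imported" are assumptions of this sketch, and names are lower-cased to mimic Minx's case-insensitivity:

```python
from fnmatch import fnmatchcase

# The four statements from the example above, in order.
RULES = [
    ("import", "a.*"),
    ("exclude", "a.b.*"),
    ("import", "a.b.c.*"),
    ("exclude", "a.b.c.d"),
]


def is_in_scope(qualified_name: str, rules=RULES) -> bool:
    """Apply the rules in order; the last matching rule decides."""
    in_scope = False  # assumption: nothing is visible until imported
    for action, pattern in rules:
        if fnmatchcase(qualified_name.lower(), pattern.lower()):
            in_scope = (action == "import")
    return in_scope


print(is_in_scope("a.x"))      # True
print(is_in_scope("a.b.x"))    # False
print(is_in_scope("a.b.c.e"))  # True
print(is_in_scope("a.b.c.d"))  # False
```

The ordering check (rejecting lists where a later, broader pattern eclipses an earlier one) is left out here; it would be a separate validation pass over the rule list.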

import

import specifies a list of libraries we want to import functionality from, and what namespaces we want to import from them. If omitted, the project imports nothing from any libraries. Let's say I want to import all the functionality from Standard.dll, except for the System.StringHandling.UriEscaping namespace, which I want to import from Alternative.dll. I want to define this in one place so the rest of the application can be unaware of its dependencies - and so I can swap it out with ease when needed. Think about what a system like this can do for dependency injection:

Standard.dll
  import  *
  exclude System.StringHandling.UriEscaping

Alternative.dll
  exclude *
  import  Alternative.AlternativeUriEscaping

export

export controls the output of the project. Usually this will be a single executable or library - but there may be scenarios where that's undesirable. In circumstances where other systems may require you to batch your code up into multiple projects in order to separate out different concerns, Minx allows you to direct different functionality into different targets. It even allows you to inline to the max by incorporating your imported libraries into a single executable if you so wish. At compilation it's verified that your outputs are non-overlapping, have no circular references, and that every output has access to the functionality it is dependent upon (whether this is in another output or one of the imported libraries).

#Default rule
*
  exclude System.*

MyApp.Database.dll
  export MyApp.DataAccess.*

MyApp.BusinessLogic.dll
  export MyApp.Logic.*
  export MyApp.Calculations.*

MyApp.Main.dll
  export MyApp.*
  exclude MyApp.DataAccess.*
  exclude MyApp.Logic.*
  exclude MyApp.Calculations.*

exclude

This is where the exciting stuff happens. exclude specifies at the namespace level what code should be isolated from each other. It is an explicit self-documenting architecture for the project:

#Default rule
*
  exclude *

MyApp.Controller.*
MyApp.UI.*
MyApp.Logic.*
MyApp.Database.*
  import  *

MyApp.UI.*
MyApp.Logic.*
MyApp.Database.*
  exclude MyApp.*

MyApp.UI.*
  import MyApp.UI.*

MyApp.Logic.*
  import MyApp.Logic.*

MyApp.Database.*
  import MyApp.Database.*

The first rule ensures that if we mistype a namespace anywhere, rather than just getting carte blanche to ignore these scoping rules, the feature has access to nothing. Features in the Controller namespace have access to everything, and the UI, Database and Logic layers are completely segregated from everything in the app except themselves. Code always has access to its own namespace but not sub-namespaces, so extra rules are needed. It is a compile error to specify rules in an order where later patterns are less specific than earlier patterns, because of the effect on readability.
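To check that the rule file above really produces the layering described, here is a small Python model of it. Everything about the evaluation order is an assumption of this sketch: blocks are applied top to bottom, a block applies when any of its subject patterns matches the calling namespace, and the last matching statement decides. The can_see name is invented, and the implicit "own namespace" access is not modelled.

```python
from fnmatch import fnmatchcase

# Each block: (subject patterns, ordered statements), mirroring the file above.
EXCLUDE_FILE = [
    (["*"], [("exclude", "*")]),
    (["MyApp.Controller.*", "MyApp.UI.*", "MyApp.Logic.*", "MyApp.Database.*"],
     [("import", "*")]),
    (["MyApp.UI.*", "MyApp.Logic.*", "MyApp.Database.*"],
     [("exclude", "MyApp.*")]),
    (["MyApp.UI.*"], [("import", "MyApp.UI.*")]),
    (["MyApp.Logic.*"], [("import", "MyApp.Logic.*")]),
    (["MyApp.Database.*"], [("import", "MyApp.Database.*")]),
]


def can_see(source: str, target: str, blocks=EXCLUDE_FILE) -> bool:
    """Decide whether code in `source` has `target` in scope."""
    visible = False
    for subjects, statements in blocks:
        if not any(fnmatchcase(source, subject) for subject in subjects):
            continue  # this block does not apply to the calling code
        for action, pattern in statements:
            if fnmatchcase(target, pattern):
                visible = (action == "import")
    return visible


print(can_see("MyApp.Controller.Main", "MyApp.Logic.Tax"))  # True
print(can_see("MyApp.UI.Button", "MyApp.UI.Panel"))         # True
print(can_see("MyApp.UI.Button", "MyApp.Logic.Tax"))        # False
print(can_see("MyApp.Typo.Thing", "MyApp.UI.Panel"))        # False
```

Under this reading the UI layer can still see non-MyApp namespaces such as library code, since only MyApp.* is excluded for it.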

We can exert far more control than just this however. Want to swap out the sockets implementation throughout the whole application? Done with a couple of lines. Want to exchange a single method in a library for one of your own? Done. Want to mock a component for developer-testing? Again, done. I'm pretty sure that exclude will quickly grow to be huge, in which case it will be necessary to have a namespace system of its own (e.g. MyApp.Database/exclude contains the sub-rules for the Database namespace) and it will be desirable to have different exclude files for different scenarios - but it would take real-world usage to work out how best to extend this.

Generics/Templates

Interfaces/methods can be templated through the use of unknown types. "Don't care" types can be represented by a single _ (or omitted where unambiguous), and types that occur in multiple places can be matched with _Type patterns. Moving into the realm of "things I'm pretty sure should be possible", the latter also brings a variable Type into scope which can be queried for meta-programming. To inject source code at compile-time, use a ~{} block in the same way you would inject code into a string with #{}. Unlike C# or C++, the type parameters are inferred from usage:

GenericList = 
  getItem {index int} -> _ItemType,
  setItem {index int, item _ItemType} -> ,
  count -> int

# usage implies method only valid for ItemType = string
GenericList.forEach
{withEach {item string} ->} ->
  withEach (getItem(i)) for i in [0..count - 1]


# currying functions
setupLocalServer = 
{parameters} -> 
  setupServer parameters + {ipAddress : "127.0.0.1"}


# static "methodMissing" that maps between two interfaces.
Interface1 = 
  name string,
  age int,
  address string

Interface2 = 
  getName -> string,
  getAge -> int,
  getAddress -> string

Interface1._methodName = ->
  return ~{ methodName.substring(3).toLower() }


# flexible method-call semantics
_.isInRange
_ -> bool
  return value <= max and value >= min

5.isInRange {max: 7, min: 3}

{max: 7, min: 5}.isInRange 6

Minx and the SOLID design principles

The Single Responsibility Principle

To quote the Wikipedia article on the subject:

...every object should have a single responsibility, and that responsibility should be entirely encapsulated by the class... [Robert C Martin] defines a responsibility as a reason to change, and concludes that a class or module should have one, and only one, reason to change.

Namespaces (i.e. files) are the Minx unit of functionality, so the equivalent rule should be that every namespace should satisfy exactly one concern, and therefore only have a single reason to change.

The Open/Closed Principle

This principle states that objects should be open for extension, closed for modification. The traditional interpretation under inheritance is to make class members containing key logic unmodifiable, while allowing others to be overridden so behaviour can be extended. However, in Minx, where all encapsulation ("closing") is instead achieved via scope, the principle encourages returning abstracted interfaces which hide internals and allow extension by permuting/adding to the visible methods. The traditional approach is epitomized by the Template Method pattern, and I hope my example code demonstrates how Minx allows a much more elegant and flexible implementation.

The Liskov Substitution Principle

This principle states that any object which is a "sub-type" of another should be interchangeable with it, without it affecting program correctness. Minx makes satisfying this principle easy: on supplying an object to a function, any members or metadata which are not part of the requested interface are hidden from scope. There is no dynamic type-checking, so different instances satisfying an interface are completely indistinguishable. This was in fact one of the motivating reasons for the language (see my previous post).

The Interface Segregation Principle

This principle states that interfaces should have fine granularity so your callers are less dependent on changes to your API. Minx's design makes callers immune to this to an extent, as they can create their own sub-interface to depend upon instead. Minx also allows more sophisticated scenarios where methods which act on lower-level interfaces can be used to construct higher-level interface instances, avoiding the need to lump all methods into a single interface.

The Dependency-Inversion Principle

A. High-level modules should not depend on low-level modules. Both should depend on abstractions.

B. Abstractions should not depend upon details. Details should depend upon abstractions.

Minx makes this easier, as it allows splitting functionality into de-coupled layers without compromising on the semantics. Dependency-injection scenarios are trivial using the namespace functionality (see namespaces). Choose the appropriate abstraction level for your purposes.

Undecided

static

Does static modifiable data have any place in a well-constructed application? I want to say "no", because it causes so many problems and muddies a very clean design. However, there are bound to be circumstances where it is the best tool for the job. As it stands, Minx allows it with the same visibility rules as all other members. To stop this getting out of hand I'd be tempted to break the symmetry of the design and make modifiable static variables private to their containing file, just to rein in the scope for wrong-doing. Kludge.

arrays vs sequences

You will have noticed the use of the [] notation for collections in the example code. I'm toying with making this primitive a memoizing sequence rather than an array as would be traditional in OOP. Then we could define methods that return iterators, or infinite sequences of natural numbers like [1..] while still defining a nice efficient array from 1 to 10 like [1..10]. I'm sure we can find a way for arrays to still be a fast special case in this system, and increase the functional power dramatically at the same time.
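As a rough idea of what a memoizing sequence could look like, here is a Python sketch. The MemoSeq class is invented for this illustration and makes no claim about how Minx would implement it; it simply pulls items lazily from any iterable and caches them so repeated indexing is cheap.

```python
from itertools import count
from typing import Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


class MemoSeq(Generic[T]):
    """Lazily pulls items from an iterable and caches them for re-use."""

    def __init__(self, source: Iterable[T]) -> None:
        self._source: Iterator[T] = iter(source)
        self._cache: List[T] = []

    def __getitem__(self, index: int) -> T:
        if index < 0:
            raise IndexError("negative indexes need a known length")
        # Only consume as much of the source as this lookup requires.
        while len(self._cache) <= index:
            try:
                self._cache.append(next(self._source))
            except StopIteration:
                raise IndexError(index) from None
        return self._cache[index]


naturals = MemoSeq(count(1))        # in the spirit of [1..]
one_to_ten = MemoSeq(range(1, 11))  # in the spirit of [1..10]
print(naturals[4])                  # 5
```

A finite source such as range could be special-cased to a plain array up front, which is the "fast special case" hoped for above.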

Concurrency

I've deliberately avoided discussing concurrency, as it's not something that's come up a lot in my career, so I have no strong views on how to add it in.

Exceptions

I wussed out on exceptions in the end. I seem to be in a minority in thinking that Java was onto something with checked exceptions: exceptions are part of a method's signature, and static typing should be able to ensure that calling code has taken these exceptions into account. You could minimize the burden by inferring what exceptions are thrown, meaning you'd only have to explicitly specify them on virtual method calls, i.e. methods declared on interfaces - but you could declare a method as throwing no exceptions if you wanted the compiler to enforce it for you (e.g. on application entry points). Of course even Java created unchecked exceptions too, as there will always be exceptions that you cannot reasonably anticipate - running out of memory for example - so I relented, rather than cripple the language.

Conclusions

Hopefully I've shown you that statically typed OOP without inheritance is desirable and feasible, and that in general OOP has a lot of fat that can be trimmed without compromising its utility. What do you think? Would you like to write a large-scale application in Minx? Are there any huge flaws/glaring omissions? How would you go about building a system like it - it hasn't escaped my notice that it's bloody hard to build a statically typed language this flexible. Can it be built? If not, do you know of a quick counter-proof?
