Bodging alternatives to full-on compilers– and the true meaning of OOPness

“You say, well if we were using Haskell for this then we would just do this thing; can we adapt that to what we’re doing here? Not saying you have to use Haskell to solve it, but could you use that kind of approach?” — Damian Conway

I’ve been experimenting with parsing for a while now. I’m on the verge of digging into tokenizing and grammars. Before I hunker down and educate myself, I just want to see how far you can go with less industrial-strength approaches. I think it’s pretty cool how combining a single good regex with a closure gets you what is for all intents and purposes a mini-compiler. This is a pretty handy trick; filing away such dynamically generated functions in a hash table can lead to some pretty powerful techniques, too. A dispatch table like that can even be used to create a simple programming language on the fly. Tokenizing a string is just another algorithm… and on first glance, grammars appear (in a way) to be fancy dispatch tables. How do you bring them all together? Some kind of deep magic from the dawn of time– something I would never figure out by trial and error or intuition. I’m getting there. But just for fun, let’s generalize some of these ideas and (at the same time) temporarily ignore them. Let’s go entirely orthogonal to this plane of thinking for a moment.

OOP as it gets applied in ‘the real world’ is primarily a defensive exercise: “I don’t like you and I don’t like the way you code. Let me just wall off on my little corner and do my thing. Let me work with your stuff without having to understand your code. Don’t break the contract, either, bub. Somehow, I don’t think this is where Alan Kay intended us to go with this….

Think about the difference between Checkers and Chess for a second. The permutations of moves are effectively infinite in both games. But the space of possibilities have different ‘shapes’. If you were, like some sort of Paul Atreides, to look out at the limitless patterns in the games, for chess you’d see an expansive plane of cascading options. Checkers would look more like an infinitely deep well.

The way we tend to do OOP is more like Checkers than it is like Chess. The calls between and among the various objects seems brittle and constricted. Like talking to someone through an incredibly long pipe instead of face to face…. So how to open things up and make things more fluid and fluent? My idea is to swing the emphasis in OOP back much more toward the message passing concept, steal a few idioms from Lisp, and generalize heavily on the overall theme of DSL’s. (I always thought it was cool how in Lisp practically everything had a straight-forward text representation. The reader transforms what you type into a tidier, more consistent representation… and then shows it to you as the response.)

What if in designing a programming language, you divided it up into various objects or components. To extend the language you just drop in new objects. The classes that the objects derive from each come (after a fashion) with their own DSL. Each object would ‘know’ how to generate its own source code. While working with an object via a REPL, you could modify the object… then view and/or save the modified source without dropping back to the IDE or text editor. The object source code should understand the concept of testing and incorporate test code into its specification. When you talk to code via a REPL, you end up doing lots of little tests anyway. You should be able to incorporate those off-the-cuff tests into the code without thinking about it. (Any inconvenience in automated testing development just makes it that much more likely that it doesn’t get done. Get rid of everything that stands in the way of it.)

So each object in the system would know how to do stuff like read, source, save, and load. Each object would have its own mini-language that you can focus on separately without having to think of all of the parsing at once. Your collection of objects would get wrapped in a unified reader. The main reader would pass messages to each object-group in the deck until something understood the message. (And just like “library lists” on the AS/400, you could rearrange the reading order on the fly.) Simple! Now you can build a programming language without having to think about the whole thing at once– and you get modularity and testability benefits in the process. If you don’t have time to write ‘real’ compilers and DSL’s, this might be enough to get you by for a while. Of course, it might be enough to make you buckle down and learn some real computing techniques, too. Judge for yourself.

If you’d like to play with this (or see more specifically part of what I’m trying to describe), I’ve implemented the gist of the idea in Perl. There’s nothing fancy in it; I’m just using the bottom 30% of the language for the most part. There’s plenty of places the overall idea could be further generalized and tightened up. (I think I’m part way through reinventing Smalltalk, but this is just for fun anyway.)

Here’s a couple of sample screen shots plus the code:

Do what I mean; Who cares about joins?

Do what I mean; Who cares about joins?

Some Lame Syntactic Sugar Hacks

Some Lame Syntactic Sugar Hacks

###################################################################################################
{
    
package Functions;
use Moose;

has Functions => (isa => 'HashRef', is => 'rw', default => sub { {} } );
has Source => (isa => 'HashRef', is => 'rw', default => sub { {} } );
has Environment => (isa => 'Environment', is => 'rw'); ### required only if implementing syntactic sugar for table refernces

sub set {
    my ($self, $key, $text) = @_;
    my ($ref, $code, $r);
    $code = $text;
    $code =~ s/(\d+[dD]\d+)/\$self->roll_maker('$1')->()/g; ### the regex could be improved-- missing some cases....
    $code =~ s/!(\w+)/\$self->Functions->{$1}->()/g; ### should this call Environment or the local deck?   [YES!]
    $r = '[^~\s;]';
    $code =~ s/($r+)~($r+)~($r+)/\$self->Environment->Tables->cell($1, $2, $3)/g;
    $code = "sub { $code }";
    print "The actual code is $code\n";
    $ref = eval $code;
    if ($@){
        print "Error setting $key: $@" if $@;
    } else {
        $self->Source->{$key} = $text;
        $self->Functions->{$key} = $ref;
    }
}

sub call {
    my ($self, $key) = @_;
    if ($self->Functions->{$key}){
        eval { $self->Functions->{$key}->(); };
        print "Error calling $key: $@" if $@;
    } else {
        print "No such function '$key'.\n";
    }
}

sub show {
    my ($self, $key) = @_;
    print "\n\n";
    if ($key){
        if ($self->Source->{$key}) {
            my $source = $self->Source->{$key};
            print "set $key: $source\n";
        } else {
            print "No source available for $key.\n";
        }
    } else {
        foreach $key (sort keys %{$self->Source}) {
            my $source = $self->Source->{$key};
            print "set $key: $source\n";
        }
    }
    print "\n";
}

sub read {
    my ($self, $line) = @_;
    if ($line =~ /^set (\w+):\s*(.+)/){
        if (defined($self->set($1, $2))) {
            print "Function $1 set.\n";
        }
    } elsif ($line =~ /^call (\w+)$/){
        $self->call($1);
        print "\n";
    } elsif ($line =~ /^show\s*(\w*)/){
        if ($1) {
            $self->show($1);
        } else {
            $self->show();
        }
    } elsif ($line =~ /^load\s+([\w.]+)/){
        $self->load($1);
    } elsif ($line =~ /^save\s+([\w.]+)/){
        $self->save($1);
    } else {
        return 0;
    }
    return 1;
}

sub load {
    my ($self, $file) = @_;
    open FILE, "< $file";
    while (<FILE>){
        chomp;
        $self->read($_);
    }
    close FILE;
}

sub save {
    my ($self, $file) = @_;
    open FILE, "> $file";
    foreach my $key (sort keys %{$self->Source}) {
        my $source = $self->Source->{$key};
        print FILE "set $key: $source\n";
    }
    close FILE;
}

sub roll_maker ($) {
    my($self, $line, $num, $type);
    $self = shift;
    $line = shift;
    
    return undef unless $line =~ m/^(\d+)?[dD](\d+|%)/;

    $num    = $1 || 1;
    $type   = $2;

    $type  = 100 if $type eq '%';

    return sub {
        my $result;
        for( 1 .. $num ) {
            $result += int (rand $type) + 1;
        }
        return $result;
    }
}

}
###################################################################################################
{    

package Callback;
use Moose;

sub choose {
    my ($self, $text) = @_;
    my @options = split /;/, $text;
    #print "the text is $text\n";
    my $i = 0;
    print "\n";
    foreach(@options){
        $i++;
        print "$i: $_\n";
    }
    
    my $answer;
    until ($answer) {
        print ">> ";
        chomp(my $line = <STDIN>);
        if ($line =~ /^\d+$/) {
            $answer = $options[$line - 1];
        } else {
            foreach(@options){
                if(/^$line/i){
                    $answer = $_;
                    last;
                }
            }
        }
    }
    return $answer;
}

}
###################################################################################################
{

package Table;
use Moose;

my $show_debug2 = 0;

has Name => (is => 'rw', isa => 'Str', default => 'Bob');
has 'Rows' => (isa => 'HashRef', is => 'rw', default => sub { {} } ); 
has 'Columns' => (isa => 'ArrayRef[Str]', is => 'rw', default => sub { my @a; return \@a; } );
has 'ColumnRegexes' => (isa => 'ArrayRef[Str]', is => 'rw', default => sub { my @a; return \@a; } );

sub initialize {
    my ($self, $table, $text) = @_;
    $self->Name($table);
    my ($columns, $column_regexes) = parse_column_header($text);
    $self->Columns($columns);
    $self->ColumnRegexes($column_regexes);
    my %rows;
    $self->Rows(\%rows);
}

sub load_line {
    my ($self, $text) = @_;
    my ($rowkey, $row) = parse_row_detail($self->Name(), $text, $self->Columns());
    $self->Rows->{$rowkey} = $row;
}

sub cell {
    my ($self, $row, $column) = @_;
    return $self->Rows->{$row}->{$column};
}

### pass a comma delimited header line from a table definition
### and get two array references describing the table structure
sub parse_column_header {
    my ($line) = @_;
    my @fields = split /,/, $line;
    my $column_number = 0;
    my @columns;
    my @regexes;
    print "reading columns to table: $line\n" if $show_debug2;
    foreach(@fields){
        my $field = $_;
        $field =~ s/^\s+|\s+$//g; # trim field
        if($field =~ /^([^\/]*)\/([^\/]*)\//){
            $field = $1;
            $regexes[$column_number] = $2;
        }
        $columns[$column_number] = $field;
        $column_number++;
    }
    return (\@columns, \@regexes);
}

### pass a table name and a comma delimited header line from a table definition
### and also pass a reference to an array of column names...
### and get the row's key and a hash of detail data
sub parse_row_detail {
    my ($table, $line, $columns) = @_;
    my @fields = split /,/, $line;
    print "reading rows to table $table: $line\n" if $show_debug2;
    my %row;
    my $column_number = 0;
    my $rowkey;
    my $didit;
    foreach(@fields){
        my $field = $_;
        $field =~ s/^\s+|\s+$//g; # trim field
        # Need to allow some keys to be '0'!
        if ($didit){
            $row{$columns->[$column_number]} = $field;
        } else {
            $rowkey = $field;
        }
        $column_number++;
        $didit = 1;
    }
    return ($rowkey, \%row);
}

sub show_keys {
    my ($self) = @_;
    my $key;
    my $table = $self->Name();
    my $rs = $self->Rows();
    foreach $key (sort keys %{$rs}){
        print "$table row    : $key\n";
    }
    my $cs = $self->Columns();
    foreach (@{$cs}){
        print "$table column : $_\n";
    }
}

sub show {
    my ($self) = @_;
    print $self->text();
}

sub text {
    my ($self) = @_;
    my $text;
    my $x;
    
    $text =  "table $self->{Name}:\n";
    $text .= "head: ";
    foreach(@{$self->Columns}){
        $text .=  ", " if $x;
        $text .=  $_;
        $x = 1;
    }
    $text .=  "\n";
    foreach my $key (sort keys %{$self->Rows}){
        $x = 0;
        $text .=  "row: ";
        foreach(@{$self->Columns}){
            $text .=  ", " if $x;
            $text .=  $key unless $x;
            $text .=  $self->cell($key, $_) if $x;
            $x = 1;
        }
        $text .=  "\n";
    }

    return $text;
}

}
###################################################################################################
{

package Tables;
use Moose;

has Tables => (isa => 'HashRef[Table]', is => 'rw', default => sub { {} } );
has ReadAsTable => (isa => 'HashRef[Str]', is => 'rw', default => sub { {} } );
has Current => (isa => 'Str', is => 'rw');
has Callback => (isa => 'Callback', is => 'rw', default => sub { Callback->new(); } );

sub cur {
    my ($self, $table) = @_;
    $self->Current($table);
    if ($self->Tables->{$table}) {
        print "Table $table is the active table.\n";
    } else {
        print "Ready to initialize table $table.\n";
    }
}

sub cell {
    my ($self, $table, $row, $column) = @_;
    my $T = $self->Tables->{$table};
    if ($T){
        my $value = $T->cell($row, $column);
        $value = $self->private_check_other_tables($table, $row, $column, $T) unless $value;        
        if ($value){
            $value = $self->ReadAsTable()->{$value} if $self->ReadAsTable()->{$value};
            if ($value =~ /;/){
                $value = $self->Callback->choose($value);
            }
            return $value;
        } else {
            print "Row $row Column $column not found in table $table.\n";
            return undef;
        }
    } else {
        print "Table $table does not exist.\n";
        return undef;
    }        
}

### if value not found, loop through each table
### if a table contains the column *and* the key is contained in the specified table
### then return the foreign value!
sub private_check_other_tables {
    my ($self, $table, $row, $column, $T) = @_;
    foreach my $key (sort keys %{$self->Tables}) {
        unless ($key eq $table){
            my $OTHER = $self->Tables->{$key};
            foreach(@{$OTHER->Columns}){
                #print "col-- $_\n";
                if ($_ eq $column){
                    ### does the first column match the table we're looking at?
                    foreach(@{$T->Columns}){
                        #print "mycol-- $_\n";
                        if ($_ eq $OTHER->Columns->[0]){
                            my $foreign_key = $T->cell($row, $_);
                            my $foreign_cell = $OTHER->cell($foreign_key, $column);
                            #print "Found it! (foreign key $foreign_key) (value $foreign_cell)\n";
                            return $foreign_cell;
                            #last; ### this doesn't really *last* it.
                        }
                    }
                }
            }
        }
    }           
}

sub show {
    my ($self, $key) = @_;
    print $self->text($key);
}

sub text {
    my ($self, $key) = @_;
    my $text;
    
    $text = "\n\n";
    if ($key){
        $text .= $self->Tables->{$key}->text();
    } else {
        foreach $key (sort keys %{$self->Tables}) {
            $text .= $self->Tables->{$key}->text();
            $text .= "\n\n";
        }
    }
    $text .= "\n";

    return $text;
}

sub read {
    my ($self, $line) = @_;
    my $table = $self->Current();
    if ($line =~ /^table (\w+):?/){
        $self->cur($1);
        #print "Table $1 is now active.\n";
    } elsif ($line =~ /head: (.*)/) {
        my $T = new Table();
        $T->initialize($table, $1);
        $self->Tables->{$table} = $T;
        print "Table $table initialized.\n";
    } elsif ($line =~ /row: (.*)/) {
        my $T = $self->Tables->{$table};
        if ($T) {
            $T->load_line($1);
            print "Table $table line loaded.\n";
        } else {
            print "Table not found.\n";
        }
    } elsif ($line =~ /^(\w+)~(\w+)~(\w+)/){
        my $cell = $self->cell($1, $2, $3);
        if (defined($cell)) {
            print "$cell\n";
        } else {
            print "Cell not found.\n";
        }
    } elsif ($line =~ /^readas ([\w\s]+)\s+->\s+(.+)/){
        $self->ReadAsTable->{$1} = $2;
        print "Reading '$1' as '$2'.\n";
    } elsif ($line =~ /^show\s*(\w*)/){
        if ($1){
            if ($self->Tables->{$1}){
                $self->show($1);
            } else {
                print "Table not found.\n";
            }
        } else {
            $self->show();
        }
    } elsif ($line =~ /^show/){
        $self->show();
    } elsif ($line =~ /^load\s+([\w.]+)/){
        $self->load($1);
    } elsif ($line =~ /^save\s+([\w.]+)/){
        $self->save($1);
    } else {
        return 0;
    }
    return 1;
}

### same routine as in Functions class
sub load {
    my ($self, $file) = @_;
    open FILE, "< $file";
    while (<FILE>){
        chomp;
        $self->read($_);
    }
    close FILE;
}

sub save {
    my ($self, $file) = @_;
    open FILE, "> $file";
    print FILE $self->text();
    close FILE;
}

}
###################################################################################################
{

package Environment;
use Moose;

has Functions => (isa => 'Functions', is => 'rw', default => sub { new Functions() } );
has Tables => (isa => 'Tables', is => 'rw', default => sub { new Tables() } );
has Settings => (isa => 'HashRef', is => 'rw', default => sub { my %hash; $hash{WorkingWith} = 'Functions'; return \%hash; } );

### This should be in the constuctor
sub hack {
    my ($self) = @_;
    $self->Functions->Environment($self);
}

sub read {
    my ($self, $line) = @_;

    if ($line =~ /^work functions/){
        print "Working with functions.\n";
        $self->Settings->{WorkingWith} = 'Functions';
        return;
    }

    if ($line =~ /^work tables/){
        print "Working with tables.\n";
        $self->Settings->{WorkingWith} = 'Tables';
        return;
    }

    ### Refactor Me Sometime
    if ($self->Settings->{WorkingWith} eq 'Tables'){
        unless ($self->Tables->read($line)){
            unless ($self->Functions->read($line)){
                print "Unknown command.\n";
            }
        }        
    } else {
        unless ($self->Functions->read($line)){
            unless ($self->Tables->read($line)){
                print "Unknown command.\n";
            }
        }
    }
}

}
###################################################################################################
{
    
package Test;
use strict;
use warnings;
my $T = new Tables();
my $F = new Functions();
my $E = new Environment(Tables => $T, Functions => $F);

print "\n>> ";
while(<STDIN>){
    chomp;
    $E->read($_);
    print "\n>> "
}

}
###################################################################################################
Advertisements

2 Responses to “Bodging alternatives to full-on compilers– and the true meaning of OOPness”

  1. Mark Miller Says:

    Your example looks Lisp-like in the sense that you set a context and then everything you work with afterwards can be in that context.

    I’ve been trying to inspire myself lately to get more into developing my own language, not just for the sake of doing it, but in a larger context, like Jeff Moser’s “Meta-FizzBuzz” post. In the comments to that post Moser revealed something interesting Alan Kay said: “If you understand something well enough, you should be able to create a language for it.”

    A thought I’ve had is I need to just get a feel for doing it, just tinkering around, not doing the whole enchilada just yet, since It’s been a long time since I’ve done this sort of thing.

    You mentioned in your post that maybe you’re creating your own Smalltalk, and you described objects having their own language. So I thought, “Okay. Let’s see if I can do this in Smalltalk.” So I spent a few hours yesterday trying to work it out. In the end I think I made it more complicated than it needed to be, but I got it done. Without revealing the full source code, here’s code and output from a workspace session using what I built. The following code was evaluated, and the results are to the right of each of the bottom two rows, the result of doing a “PrintIt” operation (alt-P) in Squeak:

    | testing colors |
    Table clearTables.
    testing := Table new.
    testing setHeader: #(Key Description ColorID).
    testing addRow: {‘ABC’. ‘Hello World!’. 42}.
    colors := Table new.
    colors setHeader: #(ColorID Color).
    colors addRow: {42. #Red}.

    (testing rowByKey: ‘ABC’) getValue: #Color. #Red
    (testing rowByKey: ‘ABC’) getValue: #Description. ‘Hello World!’

    I used a cross-referencing technique similar to what you show going on in that if the “Color” field doesn’t exist in “testing”, it looks for other tables that have that field, using any foreign keys that are available in the source row.

    The way I did the tables was to have Table objects that contained a collection of Row objects, which had a collection of Field objects. A pretty traditional OOP style of design. It didn’t start out that way, but I gradually settled on this because I felt it made updating field values easier. Today I thought maybe it would be better if I used a more Smalltalk-oriented structure that facilitated the cross-referencing better. I haven’t tried it out yet.

  2. lispy Says:

    I’m playing around with language ideas– as opposed to actually *engineering* a language.

    The obvious abstraction of the above (which just sorta emerged from something random I was trying to do) is to have a base class in the language system that comes with its own dispatch table. The standard command in the system would be “COMMAND HASH-KEY: ARGUMENTS”. Each language object would override the list of standard commands. The “library list” thing would determine with language object-group was currently active… or which order to call them.

    That would clean out the ponderous and repetitive strings of if-elsif statements and pin down some common code between the Tables and Functions classes.

    The cross-reference trick lets you have relational type concepts without having to deal with joins. SQL is great with large business databases… but for small data-driven applications, I want all the DWIM I can stand. (And given that only a fraction of spreadsheet power users ever graduate to even a desktop database, I want language elements that are as simple as possible.)

    Database devs really like working from *inside* their respective databases. That’s the common theme between the Access, Foxpro, and even RPG coding. Just like Perl has shell commands and regex’s seamlessly integrated into the language… I want all ‘database’ interaction to be that way, too.

    This is pretty simple, but much easier to implement than the last Parser I made up. I’m wondering where in this scheme real parsing/compiling makes the most since. This design is probably sufficient for the trivial application I have in mind….

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: