“You say, well if we were using Haskell for this then we would just do this thing; can we adapt that to what we’re doing here? Not saying you have to use Haskell to solve it, but could you use that kind of approach?” — Damian Conway
I’ve been experimenting with parsing for a while now. I’m on the verge of digging into tokenizing and grammars. Before I hunker down and educate myself, I just want to see how far you can go with less industrial-strength approaches. I think it’s pretty cool how combining a single good regex with a closure gets you what is for all intents and purposes a mini-compiler. This is a pretty handy trick; filing away such dynamically generated functions in a hash table can lead to some pretty powerful techniques, too. A dispatch table like that can even be used to create a simple programming language on the fly. Tokenizing a string is just another algorithm… and on first glance, grammars appear (in a way) to be fancy dispatch tables. How do you bring them all together? Some kind of deep magic from the dawn of time– something I would never figure out by trial and error or intuition. I’m getting there. But just for fun, let’s generalize some of these ideas and (at the same time) temporarily ignore them. Let’s go entirely orthogonal to this plane of thinking for a moment.
OOP as it gets applied in ‘the real world’ is primarily a defensive exercise: “I don’t like you and I don’t like the way you code. Let me just wall off on my little corner and do my thing. Let me work with your stuff without having to understand your code. Don’t break the contract, either, bub. Somehow, I don’t think this is where Alan Kay intended us to go with this….
Think about the difference between Checkers and Chess for a second. The permutations of moves are effectively infinite in both games. But the space of possibilities have different ‘shapes’. If you were, like some sort of Paul Atreides, to look out at the limitless patterns in the games, for chess you’d see an expansive plane of cascading options. Checkers would look more like an infinitely deep well.
The way we tend to do OOP is more like Checkers than it is like Chess. The calls between and among the various objects seems brittle and constricted. Like talking to someone through an incredibly long pipe instead of face to face…. So how to open things up and make things more fluid and fluent? My idea is to swing the emphasis in OOP back much more toward the message passing concept, steal a few idioms from Lisp, and generalize heavily on the overall theme of DSL’s. (I always thought it was cool how in Lisp practically everything had a straight-forward text representation. The reader transforms what you type into a tidier, more consistent representation… and then shows it to you as the response.)
What if in designing a programming language, you divided it up into various objects or components. To extend the language you just drop in new objects. The classes that the objects derive from each come (after a fashion) with their own DSL. Each object would ‘know’ how to generate its own source code. While working with an object via a REPL, you could modify the object… then view and/or save the modified source without dropping back to the IDE or text editor. The object source code should understand the concept of testing and incorporate test code into its specification. When you talk to code via a REPL, you end up doing lots of little tests anyway. You should be able to incorporate those off-the-cuff tests into the code without thinking about it. (Any inconvenience in automated testing development just makes it that much more likely that it doesn’t get done. Get rid of everything that stands in the way of it.)
So each object in the system would know how to do stuff like read, source, save, and load. Each object would have its own mini-language that you can focus on separately without having to think of all of the parsing at once. Your collection of objects would get wrapped in a unified reader. The main reader would pass messages to each object-group in the deck until something understood the message. (And just like “library lists” on the AS/400, you could rearrange the reading order on the fly.) Simple! Now you can build a programming language without having to think about the whole thing at once– and you get modularity and testability benefits in the process. If you don’t have time to write ‘real’ compilers and DSL’s, this might be enough to get you by for a while. Of course, it might be enough to make you buckle down and learn some real computing techniques, too. Judge for yourself.
If you’d like to play with this (or see more specifically part of what I’m trying to describe), I’ve implemented the gist of the idea in Perl. There’s nothing fancy in it; I’m just using the bottom 30% of the language for the most part. There’s plenty of places the overall idea could be further generalized and tightened up. (I think I’m part way through reinventing Smalltalk, but this is just for fun anyway.)
Here’s a couple of sample screen shots plus the code:
################################################################################################### { package Functions; use Moose; has Functions => (isa => 'HashRef', is => 'rw', default => sub { {} } ); has Source => (isa => 'HashRef', is => 'rw', default => sub { {} } ); has Environment => (isa => 'Environment', is => 'rw'); ### required only if implementing syntactic sugar for table refernces sub set { my ($self, $key, $text) = @_; my ($ref, $code, $r); $code = $text; $code =~ s/(\d+[dD]\d+)/\$self->roll_maker('$1')->()/g; ### the regex could be improved-- missing some cases.... $code =~ s/!(\w+)/\$self->Functions->{$1}->()/g; ### should this call Environment or the local deck? [YES!] $r = '[^~\s;]'; $code =~ s/($r+)~($r+)~($r+)/\$self->Environment->Tables->cell($1, $2, $3)/g; $code = "sub { $code }"; print "The actual code is $code\n"; $ref = eval $code; if ($@){ print "Error setting $key: $@" if $@; } else { $self->Source->{$key} = $text; $self->Functions->{$key} = $ref; } } sub call { my ($self, $key) = @_; if ($self->Functions->{$key}){ eval { $self->Functions->{$key}->(); }; print "Error calling $key: $@" if $@; } else { print "No such function '$key'.\n"; } } sub show { my ($self, $key) = @_; print "\n\n"; if ($key){ if ($self->Source->{$key}) { my $source = $self->Source->{$key}; print "set $key: $source\n"; } else { print "No source available for $key.\n"; } } else { foreach $key (sort keys %{$self->Source}) { my $source = $self->Source->{$key}; print "set $key: $source\n"; } } print "\n"; } sub read { my ($self, $line) = @_; if ($line =~ /^set (\w+):\s*(.+)/){ if (defined($self->set($1, $2))) { print "Function $1 set.\n"; } } elsif ($line =~ /^call (\w+)$/){ $self->call($1); print "\n"; } elsif ($line =~ /^show\s*(\w*)/){ if ($1) { $self->show($1); } else { $self->show(); } } elsif ($line =~ /^load\s+([\w.]+)/){ $self->load($1); } elsif ($line =~ /^save\s+([\w.]+)/){ $self->save($1); } else { return 0; } return 1; } sub load { my ($self, $file) = @_; open FILE, "< $file"; while (<FILE>){ chomp; $self->read($_); } close FILE; } sub save { my ($self, $file) = @_; open FILE, "> $file"; foreach my $key (sort keys %{$self->Source}) { my $source = $self->Source->{$key}; print FILE "set $key: $source\n"; } close FILE; } sub roll_maker ($) { my($self, $line, $num, $type); $self = shift; $line = shift; return undef unless $line =~ m/^(\d+)?[dD](\d+|%)/; $num = $1 || 1; $type = $2; $type = 100 if $type eq '%'; return sub { my $result; for( 1 .. $num ) { $result += int (rand $type) + 1; } return $result; } } } ################################################################################################### { package Callback; use Moose; sub choose { my ($self, $text) = @_; my @options = split /;/, $text; #print "the text is $text\n"; my $i = 0; print "\n"; foreach(@options){ $i++; print "$i: $_\n"; } my $answer; until ($answer) { print ">> "; chomp(my $line = <STDIN>); if ($line =~ /^\d+$/) { $answer = $options[$line - 1]; } else { foreach(@options){ if(/^$line/i){ $answer = $_; last; } } } } return $answer; } } ################################################################################################### { package Table; use Moose; my $show_debug2 = 0; has Name => (is => 'rw', isa => 'Str', default => 'Bob'); has 'Rows' => (isa => 'HashRef', is => 'rw', default => sub { {} } ); has 'Columns' => (isa => 'ArrayRef[Str]', is => 'rw', default => sub { my @a; return \@a; } ); has 'ColumnRegexes' => (isa => 'ArrayRef[Str]', is => 'rw', default => sub { my @a; return \@a; } ); sub initialize { my ($self, $table, $text) = @_; $self->Name($table); my ($columns, $column_regexes) = parse_column_header($text); $self->Columns($columns); $self->ColumnRegexes($column_regexes); my %rows; $self->Rows(\%rows); } sub load_line { my ($self, $text) = @_; my ($rowkey, $row) = parse_row_detail($self->Name(), $text, $self->Columns()); $self->Rows->{$rowkey} = $row; } sub cell { my ($self, $row, $column) = @_; return $self->Rows->{$row}->{$column}; } ### pass a comma delimited header line from a table definition ### and get two array references describing the table structure sub parse_column_header { my ($line) = @_; my @fields = split /,/, $line; my $column_number = 0; my @columns; my @regexes; print "reading columns to table: $line\n" if $show_debug2; foreach(@fields){ my $field = $_; $field =~ s/^\s+|\s+$//g; # trim field if($field =~ /^([^\/]*)\/([^\/]*)\//){ $field = $1; $regexes[$column_number] = $2; } $columns[$column_number] = $field; $column_number++; } return (\@columns, \@regexes); } ### pass a table name and a comma delimited header line from a table definition ### and also pass a reference to an array of column names... ### and get the row's key and a hash of detail data sub parse_row_detail { my ($table, $line, $columns) = @_; my @fields = split /,/, $line; print "reading rows to table $table: $line\n" if $show_debug2; my %row; my $column_number = 0; my $rowkey; my $didit; foreach(@fields){ my $field = $_; $field =~ s/^\s+|\s+$//g; # trim field # Need to allow some keys to be '0'! if ($didit){ $row{$columns->[$column_number]} = $field; } else { $rowkey = $field; } $column_number++; $didit = 1; } return ($rowkey, \%row); } sub show_keys { my ($self) = @_; my $key; my $table = $self->Name(); my $rs = $self->Rows(); foreach $key (sort keys %{$rs}){ print "$table row : $key\n"; } my $cs = $self->Columns(); foreach (@{$cs}){ print "$table column : $_\n"; } } sub show { my ($self) = @_; print $self->text(); } sub text { my ($self) = @_; my $text; my $x; $text = "table $self->{Name}:\n"; $text .= "head: "; foreach(@{$self->Columns}){ $text .= ", " if $x; $text .= $_; $x = 1; } $text .= "\n"; foreach my $key (sort keys %{$self->Rows}){ $x = 0; $text .= "row: "; foreach(@{$self->Columns}){ $text .= ", " if $x; $text .= $key unless $x; $text .= $self->cell($key, $_) if $x; $x = 1; } $text .= "\n"; } return $text; } } ################################################################################################### { package Tables; use Moose; has Tables => (isa => 'HashRef[Table]', is => 'rw', default => sub { {} } ); has ReadAsTable => (isa => 'HashRef[Str]', is => 'rw', default => sub { {} } ); has Current => (isa => 'Str', is => 'rw'); has Callback => (isa => 'Callback', is => 'rw', default => sub { Callback->new(); } ); sub cur { my ($self, $table) = @_; $self->Current($table); if ($self->Tables->{$table}) { print "Table $table is the active table.\n"; } else { print "Ready to initialize table $table.\n"; } } sub cell { my ($self, $table, $row, $column) = @_; my $T = $self->Tables->{$table}; if ($T){ my $value = $T->cell($row, $column); $value = $self->private_check_other_tables($table, $row, $column, $T) unless $value; if ($value){ $value = $self->ReadAsTable()->{$value} if $self->ReadAsTable()->{$value}; if ($value =~ /;/){ $value = $self->Callback->choose($value); } return $value; } else { print "Row $row Column $column not found in table $table.\n"; return undef; } } else { print "Table $table does not exist.\n"; return undef; } } ### if value not found, loop through each table ### if a table contains the column *and* the key is contained in the specified table ### then return the foreign value! sub private_check_other_tables { my ($self, $table, $row, $column, $T) = @_; foreach my $key (sort keys %{$self->Tables}) { unless ($key eq $table){ my $OTHER = $self->Tables->{$key}; foreach(@{$OTHER->Columns}){ #print "col-- $_\n"; if ($_ eq $column){ ### does the first column match the table we're looking at? foreach(@{$T->Columns}){ #print "mycol-- $_\n"; if ($_ eq $OTHER->Columns->[0]){ my $foreign_key = $T->cell($row, $_); my $foreign_cell = $OTHER->cell($foreign_key, $column); #print "Found it! (foreign key $foreign_key) (value $foreign_cell)\n"; return $foreign_cell; #last; ### this doesn't really *last* it. } } } } } } } sub show { my ($self, $key) = @_; print $self->text($key); } sub text { my ($self, $key) = @_; my $text; $text = "\n\n"; if ($key){ $text .= $self->Tables->{$key}->text(); } else { foreach $key (sort keys %{$self->Tables}) { $text .= $self->Tables->{$key}->text(); $text .= "\n\n"; } } $text .= "\n"; return $text; } sub read { my ($self, $line) = @_; my $table = $self->Current(); if ($line =~ /^table (\w+):?/){ $self->cur($1); #print "Table $1 is now active.\n"; } elsif ($line =~ /head: (.*)/) { my $T = new Table(); $T->initialize($table, $1); $self->Tables->{$table} = $T; print "Table $table initialized.\n"; } elsif ($line =~ /row: (.*)/) { my $T = $self->Tables->{$table}; if ($T) { $T->load_line($1); print "Table $table line loaded.\n"; } else { print "Table not found.\n"; } } elsif ($line =~ /^(\w+)~(\w+)~(\w+)/){ my $cell = $self->cell($1, $2, $3); if (defined($cell)) { print "$cell\n"; } else { print "Cell not found.\n"; } } elsif ($line =~ /^readas ([\w\s]+)\s+->\s+(.+)/){ $self->ReadAsTable->{$1} = $2; print "Reading '$1' as '$2'.\n"; } elsif ($line =~ /^show\s*(\w*)/){ if ($1){ if ($self->Tables->{$1}){ $self->show($1); } else { print "Table not found.\n"; } } else { $self->show(); } } elsif ($line =~ /^show/){ $self->show(); } elsif ($line =~ /^load\s+([\w.]+)/){ $self->load($1); } elsif ($line =~ /^save\s+([\w.]+)/){ $self->save($1); } else { return 0; } return 1; } ### same routine as in Functions class sub load { my ($self, $file) = @_; open FILE, "< $file"; while (<FILE>){ chomp; $self->read($_); } close FILE; } sub save { my ($self, $file) = @_; open FILE, "> $file"; print FILE $self->text(); close FILE; } } ################################################################################################### { package Environment; use Moose; has Functions => (isa => 'Functions', is => 'rw', default => sub { new Functions() } ); has Tables => (isa => 'Tables', is => 'rw', default => sub { new Tables() } ); has Settings => (isa => 'HashRef', is => 'rw', default => sub { my %hash; $hash{WorkingWith} = 'Functions'; return \%hash; } ); ### This should be in the constuctor sub hack { my ($self) = @_; $self->Functions->Environment($self); } sub read { my ($self, $line) = @_; if ($line =~ /^work functions/){ print "Working with functions.\n"; $self->Settings->{WorkingWith} = 'Functions'; return; } if ($line =~ /^work tables/){ print "Working with tables.\n"; $self->Settings->{WorkingWith} = 'Tables'; return; } ### Refactor Me Sometime if ($self->Settings->{WorkingWith} eq 'Tables'){ unless ($self->Tables->read($line)){ unless ($self->Functions->read($line)){ print "Unknown command.\n"; } } } else { unless ($self->Functions->read($line)){ unless ($self->Tables->read($line)){ print "Unknown command.\n"; } } } } } ################################################################################################### { package Test; use strict; use warnings; my $T = new Tables(); my $F = new Functions(); my $E = new Environment(Tables => $T, Functions => $F); print "\n>> "; while(<STDIN>){ chomp; $E->read($_); print "\n>> " } } ###################################################################################################