Archive for the ‘Regular Expressions’ Category

Using Regex Functions in Excel

October 17, 2008

This is a public service message.

The other day some folks in the office were asking how to tell if a certain character appeared in a cell in Excel.  I was shocked when the traditional VB function failed miserably.  The call of instr(1, “blub”, “b”) worked fine from the immediate window… but =instr(1,A1,”b”) would die in a cell.  Was my face red!  The computer “genius” sure looked like a chump just then.

Never again, I tell you.  Never again.

Allright, so you want to be able to do this:

Matching with regex's in Excel

Matching with regex

To make it happen, hit alt-F11 to go to the code screen.  Click “Insert | Module” to add a new code module.  Then click”Tools | References” so you can check the “Microsoft VBScript Regular Expressions 5.5” box that will make this work:

set those references!

set those references!

Now paste in this code in your module and you’re good to go!

Function M(Value As String, Pattern As String, Optional IgnoreCase As Boolean = False)
    Dim r As New VBScript_RegExp_55.RegExp
    r.Pattern = Pattern
    r.IgnoreCase = IgnoreCase
    If r.Test(Value) Then
        M = "Matches '" & Pattern & "'"
        M = ""
    End If
End Function
Function StartsWith(Value As String, Pattern As String, Optional IgnoreCase As Boolean = False)
    Dim r As New VBScript_RegExp_55.RegExp
    r.Pattern = "^" & Pattern
    r.IgnoreCase = IgnoreCase
    If r.Test(Value) Then
        StartsWith = "Starts with '" & Pattern & "'"
        StartsWith = ""
    End If
End Function

Function EndsWith(Value As String, Pattern As String, Optional IgnoreCase As Boolean = False)
    Dim r As New VBScript_RegExp_55.RegExp
    r.Pattern = Pattern & "$"
    r.IgnoreCase = IgnoreCase
    If r.Test(Value) Then
        EndsWith = "Ends with '" & Pattern & "'"
        EndsWith = ""
    End If
End Function

Function S(Value As String, Pattern As String, ReplaceWith As String, Optional IgnoreCase As Boolean = False)
    Dim r As New VBScript_RegExp_55.RegExp
    r.Pattern = Pattern
    r.IgnoreCase = IgnoreCase
    r.Global = True
    S = r.Replace(Value, ReplaceWith)
End Function

Lisp at Work: a macro for managing replace-regexp routines

November 5, 2007

The following commands have all become essential in my day to day Emacs usage:

C-M-% (query-replace-regexp)
C-x ESC ESC (repeat-complex-command)
M-x re-builder
C-x C-e (eval-last-sexp)

I use them to define regular expressions for use in editing text documents. Over time, I begin to collect quite a few of these, so it makes sense to think more carefully about my key bindings: normally I just use a Function key, but there’s other stuff I want to hot key from there now…. The Emacs manual says that the combination of C-c followed by a plain letter is reserved for the user, so I can put my custom routines there with a single letter mnemonic to help me remember what’s what.

Here’s a macro (with an example of its use) for setting up these sorts of text processing routines:

(defmacro defreplacer (name description search-for replace-with chord) 
     (defun ,name (n) 
       (interactive "p") 
       (query-replace-regexp ,search-for 
                             (if (and transient-mark-mode mark-active) 
                             (if (and transient-mark-mode mark-active) 
     (global-set-key (kbd ,chord) ',name)))   

(defreplacer pull-text-from-semi-colons 
  "Remove text from between two semi-colon signs."
  "[ ]*;\\([a-z]*\\);[ ]*" ; use double back slash to 'escape' in quotes 
  "C-c ;")

In the example above, we’re replacing lower case text inside a pair of semi-colons (and surrounded with any number of spaces on each side) with just the lower case text. The command chord to trigger that replace routine is “C-c ;”. This is a pointlessly simple example, but it should give you the basic idea of how to use the macro.

Does the “defmacro” really do much more for us than “defun” would otherwise do? The main savings you get with the macro is that it defines the key binding at the same time that the replacement function is defined– having a naming type there caused me a minor headache when I was wondering why my hot-keys weren’t working once. With “defmacro” you eliminate the chance of this kind of confusion occurring. On the other hand, if you change the definition of the macro after a file has been loaded, you will not change the operation of the existing functions– the macro only affects the environment at compile time. So there are trade offs either way. In this case I went with a macro because once I get my regular expression from re-builder, I wanted to be able to write the code for everything as quickly as possible. With “defreplacer”, all I needed was four arguments and I was good to go.

My First Emacs Hack: A Jumpstart Guide

October 11, 2007

While it’s nice to have a function key to bring up an IDE so that we can automate our applications, we see that Emacs goes even further by allowing code to be edited and run from any window. There is no special, separate application that is used for extending or modifying the program. This is strange and unusual– and very fun. This short tutorial will show how to use one of Emacs’ most versatile commands, how to retrieve the code for executing complex commands, how to put them into function definitions, and how to attach them to hot-keys. (This more comprehensive introduction is very good if you need extra information.)

A basic out-of-the-box feature of Emacs is the C-M-% query-replace-regexp command. You can use this to search and replace patterns of text. I try to explain this to people and it doesn’t quite register: they think, “oh yeah, I’ve got find/replace built into my IDE, what are you talking about?!” (Evidently, you can get a Master’s degree in Computer Science and not know what a regular expression is….) I’m not talking about searching and replacing not just specific strings, but patterns as well.

Suppose you have inherited a project that has lots of data hard coded into visual basic files. You might have something that looks like this:

strOptions = “OptionOne, OptionTwo, OptionThree,”
strOptions = strOptions & “OptionFour,” ‘ Don’t forget this one!
strOptions = strOptions & “OptionFive, OptionSix”

How can we get rid of all the code and get ourselves a nice simple list of options?

What we first want to do is search for lines that have a series of upper and lowercase letters, equal signs, and/or ampersands… followed by upper and lower case letters and commas inside a pair of double quotes… followed, perhaps, by a code comment. We could search for that specific pattern of text and then replace it with just the stuff that’s inside the double quotes.

To do that, just use C-M-% to search for this:

[A-Za-z=& ]*”/([A-Za-z ,]*/)”[‘ A-Za-z!.,()@#$%^&*]*

And replace it with this:


The [A-Za-z=& ]* part represents the first part of the pattern. The double quotes represent (surprise!) double quotes. The [A-Za-z ,]* represents the stuff inside the double quotes. The /( and /) markers are used to tag regions of text so that we can refer to them later. The [‘ A-Za-z!.,()@#$%^&*]* represents the comment. The /1 is a reference back to the text that was between the double quotes.

After we apply query-replace-regexp with these two arguments, we should end up with this:

OptionOne, OptionTwo, OptionThree,
OptionFive, OptionSix

(Note that to apply the replace to a section of highlighted text, press the space bar. To apply it to every line after the cursor, press the “!” key. Note that the search begins at the place that the cursor is….)

To get rid of the commas and get a nice simple list of options, we’ll need to know a little Emacs trick. To enter a carriage return as part of an argument to a command like query-replace-regexp, you need to hit C-q C-j. (I’ll use a ~ character to denote that key chord.) So to remove the commas and line up our options we, search for:

,[ ~]*

And replace it with “nothing.” (Just hit enter when prompted for what to replace it with.) Running the command should produce the following:


Now… you may want to reuse these sorts of commands that have complex arguments… or you may want to be able to save and modify them some more when you find out they don’t work quite like you want them to. To see these commands as Emacs sees them, hit C-x esc esc. This will display in the mini-buffer the last complex command you’ve entered. You can then hit enter to rerun it or cut and paste it to somewhere else.

The skeleton below shows a basic outline for an elisp function. You can paste complex command “stolen” with C-x esc esc into the place marked <<<PUT YOUR CODE HERE>>>:

(defun your-function (n)
“Put your function’s documentation here”
(interactive “p”) ; n = value passed by the C-u “universal argument”

The “interactive p” part means that the command can be run with from the mini-buffer by typing M-x, entering the name of the function, and then hitting “enter”. For that to work, though, you must first get the function into the environment. You can do this by moving the cursor just to the right of the very last parenthesis in your function definition. From there hit C-x C-e. The name of your function should appear in the mini-buffer when you do this. In fact, you can evaluate any lisp expression that way. Type “(+ 30 12)” and hit C-x C-e after the final parentheses and you should see “42” appear in the mini-buffer!

To wire in a hot-key, evaluate a function like this:

(global-set-key [f8 ‘your-function)

You can collect your code into a text file marked “.el”. Files with that extender will be highlighted properly when you load them into the editor. To execute all of the code in a file, type M-x load-file and then enter the path and/or filename of the code you want running in your environment. Here’s an example file with the code from our tutorial set to run with the function keys f3 and f4.

With regular expressions and the above commands, you can automate just about any complicated text manipulation. In some cases, it can take a while to figure out a good regex to solve a problem, but that’s generally more fun than editing loads of text files by hand.


Using “Query replace regexp” to remove blank lines from a file in Emacs

June 7, 2007

This guy took years just to discover how to use a carriage return in an EMACS regex search. Yep… just do C-M-s to fire up the search… then ^ C-Q C-J and you’re rocking. Nifty.  For some reason I didn’t think it would work for a while, but the same trick does indeed work from regex search and replace (C-M-%) as well.  How long does it take me to figure out how to apply this trick to an editing problem? It ended up taking… well… more than an hour. (Grumble grumble….) Never mind how long it took for me to realize that this was the key to solving the problem! (Grumble grumble grumble….)

In the hopes of preventing similar pains in other peoples lives, I will explain my discoveries….

What I wanted to do is delete all of the blank lines in a file. C-c C-o just kills the next chunk of whitespace that follows the line you’re in. I’d like it do the entire file….  (Picky me.)  Many a google a search will lead you to bark up the wrong tree, but Derek Slager’s cool screencast was the place to go all along.

Drop into regex search and replace with C-M-% and for the “where” part, type C-q C-j C-q C-j +. (That’s two carriage returns followed by a plus sign.) For the “with” part type C-q C-j and press return. (That’d be a single carriage return.) What we’re doing is replacing each chunk of 2 or more carriage returns with a single carriage return.

An alternative solution would be to replace ^ C-q C-j with “nothing”.  (The caret matches all lines that match the text that follows it.)

Pretty cool!

I forgot what I was trying to do before I got sucked into this, but now that I’ve solved this minor gotcha there’s nothing to prevent me from mastering Emacs in another ten years or so! Yee-ha!

Too bad I’ve got carpal tunel syndrome….  (Oh the pain!)