Non-capturing groups in a regexp
Bridge to engine room, geek factor nine. Those of you who read this for musings about the world, stop reading now.
Alex at work has just alerted me to the existence of non-capturing groups in regular expressions. I had no idea these existed, and they’re pretty useful if you’re doing RE matching. If you’re trying to match a string which might be “fish, chips and ketchup”, might be “fish, chips and peas”, and might not contain the “, chips” at all, and what you care about is what’s last on the list (the “peas” or “ketchup”), then I’d have used a regexp like /fish(, chips)? and (.*)$/. Matching that against “fish, chips and peas” will give you back a three-item tuple, ("fish, chips and peas", ", chips", "peas"). (Test with JavaScript) You need the brackets around “, chips” in the regexp because you want to treat it as a group, so that the ? makes the whole thing optional. However, it ends up in the results, and that’s really irritating.
Now I know about non-capturing groups, I’d do this: /fish(?:, chips)? and (.*)$/. The ?: after the opening bracket of the group means “don’t capture this group”. So now the results you get back are ("fish, chips and peas", "peas") — the chips, which we don’t care about, are not mentioned! (Test with JavaScript)
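If you want to try the two regexps side by side, here is a minimal sketch in JavaScript. The variable names are mine, made up for illustration, and Array.from is only there to strip the extra properties (index, input) that match() hangs on its result so the arrays print cleanly.

```javascript
// Capturing group: the optional ", chips" ends up in the results.
const capturing = /fish(, chips)? and (.*)$/;
// Non-capturing group: (?:...) still groups, but is left out of the results.
const nonCapturing = /fish(?:, chips)? and (.*)$/;

// Array.from copies just the matched strings into a plain array.
const withChips = Array.from("fish, chips and peas".match(capturing));
const withoutChips = Array.from("fish, chips and peas".match(nonCapturing));
// When the optional group does not take part in the match, its slot is undefined.
const noChipsInInput = Array.from("fish and ketchup".match(capturing));

console.log(withChips);      // [ 'fish, chips and peas', ', chips', 'peas' ]
console.log(withoutChips);   // [ 'fish, chips and peas', 'peas' ]
console.log(noChipsInInput); // [ 'fish and ketchup', undefined, 'ketchup' ]
```

The last case is the other reason the capturing version is irritating: when the input has no chips you still get a slot for them, it just holds undefined, so the position of the thing you actually want never changes but you have to step over the junk to reach it.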
Another useful little trick to add to my toolbox. Cheers, Alex. Everyone who is reading this and thinking “I knew about this ages ago”, why didn’t you tell me?
Because you beat us when the obvious things are pointed out to you.
20 minutes later
chuck: ha! Touché. You are right to bring that up, and well put. What can I say other than: I won’t beat you for pointing out stuff like this.
21 minutes later
Heh, that’s geek factor seven at best. Geek factor nine doesn’t start before mastering lookahead and lookbehind regular expressions.
2 hours later
I want this every time I use REs, and I just assumed such a thing didn’t exist. This is definitely a contender for the most useful thing I’ve learned this month.
2 hours later
I use this all the time, and have done so for years. And so does everyone once they find out about it.
In fact, Perl 6 regexps acknowledge that this is such a common case that (?:) does not make any sense Huffman-wise, and shorten it to (). Or at least that’s how I remember it. Capturing groups aren’t shafted all the way down to (?:) though, but to something smarter.
I don’t really remember, as Perl 6 development seemed to die years ago. Too bad, because the design had a lot of great thinking like that, which other languages could have picked up even if you don’t like Perl - it’s always been one of the leading languages in that way.
2 hours later
http://www.ilovejackdaniels.com/regular_expressions_cheat_sheet.pdf
It’s only one page, we assumed that your attention span is big enough for 1 page.
3 hours later
eTM, thanks for the cheatsheet! Very useful indeed.
19 hours later
Perl 6 actually still uses ( ) as a capturing group - non-capturing groups got promoted to [ ], which of course means character classes lost that prime position, but they were only demoted to <[ ]>, which fits quite nicely with the other things you can put inside < >, which all seem to be subrule-ish.
Also, Perl 6 development never died, and I don’t see why people are saying it seems to have died when it’s currently going faster than ever. Parrot’s in pretty good shape, and Rakudo, the Perl 6 compiler for Parrot, is progressing quite rapidly now that most of the language is defined and some nice people have donated funding for part-time development.
41 hours later
But it’s no longer cool. Therefore it died.
2 days later
I spent a few hours reading this all the way through. Need to practice it, but I’m at least at Geek Factor 8 now.
http://www.regular-expressions.info/tutorial.html
12 weeks later