Wednesday, April 29, 2009

Global Search and Replace with Perl

I've used this article for reference many times before, but it is no longer available. I've reproduced it from an archive.org copy.

(source: http://hacks.oreilly.com/pub/h/73)



There are a couple of switches that make Perl a very useful command-line editing tool. Learn these switches, and you too can learn how to mumble magic Perl one-liners to confound your friends (and frighten your project manager).

The first is -e. Give Perl a -e followed by a line of code, and it will run it as if it were an ordinary Perl script:

rob@catlin:~$ perl -e 'print "Hi, Ma!\n"' 
Hi, Ma!


Note that a trailing ; isn't needed on one-liners. It's generally a good idea to wrap your line in single quotes to prevent the shell from attempting to interpret special characters (like the ! and \ above).

The next switch is a little more complicated, but it becomes second nature once you start using it: -p. From perldoc perlrun:

-p causes Perl to assume the following loop around your
program, which makes it iterate over filename arguments somewhat like sed:

LINE:
  while (<>) {
      ...             # your program goes here
  } continue {
      print or die "-p destination: $!\n";
  }


The line in question ($_) is automatically printed on every iteration of the loop. If you combine -p and -e, you get a one-liner that iterates over every line of every file named on the command line, or over the text fed to it on STDIN. For example, here's a complicated cat command:

rob@catlin:~$ perl -pe 1 /etc/hosts 
#
# hosts This file describes a number of hostname-to-address
# mappings for the TCP/IP subsystem.
#

# For loopbacking.
127.0.0.1 localhost


The 1 is a program that does nothing: a constant expression, like the 1; that ends a Perl module. Since the lines are printed automatically, we don't really need the program we specify with -e to do anything.

Where it gets interesting is in providing a bit of code to manipulate the current line before it gets printed. For example, suppose you wanted to append the local machine name to the localhost line:

rob@catlin:~$ perl -pe 's/localhost/localhost $ENV{HOSTNAME}/' /etc/hosts 
#
# hosts This file describes a number of hostname-to-address
# mappings for the TCP/IP subsystem.
#

# For loopbacking.
127.0.0.1 localhost catlin.nocat.net


Or maybe you'd like to manipulate your inetd settings:

rob@caligula:~$ perl -pe 's/^(\s+)?(telnet|shell|login|exec)/# $2/' /etc/inetd.conf


That will print the contents of /etc/inetd.conf to STDOUT, commenting out any uncommented telnet, shell, login, or exec lines along the way. Naturally, we could redirect that back out to a file, but if we just want to edit a file in place, there's a better way: the -i switch.

-i lets you edit files in place. So, to comment out all of the above lines in /etc/inetd.conf, you might try:

root@catlin:~# perl -pi -e 's/^(\s+)?(telnet|shell|login|exec)/# $2/' /etc/inetd.conf


or better yet:

root@catlin:~# perl -pi.orig -e 's/^(\s+)?(telnet|shell|login|exec)/# $2/' /etc/inetd.conf


The second example will back up /etc/inetd.conf to /etc/inetd.conf.orig before changing the original. Don't forget to HUP inetd to make your changes take effect.

It's just as easy to edit multiple files in place at the same time. You can specify any number of files (or wildcards) on the command line:

rob@catlin:~$ perl -pi.bak -e 's/bgcolor=#ffffff/bgcolor=#000000/i' *.html


This will change the background color of all HTML pages in the current directory from white to black. Don't forget that trailing i to make the match case insensitive (to match bgcolor=#FFFFFF or even BGColor=#FfFffF).

What if you're in the middle of working on a CVS project, and need to change the CVS server that you'd like to commit to? It's easy, if you pipe the output of a find through an xargs running a perl -pi -e:

schuyler@ganesh:~/nocat$ find . -name Root | xargs perl -pi -e 's/cvs\.oldserver\.com/cvs.newserver.org/g'


Then reset your $CVSROOT and do your CVS check in as normal, and your project will automagically end up checked into cvs.newserver.org.

Using Perl from the command line can help you do some powerful transformations on the fly. Study your regular expressions, and use it wisely, and it can save piles of hand edits.
