ReplaceRegex

File Find-Replace Using Regular Expressions

The ReplaceRegex command line program finds a given string or regular expression in a batch of files, replaces each match with another string, and writes the output files to a separate directory if one is specified. Each file must be small enough to fit in available memory.

Command line parameters are:

-srcdir    adirectory  directory of the original files. By default, this is the current directory.

-destdir   adirectory  destination directory to save modified files. By default, this is the current directory. The source files will be overwritten.

-find      atext       the simple text to replace.

-regex     atext       the regular expression pattern to replace. You can define capturing groups with () and refer to them in the replacement text using the $1, $2, etc. syntax. You can also use escaped characters such as \t, \r, \n.

-replace   atext       the text that replaces what was found. Captured regular expression groups can be used as $1, $2 and so on.

-fname     apattern    the file name pattern to search, for example *.*

-casesens              the presence of this flag means the search is case sensitive.

-quotes    atext       the text used to represent double quotes in the -find, -regex and -replace parameters. This helps you avoid escaping quotes inside these parameters.

-tab       atext       the text used to represent the tab (\t) character in the -find and -replace parameters. This helps you avoid escaping tabs inside these parameters. Use regular expression escapes for -regex.

-cr        atext       the text used to represent the carriage return (\r) character in the -find and -replace parameters. This helps you avoid escaping carriage returns inside these parameters. Use regular expression escapes for -regex.

-lf        atext       the text used to represent the new line (\n) character in the -find and -replace parameters. This helps you avoid escaping new lines inside these parameters. Use regular expression escapes for -regex.

-r                     recursively process subdirectories.
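To make the parameters above more concrete, here is a rough Python sketch of the same kind of batch processing. It is not the source code of ReplaceRegex.exe. The function name replace_in_files, its arguments, the UTF-8 encoding and the case-insensitive default are all assumptions made for illustration. Note that Python writes group references as \1 rather than $1.

```python
import fnmatch
import os
import re


def replace_in_files(srcdir, pattern, replacement, destdir=None,
                     fname="*.*", recursive=False, case_sensitive=False):
    """Apply a regex replacement to every matching file under srcdir.

    Hypothetical helper: returns the list of files that were written.
    """
    destdir = destdir or srcdir  # assumption: no destdir means overwrite in place
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    written = []
    for root, dirs, files in os.walk(srcdir):
        if not recursive:
            dirs[:] = []  # do not descend into subdirectories
        for name in fnmatch.filter(files, fname):
            src_path = os.path.join(root, name)
            # The whole file is read into memory, so it has to fit there.
            with open(src_path, encoding="utf-8", newline="") as f:
                text = f.read()
            out_dir = os.path.join(destdir, os.path.relpath(root, srcdir))
            os.makedirs(out_dir, exist_ok=True)
            out_path = os.path.normpath(os.path.join(out_dir, name))
            with open(out_path, "w", encoding="utf-8", newline="") as f:
                f.write(regex.sub(replacement, text))
            written.append(out_path)
    return written
```

A call such as replace_in_files("site", r"foo(\d+)", r"bar\1", fname="*.php", recursive=True) would roughly correspond to running the tool with -srcdir, -regex, -replace, -fname and -r.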

ReplaceRegex.exe is part of Bestcode File Utilities tool set.

Case Study - Fixing HTML img Tags in All HTML Files Using ReplaceRegex and Regular Expressions

One use of the ReplaceRegex tool is to automatically edit HTML files so that img tags point to the correct image file locations. In our case, we use NetObjects Fusion, a WYSIWYG website editor. This tool creates HTML img tags that point to the current directory (or another relative directory), such as <img src="./button1.jpg">. This assumes that the image file is in the same folder as the HTML page, for example "posters.php". We later use this posters.php file to dynamically serve thousands of pages, as in this art and photography posters website.

There are categories such as:

posters/entertainment/movies
posters/entertainment/music
posters/entertainment/sports/traditional-sports/basketball
posters/entertainment/sports/traditional-sports/football
posters/entertainment/sports/traditional-sports/baseball
 

and so on.

Each of these pages is in fact based on one script, posters.php. This script fetches the most popular posters from a very large database and displays them attractively. The only problem is that if posters.php contains button images like <img src="./button1.jpg">, the browser will think that the 5 categories above have 5 different button images (each one in a different directory). In fact they are all the same image, mapped via Apache URL rewriting, even though their URLs look different to the browser:

posters/entertainment/movies/button1.jpg
posters/entertainment/music/button1.jpg
posters/entertainment/sports/traditional-sports/basketball/button1.jpg
posters/entertainment/sports/traditional-sports/football/button1.jpg
posters/entertainment/sports/traditional-sports/baseball/button1.jpg
 

Because each image URL is different, the browser cannot reuse a cached copy across categories and requests the image again for each one. This creates extra load, since the server sends the same image repeatedly. To reduce server load, we send an HTTP header to enable caching and we fix our PHP/HTML files to point to the root directory for all image resources:

/button1.jpg

So, when all categories point to /button1.jpg instead of categoryname/button1.jpg, the browser is able to cache the image once and reuse it, resulting in major bandwidth savings.
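The effect can be checked with a short Python sketch that resolves both forms of the image reference the way a browser would. The host name www.example.com is a placeholder, and the sketch assumes the category pages are served with a trailing slash.

```python
from urllib.parse import urljoin

BASE = "http://www.example.com/"  # hypothetical site root
categories = [
    "posters/entertainment/movies/",
    "posters/entertainment/music/",
    "posters/entertainment/sports/traditional-sports/basketball/",
]

# Resolve each image reference against each category page URL.
relative_urls = {urljoin(urljoin(BASE, c), "./button1.jpg") for c in categories}
rooted_urls = {urljoin(urljoin(BASE, c), "/button1.jpg") for c in categories}

print(len(relative_urls))  # one distinct image URL per category
print(len(rooted_urls))    # a single shared image URL
```

With the relative reference, every category yields its own image URL. With the root-relative reference, all categories resolve to the same URL, which is what lets the browser cache work.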

We achieve this fix by using ReplaceRegex.exe. Before we publish our files to the web server, we run our tool with the following command:

ReplaceRegex.exe -r -srcdir C:\nof_publish_viewposters -regex "<img src=qq\./(.*)qq" -replace "<img src=qq/$1qq" -fname *.php -quotes qq

Note that our regular expression to search for is: <img src="\./(.*)"

qq is used in place of the double quote character (") because of Windows command line restrictions. The -quotes qq parameter specifies that.
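The placeholder idea can be sketched in a few lines of Python. This assumes the tool performs a plain textual substitution of the placeholder before it uses the parameters, which the documentation suggests but does not spell out. The function expand_placeholders is a hypothetical name.

```python
def expand_placeholders(text, quotes=None, tab=None, cr=None, lf=None):
    """Swap user-chosen placeholder tokens for the characters they stand for."""
    for token, char in ((quotes, '"'), (tab, "\t"), (cr, "\r"), (lf, "\n")):
        if token:  # skip placeholders that were not supplied
            text = text.replace(token, char)
    return text


pattern = expand_placeholders(r"<img src=qq\./(.*)qq", quotes="qq")
replacement = expand_placeholders("<img src=qq/$1qq", quotes="qq")
print(pattern)      # <img src="\./(.*)"
print(replacement)  # <img src="/$1"
```

Whatever placeholder you choose should not occur anywhere else in your search or replacement text, otherwise those occurrences would be converted as well.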

Inside one of our files, we see that the following line:

<TD><img src="./clearpixel.gif" WIDTH=26 HEIGHT=1 BORDER=0 ALT=""></TD>

was changed to:

<TD><img src="/clearpixel.gif" WIDTH=26 HEIGHT=1 BORDER=0 ALT=""></TD>
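The same substitution can be reproduced on the sample line with Python's re module, as a quick way to test a pattern before running it over a whole site. This is only an illustration with a different regex engine, and Python writes the group reference as \1 where ReplaceRegex uses $1.

```python
import re

line = '<TD><img src="./clearpixel.gif" WIDTH=26 HEIGHT=1 BORDER=0 ALT=""></TD>'

pattern = r'<img src="\./(.*)"'   # the search expression from the command line
replacement = r'<img src="/\1"'   # \1 plays the role of $1

fixed = re.sub(pattern, replacement, line)
print(fixed)
# <TD><img src="/clearpixel.gif" WIDTH=26 HEIGHT=1 BORDER=0 ALT=""></TD>
```

Because (.*) is greedy, the group here captures everything up to the last double quote on the line, not just the file name. The output is still correct, since the captured text is written back unchanged, but a tighter group such as ([^"]*) may be easier to reason about.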

[Screenshot: Regular Expression search replace in file]

If you have many files and need to repeat this kind of regular expression find and replace, ReplaceRegex.exe can save you a great deal of time.

Case Study 2 - Adding rel="nofollow" to All HTML Links

Nowadays search engines ignore links that carry the nofollow attribute. This prevents the so-called Google juice from leaking from your HTML page to the page that you link to. The attribute was designed to reduce link spamming, especially in blogs. Some people think that it is good practice to use it on most links to increase a web page's Google PageRank. So, how do you achieve this if you have 1000 static HTML pages? It would take forever to open each one and fix the links by hand. A simpler and smarter way is to use the ReplaceRegex tool and fix the links automatically.

Here is the regular expression to search for:

<a[ ]+href[ ]*=[ ]*"http:\/\/(.*?)".*?>

and here is the replacement:

<a href="http://$1" rel="nofollow" target="_blank">
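As a way to try this pattern before running it on real pages, here is a small Python sketch that applies the same search and replacement to one sample link. The function name add_nofollow and the sample HTML are made up for illustration, the case-insensitive flag is an assumption, and Python uses \1 where ReplaceRegex uses $1.

```python
import re

# The search expression from above, compiled case-insensitively.
LINK = re.compile(r'<a[ ]+href[ ]*=[ ]*"http:\/\/(.*?)".*?>', re.IGNORECASE)
REPLACEMENT = r'<a href="http://\1" rel="nofollow" target="_blank">'


def add_nofollow(html):
    """Rewrite every absolute http link to carry rel="nofollow"."""
    return LINK.sub(REPLACEMENT, html)


sample = '<p><a href="http://example.com/page" class="x">link</a></p>'
print(add_nofollow(sample))
# <p><a href="http://example.com/page" rel="nofollow" target="_blank">link</a></p>
```

Two things are worth noticing in the output. Any other attributes on the original tag (class="x" here) are dropped, because the replacement rebuilds the tag from the captured address alone. Links that do not start with http://, such as relative links, are left untouched.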

On the command line, we type it as shown in this screenshot:

[Screenshot: Regular Expression find replace in file]

ReplaceRegex enables us to find all http links inside our files and convert them to the form that we like with the help of powerful regular expression syntax. Regular expression backreferences make it possible to capture groups from the found text and use them in our replacement text with syntax like $1, $2.

