File Find-Replace Using Regular Expressions
ReplaceRegex is a command line program that finds a given string in a batch of files, replaces each occurrence with another string, and places the output files in a separate directory if one is specified. Each file must be
small enough to fit in available memory.
Command line parameters are:
-srcdir adirectory directory of the original files. By default, this is the current directory.
-destdir adirectory destination directory to save modified files. By default, this is the current directory. When it is the same as the source directory, the source files will be overwritten.
-find atext this is the simple text to replace.
-regex atext this is the regular expression pattern to replace. You can use capturing groups as () and you can refer to these groups using $1, $2 etc syntax in the replacement text. You can also have escaped characters like \t, \r, \n.
-replace atext replace the text that was found with this one. Captured regular expression groups can be used as $1, $2 and so on.
-fname apattern the file name pattern to search, for example *.*
-casesens presence of this flag means the search is case sensitive.
-quotes atext is used to represent double quotes in the -find, -regex and -replace parameters. This is to help escaping quotes inside your -find, -regex and -replace parameters.
-tab atext is used to represent the tab (\t) character in the -find and -replace parameters. This is to help escaping tabs inside your -find and -replace parameters. Use regular expression escapes for -regex.
-cr atext is used to represent the carriage return (\r) character in the -find and -replace parameters. This is to help escaping carriage returns inside your -find and -replace parameters. Use regular expression escapes for -regex.
-lf atext is used to represent the new line (\n) character in the -find and -replace parameters. This is to help escaping new lines inside your -find and -replace parameters. Use regular expression escapes for -regex.
-r recursively process sub directories.
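The placeholder parameters above (-quotes, -tab, -cr, -lf) can be pictured with a short Python sketch. This is only an illustration of the idea, assuming the tool performs a plain text substitution of each placeholder; the function name expand_placeholders and its arguments are hypothetical and are not part of ReplaceRegex.

```python
def expand_placeholders(text, quotes=None, tab=None, cr=None, lf=None):
    """Replace user-chosen placeholder tokens with the characters they stand for.

    Hypothetical helper: it mimics what -quotes, -tab, -cr and -lf are
    described as doing, not the tool's actual implementation.
    """
    # Each pair is (placeholder token or None, real character).
    for token, char in ((quotes, '"'), (tab, "\t"), (cr, "\r"), (lf, "\n")):
        if token:  # skip placeholders that were not supplied
            text = text.replace(token, char)
    return text


# "qq" stands in for the double quote, as with -quotes qq
pattern = expand_placeholders(r"<img src=qq\./(.*)qq", quotes="qq")
print(pattern)
```

A placeholder should be a token that does not otherwise occur in the search or replacement text, since every occurrence of it is substituted.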
ReplaceRegex.exe is part of Bestcode File Utilities tool set.
Case Study - Fixing Html img tag in all Html files using ReplaceRegex and Regular Expressions
One use of the ReplaceRegex tool is to automatically edit html files so that img tags point to the correct image file locations. In our case, we use NetObjects Fusion, a WYSIWYG web site editor.
This tool creates html img tags that point to the current directory (or another relative directory), such as <img src="./button1.jpg">. This assumes that the image file is in the same folder as the html page,
for example "posters.php". We later use this posters.php file to dynamically serve thousands of pages, as in this art and photography posters website.
There are categories such as:
posters/entertainment/movies
posters/entertainment/music
posters/entertainment/sports/traditional-sports/basketball
posters/entertainment/sports/traditional-sports/football
posters/entertainment/sports/traditional-sports/baseball
and so on.
Here, each of these pages is in fact based on one script, posters.php. It fetches the most popular posters from a very large database and displays them attractively. The only problem is that if
posters.php contains button images like <img src="./button1.jpg">, the browser will think that for the above 5 categories there are 5 different button images (each one in a different directory), whereas
they are all in fact the same image, mapped via Apache url rewriting, even though their urls look different to the browser:
posters/entertainment/movies/button1.jpg
posters/entertainment/music/button1.jpg
posters/entertainment/sports/traditional-sports/basketball/button1.jpg
posters/entertainment/sports/traditional-sports/football/button1.jpg
posters/entertainment/sports/traditional-sports/baseball/button1.jpg
Because each image url is different, the browser cannot reuse a cached copy across categories and requests the same image again for each one. This creates extra load on the server. To reduce server load, we send an http header to enable
caching and we fix our php/html files to point to the root directory for all image resources:
/button1.jpg
So, when all categories point to /button1.jpg instead of categoryname/button1.jpg, the browser is able to cache the image once and reuse it, resulting in major bandwidth savings.
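The difference between the two forms of src can be checked with a small Python sketch that resolves them the way a browser resolves relative urls. The host example.com and the trailing slash on the page urls are assumptions made for illustration; they are not taken from the real site.

```python
from urllib.parse import urljoin

# Hypothetical page urls; the trailing slash is an assumption.
pages = [
    "http://example.com/posters/entertainment/movies/",
    "http://example.com/posters/entertainment/music/",
]


def resolved(src):
    """Return the set of absolute image urls a browser would request."""
    return {urljoin(page, src) for page in pages}


relative = resolved("./button1.jpg")  # one distinct url per category page
rooted = resolved("/button1.jpg")     # a single url shared by all pages

print(len(relative), len(rooted))
```

With the relative form every category produces its own image url, so nothing can be shared in the cache; with the root-relative form all pages resolve to the same url.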
We achieve this fix by using ReplaceRegex.exe. Before we publish our files to the web server, we run our tool with the following command:
ReplaceRegex.exe -r -srcdir C:\nof_publish_viewposters -regex "<img src=qq\./(.*)qq" -replace "<img src=qq/$1qq" -fname *.php -quotes qq
Note that the regular expression we search for is: <img src="\./(.*)"
qq is used in place of the double quote character (") due to Windows command line restrictions.
-quotes qq specifies that.
Inside one of our files, we see that the following line:
<TD><img src="./clearpixel.gif" WIDTH=26 HEIGHT=1 BORDER=0 ALT=""></TD>
was changed to:
<TD><img src="/clearpixel.gif" WIDTH=26 HEIGHT=1 BORDER=0 ALT=""></TD>
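The same substitution can be tried out in Python before running it over a whole site. This is a sketch using Python's re module, not the engine inside ReplaceRegex.exe, so details may differ: Python writes the captured group as \1 where the tool uses $1, and re.IGNORECASE is an assumption that mirrors running the tool without -casesens.

```python
import re

# Same pattern as the -regex argument, with real quotes instead of qq.
IMG_PATTERN = re.compile(r'<img src="\./(.*)"', re.IGNORECASE)

line = '<TD><img src="./clearpixel.gif" WIDTH=26 HEIGHT=1 BORDER=0 ALT=""></TD>'
fixed = IMG_PATTERN.sub(r'<img src="/\1"', line)
print(fixed)

# In Python, the greedy (.*) runs to the last quote on the line, so a second
# img tag on the same line is swallowed by the first match and stays unfixed.
# A narrower group such as ([^"]*) handles each tag separately.
two_tags = '<img src="./a.gif"><img src="./b.gif">'
greedy = IMG_PATTERN.sub(r'<img src="/\1"', two_tags)
narrow = re.sub(r'<img src="\./([^"]*)"', r'<img src="/\1"', two_tags)
print(greedy)
print(narrow)
```

For the sample line both patterns give the result shown above; the narrower group only matters when several img tags share one line.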
If you have many files and need to repeat this kind of regular expression find and replace, ReplaceRegex.exe is an ideal timesaver.
Case Study 2 - Adding rel="nofollow" to Links in All Html Files
Nowadays search engines ignore links that contain the nofollow attribute. This prevents the so-called Google juice from leaking from your html page to the page that
you link to. The attribute was designed to reduce link spamming, especially in blogs. Some people think that it is good practice to use it most of the time to
improve the Google page rank of your web pages. So, how do you achieve this if you have 1000 static html pages? It would take forever to search each one and fix
the links to contain this attribute. A simpler and smarter way is to use the ReplaceRegex tool and fix the links automatically.
Here is the regular expression to search for:
<a[ ]+href[ ]*=[ ]*"http:\/\/(.*?)".*?>
and here is replacement:
<a href="http://$1" rel="nofollow" target="_blank">
On the command line, we type it as below:
ReplaceRegex enables us to find all http links inside our files and convert them to the form that we like, with the help of powerful regular expression syntax.
Regular expression backreferences make it possible to capture groups from the found text and use them in our replacement text with syntax like $1, $2.
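To see what the pattern and replacement from this case study do to a single link, here is a minimal Python sketch. It uses Python's re module rather than ReplaceRegex itself, so \1 stands in for $1; the sample html and the re.IGNORECASE flag (mirroring a run without -casesens) are assumptions for illustration.

```python
import re

# Pattern and replacement from the case study, in Python syntax.
LINK_PATTERN = re.compile(r'<a[ ]+href[ ]*=[ ]*"http:\/\/(.*?)".*?>', re.IGNORECASE)
REPLACEMENT = r'<a href="http://\1" rel="nofollow" target="_blank">'

# Hypothetical input line with irregular spacing and an extra attribute.
html = '<p>See <a  href = "http://example.com/page" class="ext">this page</a>.</p>'
result = LINK_PATTERN.sub(REPLACEMENT, html)
print(result)
```

Two things are worth noticing when trying this: any other attributes of the original a tag (class="ext" here) are dropped because the whole opening tag is replaced, and links that start with https:// are not matched by this pattern.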