GREP -- Find Regular Expressions in Files
User Guide

program release 6.9 of 22 December 2001
Copyright © 1986-2001 by Stan Brown, Oak Road Systems

GREP is a filter that searches input files, or the standard input, for lines that contain matches for one or more patterns called regular expressions and displays those matching lines. GREP can also search binary files and display records or buffers that contain matches.

This document is a user guide, providing an overview of GREP. Details of the command-line options and the use of regular expressions are in the reference manual. (A full revision history is also available.)

This user guide is sometimes revised between software releases. You may want to check for revisions at http://oakroadsystems.com/sharware/grep.htm.


Contents


Why GREP?
Getting Started
    System Requirements
    Installation
    Evaluation, License, and Warranty
    Uninstall
User Instructions
Input Files
    File and Path Names
    Binary Files and Text Files
    Wild Card Expansion
    Subdirectory Searches
Options
    How to Specify Options
    Environment Variable
Regular Expressions (Regexes)
    Regexes by Example
    Regex Language Summary
Return Values (ERRORLEVEL)
Limitations
Troubleshooting and How-to
What's New in 6.9?

 

Why GREP?


The DOS filter FIND is useful for finding a given string in one or more files. But what if you want to find the word "the" in caps or lower case, without also finding "other", "There", "then", and so on? You don't really want to search for a specific string. Rather, what you're looking for is a regular expression or regex, namely "the" preceded and followed by something other than a letter. GREP to the rescue!

GREP takes one or more regexes, matches them against input files, and reports any lines that match.

GREP combines most features of UNIX grep, egrep, and fgrep. GREP has many other advantages over FIND besides using regular expressions:


Getting Started


System Requirements

The 16-bit version, GREP16, runs under DOS 2.0 or higher, including a DOS box under Windows. The 32-bit version, GREP32, requires a DOS box under Windows 95, Windows 98, or Windows NT 4.0. (I fully expect it to run in Windows ME and Windows 2000, but have not tested it.)

The two executables operate the same and have the same features, except that you need GREP32 for long filenames, for extended regular expressions, and for character mapping. If you typically run GREP in a DOS box under Windows 9x or later or NT, GREP32 is the one you want.

Installation

There is no special installation procedure. Simply move GREP16.EXE, GREP32.EXE, or both to any convenient directory in your path.

A demo file is included; just type DEMO after unZIPping the archive.

You may wish to rename the executable you use more often to the simpler GREP.EXE. All the examples in this user guide will assume you've done that. Otherwise, just substitute GREP16 or GREP32 wherever you see GREP in the examples.

Evaluation, License, and Warranty

GREP is shareware. If you use it past a 30-day evaluation period, you are morally and legally bound to register and pay for it. Please see the file LICENSE.TXT for full details, including support and warranty information.

When you run the unregistered version, it displays a three-line reminder to register. But there is no time delay and you don't have to press any extra keys.

The registered version offers these improvements over the evaluation version:

Uninstall

There is no special uninstall procedure; simply delete the GREP files. GREP doesn't write any "stealth" files or modify the Windows registry.


User Instructions


For a summary of operating instructions, type

        grep /? | more
The full command form is either of
        grep [options] [regex] [<inputfile] [>outputfile]
        grep [options] [regex] inputfiles [>outputfile]
In the first form, GREP is a filter, taking its input from the standard input (most likely piped from some other command). In the second form, GREP takes its input from any number of input files, possibly specified with paths and wild cards.

In both forms, the optional outputfile will receive the matching lines (or other output, depending on the output options). For output to the screen, omit > and outputfile.

regex is a regular expression; see Regular Expressions below. A regex is normally required on the command line; however, if you use the /F option, regexes will be taken from a file or the keyboard instead of the command line.

The command-line options, and the values returned through ERRORLEVEL, are explained below. You can actually put options anywhere on the command line, not just before the regex. All the options are processed before any files are scanned, so it doesn't matter whether a given option comes before the files, after the files, or among the file specs.

Example:

        grep /I pic[t\s] \proj\*.cob >prn
will examine every COBOL source file in the PROJ directory and print every line that contains a picture clause ("pic" followed by either "t" or a space) in caps or lower case (the /I option).
        grep /I /S pic[t\s] \*.cob >prn
will examine every COBOL source file in all directories on the current disk (the /S option).
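If you want to experiment with the first example's regex away from DOS, here's a minimal sketch using Python's re module. It is an approximation, not GREP itself. re.IGNORECASE stands in for the /I option. Python's \s matches any whitespace character (tab included), not just a space. The sample lines are invented. Note that the third sample line matches too, because "epic tale" contains "pic" followed by a space. See Troubleshooting for ways to match whole words only.

```python
import re

# Approximates: grep /I pic[t\s] ...
# re.IGNORECASE plays the part of GREP's /I option.
PICTURE = re.compile(r"pic[t\s]", re.IGNORECASE)


def matching_lines(lines):
    """Return the lines that contain a match anywhere in the line."""
    return [line for line in lines if PICTURE.search(line)]


sample = [
    "       05  WS-NAME   PIC X(20).",
    "       05  WS-AMT    PICTURE 9(5)V99.",
    "      * an epic tale of COBOL",
    "       MOVE SPACES TO WS-NAME.",
]

for line in matching_lines(sample):
    print(line)
```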
 

Input Files


As mentioned earlier, you can use GREP as a standard filter, either by piping from another command with "|" or by redirecting input from a file with "<". If you don't specify any regular files or any redirection for input, GREP will simply wait. You can end the GREP run by pressing Control-Z and Enter, or Control-Break.

GREP can read text files just fine, whether lines are separated by the DOS-style carriage return plus line feed, the UNIX-style line feed only or the Mac-style carriage return only. See below for binary files.
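Here's a minimal Python sketch of these text-mode line rules. It assumes the whole file has already been read into memory as bytes. It also stops at the first Control-Z, which marks the end of file in text mode, as the comparison under Binary Files and Text Files notes. The chunking of lines longer than the /W value is left out. This illustrates the rules only; it is not GREP's own code.

```python
CONTROL_Z = b"\x1a"  # ASCII 26


def read_text_lines(data: bytes) -> list[bytes]:
    """Text-mode sketch: ignore everything from the first Control-Z on,
    then split on CR LF (DOS), LF (UNIX) or CR (Mac) line ends."""
    text = data.split(CONTROL_Z, 1)[0]
    # bytes.splitlines() treats \r\n, \n and \r as line boundaries.
    return text.splitlines()


for raw in (b"alpha\r\nbeta\r\n", b"alpha\nbeta\n", b"alpha\rbeta\r"):
    print(read_text_lines(raw))  # [b'alpha', b'beta'] each time

# Text after a Control-Z is never read in text mode:
print(read_text_lines(b"visible\r\n\x1ahidden\r\n"))  # [b'visible']
```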

At the end of execution, GREP will display a warning message for any filespec that didn't match any files. Like other warnings, it is suppressed if you specify the /Q3 option.

File and Path Names

When telling GREP to read input files, specify them in the normal way with paths and wild cards. For example:

        grep regex ..\*.c *.h d:\dir1\dir2\orich?.htm
The separator between directories in a path can be a backslash "\" or forward slash "/".

If input file names or paths contain spaces, you must enclose them in double quotes. This is a DOS restriction, not a quirk of GREP. For instance,

        grep regex c:\Program Files\My Office\*
contains three file specs, namely c:\Program, Files\My, and Office\*. That's probably not what you meant. Double quotes preserve your intended meaning:
        grep regex "c:\Program Files\My Office\*"

GREP thinks that anything that starts with a hyphen is an option. So if a file name starts with a hyphen, use the standard DOS syntax for "current directory". For example, to search file -omega.txt, type

        grep regex ./-omega.txt
or
        grep regex .\-omega.txt

Binary Files and Text Files

Upgrade note: Up through release 6.0, GREP offered just one /R option. That binary mode mixed elements of free-format binary and record-oriented binary. Now you choose a specific binary mode with the /R2 or /R3 option, or use the /R-1 or /R-2 option to tell GREP to choose based on file contents.

GREP was originally written with plain text files in mind, but you can also use it quite well with binary files. What's the difference?

DOS doesn't mark a file as text or binary; the program that reads the file just has to know. GREP "knows" files are binary when you tell it via the /R2 or /R3 option; otherwise it treats input files as text. If GREP reads a file in text mode but the file is actually binary, some matches may be missed. It's important, therefore, to scan binary files in binary mode.

Registered users can use the /R-1 or /R-2 option to have GREP examine each file and decide whether it's text or free-form binary; I recommend /R-1. Please see the /R option description for details on how GREP decides.

Here's a comparison of the three ways GREP can read input files: line-oriented text (/R0), record-oriented binary (/R2), and free-form binary (/R3).

Reading the file
    /R0       The file is read a line at a time. Any line longer than the /W option value is read in chunks, with each chunk treated as a line.
    /R2       The file is read a record at a time; the record length is given by the /W option.
    /R3       The file is read in overlapping half-buffers. The /W option gives the buffer size; see that option description for the recommended buffer size.

Line ends
    /R0       A line ends with a carriage return or line feed (ASCII 13 or 10) or both.
    /R2, /R3  ASCII 13 and 10 have no special meaning.

End of file
    /R0       Control-Z (ASCII 26) marks the end of file.
    /R2, /R3  The file length is given by the directory entry. Control-Z is just another character.

The regex characters ^ and $
    /R0, /R2  ^ and $ mean the start and end of a line or record.
    /R3       ^ and $ in an extended regex match a newline (ASCII 10). In a basic regex they don't match anything useful.

The /V option
    /R0, /R2  The /V option looks for lines or records that don't contain a match.
    /R3       The /V option makes no sense with free-form binary processing, unless you use it with the /L option to report files that contain no matches at all to the regex.
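Why overlapping half-buffers? Here is a minimal Python sketch. It assumes that each buffer is exactly /W bytes long and that each new buffer starts half a buffer after the previous one; GREP's actual buffer handling may differ in detail. A pattern that straddles the boundary between two non-overlapping buffers still lies wholly inside the buffer that starts halfway between them. Offsets count from 0, as with /N in /R3 mode.

```python
import re


def half_buffers(data: bytes, width: int):
    """Yield (offset, buffer) pairs. Each buffer is `width` bytes (the last
    may be shorter) and starts half a buffer after the previous one."""
    step = max(1, width // 2)
    offset = 0
    while True:
        yield offset, data[offset:offset + width]
        if offset + width >= len(data):
            break
        offset += step


def matching_offsets(data: bytes, pattern: bytes, width: int) -> list:
    """Starting byte offsets (first byte = 0) of buffers that contain a match."""
    regex = re.compile(pattern)
    return [off for off, buf in half_buffers(data, width) if regex.search(buf)]


data = b"0123456789ABCDEF"
# Non-overlapping 8-byte blocks would be b"01234567" and b"89ABCDEF",
# splitting "6789" in two. The half-buffer starting at byte 4 holds it whole.
print(matching_offsets(data, rb"6789", 8))  # [4]
```

In this sketch, any match of at most width // 2 bytes lies entirely inside some buffer, but a longer match could still be split. A match inside an overlap region is found in two buffers here. This guide doesn't say how GREP itself reports such a match.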

The file format obviously affects how the file is read, but it also affects how matches are displayed:

What is displayed
    /R0, /R2  When a match is found, the matching line or record is displayed, unless you used the /C option, /J option, or /L option.
    /R3       The /C option, /J option, or /L option is strongly recommended. But if you don't use any of them, then when a match is found, GREP will display the matching buffer.

Numbering with /N
    /R0, /R2  With the /N option, GREP displays the line or record number with each line or record that contains a match.
    /R3       With the /N option, GREP displays the starting byte number with each buffer that contains a match. The first byte in the file is numbered 0.

Control characters
    /R0       Matching lines are output as character streams. If they contain control characters like form feed (ASCII 12) and backspace (ASCII 8), those characters are sent to the terminal as-is, and the output may look strange.
    /R2, /R3  Printable characters are displayed normally, and non-printable characters are displayed by their hexadecimal values, such as <1A> for Control-Z (ASCII 26, or 1A hex). GREP16 treats characters 0-31 and 127-255 as non-printing; in GREP32 that is the default, but you can change it by setting a character mapping with the /M option.

Context lines with /P
    /R0, /R2  The /P option specifies how many lines or records from the file to display before and after each line or record that contains a match.
    /R3       The /P option is ignored.
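The hexadecimal display rule for /R2 and /R3 can be sketched in a few lines of Python. This sketch assumes GREP16's fixed set of non-printing characters (0-31 and 127-255, which is also GREP32's default without /M). It renders each of them as two hex digits in angle brackets; the uppercase letters are an assumption. Control-Z (ASCII 26) comes out as <1A>, since 26 decimal is 1A hex.

```python
def display_binary(record: bytes) -> str:
    """Render a record the /R2 and /R3 way: printable bytes as themselves,
    non-printing bytes (0-31 and 127-255) as <hh> hex codes."""
    parts = []
    for byte in record:
        if byte < 32 or byte >= 127:
            parts.append(f"<{byte:02X}>")  # e.g. 26 -> <1A>
        else:
            parts.append(chr(byte))
    return "".join(parts)


print(display_binary(b"END\x1a"))        # END<1A>
print(display_binary(b"tab\there\xff"))  # tab<09>here<FF>
```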

Wild Card Expansion

There are several important things to bear in mind about how GREP expands file names containing ? or *:

GREP uses DOS function calls to expand wild cards. This may lead to surprising behavior, at least in Windows 95. A Win95 user has reported that, as expected, "DIR *." returns only files that have no filename extension, but GREP with the file spec "*." examines all files, because that's how the DOS functions treat the wild card "*.". This discrepancy, caused either by DOS itself or by Microsoft's run-time C code, does not exist in Win98. When in doubt about which files GREP is scanning, use the /B option to make GREP tell you the name of every file it examines.

Subdirectory Searches

If you specify the /S option, GREP will search not only the files indicated on the command line, but also the files in subdirectories.

For example, with the command

        grep /S regex \hazax*.* *.c g:\mumble\*.htm
GREP will examine all files on the entire current drive whose names start with hazax; then it will look at all C source files in the current directory and all subdirectories under it; finally it will look at all HTML files in directory g:\mumble and all subdirectories under it.

Perhaps a more realistic example is this: you have a document about Vandelay Industries somewhere on your disk, but you can't remember where. This command should find it:

        grep Vandelay /S \*.*
(You can abbreviate \*.* to \* with GREP32.) You may also want to use the /I option if you can't remember whether "Vandelay" was capitalized normally.

Subdirectory search follows the normal file-searching rules: hidden and system subdirectories are normally ignored. (Yes, you have them if you have Windows 9x.) The /A option also applies during subdirectory search: with /S and /A together, GREP will search every subdirectory. There's no way to search every subdirectory but only normal files, or to search only normal subdirectories but to search for hidden files in them.

You may want to know in what order GREP examines files when the /S option is set. Ordinarily, GREP examines all files in the first file argument, including the subdirectory tree, then proceeds to the second file argument, and so on. However, when you use the /S option and none of the file arguments contains a path, GREP will look first for all those files in the current directory, then for all of them in the first subdirectory, and so on.
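The two visiting orders can be compared with a small Python sketch over an invented directory tree. All of the following are hypothetical assumptions:

- The tree is an in-memory dict rather than a disk.
- Directories are visited depth-first, with a directory's own files before its subdirectories.
- fnmatch patterns, which are close to but not the same as DOS wild cards, stand in for GREP's wild-card expansion.
- To keep the comparison simple, every file argument starts from the same directory, even though real arguments with paths each start from their own.

```python
from fnmatch import fnmatchcase

# Invented tree: a name maps to None for a file or to a dict for a
# subdirectory. Dict order stands in for the order entries are found on disk.
TREE = {
    "a.c": None,
    "a.h": None,
    "sub": {"b.c": None, "b.h": None},
}


def walk(tree, prefix=""):
    """Yield (directory prefix, file names): a directory's own files first,
    then each subdirectory, depth-first (an assumed order)."""
    yield prefix, [name for name, sub in tree.items() if sub is None]
    for name, sub in tree.items():
        if sub is not None:
            yield from walk(sub, prefix + name + "\\")


def wild(name, pattern):
    # Case-insensitive, like DOS; fnmatch rules only approximate DOS wild cards.
    return fnmatchcase(name.lower(), pattern.lower())


def argument_by_argument(tree, patterns):
    """Usual /S order: one file argument's whole tree, then the next argument."""
    return [d + f for p in patterns for d, files in walk(tree)
            for f in files if wild(f, p)]


def directory_by_directory(tree, patterns):
    """/S order when no argument contains a path: all patterns in one
    directory, then all patterns in the next directory."""
    return [d + f for d, files in walk(tree) for p in patterns
            for f in files if wild(f, p)]


print(argument_by_argument(TREE, ["*.c", "*.h"]))
print(directory_by_directory(TREE, ["*.c", "*.h"]))
```

The first call lists a.c, sub\b.c, a.h, sub\b.h. The second lists a.c, a.h, sub\b.c, sub\b.h.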

(The /S option is fully functional in the registered version, and will search all the way to the bottom of a directory tree. In the evaluation version, GREP will search the named or implied directories and all directories immediately below them, but no further in any one execution.)

The /D option will show you every directory and wild-card search as GREP performs it. The output also contains lots of other stuff, but the records of file visits all contain the string "GX:".


Options


The reference manual describes the options in detail. Here's a one-line summary of what each option does, with the nearest equivalents in UNIX grep and DOS FIND:

Option and Effect                                                     UNIX grep *   DOS FIND *
A -- Include hidden and system files when expanding wildcards.
B -- Display a header for every file, even if it contains no matches.
C -- Display count of matches instead of matching lines.              -c            /C
D -- Display debugging output.
E -- Select extended regular expressions or strings.                  (-E)
F -- Read regular expression(s) from file.                            (-f)
H -- Don't display filenames in output.                               -h
I -- Ignore case in matching.                                         -i            /I
J -- Display just the part of each line that matches the regex.
L -- Report names of files that contain matches, not matching lines.  -l
M -- Specify character mapping or locale.
N -- Prefix line numbers to matching lines.                           -n            /N
P -- Show context lines around matching lines.                        (-A, -B, -C)
Q -- Suppress program logo and some or all warnings.                  (-s)
R -- Read and display input files as binary or text.                  -U, (-a)
S -- Scan files in subdirectories too.                                -r
U -- Prefix filenames to matching lines.
V -- Display lines that don't contain a match.                        -v            /V
W -- Specify line width or binary block length.
Y -- Multiple regular expressions AND instead of OR.
Z -- Reset all options.
0 -- Set ERRORLEVEL = 0 if any matches were found.
1 -- Set ERRORLEVEL = 1 if any matches were found.                    (-v)
? -- Display help for options and regexes.                            --help        /?

* UNIX grep options are case sensitive; GREP and FIND options are not. (An option is shown in parentheses if the GREP option's effect is similar but not identical.)

How to Specify Options

On the command line, options can appear anywhere, before or after the regex and the file specs. All options are processed before any files are read.

You have a lot of freedom about how you enter options: use a leading hyphen or slash, use upper- or lower-case letters, and leave spaces between options or combine them. For instance, the following are just some of the different ways of turning on the /P3 and /B options:

        /p3 -b    /b/P3    /p3B    -B/P3    -P3 -b
This user guide will always use capital letters for the options, to make it easier to distinguish letter l and figure 1.

For clarity, you should always use a hyphen or slash before the numeric /0 option or /1 option. /E0 means the /E option with a value of 0, but /E/0 means the /E option with no value specified, followed by the /0 option.
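To make the /E0 versus /E/0 rule concrete, here is a hypothetical, deliberately simplified option tokenizer in Python. It is not GREP's parser. It assumes every option is a single letter, digit, or ? optionally followed by one signed integer value. Options that take other kinds of values, such as /F with a file name or /P with both a before and an after number, are outside its scope.

```python
import re

# Hypothetical, simplified rule: an optional "/" or "-", then one letter,
# digit or "?", then an optional signed integer value for that option.
_OPTION = re.compile(r"\s*[/-]?([A-Za-z0-9?])(-?\d+)?")


def parse_options(text: str) -> list:
    """Split an option string into (OPTION, value-or-None) pairs."""
    text = text.strip()
    pos, result = 0, []
    while pos < len(text):
        m = _OPTION.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse options at {text[pos:]!r}")
        result.append((m.group(1).upper(), m.group(2)))
        pos = m.end()
    return result


for spelling in ["/p3 -b", "/b/P3", "/p3B", "-B/P3", "-P3 -b"]:
    print(spelling, parse_options(spelling))

print(parse_options("/E0"))   # [('E', '0')]: /E with value 0
print(parse_options("/E/0"))  # [('E', None), ('0', None)]: /E, then /0
```

Each of the five spellings yields the same two options. In this sketch, a hyphen followed by a digit right after a letter is read as a negative value (as in /R-1), so a slash is the unambiguous way to introduce /0 or /1 there.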

Environment Variable

If you use certain options frequently, with the registered version of GREP you can put them in the ORS_GREP environment variable. You have the same freedom as on the command line: leading slashes or hyphens, space separation or options run together, caps or lower case.

Only options can be put in the environment variable. If you want to store a regex, put it in a file and put /Ffile in the environment variable.

Example: If you prefer to have GREP sense the type of each file (/R-1 option) and you prefer UNIX-style output (/U option) with line numbers (/N option), then you want to set the environment variable as

        set ORS_GREP=/R-1UN

The reference manual gives more information about the environment variable, including instructions for overriding a particular option on the command line.


Regular Expressions (Regexes)


A regular expression or regex is a pattern of characters that will be compared to lines from one or more input files. A line from an input file is a match if the line, or part of it, agrees with the pattern in the regex.

A regex can be a simple text string, like mother, or something more complex. (If you want to search only for simple strings, use the /E0 option and ignore all this regex stuff.)

Regexes by Example

Example 1: If you want both the English and American spellings of the word for the color between white and black, use gr[ea]y as your regex.

Example 2: The basic regex for any word starting with "moth" is moth[a-z]*, which is the letters "moth" followed by any number of letters a through z. Yes, that regex does match "moth" itself: see * or + for Repetition in the reference manual.

Example 3: A word in double quotes would be matched by "[a-z]+". Read that regex as "a double quote mark, followed by one or more letters, followed by another double quote mark."

Example 4: A U.S. local telephone number has the basic regex

        [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]
That is three digits, followed by a hyphen, followed by four digits. (You could express it more simply with an extended regex: [0-9]{3}-[0-9]{4} or even \d{3}-\d{4}.)
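You can try these examples with Python's re module, whose syntax is close to GREP's extended (PERL-style) regexes. This is a minimal sketch: the sample lines are made up, and re.search plays the role of GREP's rule that a line matches if any part of it agrees with the pattern. In GREP itself, {3} and \d need an extended regex (the /E2 option).

```python
import re

# (regex, sample line) pairs; the regexes come from Examples 1-4 above and
# the sample lines are invented.
CASES = [
    (r"gr[ea]y", "grey"),
    (r"gr[ea]y", "gray"),
    (r"gr[ea]y", "groy"),
    (r"moth[a-z]*", "moth"),
    (r"moth[a-z]*", "mammoth"),
    (r'"[a-z]+"', 'he said "hello"'),
    (r'"[a-z]+"', 'an empty "" pair'),
    (r"[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]", "call 555-1234"),
    (r"[0-9]{3}-[0-9]{4}", "call 55-1234"),
    (r"\d{3}-\d{4}", "call 555-1234"),
]


def matches(regex: str, line: str) -> bool:
    """A line matches if any part of it agrees with the regex (re.search)."""
    return re.search(regex, line) is not None


for regex, line in CASES:
    verdict = "match" if matches(regex, line) else "no match"
    print(f"{regex:40} {line!r:20} {verdict}")
```

"mammoth" matches moth[a-z]* because the regex only has to match part of the line.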

Regex Language Summary

A regex, then, is essentially a string of characters with a bunch of operators thrown in to express possibilities like "any of these characters" and "repeated". Here's a quick summary of the characters that have special meaning in a regex; the reference manual gives a full description of each.

Character                   Which regexes  Description

Characters with special meaning outside square brackets:
\     backslash             all            treat any of these special characters as normal
\     backslash             extended       (1) character types like \w for a word character;
                                           (2) simple assertions like \b for a word boundary;
                                           (3) back references to parenthesized subexpressions;
                                           (4) character encoding for odd characters, like \x3c for <
.     period                all            matches any character
*     asterisk              all            matches 0 or more occurrences of the preceding
+     plus sign             all            matches 1 or more occurrences of the preceding
?     question mark         extended       matches 0 or 1 occurrence of the preceding
{     left brace            extended       repetition count, like {3,} to match three or more occurrences of the preceding
[     left square bracket   all            start a character class, e.g. [abcde] to match any one of a, b, c, d, e
^     caret                 all            match start of line in text mode or start of record in binary mode
$     dollar sign           all            match end of line in text mode or end of record in binary mode
|     vertical bar          extended       alternatives, e.g. mother|father to match "mother" or "father"
(...) parentheses or        extended       subexpressions, e.g. (&nbsp;)+ to match one or more occurrences of "&nbsp;"
      round brackets

Characters with special meaning inside square brackets:
-     minus sign or hyphen  all            character range, e.g. [a-z] to match any lower-case English letter
^     caret                 all            negate the character class, e.g. [^a-z] to match any character except a lower-case English letter
\     backslash             all            treat next character as normal
\     backslash             extended       character encoding
[:    left square bracket   extended       introduce a named character class, e.g. [[:punct:]0-9] for any punctuation character or a digit
      followed by colon
]     right square bracket  all            end the character class
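Several of the extended operators in the table behave the same way in Python's re module, so a few one-line checks can show what each one does. This is a sketch with invented sample strings. Python's re is not GREP's engine, and it has no named classes like [[:punct:]], so that row can't be shown this way.

```python
import re


def matches(regex: str, line: str) -> bool:
    return re.search(regex, line) is not None


# ?  : 0 or 1 occurrence of the preceding
print(matches(r"colou?r", "color"), matches(r"colou?r", "colour"))  # True True
# {3,} : three or more occurrences
print(matches(r"a{3,}", "baaad"), matches(r"a{3,}", "baad"))  # True False
# |  : alternatives
print(matches(r"mother|father", "my father's car"))  # True
# ( )+ : a parenthesized subexpression, repeated
print(matches(r"(&nbsp;)+", "a&nbsp;&nbsp;b"))  # True
# \b : word boundary, so "plain" but not "explain"
print(matches(r"\bplain\b", "a plain fact"), matches(r"\bplain\b", "explain"))  # True False
# \1 : back reference, here a doubled letter
print(matches(r"([a-z])\1", "book"), matches(r"([a-z])\1", "boat"))  # True False
# \x3c : character encoding for "<"
print(matches(r"\x3c", "<p>"))  # True
# [^a-z] : any character except a lower-case letter
print(matches(r"[^a-z]", "abc1"), matches(r"[^a-z]", "abc"))  # True False
```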

 

Return Values (ERRORLEVEL)


By default, GREP will return one of the following values to DOS, and you can test the return value with IF ERRORLEVEL in a batch file.

        255     bad option, or other error on the command line or regex
        254     specified file not available
        253     insufficient memory: try reducing values specified with the /P option or /W option, or use GREP32 if possible
        128     program error in expanding a regex
        2       help message displayed (/? option, or nothing specified on the command line)
        0       program ran to completion (whether or not there were any matches)

You might want to use GREP in a batch file or a makefile and take different actions depending on whether matches were found or not. To do this, use the /0 or /1 option. With the /1 option, GREP returns these values of ERRORLEVEL:

        0       no matches were found in any file
        1       one or more matches were found in at least one file
        2-255   as above

/0 is the opposite: it returns these ERRORLEVEL values:

        0       one or more matches were found in at least one file
        1       no matches were found in any file
        2-255   as above

In other words, the /0 or /1 option lets you tell GREP which value to return if matches are found.
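Here is a minimal Python model of the three tables above, for readers who want to reason about batch-file logic. The function and parameter names are invented for illustration. The second function encodes a real DOS rule worth remembering: IF ERRORLEVEL n is true when the return value is n or higher.

```python
def grep_errorlevel(matches_found: bool, option: str = "", error: int = 0) -> int:
    """Model of GREP's documented ERRORLEVEL values.
    option: "" (default), "0" for /0 or "1" for /1.
    error: 0 for a normal run, otherwise the error or help code (2-255),
    which is returned the same way whatever the option."""
    if error:
        return error
    if option == "1":
        return 1 if matches_found else 0
    if option == "0":
        return 0 if matches_found else 1
    return 0  # default: 0 whether or not there were matches


def if_errorlevel(code: int, n: int) -> bool:
    """DOS batch 'IF ERRORLEVEL n' is true when the code is n or higher."""
    return code >= n


print(grep_errorlevel(True, "1"), grep_errorlevel(False, "1"))  # 1 0
print(grep_errorlevel(True, "0"), grep_errorlevel(False, "0"))  # 0 1
print(if_errorlevel(254, 1))  # True: an error also satisfies IF ERRORLEVEL 1
```

So with /1, IF ERRORLEVEL 1 is also true after an error (2-255). Test IF ERRORLEVEL 2 first if you need to tell errors apart from matches.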


Limitations


GREP16 is limited by its 64 KB data segment. You may run into trouble if you use large values for both the /W option and the before number of the /P option.
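The troubleshooting entry on "insufficient memory" gives a rough formula for the block GREP reserves: (9 plus the /W number of bytes) times (1 plus the before number of the /P option). Here's that arithmetic as a small Python sketch, compared against the 64 KB segment. The width and before values are hypothetical. The real headroom is presumably smaller, since the segment holds GREP's other data as well.

```python
SEGMENT_BYTES = 64 * 1024  # GREP16's data segment


def reserved_bytes(width: int, before: int) -> int:
    """Rough size of the block GREP reserves, per the troubleshooting entry:
    (9 + /W bytes) * (1 + /P before number)."""
    return (9 + width) * (1 + before)


for width, before in [(4096, 10), (8192, 10)]:  # hypothetical settings
    need = reserved_bytes(width, before)
    verdict = "fits within" if need < SEGMENT_BYTES else "exceeds"
    print(f"width {width}, before {before}: {need} bytes, {verdict} 64 KB")
```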

For basic regexes, GREP is limited to 127 characters compiled into no more than 511. The "compiled" basic regex is GREP's internal representation, after character ranges have been expanded and so on.

For extended regexes, the maximum compiled size is 65,539 (sic) bytes. There can be no more than 65,536 capturing subpatterns, and all kinds of subpatterns can be nested no more than 200 levels deep.


Troubleshooting and How-to


Please share any questions that had you scratching your head. They'll be added to a future version of this user guide, space permitting.

  1. GREP is missing matches in my Word or WordPerfect files, even though I know they're in there!

    Binary files, including most word-processing files, may contain ASCII 26 (Control-Z) characters. These have no special meaning in a binary file but signal the end of a file being read as text. To read such files, use the /R3 option. Better yet, if you register GREP you can use the /R-1 or /R-2 option and let GREP figure out the type of each file automatically.

  2. How do I find all lines that contain "this" but not "that"?

    Use GREP as a filter and execute it twice, the first time to find all lines that contain "this" and the second time with the /V option to filter out any lines that contain "that":

            grep "this" file1 file2 file3 | grep /v "that"
    
  3. GREP is reporting too many matches! I searched for "plain" but I'm also getting lines with "explain", "plains", etc.

    GREP searches for lines that contain the string of characters represented by your regex. If you want that string of characters only as a whole word, you have to tell GREP. For techniques to do this with basic or extended regexes, please see the Lengthy Example in the reference manual.

  4. I used the -r option, but GREP won't scan files in subdirectories.

    You need the -s option for subdirectories, not the -r option. GREP diverges from UNIX in this respect.

  5. I got the message "insufficient memory".

    Are you specifying large values for the /W option or /P option or both? GREP has to reserve a block of memory that is about equal to (9 plus the /W number of bytes) times (1 plus the first /P number, the before number). Try reducing either or both.

    Or, if you're running GREP16 under 32-bit Windows, try running GREP32 instead.

  6. I typed my GREP command and hit the Enter key, and it just sat there.

    Is the disk light on your computer flashing? GREP is reading lots of input but not finding any matches.

    Did you forget to specify input files? GREP is waiting for input from standard input. You can halt it by pressing Control-Z and Enter, or Control-Break.

    Did you enter an extended regex with the | character? DOS interprets that character as a pipe, so it's waiting for GREP to finish and then DOS will run GREP's output through the "second command". Press Control-Z and Enter to end GREP. Some systems, like 4DOS, will accept the | if you enclose the whole regex in double quotes " ". Otherwise, use the /F- option and enter your regex from the keyboard; or see Backslash for Character Encoding (extended regex) or Special Rules for the Command Line in the reference manual.

  7. I've got a bunch of backslashes in my regex, and I don't think GREP is interpreting it the way I want.

    You can use the /D option to reveal what GREP is doing with your regex. The output can be voluminous, but you can cut it down to size. Repeat your command with this added at the end:

            /D-|grep "grep GX:"
    
    You'll see only the interpretation of the regex.

    If the displayed original regex is different from what you typed, then either DOS or the Microsoft 32-bit startup code has altered some of your characters. Use the /F- option and enter your regex from the keyboard; or see Special Rules for the Command Line in the reference manual.

    If you see a line about a "massaged" regex, you're probably running afoul of the Special Rules for the Command Line. Try entering your regex from the keyboard or from a file with the /F option.

    Another possibility: you may have entered extended regex characters without specifying the /E2 option to tell GREP you're using extended regexes.

  8. I'm trying to GREP for a character like (, ?, or {, but it doesn't work.

    These have special meanings in extended regular expressions but not in basic regexes. Make sure you have not turned on extended regexes; or use a backslash \ to make GREP match them as normal characters.

  9. GREPping on a word boundary with \< and \> doesn't work.
    or: My subpattern with \( doesn't work!
    or: \| doesn't work for alternatives!

    With extended regular expressions (/E2 option), GREP uses PERL-style regexes: \b for a word boundary, ( ) for subexpressions, and plain | for alternatives.

  10. \w, [:alpha:], and similar only take account of English letters. I need to work with 8-bit letters.

    In GREP32, use the /M option to select an appropriate character mapping. In GREP16, your only choice is to code the extra letters explicitly as shown in the character range example.

  11. When I enter a character like é in my regex, the search doesn't seem to work.

    This is a problem (in GREP32 only) with how Microsoft's startup code processes the command line. Here are three ways to get around this problem:

     

What's New in 6.9?


Only the more important changes are listed here. As always, the complete revision history is available as a separate document.

GREP release 6.9:

Note: This is a pre-release of 7.0. Registered users are being polled for desired features, and their responses will help determine any additional changes for 7.0. If you would like to participate in the poll, send e-mail to the author.

GREP release 6.0 added extended regular expressions (in GREP32), as well as the ability to search for literal strings (in both versions), with the new /E option. Other changes included a numeric /Q option and some improved diagnostics for possible user errors.

