CMP -- Compare Text or Binary Files
User Guide

release 4.9 (5.0 beta), document revised 8 Feb 2001
Copyright © 1994-2001 by Stan Brown, Oak Road Systems
 

CMP will compare text or binary files (or groups of files) and report any differences. Output is suitable for piping, or processing by other programs. A value returned in ERRORLEVEL lets batch files take action based on whether files are the same or differ.

              
 1. Why CMP?
 2. Getting Started
  2.1  System Requirements
  2.2  Installation
  2.3  Evaluation, License, and warranty
 3. User Instructions
  3.1  Comparing Single Files
  3.2  Comparing Groups of Files
 4. Overview: How CMP Compares Files
  4.1  File Selection
  4.2  The Input Stage
  4.3  Difference Blocks and Look-Ahead
  4.4  Reporting Difference Blocks
  4.5  Summary Report
 5. Options
  5.1  How to Specify Options
  5.2  Options by Category
  5.3  Alphabetical List of Options
    /0 /1 /? /A /B /D /E /F /I /L /M /N /Q /R /S /U /W /Z
  5.4  Environment Variable
 6. Return Values (ERRORLEVEL)
 7. What's New?

 

1. Why CMP?


CMP works your way, much more than the DOS utilities COMP.COM and FC.EXE.


 

2. Getting Started



2.1  System Requirements

The 16-bit program CMP16 runs under plain DOS, or in a DOS box under Windows. The 32-bit program CMP32 requires a DOS box under Windows 98, Win95, or Win NT 4.0. (I fully expect it to run in Windows 2000 and Windows ME, but have not tested it.)

CMP16 and CMP32 operate the same and have the same features, with two exceptions:

If you typically run CMP in a DOS box under Windows 9x or NT, CMP32 is the one you want.


2.2  Installation

There is no special installation procedure. Simply move CMP16.EXE, CMP32.EXE, or both to any convenient directory in your path. Each executable is completely self-contained.

You may wish to rename the executable you use more often, CMP32.EXE or CMP16.EXE, to the simpler CMP.EXE. All the examples in this user guide will assume you've done that. Otherwise, just substitute CMP16 or CMP32 wherever you see CMP in the examples.


2.3  Evaluation, License, and warranty

CMP is shareware. If you use it past a 30-day evaluation period, you are morally and legally bound to register and pay for it. Please see the file LICENSE.TXT for full details, including support and warranty information.

The registered version offers some enhancements over the evaluation version:


 

3. User Instructions


For a quick summary of operating instructions and options, type

        cmp /? | more

The basic command form is

        cmp options file(s) otherfile_or_directory

Differences are normally listed on the screen, but you can send them to a file with normal DOS redirection (>reportfile). If you don't want to see the differences at all, but just want CMP to test whether the files are the same, >NUL will do that, or you can use the /Q3 option.

Options can actually be specified anywhere on the command line, not just before the first file spec, and they can be stored in an environment variable. All options will be scanned and will apply to all files, no matter where they appear on the command line.

File specs may contain wild cards; see Comparing Groups of Files below.

There are two special rules for file specs that contain certain characters:


3.1  Comparing Single Files

To compare one file to another:

        cmp [options] filespec1 filespec2 [>reportfile]

Example 1: Compare two files in the current directory.

        cmp /w100 mywords.txt herwords.txt

Example 2: Compare MYWORDS.TXT in directory D:\ORDINAL to OURWORDS.TXT in the current directory.

        cmp /w100 d:\ordinal\mywords.txt ourwords.txt

Example 3: Compare FERNS.TXT in the ORDINAL directory of disk D to a file of the same name in the BACKUP directory of the current disk. As you can see, when the two files have the same name, you need only type the file name once.

        cmp /w100 d:\ordinal\ferns.txt \backup
        cmp /w100 d:\ordinal\ferns.txt \backup\ferns.txt

Example 4: Compare LIZARD.CPP in the ORDINAL directory of disk D to a file of the same name in the current directory of the current disk. Remember that "." means "current directory" in DOS commands.

        cmp /w100 d:\ordinal\lizard.cpp .
        cmp /w100 d:\ordinal\lizard.cpp .\lizard.cpp

3.2  Comparing Groups of Files

To compare groups of files, specify only a disk and/or directory as the last filespec:

        cmp [options] [path\]files path [>reportfile]

Example 1: Compare all files in the current directory with extension .TXT to files of the same names in directory D:\OTHER.

        cmp *.txt d:\other

Example 2: Compare three named files in the current directory to files of the same names in directory D:\PLACATE.

        cmp sheep.txt goat.txt eland.txt d:\placate

Example 3: Compare the three named files in directory F:\FIRST to files of the same names in directory SECOND, a subdirectory of the current directory.

        cmp f:\first\sheep.txt goat.txt eland.txt second

When comparing multiple files, only the first file may have a disk or directory indicated. The same path will be applied to the other files automatically.

Example 4: Compare all the .DOC files in the current directory of drive A, plus XX.HTM in the current directory of drive A, to files of the same names in the current directory of drive B.

        cmp a:*.doc xx.htm b:

 

4. Overview: How CMP Compares Files


You can customize almost every aspect of CMP's operation, but all those choices can be bewildering. This section gives you an overview of how CMP operates, so that you can understand the various options in context. There is a lot here, so by all means feel free to skip right to the options when you're first getting started.


4.1  File Selection

Wild Card Expansion

Please be aware that CMP16 and CMP32 expand wild cards slightly differently because CMP32 supports long filenames. Thus CMP32 would expand abc* to include all files, with any extension or none, whose names start with abc; with CMP16 you need abc*.* to get the same result. This matches the way DOS commands like DIR operate.

When expanding wild cards, CMP will consider hidden and system files for possible matches.
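To make the difference concrete, here is a simplified C model of the two matching styles. It is a sketch, not CMP's code: it assumes that 8.3-style matching splits each name at its last dot and matches the base name and extension separately, and it ignores real-world wrinkles such as Windows short-name aliases. The function names match_long and match_83 are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive match: '*' matches any run of characters (dots
   included) and '?' matches exactly one character. */
static bool glob_match(const char *pat, const char *name)
{
    if (*pat == '\0')
        return *name == '\0';
    if (*pat == '*') {
        /* Try every possible length for the run matched by '*'. */
        for (;;) {
            if (glob_match(pat + 1, name))
                return true;
            if (*name == '\0')
                return false;
            name++;
        }
    }
    if (*name == '\0')
        return false;
    if (*pat == '?' ||
        tolower((unsigned char)*pat) == tolower((unsigned char)*name))
        return glob_match(pat + 1, name + 1);
    return false;
}

/* Long-filename style (CMP32 as described above): one string, one match. */
bool match_long(const char *pat, const char *name)
{
    assert(pat != NULL && name != NULL);
    return glob_match(pat, name);
}

/* Split at the last dot; a name with no dot has an empty extension. */
static bool split_83(const char *s, char *base, char *ext, size_t cap)
{
    const char *dot = strrchr(s, '.');
    size_t n = dot ? (size_t)(dot - s) : strlen(s);
    const char *e = dot ? dot + 1 : "";

    if (n >= cap || strlen(e) >= cap)
        return false;
    memcpy(base, s, n);
    base[n] = '\0';
    strcpy(ext, e);
    return true;
}

/* 8.3 style (CMP16): base and extension are matched separately, so a
   pattern with no dot only matches names that have no extension. */
bool match_83(const char *pat, const char *name)
{
    char pb[64], pe[64], nb[64], ne[64];

    assert(pat != NULL && name != NULL);
    if (!split_83(pat, pb, pe, sizeof pb) || !split_83(name, nb, ne, sizeof nb))
        return false;
    return glob_match(pb, nb) && glob_match(pe, ne);
}
```

Under this model abc* matches ABCD.TXT with match_long but not with match_83, while abc*.* matches it with both, which mirrors the CMP32/CMP16 difference described above.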

Single Compare

If you are comparing one specified file to another specified file -- no wild cards, no subdirectory search -- then CMP considers it an error if either one can't be found. In that case, CMP displays an error message and stops.

Multiple Compare and Missing Files

If you are comparing multiple files, CMP will process the file specs in order, comparing each one against any matching files in the other directory. If you used the /S option to search subdirectories, CMP will match all file specs from the command line against each directory before moving on to the next subdirectory.

To the extent reasonable, as explained in the following paragraphs, CMP will warn you about missing files; you can suppress the warnings with the /Q3 option. On the other hand, use the /D option if you want full details about every attempted file match, whether it succeeded or failed.

With multiple compare, CMP may not be able to warn you if some of your intended target files are missing, lest it also warn you about files you never intended to compare. Consider this example:

        cmp file1 file2*.c d:\otherdir

If FILE1 doesn't exist, CMP will warn you, and likewise if FILE1 exists but D:\OTHERDIR\FILE1 does not. CMP will also warn you if no files match FILE2*.C.

But say FILE2A.C, FILE2B.C, and FILE2C.C all exist, but only some of them also exist in D:\OTHERDIR. Should CMP warn you about the non-matches, or did you intend only to compare when the same file name was found in both places? To avoid annoying you with spurious messages, CMP will warn you only if none of the existing files FILE2*.C have any counterparts in D:\OTHERDIR:
      cmp warning: .\file2*.c exists but d:\otherdir\ has no matches
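The warning rule above comes down to two counts per file spec. The C function below is a hypothetical sketch of that rule, not CMP's code; handling of /Q3 (suppress warnings) and /D (report every attempted match) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

enum spec_warning {
    WARN_NONE,           /* nothing to report */
    WARN_NO_SOURCE,      /* the spec matched no files at all */
    WARN_NO_COUNTERPART  /* files exist, but none exist in the other directory */
};

/* Hypothetical decision rule for one command-line file spec.
   n_found:   files matching the spec in the first location
   n_matched: how many of those also exist in the other directory
   subdirs:   nonzero if /S is in effect */
enum spec_warning check_spec(size_t n_found, size_t n_matched, int subdirs)
{
    assert(n_matched <= n_found);
    if (subdirs)
        return WARN_NONE;            /* with /S there is never enough information */
    if (n_found == 0)
        return WARN_NO_SOURCE;
    if (n_matched == 0)
        return WARN_NO_COUNTERPART;  /* warn only when none have counterparts */
    return WARN_NONE;                /* some matched: stay quiet about the rest */
}
```

For the example above, a missing FILE1 gives check_spec(0, 0, 0) == WARN_NO_SOURCE, while three FILE2*.C files of which only one exists in D:\OTHERDIR give check_spec(3, 1, 0) == WARN_NONE.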

If you specified the /S option for subdirectory search, there are no warnings for missing files. Consider this modified example:

        cmp file1 file2*.c d:\otherdir  /s

If CMP can't find FILE1 in the current directory, perhaps it will find files with that name in subdirectories. But by the time it opens the subdirectories, it no longer knows what was going on in the current directory. So there is no point at which it has all the information it needs to know whether to issue a warning.

The bottom line: with multiple compare, use the /D option if you want warnings about every possible file match and mismatch, use the /Q3 option to suppress all warnings about files not found, and use neither to let CMP do the best it can with limited warnings.


4.2  The Input Stage

After scanning the environment variable and the command line, CMP reserves computer memory as needed for the look-ahead buffer, according to the values set by the /L option and /W option. Then CMP begins reading lines from the (first) two files to be compared.

The /Wwidth option is important in reading files. For binary files, CMP reads chunks of width characters at a time. For text files, CMP reads a line at a time, but ignores the excess on any line that is longer than width characters. CMP will let you know about every such line; use the /Q1 option to suppress such warnings.

After reading each line of a text file, CMP immediately discards any spaces and tabs at the end. Therefore, if two lines are the same except that one of them has some trailing spaces or tabs and the other does not, CMP considers them to be the same. Those trailing spaces and tabs do count against the maximum width, however.

Empty lines are normally treated the same as any other lines, but you can use the /E option to tell CMP to discard empty or blank lines. They will not be used in comparison and will not appear in difference reports. (They will still be counted, so that line numbers in the output will be correct.)
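The order of these steps matters: a line is cut at the width first and trailing blanks are stripped afterwards, which is why trailing blanks still count against the width. A minimal C sketch of that per-line processing, assuming the newline has already been removed; prepare_line and line_info are hypothetical names, not part of CMP:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
    size_t raw_len;   /* full length before truncation */
    bool truncated;   /* the line was longer than width */
    bool blank;       /* empty or only spaces/tabs: dropped under /E */
} line_info;

/* Prepare one text line as described above.
   out must have room for width + 1 characters. */
line_info prepare_line(const char *raw, size_t width, char *out)
{
    line_info info;
    size_t len;

    assert(raw != NULL && out != NULL);
    len = strlen(raw);
    info.raw_len = len;
    info.truncated = len > width;
    if (info.truncated)
        len = width;                 /* the excess beyond width is ignored */
    memcpy(out, raw, len);

    /* Strip trailing spaces and tabs after the width cut, so they
       still count against the width. */
    while (len > 0 && (out[len - 1] == ' ' || out[len - 1] == '\t'))
        len--;
    out[len] = '\0';
    info.blank = (len == 0);
    return info;
}
```

A caller that remembers the largest raw_len seen could use it to suggest a /W value for a complete comparison.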

"Massaged" Lines

Upon reading each line, CMP may store it in memory as it was read from the file, in a "massaged" form, or both. If the /I option is set, CMP will massage the line by changing all letters A-Z to lower case. If the /B option is set (for text files), CMP will massage the line by changing all runs of spaces and/or tabs to a single space. (If the /B option and /I option are not set, there is no massaging of lines.) It is the massaged lines that CMP will compare between files.
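A minimal C sketch of the massaging step, assuming the /I and /B behavior exactly as described here (only A-Z are folded, and each run of blanks becomes one space); massage is a hypothetical name, not part of CMP:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Massage a line for comparison: /I maps only A-Z to a-z, and /B turns
   each run of spaces and/or tabs into a single space. out needs at least
   strlen(in) + 1 bytes; massaging never makes a line longer. */
void massage(const char *in, char *out, bool opt_i, bool opt_b)
{
    size_t j = 0;
    bool in_run = false;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; in[i] != '\0'; i++) {
        char c = in[i];
        if (opt_b && (c == ' ' || c == '\t')) {
            if (!in_run)
                out[j++] = ' ';      /* keep one space per run */
            in_run = true;
            continue;
        }
        in_run = false;
        if (opt_i && c >= 'A' && c <= 'Z')
            c = (char)(c - 'A' + 'a');
        out[j++] = c;
    }
    out[j] = '\0';
}
```

Because a run is compressed to one space rather than removed, "ab" still differs from "a b" under /B.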

When lines are massaged, CMP normally stores both forms, so that the difference reports will show the lines exactly as they were in the files. But if the /M option is set then CMP will not save the original lines for display. In that case the difference reports will show the massaged lines, but you will have more memory available for look-ahead.

As long as corresponding lines from the two files are the same (after any massaging), CMP discards them and keeps reading. But when they don't match, CMP recognizes the beginning of a difference block.


4.3  Difference Blocks and Look-Ahead

When the next line from the first file doesn't match the next line from the second file, CMP recognizes the beginning of a difference block. Now CMP keeps reading lines trying to resynchronize the files.

The /Llook-ahead,resync option limits this look-ahead procedure. CMP may accumulate up to look-ahead lines from each file, trying to find lines that match again. CMP considers that it has resynchronized the files if resync consecutive lines are the same between the two.

An example may help clarify the significance of look-ahead and resync. Suppose CMP finds, after the first 31 lines of the two files match, that line 32 of file 1 doesn't match line 32 of file 2. In this case, CMP has to look ahead at line 33 of file 1 and line 33 of file 2.

              file 1               file 2
        ------------------   ------------------
        (two files identical to this point)
        (31) line d          (31) line d
        (32) line e          (32) something different

Maybe the two lines 33 will match, or maybe line 32 of file 1 will match line 33 of file 2 (meaning that line 32 of file 2 is new in that file and doesn't exist in file 1). Maybe there are 25 new lines in file 2, and line 32 of file 1 will match line 57 of file 2. CMP needs to keep looking ahead until it does find a match.

If CMP can't resynchronize files within the specified look-ahead depth (/L option), it will display the message

        ** look-ahead lines from both files with no match **

and then report the differing lines. Then it will proceed to the next files to be compared, if there are any.
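The look-ahead search can be sketched in C as follows. This is an illustration under stated assumptions, not CMP's implementation: the two arrays hold the lines starting at the first mismatch, each step reads one more line from each file and compares it against the lines already held from the other file, and a match counts only when resync consecutive lines agree within the look-ahead limit. The order in which candidate pairs are tried is an assumption (it happens to reproduce the resync = 1 result in Example 2 below), and /E handling and end-of-file cases are left out. find_resync is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if resync consecutive lines match, starting at a[i] and b[j]. */
static bool run_matches(const char *const *a, const char *const *b,
                        size_t i, size_t j, size_t resync)
{
    for (size_t t = 0; t < resync; t++)
        if (strcmp(a[i + t], b[j + t]) != 0)
            return false;
    return true;
}

/* a[0..na) and b[0..nb) are the lines of file 1 and file 2 starting at the
   first mismatch. On success, *pi and *pj receive the resynchronization
   point: the difference block is a[0..i) versus b[0..j). Returns false when
   no resync run fits within lookahead lines of each file. */
bool find_resync(const char *const *a, size_t na,
                 const char *const *b, size_t nb,
                 size_t lookahead, size_t resync, size_t *pi, size_t *pj)
{
    size_t lim_a = na < lookahead ? na : lookahead;
    size_t lim_b = nb < lookahead ? nb : lookahead;

    assert(resync >= 1 && pi != NULL && pj != NULL);
    for (size_t d = 0; d < lookahead; d++) {
        /* New line b[d] against a[0..d] (a[d] arrives in the same step). */
        for (size_t i = 0; i <= d; i++)
            if (i + resync <= lim_a && d + resync <= lim_b &&
                run_matches(a, b, i, d, resync)) {
                *pi = i;
                *pj = d;
                return true;
            }
        /* New line a[d] against b[0..d-1]. */
        for (size_t j = 0; j < d; j++)
            if (d + resync <= lim_a && j + resync <= lim_b &&
                run_matches(a, b, d, j, resync)) {
                *pi = d;
                *pj = j;
                return true;
            }
    }
    return false;
}
```

Under this sketch, with lookahead = 10 and resync = 2, a difference block of 8 lines is resolved and one of 9 lines is not, which agrees with the 10-2 = 8 rule given later in this section.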

The Resync Value

After a difference, finding just one line from file 1 that matches a line from file 2 may not be enough. This is where the resync value of the /L option comes in. CMP will not consider the two files resynchronized until that number of lines from the two are the same.

This is quite a long section. If you find yourself getting bored, you may prefer to skip it and simply try some compares, experimenting with different values for both numbers on the /L option.

(Note: Difference blocks can be reported in UNIX diff format or traditional format, depending on the /U option. Example 1 shows traditional format, and Example 2 shows UNIX diff format.)

Example 1. Consider the lilies of the field, and this scenario:

             file 1                 file 2
        -----------------    -------------------
        (80) Four score       (90) Four score
        (81) and seven        (91) and seven
                              (92) (i.e., 87)
        (82) years ago,       (93) years ago,
        (83) our fathers      (94) somebody
        (84) brought forth    (95) brought forth
        (85) upon this        (96) on this
        (86) continent a      (97) continent a
        (87) new nation,      (98) mighty nation,
        (88) conceived in     (99) born in
        (89) liberty and     (100) liberty and
        (90) dedicated to    (101) dedicated to

As you can see, a number of edits have been made in this one paragraph of Lincoln's famous speech. Do you want each edit reported as a separate difference, or would you prefer to see the differences in this paragraph as one connected change? Pre-5.0 releases of CMP had a fixed resync of 1 and reported each of the above changes separately, like this:

        2.92>(i.e., 87)

        1.83>our fathers
        2.94>somebody

        1.85>upon this
        2.96>on this

        1.87>new nation,
        1.88>conceived in
        2.98>mighty nation,
        2.99>born in

With resync = 1, as shown above, CMP sees that file 1 line 82 matches file 2 line 93, so it considers that difference block at an end; then, one line later, it starts a new difference block because file 1 line 83 is different from file 2 line 94.

But you may prefer that CMP not consider the files matched up again until it finds two consecutive matching lines, like lines 89-90 and 100-101 in the above example -- in other words, longer difference blocks but fewer of them. With resync set to 2, CMP reports a single connected series of edits on the above passage:

        1.82>years ago,
        1.83>our fathers
        1.84>brought forth
        1.85>upon this
        1.86>continent a
        1.87>new nation,
        1.88>conceived in
        2.92>(i.e., 87)
        2.93>years ago,
        2.94>somebody
        2.95>brought forth
        2.96>on this
        2.97>continent a
        2.98>mighty nation,
        2.99>born in

Higher values of resync are allowed, but for the above passage higher values would have the same effect as resync = 2.

Which resync value is right? CMP sets resync = 2 by default, but which value is "right" depends on the specifics of the data. If CMP seems to be reporting a lot of separate differences for files that have clustered changes like the above, you may find the report more useful if you set a higher resync value with the /L option. But don't make resync too large: you'll slow CMP down without making the reports better. Probably you'll never want to set resync greater than about 5.

CMP must be able to find resync identical lines within the look-ahead limit. For instance, suppose you specify the /L option as /L10,2. You are telling CMP to look ahead ten lines at a time, but of those ten lines two must resynchronize. That means that, with /L10,2, any difference block longer than 10-2 = 8 lines will cause CMP to give up on those two files and move on to the next files, if any.

Example 2. Here's a clearer example of how things can go awry if you change resync to 1 using the /L,1 option. The problem is especially likely to crop up if you have sections of text separated by blank lines, because then CMP resynchronizes on the blank lines instead of actual matching text. Consider this excerpt:

             file 1                 file 2
        -----------------    -------------------
        (20) line A           (20) line A
        (21)                  (21)
        (22) line B           (22) line D
        (23)                  (23) line E
        (24) line D           (24)
        (25) line E           (25) line G
        (26)                  (26) line H
        (27) line G           (27)
        (28) line H           (28) same after this
        (29)
        (30) same after this

As you see, file 1 lines 22-23 do not exist in file 2. With the default resync = 2 and the /E option not selected, CMP will recognize this and report just one difference, file 1 lines 22-23:

        22,23d21
        < line B
        <

Here is where resync comes into play. Suppose you set resync = 1. Then CMP will see that the two lines 22 don't match, and will read ahead to file 1 line 23 and file 2 line 24, which do match; so CMP will report a difference block of file 1 line 22 and file 2 lines 22-23. But now the next paragraphs don't match: file 1 lines 24-25 are different from file 2 lines 25-26. So CMP reports another difference, which ends at the blank lines file 1 line 26 and file 2 line 27. Then there's another mismatch, and so on right down the file. CMP never does resynchronize:

        22c22,23
        < line B
        ---
        > line D
        > line E
        24,25c25,26
        < line D
        < line E
        ---
        > line G
        > line H

and so on for a very long report from just one real difference.

The problem was setting the /L,1 option but leaving the /E option turned off, so that the blank lines counted as matches. If resync had been left at 2, CMP would have resynchronized properly. Or, if the /E option were turned on, CMP would ignore blank lines, see the match between file 1 line 24 and file 2 line 22, report only file 1 line 22 as a difference, and realize that the remainder of the files was the same:

        22d21
        < line B

The bottom line: if you need to reduce resync to 1 for some reason, you probably need to turn on the /E option as well.

Look-Ahead and Memory Use

The look-ahead buffer uses your computer's memory. CMP32 can use all memory including virtual memory, but CMP16 can use only DOS memory.

The look-ahead buffer uses, in bytes, either roughly the look-ahead value in the /L option times the width in the /W option, or double that product. Why double? If you set an option that causes input lines to be massaged, CMP stores two copies of each line, one massaged for comparison and one original for display in difference reports. In that case, you can free up memory for the look-ahead buffer by setting the /M option.

You don't need to remember all this. If you exceed the available memory with the combined options, CMP will display a message suggesting you try lower values for /L or /W, or turn on the /M option if that would help.
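If you do want to estimate the buffer size yourself, the rule of thumb above is simple arithmetic. A rough C sketch; lookahead_bytes is a hypothetical name, and per-line bookkeeping overhead is ignored:

```c
#include <assert.h>
#include <stdbool.h>

/* Rough look-ahead buffer size in bytes: lookahead * width, doubled when
   lines are massaged (/B or /I) and the originals are also kept for
   display (that is, when /M is not set). */
unsigned long lookahead_bytes(unsigned long lookahead, unsigned long width,
                              bool massaged, bool opt_m)
{
    unsigned long bytes = lookahead * width;

    assert(lookahead > 0 && width > 0);
    if (massaged && !opt_m)
        bytes *= 2;   /* one massaged copy to compare, one original to display */
    return bytes;
}
```

For example, /L100 with /W100 comes to roughly 10,000 bytes, or about 20,000 with /B or /I and without /M.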

Look-Ahead and Program Run Times

In a difference block, CMP has to compare each new line from each file with all the non-matching lines from the other file. This means that the number of compares grows as the square of the number of different lines, so the program may run rather slowly on files that have very long difference blocks. For instance, if you set /L2500, you are telling CMP that whenever it finds a difference between the two files, it should look ahead as far as 2500 lines in each file to try to resynchronize. If in fact the next 2499 lines of the two files are different, CMP will be doing roughly 2499² = over 6 million comparisons on that block alone. (The number of lines in the file is not an issue, just the number of consecutive lines that are actually different.) If you have files with long runs of differing lines, you can make CMP run faster by using a smaller look-ahead value.
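Where the square comes from, assuming each look-ahead step reads one new line from each file and compares each new line once against every line already held from the other file (the extra checks needed when resync is greater than 1 are ignored):

```latex
\begin{aligned}
\text{comparisons at step } d &= (d+1) + d = 2d+1, \qquad d = 0, 1, \dots, n-1\\
\text{total} &= \sum_{d=0}^{n-1} (2d+1) = n(n-1) + n = n^2\\
n = 2499:\quad \text{total} &= 2499^2 = 6\,245\,001
\end{aligned}
```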


4.4  Reporting Difference Blocks

CMP normally reports each difference block to the screen; you can add >reportfile on the command line to send this output to a file instead. You can use the /A option to limit CMP to reporting a certain number of difference blocks. You can completely prevent CMP from reporting difference blocks with the /Q3 option; then CMP will report just one line for each pair of files, to tell whether they were the same or different.

CMP gives you significant control over how difference blocks are reported. The biggest choice is between UNIX diff format and traditional format; if you choose traditional format, there are additional options for line numbers and separators.

For either format, if you have compressed runs of white space with the /B option or chosen to ignore case with the /I option, the original lines will ordinarily be displayed. To reduce use of computer memory, use the /M option. This tells CMP to display the "massaged" lines in difference reports, and frees up extra memory for a larger look-ahead buffer.

UNIX diff format (the /U option) shows the lines without line numbers, but precedes each difference block with the numbers of the lines added, changed, or deleted, like this:

        1a2,5
        >                  SHERLOCK HOLMES
        >         THE ADVENTURE OF THE SPECKLED BAND
        >             by Sir Arthur Conan Doyle
        >
        8,10c12,14
        < the acquirement of wealth, he refused to associate
        < with any investigation which did not tend
        < towards the unusual, and even the fantastic. Of
        ---
        > the acquisition of wealth, he refused to associate
        > himself with any investigation which did not tend
        > toward the unusual, and even the fantastic. Of
        52,54d59
        <   "My dear fellow, I would not miss it for
        < anything."
        <
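The block headers follow the usual UNIX diff conventions: a for added, c for changed, d for deleted, with a single number or a first,last range on each side. A C sketch of building such a header, assuming (as the examples in this guide suggest) that the side with no lines shows the number of the line just before where the block would be; unix_header is a hypothetical name:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "s" or "s,e" for a run of n >= 1 lines starting at line s. */
static void put_range(char *buf, size_t cap, long s, long n)
{
    if (n == 1)
        snprintf(buf, cap, "%ld", s);
    else
        snprintf(buf, cap, "%ld,%ld", s, s + n - 1);
}

/* s1/n1 and s2/n2 give the first line number and line count of the block
   in file 1 and file 2. When a side has no lines (n == 0), s is the number
   of the line that would follow the block, and the header shows s - 1. */
void unix_header(char *buf, size_t cap, long s1, long n1, long s2, long n2)
{
    char r1[48], r2[48];
    char op;

    assert(n1 > 0 || n2 > 0);
    op = n1 == 0 ? 'a' : n2 == 0 ? 'd' : 'c';
    if (n1 == 0)
        snprintf(r1, sizeof r1, "%ld", s1 - 1);
    else
        put_range(r1, sizeof r1, s1, n1);
    if (n2 == 0)
        snprintf(r2, sizeof r2, "%ld", s2 - 1);
    else
        put_range(r2, sizeof r2, s2, n2);
    snprintf(buf, cap, "%s%c%s", r1, op, r2);
}
```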

By contrast, the traditional CMP report form shows the differing lines from file 1 and file 2 with their line numbers, like this:

        2.2>                 SHERLOCK HOLMES
        2.3>        THE ADVENTURE OF THE SPECKLED BAND
        2.4>            by Sir Arthur Conan Doyle
        2.5>

        1.8>the acquirement of wealth, he refused to associate
        1.9>with any investigation which did not tend
        1.10>towards the unusual, and even the fantastic. Of
        2.12>the acquisition of wealth, he refused to associate
        2.13>himself with any investigation which did not tend
        2.14>toward the unusual, and even the fantastic. Of

        1.53>  "My dear fellow, I would not miss it for
        1.54>anything."
        1.55>

With the traditional report form, a block of added lines is shown by leading 2s with no leading 1s, changed lines have some of each, and deleted lines have leading 1s with no corresponding leading 2s.

You can customize the traditional report form in several ways:


4.5  Summary Report

For each pair of files, CMP will normally report the number of lines in each file and the number of difference blocks found:

        ** Time: 0.2 s    Lines in file 1: 120   file 2: 125
        ** The files are significantly different.  Blocks reported: 8

If the files compare the same, you will see a message like this one:

        ** Time: 0.2 s    Lines in file 1: 120   file 2: 120
        ** The files are identical.

If you have the /B option, /E option, or /I option set, you have indicated that some actual differences are not significant. In this case, if the files compare equal the message will say

        ** Time: 0.2 s    Lines in file 1: 120   file 2: 124
        ** The files are effectively identical for the options chosen.

Note that the files compare equal even though they have different numbers of lines. This can happen when empty lines are suppressed with the /E option.

Finally, if some lines have been truncated according to the /W option, the message will say "effectively identical within the /W width".

Though CMP normally reports the number of lines in each file, the /A option or the /Q2 or /Q3 option tells CMP not to display that line of the summary report.

Final Truncation Warning

CMP reports truncation as it reads individual lines, but does not summarize truncation for each pair of files. However, because the individual truncation messages may be overlooked or suppressed, CMP also reports a final truncation message at the very end:
 
      cmp warning: lines were truncated -- use /W265 for complete comparison
 
You can see that CMP tells you the longest line it read in any file. If you want to re-run the comparison and have each file compared to the very end of each line, use the suggested value for the /W option. (Even this message is suppressed by the /Q3 option.)


 

5. Options


CMP's operation can be modified by quite a number of options, either on the command line or in an environment variable (described later in this user guide).

Because there are a great many options, they are presented below both by category and alphabetically. Here is a quick index of all the options:

/0   /1   /?   /A   /B   /D   /E   /F   /I   /L   /M   /N   /Q   /R   /S   /U   /W   /Z


5.1  How to Specify Options

You have a lot of freedom in how you enter options:

For instance, the following are just some of the different ways of turning on the /W100 and /B options:
 
      /w100 /b    /w100-b    /w100/b    /w100B    -W100-B    -W100 /b


5.2  Options by Category

(Some options are listed in multiple categories to make them easier to find.)

These options affect file input:

These options affect the comparison process:

These options affect output:

These are general program options:


5.3  Alphabetical List of Options

/?
Display a help message and option summary, then exit with no further processing. You can redirect or pipe this information. For instance, you can display the help text one screen at a time by typing
        cmp /? | more
or print the help text with the command
        cmp /? >prn
/0 and /1
These options let you control the values that CMP returns to DOS.
 
/0   Return 0 in ERRORLEVEL if there are any differences in any files, or 1 if every pair of files compares equal.
 
/1   Return 1 in ERRORLEVEL if there are any differences in any files, or 0 if every pair of files compares equal.
 
neither   Return 0 in ERRORLEVEL.

Regardless of these options, CMP will return a higher value in ERRORLEVEL for premature termination. For more details, see Return Values later in this user guide.
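A small C model of the /0 and /1 rules for a normal run; errorlevel is a hypothetical name, and premature termination (which returns a higher value) is not modeled:

```c
#include <assert.h>

/* opt is '0', '1', or 0 when neither option is given;
   any_differences is nonzero if any pair of files differed. */
int errorlevel(char opt, int any_differences)
{
    assert(opt == '0' || opt == '1' || opt == 0);
    if (opt == '1')
        return any_differences ? 1 : 0;
    if (opt == '0')
        return any_differences ? 0 : 1;
    return 0;   /* neither option: always 0 */
}
```

In a batch file, keep in mind that IF ERRORLEVEL n is true when the returned value is n or higher, so with /1 a test for ERRORLEVEL 1 also catches premature termination.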
 
/An
Stop comparing after reporting n difference blocks. If you expect some files to have lots of differences, you can use this option to limit the output and make CMP run faster.
 
The default is to read every file to the end and report all difference blocks; that's equivalent to /A0. If you just want to know whether files are the same or different without seeing the actual differences, see the /Q3 option rather than the /A option.
 
The parameter n limits the number of difference blocks reported, not the number of different lines. And it applies to each pair of files. Example:
        cmp code\*.cpp \bkup /A4
compares all .CPP files in the CODE subdirectory to files of the same names in the BKUP root directory. No more than 4 difference blocks between any one pair of files will be reported. This would be a good choice when you think most files are the same or nearly the same, but a few have lots of differences.
 
Dependencies: When the /Q3 option is set, the /A option is ignored and /A1 is implied.
 
/B
Compress all runs of blanks and/or tabs in text files to a single blank, for purposes of comparison and display. With the /B option, CMP considers "a    b", "a b", "a{tab}b", and "a  {tab} b" identical.
 
Runs of spaces and/or tabs are compressed to a single space, not completely removed. Thus CMP will always consider "ab" (with no space between "a" and "b") different from "a b" (any spaces or tabs between "a" and "b").
 
Regardless of this option, CMP will always ignore spaces and tabs at the ends of lines in text files. Some more details are given above in "Overview: The Input Stage".
 
Dependencies: The /B option is ignored when the /R option (binary files) is set.
 
/Dfile   or   /D   or   /D-
Display debugging information. This includes whether you're running CMP16 or CMP32, whether this program is registered, the contents of the environment variable, the values of all options specified or implied, the files specified, and details of every file scanned. This information is normally suppressed, but you may find it helpful if CMP seems to behave in a way you don't expect.
 
Since the debugging information can be voluminous, if you want to see it at all you will usually want to specify an output file. The file must follow the D with no intervening space, and the filename ends at the next space. CMP will append to the file if it already exists.
 
A plain /D sends debugging information to the standard error output (normally the screen). Be careful not to specify any other options between /D and the next space, or they'll be taken as a filename. Finally, /D- sends debugging information to the standard output, which you can redirect (>) or pipe (|). This intersperses debug information with the actual output of CMP.
 
You can weed through the debugging output to some extent. CMP writes the following unique strings on most lines of output, so you can send debug output to a file and then grep the file for  
/E
Ignore any empty lines, or lines that contain only blanks and tabs. Without the /E option, CMP will keep track of blank lines and report added or deleted blank lines as differences.
 
The /E option can make CMP do a much better job on some text files, because it keeps CMP from resynchronizing on a blank line. Please see Example 2 in the overview.
 
Dependencies: The /E option is ignored when the /R option (binary files) is set.
 
/Fn
Format line numbers in a field of n columns when reporting difference blocks in traditional format.
 
The /F option lets you ensure that reported differences all line up. (You might wonder why CMP doesn't just figure the necessary width on its own. To do that, CMP would have to read each file an extra time, just to count lines. That would slow the program down significantly.)
 
n is a minimum field width. If you specify /F4, line numbers for any differences in lines 1 through 9999 will be right justified in a four-character field. Any larger line numbers will take additional positions to the right, like this:
        1.  98>text1a
        2.  99>text1b
        2. 100>text1c

        1.2398>text2a
        2.2399>text2b

        1.23468>text3a
        1.23469>text3b
        2.23469>text3c
If you prefer to left justify line numbers in a field of stated width, put a minus sign before n. For instance, the output under the /F-4 option would line up like the above, but spaces would appear after the short line numbers instead of before, like this:
        1.98  >text1a
        2.99  >text1b
        2.100 >text1c

        1.2398>text2a
        2.2399>text2b

        1.23468>text3a
        1.23469>text3b
        2.23469>text3c
The default is the same as /F0, which displays each line number with no padding, as shown in the sample difference report.
 
Dependencies: The /F option is ignored when either the /U option (UNIX diff-style output) or the /Q3 option (don't display difference blocks) is set.
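The /F rules line up with C's printf field width, where a negative width means left justification and a width of 0 means no padding, so a traditional report line can be sketched in C as below. trad_line is a hypothetical name, and the sep argument stands for the /N string:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one traditional-format report line such as "2.  99>text1b".
   file is 1 or 2; fwidth is the /F value (negative = left justify,
   0 = no padding); sep is the separator (default ">", at most six chars). */
int trad_line(char *buf, size_t cap, int file, long lineno, int fwidth,
              const char *sep, const char *text)
{
    assert((file == 1 || file == 2) && strlen(sep) <= 6);
    /* A negative '*' width is taken as the '-' flag plus a positive width. */
    return snprintf(buf, cap, "%d.%*ld%s%s", file, fwidth, lineno, sep, text);
}
```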
 
/I
Ignore case; treat letters A-Z the same as a-z for comparison.
 
Because of limitations in the MSVC library, the /I option affects only the English letters A through Z. Non-English lower-case letters are always considered different from the corresponding upper-case letters.
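The effect of /I amounts to folding only A through Z before lines are compared. The following Python sketch shows that behavior. same_ignoring_case is a hypothetical name, and the sketch does not reproduce CMP's own implementation.

```python
import string

# Translation table that lowercases only the English letters A-Z.
ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def same_ignoring_case(line1, line2):
    """Compare two lines as /I is described: A-Z match a-z, but any
    other letters (such as accented ones) must match exactly."""
    return line1.translate(ASCII_FOLD) == line2.translate(ASCII_FOLD)
```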
 
/Llookahead,resync   or   /Llookahead   or   /L,resync
When the files are different, CMP will look ahead as many as lookahead lines in each file to find where the files become the same again, and will consider that the files are the same again only when resync lines from the two files are the same. (If the /E option is set, empty lines will not count against either resync or look-ahead.) Please see the explanation and examples under Overview: Difference Blocks and Look-Ahead.
 
The default is /L20,2 in CMP16 and /L100,2 in CMP32. As the option forms above show, you can specify either resync or lookahead without changing the other.
 
resync can be 1 or greater; lookahead must be at least 2 greater than resync. lookahead may not exceed 32000, and available memory or other factors may limit it further.
 
Even if CMP and available memory will let you set lookahead as large as 32000, values greater than a few hundred are not recommended. Please see the note on run times in the Overview.
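As a rough picture of how lookahead and resync interact, the Python sketch below checks the limits stated above. It then searches for the first place where resync consecutive lines match again. This is only an assumption-laden sketch and not CMP's algorithm. The names check_lookahead and find_resync are hypothetical. The search order (shortest combined offset first) is an assumption, and the sketch does not model /E.

```python
def check_lookahead(lookahead, resync):
    """Apply the documented limits on /Llookahead,resync."""
    if resync < 1:
        raise ValueError("resync must be 1 or greater")
    if lookahead < resync + 2:
        raise ValueError("lookahead must be at least 2 greater than resync")
    if lookahead > 32000:
        raise ValueError("lookahead may not exceed 32000")


def find_resync(a, b, lookahead, resync):
    """Find where two diverging line sequences become the same again.

    a and b hold each file's lines starting at the first difference.
    Returns (i, j) with a[i:i+resync] == b[j:j+resync], where i and j are
    each less than lookahead, or None if the files do not resynchronize.
    Candidates are tried in order of increasing i + j (an assumption).
    """
    check_lookahead(lookahead, resync)
    limit_a = min(lookahead, len(a))
    limit_b = min(lookahead, len(b))
    for total in range(limit_a + limit_b - 1):
        # Every (i, j) pair with i + j == total inside both limits.
        for i in range(max(0, total - limit_b + 1), min(total, limit_a - 1) + 1):
            j = total - i
            window = a[i:i + resync]
            if len(window) == resync and window == b[j:j + resync]:
                return i, j
    return None
```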
 
/M
Display lines as massaged according to the /B option or the /I option, not as they appear in the files.
 
CMP normally retains copies of the original lines from each file for display in reporting difference blocks. But this roughly doubles the computer memory needed for the look-ahead buffer. If you're willing to see approximate versions of the original lines in the difference reports, set the /M option to increase the space available for look-ahead.
 
Dependencies: The /M option has effect only if you have turned on the /B option, the /I option, or both.
 
/Nstr
Separate line numbers from lines by str instead of the default > character when reporting difference blocks in traditional form.
 
You can specify a string of up to six characters; the string is terminated by the next space or tab. Don't use quotes with this option unless you want them in the output.
 
If you want certain characters like =, |, <, or space in your separator, you can't simply type them because DOS gives them special meanings. Use special "numeric escape sequences" to represent those characters in the /N option. For example, to make your output look like this:
        1. 98 : text1a
        2. 99 : text1b
        2.100 : text1c

        1.398 : text2a
        2.399 : text2b
use the sequence \32 to represent the space character, like this:
        cmp /N\32:\32 /F3 file1 file2
The numeric escape sequences are a backslash (\) followed by the numeric value of the character, up to three decimal digits. A leading 0 denotes octal; a leading 0x or 0X denotes hexadecimal. Here are some sample sequences:
 
        instead of           use any of
        (space)              \32   \0x20  \040
        (tab)                \9    \0x09  \011
        < (less)             \60   \0x3C  \074
        = (equal)            \61   \0x3D  \075
        > (greater)          \62   \0x3E  \076
        | (vertical bar)     \124  \0x7C  \0174
        " (double quote)     \34   \0x22  \042
 
The above are only examples: you can enter any character as a numeric sequence. For example, capital A would be \65, \0x41, or \0101.
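The Python sketch below decodes numeric escape sequences and can help you check what a /N argument will produce. It reflects one reading of the rules above, so treat the digit limits as assumptions. It reads up to three decimal digits, up to three octal digits after a leading 0, and up to two hex digits after 0x. decode_separator is a hypothetical name, and this is not CMP's parser.

```python
import string

def decode_separator(arg):
    r"""Expand numeric escape sequences in a /N separator string.

    Assumed rules: \ddd is decimal (up to 3 digits), \0ooo is octal
    (up to 3 digits after the 0), and \0xhh or \0Xhh is hexadecimal
    (up to 2 digits). A backslash not followed by a digit is kept as typed.
    """
    out = []
    i = 0
    while i < len(arg):
        if arg[i] != "\\" or i + 1 >= len(arg) or arg[i + 1] not in string.digits:
            out.append(arg[i])
            i += 1
            continue
        i += 1  # step past the backslash
        if arg.startswith(("0x", "0X"), i):
            base, allowed, max_digits = 16, string.hexdigits, 2
            i += 2
        elif arg[i] == "0" and i + 1 < len(arg) and arg[i + 1] in string.octdigits:
            base, allowed, max_digits = 8, string.octdigits, 3
            i += 1
        else:
            base, allowed, max_digits = 10, string.digits, 3
        start = i
        while i < len(arg) and i - start < max_digits and arg[i] in allowed:
            i += 1
        if i == start:
            raise ValueError(f"escape sequence without digits in {arg!r}")
        value = int(arg[start:i], base)
        if value > 255:
            raise ValueError(f"escape value {value} is not a single byte")
        out.append(chr(value))
    return "".join(out)
```

For example, decode_separator(r"\32:\32") returns " : ", the separator used in the /N example above.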
 
Dependencies: The /N option is ignored when either the /U option (UNIX diff-style output) or the /Q3 option (don't display difference blocks) is set.
 
/Qlevel       (registered program only)
Set the quietness level, to suppress some output that you may not want. Please see the Overview for discussion of the normal output from CMP.
 
/Q0   (default) Display all normal messages and warnings.
 
/Q1 Suppress the program logo, any warning messages about individual truncated lines, and the final display of line counts for the two files. If any lines were truncated, a single message will still appear at the end of processing.
 
/Q2 Suppress the items mentioned for /Q1 plus the blank lines between difference blocks. Also, send the headers (file names) and footers (count of difference blocks or message that files are equal) to stderr (the error output, normally your screen) rather than stdout (standard output, which can be redirected with > or piped with |).
 
This lets you redirect the output of CMP and get only the difference lines from the two files. You still get line numbers, but by using the /F option you can force them to a fixed format that is easily stripped away. Example:
            cmp /Q2 /F6 file1 file2 >report
will send just the different lines to the file called REPORT, suppressing all non-essential messages. Essential messages will appear on your screen because they are written to stderr and are not redirected. Assuming each file has fewer than a million lines, each line redirected to the REPORT file will have a 9-character prefix: file number (1 or 2), a period, a six-digit line number field, and the separator character >.
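Because the prefix has a fixed width, a later program can take it apart without guessing. The Python sketch below assumes exactly the layout just described: file number, period, six-column line number, and the default > separator. split_report_line is a hypothetical name.

```python
def split_report_line(line):
    """Split one line of a 'cmp /Q2 /F6' report into its parts.

    Assumes the 9-character prefix: file number, a period, a six-column
    line number field, and the default > separator.
    Returns (file_number, line_number, text).
    """
    if len(line) < 9 or line[1] != "." or line[8] != ">":
        raise ValueError(f"unexpected report line: {line!r}")
    return int(line[0]), int(line[2:8]), line[9:]
```

To recover only the text, read REPORT line by line, strip the trailing newline, and keep the third element of each result.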
 
/Q3 Suppress the program logo and all output, even the summary truncation warning and warnings about questionable use of options. Error messages and warnings about missing files will still be displayed.
 
For each pair of files compared, CMP will display just one line of output consisting of the file names and the comparison status, "identical", "identical/massaged" (if the /B option, /E option, or /I option was set), "identical/truncated" (if lines were truncated because of the /W option width setting), or "different".
 
This is handy when you have two sets of files to compare and don't care about the actual differences, only which files are different between the two sets.
 
/Q without a following number is normally the same as /Q1. The old /QQ option still works and is about the same as /Q2. For historical reasons, a plain /Q after any previous /Q option will reset the quietness level to 0.
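The rules for a plain /Q are easy to misread, so here they are as a small Python sketch. apply_q_option is a hypothetical name. The sketch treats /QQ as exactly /Q2, although the text above says only that the two are "about the same".

```python
def apply_q_option(suffix, seen_q):
    """Return (new_level, seen_q) after one /Q option.

    suffix is what follows the Q: "" for a plain /Q, "Q" for the old /QQ,
    or a single digit 0-3. seen_q tells whether an earlier /Q was given.
    """
    if suffix == "":
        # Plain /Q means /Q1, except after an earlier /Q it resets to 0.
        new_level = 0 if seen_q else 1
    elif suffix.upper() == "Q":
        new_level = 2  # treated here as exactly /Q2
    elif suffix in ("0", "1", "2", "3"):
        new_level = int(suffix)
    else:
        raise ValueError(f"bad option /Q{suffix}")
    return new_level, True
```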
 
Dependencies: When /Q2 or /Q3 is set, the /A option and the /U option are ignored.
 
/R
Compare files as binary. This is useful for non-text files such as word-processing files, spreadsheets, databases, and executable programs.
 
A text file has lines ending with carriage return (ASCII 13), line feed (ASCII 10), or both; and the first Control-Z (ASCII 26) marks the end of file. Also, a text file doesn't contain any NUL characters (ASCII 0). Binary files, on the other hand, may have NUL and Control-Z characters in the middle, and often don't have "lines" separated by anything.
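The text-file rules above can be shown with a short Python sketch. It splits raw bytes into lines, treats CR LF, a lone CR, or a lone LF as one line ending, and stops at the first Control-Z. split_text_lines is a hypothetical name, and the sketch does not model how CMP handles NUL characters.

```python
import re

def split_text_lines(data):
    """Split raw file bytes into lines using the DOS text-file rules."""
    end = data.find(b"\x1a")  # the first Control-Z ends the file
    if end != -1:
        data = data[:end]
    # CR LF must be tried before lone CR so it counts as one line ending.
    lines = re.split(rb"\r\n|\r|\n", data)
    # A terminator on the last line leaves an empty string at the end; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines
```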
 
DOS doesn't mark files as binary or text, and therefore CMP has no way to know which a given file may be. By default it reads all files as text, but if you specify the /R option then CMP will read all files as binary.
 
When CMP reads files in binary mode, there's no such thing as a line, so CMP reads files in blocks of characters. The block size is given by the /W option.

The choice of text or binary mode also affects how CMP displays lines in difference blocks. In normal text mode, any differing lines are displayed as simple strings. Non-printing characters, like tab (ASCII 9) or Control-X (ASCII 24), are given no special treatment and appear just as DOS displays them; thus screen output may appear strange if a text file contains non-printing characters. But in binary mode, non-printing characters are displayed using their numeric values in hex, such as <09> or <18>.
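The Python sketch below shows the binary-mode display style. It assumes that "printing" means ASCII 32 through 126 and that hex digits are shown in upper case. CMP's exact choices may differ, and show_binary_block is a hypothetical name.

```python
def show_binary_block(block):
    """Render a block of bytes, showing non-printing bytes as <hh> in hex."""
    parts = []
    for byte in block:
        if 32 <= byte <= 126:  # assumed printable range
            parts.append(chr(byte))
        else:
            parts.append(f"<{byte:02X}>")
    return "".join(parts)
```

For example, a tab followed by Control-X appears as <09><18>.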
 
Dependencies: When the /R option is set, the /B option and the /E option are ignored.
 

/S
After comparing the indicated files, work down the subdirectory trees to compare matching files in subdirectories, including directories marked hidden or system.
 
The /S option is most useful with wild cards. Consider this example:
        cmp /s *.htm d:\new
Here CMP will compare all .HTM files in the current directory to files with the same names in directory D:\NEW. Then CMP will work its way down all subdirectories below the current directory, and whenever it finds a corresponding file in a corresponding subdirectory under D:\NEW it will compare them.
 
The first set of files need not be in the current directory. For example, suppose that you made a backup a couple of days ago and since then have edited a lot of files, and you now want to list all the changes you made. If the backup is rooted at directory JANBKUP on drive E, and the current files are rooted at directory WORKING on drive C, you could use this command:
        cmp /s e:\janbkup\*.h *.cpp c:\working
Wherever there's a .H or .CPP file in E:\JANBKUP or a subdirectory, such as E:\JANBKUP\WESTREGN, CMP will try to compare it to a file of the same name in the corresponding subdirectory (C:\WORKING\WESTREGN). Please see Multiple Compare and Missing Files for details of how CMP will diagnose missing files.
 
The /S option is active in both the registered and the evaluation version of CMP. But in the evaluation version, CMP will search only two levels, the initial level and one level of subdirectories below that.
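The way /S pairs files can be pictured with Python's os.walk. The sketch below walks the first tree. It pairs each file that matches a wild card with the file at the same relative path in the second tree, or with None when that file is missing. matching_pairs is a hypothetical name. The sketch ignores the evaluation version's two-level limit and matches names without regard to case, as DOS does.

```python
import fnmatch
import os

def matching_pairs(root1, patterns, root2):
    """Yield (path1, path2 or None) for files under root1 matching patterns."""
    lowered = [p.lower() for p in patterns]
    for dirpath, dirnames, filenames in os.walk(root1):
        dirnames.sort()  # visit subdirectories in a predictable order
        relative = os.path.relpath(dirpath, root1)
        for name in sorted(filenames):
            if any(fnmatch.fnmatchcase(name.lower(), p) for p in lowered):
                path2 = os.path.normpath(os.path.join(root2, relative, name))
                yield (os.path.join(dirpath, name),
                       path2 if os.path.isfile(path2) else None)
```

For the backup example, matching_pairs(r"e:\janbkup", ["*.h", "*.cpp"], r"c:\working") would list each pair CMP is asked to compare.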
 
/U
Display UNIX-style output, putting line numbers above each difference block with a letter for added, changed, or deleted. Traditional CMP output displays the line number with each line. Please see Overview: Reporting Difference Blocks for sample outputs.
 
The freeware Vim editor will color-code UNIX-style difference reports, if your terminal can display colors.
 
Dependencies: When the /Q2 or /Q3 option is set, the /U option is ignored. When the /U option is set, the /F option and the /N option are ignored.
 
/Wwidth
Compare lines only up to width characters or in width-character blocks. The default width is 254.
 
width can be 2 to 32764 in CMP16 and 2 to 2147483644 in CMP32. But your computer probably doesn't have enough memory for lines that wide; see Look-Ahead and Memory Use in the Overview.
 
Comparing text files
 
CMP will examine each line only up to the specified width, and will display a warning message for any lines that exceed it. You can suppress these warnings by using the /Q1, /Q2, or /Q3 option.
 
In addition to the warning for each line, if any lines were truncated then CMP will display a single warning at the end of execution giving the length of the longest line that was read from any file. Then you know the exact value to use with /W if you want to run CMP again and have it compare all lines to the end.
 
If you want to predict the needed width for a given file, simply compare the file to itself with a small width value and the /Q2 option to suppress messages, like this:
        cmp /Q2W10 file1 file1
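The truncation rule and the end-of-run summary can be sketched in Python as follows. truncate_lines is a hypothetical name, and the sketch only illustrates the behavior described above. It cuts each line to width characters and records which lines were too long. It also returns the longest length seen, which is the smallest /W value that compares every line in full.

```python
def truncate_lines(lines, width=254):
    """Cut lines to width characters for comparison.

    Returns (kept_lines, numbers_of_truncated_lines, longest_length).
    Line numbers are 1-based.
    """
    kept = []
    truncated = []
    longest = 0
    for number, line in enumerate(lines, start=1):
        longest = max(longest, len(line))
        if len(line) > width:
            truncated.append(number)
        kept.append(line[:width])
    return kept, truncated, longest
```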
Comparing binary files
 
CMP will read the files in chunks of width bytes and compare them. There is no question of truncation.
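Reading in fixed-size chunks can be sketched in Python like this. The names read_blocks and differing_blocks are hypothetical. The sketch is a simplification: it compares blocks position by position and omits CMP's look-ahead. When one file is longer, its extra blocks count as differences.

```python
from itertools import zip_longest

def read_blocks(path, width=254):
    """Yield successive blocks of width bytes from a file opened as binary."""
    with open(path, "rb") as f:
        while True:
            block = f.read(width)
            if not block:
                break
            yield block

def differing_blocks(path1, path2, width=254):
    """Return the 1-based numbers of blocks that differ between two files."""
    pairs = zip_longest(read_blocks(path1, width), read_blocks(path2, width),
                        fillvalue=b"")
    return [number for number, (block1, block2) in enumerate(pairs, start=1)
            if block1 != block2]
```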
 
/Z
Reset all options to their default values.

If you use the /Z option on the command line, any options in the environment variable will be disregarded, and so will any preceding options on the command line. This can be useful in batch files, to make sure that the action of CMP is controlled only by the options on the command line, and not by any settings in the environment variable.
 
The /Z option is the only one whose effect can't be reversed. If you use /Z more than once, CMP disregards the environment variable and all command-line options up through the last /Z.


5.4  Environment Variable

If you use certain options frequently, with the registered version of CMP you can put them in the ORS_CMP environment variable. You have the same freedom as on the command line: leading slashes or hyphens, space separation or options run together, caps or lower case.

CMP processes the environment variable before any command-line options, which means that an option on the command line will override the corresponding option in the environment variable.

The toggles, /B /D /E /I /M /R /S /U, reverse their state every time you specify them. So if you usually want case-blind comparisons, put /I in the environment variable. Then, if you want case-sensitive comparisons for a particular run, simply put /I on the command line and that will reverse the setting from the environment variable. To alter the settings of other options, like /L and /F, simply put the option on the command line with the new desired setting.

You may want to specify options without regard to what might be in the environment variable -- when running CMP in a batch file, for instance. To ensure this, put the /Z option first on the command line.
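The order of processing can be modeled in a few lines of Python. Options from ORS_CMP come first, each toggle reverses its state every time it appears, and /Z throws away everything seen before it. The sketch covers only the toggles. Options that take values, such as /L and /F, are not modeled. effective_toggles is a hypothetical name, and options are given as bare letters.

```python
TOGGLE_DEFAULTS = {letter: False for letter in "BDEIMRSU"}

def effective_toggles(env_options, command_options):
    """Work out toggle settings from ORS_CMP and the command line.

    Both arguments are lists of option letters, e.g. ["I", "S"].
    """
    state = dict(TOGGLE_DEFAULTS)
    for letter in list(env_options) + list(command_options):
        letter = letter.upper()
        if letter == "Z":
            state = dict(TOGGLE_DEFAULTS)  # /Z discards everything before it
        elif letter in state:
            state[letter] = not state[letter]
    return state
```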

If you have any question about which options are in effect, simply use /D on the command line to display all option values.


 

6. Return Values (ERRORLEVEL)


By default, CMP will return one of the following values to DOS, and you can test the return value with IF ERRORLEVEL in a batch file.

0   program ran to completion (whether the files are the same or different)
2   help message displayed (/? option, or no files specified on the command line)
253   not enough memory for look-ahead or other program requirements
254   specified file not available in single file compare
255   bad option, or other error on the command line
 

You might want to use CMP in a batch file or a makefile and take different actions depending on whether two files are the same or different. To do this, use the /0 or /1 option. The /1 option emulates UNIX diff by returning an error level of 1 if the files are different or 0 if they're the same. /0 is the opposite: it returns 0 if the files are different or 1 if they're the same. In other words, the /0 or /1 option gives the value CMP should return if differences are found.

When comparing multiple files, the /1 option tells CMP to return an error level of 1 if any files compare as different, or 0 if all files compare as identical. The /0 option returns 0 if any files compare different, or 1 if all files compare identical.
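The rule for a normal, completed run can be summed up in a short Python sketch. cmp_exit_code is a hypothetical name. The sketch covers only the /0 and /1 logic, not the error values 2, 253, 254, and 255 listed above.

```python
def cmp_exit_code(any_different, differ_code=None):
    """Return value for a run that completed normally.

    differ_code is None (neither /0 nor /1), 0 (/0) or 1 (/1): the value
    to return if differences were found. The other value is returned
    when all files compare identical.
    """
    if differ_code is None:
        return 0  # default: 0 whether the files are the same or different
    return differ_code if any_different else 1 - differ_code
```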


 

7. What's New?


The latest release, 5.0, is a complete rewrite of the program. This section lists only highlights of the changes. The complete revision history is available as a separate document. It includes an important Transitional Note for users who are upgrading from CMP 4.x.

The present release, CMP 4.9, 2001-02-08, is a beta release of 5.0. The only known bug is that under certain circumstances CMP does not resynchronize soon enough. After some difference blocks, you may see one or two identical lines from both files reported. I am of course working to fix that problem.

New features and enhancements in 5.0 include:

A small number of features in previous releases are different in 5.0: