
XXCOPY TECHNICAL BULLETIN #05



From:    Kan Yabumoto           tech@xxcopy.com
To:      XXCOPY user
Subject: The Exclusion specifier in XXCOPY
Date:    2000-12-01  (revised)
====================================================================

Much of the mostly hidden power of XXCOPY lies in the exclusion
mechanism.  We identified the /X switch to be one of the most
important enhancements we made in XXCOPY.  Because it is a
complex scheme with many implied rules, one cannot effectively
use the full potential of the exclusion feature without a detailed
explanation of the full scope of the syntax as well as the way
the exclusion scheme is implemented.  This article will discuss
all the rules applied to the exclusion feature.


XXCOPY Exclusion switch syntax

  /X<xspec>       excludes the file or directory item given by
                  <xspec> which is an exclusion specifier.
                  If the specifier contains an embedded space,
                  the specifier must be surrounded by a pair
                  of double-quotes (").

  /EX<xfile>      specifies a text file whose name is <xfile>
                  which contains a list of <xspec> separated by space.

  /ZX             ignores the environment variable, "XXCOPYX".

  XXCOPYX         The environment variable XXCOPYX specifies a
  (env var)       list of <xspec> which are separated by a space.

  XXCOPY          The environment variable XXCOPY  specifies a
  (env var)       list of XXCOPY switches which may be /X<xspec>.

  Note that the two environment variables differ as follows: every
  item in the XXCOPY value must be a complete switch, that is, a
  slash (/) followed by the switch itself (any XXCOPY switch is
  allowed), whereas the XXCOPYX value is strictly a list of
  exclusion specifiers for the /X switch, written without the /X
  prefix in order to save space.
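
  As an illustration only, the relationship between the two variables
  can be sketched in Python.  The function name and the sample values
  are hypothetical, and the sketch assumes that items are split on
  plain whitespace (no quoted items) and that XXCOPY items come first;
  the actual order in which XXCOPY combines them is not stated here.

```python
def switches_from_environment(env, ignore_xxcopyx=False):
    """Build the switch list implied by the XXCOPY and XXCOPYX values."""
    # Every item in XXCOPY is already a complete switch such as "/S".
    switches = env.get("XXCOPY", "").split()
    if not ignore_xxcopyx:                  # /ZX ignores XXCOPYX entirely
        # XXCOPYX holds bare specifiers, so each one gains the "/X" prefix.
        switches += ["/X" + spec for spec in env.get("XXCOPYX", "").split()]
    return switches


# Hypothetical environment values, for illustration only.
env = {"XXCOPY": "/S /X*.bak", "XXCOPYX": "*.tmp cache\\"}
print(switches_from_environment(env))
print(switches_from_environment(env, ignore_xxcopyx=True))
```

  The second call models the effect of /ZX: only the switches from
  the XXCOPY variable remain.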

  You may specify as many exclusion specifiers as you like.


Some examples of the /X switches

  /Xc:\mydir\myfile.txt   // specifies just a single file
  /X*.tmp                 // all files that end with ".tmp"
  /Xabc*                  // all files that start with "abc"
  /Xmydir\                // the entire directory, "mydir" in the source
  /Xmydir\*\*             // same as /Xmydir\ which is a shortcut
  /Xmydir\*\*.tmp         // inside mydir, all files matching "*.tmp"
  /Xmy*xyz\*\abc*.c       // under dirs matching "my*xyz", files matching "abc*.c"
  /X*\cache\              // multiple-level subdirectories
  /X*\cache\*\*           // same as above, written without the shortcut
  /X*\cach?\*\*           // multiple-level subdir spec may have wildcards

  Here, you may catch a glimpse of the powerful syntax of the exclusion
  specifier.  The first example is the most straightforward.  The
  fourth example, which ends with a backslash, is a shorthand for the
  common case of excluding a directory (it omits the "*\*" which would
  otherwise follow).  Therefore, all of the above examples except the
  first one contain or imply at least one wildcard specifier.  The
  seventh example (/Xmy*xyz\*\abc*.c) includes one asterisk in each of
  the three parts.

  Don't worry about the complexity yet.  The first example shows a
  case which you can use immediately without any further reading.
  Yes, if you have the energy, you may painstakingly exclude files
  one by one by giving the full file specification of each file.
  Since you will soon run out of command line space, you will
  probably want to set up a text file and use the /EX switch.


  E.g.,  /EXmyexcl.lst

   and myexcl.lst  contains the following specifiers:

     :: this is a comment line
     c:\win386.swp               :: comment may start like this
     c:\autoexec.bat  myfile.tmp :: one line may have multiple items
     "c:\program files"          :: use quotes (") for embedded space
     mydir\myfile.txt            :: pathspec relative to the source dir
     yourdir\                    :: entire yourdir\*\*


     Syntax rule for the Exclusion List File.

         An "Exclusion List File" specified in the /EX switch is a plain
         text file which contains a list of exclusion specifiers.
         You may list as many exclusion specifiers as you like in one
         line.  Exclusion specifiers are separated by one or more blank,
         tab, and/or newline characters.  An exclusion specifier cannot
         be broken into two or more lines.  When a space character is
         embedded, the exclusion specifier must be surrounded by a
         pair of double-quotes (").  A line may contain a comment field
         which will be ignored by XXCOPY.  A comment field starts with
         two consecutive colons (::) and ends at the end of the line.
         We suggest placing each exclusion specifier on its own line,
         followed by a comment.
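
         The rules above can be sketched as a small Python parser.
         This is not XXCOPY's own code; the function name is
         hypothetical, and the sketch assumes that "::" never occurs
         inside a specifier and that quotes are never nested.

```python
def parse_exclusion_list(text):
    """Return the exclusion specifiers found in the text of a list file."""
    specs = []
    for line in text.splitlines():
        # A comment starts with "::" and runs to the end of the line.
        # Assumption: "::" never occurs inside a specifier itself.
        line = line.split("::", 1)[0]
        token, in_quotes = "", False
        for ch in line:
            if ch == '"':
                in_quotes = not in_quotes   # quotes group text and are dropped
            elif ch in " \t" and not in_quotes:
                if token:                   # a blank or tab ends a specifier
                    specs.append(token)
                token = ""
            else:
                token += ch
        if token:                           # a specifier never spans two lines
            specs.append(token)
    return specs


sample = r'''
:: this is a comment line
c:\win386.swp               :: comment may start like this
c:\autoexec.bat  myfile.tmp :: one line may have multiple items
"c:\program files"          :: use quotes (") for embedded space
yourdir\                    :: entire yourdir\*\*
'''
for spec in parse_exclusion_list(sample):
    print(spec)
```

         Each printed line is one specifier; the quoted item keeps its
         embedded space and loses its quotes.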


Definition of the exclusion specifier.

    Up to now, the exact meaning of the exclusion specifier has not
    been defined.  Now, we are going to analyze the syntax and its
    meaning in exhaustive detail.  (Note: the exclusion specifier was
    revised on 2000-10-09 with the addition of the multiple-level
    subdirectory exclusion feature.)


    The exclusion specifier has up to three parts.

        [ dir_spec\ ] [ *\ ]  [ template ]

        Although any of the three parts can be omitted, you must not skip
        both dir_spec and template at the same time.  Note that the last
        part (template) can be either a file-template or a directory-
        template which will be explained below with more details.


     Directory specifier ( dir_spec )

        The dir_spec part specifies the base directory of the exclusion.
        It is always followed by a backslash (\) character.
        The directory can be specified in an absolute path (starting with
        the root directory), or a relative path (without a leading
        backslash) which is treated as relative to the source directory
        (not the "current" directory).

        The dir_spec may contain a wildcard specification in its
        last part. For example

           /Xc:\mydir\level1\abc*\*\template
           /Xc:\mydir\level1\a*bc*.?oc\*\template

        In both of the examples here, the last part of the directory
        specifier (after \level1\) has asterisk(s) in it.  The second
        example goes one step further by using multiple asterisks
        and even a question mark, which is the wildcard for a single
        character.

    The middle part (*\)

        It denotes that the exclusion specification will be applied
        not only to the dir_spec directory, but also to all of the
        subdirectories underneath.  It is the equivalent of the familiar
        /S switch, which modifies the source specifier so that the
        XXCOPY action includes all subdirectories.  Since we do not have
        the luxury of a separate /S switch for each exclusion item, we
        invented this notation, which figuratively suggests that the
        path starts with dir_spec, ends with the template, and anything
        in between is accepted.

        The following two examples highlight the effect of the middle part.

          /Xmydir\myfile.*      // myfile.* in mydir\ only
          /Xmydir\*\myfile.*    // myfile.* in mydir\ and every directory under it
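
        The effect of the middle part can be sketched in Python as
        follows.  This is only an approximation under stated
        assumptions: the function name is hypothetical, paths are
        relative to the source directory, the dir_spec contains no
        wildcards, and Python's fnmatch rules stand in for XXCOPY's
        own wildcard matching (they are not identical).

```python
from fnmatch import fnmatchcase


def is_excluded(path, spec):
    """Test a source-relative file path against a file-template specifier."""
    path, spec = path.lower(), spec.lower()     # Windows names ignore case
    directory, _, name = path.rpartition("\\")
    spec_dir, _, template = spec.rpartition("\\")
    # The middle part is present when the last directory component is "*".
    multi_level = spec_dir == "*" or spec_dir.endswith("\\*")
    if multi_level:
        base = spec_dir[:-1].rstrip("\\")       # drop the "*\" middle part
        in_scope = (base == "" or directory == base
                    or directory.startswith(base + "\\"))
    else:
        in_scope = directory == spec_dir        # the dir_spec directory only
    return in_scope and fnmatchcase(name, template)


one_level = r"mydir\myfile.*"
multi = r"mydir\*\myfile.*"
for path in (r"mydir\myfile.txt", r"mydir\sub\deep\myfile.c"):
    print(path, is_excluded(path, one_level), is_excluded(path, multi))
```

        The file directly inside mydir\ matches both specifiers,
        whereas the file two levels down matches only the one with
        the middle part.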

     Template specifier ( template )

        The last part of the exclusion specifier is the template which
        may be either a file-template or a directory-template.  So, the
        exclusion specifier can be more precisely described by using the
        following two notations:

        [ dir_spec\ ] [ *\ ]  [ filetemplate ]
        [ dir_spec\ ] [ *\ ]  [ dirtemplate ]

        Here, the syntactic distinction of the two types is made by
        the ending of the template string.


Common shortcut notations of the exclusion specifier.

     File template

        When a lone template is specified without a trailing backslash
        (e.g., /Xmyfile.txt ), it is treated as a shortcut for a
        multiple-level filename template (equivalent to
        /X*\myfile.txt ).  This is mostly for historical reasons
        (also, the frequency of this type of usage justifies it).

        If you need to specify a one-level filename template, you
        should prefix it with the dot directory (denoting the current
        (src) directory) to distinguish it from the multiple-level
        case ( /X.\myfile.txt ).

        Examples:

          /Xtemplate      // file which matches the template inside
                          // the current (src) directory (Multi-Level).
          /X*\template    // the template applies to all subdirectories
                          // this is same as above (Multi-Level)
          /X.\template    // the dot denotes relative to the base (src)
                          // directory (1-Level)
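
        The two shortcuts seen so far (the lone file template and the
        trailing backslash) can be expanded mechanically.  The Python
        sketch below uses a hypothetical function name and assumes
        that a specifier without any backslash is always a lone file
        template; it is not XXCOPY's own code.

```python
def expand_shortcuts(spec):
    """Rewrite a shortcut specifier in its equivalent long notation."""
    if spec.endswith("\\"):
        return spec + "*\\*"        # trailing backslash: whole directory
    if "\\" not in spec:
        return "*\\" + spec         # lone file template: multi-level
    return spec                     # already explicit, e.g. a ".\" prefix


for spec in ("myfile.txt", ".\\myfile.txt", "mydir\\", "*\\cache\\"):
    print(spec, "->", expand_shortcuts(spec))
```

        Note that the ".\" form passes through unchanged, which is
        what keeps it a one-level template.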

     Directory template

        The directory template may have the following four variations
        in the ending.

           dirtemplate\         // full directory
           dirtemplate\*\*      // same thing with alternate notation
           dirtemplate\*        // files in the directory (one-level)
           dirtemplate\?\*      // all subdirectories but not
                                // the first-level files

           The first two notations are interchangeable and denote
           the whole directory.  The third and fourth cases are
           partial directory notations (when the two are combined,
           they match the whole directory).
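
           Since the distinction is made purely by the ending of the
           string, it can be sketched as a simple suffix test.  The
           Python function below and its result labels are
           hypothetical and serve only as an illustration.

```python
def classify_ending(spec):
    """Tell which kind of template a specifier ends with."""
    # Test the longer endings first so "\*\*" is not mistaken for "\*".
    if spec.endswith("\\*\\*") or spec.endswith("\\"):
        return "whole directory"
    if spec.endswith("\\?\\*"):
        return "subdirectories only"
    if spec.endswith("\\*"):
        return "first-level files only"
    return "file template"


for spec in ("mydir\\", "mydir\\*\\*", "c:\\windows\\*",
             "c:\\windows\\?\\*", "mydir\\*\\*.tmp"):
    print(spec, "->", classify_ending(spec))
```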


     Examples:

        /Xdirtmpl\*\*      // excludes all matching directories in the
                           // base (src) directory and its contents
        /Xdirtmpl\         // same as above (the trailing backslash
                           // denotes everything inside the directory)
        /X.\dirtmpl\       // in the case of the directory template,
                           // it applies to one directory relative to
                           // the base (src) directory (1-Level)
        /x*\dirtmpl\       // you may make a directory template apply
                           // to many instances (Multi-Level)


        /xc:\windows\*     // specifies all the files in the first
                           // level of the c:\Windows directory such
                           // as, EXPLORER.EXE, WIN.INI, COMMAND.COM

        /xc:\windows\?\*   // this does not include the first-level
                           // files but all subdirectories in it such
                           // as \WINDOWS\SYSTEM\  \WINDOWS\DESKTOP\ etc.


        Since both dir_spec and dirtemplate may contain wildcards,
        it could be as complex as...

        /Xc:\mydir\pat*ern\*\dir???\*\*

        This one excludes all subdirectories whose names start with
        "dir" followed by three characters and which appear at any
        level of subdirectory under any directory inside c:\mydir
        whose name matches "pat*ern".

     Note that the following two are distinct:

         /Xdir_spec\*     // one layer only (subdirectories not excluded)
         /Xdir_spec\*\*   // the entire dir_spec directory is excluded

         XXCOPY allows you to exclude either the entire subdirectory
         (which affects both files and directories of any level), or
         one directory layer (which affects only files in the immediate
         level but not subdirectories).


The variations in exclusion specifiers (11 cases)

    The exclusion specifier may be classified into the following
    eleven classes (A - K).

     simple cases     1-Level templates        Multi-Level templates
   -------------------------------------------------------------------
                    D dir_spec\filetmpl      H dir_spec\*\filetmpl
    A dir_spec\*    E dir_spec\dir_tmpl\*    I dir_spec\*\dir_tmpl\*
    B dir_spec\?\*  F dir_spec\dir_tmpl\?\*  J dir_spec\*\dir_tmpl\?\*
    C dir_spec\*\*  G dir_spec\dir_tmpl\*\*  K dir_spec\*\dir_tmpl\*\*

      Note that a dir_spec may be specified with wildcard characters
      in the last component level.  For example,

        c:\mydir\Level2\last?level\*              // simple case
        c:\mydir\Level2\last?level\template\      // 1-level case
        c:\mydir\Level2\last?level\*\template\    // multi-level

      Also, the file_template or directory_template may contain
      wildcard characters.

        c:\mydir\L2\last?level\file?template        // simple filepattern
        c:\mydir\L2\last?level\dir?template\        // whole directory
        c:\mydir\L2\last?level\*\dir?template\*     // 1-level files
        c:\mydir\L2\last?level\*\dir?template\?\*   // Multi-level case

        Here, to illustrate the wildcard in the respective components,
        a question mark (?) was added where a wildcard is permitted
        (last?level\,  file?template or dir?template).

      Note that whereas the dir_spec shown above may consist of many
      levels of directories, the template specifiers (dir_tmpl) in
      Groups I, J, and K must be a single-level directory template
      (without a backslash inside).


The optimization of exclusion matching.

    In a very large scale backup operation, an XXCOPY job may encompass
    an entire volume as the source directory (such as c:\*).  To make
    matters worse, the more files the source directory contains,
    the greater the need for exclusion specifiers.  It is entirely
    possible that the C: drive contains 70,000 files and that the
    exclusion list file given with the /EX switch contains literally
    hundreds of exclusion specifiers.  If we were to test every file
    against every item in such a large exclusion list, the number of
    combinations would easily reach tens of millions, which would slow
    down the entire backup process.  Therefore, XXCOPY performs
    preprocessing steps to analyze the set of exclusion specifiers.
    First, by classifying them into the classes described above, some
    redundant exclusion specifiers can be removed.  For example, if a
    directory is excluded as a whole (dir_spec\*\*), any other
    specifier which refers to that directory or its subdirectories is
    redundant regardless of its template, because the whole-directory
    specifier overshadows any subset of the directory.  Moreover, in
    the actual XXCOPY implementation, the set of active file pattern
    matching templates is computed for each subdirectory, which
    eliminates a significant number of redundant filename comparisons.
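
    To give a feel for the numbers and for the removal of redundant
    specifiers, here is a rough Python sketch.  It is not XXCOPY's
    actual algorithm: the function name is hypothetical, the figure
    of 300 specifiers is merely an assumed value for "hundreds", and
    the sketch assumes every whole-directory specifier is written in
    the long form (dir_spec\*\*) and compares specifiers textually.

```python
def prune_overshadowed(specs):
    """Drop specifiers that lie inside a directory excluded as a whole."""
    specs = [s.lower() for s in specs]
    # "dir\*\*" covers every specifier whose text starts with "dir\".
    whole = [s[:-3] for s in specs if s.endswith("\\*\\*")]
    kept = []
    for s in specs:
        # A whole-directory specifier does not overshadow itself.
        covered = any(s.startswith(p) and s != p + "*\\*" for p in whole)
        if not covered:
            kept.append(s)
    return kept


files, specifiers = 70_000, 300      # 300 is an assumed figure
print(files * specifiers)            # number of naive file/specifier tests

sample = [r"c:\temp\*\*", r"c:\temp\cache\*.tmp",
          r"c:\temp\old\*\*", r"*\*.bak"]
print(prune_overshadowed(sample))
```

    With these assumed figures the naive approach needs 21 million
    comparisons, and two of the four sample specifiers are dropped
    because c:\temp is already excluded as a whole.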


Debug feature

    Because the exclusion parameters become complex when the number
    of exclusion specifiers grows large, you may examine the list of
    exclusion parameters immediately after the initial exclusion
    parameter optimization steps are completed by using the
    following debug and output switches:

      /DEBUG    // displays the parameters and prompts for continuation
      /DEBUGX   // displays the parameters and exits XXCOPY.
      /OX       // outputs the exclusion parameters in the log file
      /OP       // outputs the regular parameters in the log file.

      /OX/W     // a convenient switch to test the exclusion settings

Automatically excluded files.

    Since the few output files (e.g., the error log files) which are
    generated by the XXCOPY program itself cannot be successfully
    included in the current copying job if any of them happens to be
    in the source directory (or its subdirectories), those files are
    always excluded implicitly.



© Copyright 2002 Pixelab, Inc. All rights reserved.
