Wednesday, May 07, 2008

find & xargs

http://www.kalamazoolinux.org/tech/find.html

Part of the reason why the Linux command line is so POWERFUL!

Finding files can be a daunting task, given the vast number of files on your average Linux filesystem. The number of system files in an average Linux install is well into the tens of thousands, or even hundreds of thousands of files. That's not counting user files!

The `find` command is, as its name implies, used to help you find the files you are looking for, anywhere in your filesystems. It can also do much more (see below).

find can use a large number of criteria to select files.

  • The first argument to "find" is the directory (or directories) to perform the search.
    • Example: find (and display) every file in your home directory:
      • find $HOME

  • "-name" the name of a file, or a partial name (a shell-style wildcard pattern, not a regex).
    • Example: find the file named "bookmarks.html" in your home directory:
      • find $HOME -name bookmarks.html
    • Example: find all files starting with the name "bookmarks" in your home directory:
      • find $HOME -name bookmarks\*
        • Characters that mean something special to the shell, like the asterisk, must
          be escaped with a backslash or put in single quotes, to avoid problems.

  • "-atime/-ctime/-mtime" test a file's last "access time", "status change time" and "modification time",
    measured in days (use "-amin/-cmin/-mmin" for minutes). The time can be compared to another file with "-newer/-anewer/-cnewer".
    • Example: find everything in your home directory modified in the last 24 hours:
      • find $HOME -mtime 0
    • Example: find everything in your home directory modified in the last 7 days:
      • find $HOME -mtime -7
    • Example: find everything in your home directory that has NOT been modified in the last year:
      • find $HOME -mtime +365
    • Example: find everything in your home directory that has been modified more recently than "klug/find.html":
      • find $HOME -newer klug/find.html

  • "-type x" files of a certain type (file, directory, symlink, socket, pipe, block, character) (fdlspbc)
    • Example: find all directories under /tmp
      • find /tmp -type d

  • "-user" files owned by a certain user.
    • Example: find all files owned by user "bruce" under /var
      • find /var -user bruce

  • "-group" files belonging to a certain group.
    • Example: find all files in group "users" under /var
      • find /var -group users

  • "-size" files of a certain size.
    • Size can be specified in blocks, bytes, words, Kilobytes, Megabytes or Gigabytes (bcwkMG).
    • Example: find all files in your home directory exactly 100 bytes long:
      • find $HOME -size 100c
    • Example: find all files in your home directory smaller than 100 bytes:
      • find $HOME -size -100c
    • Example: find all files in your home directory larger than 100MB:
      • find $HOME -size +100M

  • "-perm" files that have certain permissions, or have individual bits set or not set.
    • Example: find all files on your root filesystem that are SUID.
      • find / -xdev -type f -perm -4000
    • Example: find all files on your root filesystem that are SUID-root.
      • find / -xdev -type f -user root -perm -4000
    • Older versions of GNU find also accepted "-perm +4000" here; current versions reject that form.

  • "-links" files that have a certain number of hard links.
    • Example: find all files in your home directory with a hard link count of two:
      • find $HOME -type f -links 2
    • Example: find all files in your home directory with more than one hard link:
      • find $HOME -type f -links +1

  • "-inum" a file with a certain inode number, useful in filesystem debugging and locating identical hard linked files.
    • Example: find file with inum=114300 in the /home partition:
      • find /home -inum 114300
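
The tests above can be tried safely in a throwaway directory. This is only a minimal sketch, assuming GNU find and a POSIX shell; the sandbox and its file names are made up for illustration:

```shell
# Hypothetical sandbox; nothing outside "$demo" is touched.
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'x' > "$demo/bookmarks.html"           # 1 byte
printf 'x' > "$demo/sub/bookmarks.bak"        # 1 byte
head -c 200 /dev/zero > "$demo/sub/big.dat"   # 200 bytes

# Quoted pattern: find, not the shell, expands the asterisk.
by_name=$(find "$demo" -name 'bookmarks*')

# Tests are ANDed together: regular files larger than 100 bytes.
by_size=$(find "$demo" -type f -size +100c)

# Everything was created a moment ago, so -mtime 0 matches all three files.
recent=$(find "$demo" -type f -mtime 0 | wc -l)

echo "$by_name"
echo "$by_size"
echo "recently modified files: $recent"
rm -rf "$demo"
```

Swapping the single quotes for no quotes at all is a quick way to see why the escaping advice matters: run it from a directory that contains a file starting with "bookmarks".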

find can perform a number of actions on the file(s) it finds.

  • "-print" prints the names of the files it finds. This is the default if no other actions are specified.
    These two commands are identical on recent Linux systems:
    • find $HOME -name bookmarks.html
      find $HOME -name bookmarks.html -print
    • Variations include:
      • "-ls" to display detailed output instead of just filename ("ls -dils" format).
      • "-fprint" to send the output to a file instead of stdout.
      • "-printf" to format the output in a specific way.
      • "-fprintf" a combination of the above two.

  • "-print0" Same as -print, except it separates files by a null character (ascii 0) instead of a newline.
    Although the usefulness of this may not be immediately obvious, it is extremely useful!
    See examples below. (the argument ends in the number ZERO, not the letter O)
    • Variations include:
      • "-fprint0" to send output to a file instead of stdout.

  • "-delete" will delete all files it finds. Use with care! :-)
    • Example: delete all files named "core" in the /tmp directory:
      • find /tmp -type f -name core -delete

  • "-exec" will execute any command on the files found.
    • Use "{}" to specify the filename found in the command.
    • End the command with a ";" (escape it!) to execute the command every time a file is found.
    • End the command with a "+" to pass multiple files to the command (like xargs).
    • Variations include:
      • "-execdir" to execute the command in the file's directory (instead of the current directory)
      • "-ok" to ask the user for each file found if the command should be executed.
      • "-okdir" ask the user and execute in the file's directory.
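
The difference between the two ways of ending "-exec" is easy to see with "echo". A small sketch, assuming a find that supports the "+" form; the sandbox files are hypothetical:

```shell
# Hypothetical sandbox with three matching files.
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt" "$demo/c.txt"

# ";" form: echo runs once per file, so there is one output line per file.
per_file=$(find "$demo" -name '*.txt' -exec echo {} \; | wc -l)

# "+" form: the file names are batched, so a single echo prints them on one line.
batched=$(find "$demo" -name '*.txt' -exec echo {} + | wc -l)

echo "lines with ';': $per_file"
echo "lines with '+': $batched"
rm -rf "$demo"
```

With only three files the "+" form needs a single run; with a very large number of files find would split them over several runs on its own.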

Other useful find parameters:

  • "-xdev" Don't descend directories on other filesystems.
    • Useful for searching a single hard drive partition and omitting other HDD partitions, /proc,
      CD-ROMs, network mounts, etc. (network drives and CDs can be really slow to search)

  • "-maxdepth n" Descend at most n directory levels. (cannot be negative)

  • "-mindepth n" Do not apply tests or actions at levels less than n (non-negative).

  • "-daystart" perform time tests from beginning of today, instead of current date/time.

  • "-L" follow symbolic links (find does not follow symlinks by default). Must appear before the starting directory.

  • "-fstype x" only find files on filesystems of type x.
    • Useful for searching hard drive partitions and omitting CD-ROMs, network mounts, etc.

  • "-regex pattern" match the whole path (not just the file name) against a regular expression.
    • Variations include:
      • "-iregex" case insensitive regex.

  • "-depth" Process each directory's contents before the directory itself
    • Useful for removing, since the directory has to be empty before it can be removed

  • "-noleaf" Do not optimize by assuming that directories contain 2 fewer subdirectories than their hard link count.
    • The default optimization improves speed significantly on Unix filesystems.
      However it doesn't work so well on other filesystems (DOS, CDROM, etc.), hence this option.
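
The depth-related parameters are easier to follow on a tiny tree. A sketch under the assumption of GNU find (or any find with -maxdepth/-mindepth); the directory layout is invented for the demo:

```shell
# Hypothetical three-level tree.
demo=$(mktemp -d)
mkdir -p "$demo/one/two"
touch "$demo/top.txt" "$demo/one/mid.txt" "$demo/one/two/deep.txt"

# Depth 0 is the starting point itself; -maxdepth 1 stops at its direct children.
shallow=$(find "$demo" -maxdepth 1 -type f)

# -mindepth 2 skips the starting point and its direct children.
deeper=$(find "$demo" -mindepth 2 -type f | wc -l)

# -depth lists a directory's contents before the directory itself,
# so the starting directory comes out last.
last=$(find "$demo" -depth | tail -n 1)

echo "$shallow"
echo "files at depth 2 or more: $deeper"
echo "last entry with -depth: $last"
rm -rf "$demo"
```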

OPERATORS

  • "! expr" True if expr is false. (logical NOT)

  • "( expr )" Force precedence.

  • "expr1 -a expr2" Logical AND (default operation, not necessary)

  • "expr1 -o expr2" Logical OR.

  • "expr1 , expr2" Evaluate both expressions; used for different searches while traversing the filesystem hierarchy only once.
    Must be used with parentheses and -fprint to save separate outputs.
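
One thing the operator list implies but does not spell out: the implicit AND binds tighter than "-o", which is why the parentheses matter. A minimal sketch with made-up file names:

```shell
# Hypothetical sandbox: a DIRECTORY named notes.txt and a regular file index.html.
demo=$(mktemp -d)
mkdir "$demo/notes.txt"
touch "$demo/index.html"

# Without parentheses this reads as: *.txt  OR  (*.html AND regular file),
# so the directory notes.txt slips through.
loose=$(find "$demo" -name '*.txt' -o -name '*.html' -type f | wc -l)

# With parentheses "-type f" applies to both name tests.
strict=$(find "$demo" \( -name '*.txt' -o -name '*.html' \) -type f | wc -l)

# "!" negates the test that follows it.
negated=$(find "$demo" -mindepth 1 ! -name '*.html')

echo "without parentheses: $loose"
echo "with parentheses:    $strict"
echo "not html: $negated"
rm -rf "$demo"
```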

Examples:

  • Display all jpg files in the top two levels of your home directory:
    • find $HOME -maxdepth 2 -name \*jpg -print -exec xv {} \;
    • find $HOME -maxdepth 2 -name '*jpg' -print -exec xv {} +
    • find $HOME -maxdepth 2 -name '*jpg' -print0 | xargs -0 xv

  • cron job to make all files & directories world readable/writable in common area:
    • find /somedir/common -type f -exec chmod a+wr {} \;
      find /somedir/common -type d -exec chmod 777 {} \;

  • cron job to force correct owner/group/permissions on certain files:
    • find $BSE/lib/user \( -name '[pu]*' -a -type f -a ! -perm 664 \) -exec chmod 664 {} \;
      find $BSE/lib/user \( -name 'd*' -a -type f -a ! -perm 666 \) -exec chmod 666 {} \;
      find $BSE/lib/user \( -type f -a ! -user bsp \) -exec chown bsp {} \;
      find $BSE/lib/user \( -type f -a ! -group programs \) -exec chgrp programs {} \;

  • cron job to delete some old log files and keep record of files removed:
    • find /var/opt/hparray/log -mtime +30 -print -exec rm -f {} \; >> $logf 2> /dev/null

  • cron job to delete some old temp files and keep record of files removed:
    • find / -name core -type f -fstype xfs -print -exec rm -f {} \; >> $logf 2> /dev/null
      find /var/tmp -mtime +1 -name '*aaa*' -print -exec rm -f {} \; >> $logf 2> /dev/null
      find /var/tmp -mtime +1 -name 'srt*' -print -exec rm -f {} \; >> $logf 2> /dev/null
      find /var/tmp -mtime +7 -print -exec rm -f {} \; >> $logf 2> /dev/null

  • Traverse /var only once, listing setuid files and directories into /root/suid.txt
    and large files into /root/big.txt. (example taken from the find man page):
    • find /var \( -perm -4000 -fprintf /root/suid.txt '%#m %u %p\n' \) , \
      \( -size +100M -fprintf /root/big.txt '%-10s %p\n' \)
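
The log-cleanup cron jobs above can be rehearsed in a sandbox before they go anywhere near real data. This is a sketch only: it assumes GNU touch (for the "-d '40 days ago'" syntax), and the directory, file names and log file are all hypothetical:

```shell
# Hypothetical sandbox standing in for a log directory.
demo=$(mktemp -d)
logf="$demo/removed.log"                      # record of removed files
mkdir "$demo/logs"
touch "$demo/logs/new.log"
touch -d '40 days ago' "$demo/logs/old.log"   # GNU touch: backdate the file

# Dry run: "echo" shows the rm command instead of running it.
find "$demo/logs" -type f -mtime +30 -exec echo rm -f {} \;

# Real run: print each name into the log, then remove the file.
find "$demo/logs" -type f -mtime +30 -print -exec rm -f {} \; >> "$logf" 2> /dev/null

remaining=$(ls "$demo/logs")
logged=$(cat "$logf")
echo "still present: $remaining"
echo "logged as removed: $logged"
rm -rf "$demo"
```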

xargs

  • Why do we need this "xargs" thing? It's in the presentation title! :-)
    Answer: Speed and efficiency.
    • The second line runs much faster than the first for a large number of files:
      • find / -name core -exec rm -f {} \;
      • rm -f $(find / -name core -print)
      In other words, running "rm" once, with all the filenames on the command line
      is much faster than running "rm" multiple times, once for each file.
    • However, the second line could fail if the number of files is very large and
      exceeds the maximum number of characters allowed in a single command.
    • "xargs" will take the one-name-per-line output of find and run commands with multiple
      arguments, multiple times if necessary to avoid the max chars per line limit.
      • find / -name core -print | xargs rm -f
    • The simplest way to see what xargs does, is to run some simple commands:
      • find $HOME -maxdepth 2 -name \*.jpg -exec echo {} \;
      • find $HOME -maxdepth 2 -name \*.jpg | xargs echo

  • Enter the power of ZERO!
    • The 2nd command will fail if any of the file names contains a space or other special character:
      • find $HOME -maxdepth 2 -name \*.jpg -exec ls {} \;
      • find $HOME -maxdepth 2 -name \*.jpg | xargs ls
    • Delimiting the file names with NULL fixes the problem:
      • find $HOME -maxdepth 2 -name \*.jpg -print0 | xargs -0 ls

  • Real world example of a very useful set of commands: (This happens to me all the time)
    • Our "webmaster" comes to me and asks if I can "find" all the web pages
      that contain the graphic file "ArmstrongISS.jpg" so they can edit those pages.
      • find /home/httpd \( -name \*.html -o -name \*.php -o -name \*.php3 \) -print0 \
        | xargs -0 grep -l "ArmstrongISS.jpg"
      Note: add a "-i" parameter to "grep" for a case insensitive search on the string.
    • The above example alone is worth more than double the price of admission! :-)
      Not only does it find files by name, it only displays file names containing a certain string!
      When you combine "find" with other Linux commands (like grep) and use it in shell
      scripts, the power is only limited by your imagination! (and your command line skills). :-)
    • Similar examples to demo on my local system:
      • find $HOME \( -name \*txt -o -name \*html \) -print0 | xargs -0 grep -li vpn
      • find $HOME \( -name \*txt -o -name \*html \) -exec grep -li vpn {} +
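
To watch the "power of ZERO" at work, count how many arguments xargs actually hands to the command. A small sketch, assuming an xargs that supports "-0" and a temporary directory path without spaces; the file names are invented:

```shell
# Hypothetical sandbox: one plain name, one name containing a space.
demo=$(mktemp -d)
touch "$demo/plain.jpg" "$demo/two words.jpg"

# Newline-delimited: xargs splits "two words.jpg" at the space,
# so printf receives three arguments instead of two.
split=$(find "$demo" -name '*.jpg' | xargs printf '%s\n' | wc -l)

# NUL-delimited: each file name arrives as exactly one argument.
intact=$(find "$demo" -name '*.jpg' -print0 | xargs -0 printf '%s\n' | wc -l)

echo "arguments without -print0: $split"
echo "arguments with -print0:    $intact"
rm -rf "$demo"
```

With "rm" in place of "printf", the first form would try to delete files that do not exist, or worse, files that do.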

Miscellaneous

  • Always read the man page for "find" & "xargs" on the system where you plan on using it.
    "find" has been around for a long time, but is still evolving at a rapid rate.
    • Some arguments are fairly new, and may not exist on some older systems,
      and some commercial Unix systems.
      • Some parameters, like "-delete" and -exec followed by a plus sign ("+"),
        are REALLY NEW. (Neither existed in the SuSE 9.2 "find"!)
      • Other parameters may be named something completely different on
        commercial Unix systems (e.g. "-fstype" == "-fsonly" on HPUX)
      • Older versions of "find" did NOT have "-print" as the default action.
        In fact there was no default, so running find without any action did
        nothing noticeable (except use CPU time and grind the hard drive)!

  • Always test your commands using "-exec echo" before invoking real commands,
    especially destructive commands like removing files! :-)

  • Honorable mention: "locate".
    • The "locate" package comes with most distributions.
      • Installs by default on Redhat/Fedora systems.
      • Comes with SuSE, but does not install by default.
    • "Locate" consists of a cron job that runs nightly (by default),
      and stores all filenames on your system in a searchable database.
    • Simply type "locate filename" to find the location (directory) of a file.
    • Is MUCH faster than using "find" on your entire directory structure, but only
      tells you where files live. It has none of the extra functionality of "find".
    • Use the right tool for the job: "locate" is faster than "find" when you are
      only looking for the locations of files (or where they lived as of yesterday).

  • Any [more] questions?
    • Note to self: if this went too quickly, read the "find" man page out-loud very slowly! :-)

  • Fin.
