Unix: Sorting, searching, comparing and counting

Sorting

Unix provides the sort command, which sorts the lines of a text file alphabetically or numerically. The default is an alphabetic sort; use the -n option for a numeric sort. To sort the lines in the file unixpast.txt, type:

$ sort unixpast.txt
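
The difference between the alphabetic default and a numeric sort is easiest to see on a file of numbers. The following is a minimal sketch; the file numbers.txt and its contents are made up for illustration, while -n (numeric) and -r (reverse) are standard sort options.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Hypothetical sample data: one number per line.
printf '10\n9\n100\n' > "$workdir/numbers.txt"

# Default (alphabetic) sort compares the lines character by character,
# so "100" comes before "9".
alpha=$(sort "$workdir/numbers.txt")

# -n compares the lines as numbers.
numeric=$(sort -n "$workdir/numbers.txt")

# -r reverses the order; combined with -n the largest number comes first.
reversed=$(sort -nr "$workdir/numbers.txt")

printf 'alphabetic:\n%s\n\n' "$alpha"
printf 'numeric:\n%s\n\n' "$numeric"
printf 'numeric, reversed:\n%s\n' "$reversed"
```

The alphabetic result lists 10, 100, 9, which is rarely what is wanted for numeric data.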

Searching for a file

find

This command searches recursively through directories for files and directories with a given name, date, size, or any other attribute you specify.

Examples:

To search for all files with the file extension .txt, starting at the current directory (.) and working through all sub-directories, and print the names of the files found to the screen, type:

$ find . -name "*.txt" -print

Since -print is the default action, this is equivalent to:

$ find . -name "*.txt"
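
The quotes around the pattern matter: without them the shell may expand *.txt itself before find ever sees it. The sketch below illustrates this with a made-up directory layout (a.txt and sub/b.txt are hypothetical names).

```shell
# Throwaway directory with one .txt file at the top and one in a sub-directory.
workdir=$(mktemp -d)
mkdir "$workdir/sub"
touch "$workdir/a.txt" "$workdir/sub/b.txt"

# Quoted: find receives the pattern *.txt and applies it at every level.
quoted=$(cd "$workdir" && find . -name "*.txt" | sort)

# Unquoted (deliberately wrong): the shell expands *.txt to a.txt first,
# so find only looks for files named exactly a.txt.
unquoted=$(cd "$workdir" && find . -name *.txt | sort)

printf 'quoted pattern:\n%s\n\n' "$quoted"
printf 'unquoted pattern:\n%s\n' "$unquoted"
```

With the quoted pattern both files are reported; with the unquoted one, sub/b.txt is missed.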

To find files larger than 1 MiB (the M suffix is supported by GNU and BSD find), and display the result as a long listing, type:

$ find . -size +1M -ls

To find all files modified less than 24 hours ago in the current directory and its sub-directories, type:

$ find . -mtime -1

To search for and remove all files ending with .bak that have not been accessed in 7 days, type:

$ find . -name "*.bak" -type f -atime +7 -exec rm {} \;
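
Because -exec rm deletes without asking, it can be worth running the same tests first with the default print action and only then adding the removal. The following is a sketch under assumptions: the file names are hypothetical, and touch -a -t is used to backdate access times so that -atime +7 has something to match.

```shell
# Throwaway directory with hypothetical files.
workdir=$(mktemp -d)
touch -a -t 200001010000 "$workdir/old.bak"    # last accessed long ago
touch -a -t 200001010000 "$workdir/notes.txt"  # old, but not a .bak file
touch "$workdir/recent.bak"                    # accessed just now

# Step 1: dry run. Same tests, but only print what would be removed.
matches=$(find "$workdir" -name "*.bak" -type f -atime +7)
printf 'would remove:\n%s\n' "$matches"

# Step 2: once the list looks right, run the same tests with the removal.
find "$workdir" -name "*.bak" -type f -atime +7 -exec rm {} \;

# Show what is left in the directory.
ls "$workdir"
```

Only old.bak satisfies both the name test and the access-time test, so recent.bak and notes.txt survive.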

Searching the contents of a file

grep

The command grep (globally search for a regular expression and print) is one of many standard Unix utilities. It searches files for specified words or patterns and prints the lines that match. To search for the word “open”, type:

$ grep open unixpast.txt

The grep command is by default case sensitive; it distinguishes between Open and open.

For a case-insensitive search, use the -i option, e.g.:

$ grep -i open unixpast.txt

To search for a phrase or pattern, enclose it in single quotes. For example, to search for “unix systems”, type:

$ grep -i 'unix systems' unixpast.txt

Some of the most used options of grep are:

Option  Meaning
-i      ignore case
-v      display the lines that do NOT match
-c      print only the count of matching lines
-l      print only the names of files with matching lines
-r      search directories recursively

Note: Both “egrep” and “fgrep” are deprecated; use “grep -E” and “grep -F” instead.

Try some of them and see the different results. You can use more than one option at a time. For example, to count the lines in the file that do not contain the word “open” in any capitalization (“open”, “Open”, “OPEN”, etc.), type:

$ grep -ivc open unixpast.txt
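
To check how the combined options behave, it may help to run them on a small file whose contents are known. This is a minimal sketch; sample.txt and its four lines are invented for illustration.

```shell
# Throwaway directory and a hypothetical four-line sample file.
workdir=$(mktemp -d)
printf 'Open systems\nclosed source\nan OPEN standard\nunix\n' > "$workdir/sample.txt"

# -i ignores case, -c prints a count instead of the lines themselves.
matching=$(grep -ic open "$workdir/sample.txt")

# Adding -v inverts the match: count the lines that do NOT contain "open".
non_matching=$(grep -ivc open "$workdir/sample.txt")

printf 'lines containing open:     %s\n' "$matching"
printf 'lines not containing open: %s\n' "$non_matching"
```

The two counts always add up to the total number of lines in the file, here 2 + 2 = 4.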

To do a case-insensitive recursive search for “open” below the current directory, use one of the following two commands. The first reports only the names of the files containing the string. The second also reports the number of matching lines in each file.

$ grep -irl open .
$ grep -irc open . | grep -v ':0$'

The option -c prints the count of matching lines for every file, even if the count is zero. To get only the files that contain the string, the result is piped to a second grep that excludes the lines matching the regexp “:0$” (lines ending with “:0”).
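
The two recursive searches can be compared on a small directory tree. The sketch below assumes a grep that supports -r (GNU and BSD grep do); the directory layout and file contents are hypothetical.

```shell
# Throwaway tree: two files that mention "open" and one that does not.
workdir=$(mktemp -d)
mkdir "$workdir/docs"
printf 'Open\nopen\n' > "$workdir/a.txt"
printf 'nothing here\n' > "$workdir/docs/b.txt"
printf 'OPEN once\n' > "$workdir/docs/c.txt"

# -l: file names only.
names=$(cd "$workdir" && grep -irl open . | sort)

# -c: file name and count; the second grep drops files with a count of zero.
# The pattern is quoted so that the shell passes :0$ through unchanged.
counts=$(cd "$workdir" && grep -irc open . | grep -v ':0$' | sort)

printf 'names:\n%s\n\n' "$names"
printf 'counts:\n%s\n' "$counts"
```

Note that “:0$” only matches a count of exactly zero; a count such as 10 ends in “10”, not “:0”, and is kept.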

Comparing files

diff

This command compares the contents of two files and displays the differences. Suppose you have a file called file1.txt and you edit some part of it and save it as file2.txt. To see the differences type:

$ diff file1.txt file2.txt

Lines beginning with < come from file1.txt, while lines beginning with > come from file2.txt.
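
A small worked example may make the output format concrete. In this sketch the contents of the two files are made up; it also captures the exit status of diff, which is 0 when the files are identical and 1 when they differ.

```shell
# Throwaway directory with two hypothetical files that differ on line 2.
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/file1.txt"
printf 'one\n2\nthree\n' > "$workdir/file2.txt"

# Capture both the output and the exit status of diff.
output=$(diff "$workdir/file1.txt" "$workdir/file2.txt") && status=0 || status=$?

printf '%s\n' "$output"
printf 'exit status: %s\n' "$status"
```

The output begins with “2c2”, meaning line 2 of the first file was changed into line 2 of the second, followed by the old line marked with < and the new line marked with >.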

You can also recursively compare two directory trees:

$ diff -r dir1/ dir2/

To only list the names of the files that differ, use --brief:

$ diff --brief -r dir1/ dir2/

Counting

wc (word count)

A handy little utility is the wc command, short for word count. It actually counts characters, words and lines. To do a word count on unixpast.txt, type:

$ wc -w unixpast.txt

To find out how many lines the file has, type:

$ wc -l unixpast.txt
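
When the count is needed in a script rather than on screen, reading from standard input keeps the file name out of the output. This is a minimal sketch; sample.txt and its contents are hypothetical, and tr is used only to strip the padding that some wc implementations add.

```shell
# Throwaway directory and a hypothetical two-line sample file.
workdir=$(mktemp -d)
printf 'unix is open\nso is linux\n' > "$workdir/sample.txt"

# With input redirection wc prints only the number, not the file name.
lines=$(wc -l < "$workdir/sample.txt" | tr -d ' ')
words=$(wc -w < "$workdir/sample.txt" | tr -d ' ')
bytes=$(wc -c < "$workdir/sample.txt" | tr -d ' ')

printf 'lines: %s, words: %s, bytes: %s\n' "$lines" "$words" "$bytes"
```

The same idea works at the end of a pipeline, for example piping the output of find or grep into wc -l to count the results.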

Summary

Command               Meaning
find                  search for files in a directory hierarchy
cp file1 file2        copy file1 and call it file2
mv file1 file2        move or rename file1 to file2
rm file               remove a file
rmdir directory       remove a directory
cat file              display a file
less file             display a file a page at a time
head file             display the first few lines of a file
tail file             display the last few lines of a file
grep 'keyword' file   search a file for keywords
wc file               count number of lines/words/characters in file