Unix: Sorting, searching, comparing and counting

Sorting

Unix has a built-in command named sort. It sorts the lines of a text file alphabetically or numerically. The default is alphabetic sort; use the -n option to sort numerically. To sort the lines in the file unixpast.txt, type:

$ sort unixpast.txt
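
The difference between the two sort orders is easiest to see on numbers of different lengths. The following is a minimal sketch, assuming a throwaway file called numbers.txt (a hypothetical name, not one of the tutorial files) created in a temporary directory:

```shell
# Work in a temporary directory so no existing files are touched.
tmpdir=$(mktemp -d)
printf '10\n9\n100\n2\n' > "$tmpdir/numbers.txt"

# Default (alphabetic) sort compares the lines character by character.
alpha=$(sort "$tmpdir/numbers.txt")

# -n compares the lines as numbers.
numeric=$(sort -n "$tmpdir/numbers.txt")

echo "alphabetic:"; echo "$alpha"
echo "numeric:";    echo "$numeric"

rm -r "$tmpdir"
```

Alphabetically, 100 comes before 2 because the character 1 sorts before 2; with -n the lines are ordered by value.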

Searching for a file

find

This command searches recursively through directories for files and directories with a given name, date, size, or any other attribute you specify.

Examples:

To search for all files with the file type extension .txt, starting at the current directory (.) and working through all sub-directories, then printing the names of the files found to the screen, type:

$ find . -name "*.txt" -print

Since -print is the default operation, this is equivalent to:

$ find . -name "*.txt"

To find files over 1 MB in size, and display the result as a long listing, type:

$ find . -size +1M -ls

To find all files modified in the last 24 hours in the current directory and its sub-directories, type:

$ find . -mtime -1

To search for and remove all files ending with .bak that have not been accessed for more than 7 days, type:

$ find . -name "*.bak" -type f -atime +7 -exec rm {} \;
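
Because -exec rm deletes without asking, it can help to run the same tests without -exec first and check what would be removed. The following is a sketch under stated assumptions: the file names old.bak, recent.bak and notes.txt are hypothetical, and touch -a -t is used to backdate the access time of one file so that -atime +7 has something to match.

```shell
# Build a small test tree in a temporary directory.
tmpdir=$(mktemp -d)
touch "$tmpdir/old.bak" "$tmpdir/recent.bak" "$tmpdir/notes.txt"

# Pretend old.bak was last accessed on 1 January 2000.
touch -a -t 200001010000 "$tmpdir/old.bak"

# Step 1: preview. Same tests, but no -exec, so nothing is deleted.
preview=$(find "$tmpdir" -name "*.bak" -type f -atime +7)
echo "$preview"

# Step 2: delete once the preview looks right.
find "$tmpdir" -name "*.bak" -type f -atime +7 -exec rm {} \;

# recent.bak and notes.txt should still be there.
remaining=$(ls "$tmpdir")
echo "$remaining"

rm -r "$tmpdir"
```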

Searching the contents of a file

grep

The command grep (globally search for a regular expression and print) is one of many standard Unix utilities. It searches files for specified words or patterns. To search unixpast.txt for the word “open”, type:

$ grep open unixpast.txt

The grep command is by default case sensitive; it distinguishes between Open and open.

For a case-insensitive search, use the -i option, e.g.:

$ grep -i open unixpast.txt

To search for a phrase or pattern, enclose it in single quotes. For example, to search for “unix systems”:

$ grep -i 'unix systems' unixpast.txt

Some of the most used options of grep are:

-c: print only the total count of matched lines (includes 0).
-E: extended regexp, (was egrep).
-F: fixed strings, (was fgrep).
-i: ignore case.
-l: only print the filename.
-n: precede each matching line with the line number.
-r: recurse directories.
-v: display those lines that do not match.

Note: both “egrep” and “fgrep” are deprecated; use grep -E and grep -F instead.
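
A few of these options are illustrated below. This is only a sketch: sample.txt and its four lines are made up for the example and are not part of the tutorial files.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' \
  'Unix is an open system.' \
  'Open standards matter.' \
  'The price is 5.00 today.' \
  'Order 5100 shipped.' > "$tmpdir/sample.txt"

# -n: prefix each matching line with its line number.
numbered=$(grep -n open "$tmpdir/sample.txt")

# -E: extended regexp, here "open" OR "price"; -c counts the matching lines.
either=$(grep -E -c 'open|price' "$tmpdir/sample.txt")

# Without -F the dot matches any character, so "5100" matches as well.
as_regexp=$(grep -c '5.00' "$tmpdir/sample.txt")

# -F: fixed string, so the dot only matches a literal dot.
as_fixed=$(grep -F -c '5.00' "$tmpdir/sample.txt")

echo "$numbered"
echo "open|price: $either, regexp 5.00: $as_regexp, fixed 5.00: $as_fixed"

rm -r "$tmpdir"
```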

Try some of them and see the different results. You can use more than one option at a time. For example, to count the lines in the file that do not contain “open”, “Open”, “OPEN”, etc., type:

$ grep -ivc open unixpast.txt

To do a case-insensitive recursive search for “open” below the current directory, use one of the following two commands. The first reports only the names of the files containing the string. The second also reports the number of matching lines in each file.

$ grep -irl open .
$ grep -irc open . | grep -v ':0$'

The -c option prints the count of matching lines for every file, even when the count is zero. To list only the files that contain the string, the output is piped to a second grep that excludes lines matching the regexp “:0$” (lines ending in “:0”).
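
The two recursive searches can be tried on a small directory tree. The following sketch assumes three made-up files (a.txt, b.txt and sub/c.txt); the output is passed through sort only to make the order predictable.

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sub"
printf 'Open source\nan open door\nclosed\n' > "$tmpdir/a.txt"
printf 'nothing here\n'                      > "$tmpdir/b.txt"
printf 'OPEN\n'                              > "$tmpdir/sub/c.txt"

# File names only: b.txt has no match and is not listed.
names=$(cd "$tmpdir" && grep -irl open . | sort)

# Counts per file, with the ":0" lines (files without matches) filtered out.
counts=$(cd "$tmpdir" && grep -irc open . | grep -v ':0$' | sort)

echo "$names"
echo "$counts"

rm -r "$tmpdir"
```

Note that a count such as 10 is not filtered out, since “:10” does not end in “:0”.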

Comparing files

diff

This command compares the contents of two files and displays the differences. Suppose you have a file called file1.txt and you edit some part of it and save it as file2.txt. To see the differences type:

$ diff file1.txt file2.txt

Lines beginning with < come from file1.txt, while lines beginning with > come from file2.txt.
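
As an illustration, the sketch below creates two three-line files that differ only in their second line; the file contents are invented for the example.

```shell
tmpdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n'    > "$tmpdir/file1.txt"
printf 'apple\nblueberry\ncherry\n' > "$tmpdir/file2.txt"

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script that stops on errors from aborting here.
changes=$(diff "$tmpdir/file1.txt" "$tmpdir/file2.txt") || true
echo "$changes"

rm -r "$tmpdir"
```

The output is:

```
2c2
< banana
---
> blueberry
```

The first line, 2c2, says that line 2 of the first file was changed into line 2 of the second file.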

You may even recursively compare two file trees:

$ diff -r dir1/ dir2/

To only list the names of the files that differ, use --brief:

$ diff --brief -r dir1/ dir2/

Counting

wc (word count)

A handy little utility is the wc command, short for word count. It actually counts characters, words and lines. To do a word count on unixpast.txt, type:

$ wc -w unixpast.txt

To find out how many lines the file has, type:

$ wc -l unixpast.txt
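
When the count is needed in a script, reading the file from standard input makes wc print the number without the filename. A minimal sketch, assuming a made-up three-line file called words.txt:

```shell
tmpdir=$(mktemp -d)
printf 'one two three\nfour five\nsix\n' > "$tmpdir/words.txt"

# Reading from standard input suppresses the filename in the output.
# Some wc implementations pad the number with spaces, so tr removes them.
words=$(wc -w < "$tmpdir/words.txt" | tr -d ' ')
lines=$(wc -l < "$tmpdir/words.txt" | tr -d ' ')

echo "words: $words, lines: $lines"

rm -r "$tmpdir"
```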

Summary

Command	Meaning
`find`	search for files in a directory hierarchy
`cp file1 file2`	copy file1 and call it file2
`mv file1 file2`	move or rename file1 to file2
`rm file`	remove a file
`rmdir directory`	remove a directory
`cat file`	display a file
`less file`	display a file a page at a time
`head file`	display the first few lines of a file
`tail file`	display the last few lines of a file
`grep 'keyword' file`	search a file for keywords
`wc file`	count number of lines/words/characters in file