Unix: Sorting, searching, comparing and counting
Sorting
Unix has a built-in sorting command named sort. It sorts the lines of a text file alphabetically or numerically. The default is an alphabetic sort; use the -n option to sort numerically. To sort the lines in the file unixpast.txt, type:
$ sort unixpast.txt
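As a small illustration of the two orderings, the sketch below sorts a throwaway file of numbers both ways. The temporary file and its contents are made up for this example. LC_ALL=C is set only so that the alphabetic order comes out the same on every system.

```shell
# Create a throwaway file with one number per line (sample data, not unixpast.txt).
tmpfile=$(mktemp)
printf '10\n9\n100\n2\n' > "$tmpfile"

# Default sort compares the lines character by character.
alpha=$(LC_ALL=C sort "$tmpfile")

# -n compares the lines by their numeric value.
numeric=$(sort -n "$tmpfile")

echo "alphabetic:"; echo "$alpha"
echo "numeric:";    echo "$numeric"

rm -f "$tmpfile"
```

Alphabetically, 100 sorts before 2 because the character 1 comes before the character 2. With -n the lines come out as 2, 9, 10, 100.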
Searching for a file
find
This command searches recursively through directories for files and directories with a given name, date, size, or any other attribute you specify.
Examples:
To search for all files with the file type extension .txt, starting at the current directory (.) and working through all sub-directories, then printing the names of the files found to the screen, type:
$ find . -name "*.txt" -print
Since -print is the default action, this is equivalent to:
$ find . -name "*.txt"
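The quotes around the pattern matter. The sketch below, which uses a made-up temporary directory tree, shows what can happen when they are left out and a matching file exists in the starting directory.

```shell
# Build a small throwaway tree: one .txt file at the top, one in a sub-directory.
dir=$(mktemp -d)
mkdir "$dir/sub"
touch "$dir/a.txt" "$dir/sub/b.txt"

# Quoted: find receives the pattern *.txt and applies it in every directory.
quoted=$(cd "$dir" && find . -name "*.txt" | sort)

# Unquoted: the shell expands *.txt to a.txt first, so find only looks for that name.
unquoted=$(cd "$dir" && find . -name *.txt | sort)

echo "quoted:";   echo "$quoted"
echo "unquoted:"; echo "$unquoted"

rm -rf "$dir"
```

The quoted form finds both files, while the unquoted form misses sub/b.txt.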
To find files larger than 1 MB and display the result as a long listing, type:
$ find . -size +1M -ls
To find all files modified within the last 24 hours in the current directory and its sub-directories, type:
$ find . -mtime -1
To search for and remove all files ending with .bak that have not been accessed for more than 7 days, type:
$ find . -name "*.bak" -type f -atime +7 -exec rm {} \;
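Because -exec rm deletes without asking, it can be worth previewing the matches first. The following is a minimal sketch of that habit in a made-up temporary directory. The file names are hypothetical, and touch -a -t is used only to fake an old access time for the demonstration.

```shell
# Throwaway directory with two backup files and one file that must survive.
dir=$(mktemp -d)
touch "$dir/old.bak" "$dir/new.bak" "$dir/keep.txt"

# Pretend old.bak was last accessed on 1 January 2000 (-a changes only the access time).
touch -a -t 200001010000 "$dir/old.bak"

# Step 1: preview. Same tests as the delete command, but only print the matches.
preview=$(find "$dir" -name "*.bak" -type f -atime +7 -print)
echo "would remove: $preview"

# Step 2: delete for real once the preview looks right.
find "$dir" -name "*.bak" -type f -atime +7 -exec rm {} \;

remaining=$(ls "$dir" | sort)
echo "left behind:"; echo "$remaining"

rm -rf "$dir"
```

Only old.bak is removed: new.bak was accessed too recently, and keep.txt does not match the name pattern.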
Searching the contents of a file
grep
The command grep is one of many standard Unix utilities. Its name comes from the ed editor command g/re/p (globally search for a regular expression and print). It searches files for specified words or patterns. To search for the word “open”, type:
$ grep open unixpast.txt
The grep command is case-sensitive by default; it distinguishes between Open and open. For a case-insensitive search, use the -i option, e.g.:
$ grep -i open unixpast.txt
To search for a phrase or pattern, enclose it in single quotes. For example, to search for “unix systems”:
$ grep -i 'unix systems' unixpast.txt
Some of the most used options of grep are:
-c : print only the total count of matched lines (includes 0).
-E : extended regexp (was egrep).
-F : fixed strings (was fgrep).
-i : ignore case.
-l : only print the filename.
-n : precede each matching line with the line number.
-r : recurse directories.
-v : display those lines that do not match.
Both “egrep” and “fgrep” are deprecated.
Try some of them and see the different results. You can use more than one option at a time. For example, to count the lines in the file that do not contain the words “open”, “Open”, “OPEN”, etc., type:
$ grep -ivc open unixpast.txt
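To see how the options combine, the sketch below runs three counts against a five-line sample file. The file and its contents are made up for illustration and stand in for unixpast.txt.

```shell
# Five sample lines: three contain "open" in some letter case, two do not.
tmpfile=$(mktemp)
printf '%s\n' 'Open systems' 'the open road' 'closed door' \
              'OPEN all hours' 'nothing here' > "$tmpfile"

case_sensitive=$(grep -c open "$tmpfile")   # lines containing exactly "open"
any_case=$(grep -ic open "$tmpfile")        # lines containing open/Open/OPEN
without=$(grep -ivc open "$tmpfile")        # lines containing none of them

echo "case-sensitive matches: $case_sensitive"
echo "case-insensitive matches: $any_case"
echo "lines without a match: $without"

rm -f "$tmpfile"
```

The case-insensitive count and the inverted count always add up to the number of lines in the file.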
To do a case-insensitive recursive search for “open” below the current directory, use one of the following two commands. The first reports just the names of the files containing the string. The second also reports the number of matching lines in each file.
$ grep -irl open .
$ grep -irc open . | grep -v ':0$'
The option -c prints the count of matching lines for every file, even if the count is zero. To get only the files that contain the string, the result is piped to a second grep that excludes the lines matching the regexp “:0$” (lines ending with “:0”).
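The two recursive searches can be tried out safely on a throwaway directory. In this sketch the directory and both file names are hypothetical: a.txt contains two matching lines and b.txt none.

```shell
# Throwaway directory with one file that matches and one that does not.
dir=$(mktemp -d)
printf 'Open\nopen\n' > "$dir/a.txt"
printf 'nothing\n'    > "$dir/b.txt"

# -l: names of the files that contain a match.
names_only=$(grep -irl open "$dir")

# -c lists every file with its count; the second grep drops the ":0" lines.
with_counts=$(grep -irc open "$dir" | grep -v ':0$')

echo "names only:  $names_only"
echo "with counts: $with_counts"

rm -rf "$dir"
```

A count such as 10 is not filtered out by mistake, because the pattern requires the colon to come directly before the final 0.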
Comparing files
diff
This command compares the contents of two files and displays the differences. Suppose you have a file called file1.txt and you edit some part of it and save it as file2.txt. To see the differences type:
$ diff file1.txt file2.txt
Lines beginning with < come from file1.txt, while lines beginning with > come from file2.txt.
You can also recursively compare two file trees:
$ diff -r dir1/ dir2/
To list only the names of the files that differ, use --brief:
$ diff --brief -r dir1/ dir2/
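For a concrete picture of the output, the sketch below compares two three-line files that differ only in their second line. The file contents are made up for this example. It also captures the exit status, which is 0 when the files are identical and 1 when they differ.

```shell
# Two sample files that differ only on line 2.
dir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$dir/file1.txt"
printf 'alpha\nBETA\ngamma\n' > "$dir/file2.txt"

# Capture both the output and the exit status of diff.
status=0
output=$(diff "$dir/file1.txt" "$dir/file2.txt") || status=$?

echo "$output"
echo "exit status: $status"

rm -rf "$dir"
```

The first output line, 2c2, says that line 2 of the first file was changed into line 2 of the second. The < and > lines then show the old and new text.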
Counting
wc (word count)
A handy little utility is the wc command, short for word count. It actually counts characters, words and lines.
To do a word count on unixpast.txt, type:
$ wc -w unixpast.txt
To find out how many lines the file has, type:
$ wc -l unixpast.txt
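When only the number is wanted, for instance in a script, one common approach is to feed the file to wc on standard input so that no file name is printed. The sketch below uses a made-up two-line file. The tr call strips the padding that some wc implementations add, and -c counts bytes.

```shell
# Sample file: two lines, three words, 14 bytes.
tmpfile=$(mktemp)
printf 'one two\nthree\n' > "$tmpfile"

# With no options, wc prints lines, words and bytes followed by the file name.
wc "$tmpfile"

# Reading from standard input gives the bare numbers.
lines=$(wc -l < "$tmpfile" | tr -d ' ')
words=$(wc -w < "$tmpfile" | tr -d ' ')
bytes=$(wc -c < "$tmpfile" | tr -d ' ')

echo "lines=$lines words=$words bytes=$bytes"

rm -f "$tmpfile"
```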
Summary
Command | Meaning |
---|---|
find | search for files in a directory hierarchy |
cp file1 file2 | copy file1 and call it file2 |
mv file1 file2 | move or rename file1 to file2 |
rm file | remove a file |
rmdir directory | remove a directory |
cat file | display a file |
less file | display a file a page at a time |
head file | display the first few lines of a file |
tail file | display the last few lines of a file |
grep 'keyword' file | search a file for keywords |
wc file | count number of lines/words/characters in file |