    UNIX Shell Programming : grep,sed and awk

    Unix Shell Programming Part 3 grep, sed and awk

    Contents :-

    grep:-Operation, grep Family, Searching for File Content.
    sed:-Scripts, Operation, Addresses, commands, Applications, grep and sed.
    awk:-Execution, Fields and Records, Scripts, Operations, Patterns, Actions, Associative Arrays, String Functions, Mathematical Functions, User Defined Functions, Using System commands in awk, Applications of awk, grep and sed

    grep

    grep stands for global regular expression print. It is a family of programs that is used to search the input file for all lines that match a specified regular expression and write them to the standard output file (monitor).

    Operation

    To write scripts that operate correctly, we must understand how the grep utilities work. For each line in the standard input (input file or keyboard), grep performs the following operations:
    1.     Copies the next input line into the pattern space. The pattern space is a buffer that can hold only one text line.
    2.     Applies the regular expression to the pattern space.
    3.     If there is a match, the line is copied from the pattern space to the standard output. 
    The grep utilities repeat these three operations on each line in the input.

    grep Flowchart

    Two points should be noted about the flowchart for the grep utility. First, the flowchart assumes that no options were specified. Selecting one or more options will change the flowchart. Second, although grep keeps a current line counter so that it always knows which line is being processed, the current line number is not reflected in the flowchart.

    grep Operation Example

    Let’s take a simple example. We have a file with four lines, and our aim is to display every line in the file that contains UNIX. Three of the four lines match the grep expression. Keep in mind what grep can and cannot do:
    1.     grep is a search utility; it can search only for the existence of a line that matches a regular expression.
    2.     The only action that grep can perform on a line is to send it to standard output. If the line does not match the regular expression, it is not printed.
    3.     The line selection is based only on the regular expression. The line number or other criteria cannot be used to select the line.
    4.     grep is a filter. It can be used at the left- or right-hand side of a pipe.
    5.     grep cannot be used to add, delete or change a line.
    6.     grep cannot be used to print only part of a line.
    7.     grep cannot read only part of a file.
    8.     grep cannot select a line based on the contents of the previous or the next line. There is only one buffer, and it holds only the current line.
    The file contents are:
    Only one UNIX
    DOS only here
    Mac OS X is UNIX
    Linux is UNIX
    $ grep 'UNIX' file1
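    The run above can be reproduced with a short sketch. It assumes a POSIX shell with mktemp available; the temporary directory and the variable names are only illustrative.

```shell
# Recreate the four-line sample file in a temporary directory (illustrative path).
dir=$(mktemp -d)
printf '%s\n' 'Only one UNIX' 'DOS only here' 'Mac OS X is UNIX' 'Linux is UNIX' > "$dir/file1"

# grep reads one line at a time, applies the pattern, and prints only the lines that match.
matches=$(grep 'UNIX' "$dir/file1")
printf '%s\n' "$matches"

rm -r "$dir"
```

    Only the line "DOS only here" is left out, because it is the only line without the string UNIX.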

    grep Family

    There are three utilities in the grep family: grep, egrep and fgrep. All three search one or more files and output lines that contain text that matches criteria specified as a regular expression. The whole line does not have to match the criteria; any matching text in the line is sufficient for it to be output. Each utility examines every line in the file, one by one. When a line contains a matching pattern, the line is output. Although this is a powerful capability that quickly reduces a large amount of data to a meaningful set of information, it cannot be used to process only a portion of the data.
    fgrep (Fast grep): supports only string patterns, no regular expressions.
    grep: supports only a limited number of regular expressions.
    egrep (Extended grep): supports most regular expressions but not all of them.

    grep Family Options

    There are several options available to the grep family. A summary is given below:
    Option    Explanation
    -b        Precedes each line by the file block number in which it is found.
    -c        Prints only a count of the number of lines matching the pattern.
    -i        Ignores upper- / lowercase in matching text.
    -l        Prints a list of files that contain at least one line matching the pattern.
    -n        Shows line number of each line before the line.
    -s        Silent mode. On POSIX systems it suppresses error messages about nonexistent or unreadable files; use -q to suppress all output.
    -v        Inverse output. Prints lines that do not match pattern.
    -x        Prints only lines that entirely match pattern.
    -f file   List of strings to be matched are in file.
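    A few of these options can be tried on a small sample. The sketch below assumes a POSIX grep and uses a hypothetical five-line file; the variable names are illustrative.

```shell
dir=$(mktemp -d)
printf '%s\n' 'Only one UNIX' 'DOS only here' 'Mac OS X is UNIX' 'Linux is UNIX' 'UNIX' > "$dir/file1"

count=$(grep -c 'UNIX' "$dir/file1")      # -c: count of matching lines
numbered=$(grep -n 'DOS' "$dir/file1")    # -n: line number before each matching line
inverse=$(grep -v 'UNIX' "$dir/file1")    # -v: lines that do not match
nocase=$(grep -ci 'unix' "$dir/file1")    # -i: ignore case (combined with -c)
whole=$(grep -x 'UNIX' "$dir/file1")      # -x: only lines that match in their entirety

printf '%s\n' "$count" "$numbered" "$inverse" "$nocase" "$whole"
rm -r "$dir"
```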

    grep Family Expressions

    As we have seen before, fast grep (fgrep) uses only sequence operators in a pattern; it does not support any of the other regular expression operators. Basic grep and extended grep (egrep) both accept regular expressions as shown in table below:
    Atoms            grep   fgrep   egrep      Operators     grep        fgrep   egrep
    Character        yes    yes     yes        Sequence      yes         yes     yes
    Dot              yes            yes        Repetition    All but ?           * ? +
    Class            yes            yes        Alternation                       yes
    Anchors          yes            ^ $        Group                             yes
    Back Reference   yes                       Save          yes
    Expressions in the grep utilities can become quite complex, often combining several atoms and/or operators into one large expression. When operators and atoms are combined, they are generally enclosed in either single quotes or double quotes. Technically, the quotes are needed only when the expression contains a blank or another character that has a special meaning to the shell. As a good technique, we should always use them.

    grep

    The original of the file-matching utilities, grep handles most of the regular expressions. The middle road between the other two members of the family, grep allows regular expressions but is generally slower than egrep. It is the only member of the grep family that allows saving the results of a match for later use. In the example below, we use grep to find all the lines that end in the word the and then pipe the results to head and print the first five.
    students@firewall:~/test$ man bash > bash.txt
    students@firewall:~/test$ grep -n "the$" bash.txt | head -5
    24:       In  addition  to  the  single-character shell options documented in the
    46:                 shopt_option is one of the  shell  options  accepted  by  the
    51:                 the  standard  output.   If  the invocation option is +O, the
    115:       If arguments remain after option processing, and neither the -c nor the
    116:       -s option has been supplied, the first argument is assumed  to  be  the

    Fast grep

    If your search criteria require only sequence expressions, fast grep (fgrep) is the best utility. Because its expressions consist of only sequence operators, it is also easiest to use if you are searching for text characters that are the same as regular expression operators, such as the escape, parentheses or quotes. For example, to extract all lines of bash.txt that contain an apostrophe, we could use fgrep as shown below:
    students@firewall:~/test$ fgrep -n "'" bash.txt | tail -5
    5235:              job spec is given, all processes  in  that  job's  pipeline  are
    5334:       to  mail that as well!  Suggestions and `philosophical' bug reports may
    5344:       A short script or `recipe' which exercises the bug
    5353:       It's too big and too slow.
    5362:       Compound commands and command sequences of the form `a ; b ; c' are not

    Extended grep

    Extended grep (egrep) is the most powerful of the three grep utilities. While it doesn’t have the save option, it does allow more complex patterns. Consider the case where we want to extract all lines that start with a capital letter and end in letter ‘N’ (uppercase n). Our first attempt at this command is shown below:
    students@firewall:~/test$ egrep -n '^[A-Z].*N$' bash.txt | head -5
    14:DESCRIPTION
    126:INVOCATION
    1288:EXPANSION
    1748:REDIRECTION
    2060:ARITHMETIC EVALUATION
    This is a relatively complex expression. It has three parts. The first looks for any line that starts with an uppercase letter. The second part says it can be followed by any character zero or more times. The third part says the line must end with an uppercase ‘N’.
    While the above expression is fine, we want to extend it so that it also finds all lines that start with a space ‘ ‘ and end with either a comma ‘,’ or a period ‘.’. To do this, we can use the alternation operator. We design a pattern for each requirement and then combine the three patterns with the alternation operator. Note that the period must be escaped with a backslash; an unescaped dot matches any character. The result is given below:
    students@firewall:~/test$ egrep -n '(^[A-Z].*N$)|(^ .*,$)|(^ .*\.$)' bash.txt | head -406 | tail -11
    2058:       recursive calls.
    2060:ARITHMETIC EVALUATION
    2064:       for overflow, though division by 0 is trapped and flagged as an  error.
    2068:       order of decreasing precedence.
    2099:       0 when referenced by name without using the parameter expansion syntax.
    2104:       to be used in an expression.
    2114:       35.
    2118:       above.
    2126:       the file argument to  one  of  the  primaries  is  one  of  /dev/stdin,
    2127:       /dev/stdout,  or /dev/stderr, file descriptor 0, 1, or 2, respectively,
    2128:       is checked.
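
    The effect of each alternative can be checked on a few hand-made lines. This is a minimal sketch that assumes a POSIX grep, where grep -E accepts the same extended expressions as egrep; the sample lines are invented for illustration. One detail worth checking is the last alternative: to match a literal period the dot needs a backslash, otherwise it matches any character.

```shell
# Use the C locale so that [A-Z] means exactly the uppercase ASCII letters.
LC_ALL=C
export LC_ALL

sample=$(printf '%s\n' 'DESCRIPTION' ' a line ending in a comma,' ' a line ending in a period.' ' no match here' 'lowercase start N')

# Three alternatives: uppercase start and final N, leading space and final comma,
# leading space and final (escaped) period.
result=$(printf '%s\n' "$sample" | grep -E '(^[A-Z].*N$)|(^ .*,$)|(^ .*\.$)')

# With the dot left unescaped, the third alternative accepts any indented line.
unescaped_count=$(printf '%s\n' "$sample" | grep -Ec '(^[A-Z].*N$)|(^ .*,$)|(^ .*.$)')

printf '%s\n' "$result" "$unescaped_count"
```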

    Examples

    1.     Select the lines from the file that have exactly three characters.
    egrep '^...$' testFile
    2.     Select the lines from the file that have at least three characters.
    egrep '...' testFile
    3.     Select the lines from the file that have three or fewer characters.
    egrep -vn '....' testFile
    4.     Count the number of blank lines in the file.
    egrep -c '^$' testFile
    5.     Count the number of nonblank lines in the file.
    egrep -c '.' testFile
    6.     Select the lines from the file that have the string UNIX.
    fgrep 'UNIX' testFile
    7.     Select the lines from the file that have only the string UNIX.
    egrep '^UNIX$' testFile
    8.     Select the lines from the file that have the pattern UNIX at least two times.
    egrep 'UNIX.*UNIX' testFile
    9.     Copy the file to the monitor but delete the blank lines.
    egrep -v '^$' testFile
    10.    Select the lines from the file that have at least two digits without any other characters in between.
    egrep '[0-9][0-9]' testFile
    11.    Select the lines from the file whose first nonblank character is A.
    egrep '^ *A' testFile
    12.    Select the lines from the file that do not start with A to G.
    egrep -n '^[^A-G]' testFile
    13.    Find out if John is currently logged into the system.
    who | grep 'John'
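
    Several of the examples above can be run against a small test file. The sketch assumes a POSIX grep (grep -E in place of egrep) and a hypothetical six-line testFile whose contents are invented for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' 'abc' 'UNIX' '' 'ab' 'UNIX and UNIX' 'room 42' > "$dir/testFile"

exactly3=$(grep -E '^...$' "$dir/testFile")      # example 1: exactly three characters
short=$(grep -Ev '....' "$dir/testFile")         # example 3: three or fewer characters
blank=$(grep -Ec '^$' "$dir/testFile")           # example 4: number of blank lines
twice=$(grep -E 'UNIX.*UNIX' "$dir/testFile")    # example 8: UNIX at least twice
digits=$(grep -E '[0-9][0-9]' "$dir/testFile")   # example 10: two adjacent digits

printf '%s\n' "$exactly3" "$short" "$blank" "$twice" "$digits"
rm -r "$dir"
```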

    Searching for File Content

    Some modern operating systems allow us to search for a file based on a phrase contained in it. This is especially handy when we have forgotten the filename but know that it contains a specific expression or set of words. Although UNIX doesn’t have this capability, we can use the grep family to accomplish the same thing.

    Search a Specific Directory

    When we know the directory that contains the file, we can simply use grep by itself. For example, to find a list of all files in the current directory that contain "bash", we use the search below. The -l option prints the filename of any file that has at least one line that matches the grep expression.
    students@firewall:~/test$ ls
    bash.txt     cmpFile1       fgLoop.scr  file3         result.txt
    biju.txt     cmpFile2       file1       goodStudents
    censusFixed  dastardly.txt  file2       mylist
    students@firewall:~/test$ grep -l 'bash' *
    bash.txt
    biju.txt
    fgLoop.scr

    Search All Directories in a Path

    When we don’t know where the file is located, we must use the find command with the execute criterion. The find command begins by executing the specified command, in this case a grep search, using each file in the starting directory. It then moves through the subdirectories of that directory, applying the grep command. After each directory, it processes its subdirectories until all directories have been processed. The semicolon that ends the -exec clause must be escaped so that the shell passes it on to find.
    students@firewall:~/test$ find ~ -type f -exec grep -l "bash" {} \;
    /home/students/.bash_logout
    /home/students/passwd
    /home/students/test/fgLoop.scr
    /home/students/test/bash.txt
    /home/students/test/biju.txt
    /home/students/.profile
    /home/students/.bash_history
    /home/students/assg.txt
    /home/students/pd2.txt
    /home/students/sort.txt
    /home/students/.bashrc
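
    The same search can be tried safely in a scratch directory. This sketch assumes mktemp is available; the directory layout and file names are invented for illustration.

```shell
root=$(mktemp -d)
mkdir -p "$root/sub/deeper"
printf 'uses bash\n'    > "$root/a.txt"
printf 'nothing here\n' > "$root/sub/b.txt"
printf '#!/bin/bash\n'  > "$root/sub/deeper/c.sh"

# find visits every regular file under $root and runs grep -l on each one;
# grep prints the file name only when at least one line matches.
found=$(find "$root" -type f -exec grep -l 'bash' {} \; | sort)
printf '%s\n' "$found"

rm -r "$root"
```

    The output is piped through sort only to make the order predictable; find itself reports files in directory order.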

    sed

    sed is an acronym for stream editor. Although the name implies editing, it is not a true editor; it does not change anything in the original file. Rather, sed scans the input file, line by line, and applies a list of instructions (called a sed script) to each line in the input file. The script, which is usually a separate file, can be included in the sed command line if it is a one-line command. The sed utility has three useful options. Option -n suppresses the automatic output. It allows us to write scripts in which we control the printing. Option -f indicates that there is a script file, whose name immediately follows on the command line. The third option, -e, is the default. It indicates that the script is on the command line, not in a file.

    Scripts

    The sed utility is called like any other utility. In addition to input data, sed also requires one or more instructions that provide editing criteria. When there is only one command, it may be entered from the keyboard. Most of the time, instructions are placed in a file known as a sed script (program). Each instruction in a sed script contains an address and a command.

    Script Formats

    When the script fits in a few lines, its instructions can be included in the command line. The script must be enclosed in quotes. For longer scripts, or for scripts that are going to be executed repeatedly over time, a separate script file is preferred. The file is created with a text editor and saved. We may give an extension .sed to indicate that it is a sed script. Examples of both are given below:
                $ sed -e 'address command' input_file
                $ sed -f script.sed input_file
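
    Both formats give the same result when the script is the same. The sketch below assumes a POSIX sed; the file names input_file and script.sed follow the text, while their contents are invented for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' > "$dir/input_file"

# Format 1: the instruction is given on the command line, enclosed in quotes.
inline=$(sed -e '1s/alpha/ALPHA/' "$dir/input_file")

# Format 2: the same instruction is stored in a script file and named with -f.
printf '%s\n' '# uppercase the first word' '1s/alpha/ALPHA/' > "$dir/script.sed"
fromfile=$(sed -f "$dir/script.sed" "$dir/input_file")

printf '%s\n' "$inline" "$fromfile"
rm -r "$dir"
```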

    Instruction Format

    Each instruction consists of an address, an optional complement operator (!), and a command:
                address[!]command
    The address selects the line to be processed (or not processed) by the command. The exclamation point (!) is an optional address complement. When it is not present, a line is selected only if it matches the address. When the complement operator is present, any line that does not match the address is selected; lines that match the address are skipped. The command indicates the action that sed is to apply to each input line that is selected.

    Comments

    A comment is a script line that documents or explains one or more instructions in a script. It is provided to assist the reader and is ignored by sed. Comment lines begin with a comment token, which is the pound sign (#). If the comment requires more than one line, each line must start with the comment token.
                # This line is a comment
                2,14s/A/B/
                30d
                42d

    Operation

    Each line in the input file is given a line number by sed. This number can be used to address lines in the text. For each line, sed performs the following operations:
    1.     Copies an input line to the pattern space. The pattern space is a special buffer capable of holding one or more text lines for processing.
    2.     Applies all the instructions in the script, one by one, to all pattern space lines that match the specified addresses in the instruction.
    3.     Copies the contents of the pattern space to the output file unless directed not to by the -n option flag.
    sed does not change the input file. All modified output is written to standard output and must be redirected to a file if it is to be saved.
    When all of the commands have been processed, sed repeats the cycle starting with step 1. When you examine this process carefully, you will note that there are two loops in this processing cycle. One loop processes all of the instructions against the current line. The second loop processes all lines.
    A second buffer, the hold space, is available to temporarily store one or more lines as directed by the sed instructions.
    To fully understand how sed interacts with the input file, let’s look at the example. The data file hello.dat contains the following text:
    Hello friends
    Hello guests
    Hello students
    Welcome
    And the scripts file hello.sed contains the following lines:
    1,3s/Hello/Greetings/
    2,3s/friends/buddies/
    Now executing the command:
    $ sed -f hello.sed hello.dat
    Let’s go through the different steps. Each line of the input file is copied over to the pattern space, and then all instructions are applied and then output.
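
    The cycle can be observed by running the example. This sketch assumes a POSIX sed and recreates hello.dat and hello.sed in a temporary directory, which is an illustrative choice.

```shell
dir=$(mktemp -d)
printf '%s\n' 'Hello friends' 'Hello guests' 'Hello students' 'Welcome' > "$dir/hello.dat"
printf '%s\n' '1,3s/Hello/Greetings/' '2,3s/friends/buddies/' > "$dir/hello.sed"

# Every line passes through both instructions, but each instruction acts
# only on lines inside its own address range.
out=$(sed -f "$dir/hello.sed" "$dir/hello.dat")
printf '%s\n' "$out"

rm -r "$dir"
```

    Lines 1 to 3 have Hello replaced by Greetings. The second instruction covers lines 2 and 3 only, and neither of those contains friends, so line 1 keeps the word friends and line 4 is copied unchanged.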

    Addresses

    The address in an instruction determines which lines in the input file are to be processed by the commands in the instruction. Addresses in sed can be one of four types: single line, set of lines, range of lines and nested addresses.

    Single-Line Addresses

    A single-line address specifies one and only one line in the input file. There are two single-line formats: a line number or a dollar sign ($), which specifies the last line in the input file. Examples are:
    4command
    16command
    $command

    Set-of-Line Addresses

    A set-of-line address is a regular expression that may match zero or more lines, not necessarily consecutive, in the input file. The regular expression is written between two slashes. Any line in the input file that matches the regular expression is processed by the instruction command. Two important points need to be noted: First, the regular expression may match several lines that may or may not be consecutive. Second, even if a line matches, the instruction may not find the data to be replaced. Examples are:
    /^A/command
    /B$/command
    A special case of a set-of-line address is the every-line address. When the regular expression is missing, every line is selected. In other words, when there is no address, every line matches.
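
    Single-line and set-of-line addresses can be compared side by side. The sketch below assumes a POSIX sed and uses the print command (p) together with -n so that only the addressed lines appear; the sample text is invented for illustration.

```shell
text=$(printf '%s\n' 'Apple' 'Bob' 'Avocado' 'Cab')

second=$(printf '%s\n' "$text" | sed -n '2p')        # single line: line number 2
last=$(printf '%s\n' "$text" | sed -n '$p')          # single line: $ is the last line
starts_a=$(printf '%s\n' "$text" | sed -n '/^A/p')   # set of lines: every line starting with A
ends_b=$(printf '%s\n' "$text" | sed -n '/b$/p')     # set of lines: every line ending in b
every=$(printf '%s\n' "$text" | sed -n 'p')          # no address: every line is selected

printf '%s\n' "$second" "$last" "$starts_a" "$ends_b"
```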

    Range Addresses

    An address range defines a set of consecutive lines. Its format is start address, comma with no space, and end address:
                start-address,end-address
    The start and end address can be a sed line number or a regular expression as in the next example:
    line-number,line-number
    line-number,/regexp/
    /regexp/,line-number
    /regexp/,/regexp/
    When a line that is in the pattern space matches a start range, it is selected for processing. At this point, sed notes that the instruction is in a range. Each input line is processed by the instruction’s command until the stop address matches a line. The line that matches the stop address is also processed by the command, but at that point, the range is no longer active. If at some future line the start range again matches, the range is again active until a stop address is found. Two important points need to be noted: First, while a range is active, all other instructions are also checked to determine if any of them also match an address. Second, more than one range may be active at a time. Examples are given below:
    Sample input lines (each identified by its first letter): A, B, C, D, B, C, A, A, B, C, A, C
    Example range addresses applied to these lines:
                3,/^A/
                /^A/,/^B/
    A special case of range address is 1,$, which defines every line from the first line (1) to the last line ($). However, this special-case address is not the same as the set-of-lines special case, which is no address at all. Given the following two instructions:
                1. command            2. 1,$command
    sed interprets the first as a set-of-line address and the second as a range address. Some commands, such as insert (i) and append (a), can be used only with a set-of-line address. These commands can be used with no address, but they do not accept a 1,$ address.
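
    How a range switches on and off can be seen on a short input. This is a sketch assuming a POSIX sed; the seven sample lines are invented, and -n with the print command (p) is used so that only the selected lines are shown.

```shell
text=$(printf '%s\n' 'A1' 'B1' 'C1' 'A2' 'x' 'B2' 'C2')

# /^A/,/^B/ becomes active at each line starting with A and stays active
# through the next line starting with B; it can become active again later.
regex_range=$(printf '%s\n' "$text" | sed -n '/^A/,/^B/p')

# 3,/^A/ starts at line 3 and ends at the first later line starting with A.
mixed_range=$(printf '%s\n' "$text" | sed -n '3,/^A/p')

printf '%s\n' "$regex_range" '--' "$mixed_range"
```

    The first range selects two separate blocks (A1 to B1, then A2 to B2). The second selects lines 3 and 4 only, because a line-number start address can match just once.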

    Nested Addresses

    A nested address is an address that is contained within another address. While the outer (first) address range, by definition, must be either a set of lines or an address range, the nested address may be either a single line, a set of lines or another range.
    Let’s look at two examples. In the first example, we want to delete all blank lines between lines 20 and 30. The first command specifies the line range: it is the outer command. The second command, which is enclosed in braces, contains the regular expression for a blank line. It contains the nested address.
                20,30{
                            /^$/d
                            }
    In the second example, we want to delete all lines that contain the word Raven, but only if the line also contains the word Quoth. In this case, the outer address searches for lines containing Raven, while the inner address looks for lines containing Quoth. What is especially interesting about this example is that the outer address is not a block of lines but a set of lines spread throughout the file.
                /Raven/{
                            /Quoth/d
                            }
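
    Both kinds of nesting can be tried on a few lines. The sketch assumes a POSIX sed; the five-line sample (with a blank fourth line) is invented, and the line range 4,5 stands in for the 20,30 of the text.

```shell
text=$(printf '%s\n' 'Quoth the Raven' 'The Raven sat' 'Quoth he' '' 'end')

# Outer address: lines containing Raven. Inner address: of those, delete the ones containing Quoth.
no_quoth_raven=$(printf '%s\n' "$text" | sed '/Raven/{
/Quoth/d
}')

# Outer address: lines 4 through 5. Inner address: delete blank lines inside that range.
no_blank=$(printf '%s\n' "$text" | sed '4,5{
/^$/d
}')

printf '%s\n' "$no_quoth_raven" '--' "$no_blank"
```

    In the first run the line "Quoth he" survives because it is never selected by the outer address.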

    Commands

    There are 25 commands that can be used in an instruction. They may be grouped into nine categories based on how they perform their task. They are Line Number commands, Modify commands, Substitute commands, Transform commands, Input/Output commands, Files commands, Branch commands, Hold space commands and Quit commands.

     

    Line Number Command

    The line number command (=) writes the current line number to the output, ahead of the line itself, without affecting the pattern space. It is similar to the grep -n option. The only difference is that the line number is written on a separate line. The following example shows the usage. Note that this example uses the special case of the set-of-line address: there is no address, so the command applies to every line.
    $ sed '=' TheRavenV1
    1
    Once upon a midnight dreary, while I pondered, weak and weary
    2
    Over many a quaint and curious volume of forgotten lore
    3
    While I nodded, nearly napping, suddenly there came a tapping
    4
    As of someone gently rapping, rapping at my chamber door.
    5
    “’Tis some visitor,” I muttered, “tapping at my chamber door
    6
    Only this and nothing more.”
    In the next example, we print only the line numbers of lines beginning with an uppercase O. To do this we must use the -n option.
    $ sed -n '/^O/=' TheRavenV1
    2
    6

    Modify Commands

    Modify commands are used to insert, append, change or delete one or more whole lines. The modify commands require that any text associated with them be placed on the next line in the script. Therefore, the script must be in a file; it cannot be coded on the shell command line. Also, the modify commands operate on the whole line. In other words, they are line replacement commands. This means that we can’t use these sed commands to insert text into the middle of a line. Whatever text you supply will completely replace any lines that match the address.

    Insert Command (i)

    Insert adds one or more lines directly to the output before the address. This command can only be used with the single line and a set of lines; it cannot be used with a range. In the next example, we insert a title at the beginning of Poe’s “The Raven”.
    $ sed -f insertTitle.sed TheRavenV1 | cat -n
    # Script Name: insertTitle.sed
    # Adds a title to file
    1i\
                            The Raven\
                                        By\
                            Edgar Allan Poe
    1:                    The Raven
    2:                                By
    3:                    Edgar Allan Poe
    4: Once upon a midnight dreary, …
    If you use the insert command with the all lines address, the lines are inserted before every line in the file. This is an easy way to quickly double space a file.
    $ sed -f insertBlankLines.sed TheRavenV1
    # Script Name: insertBlankLines.sed
    # This script inserts a blank line before all lines in a file
    i\

    # End of script

    Append Command (a)

    Append is similar to the insert command except that it writes the text directly to the output after the specified line. Like insert, append cannot be used with a range address. Inserted and appended text never appear in sed’s pattern space. They are written to the output before the specified line (insert) or after the specified line (append), even if the pattern space is not itself written. Because they are not inserted into the pattern space, they cannot match a regular expression, nor do they affect sed’s internal line counter. The following example demonstrates the append command by appending a dashed line separator after every line and “The End” after the last line of “The Raven”.
    $ sed -f appendLineSep.sed TheRavenV1
    # Script Name: appendLineSep.sed
    # This script appends dashed dividers after each line
    a\
    ---------------------------------
    $a\
                                        The End

    Change Command (c)

    Change replaces a matched line with new text. Unlike insert and append, it accepts all four address types. In the next example, we replace the second line of Poe’s classic with a common thought expressed by many a weary calculus student.
    $ sed -f change.sed TheRavenV1
    # Script Name: change.sed
    # Replace second line of The Raven
    2c\
    Over many an obscure and meaningless problem of calculus bore
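
    The three text commands can be combined in one script file. In POSIX sed, each of i, a and c is followed by a backslash, and the text starts on the next line. The sketch below assumes a POSIX sed; the file names poem and modify.sed and their contents are invented for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' 'line one' 'line two' 'line three' > "$dir/poem"

# The quoted here-document keeps the backslashes exactly as written.
cat > "$dir/modify.sed" <<'EOF'
# insert a title before line 1
1i\
TITLE
# replace line 2
2c\
replaced second line
# append a closing line after the last line
$a\
The End
EOF

out=$(sed -f "$dir/modify.sed" "$dir/poem")
printf '%s\n' "$out"

rm -r "$dir"
```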

    Delete Pattern Space Command (d)

    The delete command comes in two versions. When a lowercase delete command (d) is used, it deletes the entire pattern space. Any script commands following the delete command that also pertain to the deleted text are ignored because the text is no longer in the pattern space.
    $ sed '/^O/d' TheRavenV1

    Delete Only First Line Command (D)

    When an uppercase delete command (D) is used, only the first line of the pattern space is deleted. Of course, if there is only one line in the pattern space, the effect is the same as the lowercase delete.
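
    The difference between the two delete commands shows up only when the pattern space holds more than one line. The sketch below assumes a POSIX sed. It uses the N command, which appends the next input line to the pattern space and is not covered in the text above; the four sample lines are invented for illustration.

```shell
text=$(printf '%s\n' 'Once' 'Over' 'While' 'Only')

# d: delete the whole pattern space for every line that starts with O.
no_o=$(printf '%s\n' "$text" | sed '/^O/d')

# N joins the next line to the pattern space; D then drops only the first of
# the joined lines. Skipping both on the last line ($!) leaves the final two
# lines of the input in the pattern space, which are then printed.
last_two=$(printf '%s\n' "$text" | sed '$!N;$!D')

printf '%s\n' "$no_o" '--' "$last_two"
```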

    Substitute Command (s)

    Pattern substitution is one of the most powerful commands in sed. In general, substitute replaces text that is selected by a regular expression with a replacement string. Thus, it is similar to the search and replace found in text editors. With it, we can add, delete or change text in one or more lines.
                address  s/pattern/replacement string/flag(s)

     

    Search Pattern

    The sed search pattern uses only a subset of the regular expression atoms and operators. The allowable atoms and operators are shown below.
    Atoms            Allowed      Operators      Allowed
    Character        yes          Sequence       yes
    Dot              yes          Repetition     * ? {…}
    Class            yes          Alternation
    Anchors          ^ $          Group
    Back Reference   yes          Save           yes
    When a text line is selected, its text is matched to the pattern. If matching text is found, it is replaced by the replacement string. The pattern and replacement strings are separated by a triplet of identical delimiters, slashes (/) in the example given before. Any character can be used as the delimiters, although the slash is the most common.

    Pattern Matches Address

    If the address contains a regular expression that is same as the pattern we want to match, that is a special case. Here, we don’t need to repeat the regular expression in the substitute command. We do need to show that it is omitted, however, by coding two slashes at the beginning of the pattern. An example is given below.
    $ sed '/love/s//adore/' browning.txt
    Input:
    How do I love thee? Let me count the ways.
    I love thee to the depth and breadth and height
    My soul can reach, when feeling out of sight
    For the ends of being and ideal grace.
    I love thee to the level of everyday’s
    Most quiet need, by sun and candle-light.
    Output:
    How do I adore thee? Let me count the ways.
    I adore thee to the depth and breadth and height
    My soul can reach, when feeling out of sight
    For the ends of being and ideal grace.
    I adore thee to the level of everyday’s
    Most quiet need, by sun and candle-light.

    Replace String

    The replacement text is a string. Only one atom and two meta-characters can be used in the replacement string. The allowed replacement atom is the back reference. The two meta-character tokens are the ampersand (&) and the backslash (\). The ampersand places the matched text in the replacement string; the backslash escapes an ampersand when a literal ampersand is needed in the substitute text (if it is not escaped, it is replaced by the matched text). The following examples show how the meta-characters are used. In the first example, the replacement string becomes *** UNIX ***. In the second example, the replacement string is now & forever.
    $ sed 's/UNIX/*** & ***/' file1
    $ sed '/now/s//now \& forever/' file1
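
    The two uses of the ampersand can be checked on one-line inputs. This sketch assumes a POSIX sed; the input strings are invented for illustration.

```shell
# Unescaped &: replaced by the text that the pattern matched (UNIX).
stars=$(printf 'I use UNIX daily\n' | sed 's/UNIX/*** & ***/')

# Escaped \&: a literal ampersand. The empty pattern // reuses the address regex /now/.
literal=$(printf 'now\n' | sed '/now/s//now \& forever/')

printf '%s\n' "$stars" "$literal"
```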

    Substitute Operation

    As we have seen before, when the pattern matches the text, sed first deletes the text and then inserts the replacement text. This means that we can use the substitute command to add, delete or replace part of a line.
    Delete Part of a Line: To delete part of a line, we leave the replacement text empty. In other words, partial line deletes are a special substitution case in which the replacement is null. The following example deletes all digits in the input from standard input.
    $ sed 's/[0-9]//g'
    By default, the substitute command operates only on the first occurrence of a pattern in a line. In the above example, we wanted to delete all digits. Therefore, we used the global flag (g) at the end of the command. If we did not use it, only the first digit on each line would be deleted.
    Change Part of a Line: To change only part of a line, we create a pattern that matches the part to be changed and then place the new text in the replacement expression. In the following example, we change every space in the file to a tab.
    $ sed 's/ /     /g'
    Input:
    Now is the time
    For all good students
    To come to the aid
    of their college.
    Output:
    Now     is      the     time
    For     all     good    students
    To      come    to      the     aid
    of      their   college.
    Add to Part of a Line: To add text to a line requires both a pattern to locate the text and the text that is to be added. Because the pattern deletes the text, we must include it in the new text.
    The next example adds two spaces at the beginning of each line and two dashes at the end of each line.
    $ sed -f addPart.sed
    #!/bin/ksh
    # Script Name: addPart.sed
    # Adds two spaces at the beginning of line and -- to end of line
    s/^/  /
    s/$/--/

    Back References

    The examples in the previous section were all very simple and straightforward. More often, we find that we must restore the data that we deleted in the search. This problem is solved with back references. The sed utility uses two different back references in the substitution replacement string: whole pattern (&) and numbered buffer (\d). The whole pattern substitutes the deleted text into the replacement string. In numbered buffer replacement, whenever a saved part of the regular expression matches text, the text is placed sequentially in one of the nine buffers. Numbered buffer replacement (\d), in which the d is a number between 1 and 9, substitutes the numbered buffer contents in the replacement string.
                s/-----/-----&-----/
                s/\(-----\)…\(-----\)/------\1------\2/
    Whole Pattern Substitution: When a pattern substitution command matches text in the pattern space, the matched text is automatically saved in a buffer (&). We can then retrieve its contents and insert it anywhere, and as many times as needed, into the replacement string. Using the & buffer therefore allows us to match text, which automatically deletes it, and then restore it so that it is not lost. As an example, we can rewrite the previous example of adding two spaces at the beginning of line and two dashes at the end of the line with a single command.
    $ sed 's/^.*$/  &--/'
    Another example is to modify the price list of a restaurant menu so that the $ symbol is added before the prices.
    $ sed 's/[0-9]/$&/' priceFile
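    As a small sketch of whole pattern substitution, the two commands described above can be run on inline sample text. The menu lines are made-up sample data, not the contents of priceFile.

```shell
# & in the replacement stands for the whole matched text.
# Wrap the line: two leading spaces, two trailing dashes.
wrapped=$(printf 'Now is the time\n' | sed 's/^.*$/  &--/')
printf '%s\n' "$wrapped"

# Put a dollar sign in front of the first digit on each line.
# The menu items are invented for illustration.
priced=$(printf 'Coffee 3.50\nBagel 2.25\n' | sed 's/[0-9]/$&/')
printf '%s\n' "$priced"
```

    Because no g flag is used, only the first digit on each line gets the $ prefix, which is what a price list needs.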
    Numbered Buffer Substitution: Numbered buffer substitution uses one or more of the regular expression numbered buffers. We use it when the pattern matches part of the input text but not all of it. As an example, let’s write a script that reformats a social security number with dashes. We assume that all nine-digit numbers are social security numbers. There are three parts to a social security number: three digits-two digits-four digits. This problem requires that we find them and reformat them. Our script uses a search pattern that uses the numbered buffers to save three consecutive digits followed by two digits and then four digits. Once a complete match is found, the numbered buffers are used to reformat the numbers.
    $ sed 's/\([0-9]\{3\}\)\([0-9]\{2\}\)\([0-9]\{4\}\)/\1-\2-\3/' empFile
    Input:
    George Washington       001010001
    John Adams              002020002
    Thomas Jefferson        003030003
    James Madison           123456789
    Output:
    George Washington       001-01-0001
    John Adams              002-02-0002
    Thomas Jefferson        003-03-0003
    James Madison           123-45-6789
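    The following is a minimal sketch of numbered buffer substitution on inline input, written with the backslashes that basic regular expressions require for grouping and repetition counts. The second command, which swaps two words, is an extra illustration that is not part of the original example.

```shell
# \( \) saves a group; \{n\} repeats the preceding item n times;
# \1, \2 and \3 recall the saved groups in the replacement.
ssn=$(printf 'James Madison 123456789\n' |
  sed 's/\([0-9]\{3\}\)\([0-9]\{2\}\)\([0-9]\{4\}\)/\1-\2-\3/')
printf '%s\n' "$ssn"

# Groups can also be reordered: swap the first two words.
swapped=$(printf 'Madison James\n' | sed 's/\([^ ]*\) \([^ ]*\)/\2 \1/')
printf '%s\n' "$swapped"
```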
    Substitute Flags
    There are four flags that can be added at the end of the substitute command to modify its behaviour: global substitution (g), specific occurrence substitution (digit), print (p) and write file (w file-name).

    Global Flag

    By default, the substitute command replaces only the first occurrence of a pattern on each line. If there are multiple occurrences, none after the first are changed. Adding the global flag (g) replaces every occurrence.
    root@firewall:/var/log# sed 's/cat/dog/'
    Mary had a black cat and a white cat.
    Mary had a black dog and a white cat.
    root@firewall:/var/log# sed 's/cat/dog/g'
    Mary had a black cat and a white cat.
    Mary had a black dog and a white dog.

    Specific Occurrence Flag

    We now know how to change the first occurrence and all of the occurrences of a text pattern. Specific occurrence substitution (digit) changes any single occurrence of text that matches the pattern. The digit indicates which one to change. To change the second occurrence of a pattern, we use 2; to change the fifth, we use 5.
    root@firewall:/var/log# sed 's/cat/dog/2'
    Mary had a black cat, a yellow cat and a white cat.
    Mary had a black cat, a yellow dog and a white cat.
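    To compare the default behaviour with the global and specific occurrence flags side by side, a short sketch on one sample sentence:

```shell
line='Mary had a black cat, a yellow cat and a white cat.'

# No flag: only the first match on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/cat/dog/')
# Digit flag: only the second match is replaced.
second=$(printf '%s\n' "$line" | sed 's/cat/dog/2')
# g flag: every match is replaced.
all=$(printf '%s\n' "$line" | sed 's/cat/dog/g')

printf '%s\n' "$first" "$second" "$all"
```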

     

    Print Flag

    There are occasions when we do not want to print all of the output. For example, when developing a script, it helps to view only the lines that have been changed. To control the printing from within a script, we must first turn off the automatic printing. This is done with the -n option. Once the automatic printing has been turned off, we can add a print flag to the substitution command.
    -rw-r--r-- 1 root root       170 Feb 13 10:12 Feb2013
    root@firewall:~# ls -l | sed -n "/^-/s/\(-[^ ]*\).*:..\(.*\)/\1\2/p"
    -rw-r--r-- Feb2013
    -rw-r--r-- fileList
    -rw-r--r-- ifconfig.txt
    -rw-r--r-- iptables_1.lst
    -rw-r--r-- iptables_2.lst
    -rw-r--r-- iptables.lst
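    The interaction between the -n option and the p flag can be seen in a small sketch with three made-up input lines, only one of which is changed by the substitution:

```shell
# With -n, automatic printing is off, so only the line changed by
# the substitution (and printed by the p flag) appears.
only_changed=$(printf 'alpha\nbeta\ngamma\n' | sed -n 's/^b/B/p')
printf '%s\n' "$only_changed"

# Without -n, every line is printed once automatically and the
# changed line is printed a second time by the p flag.
with_auto=$(printf 'alpha\nbeta\ngamma\n' | sed 's/^b/B/p')
printf '%s\n' "$with_auto"
```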

    Write File Flag

    The write file command is similar to the print flag. The only difference is that rather than a print command we use the write command. One caution: there can be only one space between the command and the filename. To write the files in the previous example to a file, we would change the code as shown below.
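    The listing for this flag is not reproduced here, so the following is only a rough sketch of how the w flag might be used with the same substitution. The output file name plainFiles.txt, the temporary directory and the two ls -l lines are assumptions made for illustration.

```shell
tmpdir=$(mktemp -d)

# Two invented lines in ls -l format: one regular file, one directory.
printf '%s\n' \
  '-rw-r--r-- 1 root root 170 Feb 13 10:12 Feb2013' \
  'drwxr-xr-x 2 root root 4096 Feb 13 10:12 logs' |
  sed -n "/^-/s/\(-[^ ]*\).*:..\(.*\)/\1\2/w $tmpdir/plainFiles.txt"

# The changed line went to the file instead of standard output.
saved=$(cat "$tmpdir/plainFiles.txt")
printf '%s\n' "$saved"
rm -r "$tmpdir"
```

    Everything after the single space that follows w is taken as the file name, which is why the flag has to come last in the command.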

    Transform Command (y)

    It is sometimes necessary to transform one set of characters to another. For example, IBM mainframe text files are written in a coding system known as EBCDIC. In EBCDIC, the binary codes for characters are different from ASCII. To read an EBCDIC file, therefore, all characters must be transformed to their ASCII equivalent as the file is read. The transform command (y) requires two parallel sets of characters of equal length. Each character in the first string represents a value to be changed to its corresponding character in the second string. Another example is to translate the lowercase vowels to the uppercase vowels, as shown below:
    root@firewall:~# sed 'y/aeiou/AEIOU/'
    A good time was had by all Under the Harvest Moon last September.
    A gOOd tImE wAs hAd by All UndEr thE HArvEst MOOn lAst SEptEmbEr.

    Input and Output Commands

    The sed utility automatically reads text from the input file and writes data to the output file, usually standard output. In this section, we discuss commands that allow us to control the input and output more fully. There are five input/output commands: next (n), append next (N), print (p), print first line (P) and list (l).

    Next Command (n)

    The next command (n) forces sed to read the next text line from the input file. Before reading the next line, however, it copies the current contents of the pattern space to the output, deletes the current text in the pattern space, and then refills it with the next input line. After reading the input line, it continues processing through the script. In the next example, we use the next command to force data to be read. Whenever a line that starts with a digit is immediately followed by a blank line, we delete the blank line.
    students@firewall:~/test$ cat deleteBlankLines.sed
    # Script Name: deleteBlankLines.sed
    # This script deletes blank lines only if the preceding line starts with a number
    /^[0-9]/{
            n
            /^$/d
            }
    students@firewall:~/test$ sed -f deleteBlankLines.sed deleteBlankLines.dat
    Second Line: Line 1 & Line 3 blank
    4th line followed by non-blank line
    This is line 5
    6th line followed by blank line
    Last line (#8)
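    Since the input file is not shown, here is a minimal sketch of the same script applied to made-up inline input, so that the effect of n is visible. The script is passed as separate -e pieces instead of a script file.

```shell
# Lines starting with a digit: print the line, read the next one (n),
# and delete that next line if it is blank. Other blank lines stay.
result=$(printf '1 first\n\nplain\n\n2 second\nlast\n' |
  sed -e '/^[0-9]/{' -e 'n' -e '/^$/d' -e '}')
printf '%s\n' "$result"
```

    The blank line after "1 first" is removed, while the blank line after "plain" is kept because "plain" does not start with a digit.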

    Append Next Command (N)

    Whereas the next command clears the pattern space before inputting the next line, the append next command (N) does not. Rather, it adds the next input line to the current contents of the pattern space, separated by a newline. This is especially useful when we need to apply patterns to two or more lines at the same time.
    To demonstrate the append next command, we create a script that appends the second line to the first, the fourth to the third and so on until the end of the file. Note however that if we simply append the lines, when they are printed they will revert to two separate lines because there is a newline at the end of the first line. After we append the lines, therefore, we search for the newline and replace it with a space. The file consists of lines filled with the line number.
    $ sed -f appendLines.sed appendLines.dat
    # Script Name: appendLines.sed
    # This script appends every two lines so that the output is Line1 Line2,
    # Line3 Line4 etc.
    N
    s/\n/ /
    Input:
    11111one1111111111
    22222two2222222222
    33333three33333333
    44444four444444444
    55555five555555555
    Output:
    11111one1111111111 22222two2222222222
    33333three33333333 44444four444444444
    55555five555555555
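    Another interesting and very useful example replaces multiple blank lines with only one.
    $ sed -f appendBlkLines.sed appendBlkLines.dat
    /^$/{
                $!N
                /^\n$/D
                }
    The $!N command is interpreted as “if the current line is not the last line, append the next line to the pattern space”. The D command then deletes the first of the two blank lines and restarts the script with what remains.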
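    A brief sketch of the blank line squeezing script on made-up inline input, with the newline written as \n inside the pattern:

```shell
# Three blank lines between a and b, one between b and c.
squeezed=$(printf 'a\n\n\n\nb\n\nc\n' |
  sed -e '/^$/{' -e '$!N' -e '/^\n$/D' -e '}')
printf '%s\n' "$squeezed"
```

    Each run of blank lines collapses to a single blank line, and single blank lines pass through unchanged.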

    Print Command (p)

    The print command (p) copies the current contents of the pattern space to the standard output file. If there are multiple lines in the pattern space, they are all copied. The contents of the pattern space are not deleted by the print command.
    $ sed 'p' linesOfNums.dat
    Input:
    1111111111
    2222222222
    3333333333
    Output:
    1111111111
    1111111111
    2222222222
    2222222222
    3333333333
    3333333333

    Print First Line Command (P)

    Whereas the print command prints the entire contents of the pattern space, the print first line command (P) prints only the first line. That is, it prints the contents of the pattern space up to and including a newline character. Any text following the first newline is not printed.
    To demonstrate the print first line, let’s write a script that prints a line only if it is followed by a line that begins with a tab. This problem requires that we first append two lines in the pattern space. We then search the pattern space for a newline immediately followed by a tab. If we find this combination, we print only the first line. We then delete the first line only.
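    $ sed -nf printFirstLine.sed printFirstLine.dat
    # Script Name: printFirstLine.sed
    # In the pattern below, \t stands for a tab character
    # (GNU sed accepts \t; other versions may need a literal tab)
    $!N
    /\n\t/P
    D
    Input:
    This is line1.
    This is line2.
            Line 3 starts with a tab.
            Line 4 starts with a tab.
    This is line 5. It’s the last line.
    Output:
    This is line2.
            Line 3 starts with a tab.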

    List Command (l)

    The list command (l) converts the unprintable characters to their octal code.
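    As a small sketch of the list command, the lines below contain a tab and a control character. In the sed versions commonly found today, a tab is shown with the escape \t, other unprintable characters with a three-digit octal code, and the end of each line is marked with $.

```shell
# -n suppresses the normal output so only the l listing is shown.
tabbed=$(printf 'col1\tcol2\n' | sed -n 'l')
printf '%s\n' "$tabbed"

# \001 is the control character Ctrl-A.
ctrl=$(printf 'a\001b\n' | sed -n 'l')
printf '%s\n' "$ctrl"
```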

    File Commands

    There are two file commands that can be used to read and write files. The basic format for the read and write commands is shown below. There must be exactly one space between the command and the file name.
    address r file-name
    address w file-name

    Read File Command (r)

    The read file command (r) reads a file and places its contents in the output before moving to the next command. It is useful when you need to insert one or more common lines after text in a file. The contents of the file appear after the current line (pattern space) in the output. An example is to prepare a letter with a standard letter head and signature.
    $ sed -f readFile.sed readFile.dat
    # Script Name: readFile.sed
    1 r letterhead.dat
    $ r signature.dat
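    The following sketch shows where the text read by r ends up. The file names follow the script above, but their contents, the temporary directory and the input lines are made up for illustration.

```shell
tmpdir=$(mktemp -d)
printf 'HEADER\n' > "$tmpdir/letterhead.dat"
printf 'FOOTER\n' > "$tmpdir/signature.dat"

# 1r inserts the file after line 1; $r inserts after the last line.
letter=$(printf 'line 1\nline 2\n' |
  sed -e "1r $tmpdir/letterhead.dat" -e "\$r $tmpdir/signature.dat")
printf '%s\n' "$letter"
rm -r "$tmpdir"
```

    Note that the letterhead follows the first input line; r never places text before the line it is addressed to.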

    Write File Command (w)

    The write file command (w) writes (actually appends) the contents of the pattern space to a file. It is useful for saving selected data to a file. For example, let’s create an activity log in which entries are grouped by days of the week. The end of each day is identified by a blank line. The first group of entries represents Monday’s activity, the second group represents Tuesday and so forth. The first word in each activity line is the day of the week.
    $ sed -nf writeFile.sed aptFile.dat
    # Script Name: writeFile.sed
    /Monday/,/^$/w Monday.dat
    /Tuesday/,/^$/w Tuesday.dat
    /Wednesday/,/^$/w Wednesday.dat
    /Thursday/,/^$/w Thursday.dat
    /Friday/,/^$/w Friday.dat
    /Saturday/,/^$/w Saturday.dat
    /Sunday/,/^$/w Sunday.dat
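    A reduced sketch of the same idea with only two days, using made-up log entries and a temporary directory in place of aptFile.dat:

```shell
tmpdir=$(mktemp -d)

# Each range runs from the first line naming the day to the next
# blank line; w writes every line in the range to that day's file.
printf 'Monday standup\nMonday review\n\nTuesday planning\n\n' |
  sed -n -e "/Monday/,/^\$/w $tmpdir/Monday.dat" \
         -e "/Tuesday/,/^\$/w $tmpdir/Tuesday.dat"

# Command substitution drops the trailing blank line of each file.
monday=$(cat "$tmpdir/Monday.dat")
tuesday=$(cat "$tmpdir/Tuesday.dat")
printf '%s\n' "$monday" "$tuesday"
rm -r "$tmpdir"
```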

    Branch Commands

    The branch commands change the regular flow of the commands in the script file. Recall that for every line in the file, sed runs through the script file applying commands that match the current pattern space text. At the end of the script file, the text in the pattern space is copied to the output file, and the next text line is read into the pattern space, replacing the old text. Occasionally we want to skip the application of some commands. The branch commands allow us to do just that: skip one or more commands in the script file.

    Branch Label

    Each branch command must have a target, which is either a label or the last instruction in the script (a blank label). A label consists of a line that begins with a colon (:) and is followed by up to seven characters that constitute the label name. There can be no other commands or text on the script-label line other than the colon and the label name. The label name must immediately follow the colon; there can be no space between the colon and the name, and the name cannot have embedded spaces. An example of the label is:
    :comHere

    Branch Command

    The branch command (b) follows the normal instruction format consisting of an address, the command (b), and an attribute (target) that can be used to branch to the end of the script or to a specific location within the script. The target must be blank or match a script label in the script. If no label is provided, the branch is to the end of the script (after the last line), at which point the current contents of the pattern space are copied to the output file and the script is repeated for the next input line.
    The next example demonstrates the basic branch command. It prints lines in a file once, twice or three times depending on a print control at the beginning of the line.
    $ sed -f branch.sed branch.dat
    # Script Name: branch.sed
    # This script prints a line multiple times
    /(1)/ b
    /(2)/ b print2
    /(3)/ b print3
    # Branch to end of script
    b
    # print three
    :print3
    p
    p
    b
    # print two
    :print2
    p
    Input:
    Print me once.
    (2)Print me twice.
    (3)Print me thrice.
    (4)Print me once.
    Output:
    Print me once.
    (2)Print me twice.
    (2)Print me twice.
    (3)Print me thrice.
    (3)Print me thrice.
    (3)Print me thrice.
    (4)Print me once.

    Branch on Substitution Command

    Rather than branch unconditionally, we may need to branch only if a substitution has been made. In this case, we use the branch on substitution command, also known as the test command (t). Its format is the same as that of the basic branch command.
    students@firewall:~/test$ sed -f branchSub.sed branchSub.dat
    # Script Name: branchSub.sed
    # This script prints a line multiple times
    s/(1)//
    t
    s/(2)//
    t print2
    s/(3)//
    t print3
    # Branch to end of script
    b
    # print three
    :print3
    p
    p
    b
    # print two
    :print2
    p
    Input:
    (1)Print me once.
    (2)Print me twice.
    (3)Print me thrice.
    Default: print me once.
    Output:
    Print me once.
    Print me twice.
    Print me twice.
    Print me thrice.
    Print me thrice.
    Print me thrice.
    Default: print me once.
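    Another common use of t, sketched here as an illustration that is not part of the scripts above, is a loop: the script branches back to a label for as long as the substitution keeps succeeding. The label name a and the sample number are arbitrary choices.

```shell
# Insert thousands separators: each pass splits the last three
# digits off the leading run of digits, and "ta" jumps back to
# label a only if that pass changed something.
grouped=$(printf '1234567\n' |
  sed -e ':a' -e 's/^\([0-9][0-9]*\)\([0-9]\{3\}\)/\1,\2/' -e 'ta')
printf '%s\n' "$grouped"
```

    The loop ends by itself once the leading run of digits is three digits or shorter, because the substitution then fails and t does not branch.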

    Hold Space Commands

    In addition to the pattern space, sed has a hold space, a second buffer that can be used for saving the contents of the pattern space. There are five commands that are used to move text back and forth between the pattern space and hold space: hold and destroy (h), hold and append (H), get and destroy (g), get and append (G) and exchange (x).

    Hold and Destroy Command

    The hold and destroy command (h) copies the current contents of the pattern space to the hold space and destroys any text currently in the hold space.

    Hold and Append Command

    The hold and append command (H) appends the current contents of the pattern space to the hold space.

    Get and Destroy Command

    The get and destroy command (g) copies the text in the hold space to the pattern space and destroys any text currently in the pattern space.

    Get and Append Command

    The get and append command (G) appends the current contents of the hold space to the pattern space.

    Exchange Command

    The exchange command (x) swaps the text in the pattern and hold spaces. That is, the text in the pattern space is moved to the hold space and the data that were in the hold space are moved to the pattern space.
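    The hold space commands are described above without an example, so here is a minimal sketch of two well-known one-liners built from them. The input lines are made up.

```shell
# Reverse the order of lines.
#   1!G  on every line but the first, append the hold space
#        (the lines seen so far, already reversed) to the pattern space
#   h    copy the pattern space to the hold space
#   $p   on the last line, print the accumulated text
reversed=$(printf 'one\ntwo\nthree\n' | sed -n -e '1!G' -e 'h' -e '$p')
printf '%s\n' "$reversed"

# Double-space a file: the hold space starts out empty, so G appends
# a newline followed by nothing to every line.
spaced=$(printf 'a\nb\n' | sed 'G')
printf '%s\n' "$spaced"
```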
  • UNIX Shell Programming – Complete Tutorial : Part 1

    Lecture Note : Unix Shell Programming Part 1

    Topics :-

    Introduction to Unix –  Architecture of Unix, Features of Unix , Basic Unix Commands – Unix Utilities:- Introduction to unix file system, vi editor, file handling utilities, security by file permissions, process utilities, disk utilities, networking commands – Text processing utilities and backup

    Introduction to Unix

    UNIX is a popular operating system in the engineering world and has been growing in popularity lately in the business world. Knowledge of its functions and purpose will help you to understand why so many people choose to use UNIX and will make your own use of it more effective.

    History of Unix

    In 1965, Bell Telephone Laboratories joined an effort with the General Electric Company and Project MAC of the Massachusetts Institute of Technology to develop a new operating system called Multics. The goals of the Multics system were to provide simultaneous computer access to a large community of users, to supply ample computation power and data storage and to allow users to share their data easily, if desired. Although a primitive version of the Multics system was running on a GE 645 computer by 1969, it did not provide the general service computing for which it was intended, nor was it clear when its development goals would be met.

    In an attempt to improve the programming environment at Bell Laboratories, Ken Thompson, Dennis Ritchie and others sketched a paper design of a file system that later evolved into an early version of the UNIX file system. Thompson wrote programs that simulated the behavior of the proposed file system and of programs in a demand-paging environment and he even encoded a simple kernel for the GE 645 computer. At the same time, he wrote a game program, Space Travel, for a GECOS system, but the program was unsatisfactory because it was difficult to control the space ship and the program was expensive to run. Thompson later found a little used PDP-7 computer that provided good graphic display and cheap executing power. Programming Space Travel for the PDP-7 enabled Thompson to learn about the machine, but its environment for program development required cross-assembly of the program on the GECOS machine and carrying paper tape for input to the PDP-7. To create a better development environment, Thompson and Ritchie implemented their system design on the PDP-7, including an early version of the UNIX file system, the process subsystem and a small set of utility programs. Eventually the new system no longer needed the GECOS system as a development environment but could support itself. The new system was given the name UNIX.
    Although this early version of the UNIX system held much promise, it could not realize its potential until it was used in a real project. Thus, while providing a text processing system for the patent department at Bell Laboratories, the UNIX system was moved to a PDP-11 in 1971. The system was characterized by its small size: 16K bytes for the system, 8K bytes for user programs, a disk of 512K bytes, and a limit of 64K bytes per file. After its early success, Thompson set out to implement a Fortran compiler for the new system, but instead came up with the language B, influenced by BCPL. B was an interpretive language with the performance drawbacks implied by such languages, so Ritchie developed it into one he called C, allowing generation of machine code, declaration of data types and definition of data structures. In 1973, the operating system was rewritten in C. The number of installations at Bell Laboratories grew to about 25 and a UNIX Systems Group was formed to provide internal support.
    AT&T could not market UNIX because it had signed a Consent Decree with the Federal government in 1956, but it provided the UNIX system to universities that requested it for educational purposes. In 1974, Thompson and Ritchie published a paper describing the UNIX system in the Communications of the ACM, giving further impetus to its acceptance. By 1977, the number of UNIX system sites had grown to about 500, of which 125 were in universities. UNIX systems became popular in the operating telephone companies, providing a good environment for program development, network transaction operations services and real-time services. In 1977, the UNIX system was first ported to a non-PDP machine, the Interdata 8/32; that is, it was made to run on another machine with few or no changes.
    With the growing popularity of microprocessors, other companies ported the UNIX system to new machines, but its simplicity and clarity tempted many developers to enhance it in their own way, resulting in several variants of the basic system. In the period 1977 to 1982, Bell Laboratories combined several AT&T variants into a single system, known commercially as UNIX System III. Bell Laboratories later added several features to UNIX System III, calling the new product UNIX System V, and AT&T announced official support for System V in January 1983. However, people at the University of California at Berkeley had developed a variant of the UNIX system, the most recent version of which is called 4.3 BSD for VAX machines. By the beginning of 1984, there were about 100,000 UNIX system installations in the world, running on machines with a wide range of computing power from microprocessors to mainframes and on machines across different manufacturers' product lines. No other operating system can make that claim.

    Architecture of Unix

    [Figure: Unix Architecture]
    The figure depicts the high-level architecture of the UNIX system. The hardware at the centre of the diagram provides the operating system with basic services. The operating system interacts directly with the hardware, providing common services to programs and insulating them from hardware idiosyncrasies. Viewing the system as a set of layers, the operating system is commonly called the system kernel or just the kernel emphasizing its isolation from user programs. Because programs are independent of the underlying hardware, it is easy to move them between UNIX systems running on different hardware if the programs do not make assumptions about the underlying hardware.
    Programs such as the shell and editors (ed and vi) shown in the outer layers interact with the kernel by invoking a well defined set of system calls. The system calls instruct the kernel to do various operations for the calling program and exchange data between the kernel and the program. Several programs shown in the figure are in standard system configurations and are known as commands, but private user programs may also exist in this layer as indicated by the program whose name is a.out, the standard name for executable files produced by the C compiler. Other application programs can build on top of lower-level programs, hence the existence of the outermost layer in the figure. For example, the standard C compiler, cc, is in the outermost layer of the figure: it invokes a C preprocessor, two-pass compiler, assembler, and loader (link-editor), all separate lower-level programs. Although the figure depicts a two-level hierarchy of application programs, users can extend the hierarchy to whatever levels are appropriate. Indeed, the style of programming favored by the UNIX system encourages the combination of existing programs to accomplish a task.
    Many application subsystems and programs that provide a high-level view of the system such as the shell, editors, SCCS and document preparation packages have gradually become synonymous with the name “UNIX system”. However, they all use lower-level services ultimately provided by the kernel, and they avail themselves of these services via the set of system calls.

     Features of Unix

     Some useful facts concerning UNIX programs and files:
    ·        A file is a collection of data that is usually stored on disk, although some files are stored on tape. UNIX treats peripherals as special files, so that terminals, printers, and other devices are accessible in the same way as disk-based files.
    ·        A program is a collection of bytes representing code and data that are stored in a file.
    ·        When a program is started, it is loaded from disk into RAM. When a program is running, it is called a process.
    ·        Most processes read and write data from files.
    ·        Processes and files have an owner and may be protected against unauthorized access.
    ·        UNIX supports a hierarchical directory structure.
    ·        Files and processes have a location within the directory hierarchy. A process may change its own location or the location of a file.
    ·        UNIX provides services for the creation, modification and destruction of programs, processes and files.
    The main features of UNIX are listed below:
    ·        UNIX allows many users to access a computer system at the same time.
    ·        It supports the creation, modification and destruction of programs, processes and files.
    ·        It provides a directory hierarchy that gives a location to processes and files.
    ·        It shares CPUs, memory, and disk space in a fair and efficient manner among competing processes.
    ·        It allows processes and peripherals to talk to each other, even if they are on different machines.
    ·        It comes complete with a large number of standard utilities.
    ·        There are plenty of high-quality, commercially available software packages for most versions of UNIX.
    ·        It allows programmers to access operating features easily via a well-defined set of system calls that are analogous to library routines.
    ·        It is a portable operating system and thus is available on a wide variety of platforms.

    Basic Unix Commands – Unix file permissions, process utilities, disk utilities, networking commands

    This section introduces the basic Unix commands. The commands introduced here are cancel, cat, cd, chgrp, chmod, chown, clear, cp, date, emacs, file, groups, head, lp, lpr, lprm, lpq, lpstat, ls, mail, man, mkdir, more, mv, newgrp, page, passwd, pwd, rm, rmdir, stty, tail, tset, vi, wc etc.

     Obtaining an account in unix

    You need to get an account for working with Unix. If you don’t have your own computer with Unix installed and working, you need to get an account to work with. This may be possible at your work location, the institute where you study etc. You need to contact the system administrator to get an account. The account will have a user id and a password to access the same.

    Logging in

    In order to use a UNIX system, you must first log in with a suitable username, a unique name that distinguishes you from the other users of the system. Your user name and initial password are assigned to you by the system administrator, or they are set to something standard if you bought your UNIX system. Use these to log in to the system. To log in, the system shows a screen asking for a login name with the prompt “login: ”. Type the username there and press the ENTER key. The system then asks for the password with the prompt “Password: ”. Normally whatever you type here is not shown, since it is a secret. If you enter the correct user name and password, the system logs you in and shows a prompt indicating that it is ready to take commands from you. By convention, the prompt is a dollar sign (“$”) or percent sign (“%”) for normal users and a hash character (“#”) for the system administrator. These can be changed by the user, but these are the default values. When a command is issued, the shell executes it, and once the command has finished, control returns to the shell. This is again indicated by the prompt that the system shows.

    cd command in Unix

    The cd command is used for changing directory. If you issue this command without any parameter, it will go to the home directory. Home directory is the directory where you are positioned when you login. You can give a directory name also as a parameter for the cd command. The directory name you give can be relative or absolute. In a relative path, you specify where you want to go relative to where you are. In an absolute path, you specify the path starting with a slash “/” character to indicate the root filesystem. From root point onwards, you specify each directory in that order separated by a slash.
    Ex:
    cd
    cd /var/log/squid
    cd ../../var/log/squid
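    To make the difference between absolute and relative paths concrete, here is a small sketch that builds a throwaway directory tree under a temporary directory (an assumption made so that the example does not depend on /var/log/squid existing) and moves around in it.

```shell
base=$(mktemp -d)
mkdir -p "$base/var/log/squid"
start=$(pwd)

# Absolute path: begins with "/" and works from anywhere.
cd "$base/var/log/squid"
deep=$(pwd)

# Relative path: ".." means the parent directory, so this goes up
# two levels, from var/log/squid to var.
cd ../..
up_two=$(pwd)

# Relative path going back down from var.
cd log/squid
back_down=$(pwd)

printf '%s\n' "$deep" "$up_two" "$back_down"

# Return to where we started and clean up.
cd "$start"
rm -r "$base"
```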

    Shells

    The $ or % prompt that you see when you first log into UNIX is displayed by a special kind of program called a shell, a program that acts as a middleman between you and the raw UNIX operating system. A shell lets you run programs, build pipelines of processes, save output to files, and run more than one program at the same time. A shell executes all the commands that you enter. The most popular shells are the Bourne shell, the Korn shell, the C shell and the Bourne Again shell. All these shells share a similar set of core functionality, together with some specialized properties. The Korn shell is a superset of the Bourne shell, and thus users typically no longer use the Bourne shell as their login shell. Each shell has its own programming language. One reasonable question to ask is why you would write a program in a shell language rather than in a language like C or Java. The shell languages are tailored to manipulating files and processes in the UNIX system, which makes them more convenient in many situations.

    date command

    date [yymmddhhmm[.ss]]
    The date command prints or sets the system date. When run with no arguments, it displays the current date and time. If arguments are provided, date sets the date to the setting supplied, where yy is the last two digits of the year, the first mm is the number of the month, dd is the number of the day, hh is the number of hours using the 24-hour clock, and the last mm is the number of minutes. The optional ss is the number of seconds. Only a superuser may set the date.
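    As a small sketch, the current date can be displayed in the same yymmddhhmm layout by giving date an output format that starts with +. The format argument is a standard feature of date that is not covered in the text above; setting the date is left out because it requires superuser rights.

```shell
# Default output: current date and time in the system's usual form.
now=$(date)
printf '%s\n' "$now"

# Two-digit year, month, day, hour (24-hour clock) and minute,
# i.e. the yymmddhhmm layout described above.
stamp=$(date '+%y%m%d%H%M')
printf '%s\n' "$stamp"
```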

    clear command 

    This command clears your screen.

    man command

    The man command is used for obtaining online help. There are times when you are at your terminal and you can’t quite remember how to use a particular utility. Alternatively, you may know what a utility does but not remember what it’s called. You may also want to lookup some utility for the exact implementation on the particular installation. The UNIX system has a utility called man (short for “manual page”) that puts information at your fingertips.
    man [ [-s] section ]  word
    man -k keyword
    The manual pages are on-line copies of the original UNIX documentation, which is usually divided into eight sections. The pages contain information about utilities, system calls, file formats and shells. When man displays help about a given utility, it indicates in which section the entry appears. The first usage of man displays the manual entry associated with word. A few versions of UNIX use the -s argument to indicate the section number. If no section number is specified, the first entry that man finds is displayed. The second usage of man displays a list of all the manual entries that contain keyword.

    stty command

    The stty command is used to print and set the special characters for controlling the terminal. Some special characters or character combinations (they are also called metacharacters) are interpreted differently by UNIX. For example, suppose that after you start to execute a command in a shell, you want to stop its execution. You may press Ctrl-C (the control key and C together), which terminates the program. Similarly, there are a number of such special combinations which control the terminal. The stty command can be used to print the current settings as well as to set some of them. To print the current settings, use stty -a. The typical controls are erase (backspace one character), kill (erase all of the current line), flush (ignore any pending input and reprint the line), susp (suspend the process for a future awakening), intr (terminate or interrupt the foreground job with no core dump), quit (terminate the foreground job and generate a core dump), stop (stop/restart terminal output), eof (end of input) etc.

    passwd command

    The passwd command allows you to change your password. You are prompted for your old password and then twice for the new one. The new password may be stored in an encrypted form in the password file “/etc/passwd” or in a “shadow” file for more security. You can also use this utility to change the password of another user by specifying the userid “passwd userid”.

    Log out

    To log out of the system, press the key sequence Ctrl-D, which signals end of input. That means there is no more input for the shell program that is running for you, and hence it terminates. Alternatively, the command exit can be used, which is valid in many shells.

    Some more commands

    The command pwd can be used for displaying the current working directory.
    The command cat (concatenate) can be used for displaying the contents of a text file.
    The command ls can be used for listing the contents of a directory.
    The commands more, page, head and tail can be used for displaying the contents of a text file.
    The command mv can be used to rename a file to another name. It can also be used to move one file from one directory to another directory.
    The command mkdir can be used to create a new directory.
    The command cp can be used to copy files and directories.
    The command vi is used to edit a text file.
    The cd command is used to change directory.
    The rm command can be used to remove a file.
    The rmdir command can be used to remove a directory.
    A text file can be printed using lpr command.
    The wc command can be used to count the characters, words and lines in a text file.
    The file command will help to identify what type of file is the input argument.
    The groups command shows which groups you belong to.
    The command chgrp can be used to change the group of a file to a new group.
    The command chmod can be used to change the permissions of a file.
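    As a rough illustration of how several of the commands listed above fit together, the following sketch works inside a temporary directory. The directory and file names (notes, a.txt, c.txt) are made up for the example, and mktemp is assumed to be available, as it is on most current systems.

```shell
# Work inside a throwaway directory so nothing in your home directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir notes                               # create a new directory
printf 'one\ntwo words\n' > notes/a.txt   # create a two-line text file
cp notes/a.txt notes/b.txt                # copy a file
mv notes/b.txt notes/c.txt                # rename (move) the copy

pwd                                       # show the current working directory
ls notes                                  # list the directory contents
cat notes/a.txt                           # display the file contents

# wc counts lines (-l) and words (-w); tr strips the padding some versions add.
line_count=$(wc -l < notes/a.txt | tr -d ' ')
word_count=$(wc -w < notes/a.txt | tr -d ' ')
listing=$(ls notes | tr '\n' ' ')
echo "lines=$line_count words=$word_count"

rm notes/a.txt notes/c.txt                # remove the files
rmdir notes                               # remove the now-empty directory
cd /
rmdir "$workdir"
```

    Note that rmdir only removes empty directories, which is why the files are deleted with rm first.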

    File handling utilities in Unix

    Filtering files
    egrep, fgrep, grep
    These are used for searching files for specific patterns.
    ls command is used for listing details about the files.
    Removing duplicate lines: uniq
    Sorting files: sort
    Comparing files: cmp, diff. cmp finds the first byte that differs between two files. diff displays all the differences and similarities between two files.
    Finding files: find
    grep – Global Regular Expression Print – helps to search for a pattern or expression in a text file. It can print either the matching lines or the non-matching lines. The command grep is very useful for building text processing applications.
    cut – the command cut is used to selectively cut a column of output from a text file. The column can be selected by a list of bytes, characters or fields. We can specify one and only one of bytes, characters or fields. The list is made up of one range or many ranges separated by commas. For fields, TAB is the default delimiter. Any other delimiter can be specified by using option -d.
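    The filters described above are usually combined in pipelines. The sketch below is only an illustration: the file staff.txt and its colon-separated contents are invented sample data, not something from a real system.

```shell
workdir=$(mktemp -d)
# Hypothetical sample data: name, department and city separated by colons.
printf '%s\n' 'asha:dev:Pune' 'ravi:ops:Delhi' 'asha:dev:Pune' 'meena:dev:Goa' > "$workdir/staff.txt"

# grep prints the lines that match the pattern; -v prints the non-matching ones.
dev_lines=$(grep ':dev:' "$workdir/staff.txt")
non_dev_lines=$(grep -v ':dev:' "$workdir/staff.txt")

# cut selects fields: -d sets the delimiter (TAB is the default), -f picks the field list.
# sort makes duplicate lines adjacent, and uniq then drops the repeats.
unique_names=$(cut -d: -f1 "$workdir/staff.txt" | sort | uniq)

# cmp -s prints nothing and returns 0 only when the two files are identical.
sort "$workdir/staff.txt" | uniq > "$workdir/dedup.txt"
if cmp -s "$workdir/staff.txt" "$workdir/dedup.txt"; then same=yes; else same=no; fi

echo "$unique_names"
echo "identical to de-duplicated copy: $same"
rm -r "$workdir"
```

    uniq only removes adjacent duplicates, so it is normally run on sorted input as shown here.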
     Security by file permissions
    File permissions can be changed or set using chmod command. It works in either incremental mode or absolute mode. Incremental mode either adds or removes a particular permission or permissions. Absolute mode uses a code for setting permissions and then applies the same in one go.
    Incremental mode: chmod u+x mycommands
    chmod g-w mycommands
    chmod o-wx mycommands
    chmod [ugoa][+-=][rwxXst] <file> ; u – user (owner), g – other users in the file’s group, o – other users not in the file’s group, a – all, r – read, w – write, x – execute, X – execute only if the file is a directory or already has execute permission for some user, s – set uid or groupid on execution, t – restricted deletion flag or sticky bit.
    Absolute mode: chmod 755 mycommands
    File owner can be changed using chown command. You need to specify the owner and the file on which it has to be changed. It can also be used to change the group along with the owner.
    The file group can be changed using chgrp. Specify the new group and the file whose group is to be changed.
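    The difference between the two chmod modes can be seen by checking the permission string after each step. This is a minimal sketch on a scratch copy of a file called mycommands; the first ten characters of the ls -l output are taken as the permission string.

```shell
workdir=$(mktemp -d)
script="$workdir/mycommands"
printf '#!/bin/sh\necho hello\n' > "$script"

chmod 644 "$script"      # absolute mode: start from rw-r--r--
chmod u+x "$script"      # incremental mode: add execute for the owner
chmod g-w "$script"      # incremental mode: group write is already off, so no change
chmod o-wx "$script"     # incremental mode: others have neither bit, so no change
mode_incremental=$(ls -l "$script" | cut -c1-10)

chmod 755 "$script"      # absolute mode: set rwxr-xr-x in one go
mode_absolute=$(ls -l "$script" | cut -c1-10)

echo "after incremental changes: $mode_incremental"
echo "after chmod 755:           $mode_absolute"
rm -r "$workdir"
```

    In the absolute code each digit is the sum of read (4), write (2) and execute (1), given in the order owner, group, others.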

    Process utilities in Unix

    The main command for listing processes is ps, which stands for process status. It reports information about a selection of the processes that are active at the moment it is run. If you need a repeatedly updated display, use top instead. By default, ps selects all processes with the same effective user ID as the current user and associated with the same terminal as the invoker. It displays the process ID, the terminal associated with the process (tty), the cumulated CPU time and the executable name. To see every process on the system, try
    ps -e OR ps -ef
    We can use the command top to see a continuous update of all processes. This is useful for system administrators to see which processes take the most CPU time, the most memory etc.

    Disk utilities in Unix

    The most commonly used disk utilities are df and du. The command df is used to find out how much disk space there is, how much is being used and what percentage is in use, either for all mounted partitions or for selected partitions. With the option -k, it prints the details in kilobytes instead of the default blocks.
    The command du is used for finding out how much is the disk usage for a particular file in number of blocks. You may give a file, a directory or any combination of these as the argument.
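    A small sketch of both commands follows. It creates a scratch file of known size so that there is something to measure; the file name blob is made up, and the kilobyte figure reported by du depends on the file system.

```shell
workdir=$(mktemp -d)
# Create a file of exactly 8192 bytes (8 blocks of 1024 bytes).
dd if=/dev/zero of="$workdir/blob" bs=1024 count=8 2>/dev/null

df -k "$workdir"                          # space on the partition holding workdir, in kilobytes
du -k "$workdir/blob"                     # disk usage of a single file, in kilobytes
total_kb=$(du -sk "$workdir" | cut -f1)   # -s prints one summary line for the whole directory
bytes=$(wc -c < "$workdir/blob" | tr -d ' ')

echo "directory uses ${total_kb} KB on disk; the file holds ${bytes} bytes"
rm -r "$workdir"
```

    du reports the space allocated on disk, which can differ from the byte count that wc -c gives.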
    The command kill can be used for terminating a process. It takes as its argument the process ID (pid) of the process. You have an option to send a specific signal while invoking the kill command. For example, you can send SIGKILL (-9), which kills the process under any circumstances because it cannot be caught or ignored. SIGHUP is conventionally used to tell service (daemon) processes to re-read their configuration.
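    The following sketch shows kill in use on a harmless background process. It uses sleep as a stand-in for a long-running job, and relies on signal 0, which sends nothing and only checks whether the process exists.

```shell
sleep 300 &        # start a long-running background process
pid=$!             # $! holds the process ID of the most recent background job

# Signal 0 delivers nothing; it only tests whether the process exists.
if kill -0 "$pid" 2>/dev/null; then before=running; else before=gone; fi

kill "$pid"                  # the default signal is SIGTERM, a request to terminate
wait "$pid" 2>/dev/null      # collect the finished process
status=$?                    # above 128 when the process was ended by a signal

if kill -0 "$pid" 2>/dev/null; then after=running; else after=gone; fi
echo "before=$before after=$after wait-status=$status"

# kill -9 "$pid" would send SIGKILL instead, which cannot be caught or ignored.
```

    It is usually better to try the default SIGTERM first, since it gives the process a chance to clean up before exiting.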

    Networking Commands in Unix

    The commonly used networking commands are: finger, mesg, write, talk, wall
    The finger command displays information about a list of users that is generated from the following sources: The user’s home directory, start-up shell and full name are read from the password file /etc/passwd.
    If the user supplies a file called .plan in the home directory, the contents of the file are displayed as the user’s plan.
    If the user supplies a file called .project in the home directory, the contents of the file are displayed as the user’s project.
    If no user IDs are listed, finger displays information about every user that is currently logged on.
    The mesg command allows you to protect yourself from others contacting you. You may use it as “mesg [y|n]”. If you say “mesg n” the system will not allow others (except the administrator) to contact you using the communication utilities write, talk and wall.
    The write command allows you to send one line at a time to a named user. You have to use it as “write <userID> [tty]”. The receiver is then shown the message on his/her screen along with who is writing. The receiver could also write back to the sender.
    The talk utility allows you to have a two-way conversation across a network. The format is “talk user@host [tty]”. It displays a message on that user’s terminal, and that user has to respond with “talk fromUser@fromHost [tty]”.
    The wall command helps you to send a message to all users logged in. wall stands for “write all”. When you start the wall command, it collects all text typed until EOF and then sends it to all users logged in.
    The real networking utilities include ifconfig, netstat etc. The ifconfig command is used to configure the network interfaces. The netstat command can be used for listing the active network connections on the system.

    Text processing utilities and backup commands in Unix                             

    Archives: cpio, tar and dump
    cpio is handy for saving small quantities of data, but the single-volume restriction makes it useless for large backups. cpio copies files into or out of a cpio or tar archive; the archive can be another file on the disk, a magnetic tape or a pipe. GNU cpio supports the following archive formats: binary, old ASCII, new ASCII, crc, HPUX binary, HPUX old ASCII, old tar, and POSIX.1 tar. When extracting from archives, cpio automatically recognizes which kind of archive it is reading and can read archives created on machines with a different byte-order.
    In copy-out mode, cpio copies files into an archive. It reads a list of filenames, one per line, on the standard input, and writes the archive onto the standard output.
    In copy-in mode, cpio copies files out of an archive or lists the archive contents. It reads the archive from the standard input.
    In copy-pass mode, cpio copies files from one directory tree to another, combining the copy-out and copy-in steps without actually using an archive.
    tar allows you to save directory structures to a single backup volume. It is designed to save files to tape, so it always archives files at the end of the storage medium. The different options are concatenate (-A), create (-c), append (-r), list and test (-t), update (-u) and extract (-x).
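    As an illustration of the create, list and extract options, the sketch below archives a small directory tree to an ordinary file instead of a tape. The names project, backup.tar and restore are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p project/docs
echo 'readme text' > project/README
echo 'chapter one' > project/docs/ch1.txt

tar -cf backup.tar project            # -c creates an archive; -f names the archive file
listing=$(tar -tf backup.tar)         # -t lists the archive contents

mkdir restore
(cd restore && tar -xf ../backup.tar) # -x extracts into the current directory
restored=$(cat restore/project/docs/ch1.txt)

echo "$listing"
echo "restored file says: $restored"
cd /
rm -r "$workdir"
```

    The directory structure is stored in the archive, so extraction recreates project/docs under the directory where tar -x is run.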
    dump allows you to save a file system to multiple backup volumes. Dump is designed for doing total and incremental backups, but restoring individual files with it is tricky. A dump that is larger than the output medium is broken into multiple volumes.
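    split allows us to split files into multiple pieces. We can specify the size of the pieces. By default it creates output files named xaa, xab, xac etc.; a different prefix can be given as an argument after the input file name, and the suffixes aa, ab, ac etc. are appended to that prefix.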
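    A minimal sketch of splitting a file and joining the pieces again is shown below. The input file big.txt and the prefix part_ are made-up names, and the 1000-byte piece size is chosen only to keep the example small.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# Build a 2500-byte input file consisting of the letter a.
dd if=/dev/zero bs=2500 count=1 2>/dev/null | tr '\0' 'a' > big.txt

split -b 1000 big.txt part_     # pieces of at most 1000 bytes: part_aa, part_ab, part_ac
pieces=$(ls part_* | tr '\n' ' ')
last_size=$(wc -c < part_ac | tr -d ' ')

cat part_* > rejoined.txt       # the shell expands part_* in sorted order
if cmp -s big.txt rejoined.txt; then identical=yes; else identical=no; fi

echo "pieces: $pieces"
echo "last piece: $last_size bytes, rejoined copy identical: $identical"
cd /
rm -r "$workdir"
```

    Because the suffixes sort alphabetically, cat with a wildcard is enough to restore the original file.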