Lecture Note: Unix Shell Programming, Part 1
Introduction to Unix – Architecture of Unix, Features of Unix, Basic Unix Commands – Unix Utilities: introduction to the Unix file system, vi editor, file handling utilities, security by file permissions, process utilities, disk utilities, networking commands – Text processing utilities and backup
Introduction to Unix
UNIX is a popular operating system in the engineering world and has been growing in popularity lately in the business world. Knowledge of its functions and purpose will help you to understand why so many people choose to use UNIX and will make your own use of it more effective.
History of Unix
In 1965, Bell Telephone Laboratories joined an effort with the General Electric Company and Project MAC of the Massachusetts Institute of Technology to develop a new operating system called Multics. The goals of the Multics system were to provide simultaneous computer access to a large community of users, to supply ample computation power and data storage, and to allow users to share their data easily if desired. Although a primitive version of the Multics system was running on a GE 645 computer by 1969, it did not provide the general service computing for which it was intended, nor was it clear when its development goals would be met.
In an attempt to improve Bell Laboratories’ programming environment, Ken Thompson, Dennis Ritchie and others sketched a paper design of a file system that later evolved into an early version of the UNIX file system. Thompson wrote programs that simulated the behavior of the proposed file system and of programs in a demand-paging environment, and he even encoded a simple kernel for the GE 645 computer. At the same time, he wrote a game program, Space Travel, for a GECOS system, but the program was unsatisfactory because it was difficult to control the space ship and the program was expensive to run. Thompson later found a little-used PDP-7 computer that provided good graphic display and cheap executing power. Programming Space Travel for the PDP-7 enabled Thompson to learn about the machine, but its environment for program development required cross-assembly of the program on the GECOS machine and carrying paper tape for input to the PDP-7. To create a better development environment, Thompson and Ritchie implemented their system design on the PDP-7, including an early version of the UNIX file system, the process subsystem and a small set of utility programs. Eventually the new system no longer needed the GECOS system as a development environment but could support itself. The new system was given the name UNIX.
Although this early version of the UNIX system held much promise, it could not realize its potential until it was used in a real project. Thus, while providing a text processing system for the patent department at Bell Laboratories, the UNIX system was moved to a PDP-11 in 1971. The system was characterized by its small size: 16K bytes for the system, 8K bytes for user programs, a disk of 512K bytes and a limit of 64K bytes per file. After this early success, Thompson set out to implement a Fortran compiler for the new system, but instead came up with the language B, influenced by BCPL. B was an interpreted language with the performance drawbacks implied by such languages, so Ritchie developed it into one he called C, allowing generation of machine code, declaration of data types and definition of data structures. In 1973, the operating system was rewritten in C. The number of installations at Bell Laboratories grew to about 25, and a UNIX Systems Group was formed to provide internal support.
AT&T could not market UNIX because it had signed a consent decree with the federal government in 1956, but it provided the UNIX system to universities that requested it for educational purposes. In 1974, Thompson and Ritchie published a paper describing the UNIX system in the Communications of the ACM, giving further impetus to its acceptance. By 1977, the number of UNIX system sites had grown to about 500, of which 125 were in universities. UNIX systems became popular in the operating telephone companies, providing a good environment for program development, network transaction operations services and real-time services. Also in 1977, the UNIX system was first ported to a non-PDP machine, the Interdata 8/32; porting means making a system run on another machine with few or no changes.
With the growing popularity of microprocessors, other companies ported the UNIX system to new machines, but its simplicity and clarity tempted many developers to enhance it in their own way, resulting in several variants of the basic system. In the period from 1977 to 1982, Bell Laboratories combined several AT&T variants into a single system, known commercially as UNIX System III. Bell Laboratories later added several features to UNIX System III, calling the new product UNIX System V, and AT&T announced official support for System V in January 1983. Meanwhile, people at the University of California at Berkeley had developed their own variant of the UNIX system, whose most recent version at the time was 4.3 BSD for VAX machines. By the beginning of 1984, there were about 100,000 UNIX system installations in the world, running on machines with a wide range of computing power, from microprocessors to mainframes, and across different manufacturers’ product lines. No other operating system could make that claim at the time.
Architecture of Unix
The figure depicts the high-level architecture of the UNIX system. The hardware at the centre of the diagram provides the operating system with basic services. The operating system interacts directly with the hardware, providing common services to programs and insulating them from hardware idiosyncrasies. Viewing the system as a set of layers, the operating system is commonly called the system kernel or just the kernel emphasizing its isolation from user programs. Because programs are independent of the underlying hardware, it is easy to move them between UNIX systems running on different hardware if the programs do not make assumptions about the underlying hardware.
Programs such as the shell and editors (ed and vi) shown in the outer layers interact with the kernel by invoking a well defined set of system calls. The system calls instruct the kernel to do various operations for the calling program and exchange data between the kernel and the program. Several programs shown in the figure are in standard system configurations and are known as commands, but private user programs may also exist in this layer as indicated by the program whose name is a.out, the standard name for executable files produced by the C compiler. Other application programs can build on top of lower-level programs, hence the existence of the outermost layer in the figure. For example, the standard C compiler, cc, is in the outermost layer of the figure: it invokes a C preprocessor, two-pass compiler, assembler, and loader (link-editor), all separate lower-level programs. Although the figure depicts a two-level hierarchy of application programs, users can extend the hierarchy to whatever levels are appropriate. Indeed, the style of programming favored by the UNIX system encourages the combination of existing programs to accomplish a task.
Many application subsystems and programs that provide a high-level view of the system such as the shell, editors, SCCS and document preparation packages have gradually become synonymous with the name “UNIX system”. However, they all use lower-level services ultimately provided by the kernel, and they avail themselves of these services via the set of system calls.
Features of Unix
Some useful facts concerning UNIX programs and files:
· A file is a collection of data that is usually stored on disk, although some files are stored on tape. UNIX treats peripherals as special files, so that terminals, printers, and other devices are accessible in the same way as disk-based files.
· A program is a collection of bytes representing code and data that are stored in a file.
· When a program is started, it is loaded from disk into RAM. When a program is running, it is called a process.
· Most processes read and write data from files.
· Processes and files have an owner and may be protected against unauthorized access.
· UNIX supports a hierarchical directory structure.
· Files and processes have a location within the directory hierarchy. A process may change its own location or the location of a file.
· UNIX provides services for the creation, modification and destruction of programs, processes and files.
The main features of UNIX are listed below:
· UNIX allows many users to access a computer system at the same time.
· It supports the creation, modification and destruction of programs, processes and files.
· It provides a directory hierarchy that gives a location to processes and files.
· It shares CPUs, memory, and disk space in a fair and efficient manner among competing processes.
· It allows processes and peripherals to talk to each other, even if they are on different machines.
· It comes complete with a large number of standard utilities.
· There are plenty of high-quality, commercially available software packages for most versions of UNIX.
· It allows programmers to access operating system features easily via a well-defined set of system calls that are analogous to library routines.
· It is a portable operating system and thus is available on a wide variety of platforms.
Basic Unix Commands – Unix file permissions, process utilities, disk utilities, networking commands
This section introduces the basic Unix commands, including cancel, cat, cd, chgrp, chmod, chown, clear, cp, date, emacs, file, groups, head, lp, lpr, lprm, lpq, lpstat, ls, mail, man, mkdir, more, mv, newgrp, page, passwd, pwd, rm, rmdir, stty, tail, tset, vi and wc.
Obtaining an account in Unix
To work with Unix you need an account. If you don’t have your own computer with Unix installed, you can usually get one at your workplace or at the institute where you study by contacting the system administrator. The account consists of a user ID and a password.
In order to use a UNIX system, you must first log in with a suitable username, a unique name that distinguishes you from the other users of the system. Your username and initial password are assigned by the system administrator, or are set to some standard value if you bought your own UNIX system. To log in, the system displays the prompt “login: ”; type your username and press ENTER. The system then asks for your password with the prompt “Password: ”. Whatever you type here is normally not shown, since it is secret. If the username and password are correct, the system logs you in and shows a prompt indicating that it is ready to accept commands. By convention, the prompt is a dollar sign (“$”) or percent sign (“%”) for normal users and a hash character (“#”) for the system administrator. Users can change the prompt, but these are the defaults. When you issue a command, the shell executes it; when execution finishes, control returns to the shell, which shows the prompt again.
cd command in Unix
The cd command changes the current directory. Issued without an argument, it takes you to your home directory, the directory in which you are placed when you log in. You can also give a directory name as an argument, either relative or absolute. A relative path says where to go relative to the current directory; the special name .. refers to the parent directory and . to the current one. An absolute path starts with a slash “/”, which denotes the root of the file system, followed by each directory on the way down in order, separated by slashes (for example /usr/local/bin).
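A minimal sketch of relative and absolute paths, assuming a POSIX shell and a system that provides mktemp -d (Linux, the BSDs and macOS do). The scratch directory and the names project and docs are made up for the example.

```shell
#!/bin/sh
# Relative versus absolute paths with cd, inside a throwaway directory.
base=$(mktemp -d)                 # mktemp -d prints an absolute path
mkdir -p "$base/project/docs"

cd "$base" || exit 1              # absolute path: starts at the root, /
cd project/docs || exit 1         # relative path: resolved from where you are
after_relative=$(pwd)
cd .. || exit 1                   # .. is the parent directory
after_parent=$(pwd)
echo "after cd project/docs: $after_relative"
echo "after cd ..:           $after_parent"
cd                                # no argument: back to your home directory
```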
The $ or % prompt that you see when you first log into UNIX is displayed by a special kind of program called a shell, a program that acts as a middleman between you and the raw UNIX operating system. A shell lets you run programs, build pipelines of processes, save output to files and run more than one program at the same time. A shell executes all the commands that you enter. The most popular shells are the Bourne shell, the Korn shell, the C shell and the Bourne Again shell (bash). All these shells share a similar set of core functionality, together with some specialized properties. The Korn shell is a superset of the Bourne shell, so users typically no longer use the Bourne shell as their login shell. Each shell has its own programming language. A reasonable question to ask is why you would write a program in a shell language rather than in a language like C or Java. The answer is that shell languages are tailored to manipulating files and processes in the UNIX system, which makes them more convenient in many situations.
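A small sketch of three things a shell does for you: a pipeline, output redirection to a file and a background job. It assumes a POSIX sh, mktemp -d and an existing /usr/bin; the file name count.txt is arbitrary.

```shell
#!/bin/sh
# Pipeline, redirection and a background job in one short script.
work=$(mktemp -d)

ls /usr/bin | wc -l > "$work/count.txt"   # ls output feeds wc; the result goes to a file
echo "programs in /usr/bin: $(cat "$work/count.txt")"

sleep 1 &                                 # & runs sleep in the background
echo "sleep is running in the background as PID $!"
wait                                      # pause until background jobs finish
echo "background job finished"
```

The shell itself sets up the pipe, the file and the background process; ls, wc and sleep know nothing about each other.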
The date command prints the current system date and time. Its syntax is:
date [ yymmddhhmm [.ss] ]
When run with no arguments, it displays the current date and time. If an argument is provided, date sets the system date to the value supplied, where yy is the last two digits of the year, the first mm is the number of the month, dd is the day of the month, hh is the hour on a 24-hour clock and the last mm is the number of minutes. The optional ss is the number of seconds. Only a superuser may set the date. The order of these fields differs between systems (GNU date, for example, expects MMDDhhmm[[CC]YY][.ss]), so check man date before setting the clock.
The clear command clears your screen.
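A sketch of reading (not setting) the date, assuming a POSIX date. The + format directives used here (%Y, %m, %d, %H, %M, %S) are standard; setting the clock is left out because it needs superuser rights.

```shell
#!/bin/sh
# Displaying the date in a few ways.
date                          # current date and time, default format
date -u                       # the same moment in UTC
date '+%Y-%m-%d %H:%M:%S'     # year-month-day hour:minute:second
today=$(date '+%Y-%m-%d')     # capture the date for use in a script
echo "today is $today"
```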
The man command provides online help. There are times when you are at your terminal and can’t quite remember how to use a particular utility. Alternatively, you may know what a utility does but not remember what it is called. You may also want to look up how a utility behaves on your particular installation. The UNIX system has a utility called man (short for “manual page”) that puts this information at your fingertips.
man [ [-s] section ] word
man -k keyword
The manual pages are on-line copies of the original UNIX documentation, which is usually divided into eight sections. The pages contain information about utilities, system calls, file formats and shells. When man displays help about a given utility, it indicates in which section the entry appears. The first usage of man displays the manual entry associated with word. A few versions of UNIX use the -s argument to indicate the section number. If no section number is specified, the first entry that man finds is displayed. The second usage of man displays a list of all the manual entries that contain keyword.
The stty command prints and sets the special characters that control the terminal. Some characters or key combinations, also called metacharacters, are given a special meaning by UNIX instead of being passed to the program as ordinary input. For example, if you want to stop a command that is running in the shell, you can press Ctrl-C (the Control key together with C), which interrupts and normally terminates the program. There are a number of such special combinations that control the terminal. The stty command can print the current settings as well as change some of them; stty -a prints all current settings. The typical controls are erase (backspace one character), kill (erase all of the current line), flush (ignore any pending input and reprint the line), susp (suspend the process so that it can be resumed later), intr (terminate or interrupt the foreground job with no core dump), quit (terminate the foreground job and generate a core dump), stop (stop/restart terminal output) and eof (end of input).
The passwd command lets you change your password. You are prompted for your old password and then twice for the new one. The new password is stored in encrypted form, either in the password file “/etc/passwd” or, for more security, in a separate “shadow” file that ordinary users cannot read. The superuser can also change another user’s password with “passwd userid”.
To log out of the system, press Ctrl-D, the end-of-input character. It tells the shell running for you that there is no more input, so the shell terminates. Alternatively, the exit command works in most shells.
Some more commands
The command pwd can be used for displaying the current working directory.
The command cat (concatenate) can be used for displaying the contents of a text file.
The command ls can be used for listing the contents of a directory.
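The commands more, page, head and tail can be used for displaying the contents of a text file: more and page show it one screen at a time, head shows the first lines and tail the last lines.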
The command mv can be used to rename a file to another name. It can also be used to move one file from one directory to another directory.
The command mkdir can be used to create a new directory.
The command cp can be used to copy files, and directories with the -R option.
The command vi is used to edit a text file.
The cd command is used to change directory.
The rm command can be used to remove a file.
The rmdir command can be used to remove an empty directory.
A text file can be printed using the lpr command.
The wc command can be used to count the characters, words and lines in a text file.
The file command identifies the type of the file named by its argument.
The groups command shows which groups you belong to.
The command chgrp changes the group of a file; to change your current group, use newgrp.
The command chmod can be used to change the permissions of a file.
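A short session with several of the commands above, run in a scratch directory so nothing real is touched. It assumes a POSIX sh and mktemp -d; the names notes.txt, todo.txt and backup are made up.

```shell
#!/bin/sh
# Creating, inspecting, copying, renaming and removing files.
work=$(mktemp -d)
cd "$work" || exit 1

pwd                                        # where am I?
mkdir backup                               # create a directory
printf 'one two\nthree\nfour five six\n' > notes.txt
cat notes.txt                              # display the file
wc notes.txt                               # counts of lines, words and characters
cp notes.txt backup/notes.bak              # copy into the new directory
mv notes.txt todo.txt                      # rename the original
ls -l                                      # list the directory in long form
head -n 1 todo.txt                         # first line
tail -n 1 todo.txt                         # last line
rm backup/notes.bak                        # remove a file
rmdir backup                               # remove the now-empty directory
```

Note that rmdir succeeds only because the copy inside backup was removed first.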
File handling utilities in Unix
grep, egrep, fgrep
These are used for searching files for specific patterns.
The ls command lists files; with the -l option it shows details such as permissions, owner, group, size and modification time.
Removing duplicate lines: uniq
Sorting files: sort
Comparing files: cmp, diff. cmp finds the first byte that differs between two files. diff displays all the differences between two files, expressed as the changes needed to turn the first file into the second.
Finding files: find
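A sketch of sort, uniq, cmp, diff and find on small made-up files, assuming a POSIX sh and mktemp -d. It also shows why uniq is usually preceded by sort: uniq removes only adjacent duplicate lines.

```shell
#!/bin/sh
# Sorting, removing duplicates, comparing and finding files.
work=$(mktemp -d)
printf 'pear\napple\npear\nbanana\napple\n' > "$work/fruit.txt"

sort "$work/fruit.txt"             # lines in order: apple, apple, banana, pear, pear
sort "$work/fruit.txt" | uniq      # duplicates are now adjacent, so uniq removes them
sort "$work/fruit.txt" | uniq -c   # prefix each distinct line with its count

printf 'apple\nbanana\ncherry\n' > "$work/a.txt"
printf 'apple\nblueberry\ncherry\n' > "$work/b.txt"
cmp "$work/a.txt" "$work/b.txt"    # reports the first differing byte (byte 8, line 2)
diff "$work/a.txt" "$work/b.txt"   # 2c2: line 2 of a.txt changes into line 2 of b.txt

find "$work" -name '*.txt' -print  # every .txt file below $work
```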
grep – searches a text file for a pattern or regular expression. Its name comes from the ed editor command g/re/p (globally search for a regular expression and print). It prints either the matching lines or, with the -v option, the non-matching lines, which makes grep very useful for building text processing applications.
cut – the command cut selects columns from each line of a text file. The columns can be chosen as a list of bytes (-b), characters (-c) or fields (-f), and exactly one of these must be given. The list is made up of one range or several ranges separated by commas. For fields, TAB is the default delimiter; another delimiter can be specified with the -d option.
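A sketch of common grep options on a small passwd-like file; the user names and IDs are invented. grep -E and grep -F are the standard spellings of egrep and fgrep, which newer GNU grep releases mark as obsolescent.

```shell
#!/bin/sh
# Matching, inverting, counting and numbering lines with grep.
work=$(mktemp -d)
printf 'root:x:0:0\nalice:x:1000:1000\nbob:x:1001:1001\n' > "$work/users.txt"

grep 'alice' "$work/users.txt"          # print the matching lines
grep -v 'alice' "$work/users.txt"       # print the non-matching lines
grep -c ':100[0-9]:' "$work/users.txt"  # count matching lines (2 here)
grep -n '^b' "$work/users.txt"          # prefix matches with their line numbers
grep -E 'alice|bob' "$work/users.txt"   # extended regular expressions, like egrep
grep -F '0:0' "$work/users.txt"         # fixed string, no regex, like fgrep
```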
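A sketch of cut selecting fields with a custom delimiter, fields with the default TAB delimiter, and character ranges. The input lines are made up.

```shell
#!/bin/sh
# Selecting columns with cut.
line='root:x:0:0:root:/root:/bin/sh'        # a passwd-style record
printf '%s\n' "$line" | cut -d: -f1,7       # prints root:/bin/sh
printf 'alpha\tbeta\tgamma\n' | cut -f2     # prints beta (TAB is the default delimiter)
printf 'abcdefgh\n' | cut -c 2-4,7          # prints bcdg
```

When fields are selected, cut joins them with the same delimiter it split on, which is why the first command keeps the colon.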
Security by file permissions
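File permissions are set or changed with the chmod command. It works in either incremental (symbolic) mode or absolute mode. Incremental mode adds, removes or sets particular permissions, leaving the others unchanged. Absolute mode sets all permissions in one go from an octal code: each digit is the sum of read (4), write (2) and execute (1), and the three digits apply to the owner, the group and others respectively.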
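Incremental mode: chmod u+x mycommands (add execute permission for the owner)
chmod g-w mycommands (remove write permission for the group)
chmod o-wx mycommands (remove write and execute permission for others)
chmod [ugoa][+-=][rwxXst] <file> ; u – the file’s owner (user), g – other users in the file’s group, o – other users not in the file’s group, a – all of these; + adds, - removes and = sets exactly the listed permissions; r – read, w – write, x – execute, X – execute only if the file is a directory or already has execute permission for some user, s – set user ID or group ID on execution, t – restricted deletion flag or sticky bit.
Absolute mode: chmod 755 mycommands (rwx, that is 4+2+1 = 7, for the owner; r-x, that is 4+1 = 5, for the group and for others)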
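A worked sketch of how a permission string maps to an absolute-mode code, followed by both modes applied to a scratch file. perm_to_octal is a hypothetical helper written for this note, not a standard command; it ignores the special bits s and t. mktemp -d is assumed to be available.

```shell
#!/bin/sh
# perm_to_octal (hypothetical helper): turn a 9-character permission string,
# as shown by ls -l without the leading file-type character, into the
# octal code used by chmod's absolute mode. s and t bits are not handled.
perm_to_octal() {
    result=""
    for start in 1 4 7; do
        # Characters start..start+2 hold the owner, group or others triple.
        triple=$(printf '%s\n' "$1" | cut -c "$start-$((start + 2))")
        digit=0
        case $triple in r??) digit=$((digit + 4)) ;; esac   # read = 4
        case $triple in ?w?) digit=$((digit + 2)) ;; esac   # write = 2
        case $triple in ??x) digit=$((digit + 1)) ;; esac   # execute = 1
        result="$result$digit"
    done
    printf '%s\n' "$result"
}

perm_to_octal rwxr-xr-x    # prints 755
perm_to_octal rw-r--r--    # prints 644

# The same permissions reached with both modes on a scratch file.
work=$(mktemp -d)
touch "$work/mycommands"
chmod 644 "$work/mycommands"              # absolute: rw-r--r--
chmod a+x "$work/mycommands"              # incremental: add x for everyone
ls -l "$work/mycommands" | cut -c 2-10    # prints rwxr-xr-x
```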
The owner of a file can be changed with the chown command: specify the new owner and the file. It can also change the group along with the owner, using the form chown owner:group file. On most systems only the superuser can change a file’s owner.
The file group can be changed using chgrp. Specify the new group and the file whose group is to be changed.
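A sketch of chgrp on a scratch file, assuming a POSIX sh and mktemp -d. It uses your primary group (from id -gn), which a file owner is always allowed to set. The chown lines are commented out because they normally need superuser rights; alice and staff are placeholder names.

```shell
#!/bin/sh
# Changing a file's group; changing its owner is shown but not run.
work=$(mktemp -d)
touch "$work/report.txt"
mygroup=$(id -gn)                      # your primary group name
chgrp "$mygroup" "$work/report.txt"    # allowed: you own the file and belong to the group
ls -l "$work/report.txt"               # the fourth column is the file's group

# Placeholder names; these need superuser rights on most systems:
# chown alice report.txt               # new owner
# chown alice:staff report.txt         # new owner and new group
```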
Process utilities in Unix
The main command for listing processes is ps, which stands for process status. It displays information about a selection of the processes active at the moment it runs; if you need a continuously updated view, use top instead. By default, ps selects all processes with the same effective user ID as the current user and associated with the same terminal as the invoker. For each one it displays the process ID (PID), the terminal associated with the process (TTY), the cumulative CPU time (TIME) and the name of the executable (CMD). To see every process on the system, try
ps -e or ps -ef (the -f option gives a full listing that includes the user and the parent process ID)
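A few ps invocations, assuming a ps that accepts the POSIX -e, -f, -o and -p options (procps on Linux and the macOS ps do). Some minimal containers ship without ps, so the sketch stops quietly if it is missing.

```shell
#!/bin/sh
# Listing processes with ps.
command -v ps >/dev/null 2>&1 || exit 0   # skip if ps is not installed

ps                             # your processes on this terminal: PID, TTY, TIME, CMD
ps -ef | head -n 5             # full listing of every process (first lines only)
ps -o pid,ppid,comm -p "$$"    # chosen columns for this script's own process
```

$$ expands to the process ID of the running shell, so the last command shows the script itself and its parent.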
The command top shows a continuously updated list of processes. This is useful for system administrators to see which processes are using the most CPU time, memory and so on.
Disk utilities in Unix
The most commonly used disk utilities are df and du. The df command reports, for every mounted file system (or only for those given as arguments), how much disk space there is, how much is used, how much is available and what percentage is in use. With the -k option, it prints sizes in kilobytes instead of the default blocks.
The du command reports disk usage, in blocks by default. You may give a file, a directory or any combination of these as arguments; for a directory, du reports the usage of each subdirectory as well as the total. The -s option prints only a summary for each argument, and -k reports kilobytes.
The kill command terminates a process. It takes as its argument the process ID (pid) of the process. By default it sends SIGTERM (signal 15), which asks the process to terminate and which the process may catch. You can send a specific signal instead; for example, SIGKILL (-9) kills the process under any circumstances, because it cannot be caught or ignored. By convention, many service (daemon) processes reread their configuration when they receive SIGHUP.
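A sketch of df and du on a scratch directory, assuming mktemp -d is available. The names data and notes.txt are made up; the exact numbers depend on the file system.

```shell
#!/bin/sh
# Reporting free space with df and usage with du.
work=$(mktemp -d)
mkdir "$work/data"
printf '%s\n' "some text" > "$work/data/notes.txt"

df -k "$work"                   # size, used and available space of the file system holding $work, in KB
du -k "$work"                   # KB used by $work and by each directory below it
du -sk "$work/data"             # -s: one summary line for the directory
du -k "$work/data/notes.txt"    # a single file works too
```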
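A sketch of kill on a harmless background process, assuming a POSIX sh. It uses kill -0 (which only checks that the process exists) and the shell rule that a process killed by a signal reports an exit status above 128; bash and dash report 143, that is 128 + 15 for SIGTERM.

```shell
#!/bin/sh
# Starting a process, checking it, and terminating it with kill.
sleep 60 &          # a harmless background process to experiment on
pid=$!              # $! holds the PID of the last background job
kill -0 "$pid" && echo "process $pid is running"   # signal 0: existence check only
kill "$pid"         # default signal is SIGTERM (15)
wait "$pid"         # collect the process's exit status
status=$?
echo "exit status: $status"
# kill -9 "$pid" would send SIGKILL, which cannot be caught or ignored.
```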
Networking Commands in Unix
The commonly used networking commands are: finger, mesg, write, talk, wall
The finger command displays information about a list of users, gathered from the following sources. The user’s home directory, start-up shell and full name are read from the password file /etc/passwd.
If the user supplies a file called .plan in the home directory, the contents of the file are displayed as the user’s plan.
If the user supplies a file called .project, in the home directory, the contents of the file are displayed as the user’s project.
If no user IDs are listed, finger displays information about every user that is currently logged on.
The mesg command allows you to protect yourself from others contacting you. You may use it as “mesg [y|n]”. If you say “mesg n” the system will not allow others (except the administrator) to contact you using the communication utilities write, talk and wall.
The write command sends text to a named user one line at a time. Use it as “write <userID> [tty]”. Each line you type is shown on the receiver’s screen along with who is writing, and the receiver can write back in the same way. Press Ctrl-D to finish.
The talk utility allows a two-way conversation across a network. The format is “talk userID@hostname [tty]”. It displays a request on the other user’s terminal, and that user must respond with “talk yourID@yourhostname [tty]”.
The wall command sends a message to all logged-in users; wall stands for “write all”. When you start wall, it collects the text you type until end of input (Ctrl-D) and then sends it to every logged-in user.
Lower-level networking utilities include ifconfig and netstat. The ifconfig command is used to configure the network interfaces, and the netstat command lists the active network connections on the system. On many current Linux systems these are superseded by ip and ss.
Text processing utilities and backup commands in Unix
Archives: cpio, tar and dump
cpio is handy for saving small quantities of data, but its single-volume restriction makes it unsuitable for large backups. cpio copies files into or out of a cpio or tar archive; the archive can be another file on the disk, a magnetic tape or a pipe. GNU cpio supports the following archive formats: binary, old ASCII, new ASCII, crc, HPUX binary, HPUX old ASCII, old tar and POSIX.1 tar. When extracting from archives, cpio automatically recognizes which kind of archive it is reading and can read archives created on machines with a different byte order.
In copy-out mode, cpio copies files into an archive. It reads a list of filenames, one per line, on the standard input, and writes the archive onto the standard output.
In copy-in mode, cpio copies files out of an archive or lists the archive contents. It reads the archive from the standard input.
In copy-pass mode, cpio copies files from one directory tree to another, combining the copy-out and copy-in steps without actually using an archive.
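A sketch of the three cpio modes on throwaway data, assuming a cpio that accepts -o, -i, -t, -d and -p (GNU cpio and the BSD cpio shipped with macOS both do) plus mktemp -d. Many minimal systems do not install cpio, so the sketch stops quietly if it is missing. All paths are made up.

```shell
#!/bin/sh
# Copy-out, copy-in and copy-pass with cpio.
command -v cpio >/dev/null 2>&1 || { echo "cpio is not installed" >&2; exit 0; }

work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/restore" "$work/copy"
echo "hello" > "$work/src/a.txt"
echo "world" > "$work/src/sub/b.txt"
cd "$work/src" || exit 1

# Copy-out (-o): names arrive on standard input, the archive goes to standard output.
find . -depth -print | cpio -o > "$work/backup.cpio"

# Copy-in (-i) with -t: list the archive contents without extracting.
cpio -it < "$work/backup.cpio"

# Copy-in (-i) with -d: extract, creating directories as needed.
(cd "$work/restore" && cpio -id < "$work/backup.cpio")

# Copy-pass (-p): copy the tree directly, with no archive in between.
find . -depth -print | cpio -pd "$work/copy"
```

find supplies the one-name-per-line list that cpio expects; -depth lists a directory's contents before the directory itself, the order usually recommended for cpio.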
tar (tape archive) saves directory structures to a single backup volume. Because it was designed to save files to tape, it always adds files at the end of the archive. Its main operations are concatenate (-A), create (-c), append (-r), list and test (-t), update (-u) and extract (-x); the -f option names the archive file to use.
dump saves a whole file system to multiple backup volumes. It is designed for full and incremental backups, but restoring individual files from a dump is tricky. A dump that is larger than the output medium is broken into multiple volumes.
split splits a file into multiple pieces. You can specify the size of each piece in lines (-l) or in bytes (-b). By default the pieces are named xaa, xab, xac and so on; if you give a prefix as a second argument, it replaces the x (for example part.aa, part.ab).
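A sketch of creating, listing, appending to and extracting a tar archive, assuming a tar that supports -c, -t, -r, -x and -f on uncompressed archives (GNU tar and BSD tar both do) and mktemp -d. The project tree is made up.

```shell
#!/bin/sh
# Basic tar operations on a scratch tree.
work=$(mktemp -d)
mkdir -p "$work/project/src" "$work/restore"
echo 'notes' > "$work/project/README"
echo 'int main(void) { return 0; }' > "$work/project/src/main.c"
cd "$work" || exit 1

tar -cf project.tar project             # -c create; -f names the archive file
tar -tf project.tar                     # -t list the contents
echo 'first release' > project/CHANGES
tar -rf project.tar project/CHANGES     # -r append to the end of the archive
(cd restore && tar -xf ../project.tar)  # -x extract into restore/
```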
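A sketch of split with and without a name prefix, and of rejoining the pieces with cat. It assumes a POSIX split and mktemp -d; the file names are made up.

```shell
#!/bin/sh
# Splitting a file into pieces and putting it back together.
work=$(mktemp -d)
cd "$work" || exit 1
printf 'line %s\n' 1 2 3 4 5 > big.txt   # a five-line sample file

split -l 2 big.txt             # no prefix given: xaa, xab, xac
split -l 2 big.txt part.       # prefix given: part.aa, part.ab, part.ac
ls
cat part.* > rebuilt.txt       # the shell expands part.* in sorted order
cmp big.txt rebuilt.txt && echo "rebuilt.txt is identical to big.txt"
```

Because the suffixes sort alphabetically, a plain wildcard is enough to concatenate the pieces in the right order.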