Linux Command - Text Processing

cat - Concatenate Files and Print on Standard Output

  • Using cat as a primitive word processor.

    • Enter the command below, type your text, press ENTER to finish the line, and then press CTRL-D to signal end-of-file. (In this example the line starts with a TAB, which becomes visible with cat -A.)
bharatwaj@comp:~$ cat > foo.txt
    Hey there!!!
  • Use cat with the -A option to display the text

    • The ^I represents the tab character (CTRL-I), and the $ marks the end of the line, showing any trailing spaces in the text.

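    • This helps us spot hidden characters such as carriage returns, which cat -A displays as ^M. DOS/Windows text files end each line with a carriage return followed by a line feed.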

bharatwaj@comp:~$ cat -A foo.txt
^IHey there!!!$
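A quick way to see how cat -A reveals a DOS-style line ending is to feed it a line that ends in a carriage return. This sketch generates the input with printf, so no file is needed, and it assumes GNU cat, which shows the carriage return as ^M:

```shell
# A line with a leading tab and a DOS-style ending (\r\n)
printf '\tHey there!!!\r\n' | cat -A
# prints: ^IHey there!!!^M$
```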
  • The cat command also has options that change how the text is displayed:

    • -n: Numbers the lines.

    • -s: Suppresses extra blank lines (reduces consecutive empty lines to one)

    bharatwaj@comp:~$ cat > foo.txt
    The quick brown fox



    jumped over the lazy dog
    bharatwaj@comp:~$ cat -ns foo.txt
         1  The quick brown fox
         2
         3  jumped over the lazy dog
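The same result can be reproduced without typing into cat interactively. In this sketch printf supplies the text with three blank lines between the two sentences (chosen to mirror the example above), and cat -ns squeezes them into a single numbered blank line:

```shell
printf 'The quick brown fox\n\n\n\njumped over the lazy dog\n' | cat -ns

# Count the output lines: the two sentences plus one squeezed blank line
printf 'The quick brown fox\n\n\n\njumped over the lazy dog\n' | cat -ns | wc -l
# 3
```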

sort - Sort Lines of Text Files

  • The sort program sorts the contents of standard input, or one or more files specified on the command line, and sends the results to standard output.
bharatwaj@comp:~$ sort > foo.txt
c
a
b
bharatwaj@comp:~$ cat foo.txt
a
b
c
  • You can use sort with multiple files to merge and sort them.
bharatwaj@comp:~$ sort file1.txt file2.txt file3.txt > final_sorted_list.txt
  • Using the -nr options sorts the results in reverse numerical order, with the largest values listed first. This works because the numerical values appear at the start of each line.
bharatwaj@comp:~$ du -s /usr/share/* | sort -nr | head
36588   /usr/share/vim
26232   /usr/share/locale
20548   /usr/share/perl
20288   /usr/share/doc
16980   /usr/share/man
16580   /usr/share/i18n
6108    /usr/share/X11
6000    /usr/share/mime
5912    /usr/share/zoneinfo
4320    /usr/share/fonts
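The difference between the default text comparison and -n is easy to see with a few numbers. LC_ALL=C is set here only to make the comparison byte-wise and repeatable across locales:

```shell
# Default: compared character by character, so "10" and "100" come before "9"
printf '9\n100\n10\n' | LC_ALL=C sort
# 10
# 100
# 9

# -n: compared as numbers
printf '9\n100\n10\n' | LC_ALL=C sort -n
# 9
# 10
# 100

# -nr: numeric, largest first (what the du pipeline above relies on)
printf '9\n100\n10\n' | LC_ALL=C sort -nr
# 100
# 10
# 9
```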
  • To sort the output of ls -l by a specific value within the line (like file size), we can use the sort command with the -k option to specify the column (in this case, the 5th column, which contains file sizes). The -nr options sort the list in reverse numerical order, with the largest files appearing first.

    • -n: Sorts numerically.

    • -r: Reverses the order (largest first).

    • -k 5: Sorts by the 5th column (file size).

bharatwaj@comp:~$ ls -l /usr/share/* | sort -nr -k 5 | head
-rw-r--r-- 1 root root 1299875 Jan 22  2022 pci.ids
-rwxr-xr-x 1 root root 254484 May 20  2024 gitweb.cgi
-rw-r--r-- 1 root root 237878 Feb  7  2022 coreutils.info.gz
-rw-r--r-- 1 root root 236848 Dec  7  2021 public_suffix_list.dat
-rw-r--r-- 1 root root 139520 May  2  2023 mime.cache
-rw-r--r--  1 root root 116337 Feb 21  2024 tzdata.zi
-rw-r--r-- 1 root root 101908 Jul  4  2022 gnupg-module-overview.png
-rw-r--r-- 1 root root  91538 Jul  4  2022 gnupg.info-1.gz
-rw-r--r-- 1 root root  90573 Mar 23  2022 find.info-1.gz
-rw-r--r-- 1 root root 77071 Nov 16  2021 bash_completion
  • We have a file distros.txt containing Linux distribution names, version numbers, and release dates.
SUSE 10.2 12/07/2006
Fedora 10 11/25/2008
SUSE 11.0 06/19/2008
Ubuntu 8.04 04/24/2008
Fedora 8 11/08/2007
SUSE 10.3 10/04/2007
Ubuntu 6.10 10/26/2006
Fedora 7 05/31/2007
Ubuntu 7.10 10/18/2007
Ubuntu 7.04 04/19/2007
SUSE 10.1 05/11/2006
Fedora 6 10/24/2006
Fedora 9 05/13/2008
Ubuntu 6.06 06/01/2006
Ubuntu 8.10 10/30/2008
Fedora 5 03/20/2006
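To follow along, the file can be recreated with a here-document. This sketch uses single spaces between fields and no trailing spaces. That is an assumption: the cat -A listing later in this post shows a trailing space on most lines of the original file. The line order matches that listing.

```shell
# Create distros.txt in the current directory (overwrites any existing copy)
cat > distros.txt <<'EOF'
SUSE 10.2 12/07/2006
Fedora 10 11/25/2008
SUSE 11.0 06/19/2008
Ubuntu 8.04 04/24/2008
Fedora 8 11/08/2007
SUSE 10.3 10/04/2007
Ubuntu 6.10 10/26/2006
Fedora 7 05/31/2007
Ubuntu 7.10 10/18/2007
Ubuntu 7.04 04/19/2007
SUSE 10.1 05/11/2006
Fedora 6 10/24/2006
Fedora 9 05/13/2008
Ubuntu 6.06 06/01/2006
Ubuntu 8.10 10/30/2008
Fedora 5 03/20/2006
EOF

wc -l distros.txt
# 16 distros.txt
```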

By default, sort orders the lines alphabetically, which gives the following result:

bharatwaj@comp:~$ sort distros.txt
Fedora 10 11/25/2008
Fedora 5 03/20/2006
Fedora 6 10/24/2006
Fedora 7 05/31/2007
Fedora 8 11/08/2007
Fedora 9 05/13/2008
SUSE 10.1 05/11/2006
SUSE 10.2 12/07/2006
SUSE 10.3 10/04/2007
SUSE 11.0 06/19/2008
Ubuntu 6.06 06/01/2006
Ubuntu 6.10 10/26/2006
Ubuntu 7.04 04/19/2007
Ubuntu 7.10 10/18/2007
Ubuntu 8.04 04/24/2008
Ubuntu 8.10 10/30/2008

The Fedora version numbers don't sort correctly. Since sort compares characters lexicographically (i.e., alphabetically), it places Fedora 10 before Fedora 5 because 1 (from "10") comes before 5 in the character set.

To fix the sorting issue, we need to sort by multiple keys: first alphabetically by the distribution name (field 1), and then numerically by the version number (field 2). The sort command allows multiple -k options to specify multiple keys.

This command works as follows:

  • --key=1,1 sorts by the first field only (the distribution name), alphabetically.

  • --key=2,2n sorts by the second field only (the version number), numerically.

bharatwaj@comp:~$ sort --key=1,1 --key=2,2n distros.txt
Fedora 5 03/20/2006
Fedora 6 10/24/2006
Fedora 7 05/31/2007
Fedora 8 11/08/2007
Fedora 9 05/13/2008
Fedora 10 11/25/2008
SUSE 10.1 05/11/2006
SUSE 10.2 12/07/2006
SUSE 10.3 10/04/2007
SUSE 11.0 06/19/2008
Ubuntu 6.06 06/01/2006
Ubuntu 6.10 10/26/2006
Ubuntu 7.04 04/19/2007
Ubuntu 7.10 10/18/2007
Ubuntu 8.04 04/24/2008
Ubuntu 8.10 10/30/2008
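Why --key=1,1 rather than just --key=1? A key without an end field runs to the end of the line, so a first key of 1 compares the whole line as text, and the numeric second key never gets a chance to break ties. A two-line sample makes the difference visible (LC_ALL=C is used only for repeatable byte-wise comparison):

```shell
# First key covers the whole line, so "Fedora 10" sorts before "Fedora 5"
printf 'Fedora 5\nFedora 10\n' | LC_ALL=C sort --key=1 --key=2,2n
# Fedora 10
# Fedora 5

# First key limited to field 1; the tie is broken numerically on field 2
printf 'Fedora 5\nFedora 10\n' | LC_ALL=C sort --key=1,1 --key=2,2n
# Fedora 5
# Fedora 10
```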
  • Dates in the American MM/DD/YYYY format (e.g., 11/25/2008) do not sort chronologically as plain text, because the month comes first. Rewriting them in ISO order (YYYY-MM-DD) would fix that, but sort can also do the job directly: we tell it to compare the year, then the month, then the day, using character positions within field 3.

The command sort -k 3.7nbr -k 3.1nbr -k 3.4nbr distros.txt breaks the date into parts and sorts them numerically:

  • -k 3.7: Sorts by the year (starting at character 7 of field 3, which is the beginning of the year in MM/DD/YYYY).

  • -k 3.1: Sorts by the month (starting at character 1 of field 3, which is the beginning of the month in MM/DD/YYYY).

  • -k 3.4: Sorts by the day (starting at character 4 of field 3, which is the beginning of the day in MM/DD/YYYY).

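  • n: Numeric sort to ensure that numbers are compared as numbers (e.g., 10 is greater than 9).

  • b: Ignores leading blanks in the field. Without -t, sort counts the space before a field as part of that field, so b is what makes the offsets 7, 1, and 4 line up with the digits of the date.

  • r: Reverses the order, so the most recent dates come first.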

By using -k 3.7, we direct sort to begin at the 7th character of the third field, which is the year part of the date. Similarly, -k 3.1 and -k 3.4 isolate the month and day portions of the date. Because a numeric comparison stops at the first non-digit, each key effectively compares only its own number (e.g., 11 from 11/25/2008). The n option ensures numeric sorting, the r option reverses the order so the newest releases come first, and the b option skips the blank in front of the field so that character counting starts at the first digit of the date.

bharatwaj@comp:~$ sort -k 3.7nbr -k 3.1nbr -k 3.4nbr distros.txt
Fedora 10 11/25/2008
Ubuntu 8.10 10/30/2008
SUSE 11.0 06/19/2008
Fedora 9 05/13/2008
Ubuntu 8.04 04/24/2008
Fedora 8 11/08/2007
Ubuntu 7.10 10/18/2007
SUSE 10.3 10/04/2007
Fedora 7 05/31/2007
Ubuntu 7.04 04/19/2007
SUSE 10.2 12/07/2006
Ubuntu 6.10 10/26/2006
Fedora 6 10/24/2006
Ubuntu 6.06 06/01/2006
SUSE 10.1 05/11/2006
Fedora 5 03/20/2006
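The b modifier matters more than it might seem. This sketch uses two made-up lines (names a and b) and assumes GNU sort with LC_ALL=C. Without b, position 7 of field 2 is counted from the leading space, so it lands on a slash. Both keys then read as 0, and sort falls back to comparing whole lines:

```shell
# Without b: the keys start at "/2006" and "/2008", and both compare as 0
printf 'a 03/20/2006\nb 11/25/2008\n' | LC_ALL=C sort -k 2.7nr
# a 03/20/2006
# b 11/25/2008

# With b: the leading blank is skipped, the key starts at the year, newest first
printf 'a 03/20/2006\nb 11/25/2008\n' | LC_ALL=C sort -k 2.7nbr
# b 11/25/2008
# a 03/20/2006
```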
  • Sorting with a different field delimiter

    • -t ':': Specifies that the fields in the /etc/passwd file are separated by colons (:).

    • -k 7: Tells sort to use the seventh field (the default shell) as the key for sorting.

bharatwaj@comp:~$ sort -t ':' -k 7 /etc/passwd | head
bharatwaj:x:1000:1000:,,,:/home/bharatwaj:/bin/bash
root:x:0:0:root:/root:/bin/bash
sync:x:4:65534:sync:/bin:/bin/sync
_apt:x:105:65534::/nonexistent:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
games:x:5:60:games:/usr/games:/usr/sbin/nologin
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin
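The -t option also combines with a bounded, numeric key. This sketch uses made-up passwd-style lines (trimmed to three fields for brevity) and sorts them numerically by the third field, the UID:

```shell
printf 'bharatwaj:x:1000\nroot:x:0\nbackup:x:34\n' | sort -t ':' -k 3,3n
# root:x:0
# backup:x:34
# bharatwaj:x:1000
```

On a real system, the equivalent would be sort -t ':' -k 3,3n /etc/passwd | head.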

uniq - Report or Omit Repeated Lines

  • Given a sorted file (or sorted standard input), uniq removes duplicate lines and sends the results to standard output. It is often used together with sort to remove duplicates from the output.
bharatwaj@comp:~$ cat > foo.txt
c
a
b
a
b
c
bharatwaj@comp:~$ uniq foo.txt
c
a
b
a
b
c

For uniq to actually do its job, the input must be sorted first. This is because uniq only removes duplicate lines that are adjacent to each other.

bharatwaj@comp:~$ sort foo.txt | uniq
a
b
c
  • The -c option precedes each output line with the number of times that line occurs in the input.
bharatwaj@comp:~$ sort foo.txt | uniq -c
      2 a
      2 b
      2 c
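Two related idioms, sketched with printf input instead of foo.txt: uniq -d prints only the lines that are repeated (one copy of each), and piping uniq -c back into sort -nr ranks lines by how often they occur. The second input has an extra c so that the counts differ:

```shell
# Only lines that occur more than once
printf 'c\na\nb\na\nc\n' | sort | uniq -d
# a
# c

# Frequency table, most common line first; the top line reports a count of 3 for c
printf 'c\na\nb\na\nb\nc\nc\n' | sort | uniq -c | sort -nr | head -n 1
```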

cut - Remove Sections from Each Line of Files

  • The cut program is used to extract specific parts of a line of text and display them. It can take input from one or more files or from standard input.

The cut program works best with files generated by other programs, rather than text typed by hand, because it expects a consistent format. By default, cut splits fields on the tab character, so before using it on distros.txt we can run cat -A to check whether the fields are tab-separated (tabs would appear as ^I).

bharatwaj@comp:~$ cat -A distros.txt
SUSE 10.2 12/07/2006 $
Fedora 10 11/25/2008 $
SUSE 11.0 06/19/2008 $
Ubuntu 8.04 04/24/2008 $
Fedora 8 11/08/2007 $
SUSE 10.3 10/04/2007 $
Ubuntu 6.10 10/26/2006 $
Fedora 7 05/31/2007 $
Ubuntu 7.10 10/18/2007 $
Ubuntu 7.04 04/19/2007 $
SUSE 10.1 05/11/2006 $
Fedora 6 10/24/2006$
Fedora 9 05/13/2008 $
Ubuntu 6.06 06/01/2006 $
Ubuntu 8.10 10/30/2008 $
Fedora 5 03/20/2006 $

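No ^I appears in the output, so the fields in our file are separated by spaces rather than tabs. We therefore tell cut to use a space as the delimiter: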

  • -d " ": Sets the delimiter to a space (" "), meaning it will split each line into parts wherever there is a space.

  • -f 3: Tells cut to select the third part (field) from each line.

bharatwaj@comp:~$ cut -d " " -f 3 distros.txt
12/07/2006
11/25/2008
06/19/2008
04/24/2008
11/08/2007
10/04/2007
10/26/2006
05/31/2007
10/18/2007
04/19/2007
05/11/2006
10/24/2006
05/13/2008
06/01/2006
10/30/2008
03/20/2006
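One caveat with -d " ": cut treats every single space as a delimiter, so two spaces in a row produce an empty field. If the spacing in a file is irregular, one simple workaround is to squeeze runs of spaces with tr -s (covered later in this post) before cutting. The sample line here is made up:

```shell
# Two spaces between the words: field 2 is the empty string between them
printf 'Fedora  10\n' | cut -d ' ' -f 2
# (prints an empty line)

# Squeeze runs of spaces into one, then cut
printf 'Fedora  10\n' | tr -s ' ' | cut -d ' ' -f 2
# 10
```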
  • Extracting the year from each line

    • This second cut command takes the output from the first cut command.

    • It extracts characters 7 through 10 (-c 7-10) from each line of the previous output.

bharatwaj@comp:~$ cut -d " " -f 3 distros.txt | cut -c 7-10
2006
2008
2008
2008
2007
2007
2006
2007
2007
2007
2006
2006
2008
2006
2008
2006
  • Using cut with a colon delimiter on /etc/passwd

    • -d ':': This sets the delimiter to a colon (:), meaning it will split each line in the /etc/passwd file wherever there is a colon.

    • -f 1: This selects the first field (part before the first colon) from each line. In /etc/passwd, the first field is typically the username.

bharatwaj@comp:~$ cut -d ':' -f 1 /etc/passwd | head
root
daemon
bin
sys
sync
games
man
lp
mail
news
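cut also accepts a comma-separated list of fields. This sketch runs on a single sample passwd line and selects the username and the login shell. Note that cut reuses the input delimiter between the selected fields:

```shell
echo 'root:x:0:0:root:/root:/bin/bash' | cut -d ':' -f 1,7
# root:/bin/bash
```

Against the real file, the same idea is cut -d ':' -f 1,7 /etc/passwd | head.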

paste - Merge Lines of Files

  • The paste command does the opposite of cut. Rather than extracting a column of text from a file, it adds one or more columns of text to a file.

  • It does this by reading multiple files and combining the fields found in each file into a single stream on standard output. Like cut, paste accepts multiple file arguments and/or standard input.

To demonstrate how paste operates, we will perform some surgery on our distros.txt file to produce a chronological list of releases.

From our earlier work with sort, we will first produce a list of distros sorted by date and store the result in a file called distros-by-date.txt:

bharatwaj@comp:~$ sort -k 3.7nbr -k 3.1nbr -k 3.4nbr distros.txt > distros-by-date.txt

Next, we will use cut to extract the first two fields from the file (the distro name and version) and store that result in a file named distros-versions.txt:

bharatwaj@comp:~$ cut -d " " -f 1,2 distros-by-date.txt > distros-versions.txt
bharatwaj@comp:~$ head distros-versions.txt
Fedora 10
Ubuntu 8.10
SUSE 11.0
Fedora 9
Ubuntu 8.04
Fedora 8
Ubuntu 7.10
SUSE 10.3
Fedora 7
Ubuntu 7.04

The final piece of preparation is to extract the release dates and store them in a file named distros-dates.txt:

bharatwaj@comp:~$ cut -d " " -f 3 distros-by-date.txt > distros-dates.txt
bharatwaj@comp:~$ head distros-dates.txt
11/25/2008
10/30/2008
06/19/2008
05/13/2008
04/24/2008
11/08/2007
10/18/2007
10/04/2007
05/31/2007
04/19/2007

We now have the parts we need. To complete the process, use paste to put the column of dates ahead of the distro names and versions, thus creating a chronological list. This is done simply by using paste and ordering its arguments in the desired arrangement.

  • -d " ": Separates the merged fields with a space instead of paste's default tab character.
bharatwaj@comp:~$ paste -d " " distros-dates.txt distros-versions.txt
11/25/2008 Fedora 10
10/30/2008 Ubuntu 8.10
06/19/2008 SUSE 11.0
05/13/2008 Fedora 9
04/24/2008 Ubuntu 8.04
11/08/2007 Fedora 8
10/18/2007 Ubuntu 7.10
10/04/2007 SUSE 10.3
05/31/2007 Fedora 7
04/19/2007 Ubuntu 7.04
12/07/2006 SUSE 10.2
10/26/2006 Ubuntu 6.10
10/24/2006 Fedora 6
06/01/2006 Ubuntu 6.06
05/11/2006 SUSE 10.1
03/20/2006 Fedora 5
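paste can also read standard input, written as -. Two common tricks, sketched with made-up input: giving - twice merges consecutive pairs of lines, and -s (serial) joins all the lines of one input into a single line:

```shell
# Merge every two lines into one, separated by a space
printf 'Fedora\n10\nUbuntu\n8.10\n' | paste -d ' ' - -
# Fedora 10
# Ubuntu 8.10

# Join all lines into one comma-separated line
printf 'Fedora\nUbuntu\nSUSE\n' | paste -s -d ',' -
# Fedora,Ubuntu,SUSE
```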

join - Join Lines of Two Files on a Common Field

In some ways, join is like paste in that it adds columns to a file, but it does so in a unique way. A join is an operation usually associated with relational databases where data from multiple tables with a shared key field is combined to form a desired result. The join program performs the same operation. It joins data from multiple files based on a shared key field.

To demonstrate the join program, we’ll need to make a couple of files with a shared key. To do this, we will use our distros-by-date.txt file. From this file, we will construct two additional files. The first contains the release dates (which will be our shared key field for this demonstration) and the release names:

bharatwaj@comp:~$ cut -d " " -f 1,1 distros-by-date.txt > distros-names.txt
bharatwaj@comp:~$ paste distros-dates.txt distros-names.txt > distros-key-names.txt
bharatwaj@comp:~$ head distros-key-names.txt
11/25/2008      Fedora
10/30/2008      Ubuntu
06/19/2008      SUSE
05/13/2008      Fedora
04/24/2008      Ubuntu
11/08/2007      Fedora
10/18/2007      Ubuntu
10/04/2007      SUSE
05/31/2007      Fedora
04/19/2007      Ubuntu

The second file contains the release dates and the version numbers:

bharatwaj@comp:~$ cut -d " " -f 2,2 distros-by-date.txt > distros-vernums.txt
bharatwaj@comp:~$ paste distros-dates.txt distros-vernums.txt > distros-key-vernums.txt
bharatwaj@comp:~$ head distros-key-vernums.txt
11/25/2008      10
10/30/2008      8.10
06/19/2008      11.0
05/13/2008      9
04/24/2008      8.04
11/08/2007      8
10/18/2007      7.10
10/04/2007      10.3
05/31/2007      7
04/19/2007      7.04

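We now have two files with a shared key (the “release date” field). It is important to point out that join expects both files to be sorted on the key field. Our files are in chronological rather than alphabetical order, but join still works here because both files list exactly the same keys in the same order.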

  • join command: This merges two files based on a common key (the first column in each file).

    • distros-key-names.txt: Contains the release date and distribution name (e.g., 11/25/2008 Fedora).

    • distros-key-vernums.txt: Contains the release date and version number (e.g., 11/25/2008 10).

  • What join does: It looks for lines in both files where the release date (the first column) is the same. For those matching dates, it combines the corresponding lines from both files, merging the distribution name and version number into a single line.

bharatwaj@comp:~$ join distros-key-names.txt distros-key-vernums.txt | head
11/25/2008 Fedora 10
10/30/2008 Ubuntu 8.10
06/19/2008 SUSE 11.0
05/13/2008 Fedora 9
04/24/2008 Ubuntu 8.04
11/08/2007 Fedora 8
10/18/2007 Ubuntu 7.10
10/04/2007 SUSE 10.3
05/31/2007 Fedora 7
04/19/2007 Ubuntu 7.04

Note also that, by default, join uses whitespace as the input field delimiter and a single space as the output field delimiter. This behavior can be modified by specifying options. See the join man page for details.
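To meet join's sort requirement in general, a safe habit is to sort both inputs on the key with sort itself. This is a minimal sketch with two hypothetical files (names.txt and vernums.txt) in a temporary directory, keyed on an ID stored as text:

```shell
dir=$(mktemp -d)

# Sort each input on field 1, the join key
printf '2 Ubuntu\n1 Fedora\n3 SUSE\n' | sort -k 1,1 > "$dir/names.txt"
printf '3 11.0\n1 10\n2 8.10\n' | sort -k 1,1 > "$dir/vernums.txt"

join "$dir/names.txt" "$dir/vernums.txt"
# 1 Fedora 10
# 2 Ubuntu 8.10
# 3 SUSE 11.0

rm -r "$dir"
```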


comm - Compare Two Sorted Files Line by Line

The comm program compares two text files, displaying the lines that are unique to each one and the lines they have in common. To demonstrate, we will create two nearly identical text files using cat:

bharatwaj@comp:~$ cat > file1.txt
a
b
c
d
bharatwaj@comp:~$ cat > file2.txt
b
c
d
e

We will compare the two files using comm:

bharatwaj@comp:~$ comm file1.txt file2.txt
a
                b
                c
                d
        e

As we can see, comm produces three columns of output. The first column contains lines unique to the first file argument; the second column, the lines unique to the second file argument; and the third column, the lines shared by both files.

comm supports options in the form -n where n is either 1, 2, or 3. When used, these options specify which column(s) to suppress. For example, if we wanted to output only the lines shared by both files, we would suppress the output of columns 1 and 2:

bharatwaj@comp:~$ comm -12 file1.txt file2.txt
b
c
d
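The other suppression combinations follow the same pattern. This sketch recreates file1.txt and file2.txt with printf (same contents as above, overwriting any existing copies) and asks for the lines unique to each file:

```shell
printf 'a\nb\nc\nd\n' > file1.txt
printf 'b\nc\nd\ne\n' > file2.txt

# Suppress columns 2 and 3: lines only in file1.txt
comm -23 file1.txt file2.txt
# a

# Suppress columns 1 and 3: lines only in file2.txt
comm -13 file1.txt file2.txt
# e
```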

diff - Compare Files Line by Line

diff compares files and reports the differences between them. Developers often use it to track changes in source code and to create patch files for updating versions.

If we use diff to look at our previous example files, we see its default style of output: a terse description of the differences between the two files.

bharatwaj@comp:~$ diff file1.txt file2.txt
1d0
< a
4a4
> e

In the default format, diff shows changes with a command indicating the range and type of modifications needed to transform one file into another.

Change      Explanation
r1ar2       Append the lines at position r2 in the second file to position r1 in the first file.
r1cr2       Change (replace) the lines at position r1 in the first file with the lines at position r2 in the second file.
r1dr2       Delete the lines at position r1 in the first file, which would have appeared at range r2 in the second file.
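The file1.txt/file2.txt example only produces appends and deletes. This sketch uses two hypothetical files whose second line differs, which shows the c (change) form, where --- separates the old lines from the new ones:

```shell
dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$dir/old.txt"
printf 'a\nB\nc\n' > "$dir/new.txt"

# diff exits with status 1 when the files differ; that is not an error
diff "$dir/old.txt" "$dir/new.txt" || true
# 2c2
# < b
# ---
# > B

rm -r "$dir"
```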

When viewed using the context format (the -c option), the output looks like this:

bharatwaj@comp:~$ diff -c file1.txt file2.txt
*** file1.txt   2024-12-14 22:14:07.048924575 +0530
--- file2.txt   2024-12-14 22:14:18.048923011 +0530
***************
*** 1,4 ****
- a
  b
  c
  d
--- 1,4 ----
  b
  c
  d
+ e

The output begins with the names of the two files and their timestamps. The first file is marked with asterisks, and the second file is marked with dashes. Throughout the remainder of the listing, these markers will signify their respective files. Next, we see groups of changes, including the default number of surrounding context lines. In the first group, we see *** 1,4 ****, which indicates lines 1 through 4 in the first file. Later we see --- 1,4 ----, which indicates lines 1 through 4 in the second file. Within a change group, lines begin with one of four indicators, as shown in the table below:

Indicator   Meaning
(none)      A line shown for context, indicating no difference between the two files.
-           A line deleted, appearing in the first file but not in the second.
+           A line added, appearing in the second file but not in the first.
!           A line changed, with both versions displayed in their respective sections.

The unified format is similar to the context format but is more concise. It is specified with the -u option:

bharatwaj@comp:~$ diff -u file1.txt file2.txt
--- file1.txt   2024-12-14 22:14:07.048924575 +0530
+++ file2.txt   2024-12-14 22:14:18.048923011 +0530
@@ -1,4 +1,4 @@
-a
 b
 c
 d
+e

The most notable difference between the context and unified formats is the elimination of the duplicated lines of context, making the results of the unified format shorter than those of the context format. In our example above, we see file timestamps like those of the context format, followed by the string @@ -1,4 +1,4 @@. This indicates the lines in the first file and the lines in the second file described in the change group. Following this are the lines themselves, with the default three lines of context. As shown in the table below, each line starts with one of three possible characters:

Character   Meaning
(none)      This line is shared by both files.
-           This line was removed from the first file.
+           This line was added to the first file (it appears only in the second file).

patch - Apply a diff to an Original

The patch program is used to apply changes to text files. It accepts output from diff and is generally used to convert older versions of files into newer versions.

bharatwaj@comp:~$ cat file1.txt
a
b
c
d
bharatwaj@comp:~$ cat file2.txt
b
c
d
e
bharatwaj@comp:~$  diff -Naur file1.txt file2.txt > patchfile.txt
bharatwaj@comp:~$ patch < patchfile.txt
patching file file1.txt
bharatwaj@comp:~$ cat file1.txt
b
c
d
e

We created a diff file named patchfile.txt and then used the patch program to apply the patch. Note that we did not have to specify a target file to patch, as the diff file (in unified format) already contains the filenames in the header. Once the patch is applied, we can see that file1.txt now matches file2.txt.

patch has a large number of options, and additional utility programs can be used to analyze and edit patches.


tr - Transliterate or Delete Characters

The tr program is used to transliterate characters. We can think of this as a sort of character-based search-and-replace operation. Transliteration is the process of changing characters from one alphabet to another.

For example, converting characters from lowercase to uppercase is transliteration. We can perform such a conversion with tr as follows:

  • tr a-z A-Z: The tr command is used to translate or replace characters. In this case:

    • a-z represents the range of lowercase letters from 'a' to 'z'.

    • A-Z represents the range of uppercase letters from 'A' to 'Z'.

bharatwaj@comp:~$ echo "lowercase letters" | tr a-z A-Z
LOWERCASE LETTERS

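tr also accepts POSIX character classes. Here every lowercase letter is replaced with A. The class is quoted so that the shell cannot treat [:lower:] as a filename pattern:

bharatwaj@comp:~$ echo "lowercase letters" | tr '[:lower:]' A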
AAAAAAAAA AAAAAAA

The tr command can also convert a DOS-style text file (which ends each line with a carriage return \r followed by a line feed \n) into a Unix-style text file (which uses only \n):

  • tr -d '\r': This tells the tr command to delete the carriage return characters (\r) from the file.

  • < dos_file: This reads the content of the dos_file.

  • > unix_file: This writes the output (with the carriage returns removed) to the unix_file.

bharatwaj@comp:~$ tr -d '\r' < dos_file > unix_file
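One way to confirm the conversion is to combine it with cat -A from earlier. Instead of reading dos_file, this sketch fakes a two-line DOS file with printf:

```shell
# Before: each line ends with ^M (carriage return) before the $
printf 'line one\r\nline two\r\n' | cat -A
# line one^M$
# line two^M$

# After: the carriage returns are gone
printf 'line one\r\nline two\r\n' | tr -d '\r' | cat -A
# line one$
# line two$
```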

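tr can also perform ROT13 encoding, which replaces each letter with the one 13 places after it in the alphabet, wrapping around at the end. Because the alphabet has 26 letters, applying ROT13 twice restores the original text: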

bharatwaj@comp:~$ echo "secret text" | tr a-zA-Z n-za-mN-ZA-M
frperg grkg
bharatwaj@comp:~$ echo "frperg grkg" | tr a-zA-Z n-za-mN-ZA-M
secret text

Using the -s option, tr can “squeeze” (delete) repeated instances of a character:

  • By specifying the set ab to tr, we eliminate the repeated instances of the letters in the set, while leaving the character that is missing from the set (c) unchanged.
bharatwaj@comp:~$  echo "aaabbbccc" | tr -s ab
abccc

Note that the repeating characters must be adjoining. If they are not, the squeezing will have no effect:

bharatwaj@comp:~$ echo "abcabcabc" | tr -s ab
abcabcabc

sed - Stream Editor for Filtering and Transforming Text

The name sed is short for stream editor. It performs text editing on a stream of text, either a set of specified files or standard input.

  • The expression 's/front/back/' tells sed to perform a substitution:

    • s stands for substitute.

    • front is the pattern to search for in the input.

    • back is the replacement text.

bharatwaj@comp:~$ echo "front" | sed 's/front/back/'
back

The choice of the delimiter character is arbitrary. By convention, the slash character is often used, but sed will accept any character that immediately follows the command as the delimiter. We could perform the same command this way:

bharatwaj@comp:~$ echo "front" | sed 's_front_back_'
back

Most commands in sed may be preceded by an address, which specifies which line(s) of the input stream will be edited. If the address is omitted, then the editing command is carried out on every line in the input stream.

bharatwaj@comp:~$ echo -e "front\nfront" | sed 's/front/back/'
back
back
bharatwaj@comp:~$ echo -e "front\nfront" | sed '1s/front/back/'
back
front
bharatwaj@comp:~$ echo -e "front\nfront" | sed '2s/front/back/'
front
back
bharatwaj@comp:~$ echo -e "front\nfront" | sed '3s/front/back/'
front
front

sed Address Notation

Address        Description
n              A specific line number, where n is a positive integer (e.g., 1 for the first line).
$              The last line of the input stream.
/regexp/       Lines matching a regular expression (POSIX basic regular expression). The regex is usually delimited by slashes (/), but you can use an alternate delimiter (\cregexpc), where c is the chosen delimiter.
addr1,addr2    A range of lines from addr1 to addr2, inclusive (e.g., 1,5 for lines 1 through 5).
first~step     Line first and then every step-th line after it (GNU extension). For example, 1~2 matches every odd-numbered line, and 5~5 matches the fifth line and every fifth line after that.
addr1,+n       addr1 and the n lines that follow it (GNU extension; e.g., 2,+3 matches lines 2, 3, 4, and 5).
addr!          All lines except the specified address (e.g., 1! matches all lines except the first).
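The less familiar address forms can be checked quickly against numbered input from seq. first~step and addr1,+n are GNU sed extensions, so this sketch assumes GNU sed:

```shell
# Every third line, starting at line 1: prints 1, 4, 7, 10 (one per line)
seq 10 | sed -n '1~3p'

# Line 2 plus the 3 lines after it: prints 2, 3, 4, 5
seq 10 | sed -n '2,+3p'

# The last line only: prints 10
seq 10 | sed -n '$p'
```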
  • The command sed -n '1,5p' distros.txt tells sed to print only lines 1 through 5 from the file distros.txt. Here's a breakdown:

    1. -n: This option tells sed to suppress automatic printing of all lines. Normally, sed prints each line of input by default, but with -n, only lines explicitly instructed to be printed will be shown.

    2. '1,5p':

      • 1,5 specifies the range of lines (lines 1 through 5).

      • p stands for print, meaning sed will print these specified lines.

bharatwaj@comp:~$ sed -n '1,5p' distros.txt
SUSE 10.2 12/07/2006
Fedora 10 11/25/2008
SUSE 11.0 06/19/2008
Ubuntu 8.04 04/24/2008
Fedora 8 11/08/2007

Regular Expression:

  • By including the slash-delimited regular expression /SUSE/, we are able to isolate the lines containing it in much the same manner as grep.
bharatwaj@comp:~$ sed -n '/SUSE/p' distros.txt
SUSE 10.2 12/07/2006
SUSE 11.0 06/19/2008
SUSE 10.3 10/04/2007
SUSE 10.1 05/11/2006
  • To negate the address, add an exclamation point (!) after it:
bharatwaj@comp:~$ sed -n '/SUSE/!p' distros.txt
Fedora 10 11/25/2008
Ubuntu 8.04 04/24/2008
Fedora 8 11/08/2007
Ubuntu 6.10 10/26/2006
Fedora 7 05/31/2007
Ubuntu 7.10 10/18/2007
Ubuntu 7.04 04/19/2007
Fedora 6 10/24/2006
Fedora 9 05/13/2008
Ubuntu 6.06 06/01/2006
Ubuntu 8.10 10/30/2008
Fedora 5 03/20/2006

sed has many more commands. The table below lists some common ones; see the sed documentation for the rest.

Command                 Description                                                                  Example
=                       Output the current line number.                                              sed '=' distros.txt
a                       Append text after the current line.                                          sed '2a This is a new line' distros.txt
d                       Delete the current line.                                                     sed '3d' distros.txt
i                       Insert text in front of the current line.                                    sed '2i This is a new line' distros.txt
p                       Print the current line (use -n to suppress the default printing).            sed -n '2p' distros.txt
q                       Exit sed without processing further lines.                                   sed '3q' distros.txt
s/regexp/replacement/   Substitute replacement for regexp in the input text. Use & for the matched   sed 's/front/back/' distros.txt
                        text and \1 to \9 for backreferences.
y/set1/set2/            Transliterate the characters in set1 to the corresponding characters in      sed 'y/abc/xyz/'
                        set2. Both sets must have the same length.
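The & and backreference features of the s command are worth a short illustration. The second command is a sketch of how sed could rewrite an American MM/DD/YYYY date into ISO YYYY-MM-DD order, using basic regular expression groups \( \) and | as the s delimiter so the slashes in the date need no escaping:

```shell
# & stands for the whole matched text
echo "Fedora 10" | sed 's/[0-9][0-9]*/version &/'
# Fedora version 10

# \1, \2, \3 refer to the three captured groups (month, day, year)
echo "Fedora 10 11/25/2008" | sed 's|\([0-9][0-9]\)/\([0-9][0-9]\)/\([0-9]\{4\}\)|\3-\1-\2|'
# Fedora 10 2008-11-25
```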

Another feature of the s command is the use of optional flags that may follow the replacement string. The most important of these is the g flag, which instructs sed to apply the search and replace globally to a line, not just to the first instance (the default). Without it, only the first match is replaced:

bharatwaj@comp:~$ echo "aaabbbccc" | sed 's/b/B/'
aaaBbbccc

The above will only change the first "b" to a "B". To replace all occurrences of the pattern in the line, you can use the g flag, which stands for "global".

bharatwaj@comp:~$ echo "aaabbbccc" | sed 's/b/B/g'
aaaBBBccc
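Besides g, the s command accepts a number as a flag, which replaces only that occurrence of the match. Combining a number with g (replace that occurrence and every later one) is a GNU sed extension, so the second line assumes GNU sed:

```shell
# Replace only the second "b"
echo "aaabbbccc" | sed 's/b/B/2'
# aaabBbccc

# GNU sed: replace the second "b" and every one after it
echo "aaabbbccc" | sed 's/b/B/2g'
# aaabBBccc
```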

aspell - Interactive Spell Checker

aspell is a command-line tool for checking and correcting spelling in text files.

bharatwaj@comp:~$ cat > foo.txt
The quick brown fox jimped over the laxy dog.

We’ll check the file using aspell:

bharatwaj@comp:~$ aspell check foo.txt

aspell opens an interactive screen that highlights the first misspelled word (jimped) and lists numbered replacement suggestions. If we enter 1, aspell replaces the offending word with jumped and moves on to the next misspelled word, laxy. If we select the replacement lazy, aspell replaces it and terminates. Once aspell has finished, we can examine our file and see that the misspellings have been corrected.

bharatwaj@comp:~$ cat foo.txt
The quick brown fox jumped over the lazy dog.

Relevant Book:

The Linux Command Line - A Complete Introduction

Disclaimer: This is a personal blog that might come in handy when I suffer from dementia in the future.