Linux Command - Text Processing
cat - Concatenate Files and Print on Standard Output
Using cat as a primitive word processor.
- You can enter the below command, type your text, press ENTER to finish the line, and then press CTRL-D to indicate the end-of-file.
bharatwaj@comp:~$ cat > foo.txt
Hey there!!!
Use cat with the -A option to display the non-printing characters in the text. The ^I represents the tab character (CTRL-I), and the $ marks the end of the line, revealing any trailing spaces. This can also help us spot hidden carriage returns, which appear as ^M.
bharatwaj@comp:~$ cat -A foo.txt
^IHey there!!!$
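As a small sketch of the same idea, the snippet below feeds cat -A a line containing a tab, two trailing spaces, and a DOS-style carriage return. The sample text is made up for illustration, and -A is assumed to be the GNU coreutils option.

```shell
# Build a line with a leading tab, two trailing spaces, and a CR before the newline,
# then let cat -A make every one of them visible.
visible=$(printf '\tHey there!!!  \r\n' | cat -A)
printf '%s\n' "$visible"
# prints: ^IHey there!!!  ^M$
```

The two spaces before ^M are the trailing spaces, and ^M is the carriage return that would otherwise be invisible.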
The cat command has options to modify text:
-n : Numbers the lines.
-s : Suppresses extra blank lines (reduces consecutive empty lines to one).
bharatwaj@comp:~$ cat > foo.txt
The quick brown fox
jumped over the lazy dog
bharatwaj@comp:~$ cat -ns foo.txt
1 The quick brown fox
2
3 jumped over the lazy dog
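The blank lines typed into foo.txt are not visible in the session above, so here is a self-contained sketch of the same behaviour. It assumes three consecutive blank lines between the two sentences; the exact number typed originally is not known.

```shell
# Three consecutive blank lines separate the two sentences.
# -s squeezes them into one blank line, and -n numbers the lines that remain.
numbered=$(printf 'The quick brown fox\n\n\n\njumped over the lazy dog\n' | cat -ns)
printf '%s\n' "$numbered"
```

Only three numbered lines come out: the first sentence, a single blank line, and the second sentence.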
sort - Sort Lines of Text Files
- The sort program sorts the contents of standard input, or one or more files specified on the command line, and sends the results to standard output.
bharatwaj@comp:~$ sort > foo.txt
c
a
b
bharatwaj@comp:~$ cat foo.txt
a
b
c
- You can use sort with multiple files to merge and sort them.
bharatwaj@comp:~$ sort file1.txt file2.txt file3.txt > final_sorted_list.txt
- The -nr options sort the results in reverse numerical order, with the largest values listed first. This works because the numerical values appear at the start of each line.
bharatwaj@comp:~$ du -s /usr/share/* | sort -nr | head
36588 /usr/share/vim
26232 /usr/share/locale
20548 /usr/share/perl
20288 /usr/share/doc
16980 /usr/share/man
16580 /usr/share/i18n
6108 /usr/share/X11
6000 /usr/share/mime
5912 /usr/share/zoneinfo
4320 /usr/share/fonts
To sort the output of ls -l by a specific value within the line (like file size), we can use the sort command with the -k option to specify the column (in this case, the 5th column, which contains file sizes). The -nr options sort the list in reverse numerical order, with the largest files appearing first.
-n : Sorts numerically.
-r : Reverses the order (largest first).
-k 5 : Sorts by the 5th column (file size).
bharatwaj@comp:~$ ls -l /usr/share/* | sort -nr -k 5 | head
-rw-r--r-- 1 root root 1299875 Jan 22 2022 pci.ids
-rwxr-xr-x 1 root root 254484 May 20 2024 gitweb.cgi
-rw-r--r-- 1 root root 237878 Feb 7 2022 coreutils.info.gz
-rw-r--r-- 1 root root 236848 Dec 7 2021 public_suffix_list.dat
-rw-r--r-- 1 root root 139520 May 2 2023 mime.cache
-rw-r--r-- 1 root root 116337 Feb 21 2024 tzdata.zi
-rw-r--r-- 1 root root 101908 Jul 4 2022 gnupg-module-overview.png
-rw-r--r-- 1 root root 91538 Jul 4 2022 gnupg.info-1.gz
-rw-r--r-- 1 root root 90573 Mar 23 2022 find.info-1.gz
-rw-r--r-- 1 root root 77071 Nov 16 2021 bash_completion
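The same kind of sort can be tried on a fixed sample instead of a live directory listing. In this sketch the three lines are copied from the output above, and the key is written as -k 5,5nr, which limits the comparison to field 5 alone rather than field 5 through the end of the line.

```shell
listing='-rw-r--r-- 1 root root 139520 May 2 2023 mime.cache
-rw-r--r-- 1 root root 1299875 Jan 22 2022 pci.ids
-rwxr-xr-x 1 root root 254484 May 20 2024 gitweb.cgi'

# -k 5,5 starts and ends the key at field 5; n compares numerically, r puts the largest first.
largest=$(printf '%s\n' "$listing" | sort -k 5,5nr | head -n 1)
printf '%s\n' "$largest"
```

The line for pci.ids comes first because 1299875 is the largest value in the fifth field.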
- We have a file distros.txt containing Linux distribution names, version numbers, and release dates.
SUSE 10.2 12/07/2006
Fedora 10 11/25/2008
SUSE 11.0 06/19/2008
Ubuntu 8.04 04/24/2008
Fedora 8 11/08/2007
SUSE 10.3 10/04/2007
Ubuntu 6.10 10/26/2006
Fedora 7 05/31/2007
Ubuntu 7.10 10/18/2007
Ubuntu 7.04 04/19/2007
SUSE 10.1 05/11/2006
Fedora 6 10/24/2006
Fedora 9 05/13/2008
Ubuntu 6.06 06/01/2006
Ubuntu 8.10 10/30/2008
Fedora 5 03/20/2006
By default, the sort command orders the lines alphabetically, which results in the following:
bharatwaj@comp:~$ sort distros.txt
Fedora 10 11/25/2008
Fedora 5 03/20/2006
Fedora 6 10/24/2006
Fedora 7 05/31/2007
Fedora 8 11/08/2007
Fedora 9 05/13/2008
SUSE 10.1 05/11/2006
SUSE 10.2 12/07/2006
SUSE 10.3 10/04/2007
SUSE 11.0 06/19/2008
Ubuntu 6.06 06/01/2006
Ubuntu 6.10 10/26/2006
Ubuntu 7.04 04/19/2007
Ubuntu 7.10 10/18/2007
Ubuntu 8.04 04/24/2008
Ubuntu 8.10 10/30/2008
The Fedora version numbers don't sort correctly. Since sort compares characters lexicographically (i.e., alphabetically), it places Fedora 10 before Fedora 5 because 1 (from "10") comes before 5 in the character set.
To fix the sorting issue, we need to sort by multiple keys: first alphabetically by the distribution name (field 1), and then numerically by the version number (field 2). The sort command accepts multiple -k (--key) options to specify multiple keys.
The command below works as follows:
--key=1,1 : Sorts by the first field (distribution name) alphabetically. The 1,1 form starts and ends the key at field 1.
--key=2,2n : Sorts by the second field (version number) numerically.
bharatwaj@comp:~$ sort --key=1,1 --key=2,2n distros.txt
Fedora 5 03/20/2006
Fedora 6 10/24/2006
Fedora 7 05/31/2007
Fedora 8 11/08/2007
Fedora 9 05/13/2008
Fedora 10 11/25/2008
SUSE 10.1 05/11/2006
SUSE 10.2 12/07/2006
SUSE 10.3 10/04/2007
SUSE 11.0 06/19/2008
Ubuntu 6.06 06/01/2006
Ubuntu 6.10 10/26/2006
Ubuntu 7.04 04/19/2007
Ubuntu 7.10 10/18/2007
Ubuntu 8.04 04/24/2008
Ubuntu 8.10 10/30/2008
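A minimal sketch of the two-key sort on a four-line sample taken from distros.txt. Setting LC_ALL=C is an assumption added here so that the alphabetical ordering does not depend on the locale of the machine.

```shell
sample='Fedora 10 11/25/2008
SUSE 10.3 10/04/2007
Fedora 5 03/20/2006
Fedora 9 05/13/2008'

# Key 1: distribution name, compared as text.
# Key 2: version number, compared as a number, used only when the names are equal.
by_name_then_version=$(printf '%s\n' "$sample" | LC_ALL=C sort --key=1,1 --key=2,2n)
printf '%s\n' "$by_name_then_version"
# prints:
# Fedora 5 03/20/2006
# Fedora 9 05/13/2008
# Fedora 10 11/25/2008
# SUSE 10.3 10/04/2007
```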
- Dates in the American format MM/DD/YYYY (e.g., 11/25/2008) do not sort chronologically as plain text, because the year comes last. Rather than rearranging them into the ISO format (YYYY-MM-DD), we can tell sort to compare the year first, then the month, then the day.
sort -k 3.7nbr -k 3.1nbr -k 3.4nbr distros.txt breaks the date into parts and sorts them numerically:
-k 3.7 : Sorts by the year (starting at character 7 of field 3, where the year begins in MM/DD/YYYY).
-k 3.1 : Sorts by the month (starting at character 1 of field 3).
-k 3.4 : Sorts by the day (starting at character 4 of field 3).
n : Numeric sort, so that numbers are compared as numbers (e.g., 10 is greater than 9).
b : Ignores leading blanks in the field, so the character offsets are counted from the first non-blank character even when the spacing varies across lines.
r : Reverses the order, so the most recent release is listed first.
bharatwaj@comp:~$ sort -k 3.7nbr -k 3.1nbr -k 3.4nbr distros.txt
Fedora 10 11/25/2008
Ubuntu 8.10 10/30/2008
SUSE 11.0 06/19/2008
Fedora 9 05/13/2008
Ubuntu 8.04 04/24/2008
Fedora 8 11/08/2007
Ubuntu 7.10 10/18/2007
SUSE 10.3 10/04/2007
Fedora 7 05/31/2007
Ubuntu 7.04 04/19/2007
SUSE 10.2 12/07/2006
Ubuntu 6.10 10/26/2006
Fedora 6 10/24/2006
Ubuntu 6.06 06/01/2006
SUSE 10.1 05/11/2006
Fedora 5 03/20/2006
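The date keys can be checked on a small sample. This sketch uses four lines from distros.txt, two of them from the same year, so that the month key has to break the tie.

```shell
sample='Fedora 5 03/20/2006
Ubuntu 8.10 10/30/2008
Fedora 10 11/25/2008
SUSE 10.3 10/04/2007'

# Year (field 3, character 7), then month (character 1), then day (character 4),
# each compared numerically (n), ignoring leading blanks (b), newest first (r).
newest_first=$(printf '%s\n' "$sample" | sort -k 3.7nbr -k 3.1nbr -k 3.4nbr)
printf '%s\n' "$newest_first"
# prints:
# Fedora 10 11/25/2008
# Ubuntu 8.10 10/30/2008
# SUSE 10.3 10/04/2007
# Fedora 5 03/20/2006
```

Dropping the r from each key would give the oldest release first instead.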
Sort with a different delimiter
-t ':' : Specifies that the fields in the /etc/passwd file are separated by colons (:).
-k 7 : Tells sort to use the seventh field (the default shell) as the key for sorting.
bharatwaj@comp:~$ sort -t ':' -k 7 /etc/passwd | head
bharatwaj:x:1000:1000:,,,:/home/bharatwaj:/bin/bash
root:x:0:0:root:/root:/bin/bash
sync:x:4:65534:sync:/bin:/bin/sync
_apt:x:105:65534::/nonexistent:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
games:x:5:60:games:/usr/games:/usr/sbin/nologin
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin
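Since /etc/passwd differs from machine to machine, the sketch below sorts three sample lines copied from the output above. LC_ALL=C and the trailing cut (used only to shorten the output to user name and shell) are additions made for this illustration.

```shell
passwd_sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync'

# Split on ":" and sort by the seventh field (the login shell),
# then keep only fields 1 and 7 to make the result easier to read.
by_shell=$(printf '%s\n' "$passwd_sample" | LC_ALL=C sort -t ':' -k 7,7 | cut -d ':' -f 1,7)
printf '%s\n' "$by_shell"
# prints:
# root:/bin/bash
# sync:/bin/sync
# daemon:/usr/sbin/nologin
```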
uniq - Report or Omit Repeated Lines
- When given a sorted file (or sorted standard input), uniq removes any duplicate lines and sends the results to standard output. It is often used in conjunction with sort to clean the output of duplicates. Applied to an unsorted file, it leaves the non-adjacent duplicates in place:
bharatwaj@comp:~$ cat > foo.txt
c
a
b
a
b
c
bharatwaj@comp:~$ uniq foo.txt
c
a
b
a
b
c
For uniq to actually do its job, the input must be sorted first. This is because uniq only removes duplicate lines that are adjacent to each other.
bharatwaj@comp:~$ sort foo.txt | uniq
a
b
c
- With the -c option, uniq outputs each distinct line preceded by the number of times it occurs.
bharatwaj@comp:~$ sort foo.txt | uniq -c
2 a
2 b
2 c
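A common follow-up is to rank the lines by how often they occur. The sketch below uses a made-up input in which a appears three times, b twice, and c once, and also shows uniq -d, which prints only the lines that are repeated.

```shell
# Count each distinct line, then sort the counts numerically, largest first.
counts=$(printf '%s\n' c a b a b a | sort | uniq -c | sort -nr)
printf '%s\n' "$counts"

# -d prints one copy of each line that occurs more than once.
dupes=$(printf '%s\n' c a b a b a | sort | uniq -d)
printf '%s\n' "$dupes"
```

The first command lists a (3), b (2), and c (1) in that order; the second prints only a and b.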
cut - Remove Sections from Each Line of Files
- The cut program is used to extract specific parts of a line of text and display them. It can take input from one or more files or from standard input.
The cut program works best with files that are generated by other programs, rather than text typed manually, because it expects a consistent format. To check whether a file such as distros.txt is suitable for cut, you can use cat -A to see if it has tab-separated fields, which is what cut expects by default (the default delimiter is a tab).
bharatwaj@comp:~$ cat -A distros.txt
SUSE 10.2 12/07/2006 $
Fedora 10 11/25/2008 $
SUSE 11.0 06/19/2008 $
Ubuntu 8.04 04/24/2008 $
Fedora 8 11/08/2007 $
SUSE 10.3 10/04/2007 $
Ubuntu 6.10 10/26/2006 $
Fedora 7 05/31/2007 $
Ubuntu 7.10 10/18/2007 $
Ubuntu 7.04 04/19/2007 $
SUSE 10.1 05/11/2006 $
Fedora 6 10/24/2006$
Fedora 9 05/13/2008 $
Ubuntu 6.06 06/01/2006 $
Ubuntu 8.10 10/30/2008 $
Fedora 5 03/20/2006 $
In our case, however, the fields are separated by a single space, so we set the delimiter explicitly:
-d " " : Sets the delimiter to a space (" "), so each line is split wherever there is a space.
-f 3 : Tells cut to select the third field from each line.
bharatwaj@comp:~$ cut -d " " -f 3 distros.txt
12/07/2006
11/25/2008
06/19/2008
04/24/2008
11/08/2007
10/04/2007
10/26/2006
05/31/2007
10/18/2007
04/19/2007
05/11/2006
10/24/2006
05/13/2008
06/01/2006
10/30/2008
03/20/2006
Extracting the year from each line
The second cut command takes the output of the first cut command and extracts characters 7 through 10 (-c 7-10) from each line.
bharatwaj@comp:~$ cut -d " " -f 3 distros.txt | cut -c 7-10
2006
2008
2008
2008
2007
2007
2006
2007
2007
2007
2006
2006
2008
2006
2008
2006
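As an alternative sketch, the second cut can split the date on the slash instead of counting characters, which keeps working even if a month or day were written with a single digit. The sample line is taken from distros.txt.

```shell
line='Fedora 10 11/25/2008'

# First cut: field 3 of the space-separated line is the date.
# Second cut: field 3 of the slash-separated date is the year.
year=$(printf '%s\n' "$line" | cut -d ' ' -f 3 | cut -d '/' -f 3)
printf '%s\n' "$year"
# prints: 2008
```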
cut on a file
-d ':' : Sets the delimiter to a colon (:), so each line in the /etc/passwd file is split wherever there is a colon.
-f 1 : Selects the first field (the part before the first colon) from each line. In /etc/passwd, the first field is the username.
bharatwaj@comp:~$ cut -d ':' -f 1 /etc/passwd | head
root
daemon
bin
sys
sync
games
man
lp
mail
news
paste - Merge Lines of Files
The paste command does the opposite of cut. Rather than extracting a column of text from a file, it adds one or more columns of text to a file. It does this by reading multiple files and combining the fields found in each file into a single stream of standard output. Like cut, paste accepts multiple file arguments and/or standard input.
To demonstrate how paste operates, we will perform some surgery on our distros.txt file to produce a chronological list of releases.
From our earlier work with sort, we will first produce a list of distros sorted by date and store the result in a file called distros-by-date.txt:
bharatwaj@comp:~$ sort -k 3.7nbr -k 3.1nbr -k 3.4nbr distros.txt > distros-by-date.txt
Next, we will use cut to extract the first two fields from the file (the distro name and version) and store that result in a file named distros-versions.txt:
bharatwaj@comp:~$ cut -d " " -f 1,2 distros-by-date.txt > distros-versions.txt
bharatwaj@comp:~$ head distros-versions.txt
Fedora 10
Ubuntu 8.10
SUSE 11.0
Fedora 9
Ubuntu 8.04
Fedora 8
Ubuntu 7.10
SUSE 10.3
Fedora 7
Ubuntu 7.04
The final piece of preparation is to extract the release dates and store them in a file named distros-dates.txt:
bharatwaj@comp:~$ cut -d " " -f 3 distros-by-date.txt > distros-dates.txt
bharatwaj@comp:~$ head distros-dates.txt
11/25/2008
10/30/2008
06/19/2008
05/13/2008
04/24/2008
11/08/2007
10/18/2007
10/04/2007
05/31/2007
04/19/2007
We now have the parts we need. To complete the process, use paste to put the column of dates ahead of the distro names and versions, thus creating a chronological list. This is done simply by using paste and ordering its arguments in the desired arrangement.
-d " " : Specifies that a space character should be used to separate the fields when pasting the lines together.
bharatwaj@comp:~$ paste -d " " distros-dates.txt distros-versions.txt
11/25/2008 Fedora 10
10/30/2008 Ubuntu 8.10
06/19/2008 SUSE 11.0
05/13/2008 Fedora 9
04/24/2008 Ubuntu 8.04
11/08/2007 Fedora 8
10/18/2007 Ubuntu 7.10
10/04/2007 SUSE 10.3
05/31/2007 Fedora 7
04/19/2007 Ubuntu 7.04
12/07/2006 SUSE 10.2
10/26/2006 Ubuntu 6.10
10/24/2006 Fedora 6
06/01/2006 Ubuntu 6.06
05/11/2006 SUSE 10.1
03/20/2006 Fedora 5
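The same merge can be reproduced in isolation. This sketch writes two short files into a temporary directory (the names dates.txt and versions.txt are made up for the example) and pastes them side by side.

```shell
workdir=$(mktemp -d)
printf '%s\n' '11/25/2008' '10/30/2008' > "$workdir/dates.txt"
printf '%s\n' 'Fedora 10' 'Ubuntu 8.10' > "$workdir/versions.txt"

# Line N of the first file is joined to line N of the second, separated by one space.
merged=$(paste -d ' ' "$workdir/dates.txt" "$workdir/versions.txt")
printf '%s\n' "$merged"
# prints:
# 11/25/2008 Fedora 10
# 10/30/2008 Ubuntu 8.10

rm -r "$workdir"
```

Swapping the two file arguments would put the names and versions ahead of the dates.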
join - Join Lines of Two Files on a Common Field
In some ways, join is like paste in that it adds columns to a file, but it does so in a unique way. A join is an operation usually associated with relational databases where data from multiple tables with a shared key field is combined to form a desired result. The join program performs the same operation. It joins data from multiple files based on a shared key field.
To demonstrate the join program, we’ll need to make a couple of files with a shared key. To do this, we will use our distros-by-date.txt file. From this file, we will construct two additional files. The first contains the release dates (which will be our shared key field for this demonstration) and the release names:
bharatwaj@comp:~$ cut -d " " -f 1,1 distros-by-date.txt > distros-names.txt
bharatwaj@comp:~$ paste distros-dates.txt distros-names.txt > distros-key-names.txt
bharatwaj@comp:~$ head distros-key-names.txt
11/25/2008 Fedora
10/30/2008 Ubuntu
06/19/2008 SUSE
05/13/2008 Fedora
04/24/2008 Ubuntu
11/08/2007 Fedora
10/18/2007 Ubuntu
10/04/2007 SUSE
05/31/2007 Fedora
04/19/2007 Ubuntu
The second file contains the release dates and the version numbers:
bharatwaj@comp:~$ cut -d " " -f 2,2 distros-by-date.txt > distros-vernums.txt
bharatwaj@comp:~$ paste distros-dates.txt distros-vernums.txt > distros-key-vernums.txt
bharatwaj@comp:~$ head distros-key-vernums.txt
11/25/2008 10
10/30/2008 8.10
06/19/2008 11.0
05/13/2008 9
04/24/2008 8.04
11/08/2007 8
10/18/2007 7.10
10/04/2007 10.3
05/31/2007 7
04/19/2007 7.04
We now have two files with a shared key (the “release date” field). It is important to point out that the files must be sorted on the key field for join to work properly.
The join command merges two files based on a common key (the first column in each file).
distros-key-names.txt : Contains the release date and distribution name (e.g., 11/25/2008 Fedora).
distros-key-vernums.txt : Contains the release date and version number (e.g., 11/25/2008 10).
What join does: It looks for lines in both files where the release date (the first column) is the same. For those matching dates, it combines the corresponding lines from both files, merging the distribution name and version number into a single line.
bharatwaj@comp:~$ join distros-key-names.txt distros-key-vernums.txt | head
11/25/2008 Fedora 10
10/30/2008 Ubuntu 8.10
06/19/2008 SUSE 11.0
05/13/2008 Fedora 9
04/24/2008 Ubuntu 8.04
11/08/2007 Fedora 8
10/18/2007 Ubuntu 7.10
10/04/2007 SUSE 10.3
05/31/2007 Fedora 7
04/19/2007 Ubuntu 7.04
Note also that, by default, join uses whitespace as the input field delimiter and a single space as the output field delimiter. This behavior can be modified by specifying options. See the join man page for details.
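One such option is -t, which sets the field delimiter for both input and output. The sketch below joins two colon-separated files; the file names and the lowercase keys are invented for the example, and both files are already sorted on the key.

```shell
workdir=$(mktemp -d)
# Both files are sorted on the first colon-separated field, which is the shared key.
printf '%s\n' 'fedora:Fedora' 'suse:SUSE' 'ubuntu:Ubuntu' > "$workdir/names.txt"
printf '%s\n' 'fedora:10' 'suse:11.0' 'ubuntu:8.10' > "$workdir/versions.txt"

# -t ':' makes join split on colons and also use a colon between the output fields.
joined=$(join -t ':' "$workdir/names.txt" "$workdir/versions.txt")
printf '%s\n' "$joined"
# prints:
# fedora:Fedora:10
# suse:SUSE:11.0
# ubuntu:Ubuntu:8.10

rm -r "$workdir"
```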
comm - Compare Two Sorted Files Line by Line
The comm program compares two text files, displaying the lines that are unique to each one and the lines they have in common. To demonstrate, we will create two nearly identical text files using cat:
bharatwaj@comp:~$ cat > file1.txt
a
b
c
d
bharatwaj@comp:~$ cat > file2.txt
b
c
d
e
We will compare the two files using comm:
bharatwaj@comp:~$ comm file1.txt file2.txt
a
		b
		c
		d
	e
As we can see, comm produces three columns of output. The first column contains lines unique to the first file argument; the second column, the lines unique to the second file argument; and the third column, the lines shared by both files.
comm supports options in the form -n where n is either 1, 2, or 3. When used, these options specify which column(s) to suppress. For example, if we wanted to output only the lines shared by both files, we would suppress the output of columns 1 and 2:
bharatwaj@comp:~$ comm -12 file1.txt file2.txt
b
c
d
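The other combinations work the same way. As a sketch, suppressing columns 2 and 3 leaves only the lines unique to the first file, and suppressing columns 1 and 3 leaves only those unique to the second. The files are recreated in a temporary directory so the snippet runs on its own.

```shell
workdir=$(mktemp -d)
printf '%s\n' a b c d > "$workdir/file1.txt"
printf '%s\n' b c d e > "$workdir/file2.txt"

# -23 hides columns 2 and 3, leaving lines found only in the first file.
only_in_first=$(comm -23 "$workdir/file1.txt" "$workdir/file2.txt")
# -13 hides columns 1 and 3, leaving lines found only in the second file.
only_in_second=$(comm -13 "$workdir/file1.txt" "$workdir/file2.txt")
printf 'only in file1: %s\nonly in file2: %s\n' "$only_in_first" "$only_in_second"
# prints:
# only in file1: a
# only in file2: e

rm -r "$workdir"
```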
diff - Compare Files Line by Line
diff is a tool used to compare files and identify differences, often used by developers to track changes in source code and create patch files for updating versions.
If we use diff to look at our previous example files, we see its default style of output: a terse description of the differences between the two files.
bharatwaj@comp:~$ diff file1.txt file2.txt
1d0
< a
4a4
> e
In the default format, diff shows changes with a command indicating the range and type of modifications needed to transform one file into another.
Change Description | Explanation |
r1ar2 | Append the lines at position r2 in the second file to position r1 in the first file. |
r1cr2 | Change (replace) the lines at position r1 in the first file with the lines at position r2 in the second file. |
r1dr2 | Delete the lines at position r1 in the first file that would have appeared at range r2 in the second file. |
When viewed using the context format (the -c option), the output looks like this:
bharatwaj@comp:~$ diff -c file1.txt file2.txt
*** file1.txt 2024-12-14 22:14:07.048924575 +0530
--- file2.txt 2024-12-14 22:14:18.048923011 +0530
***************
*** 1,4 ****
- a
b
c
d
--- 1,4 ----
b
c
d
+ e
The output begins with the names of the two files and their timestamps. The first file is marked with asterisks, and the second file is marked with dashes. Throughout the remainder of the listing, these markers will signify their respective files. Next, we see groups of changes, including the default number of surrounding context lines. In the first group, we see *** 1,4 ****, which indicates lines 1 through 4 in the first file. Later we see --- 1,4 ----, which indicates lines 1 through 4 in the second file. Within a change group, lines begin with one of four indicators, as shown in the table below.
Indicator | Meaning |
(none) | A line shown for context, indicating no difference between the two files. |
- | A line deleted, appearing in the first file but not in the second. |
+ | A line added, appearing in the second file but not in the first. |
! | A line changed, with both versions displayed in their respective sections. |
The unified format is similar to the context format but is more concise. It is specified with the -u option:
bharatwaj@comp:~$ diff -u file1.txt file2.txt
--- file1.txt 2024-12-14 22:14:07.048924575 +0530
+++ file2.txt 2024-12-14 22:14:18.048923011 +0530
@@ -1,4 +1,4 @@
-a
b
c
d
+e
The most notable difference between the context and unified formats is the elimination of the duplicated lines of context, making the results of the unified format shorter than those of the context format. In our example above, we see file timestamps like those of the context format, followed by the string @@ -1,4 +1,4 @@. This indicates the lines in the first file and the lines in the second file described in the change group. Following this are the lines themselves, with the default three lines of context. As shown in the table below, each line starts with one of three possible characters.
Character | Meaning |
(none) | This line is shared by both files. |
- | This line was removed from the first file. |
+ | This line was added to the first file. |
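Because every changed line in the unified format starts with - or +, the changes are easy to pull out with other tools. The sketch below recreates the two example files in a temporary directory, skips the two header lines (which also start with - and +), and keeps only the removed and added lines.

```shell
workdir=$(mktemp -d)
printf '%s\n' a b c d > "$workdir/file1.txt"
printf '%s\n' b c d e > "$workdir/file2.txt"

# tail -n +3 drops the "---" and "+++" header lines;
# grep then keeps only the lines that begin with - or +.
changes=$(diff -u "$workdir/file1.txt" "$workdir/file2.txt" | tail -n +3 | grep '^[-+]')
printf '%s\n' "$changes"
# prints:
# -a
# +e

rm -r "$workdir"
```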
patch - Apply a diff to an Original
The patch program is used to apply changes to text files. It accepts output from diff and is generally used to convert older versions of files into newer versions.
bharatwaj@comp:~$ cat file1.txt
a
b
c
d
bharatwaj@comp:~$ cat file2.txt
b
c
d
e
bharatwaj@comp:~$ diff -Naur file1.txt file2.txt > patchfile.txt
bharatwaj@comp:~$ patch < patchfile.txt
patching file file1.txt
bharatwaj@comp:~$ cat file1.txt
b
c
d
e
We created a diff file named patchfile.txt and then used the patch program to apply the patch. Note that we did not have to specify a target file to patch, as the diff file (in unified format) already contains the filenames in the header. Once the patch is applied, we can see that file1.txt now matches file2.txt.
patch has a large number of options, and additional utility programs can be used to analyze and edit patches.
tr - Transliterate or Delete Characters
The tr program is used to transliterate characters. We can think of this as a sort of character-based search-and-replace operation. Transliteration is the process of changing characters from one alphabet to another.
For example, converting characters from lowercase to uppercase is transliteration. We can perform such a conversion with tr as follows:
tr a-z A-Z : The tr command translates each character in the first set to the corresponding character in the second set. In this case:
a-z represents the range of lowercase letters from 'a' to 'z'.
A-Z represents the range of uppercase letters from 'A' to 'Z'.
bharatwaj@comp:~$ echo "lowercase letters" | tr a-z A-Z
LOWERCASE LETTERS
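Another example maps every lowercase letter to A by using the [:lower:] character class (quoted so that the shell does not treat it as a filename pattern):
bharatwaj@comp:~$ echo "lowercase letters" | tr '[:lower:]' A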
AAAAAAAAA AAAAAAA
The tr command can also convert a DOS-style text file (which uses both carriage return \r and line feed \n for line breaks) into a Unix-style text file (which uses only \n for line breaks).
tr -d '\r' : Tells the tr command to delete the carriage return characters (\r) from the input.
< dos_file : Reads the content of dos_file.
> unix_file : Writes the output (with the carriage returns removed) to unix_file.
bharatwaj@comp:~$ tr -d '\r' < dos_file > unix_file
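To see the conversion without needing an actual DOS file, the sketch below generates two CRLF-terminated lines with printf, strips the carriage returns, and then counts how many are left. The sample text is made up for the example.

```shell
# Two lines with DOS-style \r\n endings, passed through tr -d '\r'.
unix_text=$(printf 'first line\r\nsecond line\r\n' | tr -d '\r')

# tr -cd '\r' deletes everything except carriage returns, so wc -c counts the ones that remain.
cr_count=$(printf '%s' "$unix_text" | tr -cd '\r' | wc -c)
printf 'carriage returns left: %s\n' "$cr_count"
```

A count of 0 confirms that only the line feeds remain.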
tr can also be used for ROT13 encoding of text. Applying the same command a second time decodes it:
bharatwaj@comp:~$ echo "secret text" | tr a-zA-Z n-za-mN-ZA-M
frperg grkg
bharatwaj@comp:~$ echo "frperg grkg" | tr a-zA-Z n-za-mN-ZA-M
secret text
Using the -s option, tr can “squeeze” (delete) repeated instances of a character:
- By specifying the set ab to tr, we eliminate the repeated instances of the letters in the set, while leaving the character that is missing from the set (c) unchanged.
bharatwaj@comp:~$ echo "aaabbbccc" | tr -s ab
abccc
Note that the repeating characters must be adjoining. If they are not, the squeezing will have no effect:
bharatwaj@comp:~$ echo "abcabcabc" | tr -s ab
abcabcabc
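Squeezing is handy for normalizing spacing before cut, which treats every single space as a field boundary. In this sketch the irregularly spaced line is made up for the example.

```shell
# Runs of spaces are squeezed to a single space, so the date is reliably field 3.
release_date=$(printf 'Fedora   10    11/25/2008\n' | tr -s ' ' | cut -d ' ' -f 3)
printf '%s\n' "$release_date"
# prints: 11/25/2008
```

Without tr -s, the extra spaces would produce empty fields and cut -f 3 would return an empty string for this line.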
sed - Stream Editor for Filtering and Transforming Text
The name sed is short for stream editor. It performs text editing on a stream of text, either a set of specified files or standard input.
The expression 's/front/back/' tells sed to perform a substitution:
s stands for substitute.
front is the pattern to search for in the input.
back is the replacement text.
bharatwaj@comp:~$ echo "front" | sed 's/front/back/'
back
The choice of the delimiter character is arbitrary. By convention, the slash character is often used, but sed will accept any character that immediately follows the command as the delimiter. We could perform the same command this way:
bharatwaj@comp:~$ echo "front" | sed 's_front_back_'
back
Most commands in sed may be preceded by an address, which specifies which line(s) of the input stream will be edited. If the address is omitted, then the editing command is carried out on every line in the input stream.
bharatwaj@comp:~$ echo -e "front\nfront" | sed 's/front/back/'
back
back
bharatwaj@comp:~$ echo -e "front\nfront" | sed '1s/front/back/'
back
front
bharatwaj@comp:~$ echo -e "front\nfront" | sed '2s/front/back/'
front
back
bharatwaj@comp:~$ echo -e "front\nfront" | sed '3s/front/back/'
front
front
sed Address Notation
Address | Description |
n | A specific line number, where n is a positive integer (e.g., 1 for the first line). |
$ | Represents the last line of the input stream. |
/regexp/ | Lines matching a regular expression (POSIX basic regular expression). The regex is usually delimited by slashes (/ ), but you can use an alternate delimiter (\cregexpc ), where c is the chosen delimiter. |
addr1,addr2 | A range of lines from addr1 to addr2 , inclusive (e.g., 1,5 for lines 1 through 5). |
first~step | Matches the line numbered first and then every subsequent line at step intervals. For example, 1~2 matches every odd-numbered line, and 5~5 matches the fifth line and every fifth line after that. |
addr1,+n | Matches addr1 and the following n lines (e.g., 2,+3 matches lines 2, 3, 4, and 5). |
addr! | Matches all lines except the specified address (e.g., 1! matches all lines except the first). |
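A few of these address forms can be tried on a six-line input where each line holds its own line number. This is a sketch that assumes GNU sed, since first~step and addr1,+n are GNU extensions; paste -sd ' ' - is used only to print each result on one line.

```shell
numbers=$(printf '%s\n' 1 2 3 4 5 6)

# addr1,+n : line 2 and the three lines after it.
range=$(printf '%s\n' "$numbers" | sed -n '2,+3p' | paste -sd ' ' -)
# first~step : line 1 and every second line after it.
odd=$(printf '%s\n' "$numbers" | sed -n '1~2p' | paste -sd ' ' -)
# $ : the last line of the input.
last=$(printf '%s\n' "$numbers" | sed -n '$p')
# addr! : every line except line 1.
not_first=$(printf '%s\n' "$numbers" | sed -n '1!p' | paste -sd ' ' -)

printf '2,+3 -> %s\n1~2  -> %s\n$    -> %s\n1!   -> %s\n' "$range" "$odd" "$last" "$not_first"
# prints:
# 2,+3 -> 2 3 4 5
# 1~2  -> 1 3 5
# $    -> 6
# 1!   -> 2 3 4 5 6
```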
The command sed -n '1,5p' distros.txt tells sed to print only lines 1 through 5 from the file distros.txt. Here's a breakdown:
-n : This option tells sed to suppress automatic printing of all lines. Normally, sed prints each line of input by default, but with -n, only lines explicitly instructed to be printed will be shown.
'1,5p' : 1,5 specifies the range of lines (lines 1 through 5). p stands for print, meaning sed will print these specified lines.
bharatwaj@comp:~$ sed -n '1,5p' distros.txt
SUSE 10.2 12/07/2006
Fedora 10 11/25/2008
SUSE 11.0 06/19/2008
Ubuntu 8.04 04/24/2008
Fedora 8 11/08/2007
Regular Expression:
- By including the slash-delimited regular expression /SUSE/, we are able to isolate the lines containing it in much the same manner as grep.
bharatwaj@comp:~$ sed -n '/SUSE/p' distros.txt
SUSE 10.2 12/07/2006
SUSE 11.0 06/19/2008
SUSE 10.3 10/04/2007
SUSE 10.1 05/11/2006
- We’ll try negation by adding an exclamation point (!) to the address, which selects every line that does not match:
bharatwaj@comp:~$ sed -n '/SUSE/!p' distros.txt
Fedora 10 11/25/2008
Ubuntu 8.04 04/24/2008
Fedora 8 11/08/2007
Ubuntu 6.10 10/26/2006
Fedora 7 05/31/2007
Ubuntu 7.10 10/18/2007
Ubuntu 7.04 04/19/2007
Fedora 6 10/24/2006
Fedora 9 05/13/2008
Ubuntu 6.06 06/01/2006
Ubuntu 8.10 10/30/2008
Fedora 5 03/20/2006
sed has many more editing commands. The most common ones are listed below; please refer to the documentation for the rest.
Command | Description | Example |
= | Output current line number | sed '=' distros.txt |
a | Append text after the current line | sed '2a This is a new line' distros.txt |
d | Delete the current line | sed '3d' distros.txt |
i | Insert text in front of the current line | sed '2i This is a new line' distros.txt |
p | Print the current line (with -n option to suppress default) | sed -n '2p' distros.txt |
q | Exit sed without processing further lines | sed '3q' distros.txt |
s/regexp/replacement/ | Substitute replacement for regexp in the input text. The replacement can use & for the matched text and \1 through \9 for backreferences. | sed 's/front/back/' distros.txt |
y/set1/set2/ | Perform transliteration of characters in set1 to set2 . Both sets must have the same length. | sed 'y/abc/xyz/' |
Another feature of the s command is the use of optional flags that may follow the replacement string. The most important of these is the g flag, which instructs sed to apply the search and replace globally to a line, not just to the first instance, which is the default.
bharatwaj@comp:~$ echo "aaabbbccc" | sed 's/b/B/'
aaaBbbccc
The above will only change the first "b" to a "B". To replace all occurrences of the pattern in the line, you can use the g flag, which stands for "global".
bharatwaj@comp:~$ echo "aaabbbccc" | sed 's/b/B/g'
aaaBBBccc
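The & and \1 through \9 features of the s command are easiest to understand from an example. The sketch below first wraps the matched text in brackets with &, and then uses three groups to rewrite an MM/DD/YYYY date as YYYY-MM-DD. The | delimiter is chosen only to avoid escaping the slashes in the date, and the sample line comes from distros.txt.

```shell
# & stands for whatever the pattern matched.
wrapped=$(echo "front" | sed 's/front/[&]/')
printf '%s\n' "$wrapped"
# prints: [front]

# \(...\) captures month, day, and year; \3-\1-\2 reorders them.
iso=$(echo "Fedora 10 11/25/2008" | sed 's|\([0-9]\{2\}\)/\([0-9]\{2\}\)/\([0-9]\{4\}\)$|\3-\1-\2|')
printf '%s\n' "$iso"
# prints: Fedora 10 2008-11-25
```

Dates in this form sort chronologically as plain text, without the per-character sort keys.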
aspell - Interactive Spell Checker
aspell is a command-line tool for checking and correcting spelling in text files.
bharatwaj@comp:~$ cat > foo.txt
The quick brown fox jimped over the laxy dog.
We’ll check the file using aspell:
bharatwaj@comp:~$ aspell check foo.txt
aspell displays each misspelled word together with a numbered list of suggested replacements. If we enter 1, aspell replaces the offending word with the word jumped and moves on to the next misspelled word, which is laxy. If we select the replacement lazy, aspell replaces it and terminates. Once aspell has finished, we can examine our file and see that the misspellings have been corrected.
bharatwaj@comp:~$ cat foo.txt
The quick brown fox jumped over the lazy dog.
Relevant Book:
The Linux Command Line - A Complete Introduction
Disclaimer: This is a personal blog that might come in handy when I suffer from Dementia in future.