Date: Monday, 02 Nov 2009 23:35
Input file 'file.txt' contains the names of a few students.
$ cat file.txt
Sam G
Ashok Niak
Rosy M
Peter K
Sid Thom
Rasi Yad
Papu S
Niaraj J
Aloh N K
Nipu H
Quam L
Required output:
For the entries of the above file,
- add a serial number to each line
- Also add a 'House' number such that all the students are grouped into a total of 4 houses as follows:
Sl No,Name,House
1,Sam G,House1
2,Ashok Niak,House2
3,Rosy M,House3
4,Peter K,House4
5,Sid Thom,House1
6,Rasi Yad,House2
7,Papu S,House3
8,Niaraj J,House4
9,Aloh N K,House1
10,Nipu H,House2
11,Quam L,House3
The awk solution using awk NR variable:
$ awk '
BEGIN {OFS=","; print "Sl No,Name,House"}
{print NR,$0,"House"((NR-1)%4)+1}
' file.txt
Let's format the output for a better look:
$ awk '
BEGIN {
FORMAT="%-8s%-18s%s\n" ;
{printf FORMAT,"Sl No","Name","House"}
}
{printf FORMAT,NR,$0,"House"((NR-1)%4)+1}
' file.txt
Output:
Sl No   Name              House
1       Sam G             House1
2       Ashok Niak        House2
3       Rosy M            House3
4       Peter K           House4
5       Sid Thom          House1
6       Rasi Yad          House2
7       Papu S            House3
8       Niaraj J          House4
9       Aloh N K          House1
10      Nipu H            House2
11      Quam L            House3
Read about text alignment using awk printf function here
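A bash script for the same looks something like this (read without a variable name and i++ inside $((...)) are bash features, so run it with bash rather than a POSIX sh such as dash):
#!/bin/bash
i=0
while read -r
do
echo "$((i+1)),$REPLY,House$((i++ % 4 + 1))"
done < file.txt
Output:
$ bash numbering.sh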
1,Sam G,House1
2,Ashok Niak,House2
3,Rosy M,House3
4,Peter K,House4
5,Sid Thom,House1
6,Rasi Yad,House2
7,Papu S,House3
8,Niaraj J,House4
9,Aloh N K,House1
10,Nipu H,House2
11,Quam L,House3
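Now a question:
What is that '$REPLY' in the above script?
Answer: REPLY is the variable in which bash's read builtin stores the input line when no variable name is given to read.
So for this input file the above script is the same as the one below. There is one difference: REPLY keeps any leading and trailing whitespace of the line, while 'read line' strips it.
#!/bin/bash
i=0
while read -r line
do
echo "$((i+1)),$line,House$((i++ % 4 + 1))"
done < file.txt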
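The two scripts differ in one way. When read gets no variable name, bash stores the line in REPLY exactly as it was read. A named variable has its leading and trailing whitespace stripped. The short bash sketch below shows this; the sample string is made up for illustration.

```shell
#!/usr/bin/env bash
# A made-up line with leading and trailing spaces, so the difference is visible.
sample='   Sam G   '

# No variable name: bash stores the whole line in REPLY, whitespace included.
read -r <<< "$sample"
printf '[%s]\n' "$REPLY"    # [   Sam G   ]

# Named variable: leading/trailing IFS whitespace is stripped.
read -r line <<< "$sample"
printf '[%s]\n' "$line"     # [Sam G]
```

If the exact line content matters, either use REPLY or write `IFS= read -r line`.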
In general, the lines of a file can be numbered in several ways:
Using UNIX/Linux nl(1) command - number lines of files
$ nl file.txt
1 Sam G
2 Ashok Niak
3 Rosy M
4 Peter K
5 Sid Thom
6 Rasi Yad
7 Papu S
8 Niaraj J
9 Aloh N K
10 Nipu H
11 Quam L
Using awk NR:
$ awk '{print "\t"NR"\t"$0}' file.txt
1 Sam G
2 Ashok Niak
3 Rosy M
4 Peter K
5 Sid Thom
6 Rasi Yad
7 Papu S
8 Niaraj J
9 Aloh N K
10 Nipu H
11 Quam L
Using sed syntax:
$ sed = file.txt | sed 'N;s/\n/\t/'
1 Sam G
2 Ashok Niak
3 Rosy M
4 Peter K
5 Sid Thom
6 Rasi Yad
7 Papu S
8 Niaraj J
9 Aloh N K
10 Nipu H
11 Quam L
Date: Saturday, 31 Oct 2009 23:45
Below are a few different ways to print or extract a section of a file based on line numbers.
Let's extract lines 27 to 99 (inclusive) of the input file 'file.txt'.
Using sed editor:
$ sed -n '27,99 p' file.txt > /tmp/file1
Which is same as:
$ sed '27,99 !d' file.txt > /tmp/file2
Awk alternative: use the awk NR variable:
$ awk 'NR >= 27 && NR <= 99' file.txt > /tmp/file3
Using Linux/UNIX 'head' and 'tail' command:
$ head -99 file.txt | tail -73 > /tmp/file4
Here 73 is simply the number of lines in the range:
$ head -99 file.txt | tail -$(((99-27)+1)) > /tmp/file5
In the vi editor (with 'file.txt' open), you can use the following ex command:
:27,99 w! /tmp/file6
i.e. write lines 27 to 99 of 'file.txt' to the file '/tmp/file6'.
The Perl alternative:
$ perl -ne 'print if 27..99' file.txt > /tmp/file7
And the solution using Python (an interactive Python 2.5 session):
$ python
Python 2.5.2 (r252:60911, Jul 22 2009, 15:35:03)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
>>> fp = open("/tmp/file8","w")
>>> for i,line in enumerate(open("file.txt")):
...     if 26 <= i < 99:  # enumerate() counts from 0, so index 26 is line 27
...         fp.write(line)
...
>>> fp.close()
So all the output files produced (/tmp/file1 to /tmp/file8) have the same contents: lines 27 to 99 of 'file.txt'.
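On a big file, the sed, awk and perl commands above keep reading to the end of the input even after line 99 has been printed. The sketch below stops early: sed's `q` command quits, and so does awk's `exit`. The 200-line input generated with seq is only a stand-in for 'file.txt'.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
# Stand-in input: 200 lines containing the numbers 1..200.
seq 1 200 > "$tmp/file.txt"

# sed: print lines 27-99, then quit at line 99 instead of reading the rest.
sed -n '27,99p;99q' "$tmp/file.txt" > "$tmp/range-sed.txt"

# awk: print from line 27 on, and exit right after line 99 has been printed.
awk 'NR >= 27 { print } NR == 99 { exit }' "$tmp/file.txt" > "$tmp/range-awk.txt"

wc -l < "$tmp/range-sed.txt"    # 73 lines, i.e. (99 - 27) + 1
```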
Date: Friday, 30 Oct 2009 09:35
One of my directories had a lot of log files, and I had to count the total number of lines starting with 's' (i.e. ^s) across all of them.
My first approach was:
$ ls | xargs -I {} grep -c '^s' {} | awk '{sum+=$0} END {print sum}'
190978
That gave me the result. Then I tried to do the same with bash for and while loops. This is what I tried (the ((...)) arithmetic is a bash feature, hence #!/bin/bash):
#!/bin/bash
sum=0
DIR=~/original
for file in $(ls $DIR)
do
Slines=$(grep -c ^s $DIR/$file)
((sum+=Slines))
#You can also use
#sum=$(expr $sum + $Slines)
#sum=`expr $sum + $Slines`
done
echo $sum
Executing it:
$ ./usingfor.sh
190978
Cool, correct result.
Then I rewrote the above script with a bash while loop:
#!/bin/bash
sum=0
DIR=~/original
ls $DIR | while read file
do
Slines=$(grep -c ^s $DIR/$file)
((sum+=Slines))
done
echo $sum
Executing it:
$ ./usingwhile.sh
0
Oops! What went wrong?
In bash, each command of a pipeline runs in its own subshell, so a while loop that reads from a pipe runs in a subshell.
So in the above example, the loop updates the subshell's copy of 'sum', and the change is lost when the loop (and its subshell) exits. The script still sees the value 0 that it assigned at the beginning.
One solution to this variable scoping problem with while and direct piping:
Remove the direct pipe and feed the list of file names under '~/original' to the while loop as stdin, as shown below. Basically, write the file names of '~/original' to a temporary file first:
#!/bin/bash
sum=0
DIR=~/original
ls $DIR > /tmp/filelist
while read file
do
Slines=$(grep -c ^s $DIR/$file)
((sum+=Slines))
done < /tmp/filelist
echo $sum
Executing it:
$ ./usingwhile_1.sh
190978
And the result is correct.
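In bash, process substitution, `done < <(command)`, avoids both the subshell and the temporary file. The loop reads from the command's output but still runs in the current shell. (Bash 4.2 and later also offer `shopt -s lastpipe`, which runs the last command of a pipeline in the current shell when job control is off.) The sketch below uses a throw-away directory made with mktemp in place of '~/original'.

```shell
#!/usr/bin/env bash
# Demo directory standing in for ~/original: 2 + 1 lines start with 's'.
DIR=$(mktemp -d)
printf 'sun\nmoon\nstar\n' > "$DIR/a.log"
printf 'sky\ntree\n' > "$DIR/b.log"

sum=0
# '< <(ls "$DIR")' feeds the loop from a process substitution instead of a pipe,
# so the loop body runs in the current shell and updates to 'sum' survive.
while IFS= read -r file
do
    Slines=$(grep -c '^s' "$DIR/$file")
    ((sum += Slines))
done < <(ls "$DIR")
echo "$sum"    # 3
```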
Date: Wednesday, 28 Oct 2009 10:00
One of my input files had some control characters (^B, i.e. hex \x02).
On my Ubuntu 8.04.3, with this GNU grep version:
$ grep --version
GNU grep 2.5.3
I can grep for any control characters like this:
$ grep '[[:cntrl:]]' /tmp/file.txt
$ grep '[[:cntrl:]]' /tmp/file.txt | less
Also, if you know which control character to look for, say ^B (hex \x02) as in the example above, you can grep for it directly:
$ grep ^B /tmp/file.txt
* type ^B as Ctrl-V followed by Ctrl-B
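In bash you can avoid typing the raw control character. ANSI-C quoting, `$'\x02'`, turns into the actual ^B byte before grep sees it. The sketch below uses a made-up demo file.

```shell
#!/usr/bin/env bash
# Demo file: line 2 contains a ^B (hex 02) control character.
demo=$(mktemp)
printf 'clean line\nbad\x02line\nanother clean line\n' > "$demo"

# $'\x02' is expanded by bash into the single byte 0x02 before grep runs.
grep -n $'\x02' "$demo" | cut -d: -f1    # 2
grep -c '[[:cntrl:]]' "$demo"            # 1
```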
To match any non-printable character, here is another way using grep:
$ grep '[^[:print:]]' /tmp/file.txt
To display non-printable characters, you can use the GNU cat command (my version: cat (GNU coreutils) 6.10):
$ cat -v -e -t /tmp/s
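In the output, -v shows non-printing characters in ^ and M- notation (so hex \x02 appears as ^B), -t shows each tab as ^I, and -e marks the end of each line with $.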
Date: Monday, 26 Oct 2009 23:20
Question: In vi editor, how can I find or locate the nth occurrence of a particular search pattern ?
Answer: In vim, after searching for a pattern, say /queryname, type 4n in normal (command) mode to jump to the 4th next occurrence of 'queryname' from the cursor.
So to locate the 10th occurrence of a pattern, go to the top of the file (:1) and search for the pattern (/pattern), which lands on the 1st occurrence. Then type 9n in normal mode. Note that / starts searching just after the cursor, so a match at the very start of line 1 is not counted.
Date: Saturday, 24 Oct 2009 23:16
shuf - generate random permutations
Let's go through the command line options of the Linux/UNIX 'shuf' command.
From SHUF(1) man page:
1) -e, --echo
treat each ARG as an input line
$ shuf -e 3 5 6 7
7
6
5
3
$ shuf -e 3 5 6 7
7
5
6
3
$ shuf -e 3 5 6 7
3
6
5
7
2) -i, --input-range=LO-HI
treat each number LO through HI as an input line
To shuffle the numbers between 100 and 200
$ shuf -i 100-200
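The 'seq' command (or 'jot' on BSD systems) can also generate the numbers for shuf's -e option, which gives the same result as "-i". jot's arguments are the count and the start value, so 'jot 101 100' prints the 101 numbers from 100 to 200:
$ shuf -e $(seq 100 200)
$ shuf -e $(jot 101 100)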
3) -n, --head-lines=LINES
output at most LINES lines
$ shuf -i 100-200 -n 3
118
133
117
$ shuf -i 100-200 -n 3
193
188
145
To print a random word in Linux/UNIX:
$ shuf -n 1 /usr/share/dict/words
disrupted
$ shuf -n 1 /usr/share/dict/words
festered
Note: /usr/share/dict/words is a standard file on many UNIX-like operating systems; it is a newline-delimited list of dictionary words.
4) -o, --output=FILE
write result to FILE instead of standard output
$ shuf -n 3 /usr/share/dict/words -o /tmp/dict.txt
$ cat /tmp/dict.txt
heartlands
temple
unsatisfied
You can also use shell redirection for the same:
$ shuf -n 3 /usr/share/dict/words > /tmp/dict.txt
To shuffle all the lines of a file and print them to standard output:
$ shuf < /tmp/file.txt
Related post:
- Generate random words in Linux in bash
Date: Monday, 19 Oct 2009 01:19
On one of my Debian boxes with mawk 1.3.3 (mawk is an interpreter for the AWK programming language), I tried to add up the 2nd field of the following file using awk:
$ cat data.txt
a:99540232
b:89795683
a:08160808
c:0971544
d:99500728
a:12212539898
d:98065599
e:92640031
a:3129013
c:4085555
The output:
$ awk -F ":" '{sum+=$NF} END {print sum}' data.txt
1.27084e+10
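So mawk prints the sum in exponential format. When print outputs a number that awk does not treat as an integer, it uses OFMT (default "%.6g"). mawk 1.3.3 apparently takes that path here because the value is larger than its integer range.
To print the sum as an integer, use printf with an explicit format: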
$ awk -F ":" '{sum+=$NF} END { printf ("%0.0f\n", sum)} ' data.txt
12708429091
But on my Ubuntu 8.04.3, with this awk version, all three forms print the integer:
$ awk --version | head -1
GNU Awk 3.1.6
$ awk -F ":" '{sum+=$NF} END {print sum}' data.txt
12708429091
$ awk -F ":" '{sum+=$NF} END { printf ("%d\n", sum)} ' data.txt
12708429091
$ awk -F ":" '{sum+=$NF} END { printf ("%0.0f\n", sum)} ' data.txt
12708429091
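You can also keep the plain `print` and set OFMT to a format without an exponent. The sketch below recreates data.txt in a temporary file. Setting OFMT in BEGIN is standard awk. With gawk the integer result is printed as an integer anyway, so OFMT only matters for awks like mawk that fall back to it.

```shell
#!/usr/bin/env bash
# Recreate data.txt from the post in a temporary file.
data=$(mktemp)
printf '%s\n' a:99540232 b:89795683 a:08160808 c:0971544 d:99500728 \
    a:12212539898 d:98065599 e:92640031 a:3129013 c:4085555 > "$data"

# OFMT is the format print uses for numbers it does not output as integers;
# "%.0f" avoids the exponent form that the default "%.6g" can produce.
awk -F ":" 'BEGIN { OFMT = "%.0f" } { sum += $NF } END { print sum }' "$data"
# 12708429091
```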
Date: Friday, 16 Oct 2009 00:43
I have already written a post on how to split a file into multiple sub-files based on different conditions (basically a horizontal split of the file). Let's see how to split a file vertically.
Input file 'file.txt' is a csv file:
$ cat file.txt
A,B,C,D,E,F,G,H,I
1,2,3,4,5,6,7,8,9
I,II,III,IV,V,VII,VIII,IX
a,b,c,d,e,f,g,h,i
Required:
Split the above file into two sub-files such that the first 3 columns are written to sub-file1 and the rest of the columns to sub-file2.
i.e.
sub-file1 content will be
A,B,C
1,2,3
I,II,III
a,b,c
And sub-file2 content will be
D,E,F,G,H,I
4,5,6,7,8,9
IV,V,VII,VIII,IX
d,e,f,g,h,i
This is a pretty simple task with the Linux/UNIX cut command:
#Printing first 3 columns of 'file.txt'
$ cut -d"," -f1-3 file.txt
or
$ cut -d"," -f-3 file.txt
and
#Printing from 4th column till end
$ cut -d"," -f4-9 file.txt
or
$ cut -d"," -f4- file.txt
Awk solution:
$ awk -F "," '
# ">" truncates each sub-file once per run; later writes in the same run append
{
for(i=1;i<=NF;i++) {
if(i <= 3) {
# columns 1-3: comma between fields, newline after the 3rd (or last) field
printf("%s%s", $i, (i==3 || i==NF) ? "\n" : ",") > "sub-file1"
} else {
# remaining columns: comma between fields, newline after the last field
printf("%s%s", $i, (i==NF) ? "\n" : ",") > "sub-file2"
}
}
}' file.txt
Sub-files generated after running the above awk script:
$ cat sub-file1
A,B,C
1,2,3
I,II,III
a,b,c
$ cat sub-file2
D,E,F,G,H,I
4,5,6,7,8,9
IV,V,VII,VIII,IX
d,e,f,g,h,i
Date: Wednesday, 14 Oct 2009 11:11
I have already written a post on some good uses of the Linux/UNIX 'paste' command; let's look at another practical one.
Input files:
$ cat contestant.txt
Christopher
Williams
Darwin
Ajay
Brain
Amay
Jiten
Lila
$ cat leader.txt
Mr B
Mrs C
Mrs A
Output required:
After every line of 'leader.txt', insert 3 lines from the file 'contestant.txt', so that the output looks like this:
Mr B
Christopher
Williams
Darwin
Mrs C
Ajay
Brain
Amay
Mrs A
Jiten
Lila
The step-by-step solution using the Linux/UNIX paste command (paste separates the joined columns with tabs, which the last step turns into newlines):
$ cat contestant.txt | paste - - -
Output:
Christopher Williams Darwin
Ajay Brain Amay
Jiten Lila
$ cat contestant.txt | paste - - - | paste leader.txt -
Output:
Mr B Christopher Williams Darwin
Mrs C Ajay Brain Amay
Mrs A Jiten Lila
$ cat contestant.txt | paste - - - |paste leader.txt - |tr "\t" "\n"
Output:
Mr B
Christopher
Williams
Darwin
Mrs C
Ajay
Brain
Amay
Mrs A
Jiten
Lila
Another similar one liner for the same:
$ < contestant.txt paste - - - | paste leader.txt - | tr "\t" "\n"
Date: Friday, 09 Oct 2009 10:57
My directory contains a set of log files with filename of the following pattern:
debug.vendor-name.some-serial-number.epoch-time-stamp.device-class.log
where:
epoch-time-stamp
is the UNIX time stamp when the log file was generated.
device-class
the first 4 characters of this field represent the service name of the device and the next 6 characters the device class name
$ ls -1
debug.cisco.0001.1254059837.svc1class2.log
debug.cisco.0001.1255058827.svc1class3.log
debug.cisco.0001.1255058827.svc2class3.log
debug.cisco.0001.1255058837.svc1class2.log
debug.cisco.0001.1255059834.svc2class3.log
debug.cisco.0002.1255059819.svc1grade2.log
debug.cisco.0002.1255059849.svc1class1.log
debug.cisco.0002.1255059849.svc2class1.log
debug.juniper.0001.1255059831.svc1class2.log
Let's group similar files (under different conditions) and count the number of files in each group.
One: Group based on vendor-name(2nd field)
$ ls | awk -F "." '{count[$2]++}END{for(j in count) print j,"["count[j]"]"}'
Output:
cisco [8]
juniper [1]
Two: Group based on vendor-name(2nd field) and serial-number(3rd field)
$ ls | awk -F "." '{count[$2" "$3]++}END{for(j in count) print j,"["count[j]"]"}'
Output:
cisco 0002 [3]
juniper 0001 [1]
cisco 0001 [5]
Three: Group based on vendor-name(2nd field) , serial-number(3rd field) and UNIX-time-stamp(4th field) in hour bucketing*
$ ls | awk -F "." '{count[$2" "$3" "$4-($4%3600)]++}
END{for(j in count) print j,"["count[j]"]"}'
Output:
juniper 0001 1255057200 [1]
cisco 0001 1254056400 [1]
cisco 0002 1255057200 [3]
cisco 0001 1255057200 [4]
*Hour bucketing:
e.g. 'Fri Oct 9 09:51:55 UTC 2009' and 'Fri Oct 9 09:01:55 UTC 2009' fall into the same bucket, Fri Oct 9 09:00:00 UTC 2009
Four: Group based on vendor-name(2nd field) and first 4 characters of device-class (5th field)
$ ls | awk -F "." '
{ $5 = substr($5, 1, 4) }
{count[$2" "$5]++}
END{for(j in count) print j,"["count[j]"]"}'
Output:
juniper svc1 [1]
cisco svc1 [5]
cisco svc2 [3]
Five: Group based on
vendor-name(2nd field),
serial-number(3rd field) ,
UNIX-time-stamp(4th field) in hour bucketing
and first 4 characters of device-class (5th field)
$ ls | awk -F "." '
{ $5 = substr($5, 1, 4) }
{count[$2" "$3" "$4-($4%3600)" "$5]++}
END {for(j in count) print j,"["count[j]"]"}'
Output:
cisco 0002 1255057200 svc1 [2]
cisco 0002 1255057200 svc2 [1]
juniper 0001 1255057200 svc1 [1]
cisco 0001 1255057200 svc1 [2]
cisco 0001 1255057200 svc2 [2]
cisco 0001 1254056400 svc1 [1]
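The bucket keys above are raw epoch seconds. GNU date can turn a key back into a readable time; the `-d @SECONDS` form is a GNU extension. The sketch below applies the same hour-bucketing arithmetic as $4-($4%3600) in case Three to one time stamp from the file listing.

```shell
#!/usr/bin/env bash
ts=1255059834                     # time stamp from one of the file names above
bucket=$(( ts - ts % 3600 ))      # same hour bucketing as $4-($4%3600) in awk
echo "$bucket"                                   # 1255057200
date -u -d "@$bucket" '+%Y-%m-%d %H:%M:%S UTC'   # 2009-10-09 03:00:00 UTC
```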
Hope you find it useful.
Related post:
- SQL Sum of and group by using awk
- Group by Clause functionality using awk
- Associative array in awk
Date: Wednesday, 07 Oct 2009 11:18
Suppose:
$ mypath=/dir1/dir2/dir3/dir4
$ echo $mypath
/dir1/dir2/dir3/dir4
Now, if you need to print the parent path from the above path (i.e. print '/dir1/dir2/dir3')
$ dirname $mypath
/dir1/dir2/dir3
$ parentpath=$(dirname $mypath)
$ echo $parentpath
/dir1/dir2/dir3
Using bash substring removal:
${string%substring}
deletes the shortest match of $substring from the back of $string.
$ echo ${mypath%/*}
/dir1/dir2/dir3
or
$ printf '%s\n' "${mypath%/*}"
/dir1/dir2/dir3
If you need to print the last directory name from the above mypath, here are a few ways.
Using bash substring removal:
${string##substring}
deletes the longest match of $substring from the front of $string.
$ echo ${mypath##*/}
dir4
Another way using awk:
$ echo $mypath | awk '{print $NF}' FS=\/
dir4
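One caveat: if the path ends with a slash, ${mypath##*/} gives an empty string and ${mypath%/*} gives back the path itself. dirname and basename ignore trailing slashes. A bash sketch that trims trailing slashes before expanding (the trailing-slash path is a made-up example):

```shell
#!/usr/bin/env bash
mypath=/dir1/dir2/dir3/dir4/     # note the trailing slash

# The plain expansions misbehave here:
printf '[%s]\n' "${mypath##*/}"   # [] (empty)
printf '[%s]\n' "${mypath%/*}"    # [/dir1/dir2/dir3/dir4]

# Strip trailing slashes first ('?*/' keeps a lone "/" intact), then expand.
p=$mypath
while [[ $p == ?*/ ]]; do p=${p%/}; done
printf '%s\n' "${p##*/}"          # dir4
printf '%s\n' "${p%/*}"           # /dir1/dir2/dir3
```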
Similar post:
- Truncate string using bash script
Date: Friday, 02 Oct 2009 21:45
Directory '/home/user/work/demo/' contains a few regular files and two directories, say "part2" (size=41236 KB) and "libs" (size=20620 KB).
$ du ~/work/demo/
41236 /home/user/work/demo/part2
20620 /home/user/work/demo/libs
87640 /home/user/work/demo/
From Linux/UNIX DU(1) command man page:
-s, --summarize
display only a total for each argument.
So the following command displays the total size of the directory '/home/user/work/demo/':
$ du -s ~/work/demo/
87640 /home/user/work/demo/
Now, if you need to find the size of the '/home/user/work/demo/' directory excluding the size of the sub-directories, there is a command line option with DU(1):
-S, --separate-dirs
do not include size of sub-directories
So
$ du -S ~/work/demo/
41236 /home/user/work/demo/part2
20620 /home/user/work/demo/libs
25784 /home/user/work/demo/
Now, to print only the size of the directory itself (without the sub-directory lines):
$ du -S --max-depth=0 ~/work/demo/
25784 /home/user/work/demo/
or
$ du -S ~/work/demo/ | awk 'END {print}'
25784 /home/user/work/demo/
Date: Wednesday, 30 Sep 2009 11:23
Input file:
$ cat expense.txt
Particulars,Item1,Item2,Item3
BudgetAmount,12000,4560,5000
Expense@2006,1800,3000,250
Expense@2007,2210,2100,3000
Expense@2008,100,1500,320
Expense@2009,0,100,20
Output required:
For each item, calculate the amount left after the expenses, i.e.:
For an item:
Amount Left = (BudgetAmount - (Expense@2006 + Expense@2007 + Expense@2008 + Expense@2009))
i.e. required output:
BudgetAmount,12000,4560,5000
Expense@2006,1800,3000,250
Expense@2007,2210,2100,3000
Expense@2008,100,1500,320
Expense@2009,0,100,20
Amount Left,7890,-2140,1410
The awk program:
$ awk 'BEGIN { FS=OFS="," }
NR > 1 { print }    # print every row except the header, as in the required output
$1 == "BudgetAmount" {
bI1 = $2
bI2 = $3
bI3 = $4
}
/^Expense@/ {
bI1 -= $2
bI2 -= $3
bI3 -= $4
}
END {
print "Amount Left",bI1, bI2, bI3
}' expense.txt
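If the number of items grows, an array indexed by column number can replace the fixed bI1..bI3 variables. The sketch below does that. The function name `amount_left` and the array name `left` are arbitrary choices for this example. It also prints the data rows, as in the required output.

```shell
#!/usr/bin/env bash
exp=$(mktemp)
cat > "$exp" <<'EOF'
Particulars,Item1,Item2,Item3
BudgetAmount,12000,4560,5000
Expense@2006,1800,3000,250
Expense@2007,2210,2100,3000
Expense@2008,100,1500,320
Expense@2009,0,100,20
EOF

# amount_left FILE: echo the data rows, then one "Amount Left" row,
# handling any number of item columns via the array 'left'.
amount_left() {
    awk 'BEGIN { FS = OFS = "," }
    NR > 1 { print }                 # every row except the header
    NF > n { n = NF }                # widest row seen so far
    $1 == "BudgetAmount" { for (i = 2; i <= NF; i++) left[i] = $i }
    /^Expense@/          { for (i = 2; i <= NF; i++) left[i] -= $i }
    END {
        line = "Amount Left"
        for (i = 2; i <= n; i++) line = line OFS left[i]
        print line
    }' "$1"
}

amount_left "$exp"
```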
Related post:
- Bash script for sequential subtraction of numbers
Date: Tuesday, 29 Sep 2009 10:36
Input file: Each line of 'num.txt' contains two comma-separated numbers (say A and B).
$ cat num.txt
34,140
190,140
89,120
110,110
210,115
Required: Calculate and print percentage (A/B)*100 with the following conditions:
- If percentage is less than 100, print the calculated actual percentage
- If percentage is more than 100, print the percentage as 100
First solution:
$ awk '
BEGIN {FS=OFS=","}
{if($1>$2) {print $0,100}
else {print $0,($1/$2)*100}
}' num.txt
Output:
34,140,24.2857
190,140,100
89,120,74.1667
110,110,100
210,115,100
Let's do some text alignment and formatting using awk.
$ awk '
BEGIN {FS="," ; {printf "%-10s%-8s%s\n","A","B","% age"}}
{if($1>=$2) {printf "%-10s%-8s%s\n",$1,$2,100}
else {printf "%-10s%-8s%2.2f\n",$1,$2,($1/$2)*100}
}' num.txt
Output:
A         B       % age
34        140     24.29
190       140     100
89        120     74.17
110       110     100
210       115     100
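Or a different version of the above script, with the format string stored in a variable (since it uses %s, the percentages are printed unrounded):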
$ awk '
BEGIN {
FS="," ; FORMAT="%-10s%-8s%s\n" ;
{printf FORMAT,"A","B","% age"}
}
{
if($1>=$2) {printf FORMAT,$1,$2,100}
else {printf FORMAT,$1,$2,($1/$2)*100}
}' num.txt
Output:
A         B       % age
34        140     24.2857
190       140     100
89        120     74.1667
110       110     100
210       115     100
Another way of writing if-else in awk, using the conditional operator (condition ? value1 : value2):
$ awk '
{printf("%-10s%-8s%2.2f\n",\
$1,$2, ($1<=$2) ? ($1/$2)*100 : 100)
}' FS="," num.txt
Output:
34        140     24.29
190       140     100.00
89        120     74.17
110       110     100.00
210       115     100.00
Related post:
- Calculate percentage using awk in bash
- Align text with awk printf function
Date: Friday, 25 Sep 2009 00:55
$ add="20010db885a3000000008a2e03707334"
$ echo $add
20010db885a3000000008a2e03707334
Required output: Insert a colon ':' after every 4 characters of the above string, except at the very end.
So the output required:
2001:0db8:85a3:0000:0000:8a2e:0370:7334
Using awk:
$ echo $add | awk -F "" '
{for(i=1;i<=NF;i++){printf("%s%s",$i,i%4?"":":")}}'|awk '{sub(/:$/,"")};1'
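Note: the empty string "" is used as the field separator, so each character becomes a separate field. This works in GNU awk and mawk, but POSIX leaves the behaviour of an empty FS unspecified.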
Using sed:
$ echo $add | sed 's/..../&:/g;s/:$//'
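Bash substring expansion, ${var:offset:length}, can do the same without awk or sed. The sketch below uses `colonize`, a name made up for this example.

```shell
#!/usr/bin/env bash
# colonize STRING: join every 4-character chunk of STRING with ':'.
colonize() {
    local s=$1 out= i
    for (( i = 0; i < ${#s}; i += 4 )); do
        out+=${out:+:}${s:i:4}    # add ':' before every chunk except the first
    done
    printf '%s\n' "$out"
}

add="20010db885a3000000008a2e03707334"
colonize "$add"    # 2001:0db8:85a3:0000:0000:8a2e:0370:7334
```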
Related post:
- Break a line into multiple lines using awk and sed
Date: Wednesday, 23 Sep 2009 00:23
For example:
$ countries="**India **South Africa **Sri Lanka **West Indies"
$ echo $countries
**India **South Africa **Sri Lanka **West Indies
Output required:
Replace "**" with a newline, so that above line becomes:
**India
**South Africa
**Sri Lanka
**West Indies
The sed replacement:
$ echo $countries | sed 's! **!\n**!g'
sed: -e expression #1, char 12: Invalid preceding regular expression
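So you need to escape the asterisks (in a regular expression, * is a repetition operator, not a literal character):
i.e.
$ echo "$countries" | sed 's! \*\*!\n**!g'
Another way using sed, which also works with seds that do not understand \n in the replacement (a GNU sed extension), is a backslash followed by a real newline:
$ echo "$countries" | sed 's! \*\*!\
\*\*!g'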
Similarly:
$ echo "a,b,c,d" | sed 's!,!\n!g'
Output:
a
b
c
d
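Bash's own pattern substitution, ${var//pattern/replacement}, can also do this replacement. Quoting ' **' in the pattern makes the asterisks literal, so no escaping is needed, and $'\n**' is a real newline followed by two asterisks. A sketch:

```shell
#!/usr/bin/env bash
countries="**India **South Africa **Sri Lanka **West Indies"

# Replace every " **" with a newline followed by "**".
lines=${countries//' **'/$'\n**'}
printf '%s\n' "$lines"
```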
Related post:
1)
Suppose your input line is:
1 b 3 4 e 6 g 8 i j k
And you want to split it into multiple lines, each with, say, 3 entries:
i.e.
1 b 3
4 e 6
g 8 i
j k
here is a post
2)
One more related post of breaking a line into multiple lines based on length
Date: Monday, 21 Sep 2009 21:47
Input file:
$ cat details.txt
AX|23.45|1932323|A|VI|-|Y|0
TY|93.45|2932323|B|VI|-|Y|1
RE|63.25|8932323|A|VI|0|N|1
AY|83.85|0932323|C|VI|-|Y|0
Required:
Print all columns from the above file except column number 2 and 7.
i.e. required output:
AX|1932323|A|VI|-|0
TY|2932323|B|VI|-|1
RE|8932323|A|VI|0|1
AY|0932323|C|VI|-|0
Basically for the above file we have to print column # 1,3,4,5,6,8
i.e.
$ awk '
BEGIN{FS=OFS="|"}{print $1,$3,$4,$5,$6,$8}
' details.txt
But if the input file has a very large number of fields, listing them all by hand is not practical, so here is another technique:
$ awk '
BEGIN{FS=OFS="|"}
{ for (i=1; i<=NF;i++)
if( i==2 || i==7 ) continue
else
printf("%s%s", $i,(i!=NF) ? OFS : ORS)}
' details.txt
And if you want to exclude a range of column numbers (say exclude column 3 to column 6) here is my earlier post
An additional tip:
If you need to generate an awk print statement for a number of consecutive fields, here is a quick way:
$ seq -s ",$" 1 8 | sed 's/.*/{print $&}/'
Output:
{print $1,$2,$3,$4,$5,$6,$7,$8}
Or you can use the for loop shown above.
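The sketch below varies the exclusion loop. It takes the columns to drop as a parameter and builds each output line before printing it. That way the line still ends correctly when the last column is one of those dropped. The function name `drop_cols` and the awk variable `skip` are made up for this example.

```shell
#!/usr/bin/env bash
details=$(mktemp)
cat > "$details" <<'EOF'
AX|23.45|1932323|A|VI|-|Y|0
TY|93.45|2932323|B|VI|-|Y|1
RE|63.25|8932323|A|VI|0|N|1
AY|83.85|0932323|C|VI|-|Y|0
EOF

# drop_cols LIST FILE: print FILE without the comma-separated column numbers in LIST.
drop_cols() {
    awk -v skip="$1" '
    BEGIN {
        FS = OFS = "|"
        n = split(skip, cols, ",")
        for (k = 1; k <= n; k++) drop[cols[k]] = 1
    }
    {
        out = ""; first = 1
        for (i = 1; i <= NF; i++) {
            if (i in drop) continue
            out = first ? $i : out OFS $i
            first = 0
        }
        print out
    }' "$2"
}

drop_cols 2,7 "$details"
```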
Date: Saturday, 19 Sep 2009 00:54
Question: How can I find out how long a process has been running on a UNIX system?
Answer: Here are some tips for finding a process's running time (they rely on the Linux /proc file system).
----------------
For 2.6 kernels:
----------------
Identify the ID of your process and then run:
ls -ld /proc/PID-OF-YOUR-PROCESS
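The modification time listed for this directory is roughly the time the process started. Strictly speaking, it is the time the kernel created the /proc entry for the process, which can be later than the actual start, so treat it as an approximation.
For example, I started a process, say "sleep 10000", a few minutes ago: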
$ ps -ef | grep "[s]leep 10000"
jsaikia 24375 23306 0 22:13 pts/10 00:00:00 sleep 10000
$ ls -ld /proc/24375
dr-xr-xr-x 6 jsaikia staff 0 2009-09-18 22:14 /proc/24375
So, "2009-09-18 22:14" is the start time of the above sleep process; if I subtract this time from the current time I can find how long this process has been running.
To do the subtraction, you can use a script like this (cal-tdiff.sh; it relies on GNU date's -d option):
#!/bin/sh
T1=$(date +%s -d "$1")
T2=$(date +%s -d "$2")
diffsec=$((T1-T2))
# print the difference as hours:minutes:seconds
awk -v D="$diffsec" 'BEGIN {printf "%d:%02d:%02d\n", D/3600, D%3600/60, D%60}'
Run it like this:
$ sh cal-tdiff.sh "$(date)" "2009-09-18 22:14"
0:22:56
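With GNU coreutils you can skip copying the timestamp by hand. `stat -c %Y` prints a file's modification time as seconds since the epoch, so the elapsed time can be computed directly. This is a minimal sketch that assumes GNU stat and date and the 2.6 behaviour described above, so it is only as accurate as the /proc timestamp. The names hms and proc_age are illustrative and do not come from the original post.

```shell
#!/bin/sh
# hms SECONDS: format a number of seconds as H:M:S (same style as cal-tdiff.sh)
hms() {
    printf '%d:%d:%d\n' $(($1 / 3600)) $(($1 % 3600 / 60)) $(($1 % 60))
}

# proc_age PID: time elapsed since the mtime of /proc/PID
# (roughly the process start time on 2.6+ kernels)
proc_age() {
    start=$(stat -c %Y "/proc/$1") || return 1   # mtime in epoch seconds (GNU stat)
    now=$(date +%s)
    hms $((now - start))
}
```

For example, `proc_age 24375` prints the running time of the sleep process above in the same H:M:S form that cal-tdiff.sh uses.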
----------------
For 2.4 kernels:
----------------
On 2.4 kernels, the modification time of /proc/PID-OF-YOUR-PROCESS is the current system time (unlike 2.6 kernels, where it reflects the process start time).
So how do you find the running time of a process on a 2.4 kernel?
Here is a way that also works on 2.6 kernels: subtract the process start time since boot (field 22 of /proc/PID/stat, divided by 100 ticks per second) from the system uptime in /proc/uptime.
For example:
$ ps -ef | grep "[s]leep"
root 7702 7689 0 17:34 pts/0 00:00:00 sleep 100000
So 7702 is the PID of the sleep process.
$ pd=7702
$ expr $(awk '{print $1}' FS=\. /proc/uptime) - $(awk '{printf ("%10d\n",$22/100)}' /proc/$pd/stat)
The output is the number of seconds that the process (with PID pd) has been running.
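The one-liner hard-codes 100 clock ticks per second. It also takes field 22 of /proc/PID/stat at face value, which is only right if the process name (field 2, shown in parentheses) contains no spaces. Below is a slightly more defensive sketch that assumes a Linux /proc filesystem. It asks `getconf CLK_TCK` for the tick rate (falling back to 100) and strips everything up to the closing parenthesis before counting fields. proc_runtime is a hypothetical helper name.

```shell
#!/bin/sh
# proc_runtime PID: print how many whole seconds the process has been running.
# Uses /proc/uptime (seconds since boot) and the starttime field of
# /proc/PID/stat (clock ticks since boot).
proc_runtime() {
    hz=$(getconf CLK_TCK 2>/dev/null) || hz=100      # ticks per second
    [ -n "$hz" ] || hz=100
    up=$(awk '{ printf "%d\n", $1 }' /proc/uptime)   # integer uptime in seconds
    stat_line=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    # Drop everything up to the last ") " so that a process name containing
    # spaces cannot shift the fields; starttime is field 22 of the full line,
    # which is field 20 of what remains.
    rest=${stat_line##*") "}
    start_ticks=$(printf '%s\n' "$rest" | awk '{ print $20 }')
    echo $((up - start_ticks / hz))
}
```

For the example above, `proc_runtime 7702` should print roughly the same number as the expr line.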
Date: Friday, 18 Sep 2009 00:16
My "inputdir" directory contains a set of files named like this:
$ ls -1 inputdir/
log.10.16.1253168140.txt
log.11.5.1253168345.txt
log.11.9.1253168347.txt
log.12.1.1253168347.txt
log.19.1.1253168140.txt
Directory "testcfgs" contains a set of config xmls.
$ ls -1 testcfgs/
cfg_10_16.xml
cfg_10_5.xml
cfg_11_5.xml
cfg_11_9.xml
cfg_12_1.xml
cfg_19_1.xml
cfg_19_2.xml
cfg_91_9.xml
Required:
For each file named "log.X.Y.timestamp.txt" in "inputdir", copy the corresponding "cfg_X_Y.xml" config file from "testcfgs" to a directory, say "requiredcfgs".
A simple, practical bash loop:
$ for filename in $(ls -1 inputdir/)
> do
> X=$(echo "$filename" | cut -d"." -f2)
> Y=$(echo "$filename" | cut -d"." -f3)
> cp "testcfgs/cfg_${X}_${Y}.xml" requiredcfgs/
> done
The two lines that extract X and Y can be replaced by a single line using eval with awk, like this:
$ for filename in $(ls -1 inputdir/)
> do
> eval $(echo "$filename" | awk -F "." '{print "X="$2";Y="$3}')
> cp "testcfgs/cfg_${X}_${Y}.xml" requiredcfgs/
> done
Contents of the "requiredcfgs" directory after running the loop above:
$ ls -1 requiredcfgs/
cfg_10_16.xml
cfg_11_5.xml
cfg_11_9.xml
cfg_12_1.xml
cfg_19_1.xml
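eval runs whatever text it receives as shell code, so an unexpected file name could execute commands. In addition, $(ls ...) breaks on names with spaces. Below is a sketch of the same copy without eval, awk or ls. It assumes every input file follows the log.X.Y.timestamp.txt pattern: a glob finds the files and POSIX parameter expansion cuts out X and Y. copy_cfgs and its argument order are illustrative and not part of the original script.

```shell
#!/bin/sh
# copy_cfgs INPUTDIR CFGDIR OUTDIR
# For each INPUTDIR/log.X.Y.timestamp.txt, copy CFGDIR/cfg_X_Y.xml into OUTDIR.
copy_cfgs() {
    for path in "$1"/log.*.txt; do
        [ -e "$path" ] || continue          # the glob matched nothing
        name=${path##*/}                    # log.X.Y.timestamp.txt
        rest=${name#log.}                   # X.Y.timestamp.txt
        X=${rest%%.*}                       # X
        rest=${rest#*.}                     # Y.timestamp.txt
        Y=${rest%%.*}                       # Y
        cfg="$2/cfg_${X}_${Y}.xml"
        if [ -f "$cfg" ]; then
            cp "$cfg" "$3"/
        else
            echo "missing: $cfg" >&2        # report instead of failing silently
        fi
    done
}
```

Usage: `copy_cfgs inputdir testcfgs requiredcfgs`.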
Related post on eval with awk:
- Subdivide an ip address - assign each part to an variable using awk
Date: Tuesday, 15 Sep 2009 10:37
I have already posted about the Linux/UNIX seq command, which generates sequences of numbers. seq is very useful for generating loop arguments in bash scripts.
One very useful seq option is -f:
-f, --format=FORMAT
use printf style floating-point FORMAT
Let's look at some simple examples.
$ seq -f "%04g" 3
Output:
0001
0002
0003
$ seq -f "logfile%02g.txt" 10
Output:
logfile01.txt
logfile02.txt
logfile03.txt
logfile04.txt
logfile05.txt
logfile06.txt
logfile07.txt
logfile08.txt
logfile09.txt
logfile10.txt
Now, to create 10 files named logfile01.txt, logfile02.txt, ..., logfile10.txt:
$ touch $(seq -f "logfile%02g.txt" 10)
$ ls -1
logfile01.txt
logfile02.txt
logfile03.txt
logfile04.txt
logfile05.txt
logfile06.txt
logfile07.txt
logfile08.txt
logfile09.txt
logfile10.txt
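The shell's printf builtin can produce the same names. This helps if your seq does not support -f, or if you want the prefix, suffix and count as parameters. This is a minimal sketch: numbered_names is a hypothetical helper, and %02d gives the same two-digit padding as the %02g format above.

```shell
#!/bin/sh
# numbered_names PREFIX COUNT SUFFIX
# Prints PREFIX01SUFFIX ... PREFIX<COUNT>SUFFIX, zero-padded to two digits.
numbered_names() {
    i=1
    while [ "$i" -le "$2" ]; do
        printf '%s%02d%s\n' "$1" "$i" "$3"
        i=$((i + 1))
    done
}
```

`touch $(numbered_names logfile 10 .txt)` then creates the same ten files.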
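The general form is seq [OPTION]... FIRST INCREMENT LAST.
For example, to print the numbers from 1.0003 to 1.0012 in steps of .0002, let seq count from 3 to 12 in steps of 2. The format "1.%04g" then turns each count into a four-digit fraction: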
$ seq -f "1.%04g" 3 2 12
1.0003
1.0005
1.0007
1.0009
1.0011
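GNU seq can also take the decimal values directly. Without -f, it prints each number with as many decimal places as FIRST and INCREMENT have. This is a small sketch that assumes GNU coreutils seq; other seq implementations may format the output differently.

```shell
#!/bin/sh
# FIRST=1.0003, INCREMENT=0.0002, LAST=1.0012; four decimal places are kept
seq 1.0003 0.0002 1.0012
```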
-s specifies the separator between the numbers (a newline by default):
$ seq -s "+" -f "1.%04g" 3 2 12
1.0003+1.0005+1.0007+1.0009+1.0011
$ seq -s "+" -f "1.%04g" 3 2 12 | bc
5.0035
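bc is not installed on every system, and minimal installs often omit it. Below is a sketch of the same sum done with awk, reading seq's default one-number-per-line output. The %.4f output format is an assumption chosen to match the four decimal places of the input.

```shell
#!/bin/sh
# Sum the sequence with awk instead of joining it with "+" and piping to bc
seq -f "1.%04g" 3 2 12 | awk '{ sum += $1 } END { printf "%.4f\n", sum }'
```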
Another good command for printing sequential and random data is jot; read about it here.
Related post:
- Ways of writing for loops in bash scripting
- Print text within style box in bash scripting