Feb 27, 2015

Powerful awk

Awk is a really powerful scripting language. We can accomplish a large task in just a few lines of code. For example:

Suppose we have a sample spreadsheet with the following information:

test_awk.csv

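Suppose we want to find how many times each distinct value occurs in the 3rd column. We can run the following command in a terminal (bash). Note that awk splits fields on whitespace by default; for a comma-separated file, add -F',' to the awk call: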

           awk '{print $3}' test_awk.csv | sort -n | uniq -c

Output:

      1 x121REB000
     12 x121REB227
      6 x131REB198
      4 x131REB205
     20 x131REB221
      4 x131REB254
      3 x131REB294
      1 x131REB317
      5 x131REB392
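The same tally can also be done inside awk itself with an associative array. The following is only a minimal sketch: the file sample_awk.txt and its three rows are made up for illustration (the real test_awk.csv is not reproduced here), and count_col3 is a hypothetical helper name.

```shell
# Hypothetical stand-in for test_awk.csv: whitespace-separated, code in column 3.
cat > sample_awk.txt <<'EOF'
1 5921 x121REB000 FALSE
2 5724 x121REB227 FALSE
3 9087 x121REB227 TRUE
EOF

# Tally column 3 in an associative array, then print "count value" per key.
# The order of "for (k in count)" is unspecified, so sort on the value afterwards.
count_col3() {
    awk '{ count[$3]++ } END { for (k in count) print count[k], k }' "$1" | sort -k2
}

count_col3 sample_awk.txt
```

Here awk does the counting, so the input does not need to be sorted first; sort is only used to put the final lines in a stable order.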


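And suppose we want to see how many rows contain TRUE and how many contain FALSE for each unique value in column 3. In awk, a bare pattern such as /TRUE/ evaluates to 1 if the current line matches it and to 0 otherwise, so we can print those flags next to column 3 and count the combinations: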

          awk '{print $3, /TRUE/, /FALSE/}' test_awk.csv | sort -n | uniq -c

Output:

  Count Code       T F
      1 x121REB000 0 1
     11 x121REB227 0 1
      1 x121REB227 1 0
      1 x131REB198 0 1
      4 x131REB198 1 0
      1 x131REB198 1 1
      3 x131REB205 0 1
      1 x131REB205 1 0
     17 x131REB221 0 1
      3 x131REB221 1 0
      2 x131REB254 0 1
      1 x131REB254 1 0
      1 x131REB254 1 1
      2 x131REB294 0 1
      1 x131REB294 1 0
      1 x131REB317 0 1
      4 x131REB392 0 1
      1 x131REB392 1 0
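If we would rather have one line per code with the totals, the 0/1 flags can be summed in awk instead of counting combinations. This is a sketch under assumptions: sample_flags.txt and its rows are invented for illustration, flag_totals is a hypothetical helper name, and the pattern matches TRUE or FALSE anywhere in the line, just like the command above.

```shell
# Hypothetical sample: code in column 3, TRUE/FALSE somewhere later in the line.
cat > sample_flags.txt <<'EOF'
1 5724 x121REB227 FALSE
2 9087 x121REB227 TRUE
3 1443 x131REB198 TRUE FALSE
4 1841 x131REB198 TRUE
EOF

# ($0 ~ /TRUE/) is the explicit form of a bare /TRUE/: 1 on a match, 0 otherwise.
# Summing it per code gives the number of rows that contain TRUE (same for FALSE).
flag_totals() {
    awk '{ t[$3] += ($0 ~ /TRUE/); f[$3] += ($0 ~ /FALSE/) }
         END { for (k in t) print k, t[k], f[k] }' "$1" | sort
}

flag_totals sample_flags.txt
```

Each output line reads: code, rows containing TRUE, rows containing FALSE. A row that contains both words is counted in both totals.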

Lastly, to list the unique values of field 2 for each value in column 3, we can use the command:

          awk '{print $3, $2}' test_awk.csv | sort -n | uniq

Output:

Code-3 Code-2
x121REB000 5921
x121REB227 5724
x121REB227 9087
x131REB198 1443
x131REB198 1841
x131REB198 2339
x131REB198 5261
x131REB198 7284
x131REB198 9758
x131REB205 1446
x131REB205 1958
x131REB205 2285
x131REB205 5999
x131REB221 3866
x131REB221 6171
x131REB221 9616
x131REB221 9898
x131REB254 4732
x131REB254 6078
x131REB254 7679
x131REB254 9112
x131REB294 1025
x131REB294 3369
x131REB294 6006
x131REB317 6452
x131REB392 4045
x131REB392 4898
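Duplicates can also be dropped inside awk, without sort and uniq, by remembering which pairs have already been printed. A minimal sketch, assuming a made-up file sample_pairs.txt; unique_pairs and distinct_counts are hypothetical helper names.

```shell
# Hypothetical sample: column 2 is a number, column 3 is the code.
cat > sample_pairs.txt <<'EOF'
1 5724 x121REB227 FALSE
2 9087 x121REB227 TRUE
3 5724 x121REB227 FALSE
4 5921 x121REB000 FALSE
EOF

# Print each (column 3, column 2) pair only the first time it is seen.
# seen[...]++ is 0 (false) on first sight, so the negation is true exactly once.
unique_pairs() {
    awk '!seen[$3, $2]++ { print $3, $2 }' "$1"
}

# Count how many distinct column-2 values each code has.
distinct_counts() {
    awk '!seen[$3, $2]++ { n[$3]++ } END { for (k in n) print k, n[k] }' "$1" | sort
}

unique_pairs sample_pairs.txt
distinct_counts sample_pairs.txt
```

Unlike the sort | uniq version, unique_pairs keeps the pairs in the order they first appear in the file.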


