Thursday, September 2, 2010

Shell scripting technique for finding unique strings

I recently had to search through a bunch of log files to find a bunch of entries and count how many times they occurred. Shell scripting (via Cygwin) to the rescue!

I was looking for strings in this format

calc_name=RegularIRA
calc_name=Savings

Here is the solution:

grep -oh "calc_name=\w*" * | sort | uniq -c > calculator_counts.txt

This searches all files in the current directory for the pattern "calc_name=\w*" (which stops as soon as a non word character (like a symbol) is found. Then it sorts them, and runs the "uniq" command to get a count of unique occurrences. Then the output is piped to a file.

The output looks like this:

1332 Annuity
  59 AssetAllocator
4411 AutoEquityLoan
 119 AutoLoan
   4 AutoPayoff
 333 AutoRebate