Thursday, September 2, 2010

Shell scripting technique for finding unique strings

I recently had to search through a set of log files, find certain entries, and count how many times each one occurred. Shell scripting (via Cygwin) to the rescue!

I was looking for strings in this format:

calc_name=RegularIRA
calc_name=Savings

Here is the solution:

grep -oh "calc_name=\w*" * | sort | uniq -c > calculator_counts.txt

This searches all files in the current directory for the pattern "calc_name=\w*", which stops matching as soon as a non-word character (like a symbol) is found. The -o flag prints only the matched text rather than the whole line, and -h suppresses the filename prefix. The matches are then sorted, and "uniq -c" counts each distinct occurrence. Finally, the output is redirected to a file.
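Here is a minimal, self-contained sketch of the same pipeline run against a made-up sample.log (the filename and log lines are invented for illustration; the real logs just need "calc_name=<word>" somewhere on each line):

```shell
# Create some hypothetical log lines to search through.
printf '%s\n' \
  'ts=1 calc_name=Savings ok' \
  'ts=2 calc_name=RegularIRA ok' \
  'ts=3 calc_name=Savings err' > sample.log

# -o prints only the matched text, -h drops the filename prefix;
# sort groups identical matches so uniq -c can count them.
grep -oh "calc_name=\w*" sample.log | sort | uniq -c
# counts: 1 calc_name=RegularIRA, 2 calc_name=Savings (column padding varies)
```

Note that sort must come before uniq, since uniq only collapses adjacent duplicate lines.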

The output looks like this:

1332 calc_name=Annuity
  59 calc_name=AssetAllocator
4411 calc_name=AutoEquityLoan
 119 calc_name=AutoLoan
   4 calc_name=AutoPayoff
 333 calc_name=AutoRebate

2 comments:

  1. I should also note, if you want to use Perl-compatible regular expressions, add the "-P" flag to the "grep" command, like so:

    grep -Poh "calc_name=[\w.]*"
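    For example, with a hypothetical calculator name containing a dot (the name below is invented for illustration), the [\w.] character class picks up the full name where \w alone would stop at the dot:

    ```shell
    # -P enables Perl-compatible regexes; [\w.]* matches word
    # characters and literal dots, stopping at the space.
    echo 'calc_name=Retirement.v2 done' | grep -Po "calc_name=[\w.]*"
    # prints: calc_name=Retirement.v2
    ```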

  2. That is why you were johnny on the spot with the suggestion earlier... :)

