I have a script that takes a tab-delimited file for regression tests and converts it to XML. As a sanity check, I want to make sure that the number of utterances in my XML files matches the number in the tab-delimited .txt file. I can do this in two lines in UNIX:


robert_felty$ wc -l samples2.txt
72148 samples2.txt
robert_felty$ find . -name '*.xml' | xargs grep -c "
In the first line, I count the number of lines in the tab-delimited file (there is a header line, so I expect the utterance count to be one less than this).
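The header adjustment can be done in the shell as well. This is only a minimal sketch: it builds a small throwaway file (the `sample` variable and its contents are made up for illustration) rather than touching the real samples2.txt, and subtracts one from the line count.

```shell
#!/bin/sh
# Hypothetical fixture: one header row plus two data rows, tab-separated.
sample=$(mktemp)
printf 'id\ttext\n1\thello\n2\tworld\n' > "$sample"

# Reading from stdin makes wc print only the number, with no filename.
# Subtract 1 to discount the header row.
expected=$(( $(wc -l < "$sample") - 1 ))
echo "$expected"

rm -f "$sample"
```

For the file above, the same arithmetic gives 72148 - 1 = 72147 expected utterances.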

In the second command, I find all the .xml files using find, then pipe that to xargs, where I use "grep -c" to count the lines that match the utterance pattern. grep -c outputs rows like this:
filename:count
I want to sum up all the counts, so I cut out just the count field using cut, then I use awk to sum up all the counts.
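Here is a rough, self-contained sketch of that kind of pipeline. The pattern `<utterance>` and the two fixture files are assumptions made up for illustration; they are not the pattern or files from my actual run, so substitute whatever marks an utterance in your own XML.

```shell
#!/bin/sh
# Hypothetical fixture: two small XML files in a throwaway directory.
demo=$(mktemp -d)
printf '<utterance>a</utterance>\n<utterance>b</utterance>\n' > "$demo/one.xml"
printf '<utterance>c</utterance>\n' > "$demo/two.xml"

# Hypothetical pattern, standing in for the real one.
pattern='<utterance>'

# grep -c emits filename:count for each file, cut keeps the second
# colon-separated field (the count), and awk adds the counts together.
total=$(find "$demo" -name '*.xml' \
  | xargs grep -c "$pattern" \
  | cut -d: -f2 \
  | awk '{ sum += $1 } END { print sum }')
echo "$total"

rm -rf "$demo"
```

Note that grep -c counts matching lines, so this only equals the number of utterances if each utterance starts on its own line. File names containing spaces or colons would also need extra care, for example find -print0 with xargs -0.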

I love UNIX pipelines!