Awk Snippet for Extracting XML Messages with a Given ID from a Log File
I recently had occasion to extract XML messages with a specific id from log files, resulting in the following gnarly awk snippet. The sed statement in the middle of the pipeline is for stripping out some unwanted debug statements. There must be a better way to do this. Let me know in the comments (except don’t, because they are disabled, because apparently only spammers read this blog).
# $1 id tag (e.g. if the identifier is specified with <id>1234</id>, use id)
# $2 id value (e.g. if the identifier is specified with <id>1234</id>, use 1234)
# $3 message element name, used to build the closing-tag separator (e.g. message for </message>)
# $4 file prefix (each matching message is written to its own file: <prefix>1, <prefix>2, ...)
# $5 log file (the name of the file to read)
extract_message(){
  # Split the log into records on the closing tag, keep the records that
  # contain the id element, and let ORS put back the closing tag RS consumed.
  # A multi-character RS needs an awk that supports it, such as gawk.
  awk -v var="<${1}>${2}<\\\/${1}>" \
      -v ORS="</${3}>\n" \
      -v RS="<\\\/${3}>" \
      '$0 ~ var {print}' "$5" \
    | sed 's/^[0-9][0-9].*DEBUG.*</</' \
    | awk -v pre="${4}" '{print $0 > (pre NR)}' RS='\\n\\n' -
}
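One possible "better way" is to do everything in a single awk pass. The sketch below is not the snippet above rewritten line for line. It is a hypothetical variant (the function name extract_message_single_pass and the variables needle, close_tag, buf and out are made up for illustration) that assumes every message ends on a line containing the closing tag. It matches the id as a fixed string with index() instead of a regex, strips the debug prefix only up to the first "<" after DEBUG, and avoids a multi-character RS, so it should not depend on gawk.

```shell
# Hypothetical single-pass variant; same five arguments as extract_message.
# $1 id tag, $2 id value, $3 message element name, $4 file prefix, $5 log file
extract_message_single_pass() {
  awk -v needle="<$1>$2</$1>" -v close_tag="</$3>" -v pre="$4" '
    {
      # Strip a leading "NN... DEBUG ..." prefix up to the first "<" after DEBUG.
      sub(/^[0-9][0-9].*DEBUG[^<]*</, "<")
      # Collect the lines of the current message.
      buf = buf $0 "\n"
    }
    index($0, close_tag) {
      # End of a message: write it out only if it contains the wanted id.
      if (index(buf, needle)) {
        out = pre (++n)
        printf "%s", buf > out
        close(out)
      }
      buf = ""
    }
  ' "$5"
}
```

For example, extract_message_single_pass id 1234 message msg_ app.log would write the first matching message to msg_1, the second to msg_2, and so on (msg_ and app.log are placeholder names).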