Extract the contents between the tags

Friday, December 28, 2012


<?xml version="1.0" encoding="utf-8"?>
<job xmlns="http://www.sample.com/">programming</job>

I need a way to extract what is inside the <job ...> </job> tags, "programming" in this case.
This should be done at the Linux command prompt, using grep, sed, or awk.

solution-1
----------
"|cut -f1 -d"<"">
grep '<job' file_name | cut -f2 -d'>' | cut -f1 -d'<'
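"|cut -f1 -d"<"">
A minimal sketch of how the grep/cut pipeline narrows the line down one step at a time. It assumes a Linux shell with mktemp; the temporary sample file and the variable names result and awk_result are illustrative only and not part of the original post. The final command is a swapped-in awk alternative (also not from the post) that splits on both < and > in a single pass and prints the third field.

```shell
#!/bin/sh
# Write the sample XML from the question to a temporary file (illustrative).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<job xmlns="http://www.sample.com/">programming</job>
EOF

# Step 1: grep keeps only the line containing "<job".
grep '<job' "$sample"
# -> <job xmlns="http://www.sample.com/">programming</job>

# Step 2: split on ">" and keep field 2, i.e. the text after the opening tag.
grep '<job' "$sample" | cut -f2 -d'>'
# -> programming</job

# Step 3: split on "<" and keep field 1, i.e. the text before the closing tag.
result=$(grep '<job' "$sample" | cut -f2 -d'>' | cut -f1 -d'<')
echo "$result"
# -> programming

# Alternative (not from the post): awk treats "[<>]" as a regex field
# separator, so field 3 is the text between the opening and closing tags.
awk_result=$(awk -F'[<>]' '/<job/ { print $3 }' "$sample")
echo "$awk_result"
# -> programming

rm -f "$sample"
```

The awk form does the same field splitting as the two cut calls, just with one delimiter set instead of two passes.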

solution-2
----------
sed -ne '/<\/job>/ { s/<[^>]*>\(.*\)<\/job>/\1/; p; }' file_name
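notes: -n stops sed from printing every line automatically;
-e marks the next argument as the sed script (a one-liner, not a script file read with -f);
/<\/job>/ is an address that acts like a grep, selecting only lines that contain </job>;
s strips the opening tag (with its attributes) and the closing tag, keeping what is between them;
; separates one command from the next;
p prints the edited line;
{} groups the s and p commands so the address applies to both of them as one unit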
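To see what -n and p contribute, here is a small sketch that runs the same s expression with and without -n on the sample XML. It assumes GNU sed (or another sed that accepts ; before }); the temporary file and the variable name sed_result are illustrative, not from the original post.

```shell
#!/bin/sh
# Write the sample XML from the question to a temporary file (illustrative).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<job xmlns="http://www.sample.com/">programming</job>
EOF

# Without -n, sed echoes every line, so the XML declaration leaks through:
#   <?xml version="1.0" encoding="utf-8"?>
#   programming
sed -e '/<\/job>/ { s/<[^>]*>\(.*\)<\/job>/\1/; }' "$sample"

# With -n, only lines explicitly printed by p appear, and p runs only
# on lines selected by the /<\/job>/ address.
sed_result=$(sed -ne '/<\/job>/ { s/<[^>]*>\(.*\)<\/job>/\1/; p; }' "$sample")
echo "$sed_result"
# -> programming

rm -f "$sample"
```

Both solutions assume the opening and closing tags sit on the same line; an element split across several lines would not be extracted cleanly.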
