extract the contents in between the tags
Friday, December 28, 2012

<?xml version="1.0" encoding="utf-8"?>

<job xmlns="http://www.sample.com/">programming</job>

I need a way to extract what is between the <job ...> and </job> tags, "programming" in this case.

This should be done at the Linux command prompt, using grep/sed/awk.

solution-1

----------

grep '<job' file_name | cut -f2 -d">"|cut -f1 -d"<"
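To see how the pipeline behaves, here is a minimal sketch that recreates the sample XML in a temporary file and runs solution-1 against it. The file name sample.xml and the variables tmpdir and result are assumptions made for illustration only; they are not part of the original question.

```shell
# Create a throwaway copy of the sample XML (sample.xml is a hypothetical name).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.xml" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>

<job xmlns="http://www.sample.com/">programming</job>
EOF

# grep keeps only the line containing "<job";
# the first cut keeps the text after the first ">" (programming</job);
# the second cut keeps the text before the next "<" (programming).
result=$(grep '<job' "$tmpdir/sample.xml" | cut -f2 -d">" | cut -f1 -d"<")
echo "$result"

# Remove the temporary directory.
rm -rf "$tmpdir"
```

This approach relies on the opening tag, the text and the closing tag all sitting on one line, and on the text itself containing no "<" or ">" characters.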

solution-2

----------

sed -ne '/<\/job>/ { s/<[^>]*>\(.*\)<\/job>/\1/; p }' file_name

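notes:

-n stops sed from printing every line automatically;

-e marks the next argument as the script itself (as opposed to a script file given with -f);

/<\/job>/ acts like a grep, selecting only the lines that contain </job>;

s strips the opening tag (with its attributes) and the closing tag, keeping only the text between them;

; separates statements;

p prints the line;

{} groups both statements, so the grep-like match applies to them as one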
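The notes can be checked with a short sketch that feeds the sample lines to the solution-2 command on standard input instead of a file. The variable names input and sed_result are assumptions for illustration, and the command is written for GNU sed as found on Linux; other sed implementations may be stricter about the braces.

```shell
# Two sample lines; only the second one contains </job>.
input='<?xml version="1.0" encoding="utf-8"?>
<job xmlns="http://www.sample.com/">programming</job>'

# -n suppresses automatic printing, so the XML declaration line is dropped.
# On the matching line, s removes the opening tag (with its attributes) and
# the closing tag, and p prints what is left.
sed_result=$(printf '%s\n' "$input" | sed -ne '/<\/job>/ { s/<[^>]*>\(.*\)<\/job>/\1/; p }')
echo "$sed_result"
```

Like the cut pipeline, this assumes the whole element sits on a single line; sed works line by line here and will not match an element that spans several lines.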
