The Unix Shell

Bash string handling examples

Recently I had some difficulty with string handling in shell scripts, especially in bash.
Whenever I google for string handling in shell scripts, I am bombarded with solutions that use tools like awk, perl, sed or python, but very rarely plain bash. So I thought it would be good to have the useful bash string operations in one place. Below are the use cases I would like to show:

Length of a string.
Substring from a particular position till the end.
Substring from one position to another inside the same string.
Delete the shortest matching substring (if any) from the front of a string.
Delete the shortest matching substring (if any) from the back of a string.
Delete the longest matching substring (if any) from the front of a string.
Delete the longest matching substring (if any) from the back of a string.
Find and replace the first match.
Find and replace all matches.

Note: The ones highlighted in red are the most frequently used, and the most confusing as well.

Length of a string
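A minimal bash sketch of the length operation and of the other operations in the list above, using only parameter expansion (no awk, sed or Perl). The variable str and its sample value are illustrative; positions count from 0, and the expected result of each line is shown in its comment.

```shell
#!/usr/bin/env bash
# Illustrative sample value; positions in ${str:pos:len} count from 0.
str="backup.2014.tar.gz"

echo "${#str}"          # length                          → 18
echo "${str:12}"        # from position 12 to the end     → tar.gz
echo "${str:7:4}"       # 4 characters from position 7    → 2014
echo "${str#*.}"        # shortest '*.' from the front    → 2014.tar.gz
echo "${str%.*}"        # shortest '.*' from the back     → backup.2014.tar
echo "${str##*.}"       # longest '*.' from the front     → gz
echo "${str%%.*}"       # longest '.*' from the back      → backup
echo "${str/./_}"       # replace the first match         → backup_2014.tar.gz
echo "${str//./_}"      # replace all matches             → backup_2014_tar_gz
```

The patterns after #, ##, % and %% (and in the two replace forms) are shell glob patterns, not regular expressions: in ${str#*.} the * matches any run of characters and the . is a literal dot. A single # or % removes the shortest match, a doubled ## or %% removes the longest.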

Debugging a PL/SQL stored procedure

Normally, while working on PL/SQL stored procedures, we add print statements with DBMS_OUTPUT to see the values of various variables.
This works fine as long as you run your stored procedures from the command line (for example in SQL*Plus). But let's assume your stored procedure is being called from a different process, say a C++ or a Java process. In that case you have no way to send your log statements to a log file unless you have certain admin permissions, and as a designer you might not have those rights.

The procedure below presents a way to put traces in your PL/SQL procedure and see the values.
The shell script below sets it up for you.
The idea is simple: insert our trace statements into a table.

#!/bin/ksh
# Usage: <script> create | clean
# create: creates the DEBUG_MESSAGES table and the DEBUG_OUTPUT procedure.
# clean:  drops them again.
# DATABASE must hold the SQL*Plus connect string (e.g. user/password@db).

CONNECT_STRING=${DATABASE}

if [ "$1" = "create" ]; then
CRE_RESULT=`sqlplus -s "${CONNECT_STRING}" << EOF
create table DEBUG_MESSAGES(
 current_date DATE,
 message varchar2(500)
);

CREATE OR REPLACE PROCEDURE DEBUG_OUTPUT( p_text IN VARCHAR2 ) AS
  -- commit the trace row without committing the calling transaction
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
      insert into DEBUG_MESSAGES (current_date,message)
      values (sysdate, p_text);
      commit;
END;
/

exit;
EOF`

echo "$CRE_RESULT"
fi

if [ "$1" = "clean" ]; then
CLN_RESULT=`sqlplus -s "${CONNECT_STRING}" << EOF
drop table DEBUG_MESSAGES;
drop procedure DEBUG_OUTPUT;
exit;
EOF`

echo "$CLN_RESULT"
fi

Now you can add your trace statements inside your procedure like below:

 DEBUG_OUTPUT('Hello World!');
 DEBUG_OUTPUT('Hello World!'||vMyVariable);

After you execute your procedure, your trace statements will be available in the table DEBUG_MESSAGES, where you can check them with a simple SELECT.

Search a string and replace the following lines in Perl

Let's say I have a C/C++ file in which some lines start with a particular pattern, and I know that each matching line is followed by 2 more lines that belong to it. I want to replace these three lines with a single, different line. For example, the input file looks like this:

MTTRACE("ARG1",
      "ARG2",
     "ARG3");

//some code follows
MTTRACE("ARG1",
      "ARG2",
     "ARG3");

The new file should look like this:

MYTRACE(ARG);

//some code follows
MYTRACE(ARG);

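Solution:

perl -pe '$x = 3 if /^\s*MTTRACE/;
          if ($x > 0) { $x-- and undef $_; $_ = "MYTRACE(ARG);\n" if $x == 0 }' myfile.cc

When a line starts with MTTRACE (optionally indented), $x is set to 3. On that line and the next two, $x is decremented and the line is discarded (with -p, an undefined $_ prints nothing). On the third line, when $x reaches 0, the line is replaced with MYTRACE(ARG);. The result goes to standard output: redirect it to a new file, or use perl -i.bak -pe '...' myfile.cc to edit the file in place and keep a backup.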
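For systems without Perl, the same three-lines-to-one replacement can be sketched in awk (a swapped-in tool, not the Perl command above). The function name replace_trace is hypothetical, and like the Perl version it assumes every MTTRACE call spans exactly 3 lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replaces each line starting with MTTRACE (after
# optional spaces/tabs), together with the 2 lines that follow it, by a
# single MYTRACE(ARG); line. Reads stdin, writes stdout.
replace_trace() {
    awk '
        /^[ \t]*MTTRACE/ { skip = 3 }      # start of a 3-line block
        skip > 0 {
            skip--
            if (skip == 0) print "MYTRACE(ARG);"   # emit once, on the 3rd line
            next                           # drop the original line
        }
        { print }                          # all other lines pass through
    '
}
```

For example: replace_trace < myfile.cc > myfile.new. Blank lines and other code are copied unchanged, exactly as with the Perl one-liner.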

Look-ahead and look-behind in Perl

With the look-ahead and look-behind constructs, you can "roll your own" zero-width assertions to fit your needs. You can look forward or backward in the string being processed, and you can require that a pattern match succeed (positive assertion) or fail (negative assertion) there.
Every extended pattern is written as a parenthetical group with a question mark as the first character. The notation for the look-arounds is fairly mnemonic, but there are some other, experimental patterns that are similar, so it is important to get all the characters in the right order.

(?=pattern)

is a positive look-ahead assertion

(?!pattern)

is a negative look-ahead assertion

(?<=pattern)

is a positive look-behind assertion

(?<!pattern)

is a negative look-behind assertion

EXAMPLES
Look-Ahead:

echo $mytmp2
uvw_abc uvw_def uvw_acb

Positive:

echo $mytmp2 | perl -pe 's/uvw_(?=(abc|def))/xyz_/g'
xyz_abc xyz_def uvw_acb

Description: replace every occurrence of uvw_ with xyz_ where uvw_ is followed by abc or def

Negative:

echo $mytmp2 | perl -pe 's/uvw_(?!(abc|def))/xyz_/g'
uvw_abc uvw_def xyz_acb

Description: replace every occurrence of uvw_ with xyz_ where uvw_ is not followed by abc or def

Look-Behind:

echo $mytmp
abc_uvw def_uvw acb_uvw

Positive:

echo $mytmp | perl -pe 's/(?<=(abc|def))_uvw/_xyz/g'
abc_xyz def_xyz acb_uvw

Description: replace every occurrence of _uvw with _xyz where _uvw is preceded by abc or def

Negative:

echo $mytmp | perl -pe 's/(?<!(abc|def))_uvw/_xyz/g'
abc_uvw def_uvw acb_xyz

Description: replace every occurrence of _uvw with _xyz where _uvw is not preceded by abc or def

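Split a string by anything other than spaces

Have you ever tried this? Don't go writing a big Perl program for it. Here's a simple solution:

my @arr = split /\S+/, $str;

where $str is your string.
\s matches a whitespace character, while \S matches a non-whitespace character.
So \S+ matches a run of at least one non-whitespace character. Because split treats its pattern as the separator, @arr receives the whitespace between those runs (plus an empty first element when $str starts with a non-whitespace character). If you want the non-whitespace runs themselves, that is, the words, use split ' ', $str or my @arr = $str =~ /\S+/g; instead.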
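For comparison, if what you need is the non-whitespace runs themselves (the words), a minimal bash sketch without Perl can rely on read -a, which splits on the default IFS. The names str and words are illustrative.

```shell
#!/usr/bin/env bash
str="  split   this string  "   # illustrative sample value

# With the default IFS (space, tab, newline), read -a ignores leading and
# trailing whitespace and treats every run of whitespace as one separator.
read -r -a words <<< "$str"

echo "${#words[@]}"             # → 3
printf '%s\n' "${words[@]}"     # split, this, string on separate lines
```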

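Find and replace a string in C++

This comes in handy many times when you are working on a C++ application. The C++ standard library has no direct method for replacing every occurrence of a plain substring (std::regex_replace from C++11 works on regular expressions, not plain text), unless you use Boost, which provides boost::algorithm::replace_all. Below is a simple function that I use regularly in my applications. It returns the number of replacements made. All three arguments must have the same string type (for example std::string), because T is deduced from each of them.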

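template<class T>
inline int findAndReplace(T& source, const T& find, const T& replace)
{
    if (find.empty())                 // an empty 'find' would loop forever
        return 0;
    int num = 0;
    typename T::size_type fLen = find.size();
    typename T::size_type rLen = replace.size();
    for (typename T::size_type pos = 0;
         (pos = source.find(find, pos)) != T::npos;
         pos += rLen)                 // continue after the inserted text
    {
        ++num;
        source.replace(pos, fLen, replace);
    }
    return num;
}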

Inserting lines in a file using Perl

I have an input file that looks like this:

cellIdentity="42901"
cellIdentity="42902"
cellIdentity="42903"
cellIdentity="52904"

The numbers inside the quotes can be anything. The output should contain each original line followed by a copy of it in which the last digit of the number is replaced by the next value of the repeating series 5, 6, 7. So the output should look like this:

cellIdentity="42901"
cellIdentity="42905"
cellIdentity="42902"
cellIdentity="42906"
cellIdentity="42903"
cellIdentity="42907"
cellIdentity="52904"
cellIdentity="52905"

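Below is the Perl command that I have written (file is the input file):

perl -pe 'BEGIN { $n = 4 }
          $n++;
          $n = 5 if $n > 7;   # cycle the new last digit through 5, 6, 7
          print;              # print the original line first
          s/\d"$/$n"/         # -p then prints the modified copy
         ' file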
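The same transformation can be sketched in awk, where the built-in line counter NR takes the place of the Perl variable $n. The function name dup_with_series is hypothetical; like the Perl command, it assumes each line ends with a digit followed by the closing quote.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints every line, then a copy whose digit before the
# closing quote is replaced by 5, 6, 7, 5, 6, 7, ... (one value per line).
dup_with_series() {
    awk '{
        print                              # original line
        n = (NR - 1) % 3 + 5               # 5, 6, 7, 5, ...
        sub(/[0-9]"$/, n "\"")             # change the last digit of the copy
        print                              # modified copy
    }'
}
```

For example: dup_with_series < file > file.new.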

Comparing two files using awk - an assignment

This is an awk assignment given to a friend of mine. It's quite challenging. We have two files. File1 (list of companies):

Joe's Garage
Pip Co
Utility Muffin Research Kitchen

File2 (list of payments and dues of the companies in File1):

Pip Co                          $20.13   due
Pip Co                          $20.3   due
Utility Muffin Research Kitchen $2.56    due
Utility Muffin Research Kitchen 2.56    due
Joe's Garage                    $120.28  due
Joe's Garage                    $100.24 payment

Now the challenge is to create an output that states the total amount due from each company (dues minus payments). Additionally, we need to handle format errors in File2.

The format errors to be handled are:
The dollar symbol is not present in the amount.

There are not exactly 2 digits after the decimal point.

If either of these format errors is encountered, the complete line should be ignored and processing should continue with the next line.
The expected output here is:

Joe's Garage $20.04
Utility Muffin Research Kitchen $2.56
Pip Co $20.13

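Below is the awk script that I have written for this, and it works on my side.

# mycode.awk: pass 1 reads File1, pass 2 reads File2.
NF == 0 { next }                      # skip empty lines in either file

# Pass 1 (File1): remember each company. Joining the fields with ","
# makes the key independent of the spacing between the words.
FNR == NR {
    str = ""
    for (i = 1; i <= NF; i++)
        str = str "," $i
    a[str] = 0
    next
}

# Pass 2 (File2): the amount is the second-to-last field, the type
# (due/payment) is the last field, the company name is everything before.
{
    if ($(NF-1) !~ /^\$/) {
        print "Format Error!-No dollar sign: line " FNR, FILENAME, $(NF-1)
        next
    }
    if ($(NF-1) !~ /\.[0-9][0-9]$/) {
        print "Format Error!-should have 2 digits after a decimal point: line " FNR, FILENAME, $(NF-1)
        next
    }
    str = ""
    for (i = 1; i < NF - 1; i++)
        str = str "," $i
    if (str in a) {                   # "in" does not create a new entry
        gsub(/\$/, "", $(NF-1))
        if ($NF ~ /payment/)
            a[str] -= $(NF-1)
        else if ($NF ~ /due/)
            a[str] += $(NF-1)
    }
}

END {
    for (i in a) {
        t = substr(i, 2)              # drop the leading ","
        gsub(/,/, " ", t)
        printf "%s $%.2f\n", t, a[i]
    }
}

Testing membership with (str in a) matters: in awk, merely referencing a[str] creates that element, so a company that is missing from File1 would otherwise show up in the totals.

I am sure this can be optimized; I kept it long so that it is easier to follow. I am using nawk on Solaris; others can use awk itself. Copy the above code into a file named mycode.awk and then run it as below:

nawk -f mycode.awk File1 File2

With the two sample files above, the output is:

> nawk -f mycode.awk File1 File2
Format Error!-should have 2 digits after a decimal point: line 2 File2 $20.3
Format Error!-No dollar sign: line 4 File2 2.56
Joe's Garage $20.04
Utility Muffin Research Kitchen $2.56
Pip Co $20.13
>

awk does not define the order of a for (i in a) loop, so the totals may be listed in a different order.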

Joining lines using awk

Let's say I have an input file that looks like this:

Apr 24 2014;
is;
a;
sample;
;
Jun 24 2014 123
may 25 2014;
is;
b;
sample;
;
Dec 21 2014 987

I want to merge every 6 lines into one, which means my output should look like:

Apr 24 2014;is;a;sample;;Jun 24 2014 123
may 25 2014;is;b;sample;;Dec 21 2014 987

Below is a simple command that I would use:

awk '{ORS=(NR%6?"":RS)}1' file

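Explanation:
By setting

ORS=(NR%6?"":RS)

for every line, the output record separator becomes the real record separator (a newline) only when the line number is a multiple of 6; for all other lines it is an empty string, so they are glued to the next line. The trailing 1 is an always-true pattern whose default action prints the current line with that separator.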
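A slightly more general sketch of the same idea, with the group size passed as a parameter instead of hard-coded as 6. The function name join_lines is hypothetical. Unlike the one-liner above, it also ends the last output line with a newline when the number of input lines is not a multiple of the group size.

```shell
#!/usr/bin/env bash
# Hypothetical helper: joins every $1 input lines into one output line.
join_lines() {
    awk -v n="$1" '
        { printf "%s", $0 }                 # print the line without a newline
        NR % n == 0 { print "" }            # end the group after every n lines
        END { if (NR % n) print "" }        # finish an incomplete last group
    '
}
```

For example: join_lines 6 < file.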

Iterating through a string character by character

In general, if we need to iterate through a string character by character, we normally split the string into an array with a statement like:

my @chars = split //, $var;

and then iterate through that array. But an easier way of doing this in Perl, without creating an array, is:

while ($var =~ /(.)/sg) {
   my $char = $1;
   print $char."\n"
}

Below is the explanation:

$var =~ /(.)/sg

matches any character in the string, and the parentheses "()" capture the matched character into $1.
/s
Treat the string as a single line. That is, "." matches any character whatsoever, even a newline, which it normally would not match.
/g
Match all occurrences of the regexp throughout the string instead of only the first one. In scalar context, as in the while condition, each evaluation resumes where the previous match ended, so the loop body runs once per character and the loop stops when no characters are left.
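For comparison, a minimal sketch of the same character-by-character loop in plain bash, using the substring expansion ${var:i:1} instead of a regular expression. The function name print_chars is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints each character of its argument on its own line.
print_chars() {
    local var=$1 i
    for (( i = 0; i < ${#var}; i++ )); do
        printf '%s\n' "${var:i:1}"   # substring of length 1 at offset i
    done
}

print_chars "a b"   # prints a, a space, and b on three separate lines
```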
