Bash string handling examples

Recently I had some difficulty doing string handling in shell scripts, especially in bash.
Whenever I google for string handling in shell scripts, I am bombarded with solutions that use tools like awk, perl, sed or python, but very rarely plain bash. So I thought it would be good to have the useful string operations in one place. Below are the use cases I would like to show:

  1. Length of a string.
  2. Substring from a particular position till the end.
  3. Substring from one position to another inside the same string.
  4. Delete the shortest matching substring (if it exists) from the front of a string.
  5. Delete the shortest matching substring (if it exists) from the back of a string.
  6. Delete the longest matching substring (if it exists) from the front of a string.
  7. Delete the longest matching substring (if it exists) from the back of a string.
  8. Find and replace a single match.
  9. Find and replace all matches.
Note: In the original post, the more frequently used (and more confusing) ones were highlighted in red.
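
As a minimal sketch, the nine operations listed above map onto bash parameter expansion roughly as follows. The sample value of s and all variable names are made up for illustration; the expansions themselves are standard bash.

```shell
#!/bin/bash
# Hypothetical sample value, used only for this sketch.
s="/usr/local/lib/libfoo.so.1.2"

len=${#s}               # 1. length of the string
rest=${s:15}            # 2. substring from offset 15 till the end
mid=${s:5:5}            # 3. substring: 5 characters starting at offset 5
                        #    (bash takes offset and length; from position a
                        #    to position b is ${s:a:b-a})
front_short=${s#*/}     # 4. delete shortest match of "*/" from the front
back_short=${s%.*}      # 5. delete shortest match of ".*" from the back
front_long=${s##*/}     # 6. delete longest match of "*/" from the front
back_long=${s%%.*}      # 7. delete longest match of ".*" from the back
first=${s/lib/LIB}      # 8. replace the first match of "lib"
all=${s//lib/LIB}       # 9. replace every match of "lib"

printf '%s\n' "$len" "$rest" "$mid" "$front_short" "$back_short" \
              "$front_long" "$back_long" "$first" "$all"
```

The patterns after #, ##, % and %% are shell globs, not regular expressions, which is a common source of confusion. Offsets are zero-based.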

Length of a string




Debugging a PL/SQL stored procedure

Normally, while working on PL/SQL stored procedures, we use DBMS_TRACE or add some print statements to see the values of various variables.
This works fine as long as you run your stored procedures from the command line. But let's assume your stored procedure is called from a different process, say a C++ or a Java process. In that case you have no way to send your log statements to a log file unless you have admin permissions, and as a designer you might not have those rights.

The approach below gives you a way to put traces in your PL/SQL procedure and see the values.
The idea is simple: insert the trace statements into a table.
Below is a shell script that creates (and later removes) the table and the procedure needed for that.

#!/bin/ksh
# Creates or removes the DEBUG_MESSAGES table and the DEBUG_OUTPUT procedure.
# Usage: pass "create" or "clean" as the first argument.

CONNECT_STRING=${DATABASE}

if [ "$1" = "create" ];then
CRE_RESULT=`sqlplus -s "${CONNECT_STRING}" << EOF
create table DEBUG_MESSAGES(
 current_date DATE,
 message varchar2(500)
);

CREATE OR REPLACE PROCEDURE DEBUG_OUTPUT( p_text IN VARCHAR2 ) AS
BEGIN
      insert into DEBUG_MESSAGES (current_date,message)
      values (sysdate, p_text);
      commit;
END;
/

exit;
EOF`

echo "$CRE_RESULT"
fi

if [ "$1" = "clean" ];then
CLN_RESULT=`sqlplus -s "${CONNECT_STRING}" << EOF
drop table DEBUG_MESSAGES;
drop procedure DEBUG_OUTPUT;
exit;
EOF`

echo "$CLN_RESULT"

fi
Now you can add your trace statements inside your procedure like below:
 DEBUG_OUTPUT('Hello World!');
 DEBUG_OUTPUT('Hello World!'||vMyVariable);
After you execute your procedure, your trace statements will be available in the table DEBUG_MESSAGES, where you can check them. Note that DEBUG_OUTPUT commits, so it also commits any pending changes made by the calling procedure.
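
One way to read the traces back, as a rough sketch: the helper below only prints the SQL, so it can be piped into sqlplus with the same connect string. The function name debug_messages_sql and the timestamp format are assumptions for illustration, not part of the original script.

```shell
# Prints the SQL that lists the trace rows, oldest first.
# The table alias d makes sure current_date refers to the table column.
debug_messages_sql() {
cat <<'EOF'
set pagesize 0
set linesize 600
select to_char(d.current_date, 'YYYY-MM-DD HH24:MI:SS') || ' ' || d.message
  from DEBUG_MESSAGES d
 order by d.current_date;
exit;
EOF
}

# Usage (needs sqlplus and the DATABASE connect string used above):
#   debug_messages_sql | sqlplus -s "${DATABASE}"
```

Since the DATE column only has one-second resolution, rows logged within the same second may come back in any order.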

Search a string and replace consecutive lines in Perl

Let's say I have a C/C++ file in which some lines start with a known pattern. I also know that each line matching my pattern is followed by 2 more lines that belong to it. I want to remove these three lines and put a single new line, containing a different string, in their place.

MTTRACE("ARG1",
      "ARG2",
     "ARG3");

//some code follows
MTTRACE("ARG1",
      "ARG2",
     "ARG3");
The new file should look as below:

MYTRACE(ARG);
//some code follows
MYTRACE(ARG);
Solution:

perl -pe '$x=3  if /^\s*MTTRACE.*/;
if($x>0){$x-- and undef $_;$_="MYTRACE(ARG);\n" if $x==0}' myfile.cc

Look-ahead and look-behind in Perl

With the look-ahead and look-behind constructs, you can "roll your own" zero-width assertions to fit your needs. You can look forward or backward in the string being processed, and you can require that a pattern match succeed (positive assertion) or fail (negative assertion) there.
Every extended pattern is written as a parenthetical group with a question mark as the first character. The notation for the look-arounds is fairly mnemonic, but there are some other, experimental patterns that are similar, so it is important to get all the characters in the right order.
(?=pattern)    is a positive look-ahead assertion
(?!pattern)    is a negative look-ahead assertion
(?<=pattern)   is a positive look-behind assertion
(?<!pattern)   is a negative look-behind assertion
EXAMPLES
Look-Ahead:
echo $mytmp2
uvw_abc uvw_def uvw_acb
Positive:
echo $mytmp2 | perl -pe 's/uvw_(?=(abc|def))/xyz_/g'
xyz_abc xyz_def uvw_acb
Description: replace every occurrence of uvw_ with xyz_ where uvw_ is followed by abc or def
Negative:
echo $mytmp2 | perl -pe 's/uvw_(?!(abc|def))/xyz_/g'
uvw_abc uvw_def xyz_acb
Description: replace every occurrence of uvw_ with xyz_ where uvw_ is not followed by abc or def
Look-Behind:
echo $mytmp
abc_uvw def_uvw acb_uvw
Positive:
echo $mytmp | perl -pe 's/(?<=(abc|def))_uvw/_xyz/g'
abc_xyz def_xyz acb_uvw
Description: replace every occurrence of _uvw with _xyz where _uvw is preceded by abc or def
Negative:
echo $mytmp | perl -pe 's/(?<!(abc|def))_uvw/_xyz/g'
abc_uvw def_uvw acb_xyz
Description: replace every occurrence of _uvw with _xyz where _uvw is not preceded by abc or def

Split a string by anything other than spaces

Have you ever tried this? Don't go writing a big piece of Perl code for it. Here's a simple solution.
my @arr=split /\S+/,$str;
where $str is your string.
\s matches a whitespace character, while \S matches a non-whitespace character.
So \S+ matches one or more non-whitespace characters, and split uses each such run as a separator. What is left in @arr are the whitespace runs between them.

Find and replace a string in C++

This comes in handy many times when you are working on a C++ application. There is no direct method in the standard library to do this unless you use a library such as Boost. Below is a simple function that I use regularly in my applications:
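#include <string>

// Replaces every occurrence of `find` in `source` with `replace`
// and returns the number of replacements made.
template<class T>
inline int findAndReplace(T& source, const T& find, const T& replace)
{
    if (find.empty())
        return 0;   // an empty pattern matches everywhere and would never finish
    int num = 0;
    const typename T::size_type fLen = find.size();
    const typename T::size_type rLen = replace.size();
    // pos must be size_type so that the comparison with npos is exact
    for (typename T::size_type pos = 0;
         (pos = source.find(find, pos)) != T::npos;
         pos += rLen)
    {
        num++;
        source.replace(pos, fLen, replace);
    }
    return num;
}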

Inserting lines in a file using Perl

I have an input file that looks like:
cellIdentity="42901"
cellIdentity="42902"
cellIdentity="42903"
cellIdentity="52904"
The numbers inside the quotes can be anything. The required output is each original line followed by a copy of it in which the last digit of the number is replaced by the next value from the repeating series 5, 6, 7. So the output should look like below:
cellIdentity="42901"
cellIdentity="42905"
cellIdentity="42902"
cellIdentity="42906"
cellIdentity="42903"
cellIdentity="42907"
cellIdentity="52904"
cellIdentity="52905"
Below is the Perl command that I have written (pass the input file as the last argument):
perl -pe 'BEGIN{$n=4}$n++;
          $n>7?$n=5:$n;
          $a=$_;print $a;
          s/(\d).$/$n."\""/ge'

Comparing two files using awk - An assignment

This is an awk assignment given to one of my friends. It's quite challenging. We have two files.
File1 (list of companies):
Joe's Garage
Pip Co
Utility Muffin Research Kitchen
File2 (list of payments and dues of the companies in File1):
Pip Co                          $20.13   due
Pip Co                          $20.3   due
Utility Muffin Research Kitchen $2.56    due
Utility Muffin Research Kitchen 2.56    due
Joe's Garage                    $120.28  due
Joe's Garage                    $100.24 payment
Now the challenge: we need to create an output file that states the total amount due for each company. There is one more requirement: we need to handle format errors in File2.
The format errors to be handled are:
  1. The dollar symbol is not present in the amount.
  2. The amount does not have exactly 2 digits after the decimal point.
If any of the above format errors is encountered, the complete line should be ignored and processing should continue with the next line.
The expected output here is:
Joe's Garage $20.04
Utility Muffin Research Kitchen $2.56
Pip Co $20.13
Below is the awk script that I have written for this, and it works on my side.
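{
    # First file (File1): remember each company name, fields joined by commas.
    if (FNR == NR)
    {
        for (i = 1; i <= NF; i++)
            str = str "," $i;
        a[str] = 1;     # 1 marks a known company; it is subtracted again in END
        str = "";
        next;
    }

    # Second file (File2): the amount is the second-to-last field.
    if ($(NF-1) !~ /^\$/)
    {
        print "Format Error!-No dollor sign"FNR,FILENAME,$(NF-1);
        next;
    }
    if ($(NF-1) !~ /\.[0-9][0-9]$/)
    {
        print "Format Error!-should have 2 digits after a decimal point"FNR,FILENAME,$(NF-1);
        next;
    }
    for (i = 1; i < (NF-1); i++)
        str = str "," $i;
    if (str in a)       # only companies listed in File1 are counted
    {
        gsub(/\$/, "", $(NF-1));
        if ($NF ~ /payment/)
            a[str] -= $(NF-1);
        else if ($NF ~ /due/)
            a[str] += $(NF-1);
    }
    str = "";
}
END {
    for (i in a)
    {
        t = i;
        gsub(/,/, " ", t);
        print t, "$"(a[i]-1);
    }
}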
I am sure that this can be optimized. I kept it long so that it is easier to follow. Below is how to execute it. I am using nawk on Solaris; others can use awk itself. Copy the above code into a file, name it mycode.awk, and then run the awk command as below:
nawk -f mycode.awk File1 File2
The output that I got is below (on my machine the script was saved as temp.awk, and temp2 and temp1 held the File1 and File2 data):
> nawk -f temp.awk temp2 temp1
Format Error!-should have 2 digits after a decimal point2 temp1 $20.3
Format Error!-No dollor sign4 temp1 2.56
 Joe's Garage $20.04
 Utility Muffin Research Kitchen $2.56
 Pip Co $20.13
>
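
Since the script can be shortened, here is one possible compact variant as a sketch. It is not a drop-in copy: it checks the whole amount with a single regular expression, skips bad lines silently instead of printing an error, assumes the names in File1 have no trailing whitespace, and sorts the output because awk's for-in order is unspecified. The temporary directory and the array names are made up for the demo.

```shell
workdir=$(mktemp -d)
cat > "$workdir/File1" <<'EOF'
Joe's Garage
Pip Co
Utility Muffin Research Kitchen
EOF
cat > "$workdir/File2" <<'EOF'
Pip Co                          $20.13   due
Pip Co                          $20.3   due
Utility Muffin Research Kitchen $2.56    due
Utility Muffin Research Kitchen 2.56    due
Joe's Garage                    $120.28  due
Joe's Garage                    $100.24 payment
EOF

totals=$(awk '
NR == FNR { known[$0] = 1; next }       # File1: one company name per line
{
    amount = $(NF-1)
    # dollar sign, digits, and exactly two decimals; anything else is skipped
    if (amount !~ /^\$[0-9]+\.[0-9][0-9]$/) next
    name = $0
    sub(/[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]*$/, "", name)   # drop the last two fields
    if (!(name in known)) next
    sub(/^\$/, "", amount)
    if ($NF == "payment")  total[name] -= amount
    else if ($NF == "due") total[name] += amount
}
END { for (name in total) printf "%s $%.2f\n", name, total[name] }
' "$workdir/File1" "$workdir/File2" | sort)
rm -rf "$workdir"

printf '%s\n' "$totals"
```

Using the whole File1 line as the array key avoids rebuilding the name field by field, and keeping the totals in a separate array removes the need for the "start at 1, subtract 1 at the end" marker.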

Joining lines using Awk

Let's say I have an input file which looks like below:
Apr 24 2014;
is;
a;
sample;
;
Jun 24 2014 123
may 25 2014;
is;
b;
sample;
;
Dec 21 2014 987
I want to merge 6 lines at a time, which means my output should look like:
Apr 24 2014;is;a;sample;;Jun 24 2014 123
may 25 2014;is;b;sample;;Dec 21 2014 987
Below is a simple command that I would use:
awk '{ORS=(NR%6?"":RS)}1' file
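Explanation:
By doing
ORS=(NR%6?"":RS)
I set the output record separator to the actual record separator (a newline) only when the line number is a multiple of 6. For every other line it is the empty string, so that line is joined to the next one. The trailing 1 is an always-true pattern that prints each line.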
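
The same one-liner can be wrapped in a small shell function so that the group size becomes a parameter. This is a sketch; the function name join_lines and the awk variable n are made up here.

```shell
# join_lines N: read lines from stdin and merge every N of them into one.
join_lines() {
    awk -v n="$1" '{ ORS = (NR % n ? "" : RS) } 1'
}

merged=$(printf '%s\n' 'Apr 24 2014;' 'is;' 'a;' 'sample;' ';' 'Jun 24 2014 123' |
         join_lines 6)
echo "$merged"
# prints: Apr 24 2014;is;a;sample;;Jun 24 2014 123
```

If the number of input lines is not a multiple of N, the last output line is written without a trailing newline, because ORS is still the empty string at that point.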

Iterating a string through each character

In general, if we need to iterate through a string character by character, we normally split the string using a statement like:
@chars=split("",$var);
After the array is created we iterate through it. But an easy way of doing this in Perl without creating an array is:
while ($var =~ /(.)/sg) {
   my $char = $1;
   print $char."\n"
}
Below is the explanation for the same:
$var =~ /(.)/sg
Match any character throughout the string; the round brackets "()" capture the matched character.
/s
Treat the string as a single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.
/g
Match all occurrences of the regexp throughout the string instead of only the first occurrence.