Showing posts with label perl.

Look ahead and Look behind in perl

5.6.14
With the look-ahead and look-behind constructs, you can "roll your own" zero-width assertions to fit your needs. You can look forward or backward in the string being processed, and you can require that a pattern match succeed (positive assertion) or fail (negative assertion) there.
Every extended pattern is written as a parenthetical group with a question mark as the first character. The notation for the look-arounds is fairly mnemonic, but there are some other, experimental patterns that are similar, so it is important to get all the characters in the right order.
(?=pattern)
is a positive look-ahead assertion
(?!pattern)
is a negative look-ahead assertion
(?<=pattern)
is a positive look-behind assertion
(?<!pattern)
is a negative look-behind assertion
EXAMPLES
Look-Ahead:
echo $mytmp2
uvw_abc uvw_def uvw_acb
Positive:
echo $mytmp2 | perl -pe 's/uvw_(?=(abc|def))/xyz_/g'
xyz_abc xyz_def uvw_acb
Description: replace every occurrence of uvw_ with xyz_ where uvw_ is followed by abc or def
Negative:
echo $mytmp2 | perl -pe 's/uvw_(?!(abc|def))/xyz_/g'
uvw_abc uvw_def xyz_acb
Description: replace every occurrence of uvw_ with xyz_ where uvw_ is not followed by abc or def
Look-Behind:
echo $mytmp
abc_uvw def_uvw acb_uvw
Positive:
echo $mytmp | perl -pe 's/(?<=(abc|def))_uvw/_xyz/g'
abc_xyz def_xyz acb_uvw
Description: replace every occurrence of _uvw with _xyz where _uvw is preceded by abc or def
Negative:
echo $mytmp | perl -pe 's/(?<!(abc|def))_uvw/_xyz/g'
abc_uvw def_uvw acb_xyz
Description: replace every occurrence of _uvw with _xyz where _uvw is not preceded by abc or def
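The same four substitutions can be checked in a short standalone script. This sketch copies the example strings into variables instead of reading $mytmp and $mytmp2 from the shell. It also drops the inner capturing group, because the assertion's own parentheses already delimit the alternation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inputs copied from the echo examples above.
my $ahead  = 'uvw_abc uvw_def uvw_acb';
my $behind = 'abc_uvw def_uvw acb_uvw';

# Look-ahead: the (?=...) / (?!...) part is tested but not consumed,
# so only "uvw_" is replaced and the following text is left alone.
(my $pos_ahead = $ahead) =~ s/uvw_(?=abc|def)/xyz_/g;
(my $neg_ahead = $ahead) =~ s/uvw_(?!abc|def)/xyz_/g;

# Look-behind: (?<=...) / (?<!...) tests the text *before* "_uvw".
# Both alternatives are 3 characters long, so the look-behind is fixed-width.
(my $pos_behind = $behind) =~ s/(?<=abc|def)_uvw/_xyz/g;
(my $neg_behind = $behind) =~ s/(?<!abc|def)_uvw/_xyz/g;

print "$pos_ahead\n$neg_ahead\n$pos_behind\n$neg_behind\n";
```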

Split a string by anything other than spaces

19.5.14
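Have you ever tried this? Don't write a big chunk of Perl code for it. Here's a simple solution:
my @arr=split /\S+/,$str;
where
$str is your string.
\s matches a whitespace character, while \S matches a non-whitespace character.
So \S+ matches one or more consecutive non-whitespace characters. Because \S+ is the delimiter, @arr ends up holding the whitespace between those runs, with an empty first element when $str starts with a non-space character.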
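A quick way to see what split /\S+/ returns is to run it on a small string. This sketch uses an assumed sample value for $str. For comparison, it also shows the awk-style split ' ', which splits on whitespace and keeps the non-space pieces instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "ab  cd e";    # hypothetical sample string

# Split ON runs of non-whitespace: the fields are the whitespace runs between
# them, plus an empty leading field because $str starts with a non-space char.
my @arr = split /\S+/, $str;

# For comparison: split ' ' splits on whitespace and returns the words.
my @words = split ' ', $str;

print scalar(@arr), " fields, words: @words\n";
```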

Inserting lines in a file using Perl

14.5.14
I have an input file that looks like this:
cellIdentity="42901"
cellIdentity="42902"
cellIdentity="42903"
cellIdentity="52904"
The numbers inside the quotes can be anything. The output should be each original line followed by a copy of it in which the last digit of the number is replaced by the next value of the repeating series 5, 6, 7. So the output should look like this:
cellIdentity="42901"
cellIdentity="42905"
cellIdentity="42902"
cellIdentity="42906"
cellIdentity="42903"
cellIdentity="42907"
cellIdentity="52904"
cellIdentity="52905"
Below is the Perl command that I have written.
perl -pe 'BEGIN{$n=4}
          $n = $n >= 7 ? 5 : $n + 1;
          print;
          s/\d"$/$n"/' your_file
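The same logic can be written as a small script, which makes the 5, 6, 7 cycle easier to follow. This is a sketch that works on an in-memory copy of the sample lines rather than reading a file. Each line is kept, and then a copy is added with the digit before the closing quote replaced.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @input = map { qq{cellIdentity="$_"} } qw(42901 42902 42903 52904);

my @output;
my $n = 4;
for my $line (@input) {
    $n = $n >= 7 ? 5 : $n + 1;          # cycle through 5, 6, 7, 5, ...
    push @output, $line;                # the original line first
    (my $copy = $line) =~ s/\d"$/$n"/;  # then a copy with the last digit replaced
    push @output, $copy;
}
print "$_\n" for @output;
```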

Iterating a string through each character

29.4.14
In general, if we need to iterate through a string character by character, we split the string using a statement like:
@chars=split("",$var);
and then iterate through the resulting array. But an easy way of doing this in Perl without creating an array is:
while ($var =~ /(.)/sg) {
   my $char = $1;
   print $char."\n"
}
Below is the explanation:
$var =~ /(.)/sg
Matches any single character in the string; the parentheses "()" capture the matched character in $1.
/s 
Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.
/g
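Match all occurrences of the regexp instead of only the first. In scalar context, as in the while condition, each match resumes where the previous one ended (the position is kept in pos($var)). The loop therefore visits every character in turn and stops when none are left.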
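Here is a small check that both approaches yield the same characters, and of what /s changes. It uses a hypothetical sample string with an embedded newline. Without /s, "." skips the newline, so one character goes missing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $var = "ab\nc";    # hypothetical sample with an embedded newline

# Approach 1: split into an array of characters first.
my @via_split = split //, $var;

# Approach 2: walk the string with a /g match in scalar context.
my @via_match;
while ($var =~ /(.)/sg) {
    push @via_match, $1;
}

# Without /s, "." does not match "\n", so only 3 characters are found.
my $without_s = () = $var =~ /(.)/g;

print scalar(@via_match), " characters with /s, $without_s without\n";
```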

Butterfly in Perl command line "}{"

26.4.14
I recently learned about this and thought it was worth sharing. I will try to keep it very simple.
Let's say I have a file like the one below:
1
2
3
4
5
I need to join all the lines with a pipe so that my output should look like below:
1|2|3|4|5
Normally I use the below command to achieve the same:
perl -lne 'push @a,$_;END{print join "|",@a}' File
Now, here's another option for you below:
perl -lne 'push @a,$_;}{ print join "|",@a' File
The change here is:
}{
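This is often called the "butterfly" trick in Perl. It isn't a real operator. -n wraps your code in a LINE: while (<>) { ... } loop, so the } closes that while loop early and the { opens a bare block. The code after }{ therefore runs once, after the loop has read all the input, much like an END block.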
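To see why }{ works, here is roughly the code that perl -lne builds around the snippet, written out as a standalone script. This is a sketch. The in-memory input is a hypothetical stand-in for File, and an explicit chomp plays the role of -l.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = "1\n2\n3\n4\n5\n";          # hypothetical contents of "File"
open my $fh, '<', \$data or die $!;

my @a;
my $joined;
# -n supplies this while loop; the user's "}" closes it early and the
# user's "{" opens a bare block that runs once, after all lines are read.
while (<$fh>) {
    chomp;                             # what -l does to each input line
    push @a, $_;
}{
    $joined = join "|", @a;
    print "$joined\n";
}
```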

Searching multiple strings in multiple files in a directory

11.3.14
I have a list of strings in a file, one per line.
For example:
input.txt
temp1
temp2
temp3
Now I have a directory with multiple dat files like:
>ls -1 *.dat
one.dat
two.dat
three.dat
There are many more .dat files like the ones above, with random names. I want to search for every string in input.txt in all the .dat files present in a directory (let's say the current working directory). Note that find also descends into subdirectories. This is what I came up with:
Create the Perl script given below and name it anything you wish (I named it temp.pl). Place the file input.txt in the current working directory.
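#!/usr/bin/perl
use strict;
use warnings;

# input.txt holds one search string per line
open(my $inp, '<', 'input.txt') or die "Can't open input.txt: $!";
while (my $str = <$inp>)
{
    chomp $str;                          # drop the newline before building the command
    next unless length $str;             # skip empty lines
    (my $quoted = $str) =~ s/'/'\\''/g;  # escape ' so the string is safe inside '...'
    my $cmd    = "find . -name '*.dat' | xargs grep -w -i -- '$quoted'";
    my $output = `$cmd`;
    if ($output !~ /^\s*$/)
    {
        print "$str\n";
        print "------------------\n";
        print $output;
        print "-------------------\n";
    }
}
close($inp);
exit;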
Make the script executable and run it as:
>./temp.pl
This solved my problem. I hope it solves yours too :)
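If you'd rather not shell out to find and grep, the search can also be done in plain Perl. This is an alternative sketch, not the script above. It only looks at *.dat files in the current directory (no recursion), it approximates grep -w with \b word boundaries, and it treats each search string literally via \Q...\E. The helper name search_files is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return { search_string => [ "file:line_no:text", ... ] } for every string
# found as a whole word (case-insensitively) in one of the given files.
sub search_files {
    my ($strings, $files) = @_;
    my %hits;
    for my $file (@$files) {
        open(my $fh, '<', $file) or die "Can't open $file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            for my $str (@$strings) {
                # \b approximates grep -w; \Q...\E keeps metacharacters literal
                push @{ $hits{$str} }, sprintf('%s:%d:%s', $file, $., $line)
                    if $line =~ /\b\Q$str\E\b/i;
            }
        }
        close($fh);
    }
    return \%hits;
}

if (-e 'input.txt') {
    open(my $inp, '<', 'input.txt') or die "Can't open input.txt: $!";
    chomp(my @strings = grep { /\S/ } <$inp>);
    close($inp);
    my $hits = search_files(\@strings, [ glob('*.dat') ]);
    for my $str (@strings) {
        next unless $hits->{$str};
        print "$str\n", "-" x 18, "\n";
        print "$_\n" for @{ $hits->{$str} };
        print "-" x 18, "\n";
    }
}
```

Because each string is matched on its own, one line can appear under several search strings, the same way the grep version behaves.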

Finding Memory Leaks in C++ on Solaris Unix

29.11.13
There are lots of ways to identify memory leaks, and plenty of blogs and other material
are available to help a developer do so.

Some say use dbx, others say use mdb or gdb. I agree that all those tools are useful, but it takes a lot of effort to understand them and some time to put them to work.

I decided to break with the usual approach and try something new with what
I know about various Solaris tools, so I tried combining tools like
dtrace, perl and the ps command with C++.

Putting this tool to use takes very little effort.

I am happy to have created a tool that I think will be very useful for any Solaris Unix programmer.
It lists all the memory leaks of a process running in the background.

It uses dtrace to identify the memory allocations.
It uses the ps command to identify the memory used by a process.
It uses perl to filter the data and generate reports of memory leaks.

Main functionalities of this tool are:

  • Generate a report which contains the actual physical memory used by the process, the virtual memory used, the percentage of CPU used and the percentage of memory used.
  • Generate a report which contains all the memory leaks of the process due to inefficient coding techniques.
  • Generate a separate report for each and every process.

I have tested this on:
SunOS 5.10 Generic_147440-27 sun4u sparc SUNW,SPARC-Enterprise

Thanks to Frederic, I have taken part of his logic written in his blog:

You can download the gzip-compressed file from the link here:

The file name is cpuleaks.gz.
Decompress it with gzip -d cpuleaks.gz, place it on any server, give it execute permission and run it as the root user. On my server dtrace can only be run with root access; if a user other than root can use dtrace on yours, that's fine too.

Run the tool as follows:

cpuleaks <comma separated pid list> -st <time interval for mem usage>

ex: cpuleaks 1234,2341 -st 2

The command above produces 4 output files once processes 1234 and 2341 complete, or when you press CTRL+C (an interrupt signal):
1234.cpumem
1234.leaks
2341.cpumem
2341.leaks

 
where
  • .leaks report will have the list of all memory leaks
  • .cpumem report will have the memory usage of the process for every interval passed as an argument to the cpuleaks binary.


Search Multiple strings and print them in the order of search strings

5.9.13
One of my colleagues asked me this question, and I liked it very much,
so I started thinking about it.
For example, I have the file below:
1 name1
2 name2
3 name3
4 name1
5 name4
6 name2
7 name1
Now I want to search the file for several strings, say in the order name1, name2, name3, name4,
and I want the output in the same order as the search strings, as below:
1 name1
4 name1
7 name1
2 name2
6 name2
3 name3
5 name4
This can't be done with a single grep command, because grep prints matching lines in the order they appear in the file.
So I thought Perl would be a better option and came up with a simple Perl solution:
perl -lne '/name1/?push @a,$_:
          (/name2/?push @b,$_:
          (/name3/?push @c,$_:
           /name4/?push @d,$_:next));
          END{print join "\n",@a,@b,@c,@d}' your_file

And this worked like a charm.
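The nested ternary grows by one level for every extra name. A sketch that generalises it keeps the search terms in a list and collects lines into one bucket per term. As in the one-liner, the first matching term wins. The sample lines are copied from the example above, and the \b word boundaries are an addition so that name1 does not also match name10.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @terms = qw(name1 name2 name3 name4);      # the search order
my @lines = ('1 name1', '2 name2', '3 name3', '4 name1',
             '5 name4', '6 name2', '7 name1');

my %bucket;                                   # term => matching lines, in file order
LINE: for my $line (@lines) {
    for my $term (@terms) {
        if ($line =~ /\b\Q$term\E\b/) {
            push @{ $bucket{$term} }, $line;
            next LINE;                        # first matching term wins
        }
    }
}

my @result = map { @{ $bucket{$_} || [] } } @terms;
print "$_\n" for @result;
```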

Some Useful Perl One-Liners

2.9.13
We'll start with a brief refresher on the basics of perl one-liners. The core of any perl one-liner is the -e switch, which lets you pass a snippet of code on the command line: perl -e 'print "hi\n"' prints "hi" to the console.
The second standard trick for perl one-liners is the pair of flags -n and -p. Both make perl put an implicit loop around your program, running it once for each line of input, with the line in the $_ variable. -p also adds an implicit print at the end of each iteration.
Both of these use perl's special "ARGV" magic file handle internally. What this means is that if there are any files listed on the command-line after your -e, perl will loop over the contents of the files, one at a time. If there aren't any, it will fall back to looping over standard input.
perl -ne 'print if /foo/'
acts a lot like grep foo, and
perl -pe 's/foo/bar/'
acts like sed 's/foo/bar/': it replaces the first foo on each line with bar and prints every line.
Most of the rest of these tricks assume you're using either -n or -p, so I won't mention it every time.

The top 10 one-liner tricks

One Liner-1:   -l
Smart newline processing. Normally, perl hands you entire lines, including a trailing newline. With -l, it will strip the trailing newline off of any lines read, and automatically add a newline to anything you print (including via -p).
Suppose I wanted to strip trailing whitespace from a file. I might naïvely try something like
perl -pe 's/\s*$//'
The problem, however, is that the line ends with "\n", which is whitespace, and so that snippet will also remove all newlines from my file! -l solves the problem, by pulling off the newline before handing my script the line, and then tacking a new one on afterwards:
perl -lpe 's/\s*$//'
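The difference between the two commands can be reproduced on a single line. This is a sketch in which the sample line is hypothetical and an explicit chomp plus an appended "\n" stand in for what -l does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "some text   \n";    # a line as -p would hand it over

# Without -l: \s* also swallows the newline, so the line loses its terminator.
(my $without_l = $line) =~ s/\s*$//;

# With -l: the newline is chomped first, the trailing blanks are stripped,
# and print adds exactly one newline back ($\ is set to "\n").
my $with_l = $line;
chomp $with_l;
$with_l =~ s/\s*$//;
$with_l .= "\n";                # stands in for print appending $\

print $with_l;
```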

One Liner-2: -0
Occasionally, it's useful to run a script over an entire file, or over larger chunks at once. -0 makes -n and -p feed you chunks split on NULL bytes instead of newlines. This is often useful for, e.g. processing the output of find -print0. Furthermore, perl -0777 makes perl not do any splitting, and pass entire files to your script in $_.
find . -name '*~' -print0 | perl -0lne unlink
could be used to delete all ~-files in a directory tree, without having to remember how xargs works. Adding -l after -0 strips the trailing NUL from each file name before unlink sees it.
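What -0 and -l do to the records can be simulated with local $/ and chomp. This sketch reads hypothetical find -print0 output from an in-memory string and only collects the names instead of deleting anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = "a~\0dir/b~\0";       # hypothetical output of: find . -print0
open my $fh, '<', \$data or die $!;

my @names;
{
    local $/ = "\0";             # what -0 sets: records end at a NUL byte
    while (my $rec = <$fh>) {
        chomp $rec;              # what -l adds: removes the trailing NUL
        push @names, $rec;       # unlink $rec would go here
    }
}
print "$_\n" for @names;
```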
One Liner-3: -i
-i tells perl to operate on files in-place. If you use -n or -p with -i, and you pass perl filenames on the command line, perl will run your script on those files, and then replace their contents with the output. -i optionally accepts a backup suffix as argument; perl will write backup copies of edited files to names with that suffix added.
perl -i.bak -ne 'print unless /^#/' script.sh
would strip all whole-line comments (including any #! line) from script.sh, but leave a copy of the original in script.sh.bak.
One Liner-4: The .. operator
Perl's .. operator is a stateful operator: it remembers state between evaluations. As long as its left operand is false, it returns false; once the left-hand operand returns true, it starts evaluating the right-hand operand until that becomes true, at which point, on the next iteration, it resets to false and starts testing the other operand again.
What does that mean in practice? It's a range operator: It can be easily used to act on a range of lines in a file. For instance, I can extract all GPG public keys from a file using:
perl -ne 'print
          if /-----BEGIN PGP PUBLIC KEY BLOCK-----/
             ..
             /-----END PGP PUBLIC KEY BLOCK-----/
         ' FILE
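The flip-flop behaviour is easy to watch on a list of lines. This sketch uses hypothetical input with two key blocks and some text between them. Both marker lines are kept, and the range restarts at the second BEGIN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
    'intro text',
    '-----BEGIN PGP PUBLIC KEY BLOCK-----', 'key one', '-----END PGP PUBLIC KEY BLOCK-----',
    'between',
    '-----BEGIN PGP PUBLIC KEY BLOCK-----', 'key two', '-----END PGP PUBLIC KEY BLOCK-----',
    'trailer',
);

my @kept;
for (@lines) {
    # False until BEGIN matches, true up to and including the END line,
    # then false again until the next BEGIN.
    push @kept, $_
        if /-----BEGIN PGP PUBLIC KEY BLOCK-----/ .. /-----END PGP PUBLIC KEY BLOCK-----/;
}
print "$_\n" for @kept;
```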
One Liner-5: -a
-a turns on autosplit mode – perl will automatically split input lines on whitespace into the @F array. If you ever run into any advice that accidentally escaped from 1980 telling you to use awk because it automatically splits lines into fields, this is how you use perl to do the same thing without learning another, even worse, language.
As an example, you could print a list of files along with their link counts using the following (in the usual ls -l layout the file name is the ninth field, $F[8]):
ls -l | perl -lane 'print "$F[8] $F[1]"'

One Liner-6: -F
-F is used in conjunction with -a, to choose the delimiter on which to split lines. To print every user in /etc/passwd (which is colon-separated with the user in the first column), we could do:
perl -F: -lane 'print $F[0]' /etc/passwd
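The -a and -F switches both boil down to a split into @F. This sketch uses hypothetical sample lines and writes the equivalent split calls explicitly: split ' ' for the default -a behaviour and split /:/ for -F:.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $ls_line     = '-rw-r--r--  1 alice staff  1234 Jan  1 12:00 notes.txt';
my $passwd_line = 'root:x:0:0:root:/root:/bin/bash';

# -a: split on runs of whitespace, ignoring leading whitespace (split ' ')
my @F = split ' ', $ls_line;
my $name_and_links = "$F[8] $F[1]";    # file name, link count

# -F: splits on the given pattern instead (here split /:/)
my @P = split /:/, $passwd_line;
my $user = $P[0];

print "$name_and_links\n$user\n";
```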
One Liner-7: \K
\K is undoubtedly my favorite little-known-feature of Perl regular expressions. If \K appears in a regex, it causes the regex matcher to drop everything before that point from the internal record of "Which string did this regex match?". This is most useful in conjunction with s///, where it gives you a simple way to match a long expression, but only replace a suffix of it.
Suppose I want to replace the From: field in an email. We could write something like
perl -lape 's/(^From:).*/$1 Nelson Elhage <nelhage\@ksplice.com>/'
But having to parenthesize the right bit and include the $1 is annoying and error-prone. We can simplify the regex by using \K to tell perl we don't want to replace the start of the match:
perl -lape 's/^From:\K.*/ Nelson Elhage <nelhage\@ksplice.com>/'
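A side-by-side check shows that both substitutions produce the same header. The header line and the replacement address in this sketch are hypothetical. Note the \@ in the replacement, which stops Perl from trying to interpolate an array.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $header = 'From: Old Name <old@example.com>';    # hypothetical header line

# Capture and restore: the matched prefix has to be put back with $1.
(my $via_capture = $header) =~ s/(^From:).*/$1 New Name <new\@example.com>/;

# \K: everything matched before \K is left out of the replaced text,
# so only the part after "From:" is substituted.
(my $via_k = $header) =~ s/^From:\K.*/ New Name <new\@example.com>/;

print "$via_capture\n$via_k\n";
```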
One Liner-8: $ENV{}
When you're writing a one-liner using -e in the shell, you generally want to quote it with ', so that dollar signs inside the one-liner aren't expanded by the shell. But that makes it annoying to use a ' inside your one-liner, since you can't escape a single quote inside of single quotes, in the shell.
Let's suppose we wanted to print the username of anyone in /etc/passwd whose name included an apostrophe. One option would be to use a standard shell-quoting trick to include the ':
perl -F: -lane 'print $F[0] if $F[4] =~ /'"'"'/' /etc/passwd
But counting apostrophes and backslashes gets old fast. A better option, in my opinion, is to use the environment to pass the regex into perl, which lets you dodge a layer of parsing entirely:
env re="'" perl -F: -lane 'print $F[0] if $F[4] =~ /$ENV{re}/' /etc/passwd
We use the env command to place the regex in a variable called re, which we can then refer to from the perl script through the %ENV hash. This way is slightly longer, but I find the savings in counting backslashes or quotes to be worth it, especially if you need to end up embedding strings with more than a single metacharacter.
One Liner-9: BEGIN and END
BEGIN { ... } and END { ... } let you put code that gets run entirely before or after the loop over the lines.
For example, I could sum the values in the second column of a CSV file using:
perl -F, -lane '$t += $F[1]; END { print $t }'
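Written out as a script, the one-liner is an initialisation, a loop and a final report. This sketch uses hypothetical CSV rows. Comments mark where code from BEGIN { } and END { } would run relative to the implicit -n loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @rows = ('apple,3', 'pear,4.5', 'plum,2');    # hypothetical CSV lines

my $t = 0;                      # code in BEGIN { } would run here, before the loop
for my $row (@rows) {           # the implicit -n loop
    my @F = split /,/, $row;    # what -F, -a does for each line
    $t += $F[1];
}
print "$t\n";                   # code in END { } runs here, after the loop
```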
One Liner-10: -MRegexp::Common
Using -M on the command line tells perl to load the given module before running your code. There are thousands of modules available on CPAN, many of them potentially useful in one-liners, but one of my favorites for one-liner use is Regexp::Common, which, as its name suggests, contains regular expressions to match many commonly used pieces of data.
The full set of regexes available in Regexp::Common is available in its documentation, but here's an example of where I might use it:
Neither ifconfig nor the ip tool that is supposed to replace it provides, as far as I know, an easy way of extracting information for use by scripts. The ifdata program provides such an interface, but isn't installed everywhere. Using perl and Regexp::Common, however, we can do a pretty decent job of extracting an IP from ip's output:
ip address list eth0 | \
  perl -MRegexp::Common -lne 'print $1 if /($RE{net}{IPv4})/'
So, those are my favorite tricks, but I always love learning more. What tricks have you found or invented for messing with perl on the command-line? What's the most egregious perl "one-liner" you've wielded, continuing to tack on statements well after the point where you should have dropped your code into a real script?

Removal of unnecessary white spaces

20.6.13
I have a text file which looks something like this:
1, 2, 3, "Test, Hello"
4, 5, 6, "Well, Hi There!"
Each comma is followed by a space that I don't need, but the space inside the quoted string in the last field must stay. So the output I expect is:
1,2,3,"Test, Hello"
4,5,6,"Well, Hi There!"
I used the below command for doing the same:
perl -lne 'if(/(.*?\")(.*)/)
          {$b=$2;$a=$1;$a=~s/,\s/,/g;
           print "$a$b"}
          else{s/,\s/,/g;print}' your_file
explanation:
(.*?\")(.*)
captures the string up to and including the first " in $1, and the remainder in $2.
$2 must stay unchanged, so store it in $b.
$1 must be changed, so store it in $a and replace every ", " with ",". Lines without a quote are handled by the else branch, so they are still printed.
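The same idea can be packaged as a small function. This sketch is not the author's command: the helper name squeeze_commas is hypothetical, and it removes any run of spaces after a comma (,\s+) rather than exactly one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove the space(s) after each comma, but only before the first double
# quote; the quoted last field is returned untouched.
sub squeeze_commas {
    my ($line) = @_;
    my ($head, $tail) = $line =~ /^([^"]*)(.*)$/s;
    $head =~ s/,\s+/,/g;
    return $head . $tail;
}

my @out = map { squeeze_commas($_) }
    ('1, 2, 3, "Test, Hello"', '4, 5, 6, "Well, Hi There!"', 'a, b');
print "$_\n" for @out;
```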

Same thing can be achieved in awk as below:
nawk -F'\"' -v OFS='\"' '{gsub(/ /,"",$1)}1' your_file

Replace last occurrence of a string

17.6.13
How do you replace a string only on the last line where it occurs?
For example, if you have a file like the one below:
a
b
c
a
d
The output should look like below:
a
b
c
x
d
Below is the perl command that will perform the same:
perl -e '@a=reverse<>;
         for(@a){
             if(/a/){s/a/x/;last}
         }
         print reverse @a' your_file
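Instead of reversing the lines twice, you can walk the indices from the bottom up. This sketch (helper name replace_on_last_match is hypothetical) replaces only the first occurrence on the last matching line, as the command above does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace $from with $to on the last line that contains $from.
sub replace_on_last_match {
    my ($from, $to, @lines) = @_;
    for my $i (reverse 0 .. $#lines) {        # scan from the bottom up
        if ($lines[$i] =~ /\Q$from\E/) {
            $lines[$i] =~ s/\Q$from\E/$to/;   # first occurrence on that line only
            last;
        }
    }
    return @lines;
}

my @result = replace_on_last_match('a', 'x', qw(a b c a d));
print "$_\n" for @result;
```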

Range of lines from a file

5.6.13
Below is the command to extract certain lines in Perl based on line numbers.
For example, to extract lines 5-10:
perl -ne 'print if $.>=5 && $.<=10' your_file

This can also be done in awk, a little more simply:
awk 'NR>=5 && NR<=10' your_file

The classic way of doing this is:
head -10 your_file | tail -6
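In Perl, the range operator can also do this directly. When an operand of a scalar .. is a constant number, it is compared with the current line number $. (documented in perlop). This sketch reads a hypothetical 12-line input from memory and checks that both forms select the same lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = join '', map { "line $_\n" } 1 .. 12;    # hypothetical 12-line file
open my $fh, '<', \$data or die $!;

my (@by_compare, @by_flipflop);
while (my $line = <$fh>) {
    chomp $line;
    push @by_compare,  $line if $. >= 5 && $. <= 10;
    push @by_flipflop, $line if 5 .. 10;    # constants are compared with $.
}
close $fh;
print "$_\n" for @by_compare;
```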

Pattern Match Using Perl-Append file contents

29.5.13
Suppose I want to insert all the lines of one file after every line of a second file that matches a pattern:

For example:
first_file.txt:
111111
1111
11
1

second_file.txt:

122221
2222
22
2

pattern to match:

2222
output:

122221
111111
1111
11
1
2222
111111
1111
11
1
22
2

Below is the perl command to achieve this:

perl -lne 'BEGIN{open(A,"first_file.txt") or die $!; chomp(@f=<A>)}
           print; if(/2222/){print for @f}' second_file.txt
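Here is the same logic as a function, checked against the example data. This is a sketch: the helper name insert_after_matches is hypothetical, and the file contents are passed in as arrays rather than read from first_file.txt and second_file.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the lines of $lines, with @$insert added after every line matching $pattern.
sub insert_after_matches {
    my ($pattern, $insert, $lines) = @_;
    my @out;
    for my $line (@$lines) {
        push @out, $line;
        push @out, @$insert if $line =~ $pattern;
    }
    return @out;
}

my @out = insert_after_matches(qr/2222/,
                               [qw(111111 1111 11 1)],     # first_file.txt
                               [qw(122221 2222 22 2)]);    # second_file.txt
print "$_\n" for @out;
```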

Retrieving a List of files from Remote Server

17.5.13
Below is the perl script for retrieving the list of files from a remote server:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Net::FTP;
    my $ftp_site     = 'localhost';
    my $ftp_dir      = '/home/simon/software';
    my $ftp_user     = 'a_login_name';
    my $ftp_password = 'a_password';
    my $glob         = 'ex*';
    my @remote_files;
    my $ftp = Net::FTP->new($ftp_site)
        or die "Could not connect to $ftp_site: $@";
    $ftp->login($ftp_user, $ftp_password)
        or die "Could not login to $ftp_site with user $ftp_user: ", $ftp->message;
    $ftp->cwd($ftp_dir)
        or die "Could not change remote working " .
                 "directory to $ftp_dir on $ftp_site: ", $ftp->message;
    @remote_files = $ftp->ls($glob);
    foreach my $file (@remote_files) {
        print "Saw: $file\n";
    }
    $ftp->quit();

Semicolon and Perl

13.5.13
I guess most people have either noticed this and ignored it, or never noticed it at all. Did you know this about Perl?
If you write this :
for(@arr) { print $_ }
It's not an error in Perl. According to perldoc:
Every simple statement must be terminated with a semicolon, unless it is the final statement in a block, in which case the semicolon is optional.
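A short script shows the rule in action. This sketch uses hypothetical names. The loop body and the sub body each end with a statement that has no semicolon, and both compile because that statement is the last one in its block.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @arr = (1, 2, 3);
my @seen;
for (@arr) { push @seen, $_ }          # last statement in the block: no ";" needed

sub double { my $x = shift; $x * 2 }   # last statement of the sub: no ";" needed

# A missing ";" between two statements is still a syntax error, e.g.
# { my $y = 1 print $y }

my $result = "@seen " . double(21);
print "$result\n";
```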

Consecutive pattern match in Perl

7.5.13
Make a pattern that will match three consecutive copies of whatever is currently contained in $what. That is, if
$what is "fred", your pattern should match "fredfredfred".
If
$what is "fred|barney", your pattern should match
"fredfredbarney",
"barneyfredfred",
"barneybarneybarney" and many other variations.
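This is a bit tricky, but the match that does it is:
/($what){3}/
The parentheses matter: they keep the | inside the group, so {3} repeats the whole alternation. Without ^ and $ anchors the pattern also matches inside longer strings, which is why the example below anchors it.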

Below is the example code:
#!/usr/bin/perl
use warnings;
use strict;
# notice the example has the `|`, meaning
# match "fred" or "barney" 3 times.
my $what = 'fred|barney';
my @tests = qw(fred fredfredfred barney barneybarneybarny barneyfredbarney);
for my $test (@tests) {
    if ($test =~ /^($what){3}$/) {
        print "$test matched!\n";
    } else {
        print "$test did not match!\n";
    }
}

Below is the execution:
> ./temp9.pl
fred did not match!
fredfredfred matched!
barney did not match!
barneybarneybarny did not match!
barneyfredbarney matched!

Sorting a file with related lines

17.4.13
I have a file in which each line containing a number is followed by a line containing the statement associated with it, as in the example below:
12
stat1 
18
stat2
15
stat3
I need to print the entries sorted in descending order of the numbers, each with its related statement, so the output should look like this:
Time = 18
Stat = stat2
Time = 15
Stat = stat3
Time = 12
Stat = stat1
Below is the perl command that I have written which does the needed thing.
perl -lne 'if(/^\d+/){$k=$_}
           else{$x{$k}=$_}
           END{ 
               for(sort {$b<=>$a} keys %x){ 
               print "Time = $_\nStat = $x{$_}";}
           }' file
Time = 18
Stat = stat2
Time = 15
Stat = stat3
Time = 12
Stat = stat1
Note: the precondition is that each time comes first, followed by the statement associated with it. The times must also be unique, because they are used as hash keys and a repeated time would overwrite the earlier statement.
Below is the explanation.

1. If the line starts with a number, store it in the variable $k.
2. Otherwise the line is the statement, so store it in the hash %x as the value for the previously stored key $k.
3. After the complete file has been parsed, sort the keys of the hash in descending numeric order with sort {$b<=>$a} and print each key and value in the required format.
Also please note:
1. sort $a<=>$b will sort the numbers in ascending order.
2. sort $b<=>$a will sort the numbers in descending order
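Because the one-liner keys the hash by time, two entries with the same time would collapse into one. This sketch keeps every entry by storing [time, statement] pairs in an array instead of a hash. That is an alternative to the command above, and it uses the statement as a tie-breaker. The extra "15 stat4" entry is hypothetical, added to show a repeated time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = qw(12 stat1 18 stat2 15 stat3 15 stat4);    # time, statement, ...

my @pairs;
for (my $i = 0; $i + 1 < @lines; $i += 2) {
    push @pairs, [ $lines[$i], $lines[$i + 1] ];
}

# Descending by time; equal times fall back to the statement text.
my @out = map { ("Time = $_->[0]", "Stat = $_->[1]") }
          sort { $b->[0] <=> $a->[0] or $a->[1] cmp $b->[1] } @pairs;
print "$_\n" for @out;
```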

Converting single column to multiple columns

26.3.13
I have a file which contains all the entries in a single column like:
0
SYSCATSPACE
16384
13432
2948
1
1
TEMPSPACE1
1
1
applicable
1
2
USERSPACE1
4096
1888
2176
1
To convert this into a table of 3 rows and 6 columns:
0 SYSCATSPACE 16384 13432 2948       1
1 TEMPSPACE1  1     1     applicable 1
2 USERSPACE1  4096  1888  2176       1
Below is the command that I will use:
perl -lne '$a.="$_ ";
           if($.%6==0){push(@x,$a);$a=""}
           END{for(@x){print $_}}' your_file
The output would be:
> perl -lne '$a.="$_ ";if($.%6==0){push(@x,$a);$a=""}END{for(@x){print $_}}' temp
0 SYSCATSPACE 16384 13432 2948 1 
1 TEMPSPACE1 1 1 applicable 1 
2 USERSPACE1 4096 1888 2176 1
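The same reshaping can be done with splice, taking $width items at a time. This sketch (helper name to_rows is hypothetical) also keeps a final partial row when the number of entries is not a multiple of the width, which the one-liner would drop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group @items into rows of $width items, joined by single spaces.
sub to_rows {
    my ($width, @items) = @_;
    my @rows;
    while (@items) {
        push @rows, join ' ', splice(@items, 0, $width);
    }
    return @rows;
}

my @items = qw(0 SYSCATSPACE 16384 13432 2948 1
               1 TEMPSPACE1 1 1 applicable 1
               2 USERSPACE1 4096 1888 2176 1);
my @rows = to_rows(6, @items);
print "$_\n" for @rows;
```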

Capture all the letters which are repeated more than once

12.3.13
Recently I needed to fetch all the letters in a line that are repeated two or more times in a row.

For example:

In "foo bar", I want the letter 'o'.
In "foo baaar", I want the letters 'o' and 'a'.
In "foo baaar foo", I again want the letters 'o' and 'a'.

Below is code which worked for me:

perl -lne 'push @a,/(\w)\1+/g;END{print @a}' your_file

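The above command scans the complete file and prints just the letters that are repeated contiguously. Note that it prints one letter per run, so "foo baaar foo" gives oao. Also, \w matches digits and the underscore as well as letters, so a run such as 11 is captured too.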
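If you need letters only, with each one reported once, you can restrict the class and de-duplicate. This sketch (helper names repeated_runs and unique_repeated are hypothetical) uses [[:alpha:]] instead of \w and a %seen hash to keep the first occurrence of each letter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One entry per run of a repeated letter, in order of appearance.
sub repeated_runs {
    my ($text) = @_;
    my @letters = $text =~ /([[:alpha:]])\1+/g;
    return @letters;
}

# Same, but each letter reported only once.
sub unique_repeated {
    my %seen;
    return grep { !$seen{$_}++ } repeated_runs(@_);
}

my @runs   = repeated_runs('foo baaar foo');
my @unique = unique_repeated('foo baaar foo 1122');
print "@runs | @unique\n";
```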

Summing up column values based upon ranges in another file

22.2.13
File 1 has ranges such as 3-9 and 2-6:
    3 9
    2 6
    12 20

File 2 has values: column 1 is the position and column 2 is the value.
    1 4
    2 4
    3 5
    4 4
    5 4
    6 1
    7 1
    8 1
    9 4

I would like to calculate the sum of the values (file 2, column 2) for each range in file 1. For example, for the range 3-9 the sum is 5+4+4+1+1+1+4 = 20.

Below is the script for doing the same:

use strict;
use warnings;
use feature qw( say );

use List::Util qw( sum );
my $file1 = 'file1.txt';
my $file2 = 'file2.txt';

my @file2;
{   
   open(my $fh, '<', $file2)
      or die "Can't open $file2: $!\n";
   while (<$fh>) {
      my ($k, $v) = split;
      $file2[$k] = $v;
   }
}

{   
   open(my $fh, '<', $file1)
      or die "Can't open $file1: $!\n";
   while (<$fh>) {
      my ($start, $end) = split;
      # the leading 0 makes ranges with no values (like 12-20) print 0 instead of an undefined value
      say sum 0, grep defined, @file2[$start .. $end];
   }
}