[vdr] xmltv2vdr speedup and modification

jori.hamalainen at teliasonera.com jori.hamalainen at teliasonera.com
Mon Feb 12 17:11:33 CET 2007


> I recently worked on xmltv2vdr.pl (version 1.0.6) and checked 
> why it was so slow on my mighty Celeron 233. So I modified it
> a little to avoid reading all the xmltv file for each channel
> defined in the channels.conf. The result is good : I can process 
> my 5Mo xmltv file in less than 10 minutes whereas it took at least
> 1 hour with vanilla 1.0.6 release.

Something more what you can do (just by looking source you provided)..

--

Caching of &xmltime2vdr, like

    return $timecache{$xmltime}->{$skew} if defined $timecache{$xmltime}->{$skew}
    $secs = &Date::Manip::UnixDate($xmltime, "%s") + $skew*60;
    $timecache{$xmltime}->{$skew} = $secs
    return secs;

But it depends on how much this function is called.. But hash lookup is probably
faster than running UnixDate from library. So it is a memory tradeoff.

--

I see that there is still some basic Perl based optimizations for this.

For example there is browsing through @xmllines array, and every iteration
you recompile *ALL* regexp's. That is as many times as @xmllines has lines.
And if one recompile takes 1ms -> you waste time @xmllines * 1ms just for
compiling and not doing anything usefull.

Perl switch "o" is recompile once flag, use that everywhere where it is 
possible. Variable is not a problem unless variable changes in every iteration.

# New XML Program - doesn't handle split programs yet
if ( ($xmlline =~ /\<programme/o ) && ( $xmlline !~ /clumpidx=\"1\/2\"/o ) && ( $chanevent == 0 ) )
{
  ( $null, $xmlst, $null, $xmlet, @null ) = split(/\"/, $xmlline);
  ( $chan ) = ( $xmlline =~ m/channel\=\"(.*?)\"/o );
...

And all the lines in subroutine &xmltvtranslate should be with o -flag.
    $line=~s/ und uuml;/ü/go;
    $line=~s/ und auml;/ä/go; 
    $line=~s/ und ouml;/ö/go;

and you are running twice same for UAO UML's, with and without spaces. You
don't need to run it with spaces if you are running without spaces.
    $line=~s/ und auml;/ä/go;  <- this is unnecessary because later will match.  
    $line=~s/und auml;/ä/go;

--

As there is many times $xmlline is matched with regexps etc. You should experiment
with "study $xmlline;" after chomp $xmlline. Study makes internal search tables
for string matches. So see which way the code is faster, with study or without 
study. Use Unix shell's time-command for this. For extra boost with study you
probably would need to take away subroutine "xmltvtranslate" as for it $xmlline
is copied to subroutine's parameter space, and what is matched. And study would
not affect it. So instead of calling "$xmlline=xmltvtranslate($xmlline);" cut&paste
subroutines code here, and use $xmlline instead of $line.

    foreach $xmlline (@xmllines)
    {
        chomp $xmlline;
        study $xmlline;
        $xmlline=~s/und uuml;/ü/go;
        $xmlline...

This isn't pretty but could probably help a bit. You save time for @xmllines times calling
subroutine, and study would help you a lot as you use the same string all the time.

--

For constant string you could use ' ' instead of " ". " causes string to be
evaluated for variables

if ( $chanCur eq "" ) --> if ( $chanCur eq '' )

But this would be very minor effect..

--

Split is heavy operation because of creating arrays, but you can limit it.

( $null, $xmlst, $null, $xmlet, @null ) = split(/\"/, $xmlline);

=> ( $null, $xmlst, $null, $xmlet, $null ) = split(/\"/, $xmlline, 5);

or even using regexp for this. I don't know input line for this, but if it is
foo,"something","something",...

($xmlst,$xmlet) = $xmlline =~ m:\"(.*?)\",\"(.*?)\":o;

or probably combine 2 regexp to a single

($xmlst,$xmlet,$channel) = $xmlline =~ m:\"(.*?)\",\"(.*?)\".*?channel=\"(.*?)\":o;

--

Again something very weird:

        if ( ($xmlline =~ /\<title/ ) )
        {
            #print $xmlline . "\n";
            ( $null, $tmp ) = split(/\>/, $xmlline);
            ( $vdrtitle, @null ) = split(/\</, $tmp);
            
            # Send VDR Title
            
            SVDRPsend("T $vdrtitle");
        }

Why not?

	SVDRPsend("T $1") if $xmlline =~ m:\<title\>(.*?)\</title\>:o;

Same for XML subtitle
	SVDRPsend("T $1") if $xmlline =~ m:\<sub-title\>(.*?)\</sub-title\>:o;

Generally
       if ( ($xmlline =~ /\<desc/ ) && ( $desccount == $dc ))
        {
            ( $null, $tmp ) = split(/\>/, $xmlline);
            ( $vdrdesc, @null ) = split(/\</, $tmp);

this is not a clever way to parse XML data in Perl. Just us regexp's which
match strings with Boyer-Moore algorithm (same as Unix grep) and compile once.

--

Some logical errors

if ( ($xmlline =~ /\<programme/ ) && ( $xmlline !~ /clumpidx=\"1\/2\"/ ) && ( $chanevent == 0 ) )

=>         if ( ( $chanevent == 0 ) && ($xmlline =~ /\<programme/ ) && ( $xmlline !~ /clumpidx=\"1\/2\"/ ) )

so program execution can skip if $chanevent != 0 much faster. 
So Regexp would not be ran. This is normal short circuit operation.


Then
            elsif ( $chanCur ne $chan )
            {
                SVDRPsend("c");
                SVDRPsend(".");
                SVDRPreceive(250);

I think programmer wanted outout of "." -command, and see if it's status is 250?
But now I think code is checking status of "c" -command? As socket is not read
between calls, and there should be data in buffer for c-command. But I cannot be
sure as I don't know SVDRP command that well.

> c
< 250 foo
< 250-foo
> .
< 354-not ok

It could still succeed if from socket buffer "250-" is read. Also the 2 substr calls
in SVDRPreceive is a bit weird, but I am uncertain if regexp would help that. At least
change "-" to '-'.

------

There is a LOT to improve, but if you do these, you Celeron will fly. I hope you'll get
to minute scale (or even better). Look with "time xmltv2vdr" to see how much process 
time is used for user code and how much for kernel code. And to see if optimizations help.

Have fun.. :)

Best regards,
Jori

Ps. complete untested code, there might be some problems as I didn't run this code even once.


#!/usr/bin/perl

# xmltv2vdr.pl
#
# Converts data from an xmltv output file to VDR - tested with 1.2.6
#
# The TCP SVDRSend and Receive functions have been used from the getskyepg.pl
# Plugin for VDR.
#
# This script requires: -
#
# The PERL module date::manip (required for xmltv anyway)
#
# You will also need xmltv installed to get the channel information:
# http://sourceforge.net/projects/xmltv
#
# This software is released under the GNU GPL
#
# See the README file for copyright information and how to reach the author.
# 20070210 / SL / Little optimization to make it work on old hardware, 
#                 Added support for sub title.
# 20070212 / JH / Perl and logistic optimization

# $Id: xmltv2vdr.pl 1.0.6 2004/04/19 20:01:04 psr Exp $

use Getopt::Std;
use Time::Local;
use Date::Manip; # Instead of using this, for less dependency could use posix mktime

# Convert XMLTV time format (YYYYMMDDmmss ZZZ) into VDR (secs since epoch)

sub xmltime2vdr
{
    my $xmltime=shift;
    my $skew=shift;
    if (defined $timecache{$xmltime}->{$skew}) {
        return $timecache{$xmltime}->{$skew};
    } else {
        $secs = &Date::Manip::UnixDate($xmltime, "%s") + $skew*60;
        $timecache{$xmltime}->{$skew} = $secs;
        return $secs;
    }
}

# Send info over SVDRP (thanks to Sky plugin)

sub SVDRPsend
{
    my $s = shift;
    if ($sim == 0)
    {
        print SOCK "$s\r\n";
    }
    else 
    {
        print "$s\r\n";
    } 
}

# Recv info over SVDRP (thanks to Sky plugin)

sub SVDRPreceive
{
    my $expect = shift | 0;
    
    if ($sim == 1)
    { return 0; }
    
    my @a = ();
    while (<SOCK>) {
        s/\s*$//; # 'chomp' wouldn't work with "\r\n"
        push(@a, $_);
        if (substr($_, 3, 1) ne "-") {
            my $code = substr($_, 0, 3);
            die("expected SVDRP code $expect, but received $code") if ($code != $expect);
            last;
        }
    }
    return @a;
}

# Process info from XMLTV file / channels.conf and send via SVDRP to VDR

sub ReadChannels {
    open(CHANNELS, "$channelsfile") || die "cannot open channels.conf file";

    while ( $chanline=<CHANNELS> )
    {
        # Split a Chan Line
        
        chomp $chanline;
        
        ($channel_name, $freq, $param, $source, $srate, $vpid, $apid, $tpid, $ca, $sid, $nid, $tid, $rid, $xmltv_channel_name) = split(/:/, $chanline);
        
        if ( $source eq 'T' )
        { 
            $epgfreq=substr($freq, 0, 3);
        }
        else
        { 
            $epgfreq=$freq;
        }
        
        if (!$xmltv_channel_name) {
            if(!$channel_name) {
                $chanline =~ m/:(.*$)/;
                if ($verbose == 1 ) { warn("Ignoring header: $1\n"); }
            } else {
                if ($verbose == 1 ) { warn("Ignoring channel: $channel_name, no xmltv info\n"); } 
            }
            next;
        }
        @channels = split ( /,/, $xmltv_channel_name);
        foreach $myChannel ( @channels )
        {
        	$chanName{$myChannel} = $channel_name;
        	# Save the Channel Entry
        	if ($nid>0) 
        	{
                $chanId{$myChannel} = "C $source-$nid-$tid-$sid $channel_name";
        	}
        	else 
        	{
                $chanId{$myChannel} = "C $source-$nid-$epgfreq-$sid $channel_name";
        	}
        }
    }
    close CHANNELS;
}

sub ProcessEpg
{
    my %chanId;
    my %chanName;
    my %chanMissing;
    $desccount = shift; # Verbosity
    
    # Set XML parsing variables    
    $chanevent = 0;
    $dc = 0;
    $founddesc=0;
    $chanCur = "";
    $nbEventSent = 0;
    $atLeastOneEpg = 0;
    
    # Find XML events
    
    foreach $xmlline (@xmllines)
    {
        chomp $xmlline;
        study $xmlline; # measure program performance with and without this
        if ($xmlline =~ m:;:o) { # run the rest only if ; is found on line, study should help finding ; a lot
          $line=~s/ und szlig;/ß/go; 
          $line=~s/ und amp;/\&/go; 
          $line=~s/ und middot;/·/go; 
          $line=~s/ und eacute;/é/go;
          $line=~s/ und aacute;/á/go;
          $line=~s/ und deg;/°/go;
          $line=~s/ und ordm;/º/go;
          $line=~s/ und ecirc;/ê/go;
          $line=~s/und curren;/¤/go;
          $line=~s/und Ccedil;/Ç/go;
          $line=~s/ und ocirc;/ô/go;
          $line=~s/ und egrave;/è/go;
          $line=~s/ und agrave;/à/go;
          $line=~s/und quot;/"/go;
          $line=~s/und Ouml;/Ö/go;
          $line=~s/und Uuml;/Ü/go;
          $line=~s/und Auml;/Ä/go;
          $line=~s/und ouml;/ö/go;
          $line=~s/und uuml;/ü/go;
          $line=~s/und auml;/ä/go;
          $line=~s/&amp;/&/go;
        }
        
        # New XML Program - doesn't handle split programs yet
        if ( ( $chanevent == 0 ) && ($xmlline =~ /\<programme/o ) && ( $xmlline !~ /clumpidx=\"1\/2\"/o ) )
        {
		# Check this for correct syntax, add string source match if necessary
		($xmlst,$xmlet,$chan) = $xmlline =~ m:\"(.*?)\",\"(.*?)\".*,channel=\"(.*?)\":o;
            # This should match "...","...",(greedy match),channel="..." so first two quoted parameters..

            if ( !exists ($chanId{$chan}) )
            {
                if ( !exists ($chanMissing{$chan}) )
                {
                    if ($verbose == 1 ) { warn("$chan unknown in channels.conf\n"); }
                    $chanMissing{$chan} = 1;
                }
                next;
            }
            if ( $chanCur eq '' )
            {
                # Send VDR PUT EPG
                SVDRPsend('PUTE');
                SVDRPreceive(354);
                SVDRPsend($chanId{$chan});
                $atLeastOneEpg = 1;
            }
            elsif ( $chanCur ne $chan )
            {
                SVDRPsend('c');
                SVDRPsend('.');
                SVDRPreceive(250);
                if ($verbose == 1 ) { warn("$nbEventSent event(s) sent for $chanName{$chanCur}\n"); }
                $nbEventSent = 0;
                # Send VDR PUT EPG
                SVDRPsend('PUTE');
                SVDRPreceive(354);
                SVDRPsend($chanId{$chan});
            }
            $chanCur = $chan;
            $nbEventSent++;
            $chanevent = 1;
            $vdrst = &xmltime2vdr($xmlst, $adjust);
            $vdret = &xmltime2vdr($xmlet, $adjust);
            $vdrdur = $vdret - $vdrst;
            $vdrid = $vdrst / 60 % 0xFFFF;
            
            # Send VDR Event
            
            SVDRPsend("E $vdrid $vdrst $vdrdur 0");
        }
        
        next if !$chanevent;
        
        # XML Program Title
        SVDRPsend("T $1") if $xmlline =~ m:\<title\>(.*?)\</title\>:o;
        
        # XML Program Sub Title
        SVDRPsend("S $1") if $chanevent==1 && $xmlline =~ m:\<sub-title\>(.*?)\</sub-title\>:o;
        
        # XML Program description at required verbosity
        if ( $xmlline =~ m:\<desc\>(.*?)\</desc\>:o )
        {
 
          if ($desccount == $dc) {
           # Send VDR Description
            SVDRPsend("D $1");
            
            # Send VDR end of event
            
            SVDRPsend('e');
            $founddesc=1;
          }
          $dc++;
        }
        
        # No Description found at required verbosity
        
        if ( ($xmlline =~ /\<\/programme/o ) )
        {
            if ( $founddesc == 0 )
            { 
                SVDRPsend('D Info Not Available');
                SVDRPsend('e');
            }
            $chanevent=0 ;
            $dc=0 ;
            $founddesc=0 ;
        }
    }
    
    if ( $atLeastOneEpg )
    {
        # Send End of Channel, and end of EPG data
        SVDRPsend('c');
        SVDRPsend('.');
        SVDRPreceive(250);
        if ($verbose == 1 ) { warn("$nbEventSent event(s) sent for $chanName{$chanCur}\n"); }
    }
}

#---------------------------------------------------------------------------
# main

use Socket;

$Usage = qq{
    Usage: $0 [options] -c <channels.conf file> -x <xmltv datafile> 
    
    Options:
    -a (+,-) mins  	Adjust the time from xmltv that is fed
    into VDR (in minutes) (default: 0)	 
    -c channels.conf	File containing modified channels.conf info
    -d hostname            destination hostname (default: localhost)
    -h			Show help text
    -l description length  Verbosity of EPG descriptions to use
    (0-2, 0: more verbose, default: 0)
    -p port                SVDRP port number (default: 2001)
    -s			Simulation Mode (Print info to stdout)
    -t timeout             The time this program has to give all info to 
    VDR (default: 300s) 
    -v             	Show warning messages
    -x xmltv output 	File containing xmltv data
    
};

$sim=0;


die $Usage if (!getopts('a:d:p:l:t:x:c:vhs') || $opt_h);

$verbose = 1 if $opt_v;
$sim = 1 if $opt_s;
$adjust = $opt_a || 0;
$Dest   = $opt_d || "localhost";
$Port   = $opt_p || 2001;
$descv   = $opt_l || 0;
$Timeout = $opt_t || 300; # max. seconds to wait for response
$xmltvfile = $opt_x  || die "$Usage Need to specify an XMLTV file";
$channelsfile = $opt_c  || die "$Usage Need to specify a channels.conf file";

# Check description value

if ( ( $descv < 0 ) || ( $descv > 2 ) )
{
    die "$Usage Description out of range. Try 0 - 2";
}

# Read all the XMLTV stuff into memory - quicker parsing
# This still reads line by line, slurping could be faster but let it be for now
open(XMLTV, "$xmltvfile") || die "cannot open xmltv file";
@xmllines=<XMLTV>;
close(XMLTV);

# Now open the VDR channel file
&ReadChannels();

# Connect to SVDRP socket (thanks to Sky plugin coders)

if ( $sim == 0 )  
{
    $SIG{ALRM} = sub { die("timeout"); };
    alarm($Timeout);
    
    $iaddr = inet_aton($Dest)                   || die("no host: $Dest");
    $paddr = sockaddr_in($Port, $iaddr);
    
    $proto = getprotobyname('tcp');
    socket(SOCK, PF_INET, SOCK_STREAM, $proto)  || die("socket: $!");
    connect(SOCK, $paddr)                       || die("connect: $!");
    select((select(SOCK), $| = 1)[0]);
}

# Look for initial banner
SVDRPreceive(220);
SVDRPsend("CLRE");
SVDRPreceive(250);

# Do the EPG stuff
ProcessEpg($descv);

# Lets get out of here! :-)

SVDRPsend("QUIT");
SVDRPreceive(221);

close(SOCK);



More information about the vdr mailing list