[vdr] xmltv2vdr speedup and modification
Sebastien Lucas
sebastien.lucas at gmail.com
Tue Feb 13 13:31:22 CET 2007
On 2/12/07, jori.hamalainen at teliasonera.com
<jori.hamalainen at teliasonera.com> wrote:
> > I recently worked on xmltv2vdr.pl (version 1.0.6) and checked
> > why it was so slow on my mighty Celeron 233. So I modified it
> > a little to avoid reading all the xmltv file for each channel
> > defined in the channels.conf. The result is good : I can process
> > my 5Mo xmltv file in less than 10 minutes whereas it took at least
> > 1 hour with vanilla 1.0.6 release.
>
> Something more what you can do (just by looking source you provided)..
>
> --
>
> Caching of &xmltime2vdr, like
>
> return $timecache{$xmltime}->{$skew} if defined $timecache{$xmltime}->{$skew}
> $secs = &Date::Manip::UnixDate($xmltime, "%s") + $skew*60;
> $timecache{$xmltime}->{$skew} = $secs
> return secs;
>
> But it depends on how much this function is called.. But hash lookup is probably
> faster than running UnixDate from library. So it is a memory tradeoff.
I still haven't tested it but I doubt it'll help ..... I'll check later.
> I see that there is still some basic Perl based optimizations for this.
>
> For example there is browsing through @xmllines array, and every iteration
> you recompile *ALL* regexp's. That is as many times as @xmllines has lines.
> And if one recompile takes 1ms -> you waste time @xmllines * 1ms just for
> compiling and not doing anything usefull.
>
> Perl switch "o" is recompile once flag, use that everywhere where it is
> possible. Variable is not a problem unless variable changes in every iteration.
>
[....]
I didn't know that (I'm not really a perl guru ... far from it). I'll
update my version. But it didn't help at all with my benchmark.
>
> --
>
> As there is many times $xmlline is matched with regexps etc. You should experiment
> with "study $xmlline;" after chomp $xmlline. Study makes internal search tables
> for string matches. So see which way the code is faster, with study or without
> study. Use Unix shell's time-command for this. For extra boost with study you
> probably would need to take away subroutine "xmltvtranslate" as for it $xmlline
> is copied to subroutine's parameter space, and what is matched. And study would
> not affect it. So instead of calling "$xmlline=xmltvtranslate($xmlline);" cut&paste
> subroutines code here, and use $xmlline instead of $line.
>
> foreach $xmlline (@xmllines)
> {
> chomp $xmlline;
> study $xmlline;
> $xmlline=~s/und uuml;/ü/go;
> $xmlline...
>
> This isn't pretty but could probably help a bit. You save time for @xmllines times calling
> subroutine, and study would help you a lot as you use the same string all the time.
>
I'll check that later.
>
> For constant string you could use ' ' instead of " ". " causes string to be
> evaluated for variables
>
> if ( $chanCur eq "" ) --> if ( $chanCur eq '' )
>
> But this would be very minor effect..
>
I'll surely be too lazy to test that. sorry.
>
> Split is heavy operation because of creating arrays, but you can limit it.
>
> ( $null, $xmlst, $null, $xmlet, @null ) = split(/\"/, $xmlline);
>
> => ( $null, $xmlst, $null, $xmlet, $null ) = split(/\"/, $xmlline, 5);
>
> or even using regexp for this. I don't know input line for this, but if it is
> foo,"something","something",...
>
> ($xmlst,$xmlet) = $xmlline =~ m:\"(.*?)\",\"(.*?)\":o;
>
> or probably combine 2 regexp to a single
>
> ($xmlst,$xmlet,$channel) = $xmlline =~ m:\"(.*?)\",\"(.*?)\".*?channel=\"(.*?)\":o;
>
> --
>
> Again something very weird:
>
> if ( ($xmlline =~ /\<title/ ) )
> {
> #print $xmlline . "\n";
> ( $null, $tmp ) = split(/\>/, $xmlline);
> ( $vdrtitle, @null ) = split(/\</, $tmp);
>
> # Send VDR Title
>
> SVDRPsend("T $vdrtitle");
> }
>
> Why not?
>
> SVDRPsend("T $1") if $xmlline =~ m:\<title\>(.*?)\</title\>:o;
>
> Same for XML subtitle
> SVDRPsend("T $1") if $xmlline =~ m:\<sub-title\>(.*?)\</sub-title\>:o;
Yes I'll also prefer shorter code. I'll check further if something
like <title lang="en"> is also allowed to adapt the regex. For
information that change has no impact on my bench.
> Generally
> if ( ($xmlline =~ /\<desc/ ) && ( $desccount == $dc ))
> {
> ( $null, $tmp ) = split(/\>/, $xmlline);
> ( $vdrdesc, @null ) = split(/\</, $tmp);
>
> this is not a clever way to parse XML data in Perl. Just us regexp's which
> match strings with Boyer-Moore algorithm (same as Unix grep) and compile once.
>
Agree. I'll try to modify it.
>
> Some logical errors
>
> if ( ($xmlline =~ /\<programme/ ) && ( $xmlline !~ /clumpidx=\"1\/2\"/ ) && ( $chanevent == 0 ) )
>
> => if ( ( $chanevent == 0 ) && ($xmlline =~ /\<programme/ ) && ( $xmlline !~ /clumpidx=\"1\/2\"/ ) )
>
> so program execution can skip if $chanevent != 0 much faster.
> So Regexp would not be ran. This is normal short circuit operation.
>
In fact the check $chanevent == 0 is only usefull if the xml is not
well formed so it doesn't change anything.
> Then
> elsif ( $chanCur ne $chan )
> {
> SVDRPsend("c");
> SVDRPsend(".");
> SVDRPreceive(250);
>
> I think programmer wanted outout of "." -command, and see if it's status is 250?
> But now I think code is checking status of "c" -command? As socket is not read
> between calls, and there should be data in buffer for c-command. But I cannot be
> sure as I don't know SVDRP command that well.
>
> > c
> < 250 foo
> < 250-foo
> > .
> < 354-not ok
>
> It could still succeed if from socket buffer "250-" is read. Also the 2 substr calls
> in SVDRPreceive is a bit weird, but I am uncertain if regexp would help that. At least
> change "-" to '-'.
>
I have no answer about this as I didn't modify that code ..... it
seems to work so for now I don't touch it.
>
> There is a LOT to improve, but if you do these, you Celeron will fly. I hope you'll get
> to minute scale (or even better). Look with "time xmltv2vdr" to see how much process
> time is used for user code and how much for kernel code. And to see if optimizations help.
>
> Have fun.. :)
Thanks a lot for your hints.
Sebastien
More information about the vdr
mailing list