XMLTV, Kazer & French categories

Added by Stephane Chauveau about 4 years ago

SEE THE REPLY POSTS BELOW FOR AN UPDATED SCRIPT THAT CAN PROCESS ANY INPUT LANGUAGE.

The following information is mostly intended for French users of www.kazer.org, but the scripts below can probably be adapted to other TV services. I am on Ubuntu/Linux using MythTV as frontend.

I assume in the following that the user has a Kazer account and that the tv_grab_fr_kazer command (from package xmltv-utils) is already configured. If so, running the following command should give you a nice XML file.

tv_grab_fr_kazer > tv.xml

Some XBMC themes such as Confluence can colorize TV programs according to their categories, but unfortunately that does not work well with Kazer because the categories are given in French instead of using the names defined in ETSI standard EN 300 468.

Ideally, it should be possible to configure tvheadend to accept other strings, but this is not yet implemented (see the array _epg_genre_names in epg.c), so I made a quick and dirty Perl script to translate the categories.

The first step is to create an executable script /usr/local/bin/tv_grab_fr_kazer_2 containing:

#!/bin/bash
if [ "$1" == "--description" ] ; then 
   echo "France (Kazer2)" 
elif [ "$#" == 0 ] ; then 
  /usr/bin/tv_grab_fr_kazer | /usr/local/bin/category-filter.pl
else
  /usr/bin/tv_grab_fr_kazer "$@" 
fi

The conditions for that script to be recognized as a grabber by xmltv are:
  1. it must be executable and located in one of the $PATH directories used when running tvheadend
  2. its name must start with tv_grab_

XMLTV and Tvheadend should now be aware of a new grabber named "France (Kazer2)", which can be checked from the command line by running the command tv_find_grabbers:

$ tv_find_grabbers
/usr/local/bin/tv_grab_fr_kazer_2|France (Kazer2)
/usr/bin/tv_grab_ch_search|Switzerland (tv.search.ch)
/usr/bin/tv_grab_es_laguiatv|Spain (laguiatv.com)
/usr/bin/tv_grab_huro|Hungary/Romania
...

The file /usr/local/bin/category-filter.pl is given below. It is a perl script that reads an xml file from standard input, translates the categories and emits the result to standard output.

#!/usr/bin/perl -w

#
# The categories recognized by tvheadend (see epg.c) 
#  

my $MOVIE             =    "Movie / Drama";
my $THRILLER          =    "Detective / Thriller";
my $ADVENTURE         =    "Adventure / Western / War";
my $SF                =    "Science fiction / Fantasy / Horror";
my $COMEDY            =    "Comedy";
my $SOAP              =    "Soap / Melodrama / Folkloric";
my $ROMANCE           =    "Romance";
my $HISTORICAL        =    "Serious / Classical / Religious / Historical movie / Drama";
my $XXX               =    "Adult movie / Drama";

my $NEWS              =    "News / Current affairs";
my $WEATHER           =    "News / Weather report";
my $NEWS_MAGAZINE     =    "News magazine";
my $DOCUMENTARY       =    "Documentary";
my $DEBATE            =    "Discussion / Interview / Debate";
my $INTERVIEW         =    $DEBATE ;

my $SHOW              =    "Show / Game show";
my $GAME              =    "Game show / Quiz / Contest";
my $VARIETY           =    "Variety show";
my $TALKSHOW          =    "Talk show";

my $SPORT             =    "Sports";
my $SPORT_SPECIAL     =    "Special events (Olympic Games; World Cup; etc.)";
my $SPORT_MAGAZINE    =    "Sports magazines";
my $FOOTBALL          =    "Football / Soccer";
my $TENNIS            =    "Tennis / Squash";
my $SPORT_TEAM        =    "Team sports (excluding football)";
my $ATHLETICS         =    "Athletics";
my $SPORT_MOTOR       =    "Motor sport";
my $SPORT_WATER       =    "Water sport";

my $KIDS              =    "Children's / Youth programmes";
my $KIDS_0_5          =    "Pre-school children's programmes";
my $KIDS_6_14         =    "Entertainment programmes for 6 to 14";
my $KIDS_10_16        =    "Entertainment programmes for 10 to 16";
my $EDUCATIONAL       =    "Informational / Educational / School programmes";
my $CARTOON           =    "Cartoons / Puppets";

my $MUSIC             =    "Music / Ballet / Dance";
my $ROCK_POP          =    "Rock / Pop";
my $CLASSICAL         =    "Serious music / Classical music";
my $FOLK              =    "Folk / Traditional music";
my $JAZZ              =    "Jazz";
my $OPERA             =    "Musical / Opera";

my $CULTURE           =    "Arts / Culture (without music)";
my $PERFORMING        =    "Performing arts";
my $FINE_ARTS         =    "Fine arts";
my $RELIGION          =    "Religion";
my $POPULAR_ART       =    "Popular culture / Traditional arts";
my $LITERATURE        =    "Literature";
my $FILM              =    "Film / Cinema";
my $EXPERIMENTAL_FILM =    "Experimental film / Video";
my $BROADCASTING      =    "Broadcasting / Press";

my $SOCIAL            =    "Social / Political issues / Economics";
my $MAGAZINE          =    "Magazines / Reports / Documentary";
my $ECONOMIC          =    "Economics / Social advisory";
my $VIP               =    "Remarkable people";

my $SCIENCE           =    "Education / Science / Factual topics";
my $NATURE            =    "Nature / Animals / Environment";
my $TECHNOLOGY        =    "Technology / Natural sciences";
my $DIOLOGY           =    $TECHNOLOGY ;
my $MEDECINE          =    "Medicine / Physiology / Psychology";
my $FOREIGN           =    "Foreign countries / Expeditions";
my $SPIRITUAL         =    "Social / Spiritual sciences";
my $FURTHER_EDUCATION =    "Further education";
my $LANGUAGES         =    "Languages";

my $HOBBIES           =    "Leisure hobbies";
my $TRAVEL            =    "Tourism / Travel";
my $HANDICRAF         =    "Handicraft";
my $MOTORING          =    "Motoring";
my $FITNESS           =    "Fitness and health";
my $COOKING           =    "Cooking";
my $SHOPPING          =    "Advertisement / Shopping";
my $GARDENING         =    "Gardening";

#
# This is the translation table from the French Kazer categories
# to the EN 300 468 categories defined above.
#

my %REPLACE=(
    "Météo"              => $WEATHER ,
    "Film"               => $MOVIE ,
    "Théâtre"            => $PERFORMING,
    "Ballet"             => $OPERA ,
    "Clips"              => $MUSIC ,
    "Concert"            => $MUSIC ,
    "Court métrage"      => $EXPERIMENTAL_FILM,
    "Débat"              => $SOCIAL ,
    "Dessin animé"       => $CARTOON ,
    "Divertissement"     => $VARIETY ,
    "Documentaire"       => $DOCUMENTARY ,
    "Drame"              => $SOAP ,
    "Émission"           => 0,
    "Feuilleton"         => $SOAP ,
    "Fin"                => 0,
    "Fin des programmes" => 0 ,
    "Interview"          => $INTERVIEW ,
    "Jeu"                => $GAME ,
    "Jeunesse"           => $KIDS ,
    "Journal"            => $NEWS ,
    "Loterie"            => 0 ,
    "Magazine"           => $MAGAZINE ,
    "Opéra"              => $OPERA ,
    "Série"              => $MOVIE  ,
    "Spectacle"          => $PERFORMING ,
    "Sport"              => $SPORT ,
    "Talk show"          => $TALKSHOW ,
#    "Téléfilm"           => $MOVIE ,
    "Télé-réalité"       => $VARIETY ,
    "Téléréalité"        => $VARIETY ,
    "Tiercé"             => $SPORT ,
    "Variétés"           => $VARIETY ,
 ) ; 

my $PRE  = '<category lang=\"fr\">' ;
my $POST = '</category>'  ;

sub myfilter {
  my ($a) = @_;
  if ( exists $REPLACE{$a} ) {     
      return $REPLACE{$a} ;
  } else {
      print STDERR "Warning: Unmanaged category: '$a'\n" ;
      return $a ;
  }
}

while (<>) {
    my $line = $_ ;
    $line =~ s/($PRE)(.*)($POST)/"$1".myfilter("$2")."$3"/ge ;
    print $line;
} 

Assuming that you have generated a Kazer XML file as indicated above, you can try the script manually as follows:

   /usr/local/bin/category-filter.pl < tv.xml > new.xml  

The resulting file new.xml should contain categories following the ETSI standard EN 300 468.

Categories that were not recognized, if any, are printed on standard error.

The variables such as $MOVIE and $THRILLER are the EN 300 468 categories. They should not be modified.

The hash %REPLACE can be modified. It provides the translations from the French categories to the EN 300 468 categories. Use 0 for categories that you do not care about. Be aware that tvheadend (or is it XBMC?) does not manage sub-categories well. In practice, that means that all categories from the same group will have the same color in XBMC.

The variables $PRE and $POST specify the regular expression used to perform the replacement. They may have to be modified if you want to adapt the script to another service than Kazer.

For information, the categories in Kazer XML files look like this:

 <category lang="fr">Magazine</category>

Using regular expressions to perform the replacements is ugly but simple. In the future, I may write a longer version using a proper XML parser and advanced features such as selecting the category according to multiple criteria (title, duration, channel, ...).
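
To adapt the script to a service that uses another language, one option is to let the pattern accept any lang attribute instead of hard-coding "fr". The following is only a minimal sketch under stated assumptions, not the script from this post: the helper name translate_line, the shortened %REPLACE table and the sample lines are all made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, shortened translation table (illustration only).
my %REPLACE = (
    "Magazine" => "Magazines / Reports / Documentary",
    "Film"     => "Movie / Drama",
);

# Opening tag with an optional lang attribute of any value, e.g.
# <category>, <category lang="fr"> or <category lang="nl">.
my $PRE  = qr{<category(?:\s+lang="[^"]*")?>};
my $POST = qr{</category>};

# translate_line is a hypothetical helper: it rewrites every category
# element found in one line and leaves unknown categories untouched.
sub translate_line {
    my ($line) = @_;
    # (.*?) is non-greedy so that several categories on one line
    # are each translated separately.
    $line =~ s{($PRE)(.*?)($POST)}
              {$1 . ( exists $REPLACE{$2} ? $REPLACE{$2} : $2 ) . $3}ge;
    return $line;
}

# Small demonstration on two sample lines.
print translate_line(qq{ <category lang="fr">Magazine</category>\n});
print translate_line(qq{ <category lang="nl">Film</category>\n});
```

The sketch keeps the regular-expression approach of the original script; it only loosens the opening tag, so it inherits the same limitations as any line-based XML processing.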


Replies (79)

RE: XMLTV, Kazer & French categories - Added by thierry castelot over 1 year ago

hi Renato,

i think it does, i will try tonight after work and let you know.

RE: XMLTV, Kazer & French categories - Added by Nicolas Rioja over 1 year ago

Stephane Chauveau wrote:

For Nicolas,

The easiest way to log the errors is to redirect the error output stream (number 2) to a file.

For example, you can run the script as follows:

/usr/local/bin/category-filter.pl 2> /tmp/category-filter.log

or if you want to APPEND to the log file, use a double >> instead

/usr/local/bin/category-filter.pl 2>> /tmp/category-filter.log

The alternative is to modify the perl script itself.

Add the following line at the beginning of the script to open the log file in append mode:

open(LOG, ">>", "/tmp/category-filter.log") or die "Can't open LOG file: $!";

Then clone the 'print STDERR' line and replace 'STDERR' by 'LOG':

print STDERR "Warning: Unmanaged category: '$a'\n" ;
print LOG "Warning: Unmanaged category: '$a'\n" ;

If you do not want to repeat the same error hundreds or thousands of times, you can remember the unknown categories as follows, so that a single error is emitted for each.

At the beginning of the script, create an empty map:

my %BAD ;

Then modify the prints to STDERR and LOG as follows:

if ( ! exists $BAD{$a} ) {
    print STDERR "Warning: Unmanaged category: '$a'\n" ;
    print LOG "Warning: Unmanaged category: '$a'\n" ;
    # Record in BAD map so next error won't produce a message
    $BAD{$a} = 1 ;
}
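
Put together, the log file and BAD map changes described above might look like the sketch below. It is an untested-in-production illustration, not the attached script: the shortened %REPLACE table and the sample categories are assumptions, and the parameter is named $cat rather than $a purely for readability. Every opening brace has its matching closing brace, which is the usual cause of the "Missing right curly" error.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, shortened translation table (illustration only).
my %REPLACE = ( "Film" => "Movie / Drama" );

# Categories already reported, so that each one is logged only once.
my %BAD;

# Log file path as given in the post above; opened in append mode.
open(my $LOG, ">>", "/tmp/category-filter.log")
    or die "Can't open LOG file: $!";

sub myfilter {
    my ($cat) = @_;
    if ( exists $REPLACE{$cat} ) {
        return $REPLACE{$cat};
    }
    if ( !exists $BAD{$cat} ) {
        print STDERR "Warning: Unmanaged category: '$cat'\n";
        print $LOG   "Warning: Unmanaged category: '$cat'\n";
        # Record in BAD map so the next occurrence produces no message.
        $BAD{$cat} = 1;
    }
    return $cat;
}

# Small demonstration: "Inconnu" is reported once, not three times.
print myfilter($_), "\n" for ( "Film", "Inconnu", "Inconnu", "Inconnu" );
```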

Hi Stephane,
I'm trying to implement your script with the BAD map now, because for some time I have been getting more unmanaged categories since I added more grabbers to fill my .xml file.

The problem is that I'm doing something wrong, since I'm receiving this output:

Missing right curly or square bracket at /share/CACHEDEV1_DATA/homes/nico/wg++/categorias/cambia_categorias.pl line 330, at end of line
syntax error at /share/CACHEDEV1_DATA/homes/nico/wg++/categorias/cambia_categorias.pl line 330, at EOF
Execution of /share/CACHEDEV1_DATA/homes/nico/wg++/categorias/cambia_categorias.pl aborted due to compilation errors.

I've tried some things but without success. Please check the script attached to this post and tell me what is wrong.

Thank you very much

RE: XMLTV, Kazer & French categories - Added by james Bond over 1 year ago

thierry castelot wrote:

i assume that perl is already installed into your pi3 and have the same path as ubuntu.

move the two tv_grab into /usr/bin and category into /usr/local/bin, make them executable and restart tvh, you should be able to pick tv_grab_fr_alacarte_2.

many thanks : it works!

At least partially, since not all categories are colored inside Kodi.
But at least the grabber is displayed in the Tvheadend interface.

EDIT: IT IS 100% WORKING! And it is maybe 20 times faster than the original script.
That was my fault...
First, I was using your zguidetv account :D
Second, I had trash EPG data preventing the correct mapping of the new EPG data.

RE: XMLTV, Kazer & French categories - Added by thierry castelot over 1 year ago

@ james Bond

you're welcome :)

@ Renato

works perfectly on a Synology ds112j and it's much, much faster than my previous script.

you just need to update categories.pl to minimize the number of unmanaged categories.

RE: XMLTV, Kazer & French categories - Added by james Bond over 1 year ago

I think your modded script should be posted in its own thread since it is working unlike 98% of the code posted on this thread :D

RE: XMLTV, Kazer & French categories - Added by Renato Moscardini over 1 year ago

@Thierry,
Many thanks, I will try it

RE: XMLTV, Kazer & French categories - Added by thierry castelot about 1 year ago

i made some corrections on this one.

category-filter.pl (10.4 KB)

RE: XMLTV, Kazer & French categories - Added by John Mcenroy 12 months ago

Thank you very much for the script. But how can I make it case insensitive,
so that "Fin" = "fin"?

I have found the function ucfirst() to make the first letter upper case. Maybe
this can help here, but I can't work out how to use it.

Thanks

RE: XMLTV, Kazer & French categories - Added by Stephane Chauveau 12 months ago

The lc() function converts a string to lowercase, so replace the while loop at the end of the script with:


foreach my $key (keys %REPLACE) {
    $REPLACE{lc($key)} = $REPLACE{$key}  ;
}

while (<>) {
    my $line = $_ ;
    $line =~ s/($PRE)(.*)($POST)/"$1".myfilter(lc("$2"))."$3"/ge ;
    print $line;
} 

The purpose of the foreach loop is to clone each entry in %REPLACE with a lowercase key.

In the while loop, lc() is applied to the argument passed to the myfilter function.

RE: XMLTV, Kazer & French categories - Added by John Mcenroy 12 months ago

Thanks Stephane, but the problem is that all the entries I have in the script start with an upper-case letter,
for example "Fin" => 0, and my provider's xmltv has both "Fin" and "fin".
So, as I understand it, I need to upper-case the first letter of the category entries in the xmltv.

RE: XMLTV, Kazer & French categories - Added by Stephane Chauveau 12 months ago

My previous post using lc is not tested.

Be aware that if you already have lowercase entries in %REPLACE then they may be replaced by a corresponding non-lowercase value.

If this is a problem then you may want to use something like this to prevent existing entries from being overwritten:


foreach my $key (keys %REPLACE) {
  if (!exists $REPLACE{lc($key)} ) {
    $REPLACE{lc($key)} = $REPLACE{$key}  ;
  }
}

RE: XMLTV, Kazer & French categories - Added by Stephane Chauveau 12 months ago

Ho! Your keyboard is bro

RE: XMLTV, Kazer & French categories - Added by Stephane Chauveau 12 months ago

The while loop is calling myfilter(lc("$2")). For example, if your provider uses "fin", "Fin" or "FIN" then myfilter will always be called with "fin".

So that means that %REPLACE only needs to contain the lowercase key.

The foreach loop takes care of that.

The original non-lowercase values such as "Fin" are still present in %REPLACE but they will never be used.
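
The behaviour described above can be checked in isolation with a small sketch. The helper name lookup, the shortened %REPLACE table and the sample inputs are assumptions for illustration; the foreach loop is the non-overwriting variant from the earlier post. Only plain ASCII categories are used here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, shortened translation table (illustration only).
my %REPLACE = (
    "Fin"   => 0,
    "Movie" => "Movie / Drama",
);

# Clone each entry under a lowercase key, without overwriting
# a lowercase entry that already exists.
foreach my $key ( keys %REPLACE ) {
    my $lower = lc($key);
    $REPLACE{$lower} = $REPLACE{$key} unless exists $REPLACE{$lower};
}

# lookup is a hypothetical stand-in for myfilter(lc("$2")):
# the category is lowercased before it is searched in %REPLACE.
sub lookup {
    my ($cat) = @_;
    my $key = lc($cat);
    return exists $REPLACE{$key} ? $REPLACE{$key} : $cat;
}

# "Fin", "fin" and "FIN" all resolve through the same lowercase key.
print "$_ -> ", lookup($_), "\n" for ( "Fin", "fin", "FIN", "movie", "Unknown" );
```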

RE: XMLTV, Kazer & French categories - Added by John Mcenroy 12 months ago

I can't make it work. My provider uses both forms in the same xmltv file, for example "Fin" and "fin". So there is a category "Movie" and a category "movie" in the same xmltv file. "Movie" works and "movie" does not. I am very much a beginner with Perl :)

RE: XMLTV, Kazer & French categories - Added by Stephane Chauveau 12 months ago

Strange. That should work!
Can you send me the script you are currently using and a sample XML input file?

RE: XMLTV, Kazer & French categories - Added by John Mcenroy 12 months ago

Strange. I have just tested the original script with samples. Everything works. I don't understand it :)

P.S. It seems that I am very tired, because I had put both Movie and movie in the replace script :)

So it must be

my %REPLACE=(

    "Movie"                           => $MOVIE ,
 ) ;

In this case there would be one missing category: movie.
This sample works with your script. Thanks. The error seems to be somewhere on my side.

Update:
Your script works well. Thanks again. But I have found a problem:
it works only for English letters. If there are Chinese, Russian, Greek or other such letters,
it doesn't work and must be modified with these lines:

use utf8 ;
use Encode ;

foreach my $key (keys %REPLACE) {
    $REPLACE{lc($key)} = $REPLACE{$key}  ;
}

while (<>) {
    my $line = $_ ;
    $line = decode_utf8($line);
    $line =~ s/($PRE)(.*)($POST)/"$1".myfilter(lc("$2"))."$3"/ge ;
    $line = encode_utf8($line);
    print $line;
}
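
Why decoding matters can be seen in a small sketch. On an undecoded UTF-8 byte string, lc() in a plain Perl script only changes ASCII letters, so an accented capital is left alone; after decode_utf8() the string holds characters and lc() lowercases it. The sample word and the variable names below are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# UTF-8 bytes for the word "Émission" (É is the two bytes C3 89).
my $bytes = "\xC3\x89mission";

# lc() on the raw bytes: the two bytes of the accented capital
# are not ASCII letters, so they stay unchanged.
my $lower_bytes = lc($bytes);

# Decode to characters, lowercase, then encode back to UTF-8 bytes:
# É (C3 89) becomes é (C3 A9).
my $lower_chars = encode_utf8( lc( decode_utf8($bytes) ) );

print "lc on bytes changed the string:      ",
      ( $lower_bytes eq $bytes ? "no" : "yes" ), "\n";
print "lc on characters changed the string: ",
      ( $lower_chars eq $bytes ? "no" : "yes" ), "\n";
```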

test.xml (74 Bytes)

xmltv_convert_categories.pl (543 Bytes)

RE: XMLTV, Kazer & French categories - Added by Meindert Oldenburger 7 months ago

I like to have each "Unmanaged category" reported only once, together with how often it appears, ordered by count:

Add/replace the following code:

my %Categories;
my $PRE = '<category lang=\"nl\">';
my $POST = '</category>';

sub myfilter {
    my ($a) = @_;
    if (exists $REPLACE{$a}) {
        return $REPLACE{$a} ;
    } else {
        if (exists $Categories{$a}) {
            $Categories{$a} = $Categories{$a} + 1;
        } else {
            $Categories{$a} = 1;
        }
        return $a ;
    }
}

while (<>) {
    my $line = $_ ;
    $line =~ s/($PRE)(.*)($POST)/"$1".myfilter("$2")."$3"/ge ;
    print $line;
}

foreach my $category (sort { $Categories{$a} <=> $Categories{$b} } keys %Categories) {
    print STDERR "WARNING: Unmanaged category: '$category' is $Categories{$category}\n";
}

RE: XMLTV, Kazer & French categories - Added by Alexandre E 3 months ago

The last XMLTV source is not available anymore:
http://xmltv.dtdns.net/alacarte/

Has anyone found a different source?
In the end, has anyone managed to get KAZER to feed Tvheadend on Synology DSM6+?
Right now, I am stuck with no program input, which is a shame as all the rest has proved to work perfectly!

Thanks for sharing

RE: XMLTV, Kazer & French categories - Added by thierry castelot 3 months ago

hello Alexandre,

I will take a look this weekend. I've found some new XML sources, but I need to rewrite category.pl to match them.

RE: XMLTV, Kazer & French categories - Added by Alexandre E 3 months ago

Thierry,
Thanks for your answer !
Actually, I am a bit disappointed about the loss of programs.
Even a straight XML input (without categories) would please me enough :-)
Would be nice if we could get in touch.
Is my email available in my profile ?

Thank you very much for your contribution.

Regards

RE: XMLTV, Kazer & French categories - Added by thierry castelot 3 months ago

Alexandre,

just replace your tv_grab_fr_alacarte with this one, it should work. I will update category.pl later.

tv_grab_fr_alacarte (340 Bytes)

RE: XMLTV, Kazer & French categories - Added by Renato Moscardini 3 months ago

Hello Thierry,
Many thanks for your solution.
Is there a way to have the episode number in the episode column of tvheadend, not in the title?
That way I could manage the naming of files as I prefer.

Anyway, in the meantime, it is a nice solution.

Kind regards

RE: XMLTV, Kazer & French categories - Added by Alexandre E 3 months ago

Many thanks Thierry
I am off for a few days, and will try and report as soon as I am back

Once again thanks for your help !

RE: XMLTV, Kazer & French categories - Added by Alexandre E 3 months ago

Works again!
But, for some reason, the categories don't work (they never have).
In the log, I can see many, many lines with "UNMANAGED CATEGORY", as if it knows none of them.
Is there something I have missed?

RE: XMLTV, Kazer & French categories - Added by thierry castelot 3 months ago

hello Renato,

according to racacax (author of this xmltv) it should be fixed soon:

" ◘ Change to how series episodes are displayed. The changes will be taken into account progressively and should be fully functional as of next week. This affects the majority of channels."

<programme start="20170909190000 +0200" stop="20170909194500 +0200" channel="6ter">
<title>Rénovation impossible</title>
<sub-title lang="fr">Ambiance country</sub-title>
<episode-num system="onscreen">S05E09</episode-num>

Alex, it's sadly normal. I'm waiting for a new update from racacax to clean up the xmltv (there are too many categories because he put the programme length into the categories).
