scraping season/episode info from the event description in EPG data, for EIT streams
Assignee: Adam Sutton
Category: EPG - Grabbers
Feature 2270 implemented user-specified parsing of the event description field in OpenTV EPG data, searching for unique strings that indicate the location of season and episode numbers.
It would be great if a similar parsing mechanism was available for EIT based titles. For example, Canal Digitaal stores a string
(s 3/afl 4)
to indicate season 3, episode 4.
If we could create a file with patterns in epggrab/eit/dict or so, and the eit parser could use it, that would be very helpful.
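To make the request concrete, here is a rough sketch of what such a pattern table could look like, written in Python for brevity rather than in tvheadend's C. The names PATTERNS and scrape_season_episode are made up for illustration, and the regular expression is only my guess at the Canal Digitaal format based on the single sample above.

```python
import re

# Hypothetical pattern table. In the proposal these patterns would be read
# from a file under epggrab/eit/dict; hard-coding them here is an assumption.
PATTERNS = [
    # Canal Digitaal style: "(s 3/afl 4)"
    re.compile(
        r"\(s\s*(?P<season>\d+)\s*/\s*afl\s*(?P<episode>\d+)\)",
        re.IGNORECASE,
    ),
]


def scrape_season_episode(description):
    """Return (season, episode) from the first matching pattern, else None."""
    for pattern in PATTERNS:
        match = pattern.search(description)
        if match:
            return int(match.group("season")), int(match.group("episode"))
    return None


print(scrape_season_episode("Een aflevering van de serie. (s 3/afl 4)"))
```

Adding another provider would then only mean appending a pattern with the same two named groups.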
XMLTV does not help us out much: the networks seem to strip the season/episode info before sending it to tvgids.nl, tvgids.tv and others.
See also Feature 508 and 766 for a sample from Poland.
#1 Updated by Rob vh over 2 years ago
- File episodes.pl added
From the JSON data in dvr/log I was able to parse the (Dutch) description field and rename the recording files so that my NFS-based clients can at least see the SnEnn info. Obviously this doesn't update the info that HTS clients will see, but then, I don't have those. The attached Perl script now runs every morning to rename the catch of the previous day.
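For anyone who wants the idea without the Perl, the renaming step can be sketched roughly as below. This is not a port of episodes.pl. The function name tagged_filename, the example path and the exact file naming are assumptions of mine, and reading the description out of the dvr/log JSON is left out because the field layout is not shown here.

```python
import os
import re

# Assumed Canal Digitaal style marker in the description: "(s 3/afl 4)"
EPISODE_RE = re.compile(r"\(s\s*(\d+)\s*/\s*afl\s*(\d+)\)", re.IGNORECASE)


def tagged_filename(path, description):
    """Return path with an SnEnn tag inserted before the file extension.

    The path is returned unchanged when the description has no marker.
    """
    match = EPISODE_RE.search(description)
    if not match:
        return path
    season, episode = int(match.group(1)), int(match.group(2))
    tag = "S{}E{:02d}".format(season, episode)
    root, ext = os.path.splitext(path)
    return "{}.{}{}".format(root, tag, ext)


# Hypothetical recording path and description.
print(tagged_filename("/recordings/Show.mkv", "Een aflevering. (s 3/afl 4)"))
```

The actual rename would then be an os.rename(old, new) whenever the two names differ.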
#2 Updated by Damian Gołda over 2 years ago
If I understand correctly, https://github.com/tvheadend/tvheadend/blob/master/src/epggrab/module/opentv.c#L391-L416 does more or less the same for OpenTV as what I proposed 3 years ago for EIT in https://tvheadend.org/issues/766
The difference is that opentv.c performs matching on the summary, and I need it on the title.
#3 Updated by Damian Gołda over 2 years ago
#5 Updated by Damian Gołda over 2 years ago
- File epg.png added
See attached screenshot with current EPG (from EIT):
You can see for example:
Title: "CSI: Kryminalne zagadki Las Vegas - s. V, odc. 7"
Summary: "W pokoju hotelowym znaleziono zwłoki dziewczyny. Została uduszona. Okazuje się, że była na imprezie, z której uprowadzono również jej koleżankę. Ojciec porwanej nie jest przejęty tym faktem."
"s. V, odc. 7" uses the abbreviations "s" for "sezon/seria" (season) and "odc" for "odcinek" (episode), and means: "season 5, episode 7".
Other examples:
- "Fala zbrodni - s. III, odc. 31" - season 3, episode 31
- "M jak miłość - odc. 1109" - episode 1109 (season unknown)
- "Obsesje - odc. 5/6" - episode 5 of 6 total
- "Graceland - s. I, odc. 7/13" - season 1, episode 7 of 13 total
- "Autostrada do nieba - odc. 3, Dotknąć księżyca" - episode 3, episode title: "Dotknąć księżyca"
- "Libera - Przewodnik po sztuce - (s. III, odc. 2) - ERNA ROSENSTEIN" - numbers in parentheses and episode title: "ERNA ROSENSTEIN"
- episode between slashes: "Muzeum Polskiej Piosenki - /14/ - "Dziwny jest ten świat" - Czesław Niemen"
- episode number without "odc." abbreviation (odc. for odcinek/episode): "Polonia w Komie - (647) Emigracze 2"
All of them have the episode and season numbers in the title. The description/summary has no information about episodes.
This is common for Polish DVB-T EIT EPG.
Using Roman numerals (I, II, III, IV, V ...) for the season number is also common.
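To show that the title formats above can be handled with a few patterns, here is a rough sketch in Python (not tvheadend code). The function names and the regular expressions are only my assumptions derived from the sample titles listed here, so real EPG data may need more cases.

```python
import re

ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}


def roman_to_int(numeral):
    """Convert a Roman numeral such as "IV" to an integer."""
    values = [ROMAN[ch] for ch in numeral.upper()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (IV = 4).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


# "s. V" or "s. 5": season as a Roman or Arabic number.
SEASON_RE = re.compile(r"\bs\.\s*([IVXLC]+|\d+)")
# "odc. 7" or "odc. 7/13": episode with an optional total.
EPISODE_RE = re.compile(r"\bodc\.\s*(\d+)(?:/(\d+))?")
# Fallbacks without "odc.": "/14/" or "(647)".
BARE_EPISODE_RE = re.compile(r"/(\d+)/|\((\d+)\)")


def parse_title(title):
    """Return (season, episode, total_episodes); unknown parts are None."""
    season = episode = total = None
    match = SEASON_RE.search(title)
    if match:
        token = match.group(1)
        season = int(token) if token.isdigit() else roman_to_int(token)
    match = EPISODE_RE.search(title)
    if match:
        episode = int(match.group(1))
        total = int(match.group(2)) if match.group(2) else None
    else:
        match = BARE_EPISODE_RE.search(title)
        if match:
            episode = int(match.group(1) or match.group(2))
    return season, episode, total


print(parse_title("Graceland - s. I, odc. 7/13"))
```

The bare "/14/" and "(647)" forms are only tried when no "odc." is found, because they are the most likely to match unrelated numbers in a title.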