Samir Parikh / Blog


Originally published on 31 December 2019

Background

If you have a toddler in the house, chances are that you are quite familiar with the antics of the super pups Chase, Marshall, Zuma, Skye and the rest of the gang known as "PAW Patrol". It happens to be my son's favorite show, which we try to record on the TiVo whenever it airs on Nickelodeon. The challenge is that the episodes don't air as often as we'd like, and when we do record them, we have to deal with the inevitable commercials (or have someone fast-forward through them). The good news, though, is that Nickelodeon makes a few full episodes of this and other shows available on its kids' site, nickjr. Since our TV has a crude but built-in web browser[1], we can stream these episodes directly to the TV, but the process is clunky and the stream sometimes pauses between chapters. Having recently been playing around with Plex, I thought to myself, "How hard would it be to download the episodes into our library and then play them off our Plex server?"[2] It turned out to be much more difficult than I imagined, but it ultimately allowed me to sharpen my command line skills in the process!

Downloading the Episodes

The first thing I had to do was download the episodes locally. To do this, I created a file called episodeURLs.txt containing nothing but the URLs of the episodes I wanted to download, one per line:

http://www.nickjr.com/paw-patrol/videos/paw-patrol-4-pups-save-a-white-wolf-pups-save-a-wrong-way-explorer-s6-ep619-full-episode/
http://www.nickjr.com/paw-patrol/videos/paw-patrol-4-pups-save-a-bat-family-pups-save-a-mud-monster-s6-ep614-full-episode/
http://www.nickjr.com/paw-patrol/videos/paw-patrol-322-the-pups-winter-wonder-show-s3-ep322-full-episode/
http://www.nickjr.com/paw-patrol/videos/paw-patrol-8-pups-save-the-balloon-pups-pups-save-the-spider-spies-s6-ep616-full-episode/
http://www.nickjr.com/paw-patrol/videos/paw-patrol-1-pups-save-the-squirrels-pups-save-a-roo-s6-ep620-full-episode/
...

Next, I wrote a simple Bash script which downloads the episodes locally using the powerful command line tool youtube-dl. If you've never played around with youtube-dl, I really encourage you to give it a go. The syntax is crazy simple. Just append the URL of the video you're trying to download to the command:

$ youtube-dl https://www.youtube.com/watch?v=dQw4w9WgXcQ

and it will download a copy of the video to your current directory. Our Bash script, downloadScript.sh, only has to read through the URLs in episodeURLs.txt and hand each one to youtube-dl, so it is pretty straightforward:[3]

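#!/bin/bash
EPISODES="episodeURLs.txt"
# Read one URL per line; the || test also catches a last line with no trailing newline
while IFS= read -r url || [ -n "$url" ]
    do
        [ -z "$url" ] && continue                # skip blank lines
        # -f best: best single file with both video and audio; </dev/null keeps
        # anything youtube-dl runs from swallowing the rest of the URL list
        youtube-dl -f best "$url" < /dev/null
    done < "$EPISODES"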

You can run the script via the commands:

$ chmod +x downloadScript.sh
$ ./downloadScript.sh

But once you peek in the directory where the files were just downloaded, you'll notice a big problem: each episode is represented not by one mp4 file but by five, sometimes six! In addition, the filenames themselves are a mess and come nowhere near Plex's notoriously opinionated naming convention. Here's what the filenames look like for just the first episode in the download list:

PAW Patrol - 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S1-c4d8ef2c-97e8-4963-b667-322dd2a0f0d8.mp4
PAW Patrol - 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S2-0f6bcf44-e577-4d43-b560-ecde897bdb39.mp4
PAW Patrol - 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S3-f393af48-c3c7-42f9-a42a-2dd943d97aa9.mp4
PAW Patrol - 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S4-d5a9f7f1-86d5-4f68-b88f-49be24c46b3b.mp4
PAW Patrol - 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S5-b9fd8cc5-3a0a-4472-8700-26f849699bb7.mp4
PAW Patrol - 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S6-2d31faaa-bf1d-453e-9f9f-a241b0a9f805.mp4

I tried searching for a way to have youtube-dl download each episode as a single file but couldn't find one.[4] So I set myself the task of writing another Bash script that would both consolidate the separate files into one mp4 file and rename it to something that would let Plex fetch the appropriate metadata, so that I could find the right episode in my library.

I knew from some prior research that the extremely powerful command line utility ffmpeg can concatenate several video files into one.[5] One way to do this is with its concat demuxer, which takes as input a text file listing the individual files you want to join; the name of the output file goes on the command line, for example:

$ ffmpeg -f concat -safe 0 -i mylist.txt -c copy output.mp4
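For reference, the list file is plain text with one file directive per line, and, as far as I can tell from the concat demuxer documentation, relative paths in it are resolved against the directory the list file lives in. A minimal sketch that writes such a list for the six cleaned-up segment names used later in this post:

```bash
#!/bin/bash
# One "file '<name>'" line per segment; {1..6} expands to S1 through S6 in order
printf "file '%s'\n" PAWPatrol-PupsSaveaWhiteWolf_PupsSaveaWrongWayExplorerS{1..6}.mp4 > mylist.txt
cat mylist.txt
```

Names made up only of letters, digits, underscores, hyphens and periods count as "safe" to the concat demuxer, so -safe 0 mostly matters when the list holds names with spaces or quotes, like the raw youtube-dl output.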

For some reason, as I originally read through the documentation, I thought that I would first have to clean up the filenames of the downloaded video segments before listing them in the input text file. Most of the rest of this blog post details how I ended up using another powerful command line program, awk, to do that.[6]

"No Job is Too Big, No awk is Too Small!"

Before generating the mylist.txt file that I would feed to ffmpeg, I wanted to clean up the filenames generated by youtube-dl. Specifically, I wanted to remove the spaces and single quotation marks from the filename. In addition, I wanted to jettison the characters appearing after the S1, S2, etc. in each of the filenames. In other words, I wanted to take a filename like:

PAW Patrol - 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S1-c4d8ef2c-97e8-4963-b667-322dd2a0f0d8.mp4

and convert it into something like:

PAWPatrol-PupsSaveaWhiteWolf_PupsSaveaWrongWayExplorerS1.mp4

There are many Unix command line tools that could help with this, such as sed or tr, not to mention plain regular expressions. I have briefly dabbled with sed but have always wanted to learn how to use awk, and based on some superficial Stack Overflow advice, going with awk seemed to make sense. So the first step for me was to see if I could loop through the files in the directory and print the filenames out "as is" using awk.
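For comparison, here is a rough sketch of the sed route. clean_name is a hypothetical helper of my own, and it assumes the suffix youtube-dl appends is always a hyphen, a UUID in the usual 8-4-4-4-12 hex layout, and .mp4, which matches the filenames above. Because it anchors on that UUID instead of splitting on every dash, a hyphen inside an episode title would survive.

```bash
#!/bin/bash
# Hypothetical helper: clean one youtube-dl segment filename with sed instead of awk
clean_name() {
    # 1st expression: drop the "-<uuid>" that sits right before .mp4
    # 2nd expression: delete every single quote and space
    sed -E "s/-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}\.mp4\$/.mp4/; s/[' ]//g" <<< "$1"
}

clean_name "PAW Patrol - 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S1-c4d8ef2c-97e8-4963-b667-322dd2a0f0d8.mp4"
# PAWPatrol-PupsSaveaWhiteWolf_PupsSaveaWrongWayExplorerS1.mp4
```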

Note: Going forward, I am going to assume that your current working directory contains only the individual files from one episode. The script I ultimately developed cannot process a folder containing the files for every episode downloaded by downloadScript.sh. Therefore, you have to manually copy just one episode's files into the directory containing the rename.sh script we are writing.

My first attempt at creating the rename.sh script using awk looked something like this:

#!/bin/bash
for file in *.mp4
    do
        # <<< (a "here string") feeds the value of $file to awk's standard input
        newfile=$(awk '{ print $0 }' <<< "$file")
        echo "$newfile"
    done

While this doesn't do anything more than print the filenames of the mp4 files in the directory, it did take me some work to figure out how to pass the $file variable to awk. I ended up going with a here string (<<<), as suggested by another Stack Overflow answer, which feeds the variable's contents to awk's standard input. In the version above, the $0 in the print command refers to the entire input line, i.e. the whole filename. Next I wanted to drop everything between the S1 portion of the filename and the extension (i.e. get rid of the c4d8ef2c... string). To do that, I decided to use awk's field separator option (-F) to break the filename into fields separated by the dash, or -. The resulting update looks like:

#!/bin/bash
for file in *.mp4
    do
        # -F"-" splits the name at every dash; keep only the first two fields
        newfile=$(awk -F"-" '{ print $1 "-" $2 ".mp4" }' <<< "$file")
        echo "$newfile"
    done

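where $1 is the first field (i.e. "PAW Patrol ", trailing space included) and $2 is the second field (i.e. " 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S1"). The print statement joins them back together with a dash and tacks on the .mp4 extension, leaving out the remaining fields that make up the UUID. Running this version of the script results in filenames that now look like: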

PAW Patrol - 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S1.mp4
PAW Patrol - 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S2.mp4
PAW Patrol - 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S3.mp4
PAW Patrol - 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S4.mp4
PAW Patrol - 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S5.mp4
PAW Patrol - 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S6.mp4
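To see exactly what -F"-" hands to print, it can help to wrap the fields in brackets. This quick check on the first filename shows that $1 and $2 still carry the spaces around the dash, and that the UUID contributes five more fields (seven in total). The second command uses a made-up title containing a hyphen to show the catch with this approach: $2 gets cut short at the title's own dash.

```bash
#!/bin/bash
file="PAW Patrol - 'Pups Save a White Wolf_Pups Save a Wrong Way Explorer' S1-c4d8ef2c-97e8-4963-b667-322dd2a0f0d8.mp4"

# Field count, then $1 and $2 in brackets so stray spaces are visible
awk -F"-" '{ print NF; print "[" $1 "]"; print "[" $2 "]" }' <<< "$file"

# Made-up title with a hyphen: $2 now stops at "Wrong"
awk -F"-" '{ print $1 "-" $2 ".mp4" }' <<< "Show - 'A Wrong-Way Trip' S2-2d31faaa-bf1d-453e-9f9f-a241b0a9f805.mp4"
```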

We are getting close, but not quite where I want to be. The last awk power I wanted to summon was removing the spaces and single quotation marks from the filename. Fortunately, Stack Overflow came to the rescue again, introducing me to awk's string-manipulation functions, specifically gsub. This function performs a global substitution (i.e. "find and replace") and takes three arguments: (1) a regular expression describing the text you want to find, (2) the text you want to replace it with, and (3) an optional target (e.g. $1, $2) for the substitution. If no target is provided, gsub operates on the entire record ($0). While the gsub syntax is fairly simple, figuring out the regular expression to match either a single quote or a space was not so easy. I figured the basic form would be something like /[|]/, with the pipe (|) denoting an "or" condition. (As it turns out, a bracket expression already matches any one of the characters listed inside it, so the pipe isn't needed; inside the brackets it is simply one more literal character to match.) The challenge was how to represent the single quote ('). I tried a variety of options, such as:

/['| ]/
/[\'| ]/
/["'"| ]/

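but nearly all of them resulted in "unexpected EOF while looking for matching" and "unexpected end of file" errors. Those errors come from Bash, not awk: the entire awk program is wrapped in single quotes, and a single-quoted Bash string cannot contain a single quote, so Bash treated the first ' in the regex as the end of the program. I finally stumbled upon a Stack Overflow answer that suggested using the octal code for the single quotation mark, \047, which awk understands inside the regex. Adding the gsub function with the correct regular expression to the awk statement gets us to the following script: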

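#!/bin/bash
for file in *.mp4
    do
        # gsub deletes every character in the bracket expression (the single
        # quote \047, the | and the space) before the fields are printed
        newfile=$(awk -F"-" '{ gsub(/[\047| ]/, "");
            print $1 "-" $2 ".mp4" }' <<< "$file")
        echo "$newfile"
    done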

When run, we finally get the new filenames we have been looking for:

PAWPatrol-PupsSaveaWhiteWolf_PupsSaveaWrongWayExplorerS1.mp4
PAWPatrol-PupsSaveaWhiteWolf_PupsSaveaWrongWayExplorerS2.mp4
PAWPatrol-PupsSaveaWhiteWolf_PupsSaveaWrongWayExplorerS3.mp4
PAWPatrol-PupsSaveaWhiteWolf_PupsSaveaWrongWayExplorerS4.mp4
PAWPatrol-PupsSaveaWhiteWolf_PupsSaveaWrongWayExplorerS5.mp4
PAWPatrol-PupsSaveaWhiteWolf_PupsSaveaWrongWayExplorerS6.mp4
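For comparison, here is a small side-by-side of two ways to get a single quote into an awk program that lives inside a single-quoted Bash string: the \047 escape used above, and the common shell idiom '\'' (end the quoted string, add an escaped quote, start a new quoted string). Both should print the same cleaned-up name. I've also left the | out of the bracket expression here, since [...] already means "any one of these characters".

```bash
#!/bin/bash
name="PAW Patrol - 'Pups Save a White Wolf' S1"

# Option 1: octal escape \047 inside the awk regex
awk '{ gsub(/[\047 ]/, ""); print }' <<< "$name"

# Option 2: '...'\''...' ends the quoted string, adds a literal ', and reopens it,
# so awk receives the program { gsub(/[' ]/, ""); print }
awk '{ gsub(/['\'' ]/, ""); print }' <<< "$name"
```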

Now we can replace the echo command in the script with mv, which will actually rename the files:

#!/bin/bash
for file in *.mp4
    do
        newfile=$(awk -F"-" '{ gsub(/[\047| ]/, "");
            print $1 "-" $2 ".mp4" }' <<< "$file")
        mv "$file" "${newfile// /}"    # rename the segment in place
    done

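You'll note the unusual (at least to me) format of how the $newfile variable is specified in the mv command. Without it, I kept getting an "mv: target is not a directory" error. I adopted this approach based on (you guessed it) a Stack Overflow answer, without really understanding why it worked. In hindsight, ${newfile// /} is Bash's pattern substitution, which expands $newfile with every space removed; since gsub has already stripped the spaces, it changes nothing here. The error was most likely caused by missing double quotes around $file: unquoted, the original filename is split at every space into several arguments, and when mv receives more than two arguments it expects the last one to be a directory. With the rename working, it's fairly trivial to augment the script to generate the mylist.txt file that we will feed to ffmpeg to concatenate the files: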

#!/bin/bash
CONCAT_FILE="mylist.txt"
for file in *.mp4
    do
        newfile=$(awk -F"-" '{ gsub(/[\047| ]/, "");
            print $1 "-" $2 ".mp4" }' <<< "$file")
        mv "$file" "${newfile// /}"
        # Add a "file '<name>'" line to ffmpeg's concat list
        echo "file '$newfile'" >> "$CONCAT_FILE"
    done
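A quick experiment makes the mv line less mysterious. The first command shows the ${var// /} pattern substitution deleting every space, and set -- (which loads its arguments into $1, $2 and so on, so $# counts them) shows how many arguments an unquoted versus a quoted filename turns into. The filename here is a shortened example.

```bash
#!/bin/bash
file="PAW Patrol - 'Pups Save a White Wolf' S1.mp4"

# ${var//pattern/replacement}: replace every space with nothing
echo "${file// /}"    # PAWPatrol-'PupsSaveaWhiteWolf'S1.mp4

set -- $file          # unquoted: split at every space
echo "$#"             # 9
set -- "$file"        # quoted: a single argument
echo "$#"             # 1
```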

Adding the ffmpeg Command and File Cleanup

Now that the script creates the mylist.txt file for us, we can turn our attention to crafting the ffmpeg statement that concatenates all of the files into one mp4 file, based on the command given near the beginning of this post. As I mentioned earlier, Plex is extremely opinionated about how media files are named, especially if you want it to use and display the correct metadata in your library. For TV shows, Plex uses TheTVDB.com's database and episode numbering. That means for each episode, you need to find how TheTVDB.com categorizes it. For example, the "PAW Patrol" episode we have been working with aired as part of Season 6 of the series, which I gleaned from the s6 in the URL used to download it. According to TheTVDB.com's page for Season 6, this show actually consists of two episodes, numbered 37 and 38.

Therefore, according to Plex's naming convention for files that contain multiple episodes, the new filename should be something like:

Paw Patrol - s06e37-e38 - Pups Save a White Wolf_Pups Save a Wrong Way Explorer.mp4
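Working these names out by hand gets tedious, so here is a sketch of how the pieces could be assembled in Bash. Both functions are hypothetical helpers of my own: season_from_url pulls the season out of a nickjr URL, assuming the -s<season>-ep<code>- pattern seen in episodeURLs.txt holds for every episode, and plex_name formats a Plex-style name. The TheTVDB.com episode numbers still have to be looked up by hand, since nickjr's code (ep619 here) doesn't match TheTVDB.com's numbering (37 and 38).

```bash
#!/bin/bash
# Hypothetical helper: pull the season out of a slug like "...-s6-ep619-full-episode/"
season_from_url() {
    local re='-s([0-9]+)-ep[0-9]+-'
    if [[ $1 =~ $re ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        return 1    # pattern not found
    fi
}

# Hypothetical helper: "SHOW - sXXeYY[-eZZ] - TITLE.mp4"
# Usage: plex_name SHOW SEASON FIRST_EP LAST_EP TITLE
plex_name() {
    local show=$1 title=$5
    # 10# forces base 10, so an input like "08" is not treated as octal
    local season=$((10#$2)) first=$((10#$3)) last=$((10#$4))
    if (( first == last )); then
        printf '%s - s%02de%02d - %s.mp4\n' "$show" "$season" "$first" "$title"
    else
        printf '%s - s%02de%02d-e%02d - %s.mp4\n' "$show" "$season" "$first" "$last" "$title"
    fi
}

url="http://www.nickjr.com/paw-patrol/videos/paw-patrol-4-pups-save-a-white-wolf-pups-save-a-wrong-way-explorer-s6-ep619-full-episode/"
season=$(season_from_url "$url")
plex_name "Paw Patrol" "$season" 37 38 "Pups Save a White Wolf_Pups Save a Wrong Way Explorer"
# Paw Patrol - s06e37-e38 - Pups Save a White Wolf_Pups Save a Wrong Way Explorer.mp4
```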

With that sorted, we can now add our ffmpeg command to our script:

#!/bin/bash
CONCAT_FILE="mylist.txt"
for file in *.mp4
    do
        newfile=$(awk -F"-" '{ gsub(/[\047| ]/, "");
            print $1 "-" $2 ".mp4" }' <<< "$file")
        mv "$file" "${newfile// /}"
        echo "file '$newfile'" >> "$CONCAT_FILE"
    done
# Join the listed segments into one file without re-encoding (-c copy)
ffmpeg -f concat -safe 0 -i "$CONCAT_FILE" -c copy "Paw Patrol - s06e37-e38 - Pups Save a White Wolf_Pups Save a Wrong Way Explorer.mp4"

Finally, we can add a few lines to clean up our renaming directory after the script runs and to move the final concatenated file into the correct Season folder for Plex to see:

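#!/bin/bash
set -e    # stop at the first error so the segments survive a failed ffmpeg run
CONCAT_FILE="mylist.txt"
TARGET_DIR="/home/samir/Videos/Paw Patrol/Season 06"
: > "$CONCAT_FILE"    # start each run with an empty list
for file in *.mp4
    do
        newfile=$(awk -F"-" '{ gsub(/[\047| ]/, "");
            print $1 "-" $2 ".mp4" }' <<< "$file")
        mv "$file" "${newfile// /}"
        echo "file '$newfile'" >> "$CONCAT_FILE"
    done
ffmpeg -f concat -safe 0 -i "$CONCAT_FILE" -c copy "Paw Patrol - s06e37-e38 - Pups Save a White Wolf_Pups Save a Wrong Way Explorer.mp4"
# Remove the list and the renamed segments, then move the episode into its Season folder
rm "$CONCAT_FILE"
rm PAWPatrol*
mv Paw* "$TARGET_DIR"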

Future Enhancements

The script we just created isn't pretty, user-friendly or portable to other shells, but it works. If I had more time to spend on it, I would have it accept command line arguments for the episode's season, episode number(s) and new title. That way, I wouldn't have to edit the line containing the ffmpeg command each time I processed a new episode. At a certain point, I just ran out of steam and wanted to process the episodes I had already downloaded and move on. If nickjr posts additional shows that my kids like, I may go back and update the script to do this.
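As a starting point, here is a hypothetical sketch of that argument handling using Bash's getopts builtin. The option letters (-s, -e, -l, -t), the function name and the usage message are my own choices; the function validates its inputs and prints the output filename, which the script could then hand to ffmpeg in place of the hard-coded one.

```bash
#!/bin/bash
# Hypothetical sketch: parse -s SEASON -e FIRST_EP [-l LAST_EP] -t TITLE
# and print a Plex-style output name (show name fixed for simplicity).
parse_episode_args() {
    local OPTIND=1 opt season="" first="" last="" title=""
    while getopts "s:e:l:t:" opt; do
        case $opt in
            s) season=$OPTARG ;;
            e) first=$OPTARG ;;
            l) last=$OPTARG ;;
            t) title=$OPTARG ;;
            *) return 2 ;;    # getopts has already printed an error
        esac
    done
    last=${last:-$first}      # single episode unless -l is given

    # Season and episode numbers must be digits; the title must not be empty
    if ! [[ $season =~ ^[0-9]+$ && $first =~ ^[0-9]+$ && $last =~ ^[0-9]+$ && -n $title ]]; then
        echo "usage: rename.sh -s SEASON -e FIRST_EP [-l LAST_EP] -t TITLE" >&2
        return 1
    fi

    # 10# forces base 10 so "08" isn't read as octal
    if (( 10#$first == 10#$last )); then
        printf 'Paw Patrol - s%02de%02d - %s.mp4\n' "$((10#$season))" "$((10#$first))" "$title"
    else
        printf 'Paw Patrol - s%02de%02d-e%02d - %s.mp4\n' \
            "$((10#$season))" "$((10#$first))" "$((10#$last))" "$title"
    fi
}

parse_episode_args -s 6 -e 37 -l 38 -t "Pups Save a White Wolf_Pups Save a Wrong Way Explorer"
```

Inside rename.sh itself, something like OUTPUT=$(parse_episode_args "$@") || exit 1 near the top, followed by ffmpeg -f concat -safe 0 -i "$CONCAT_FILE" -c copy "$OUTPUT", would do the trick.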

Another (but much more difficult) improvement would be to let the script go through a directory containing the segments of multiple episodes and stitch each episode together. I presume this would require more complex logic, as the script would have to determine the final output filename for each episode on its own, perhaps by making some type of API call to TheTVDB.com. For now, this is well beyond the limits of my ability, and I would probably just resort to using something like Sonarr.
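I haven't tried this, but the first half of the problem, working out which segment files belong together, looks approachable in plain Bash. A minimal sketch, assuming every segment follows the youtube-dl naming shown earlier (title, a space, S<n>, a hyphen, then a UUID): list_episodes strips that suffix to get one key per episode, and segments_of lists an episode's files in numeric order using GNU sort's -V, so a hypothetical S10 wouldn't land before S2. Both function names are my own, and naming each episode for Plex would remain the hard part.

```bash
#!/bin/bash
# Hypothetical sketch: group youtube-dl segment files by episode.
# Assumes names like "<show> - '<title>' S<n>-<uuid>.mp4", as seen earlier.

# Print each distinct episode key (the filename minus " S<n>-<uuid>.mp4")
list_episodes() {
    local f re='^(.*) S[0-9]+-[0-9a-f-]+\.mp4$'
    for f in *.mp4; do
        [[ $f =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
    done | sort -u
}

# Print one episode's segments in numeric order (GNU sort -V: S2 before S10)
segments_of() {
    printf '%s\n' "$1 S"[0-9]*.mp4 | sort -V
}

list_episodes | while IFS= read -r key; do
    echo "== $key"
    segments_of "$key"
done
```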

Epilogue

As part of writing up this blog post, I went back and reread the part of the ffmpeg documentation detailing how it concatenates multiple video files into one. As I mentioned earlier, for some reason I never got past the section describing how ffmpeg can take as input a text file containing the names of the individual files you want to merge. Had I kept reading, I would have found an example of how to generate that input file automatically, without having to massage the filenames themselves. It didn't occur to me until I was putting this post together that, had I done that, I wouldn't have had to faff about with all of the awk and regex gymnastics. I could have just had ffmpeg process the files as they were originally named after downloading them.
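Here is a sketch of what that route might look like, assuming the current directory holds only one episode's segments. The one wrinkle is that these filenames contain single quotes, which would break a naive file '<name>' line. As I read ffmpeg's quoting rules, a quote inside a quoted string can be written as '\'' (close the quote, escaped quote, reopen), which is what the sed expression below produces. make_concat_list is a hypothetical name, and I haven't tested it against every odd filename.

```bash
#!/bin/bash
# Hypothetical sketch: write an ffmpeg concat list straight from the original
# youtube-dl filenames. Assumes the current directory holds one episode's segments.
make_concat_list() {
    # Glob order is alphabetical, which keeps S1 through S9 in sequence.
    # sed: 1) rewrite each ' as '\'' (an escaped quote inside a quoted string),
    #      2) prepend "file '", 3) append the closing '
    printf '%s\n' *.mp4 | sed "s/'/'\\\\''/g; s/^/file '/; s/\$/'/" > "$1"
}
```

Running make_concat_list mylist.txt in the episode's folder, followed by the same ffmpeg -f concat -safe 0 -i mylist.txt -c copy command as before, should produce the merged file without renaming anything.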

It also hasn't escaped me that it would have been much easier to just pay for the episodes on a streaming service such as Amazon Prime Video, or to buy the DVDs of the show and rip them to my Plex library. But had I done that, I wouldn't have learned as much about shell scripting, command line tools such as youtube-dl, awk and ffmpeg, or how Plex manages your library and metadata. And while I haven't fully explored my options, the scripts do seem to work for a handful of other nickjr shows, such as "Abby Hatcher", which my daughter really enjoys.

Finally, this exercise has taught me how powerful Unix command line tools such as awk, along with regular expressions and Bash scripting, really are, and how much more I have to learn. While I wish I had the time to read through the myriad official documentation, tutorials and books on these tools, the sad reality is that most of what I learned came from searching Stack Overflow. For now, this got the job done, but I hope to find time in the coming New Year to learn these tools in depth.


  1. Yes, I know.

  2. Before moving ahead with my plan, I did read through the nickjr site's Terms of Use to make sure I wasn't running afoul of any of its terms and conditions. I'm not a lawyer, but my interpretation of Section 1 on "Ownership of Intellectual Property", specifically this bit here:

    Except as otherwise specifically noted on this Site or in connection with any particular Material, VMN hereby grants you a personal, non-exclusive, non-assignable and non-transferable license to use and display, for your noncommercial and personal use only, one copy of any Material and/or software that you may print or download from this Site, including, without limitation, any files, codes, audio or images incorporated in or generated by the software, provided that you maintain (without alteration) all copyright and other notices contained in such Material and software.

    is that I am ok with doing this within the bounds of what I describe here.

  3. Neither this Bash script nor the one that follows is what I would call "production grade". They are just quick-and-dirty scripts I threw together to work in the Bash shell; I don't think they are portable to other shells or POSIX-compliant (e.g. #!/bin/sh).

  4. If anyone knows how to do this, please let me know!

  5. It actually does a whole lot more!

  6. In the Epilogue, I describe why, had I read a little further in the ffmpeg documentation, I might have solved this problem in a much more straightforward manner.