Improvements to the forum

biell

@lia Can you provide me with the string(s) it doesn't match, but you would like it to? I believe the solution will be simpler than you expect. I might be an expert on regular expressions, but on youtube, I am not.

Also, keep in mind that you might be correctly matching the string you want, but the plugin can't handle the extra information, and so that is why it breaks. If you have a link to the plugin, I can review that also.

Lia

@biell Thanks for chipping in :)

I'm thinking it might be easier to set up the regex to fail if the link has anything after the ID, so that it will just show as a link and not be replaced with the embed. At least that way the site won't always try to generate an embed and mess up anything else intended in the URL, like a playlist.

The plugin seems pretty simple, it looks like it just allows you to have a template for an output then uses regex to filter out and feed certain parts.

The default regex given is:

(?:<a.*?)?(?:https?:\/\/)?(?:www\.)?(?:youtube\.com\/(?!user|channel)\S*(?:(?:\/e(?:mbed))?\/|watch\?(?:\S*?&?v\=))|youtu\.be\/)([a-zA-Z0-9_-]{6,11})(?:.*?\/a>)?

By default it just looks for a youtube link and then takes the ID at the end like below.

https://www.youtube.com/watch?v=f0i-KnPu4Rw

That then feeds into the template given below. $1 will be whatever was captured by ([a-zA-Z0-9_-]{6,11})

<div class='embed-wrapper'>
<div class='embed-container'>
<iframe src='https://www.youtube.com/embed/$1' frameborder='0' allowfullscreen>
</iframe>
</div>
</div>

YouTube timestamps work by adding ?t= followed by the number of seconds from the start; embeds, however, use ?start= instead for some reason. So the regex would need to look for ?t= and generate a second capture for whatever comes after (numeric only). Link example below.

https://youtu.be/f0i-KnPu4Rw?t=12

Played around in https://regex101.com/ as recommended by @NotSure using the substitution function to feed the info into the embed template. So far I can capture the data fine but none of my attempts generate a valid output under both scenarios (ID only or ID and timestamp).

This change to the template works but I can't manipulate the regex to pass data in both scenarios without 1 of the scenarios failing :(

https://www.youtube.com/embed/$1?start=0$2

Brings me back to my initial thought of having regex consider a link invalid if it has anything following the video ID.

biell

@lia So, right now, as coded, this RE will strip off the ?t=12 from your example and play the video from the start. Are you saying that you would prefer, instead, for the RE to just not match? Because, if so, then that is actually pretty easy. I will supply that in a followup so I can explain it.

If you would like the "t=" to translate to "start=" then I would need to know the name of the plugin, so I can look at the source and see what would need to change. Can you confirm that this is it: https://github.com/NicolasSiver/nodebb-plugin-ns-embed

If that is the plugin, then what we would do to handle both cases is to create multiple items in the "default-rules.json" data structure, one for no time and another for with time. If that is what you want to do, I can provide the stanzas. But, it is important to note that making more changes may make it more difficult to upgrade the plugin. Generally, I don't recommend people make these kinds of changes to plugins because it complicates security patches.
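
As a rough sketch of the two-rule idea, assuming the plugin simply tries each rule in order (I have not confirmed that from the source): the rules array, the rule names, and embedSrc below are hypothetical, and the patterns are cut down to the youtu.be form only. The first rule carries ?t= over as ?start=, the second handles a bare ID.

```javascript
// Hypothetical rule list. The "regex" and "replacement" keys mirror the
// plugin's JSON fields, but this is not the plugin's own code.
const rules = [
  {
    name: "youtube-timed",
    // ID, then ?t=<digits>, then nothing else that looks like part of the URL.
    regex: /(?:https?:\/\/)?youtu\.be\/([\w-]{6,11})\?t=(\d+)(?![\w?&-])/,
    replacement: "https://www.youtube.com/embed/$1?start=$2",
  },
  {
    name: "youtube-plain",
    // ID only; anything trailing (time, playlist) makes the rule fail.
    regex: /(?:https?:\/\/)?youtu\.be\/([\w-]{6,11})(?![\w?&-])/,
    replacement: "https://www.youtube.com/embed/$1",
  },
];

// Assumed behaviour: the first rule that matches wins; no match means
// the link is left alone (returned here as null).
function embedSrc(url) {
  for (const rule of rules) {
    if (rule.regex.test(url)) {
      return url.replace(rule.regex, rule.replacement);
    }
  }
  return null;
}

console.log(embedSrc("https://youtu.be/f0i-KnPu4Rw"));
console.log(embedSrc("https://youtu.be/f0i-KnPu4Rw?t=12"));
console.log(embedSrc("https://youtu.be/f0i-KnPu4Rw?list=abc"));
```

The timed rule has to come first only if the plain rule would otherwise accept timed links; with the lookahead in place the order no longer matters, but keeping the more specific rule first is the safer habit.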

Lia

@biell Nail on the head, currently it will always start at 0 and ignore everything else including playlists. In turn making it so you couldn't share a playlist. For text links that had a video it meant the text goes missing, buried in the embed code that we don't see.

That is indeed the plugin. I think it will be best to have it only embed on links without anything on the end of the ID. That way it doesn't embed a timecode link, playlist or text links where youtube is linked.

biell

@lia The reason you are having trouble is that {6,11} is a variable-width greedy construct. So, if you put something after it, the greedy match will just shrink to accommodate it. What we need is a zero-width negative lookahead assertion, but the key to its success is that it includes the previous expression's character class, so that {6,11} can't just match 10 characters to keep from failing our negative assertion. Ironically, in this case, the greedy RE match isn't greedy enough and we must force it to gobble up the whole string. With the original character class included as a subset of our negative assertion's character class, the RE will now refuse to match URLs with any options (time or otherwise) but will still match simple URLs.

(?:<a.*?)?(?:https?:\/\/)?(?:www\.)?(?:youtube\.com\/(?!user|channel)\S*(?:(?:\/e(?:mbed))?\/|watch\?(?:\S*?&?v\=))|youtu\.be\/)([\w-]{6,11})(?![\w?&-])(?:.*?\/a>)?

Two things here. First, I replaced a-zA-Z0-9_ with \w because it means the same thing (at least for a URL; \w accepts more in UTF-8, but we can ignore that). Second, now that we have that cleaned up, we have between 6 and 11 (inclusive) of [\w-] but, critically, not followed by [\w?&-], which adds two characters, ? and &, to the character class. We exclude these because they start variable assignments in the query string of an HTTP GET request, with ? starting the first variable and signifying to the HTTP server that variable arguments are starting, and & separating all subsequent variables. Because we include both ? and &, we can match www.youtube.com/watch?v=f0i-KnPu4Rw but not www.youtube.com/watch?v=f0i-KnPu4Rw&t=12, and we can match youtu.be/f0i-KnPu4Rw without matching youtu.be/f0i-KnPu4Rw?t=12
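
To see the shrinking behaviour concretely, here is a minimal sketch in Node. It is cut down to just the youtu.be branch for readability (that simplification, and the names naive, strict, and capturedId, are mine, not the plugin's). It compares a lookahead that only rejects ? and & against one that also includes the ID characters.

```javascript
// Naive guard: only forbids "?" or "&" directly after the ID.
const naive = /youtu\.be\/([\w-]{6,11})(?![?&])/;

// Guard from the post: the lookahead also contains the ID characters,
// so {6,11} cannot give back one character to get past it.
const strict = /youtu\.be\/([\w-]{6,11})(?![\w?&-])/;

const plain = "https://youtu.be/f0i-KnPu4Rw";
const timed = "https://youtu.be/f0i-KnPu4Rw?t=12";

// Returns the captured video ID, or null when the pattern does not match.
function capturedId(re, url) {
  const m = re.exec(url);
  return m === null ? null : m[1];
}

console.log(capturedId(naive, timed));  // 'f0i-KnPu4R' (ID truncated to 10 characters)
console.log(capturedId(strict, timed)); // null
console.log(capturedId(strict, plain)); // 'f0i-KnPu4Rw'
```

The naive version still matches the timed link: the quantifier backs off to 10 characters, the next character is then "w" rather than "?", and the lookahead is satisfied with a truncated ID.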

Lia

@biell That works! Thank you so much :)

I've added a close bracket to the [\w?&-] so it looks like [\w?)&-] in the hope that it also won't embed when it's in a link like [Link text](Youtube link). That works in https://regex101.com/ but not here, so I wonder if it reads that differently.

biell

@lia I cannot envision why that wouldn't work. You could try putting a backslash \ in front of the paren ) but that should be completely unnecessary. I ran this in node, and it matches as expected

$ node
Welcome to Node.js v17.6.0.
Type ".help" for more information.
> var link=/(?:<a.*?)?(?:https?:\/\/)?(?:www\.)?(?:youtube\.com\/(?!user|channel)\S*(?:(?:\/e(?:mbed))?\/|watch\?(?:\S*?&?v\=))|youtu\.be\/)([\w-]{6,11})(?![\w?)&-])(?:.*?\/a>)?/;
undefined
> link.exec("https://youtube.com/watch?v=f0i-KnPu4Rw");
[
  'https://youtube.com/watch?v=f0i-KnPu4Rw',
  'f0i-KnPu4Rw',
  index: 0,
  input: 'https://youtube.com/watch?v=f0i-KnPu4Rw',
  groups: undefined
]
> link.exec("https://youtube.com/watch?v=f0i-KnPu4Rw)");
null
> link.exec("https://youtu.be/f0i-KnPu4Rw");
[
  'https://youtu.be/f0i-KnPu4Rw',
  'f0i-KnPu4Rw',
  index: 0,
  input: 'https://youtu.be/f0i-KnPu4Rw',
  groups: undefined
]
> link.exec("https://youtu.be/f0i-KnPu4Rw)");
null
>

Also, this plugin is broken, in my opinion, because it should not attempt to run in a code block.

NotSure

@biell said in Improvements to the forum:

What we need is a zero width negative lookahead assertion

blech... glad that's over!

regex was designed by lizard ppl.

biell

@notsure I love regular expressions. But, I do agree that it is a hammer, and not all problems are nails.

In this case, there is no reason for youtube.com and youtu.be URLs to be configured from within the same stanza.

    {
      "name": "youtube",
      "displayName": "Youtube",
      "icon": "fa-youtube",
      "regex": "(?:<a.*?)?(?:https?:\\/\\/)?(?:www\\.)?(?:youtube\\.com\\/(?!user|channel)\\S*(?:(?:\\/e(?:mbed))?\\/|watch\\?(?:\\S*?&?v\\=))|youtu\\.be\\/)([a-zA-Z0-9_-]{6,11})(?:.*?\\/a>)?",
      "replacement": "<div class='embed-wrapper'><div class='embed-container'><iframe src='https://www.youtube.com/embed/$1' frameborder='0' allowfullscreen></iframe></div></div>"
    },

It should be two different configurations, and the idea of supporting them within anchor <a> HTML tags was just unnecessary.
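
For illustration only, here is roughly what the split could look like as two separate patterns. This is a sketch under my own assumptions, not a drop-in replacement: it covers only the watch?v= and youtu.be forms, drops the embed path and the <a> tag handling, and the names youtubeCom, youtuBe, and videoId are made up.

```javascript
// One pattern per host, each much easier to read than the combined one.
// Both keep the negative lookahead so links with extra options are skipped.
const youtubeCom = /(?:https?:\/\/)?(?:www\.)?youtube\.com\/watch\?v=([\w-]{6,11})(?![\w?&-])/;
const youtuBe = /(?:https?:\/\/)?youtu\.be\/([\w-]{6,11})(?![\w?&-])/;

// Hypothetical helper: returns the video ID from either form, or null.
function videoId(url) {
  const m = youtubeCom.exec(url) || youtuBe.exec(url);
  return m ? m[1] : null;
}

console.log(videoId("https://www.youtube.com/watch?v=f0i-KnPu4Rw"));
console.log(videoId("https://youtu.be/f0i-KnPu4Rw"));
console.log(videoId("https://www.youtube.com/watch?v=gidOwEmVq5w&list=PLinsBwlGP89HoLf9d1VwmLIqjBikA38d3"));
```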

NotSure

@biell said in Improvements to the forum:

@notsure I love regular expressions.

interesting... on a totally unrelated subject, what color is ur blood lizard man?

Lia

@biell said in Improvements to the forum:

Also, this plugin is broken, in my opinion, because it should not attempt to run in a code block.

Agreed, I even have the embed plugin after all the others. Markdown occurs before the embed plugin yet it still wants to poke at it.

Did wonder if \W instead of \w would work but no joy. What you gave should just work so I think whatever is running it on the back is bugged. At least it doesn't generate embeds for timestamps anymore though so thank you for solving that one :)

Timestamp example: https://www.youtube.com/watch?v=f0i-KnPu4Rw?t=10
Playlist example: https://www.youtube.com/watch?v=gidOwEmVq5w&list=PLinsBwlGP89HoLf9d1VwmLIqjBikA38d3

biell

@lia said in Improvements to the forum:

if \W instead of \w would work but no joy.

\W is actually the inverse of \w, so it matches every character except [a-zA-Z0-9_].
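
A quick way to check this in Node, using the video ID from earlier in the thread (the variable names are just for this sketch):

```javascript
const id = "f0i-KnPu4Rw";

// \w matches [a-zA-Z0-9_]; \W matches everything outside that set.
const wordChars = id.match(/\w/g);
const nonWordChars = id.match(/\W/g);

console.log(wordChars.length); // 10
console.log(nonWordChars);     // [ '-' ]

// Swapping \w for \W in the ID class leaves only non-word characters and "-",
// so an 11-character video ID can no longer satisfy {6,11}.
const swapped = /[\W-]{6,11}/;
console.log(swapped.test(id)); // false
```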

biell

@notsure 😛