Using Regex in Visual Studio Code (Brief)

Here's how I used the Find and Replace tool in Visual Studio Code to quickly regex some text (converting .srt to Skillshare's weird subtitle format).

I stumbled upon this neat little trick whilst converting my .srt subtitle files for Skillshare.

The Challenge

A normal .srt file has time codes in this format:

But Skillshare requires it in this format:

So in other words, I need to:

  • Replace all commas with dots (easy)
  • Get rid of the number preceding each timecode (not as easy)
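
For readers who prefer to see the easy step as code, here's a minimal Python sketch of it. The sample cue is made up for illustration, and restricting the swap to commas inside a timecode is an assumption (the bullet above simply says all commas), made so that commas in the subtitle text itself are left alone.

```python
import re

# Hypothetical .srt cue, made up for illustration.
sample = "1\n00:00:01,000 --> 00:00:04,500\nHello, world\n"

# Swap the comma for a dot only between the seconds and the milliseconds,
# so the comma in "Hello, world" is left untouched.
dotted = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", sample)

print(dotted)
```

The cue number on the first line is still there at this point; that's the second, harder step.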

…well, it wouldn’t be easy — except we’ve got Regex to the rescue!

Rather than build a program, I’m just going to open Visual Studio Code and use its Find and Replace tool.

Summarised solution
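
In short: with regex mode switched on in Find and Replace, search for (\d+\n)(\d) and replace it with $2, then turn the timecode commas into dots.

The same two steps can be sketched in Python. This is only an illustration and not part of the VS Code workflow: the function name and the sample text are hypothetical, the comma swap is limited to timecodes as an assumption, and Python's re.sub writes the second group as \2 rather than $2.

```python
import re

def strip_numbers_and_dot_timecodes(srt_text: str) -> str:
    """Hypothetical helper mirroring the two Find and Replace steps."""
    # Step 1: drop each cue number. Group 1 is the number plus its newline,
    # group 2 is the first digit of the timecode, which is kept.
    without_numbers = re.sub(r"(\d+\n)(\d)", r"\2", srt_text)
    # Step 2: comma to dot, only between seconds and milliseconds (assumption).
    return re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", without_numbers)

# Made-up two-cue sample.
sample = (
    "1\n00:00:01,000 --> 00:00:04,000\nHello there\n\n"
    "2\n00:00:05,000 --> 00:00:08,000\nSecond line\n"
)
print(strip_numbers_and_dot_timecodes(sample))
```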

Full Solution

Since we want to get rid of numbers, naturally, we start with \d (for digit) as what we want to match. On its own, it’s gonna match every single digit though, so we want to then add the defining characteristic: the number is always followed by a new line. In other words, we’re looking for \d\n.

But unfortunately, that happens at the end of a timestamp too, as you can see by the orange highlights.
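
The over-matching is easy to reproduce outside the editor. Here's a small Python sketch, using a made-up cue, that lists every place where \d\n matches:

```python
import re

# Hypothetical excerpt: a cue number, its timecode line, then the text.
sample = "1\n00:00:01,000 --> 00:00:04,000\nHello there\n"

# Every "digit followed by newline" match, with its position in the text.
matches = [(m.start(), m.group()) for m in re.finditer(r"\d\n", sample)]
print(matches)
```

There are two matches: the cue number at the very start, and the last digit of the timecode line, which is the one we don't want.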

So we need to make this a little more complex, so that it matches only when the character right after the new line is also a digit: (\d\n)(\d).

Voila! But…how the heck do we replace it properly? After all, we still want to keep the original timestamps intact too.

Well, the reason I’ve done (\d\n)(\d) is to sneakily make it so that the two groups are separated. The \d\n is matched as group 1 ($1 is the shorthand), and then the second \d is matched as group 2 (i.e. $2).

So when we replace our full orange highlight (\d\n)(\d) with just $2, it means we replace that entire expression with whatever happens to be in that second group:
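
As a rough illustration of what the two groups capture, here's the same replacement sketched in Python on a made-up cue. Note that Python's re.sub refers to the second group as \2, where VS Code uses $2.

```python
import re

# Hypothetical single cue, made up for illustration.
sample = "1\n00:00:01,000 --> 00:00:04,000\nHello there\n"

match = re.search(r"(\d\n)(\d)", sample)
# group(1) is the cue number plus newline, group(2) the first timecode digit.
print(repr(match.group(1)), repr(match.group(2)))

# Replacing the whole match with group 2 drops the cue number
# but puts the first timecode digit back.
cleaned = re.sub(r"(\d\n)(\d)", r"\2", sample)
print(cleaned)
```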

And bam! When we replace all, all those pesky leading numbers are gone. Well…they would be, except there’s one problem. Don’t forget that after 9, the numbers have two or more digits. So we should make sure that the first group matches more than one \d, to cater for multi-digit numbers.

We can’t use “*” because it means zero or more, so it would also match a new line with no digit in front of it. Instead, we’re going to use \d+, which needs at least one digit.
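
To see the difference between the three quantifiers, here's a small Python sketch run against a made-up cue numbered 10. The variable names are just for illustration.

```python
import re

# Hypothetical excerpt around cue number 10.
sample = "Line nine\n\n10\n00:00:30,000 --> 00:00:33,000\nLine ten\n"

# A single \d only removes the "0" of "10", leaving a stray "1" behind.
single = re.sub(r"(\d\n)(\d)", r"\2", sample)

# \d+ needs at least one digit and takes the whole number.
plus = re.sub(r"(\d+\n)(\d)", r"\2", sample)

# \d* allows zero digits, so a bare newline before a digit also matches
# and the blank line between cues gets eaten.
star = re.sub(r"(\d*\n)(\d)", r"\2", sample)

for label, result in (("single", single), ("plus", plus), ("star", star)):
    print(label, repr(result))
```

Only the \d+ version leaves the blank line and the timecode exactly as they were.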

And so this is the result:

Conclusion

  • Use $1, $2 in the replacement to refer to the matched groups
  • Use \d+ to match one or more digits
  • Visual Studio Code is good for basic text processing like this