
kfx-guide

Guide for people interested in karaoke effects, hopefully

KFX in terms of music theory

One of the important core concepts for making your KFX look visually pleasing and match the feel of the music is understanding what the sound is actually doing. If this sounds obvious to you, great. If not, don’t worry, I’ll try to explain both what I mean by this, as well as how to actually apply this understanding to your styling.

Envelopes

When talking about sound or music, “envelope” usually describes how a given sound behaves over time. Different instruments can produce very differently-shaped sounds:

Here, we’ll mostly be focusing on the first kind of sounds, though the ideas are transferable to other kinds of envelopes, too. This kind of envelope is usually thought of as an idealized “ADSR” model, standing for “Attack, Decay, Sustain, Release”. The diagram below demonstrates what each of these means:

Diagram of an idealized ADSR envelope. © Abdull / Wikimedia Commons / CC-BY-SA-3.0

The durations of all of these can vary, but typically Attack is much shorter than the others. Decay can start immediately after Attack is over (i.e. the sound has reached its peak), but the sound can also be Sustained at or near the peak (such as with the violin), before Release, where the amplitude drops back down to zero. Typically, Release is slightly longer than Attack.
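If it helps to see the shape as numbers rather than a picture, here is a minimal Python sketch of an idealized, piecewise-linear ADSR envelope. The function name, its parameters and the example values are all made up for illustration; real instruments are rarely this tidy.

```python
def adsr_amplitude(t, attack, decay, sustain_level, sustain_time, release):
    """Amplitude (0..1) of an idealized, piecewise-linear ADSR envelope at time t.

    All durations share one unit (e.g. ms); sustain_level is the held
    amplitude as a fraction of the peak. Hypothetical helper, for illustration.
    """
    if t < 0:
        return 0.0
    if t < attack:                # Attack: rise from 0 to the peak
        return t / attack
    t -= attack
    if t < decay:                 # Decay: fall from the peak to the sustain level
        return 1.0 - (1.0 - sustain_level) * t / decay
    t -= decay
    if t < sustain_time:          # Sustain: hold while the note is held
        return sustain_level
    t -= sustain_time
    if t < release:               # Release: fall from the sustain level to 0
        return sustain_level * (1.0 - t / release)
    return 0.0


# Example: short attack (20), longer decay (100), held for 300, release of 80.
for t in (0, 10, 20, 120, 420, 460, 500):
    print(t, round(adsr_amplitude(t, 20, 100, 0.6, 300, 80), 2))
```

Note how little of the total time the attack takes up in this example: the sound is at its peak almost immediately, and everything after that is some flavour of falling off.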

ADSR in KFX

So, what does this mean for KFX? Well, the two biggest takeaways here are these:

  1. Attack should be fast, and generally the same duration every time
  2. Long notes should be sustained/decayed before being released

Simple up-and-down effect

It’s much too common a mistake, in my experience, to make a “triangular” syl highlight, that linearly increases, peaks at the middle of the syl, and then linearly decays back to the original size until the end of the syl (see below).

Demonstration of a triangular syl highlight

This could be generated, for instance, with the following simple template. The stock templater would also have the $smid (mid-time of syllable) in-line variable, which could make this even shorter, but as we will see, this is not actually very useful in practice (as we will be doing math to compute transform times anyway).

template syl: {!ln.tag.pos(5,5)! \t(!syl.start_time!,!syl.start_time + 0.5*syl.duration!,\fscx130\fscy130) \t(!syl.start_time + 0.5*syl.duration!,!syl.end_time!,\fscx100\fscy100)}
kara:         {\k50}{\k50}The {\k37}quick {\k38}brown {\k75}fox {\k150}jumps{\k50}

However, this effect has failed on both aforementioned points: The “attack” is very slow, especially on long syls, and thus entirely misses the “peak” of the actual note; and the rest of DSR is also all linear, no matter the length of the syl. Let’s tackle these things one at a time.
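To put numbers on how slow that attack gets, here is a quick Python sketch (plain arithmetic, not templater code) that computes the length of the growing part for each non-empty syl of the sample line. The dictionary and function names are just for illustration; the only real inputs are the \k values from the kara line above, which are in centiseconds.

```python
# \k durations (in centiseconds) of the non-empty syls in the sample line
K_DURATIONS_CS = {"The": 50, "quick": 37, "brown": 38, "fox": 75, "jumps": 150}


def triangular_attack_ms(k_cs):
    """Attack length of the triangular effect: half the syl duration, in ms."""
    return 0.5 * (k_cs * 10)


for text, k_cs in K_DURATIONS_CS.items():
    print(f"{text}: {triangular_attack_ms(k_cs):.0f} ms")
```

The attack ranges from 185 ms on “quick” to 750 ms on “jumps”, so the highlight on the longest syl peaks three quarters of a second after the note has actually started.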

Sharper attack

It’s quite simple to fix the first issue: just use a constant duration for the growing part of the effect. Let’s update our template a bit:

template syl: {!ln.tag.pos(5,5)! \t(!syl.start_time!,!syl.start_time + 100!,\fscx130\fscy130) \t(!syl.start_time + 100!,!syl.end_time!,\fscx100\fscy100)}

However, this is not quite perfect. If we have very short syls, the attack will actually be longer than the decay at the end: with the 100 ms attack above, that happens on any syl shorter than 200 ms. (TODO: make a sample of this)

One way around this is to limit the duration of the growing part to some portion of the syl’s total duration. This could be done e.g. like so: syl.start_time + math.min(100, syl.duration * 0.4). Naturally, this is not the only way to go about it, and we will touch on another way to handle this when we talk about timing to notes.
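As a rough sanity check of that clamp, here is the same arithmetic in Python (again just a sketch, not templater code). The function name and the sample durations are made up; the 100 ms cap and the 0.4 factor are the ones from the expression above.

```python
def attack_ms(syl_duration_ms, max_attack_ms=100, max_fraction=0.4):
    """Length of the growing part: a fixed cap, or a fraction of short syls."""
    return min(max_attack_ms, syl_duration_ms * max_fraction)


for duration in (1500, 370, 200, 120):
    attack = attack_ms(duration)
    print(f"{duration} ms syl: attack {attack:.0f} ms, rest {duration - attack:.0f} ms")
```

With these values, any syl of 250 ms or longer gets the full 100 ms attack, while shorter ones get 40% of their duration, so the growing part is never longer than what comes after it.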

Slow decay before fast(er) release

Thinking in terms of notes

TODO

Timing peaks to notes

Sustain and release after the syl

TODO (not a real section, just collecting thoughts here)

"ka" syl with \k10 duration, ending as the sibilant of "sa" starts. Definitely wrong: “sa” starts at the very start of the sibilant, and “ka” is cut very short.

"ka" syl with \k21 duration, ending as the vowel of "sa" starts. Better: “sa” starts at the exact start of the vowel, which is also where the actual beat lands.