Operation Guides

CV Japanese

Before you begin:

- This guide was written and coordinated by members of the UTAU community and has no affiliation with the UTAU software itself or its creator.

- This guide was written with users of WINDOWS 10 in mind; its advice may not translate directly to other operating systems or to other versions of UTAU, such as UTAU-Synth for macOS.

- This guide relates to the original UTAU software by Ameya, released in 2008, and NOT to OpenUTAU, the fan-made UTAU alternative. As such, this resource may be less useful for OpenUTAU users.

- While the process of operating UTAU is ultimately safe when done correctly, JOEZCafe and other parties involved in JOEZUTAU projects take no responsibility for any incidents, loss or damage to users or property from following these instructions.

Making your blank UST CV compatible:

If you've been following the previous guides on this site up to this point, you should have a blank UST that's ready to be customised and used by your CV voicebank of choice.

The backbone of every voicebank type in UTAU is how the lyrics are entered, so we'll begin with learning how to enter lyrics using the CV method.

When working with CV, entering lyrics is simply a matter of changing each note's lyric to the Japanese syllable you'd like that note to sing. This can be done in two ways:

Method 1:

Double click the note you wish to modify, then enter the new lyric you'd like to give that note.

Method 2:

In the Lyric bar on the top toolbar, enter the lyrics for a sequence of notes as one uninterrupted string, then click and drag across the viewport to highlight the notes you want to assign the lyrics to.

With the notes highlighted, click the Substitute Lyrics button immediately to the right of the Lyric bar. UTAU will automatically assign each syllable to its respective highlighted note, in order.
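To picture what Substitute Lyrics does, here is a minimal Python sketch that splits one uninterrupted Hiragana string into syllables and hands them out to the highlighted notes in order. The function names and the note representation are hypothetical, and the splitting rule (one kana per syllable, with small kana such as ゃ joined to the kana before it) is an assumption; UTAU's own splitting rules may differ.

```python
# Hypothetical sketch of "Substitute Lyrics" (not UTAU's actual code).
# ASSUMPTION: each syllable is one kana, and small kana attach to the kana before them.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎ")


def split_syllables(text: str) -> list[str]:
    """Split an uninterrupted Hiragana string into CV syllables."""
    syllables: list[str] = []
    for ch in text:
        if ch in SMALL_KANA and syllables:
            syllables[-1] += ch  # e.g. き + ゃ -> きゃ
        else:
            syllables.append(ch)
    return syllables


def substitute_lyrics(notes: list[dict], text: str) -> list[dict]:
    """Give each highlighted note the next syllable, left to right."""
    for note, syllable in zip(notes, split_syllables(text)):
        note["lyric"] = syllable  # zip stops at whichever list runs out first
    return notes


highlighted = [{"lyric": "あ"} for _ in range(3)]
substitute_lyrics(highlighted, "ぼくの")
print([note["lyric"] for note in highlighted])  # ['ぼ', 'く', 'の']
```

Because the sketch uses zip, any leftover notes or syllables are simply left untouched; that is a choice made for this illustration, not documented UTAU behaviour.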

And just like that, you have an UTAU UST formatted for CV Japanese!

You can preview a vocal phrase by clicking and dragging across the notes you want to hear, then pressing Play!

Recognising valid aliases:

In UTAU, the term "Alias" refers to the ID given to a sample, which tells UTAU which recording to retrieve from a voicebank. In the case of CV, the Hiragana you enter into each note is the Alias of a sample in the CV voicebank you're using.

When working with UTAU, it helps to know what a note looks like when UTAU recognises the Alias you have entered.

A note in UTAU, visually speaking, consists of two major parts:

- The BOLD LINE at the BOTTOM of the note shows the note's length. It marks where the note starts and ends, and it's the visual you should follow when syncing and timing the notation of your UST.

- The THIN LINE on the TOP AND SIDES of the note is the envelope, which represents which parts of the recording are audible.

This distinction is important because on some notes the thin line extends a bit further to the left than the bold line at the bottom. For example, in the above image, the note is exactly one beat long, as shown by the bold line, but the thin line on the left-hand side starts slightly before the beat.

This is because most samples have an "Overlap": a small chunk of the recording that plays briefly before the note itself begins. It often contains the smaller sounds that come before the main vowel, like opening consonants, and it blends with the prior note so the vocals flow together more effectively, even when working with CV Japanese.

When you modify a note's lyric, you will notice the Overlap on the left-hand side changing size. This is because the note recognises the Alias you entered and retrieves that sample's Overlap value each time, effectively recalibrating the note's timing to accommodate the new recording.

By contrast, the note pictured above has an Alias that the voicebank we're using does not recognise. No sample uses the Alias we've entered, so there is no audio or timing value to retrieve: the Overlap line on the left is tucked into the beginning of the note, and when we press Play on this note, no audio of any kind will play.

In other words, we can use the Overlap Line to tell if the note recognised the Alias we've entered and found a matching sample in the voicebank.
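The relationship between the bold line and the thin line can be expressed with a little arithmetic. The Python sketch below is a simplified model that follows this guide's description, in which the Overlap is the part of the sample that plays before the note's bold line begins. It assumes UTAU's resolution of 480 ticks per beat and ignores the separate Preutterance value that UTAU also uses for timing; the function names and the Overlap length in the example are made up for illustration.

```python
# Simplified model following this guide's description of the Overlap.
# ASSUMPTION: the Overlap starts `overlap_ms` before the note; UTAU's separate
# Preutterance value is ignored here.
TICKS_PER_BEAT = 480  # UTAU divides each quarter-note beat into 480 ticks


def ticks_to_ms(ticks: float, tempo_bpm: float) -> float:
    """Convert a length in ticks to milliseconds at the given tempo."""
    return ticks * 60000.0 / (tempo_bpm * TICKS_PER_BEAT)


def note_timing(start_tick: int, length_ticks: int, overlap_ms: float, tempo_bpm: float):
    """Return (overlap_start, note_start, note_end) in milliseconds."""
    note_start = ticks_to_ms(start_tick, tempo_bpm)
    note_end = note_start + ticks_to_ms(length_ticks, tempo_bpm)
    return note_start - overlap_ms, note_start, note_end


# A one-beat note on the second beat at 120 BPM, with a hypothetical 40 ms Overlap:
print(note_timing(480, 480, 40.0, 120.0))  # (460.0, 500.0, 1000.0)
```

In this model the thin line begins at 460 ms, while the bold line runs from 500 ms to 1000 ms, matching the "slightly before the beat" behaviour described above.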

When working with UTAU, each note must contain the EXACT ALIAS that was entered in the voicebank's configuration if you want to use a specific sample.

Make sure the character you've entered is spelt and formatted correctly and does not contain any additional characters, like punctuation or spaces.
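A CV voicebank's aliases are normally listed in its oto.ini configuration file. The sketch below assumes the common line layout `file.wav=alias,offset,consonant,cutoff,preutterance,overlap` and shows why a lyric must match an alias exactly: the lookup is a plain dictionary match, so a trailing space or stray punctuation finds nothing, just like the unrecognised note described earlier. The helper names and sample values are hypothetical, and the fallback from an empty alias to the file name is an assumption.

```python
# Hypothetical alias lookup, assuming oto.ini lines of the form:
#   file.wav=alias,offset,consonant,cutoff,preutterance,overlap
# (Real oto.ini files are usually Shift-JIS encoded; this parses an already-decoded str.)


def load_aliases(oto_text: str) -> dict[str, float]:
    """Map each alias in the oto.ini text to its Overlap value in ms."""
    aliases: dict[str, float] = {}
    for line in oto_text.splitlines():
        if "=" not in line:
            continue  # ignore blank or unrelated lines
        filename, params = line.split("=", 1)
        fields = params.split(",")
        if len(fields) < 6:
            continue  # skip malformed entries
        # ASSUMPTION: an empty alias falls back to the file name without ".wav"
        alias = fields[0] or filename.rsplit(".", 1)[0]
        aliases[alias] = float(fields[5] or 0)
    return aliases


def find_overlap(lyric: str, aliases: dict[str, float]):
    """Exact match only: no trimming, so "く " (with a space) is not "く"."""
    return aliases.get(lyric)


oto = "ぼ.wav=ぼ,20,90,-300,60,20\nく.wav=く,15,80,-250,70,30\nの.wav=,10,100,-280,55,25"
aliases = load_aliases(oto)
print(find_overlap("く", aliases))   # 30.0
print(find_overlap("く ", aliases))  # None: the trailing space breaks the match
print(find_overlap("の", aliases))   # 25.0 (alias taken from the file name)
```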

Advanced - Making voiceless samples in CV

As the name suggests, CV (consonant-vowel) banks have samples that always end on a sustained vowel sound, but there are occasions where you want a standalone consonant sound so you can string two consonants together.

Let's say for example that you have the phrase [ぼくの / "boku no"], but you want to reduce the vowel sound in "ku" to make the delivery sound more akin to "bok-no".

In other words, we want to turn "ku" into a voiceless sample. Most CV Japanese banks lack these, but there is a range of workarounds that mimic the effect.

There are two main ways of accomplishing this effect with CV Japanese voicebanks.

Method 1:

This method incorporates what we've learned about operating the magnetic timeline and reading the Overlap Lines on notes to make something quick and dirty.

Using the CTRL dragging method, shrink the note you want to make voiceless until it's the same length as the Overlap of the note that comes after it.

As shown in the below example, we've CTRL dragged the tail end of ぼ/bo so that the note containing く/ku is the same size as the Overlap box on the left-hand side of the note containing の/no. This makes く/ku play only while UTAU is already transitioning into の/no, which will likely cut off the vowel of く/ku and leave the consonant to play on its own.
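The CTRL dragging in Method 1 boils down to converting the next note's Overlap from milliseconds into a note length. The Python sketch below assumes 480 ticks per beat, assumes that CTRL dragging the tail of ぼ/bo moves the boundary between ぼ/bo and く/ku so their combined length stays the same, and uses a hypothetical 25 ms Overlap for の/no; the function names are illustrative only.

```python
# Sketch of Method 1's arithmetic (assumptions: 480 ticks per beat, and CTRL
# dragging moves the boundary between two notes so their total length is kept).
TICKS_PER_BEAT = 480


def ms_to_ticks(ms: float, tempo_bpm: float) -> float:
    """Convert milliseconds to ticks at the given tempo."""
    return ms * tempo_bpm * TICKS_PER_BEAT / 60000.0


def shrink_to_next_overlap(prev_len: int, target_len: int,
                           next_overlap_ms: float, tempo_bpm: float) -> tuple[int, int]:
    """Return new (previous, target) lengths in ticks, where the target note
    becomes as long as the following note's Overlap."""
    new_target = max(1, round(ms_to_ticks(next_overlap_ms, tempo_bpm)))
    new_prev = prev_len + (target_len - new_target)  # previous note absorbs the rest
    return new_prev, new_target


# ぼ and く are one beat each at 120 BPM; の has a hypothetical 25 ms Overlap.
print(shrink_to_next_overlap(480, 480, 25.0, 120.0))  # (936, 24)
```

Rounding to whole ticks is a simplification here; in practice you would simply drag until the lengths line up visually.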

Method 2:

Method 2 is significantly more hands-on and gives you more control over where in the sequence the consonant sound plays.

In this method, we right click on the note containing く/ku and enter the Envelope menu.

The Envelope menu lets you adjust the volume levels of different sections of a note. A note's envelope has four points, labelled P1, P2, P3 and P4 from left to right, which represent the attack and release of the note.

In this menu, clicking and dragging P4, the point at the far bottom right of the window, lets you adjust the point in the recording where the note "releases", gradually muting the sample.

The grey line on the left hand side is the border where the sample's consonant transitions into its vowel, so we just want to slide P4 to the left hand side until it reaches the grey line.

P3, the point at the top right above P4, will also move along with P4 automatically.

In effect, we have instructed this note to mute itself as soon as the sample reaches the vowel, making only the consonant sound audible, with a slight fade so it doesn't sound too sudden or "cut out".

Your changes are then reflected on the viewport as soon as you click "OK".

Notice in the below screenshot how く/ku has an Overlap line, but the envelope sinks down as soon as it reaches the bold line at the bottom; in other words, the sound cuts out at the point where the vowel sound, the actual note, begins.
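To see why dragging P3 and P4 back to the grey line silences the vowel, it helps to treat the envelope as straight lines joining P1 to P4. The Python sketch below is a conceptual model only: it assumes volume is linearly interpolated between the points and is silent outside P1 to P4, and the point positions and the 90 ms consonant boundary for く/ku are made-up values. It does not reproduce how UTAU stores envelopes in a UST.

```python
# Conceptual envelope model (assumptions: straight lines between points,
# silence before P1 and after P4; all times are hypothetical, in ms).


def envelope_volume(points: list[tuple[float, float]], t: float) -> float:
    """Volume (%) at time t, given [(time_ms, volume_pct), ...] for P1..P4."""
    if t < points[0][0] or t > points[-1][0]:
        return 0.0  # outside the envelope: muted
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return float(v1)
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return 0.0


CONSONANT_END = 90.0  # hypothetical grey line: where く's consonant becomes its vowel

original = [(0.0, 0.0), (5.0, 100.0), (365.0, 100.0), (400.0, 0.0)]
voiceless = [(0.0, 0.0), (5.0, 100.0), (60.0, 100.0), (CONSONANT_END, 0.0)]  # P3 and P4 moved left

for t in (30.0, 75.0, 200.0):
    print(t, envelope_volume(original, t), envelope_volume(voiceless, t))
# prints:
# 30.0 100.0 100.0
# 75.0 100.0 50.0
# 200.0 100.0 0.0
```

During the consonant both versions play at full volume; the edited envelope fades out across the last stretch before the grey line, and by 200 ms, well into the vowel, it is completely silent.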

Blending your CV notes

You should now have a fully formatted CV UST that can be sung from beginning to end using a standard CV Japanese voicebank!

You've made a very impressive step in your journey to learn this software!

As we've demonstrated previously, CV voicebanks generally sound choppier because they lack the transitional samples that allow notes to merge together smoothly, but some smoothing can still be done with CV USTs to help alleviate these limitations.

First, highlight all of the notes in your sequence using CTRL+A. With every note highlighted, you should notice this selection of buttons on the top toolbar:

These are shortcut buttons that do a range of envelope changes to your notes, but in most workflows there are only two you will be using.

- P2P3 (the second button from the left) crossfades the envelopes where two adjacent notes meet, making the transition between the two samples slightly smoother. This is an absolute necessity when working with UTAU, especially once you begin working with other voicebank types.

- RESET (The last button in the row) restores a note's envelopes to their default values, as if the note was just created and the Alias had just been placed, which is handy for fixing any prior adjustments that could clutter the UST.
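A crossfade like the one P2P3 creates can be pictured as one note fading out while the next fades in over the region where they meet. The Python sketch below shows a generic linear crossfade with hypothetical function names; the guide does not describe UTAU's exact P2P3 calculation, so treat the straight-line fade as an assumption.

```python
# Generic linear crossfade (an assumption; not UTAU's exact P2P3 math).


def crossfade_gains(t: float, fade_start: float, fade_end: float) -> tuple[float, float]:
    """Return (outgoing, incoming) volume multipliers at time t (ms)."""
    if t <= fade_start:
        return 1.0, 0.0  # only the earlier note is heard
    if t >= fade_end:
        return 0.0, 1.0  # only the later note is heard
    x = (t - fade_start) / (fade_end - fade_start)
    return 1.0 - x, x  # the two gains always add up to 1


def mix(outgoing_sample: float, incoming_sample: float, t: float,
        fade_start: float, fade_end: float) -> float:
    """Blend two audio sample values at time t."""
    g_out, g_in = crossfade_gains(t, fade_start, fade_end)
    return g_out * outgoing_sample + g_in * incoming_sample


print(crossfade_gains(480.0, 460.0, 500.0))  # (0.5, 0.5): halfway through the overlap
```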

When finalising a sequence in UTAU, the most common process users undergo is:

- Resetting all of the notes in the sequence with RESET

- Crossfading all of the notes in the sequence with P2P3

This will ensure your sequence is as smooth as the voicebank type allows.

When using RESET, be mindful of any notes that you made manual envelope adjustments to, such as when making Voiceless samples, as these will also be reset.

If a note isn't responding to RESET or P2P3, chances are it has an STP value. This occurs when ACPT (the first button in the row) is used, which locks the envelopes and timing values in place so they can't be adjusted by any means.

To fix this, right click the note, enter the Note/Region Properties, and make sure the STP box is completely blank. The box must be white; a grey box means the selected notes have differing values.

Using the Crossfade plugin for vowel samples

For additional smoothness, every copy of UTAU includes a built-in plugin for crossfading standalone vowel samples with each other, allowing multiple adjacent vowel samples to blend without cutting or skipping.

This step should be done after you've done the RESET+P2P3 method with the rest of the notes in your UST.

To begin, highlight a vocal phrase where you have adjacent vowel sounds (or highlight the entire UST with CTRL+A), then press U on your keyboard to shortcut to the Crossfade menu.

In this menu, ensure the checkbox next to "Crossfade" is checked.

When this plugin is activated, any highlighted adjacent notes containing the aliases listed in the "target" box will have their envelopes adjusted to crossfade into each other, differently from using P2P3 on its own, so that vowel sounds with no consonants sound like one smooth sustain.

You can even have the volume level of the sustain gradually reduce as the vocalist advances to the next vowel sound to simulate a smooth release by checking the "Volume" checkbox.

Upon clicking OK, the envelopes of the notes containing standalone vowel sounds will change accordingly, and their transitions will sound so much smoother!
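The way the Crossfade plugin picks which notes to work on can be sketched as a simple scan over the highlighted lyrics. In this hypothetical Python sketch, the example target list and the rule that both neighbouring notes must be targets are assumptions based on the description above, not the plugin's documented behaviour.

```python
# Hypothetical sketch of the plugin's note selection (not its real code).
# ASSUMPTION: a pair qualifies when both adjacent lyrics appear in the target list.
TARGETS = {"あ", "い", "う", "え", "お"}  # example target list of standalone vowels


def crossfade_pairs(lyrics: list[str], targets: set[str] = TARGETS) -> list[tuple[int, int]]:
    """Return index pairs of adjacent highlighted notes that would be crossfaded."""
    return [(i, i + 1) for i in range(len(lyrics) - 1)
            if lyrics[i] in targets and lyrics[i + 1] in targets]


print(crossfade_pairs(["か", "あ", "い", "の", "う", "え", "お"]))  # [(1, 2), (4, 5), (5, 6)]
```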

Your UST is now formatted for a smooth and clean output using CV Japanese!
Later guides will walk you through the other processes you can perform to improve your vocal output!

Your UST is now formatted for a smooth and clean output using CV Japanese! Later guides will walk you through the other processes you can perform to improve your vocal output!

Your UST is now formatted for a smooth and clean output using CV Japanese!
Later guides will walk you through the other processes you can perform to improve your vocal output!