
MagpieTTS refactor #15504

Open
paarthneekhara wants to merge 3 commits intoNVIDIA-NeMo:mainfrom
paarthneekhara:magpietts_refactor_pr

Conversation

@paarthneekhara
Collaborator

This change is mainly motivated by EasyMagpie, which will reuse some functionality shared with Magpie. To avoid code duplication, we are moving the common pieces together.

After this, I will raise a separate PR for the EasyMagpie changes.

Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>
@github-actions github-actions bot added the TTS label Mar 16, 2026
If the model has a baked context embedding, the context_encoder weights are also excluded
since they are no longer needed for inference.
"""
def state_dict(self, destination=None, prefix='', keep_vars=False):
Collaborator

Can you add back the docstring?

Collaborator Author

Added the docstrings.


_speaker_verification_model is only included in older checkpoints with the older single_encoder_sv_tts
model_type that is no longer supported and can likely be removed in a future version.
def _get_state_dict_keys_to_exclude(self):
Collaborator

Can you add a docstring to this?

Collaborator Author

Added the docstrings.

Comment on lines +311 to +320
def remove_bos_token(codes, codes_len, num_tokens=1):
codes = codes[:, :, num_tokens:]
codes_len = codes_len - num_tokens
return codes, codes_len


def remove_embedded_bos_token(embedded, embedded_len):
embedded = embedded[:, 1:, :]
embedded_len = embedded_len - 1
return embedded, embedded_len
Collaborator

These two functions look identical. Do we need both?

Collaborator Author

Yeah, both of them were there earlier and are being used. One removes the token from the code tensor (before embedding) and the other from the embedded tensor. Their implementations also differ, since the time axis sits in a different position in each tensor.
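To make the distinction concrete, here is a minimal sketch of the two helpers using numpy stand-ins (the real code operates on torch tensors; the shapes `(B, C, T)` for codec tokens and `(B, T, D)` for embeddings are my reading of the diff, not confirmed by the PR):

```python
import numpy as np

def remove_bos_token(codes, codes_len, num_tokens=1):
    # codes: (B, C, T) integer codec tokens; time is the LAST axis,
    # so the BOS frame is dropped with codes[:, :, num_tokens:]
    return codes[:, :, num_tokens:], codes_len - num_tokens

def remove_embedded_bos_token(embedded, embedded_len):
    # embedded: (B, T, D) float embeddings; time is the MIDDLE axis,
    # so the BOS step is dropped with embedded[:, 1:, :]
    return embedded[:, 1:, :], embedded_len - 1

codes = np.zeros((2, 8, 10), dtype=np.int64)     # batch=2, codebooks=8, T=10
codes_len = np.array([10, 7])
codes, codes_len = remove_bos_token(codes, codes_len)
print(codes.shape)   # (2, 8, 9)

emb = np.zeros((2, 10, 256), dtype=np.float32)   # batch=2, T=10, dim=256
emb_len = np.array([10, 7])
emb, emb_len = remove_embedded_bos_token(emb, emb_len)
print(emb.shape)     # (2, 9, 256)
```

The slicing is identical in spirit, but applied to different axes, which is why folding the two into one generic helper would cost more in readability than it saves.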

Comment on lines +323 to +341
def remove_eos_token(codes, codes_len):
codes_len = codes_len - 1
codes = codes[:, :, :-1]
mask = get_mask_from_lengths(lengths=codes_len)
codes = codes * mask.unsqueeze(1)
return codes, codes_len


def remove_embedded_eos_token(embedded, embedded_len):
"""Remove the last token from embedded sequences.

Args:
embedded: (B, T', D)
"""
embedded_len = embedded_len - 1
embedded = embedded[:, :-1, :]
mask = get_mask_from_lengths(lengths=embedded_len)
embedded = embedded * mask.unsqueeze(2)
return embedded, embedded_len
Collaborator

These two functions look identical. Do we need both?

Collaborator Author

Same as the above.
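The EOS pair differs from the BOS pair in one detail worth noting: after truncating the last position, sequences shorter than the batch maximum have stale values in their padding region, so the result is re-masked. A minimal numpy sketch (`get_mask_from_lengths` here is a stand-in for the NeMo helper, and the `(B, C, T)` shape is my assumption from the diff):

```python
import numpy as np

def get_mask_from_lengths(lengths, max_len):
    # Stand-in for the NeMo helper: (B, T) boolean mask,
    # True where position < sequence length.
    return np.arange(max_len)[None, :] < lengths[:, None]

def remove_eos_token(codes, codes_len):
    # codes: (B, C, T). Drop the last frame, then zero out padding,
    # since shorter sequences keep stale values past their new length.
    codes_len = codes_len - 1
    codes = codes[:, :, :-1]
    mask = get_mask_from_lengths(codes_len, codes.shape[-1])
    return codes * mask[:, None, :], codes_len

codes = np.ones((2, 4, 6), dtype=np.int64)
codes_len = np.array([6, 4])
codes, codes_len = remove_eos_token(codes, codes_len)
print(codes[1, 0])  # [1 1 1 0 0] -- positions past the new length are zeroed
```

Without the mask step, the second sequence would keep nonzero values at positions 3 and 4 even though its valid length is now 3.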

Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>