{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# BAYa class Assignment 2023\n", "\n", "In this assignment, your task is to implement and analyze inference in the Probabilistic Linear Discriminant Analysis (PLDA) model. This model was described in the corresponding [slides from BAYa class](http://www.fit.vutbr.cz/study/courses/BAYa/public/slides/2-Graphical%20Models.pdf). You will accomplish this task by completing this Jupyter Notebook, which already comes with code for generating the training data and some plotting functions for presenting the results. If you do not have any experience with Jupyter Notebook, the easiest way to start is to install Anaconda3, run Jupyter Notebook, and open this notebook downloaded from [BAYa_Assignment2023.ipynb](http://www.fit.vutbr.cz/study/courses/BAYa/public/notebooks/BAYa_Assignment2023.ipynb). You can also find some inspiration and pieces of code to reuse in the other [Jupyter Notebooks provided for this class](http://www.fit.vutbr.cz/study/courses/BAYa/public/notebooks).\n", "\n", "The Notebook is organized as follows:\n", "1. First comes a cell with functions that will later be used for presenting the results and the learned models. Specifically, it contains code for plotting 2-dimensional multivariate Gaussian distributions and 2-dimensional data points. You can skip this cell at first, as the use of the functions will be demonstrated later.\n", "2. Next comes code that \"handcrafts\" some parameters of the PLDA model and implements the generative process assumed by the PLDA model. The code generates some artificial training data that you will use for PLDA model training. Please read this code and the comments around it carefully.\n", "3. Throughout this notebook, there are cells with instructions to fill in your implementations around the PLDA model. There are also fields with other tasks to accomplish and questions to answer. 
\n", "\n", "**Do not edit the code in the following cell for generating and presenting the training data!**\n", " $$\n", " \\DeclareMathOperator{\\E}{\\mathbb{E}}\n", "\\DeclareMathOperator{\\aalpha}{\\boldsymbol{\\alpha}}\n", "\\DeclareMathOperator{\\bbeta}{\\boldsymbol{\\beta}}\n", "\\DeclareMathOperator{\\NN}{\\mathbf{N}}\n", "\\DeclareMathOperator{\\ppi}{\\boldsymbol{\\pi}}\n", "\\DeclareMathOperator{\\mmu}{\\boldsymbol{\\mu}}\n", "\\DeclareMathOperator{\\SSigma}{\\boldsymbol{\\Sigma}}\n", "\\DeclareMathOperator{\\llambda}{\\boldsymbol{\\lambda}}\n", "\\DeclareMathOperator{\\diff}{\\mathop{}\\!\\mathrm{d}}\n", "\\DeclareMathOperator{\\zz}{\\mathbf{z}}\n", "\\DeclareMathOperator{\\ZZ}{\\mathbf{Z}}\n", "\\DeclareMathOperator{\\XX}{\\mathbf{X}}\n", "\\DeclareMathOperator{\\xx}{\\mathbf{x}}\n", "\\DeclareMathOperator{\\YY}{\\mathbf{Y}}\n", "\\DeclareMathOperator{\\NormalGamma}{\\mathcal{NG}}\n", "\\DeclareMathOperator{\\Tr}{Tr}\n", "$$" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Run this code! 
But there is no need to pay much attention to this cell at the first pass through the notebook\n", "\n", "%matplotlib inline \n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as sps\n", "\n", "\n", "def rand_gauss(n, mu, cov):\n", " \"\"\"\n", " Sample n data points from 2D Gaussian distribution with mean mu and covariance cov\n", " \"\"\"\n", " return np.atleast_2d(sps.multivariate_normal.rvs(mu, cov, n))\n", "\n", "def logpdf_gauss(x, mu, cov):\n", " \"\"\"\n", " Evaluation of the log probability density function for Gaussian with mean mu and covariance cov\n", " \"\"\"\n", " return sps.multivariate_normal.logpdf(x, mu, cov)\n", "\n", "def plot2dfun(f, limits, resolution, ax=None):\n", " \"\"\"\n", " Greyscale plot of 2D Multivariate distributions.\n", " \n", " \"\"\"\n", " if ax is None:\n", " ax = plt\n", " xmin, xmax, ymin, ymax = limits\n", " xlim = np.arange(ymin, ymax, (ymax - ymin) / float(resolution))\n", " ylim = np.arange(xmin, xmax, (xmax - xmin) / float(resolution))\n", " a, b = np.meshgrid(ylim, xlim)\n", " img = f(np.vstack([np.ravel(a), np.ravel(b)[::-1]]).T)\n", " img = (img - img.min()) /(img.max() - img.min()) # normalize to range 0.0 - 1.0\n", " img = img.reshape(a.shape+img.shape[1:])\n", " return ax.imshow(img, cmap='gray', aspect='auto', extent=(xmin, xmax, ymin, ymax))\n", "\n", " \n", "def gellipse(mu, cov, n=100, *args, **kwargs):\n", " \"\"\"\n", " Contour plot of 2D Multivariate Gaussian distribution.\n", "\n", " gellipse(mu, cov, n) plots ellipse given by mean vector MU and\n", " covariance matrix COV. Ellipse is plotted using N (default is 100)\n", " points. Additional parameters can specify various line types and\n", " properties. 
See description of matplotlib.pyplot.plot for more details.\n", " \"\"\"\n", " if mu.shape != (2,) or cov.shape != (2, 2):\n", " raise RuntimeError('mu must be a two element vector and cov must be 2 x 2 matrix')\n", "\n", " d, v = np.linalg.eigh(4 * cov)\n", " d = np.diag(d)\n", " t = np.linspace(0, 2 * np.pi, n)\n", " x = v @ np.sign(d) @ np.sqrt(np.abs(d)) @ np.array([np.cos(t), np.sin(t)]) + mu[:,np.newaxis]\n", " return plt.plot(x[0], x[1], *args, **kwargs)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## PLDA generative process\n", "\n", "A PLDA model is often used to model speaker embeddings in the speaker verification context.\n", "Such embeddings are obtained by means of a neural network (e.g. ResNet, TDNN, etc.) which is trained for speaker classification.\n", "The neural network transforms variable-length input speech utterances into fixed-length low-dimensional (e.g. 512 or 1024) vector representations (e.g. the embeddings are the output of a hidden layer of the neural network).\n", "\n", "The PLDA model assumes the following generative process for the embeddings (our observations):\n", "\n", "\n", "\n", "\\begin{align}\n", "{\\mathbf{z}_s} &\\sim \\mathcal{N}(\\mathbf{z}_s;\\boldsymbol{\\mu},\\boldsymbol{\\Sigma}_{ac})&& \\text{for } s=1, \\dots, S\\\\\n", "{\\mathbf{x}_{sn}} &\\sim \\mathcal{N}(\\mathbf{x}_{sn};\\mathbf{z}_{s},\\boldsymbol{\\Sigma}_{wc})&& \\text{for } n=1, \\dots, N_s\\\\\n", "\\end{align}\n", "\n", "\n", "\n", "where $\\mathbf{z}$ is the continuous latent random variable related to the distribution of speaker means, $\\boldsymbol{\\mu}$ is the global speaker mean, $\\boldsymbol{\\Sigma}_{ac}$ is the across-class (across-speaker) covariance matrix, $\\mathbf{x}$ is the continuous random variable related to the distribution of per-speaker observations (per-speaker embeddings), $\\mathbf{z}_s$ is the speaker-specific mean for speaker $s$, and $\\boldsymbol{\\Sigma}_{wc}$ is the within-class (within-speaker) 
covariance matrix, which is shared among (the same for) all speakers.\n", "\n", "\n", "Therefore, we assume that $S$ speaker means were generated from a Gaussian distribution $\\mathcal{N}(\\mathbf{z}_s;\\boldsymbol{\\mu},\\boldsymbol{\\Sigma}_{ac})$, and then $N_s$ embeddings were generated for each of these speakers from the Gaussian distribution $\\mathcal{N}(\\mathbf{x}_{sn};\\mathbf{z}_{s},\\boldsymbol{\\Sigma}_{wc})$. This process can also be visualized in the Bayesian Network shown below.\n", "\n", "Obviously, this assumption is something we make up when defining our model, as the embeddings were generated by the neural network, and not by such a PLDA model." ] }, { "attachments": { "PLDA_BN_2.png": { "image/png": "" } }, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Joint probability:\n", "\n", "Given the definition of the PLDA model, we can now write the joint probability of all observed variables $\\XX$ and latent variables $\\ZZ$, where it should be straighforward to see that the joint probability factorizes per speaker (see the Bayesian network).\n", "\n", "Let $\\XX=[\\XX_1,\\XX_2,...,\\XX_S]$, where $\\XX_s = [\\xx_{s1}, \\xx_{s2}, \\dots, \\xx_{sN_s} ]$contain the set of training observations of speaker $s$. Similarly, $\\ZZ=[\\zz_1,\\zz_2,...,\\zz_S]$.\n", "\n", "Then, the joint probability is: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$P(\\XX, \\ZZ) \n", "= \\prod_{s=1}^S p(\\XX_s,\\zz_s)\n", "= \\prod_{s=1}^S \\left( p(\\zz_s) \\prod_{n=1}^{N_s} p(\\xx_{sn}|\\zz_s) \\right)$$\n", "\n", "$$\\ln P(\\XX, \\ZZ) \n", "= \\sum_{s=1}^S \\ln p(\\XX_s,\\zz_s)\n", "= \\sum_{s=1}^S \\ln p(\\zz_s) + \\sum_{n=1}^{N_s} \\ln p(\\xx_{sn}|\\zz_s)$$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Handcrafting the PLDA model\n", "\n", "In order to generate some artificial training data, we will handcraft a *ground truth* PLDA model.\n", "We will handcraft the global ground truth speaker mean $\\boldsymbol\\mu^{GT}$, and the covariance matrices $\\boldsymbol{\\Sigma}_{ac}^{GT}$ and $\\boldsymbol{\\Sigma}_{wc}^{GT}$. \n", "We will generate our training data using this PLDA model, and we hope to learn it back (or some close to it) during the PLDA model training. \n", "In order to be able to draw, visualize and interpret our models, we consider only a toy example with $D=2$ dimensional data.\n", "\n", "The cell below handcrafts the PLDA model and plots its parameters. 
In the plot, the dot corresponds to the global mean, the blue ellipse is the contour plot of $\\boldsymbol{\\Sigma}_{ac}^{GT}$ and the red ellipse is the contour plot of $\\boldsymbol{\\Sigma}_{wc}^{GT}$.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "(-5.95685675343921, 7.954118920044291, -2.2998612816384827, 4.299970466102776)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#Do not edit this code!\n", "mu_gt = np.array([1, 2]) #ground truth global mean\n", "Sigma_wc_gt = np.array([[1, 0.8], #ground truth within class covariance\n", " [0.8, 1]])\n", "\n", "Sigma_ac_gt = np.array([[10, -2], #ground truth across class covariance\n", " [-2, 1]])\n", "\n", "plt.plot(mu_gt[0], mu_gt[1], '.', ms=10) #plotting the PLDA model\n", "gellipse(mu_gt, Sigma_ac_gt, 100, 'b', lw=2)\n", "gellipse(np.array([0,0]), Sigma_wc_gt, 100, 'r', lw=2) #for mere visualization purposes, we center the within-class covariance on the origin (0,0)\n", "plt.axis('equal')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sampling training data\n", "Once we have the global mean and the matrices $\\boldsymbol\\Sigma_{ac}^{GT}$ and $\\boldsymbol\\Sigma_{wc}^{GT}$, we can sample speakers and their corresponding embeddings.\n", "We sample $S=10$ speaker means and then their corresponding embeddings, where we sample a different number of embeddings per speaker.\n", "\n", "Besides sampling the data, the code below plots the countour of $\\boldsymbol\\Sigma_{ac}^{GT}$, the sampled speaker means, the countour of the per-speaker $\\boldsymbol\\Sigma_{wc}^{GT}$ and the per-speaker sampled embeddings." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#Do not edit this code!\n", "S = 10\n", "#N = (np.arange(S)+1)**2 # (somewhat cryptic way of) assigning different N_s to each speaker, ranging from 1 to 10^2 \n", "N = np.random.randint(1, 3, S) \n", "Z = rand_gauss(S, mu_gt, Sigma_ac_gt) # training speaker xvector means\n", "X = [] #Collection of all the X_s\n", "\n", "# For each speaker\n", "for ns, z in zip(N, Z):\n", " X_s = rand_gauss(ns, z, Sigma_wc_gt)\n", " X.append(X_s)\n", " p = plt.plot(X_s[:,0], X_s[:,1], '.', ms=2)\n", " c = p[0].get_color()\n", " plt.plot(z[0], z[1], '.', c=c, ms=10)\n", " gellipse(z, Sigma_wc_gt, 100, c=c)\n", "gellipse(mu_gt, Sigma_ac_gt, 100, 'b', lw=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple maximum-likelihood estimate of parameters\n", "\n", "We can estimate the parameters of the PLDA model using the following formulas (which are the same as the ones used in the linear distriminant analysis or linear Gaussian classifier from SUR classes).\n", "\n", "$N = \\sum_{s=1}^S N_s$\n", "\n", "$\\mmu = \\frac{1}{N} \\sum_{s=1}^S \\sum_{n=1}^{N_s} \\xx_{sn}$\n", "\n", "$\\mmu_s = \\frac{1}{N_s} \\sum_{n=1}^{N_s} \\xx_{sn}$\n", "\n", "$\\SSigma_{ac} = \\frac{1}{N} \\sum_{s=1}^S N_s \\left(\\mmu_s-\\mmu\\right)\\left(\\mmu_s-\\mmu\\right)^T$\n", "\n", "$\\SSigma_{wc} = \\frac{1}{N} \\sum_{s=1}^S N_s \\left(\\frac{1}{N_s} \\sum_{n=1}^{N_s} \\left(\\xx_{sn}-\\mmu_s\\right)\\left(\\xx_{sn}-\\mmu_s\\right)^T\\right)$\n", "\n", "### Task 1 \n", "\n", "Implement such updates for the PLDA parameters and plot the obtained result, together with the ground truth PLDA model (plot the ground truth model for the sampled speakers with dotted lines for a better visualization)." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "#Your implementation of the simple ML estimate of the parameters goes here\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# PLDA Expectation Maximization training\n", "\n", "With EM, we can get a better estimate of the parameters than the *naive* simple maximum-likelihood estimates.\n", "\n", "Your first task here will be to derive some of the math related to the expectation-maximization updates for PLDA model training.\n", "Below we provide the framework for such derivations.\n", "\n", "\n", "## Summary of the EM algorithm\n", "\n", "The EM algorithm makes use of the following formula to find the parameters that maximize the likelihood of the data:\n", "\n", "\n", "\n", "\n", "$\\ln p(\\XX|\\boldsymbol{\\eta}) = \n", "\\underbrace{\\sum_{\\ZZ}q(\\ZZ) \\ln p(\\XX,\\ZZ|\\boldsymbol{\\eta})}_{\\mathcal{Q}(q(\\ZZ),\\eta)}\n", "\\underbrace{-\\sum_{\\ZZ}q(\\ZZ) \\ln q(\\ZZ)}_{H(q(\\ZZ))}\n", "\\underbrace{-\\sum_{\\ZZ} q(\\ZZ) \\ln \\frac{p(\\ZZ | \\XX,\\boldsymbol{\\eta})}{q(\\ZZ)}}_{D_{KL}(q(\\ZZ)||p(\\ZZ|\\XX,\\eta))}$\n", "\n", "The steps for the EM algorithm are:\n", "1. Initialize parameters of the model (e.g. randomly or to constant values).\n", "2. E-step: set $q(\\ZZ):=p(\\ZZ|\\XX,\\eta)$ to make the Kullback-Leibler divergence 0.\n", "3. M-step: having fixed $q(\\ZZ)$, optimize the parameters of the PLDA model to maximize the auxiliary function $\\mathcal{Q}$ (and thereby increase the likelihood of the data).\n", "4. Repeat the E-step and the M-step until convergence (or for a fixed number of iterations).\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## E-step\n", "In the E-step, we need to set $q(\\ZZ):=p(\\ZZ|\\XX,\\eta)$. \n", "By looking at the Bayesian network we can see that the posterior distribution of the latent variable factorizes as $p(\\ZZ|\\XX,\\eta)= \\prod_s p(\\mathbf{z}_s|\\mathbf{X}_s,\\eta)$. 
Therefore $q(\\ZZ)=\\prod_s q(\\zz_s)$ where we set $ q(\\zz_s):= p(\\mathbf{z}_s|\\mathbf{X}_s,\\eta)$, which can be calculated as:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "$$ p(\\zz_s | \\XX_s, \\eta) \n", "= \\mathcal{N}(\\zz_s;\\mmu_s,\\SSigma_s)$$ \n", "\n", "$$ \\mmu_s = \\SSigma_s \\left(\\SSigma_{ac}^{-1}\\mmu + \\SSigma_{wc}^{-1} \\sum_{n=1}^{N_s}\\xx_{sn}\\right) \\hspace{2cm} \\SSigma_s = \\left(\\SSigma_{ac}^{-1} + N_s \\SSigma_{wc}^{-1} \\right)^{-1}\n", "$$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 2: \n", "Complete the derivation of the E-step to obtain this update. Start from the expression given below (but check the tips given after it)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "\\begin{align}\n", "\\ln p(\\zz_s | \\XX_s, \\eta) \n", "&= \\ln p(\\XX_s,\\zz_s) + const. \\\\\n", "&= \\ln \\left[ p(\\zz_s) \\prod_{n=1}^{N_s} p(\\xx_{sn}|\\zz_s) \\right] + const. \\\\\n", "&... \\color{red}{write\\ your\\ derivations\\ here}\\\\\n", "\\end{align}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Tip: To complete the derivation, it might be useful to understand what is the \"completion of squares\" method:\n", "\n", "\n", "For any Gaussian distribution $\\mathcal{N}(\\mathbf{y};\\mmu_o,\\SSigma_o)$, the following holds: \n", "\n", "$$\\ln \\mathcal{N}(\\mathbf{y};\\mmu_o,\\SSigma_o) = -\\frac{D}{2} \\ln (2\\pi)-\\frac{1}{2} \\ln|\\SSigma_o|-\\frac{1}{2} (\\mathbf{y}-\\mmu_o)^T\\SSigma_o^{-1}(\\mathbf{y}-\\mmu_o) = -\\frac{1}{2} \\mathbf{y}^T \\SSigma_o^{-1} \\mathbf{y} + \\mathbf{y}^T \\SSigma_o^{-1}\\mmu_o + const.$$\n", "\n", "where $const$ is a constant encompassing all terms independent of $\\mathbf{y}$.\n", " \n", "Therefore, if you obtain an expression in the form:\n", "\n", "$$-\\frac{1}{2} \\mathbf{y}^T A \\mathbf{y} + \\mathbf{y}^T B$$\n", "\n", "and you know that it corresponds to a valid probability distribution (up to the missing constant term) then it corresponds to the log of a (unnormalized) Gaussian distribution where $A$ and $B$ will be the terms $A=\\SSigma_o^{-1}$ and $B=\\SSigma_o^{-1}\\mmu_o$. \n", "That is, it corresponds to $\\ln \\mathcal{N}(\\mathbf{y};A^{-1}B, A^{-1})$.\n", "
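The completion-of-squares identity above can be checked numerically. The following is only a minimal sketch: the values of `A` and `B` are arbitrary illustrative choices (not part of the assignment), and `quadratic_form` is a hypothetical helper name. If the identification is right, the Gaussian log-density and the quadratic expression should differ by the same constant at every point.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative (assumed) values: A must be symmetric positive definite
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
B = np.array([1.0, -3.0])

def quadratic_form(y):
    # The expression -1/2 y^T A y + y^T B from the tip
    return -0.5 * y @ A @ y + y @ B

# Gaussian identified by completing the square: covariance A^{-1}, mean A^{-1} B
cov = np.linalg.inv(A)
mean = cov @ B

# Evaluate the difference between the log-density and the quadratic form
# at a few random points; it should be one and the same constant everywhere
rng = np.random.default_rng(0)
points = rng.normal(size=(5, 2))
offsets = np.array([multivariate_normal.logpdf(y, mean, cov) - quadratic_form(y)
                    for y in points])
print(offsets)
```

All printed values should agree up to floating-point error; that shared value is the normalization constant that the tip absorbs into the constant term.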
\n", " \n", "\n", " \n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## M-step\n", "In the M-step, we keep $q(\\mathbf{z}_s)$ fixed and we optimize the parameters of the model to maximize the auxiliary function.\n", "\n", "### Task 3: \n", "Complete the derivation for the M-step.\n", "**Explain** the different steps taken.\n", "Start from the expression below, where the first steps are given, as well as the final solution:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$\n", "\\small\n", "\\begin{align}\n", "\\mathcal{Q} \n", "&= \\int q(\\ZZ) \\ln p(\\XX,\\ZZ) \\diff \\ZZ \\\\\n", "&= \\int \\dots \\int \\left(\\prod_{s=1}^S q(\\zz_s)\\right) \\sum_{s=1}^S \\left(\\ln p(\\zz_s) + \\sum_{n=1}^{N_s} \\ln p(\\xx_{sn}|\\zz_s)\\right) \\diff\\zz_1 \\dots \\diff \\zz_S \\\\\n", "\\end{align}\n", "$\n", "\n", "Given the factorization over components (see slide 28 from [EM algorithm](https://www.fit.vutbr.cz/study/courses/BAYa/public/slides/3-EM%20algorithm.pdf)):\n", "\n", "$\n", "\\small\n", "\\begin{align}\n", "\\mathcal{Q}&= \\sum_{s=1}^S \\int q(\\zz_s) \\left(\\ln p(\\zz_s) + \\sum_{n=1}^{N_s} \\ln p(\\xx_{sn}|\\zz_s)\\right) \\diff\\zz_s\\\\\n", "& ... \\color{red}{write\\ your\\ derivations\\ here\\ to\\ obtain:}\\\\\n", "\\end{align}\n", "$\n", "\n", "$\n", "\\scriptsize\n", "\\begin{align}\n", "&\\mathcal{Q} = -\\frac{1}{2} \\sum_{s=1}^S \\left(\n", "-\\ln |\\SSigma_{ac}^{-1}| + \\Tr\\left(\\SSigma_s \\SSigma_{ac}^{-1}\\right) +\\left(\\mmu_s-\\mmu\\right)^T\\SSigma_{ac}^{-1}\\left(\\mmu_s-\\mmu\\right) \n", "+ \\sum_{n=1}^{N_s}\\left(-\\ln |\\SSigma_{wc}^{-1}| + \\Tr\\left(\\SSigma_s\\SSigma_{wc}^{-1}\\right) +\\left(\\xx_{sn}-\\mmu_s\\right)^T\\SSigma_{wc}^{-1}\\left(\\xx_{sn}-\\mmu_s\\right)\\right)\\right) + const. \\\\\n", "\\end{align}\n", "$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "
Tip: \n", "Given a probability density function $q(\\mathbf{y})$, we define the expected values as:\n", "\n", "$$ \\E[\\mathbf{y}] = \\int q(\\mathbf{y}) \\mathbf{y} d\\mathbf{y} $$\n", "\n", "$$ \\E[f(\\mathbf{y})] = \\int q(\\mathbf{y}) f(\\mathbf{y}) d\\mathbf{y} $$\n", "\n", "The expected values have (among others) the following properties:\n", " \n", "$\\E[X+Y]=\\E[X]+\\E[Y]$\n", " \n", "$\\E[aX]=a\\E[X]$\n", " \n", "For a Gaussian distribution $q(\\mathbf{y})=\\mathcal{N}(\\mathbf{y};\\mmu_o,\\SSigma_o)$, it holds:\n", "\n", "(1) $\\E[\\mathbf{y}] = \\mmu_o$\n", "\n", "(2) $\\E[\\mathbf{y}\\mathbf{y}^T]=\\SSigma_o+\\mmu_o\\mmu_o^T$\n", "\n", "(3) $\\E[\\mathbf{y}^TA\\mathbf{y}]=\\Tr(A\\SSigma_o)+\\mmu_o^TA\\mmu_o$\n", "\n", "where the operator $\\Tr$ refers to the *trace*, the sum of elements on the main diagonal of a matrix, which has the following properties:\n", "\n", "$\\Tr(A+B)=\\Tr(A)+\\Tr(B)$\n", "\n", "$\\Tr(ABC)=\\Tr(CAB)=\\Tr(BCA)$\n", "\n", "In the derivations, you can make use of the results and properties defined above. If used, reference them in the explanations of the derivation.\n", " \n", "Most of these *tricks* and many others can be found in [The Matrix Cookbook](https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf). If you use this book for any step of the derivations, reference the corresponding formula in the text. \n", "
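Property (3) above can be sanity-checked by Monte Carlo sampling before it is used in the derivation. This is a rough sketch only: the values of `mu_o`, `Sigma_o` and `A` are arbitrary illustrative assumptions, and the sample average is a noisy estimate, so the two numbers should agree only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) Gaussian parameters and symmetric matrix A
mu_o = np.array([1.0, -2.0])
Sigma_o = np.array([[2.0, 0.3],
                    [0.3, 0.5]])
A = np.array([[1.0, 0.2],
              [0.2, 3.0]])

# Sample average of y^T A y over many draws from N(mu_o, Sigma_o)
samples = rng.multivariate_normal(mu_o, Sigma_o, size=200000)
monte_carlo = np.mean(np.einsum('ni,ij,nj->n', samples, A, samples))

# Closed form from property (3): Tr(A Sigma_o) + mu_o^T A mu_o
closed_form = np.trace(A @ Sigma_o) + mu_o @ A @ mu_o

print(monte_carlo, closed_form)
```

The same pattern (sample, average, compare with the closed form) can be reused to check properties (1) and (2).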
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 4\n", "Now, using the expression of the auxiliary function $\\mathcal{Q}$ provided above, derive the updates of $\\mmu$ and $\\SSigma_{ac}$ and $\\SSigma_{wc}$.\n", "Again, we provide the solution for these updates and you need to take the derivative of $\\mathcal{Q}$ and set it equal to 0 to get to such solutions.\n", "\n", "Recall, that even if you failed to complete the previous derivations you can proceed with this one." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "$\n", "\\begin{align}\n", "&\\frac{\\partial \\mathcal{Q}}{\\partial \\mmu}\n", "= \\color{red}{write\\ your\\ derivations\\ here}\\\\\n", "&\\implies \\mmu := \\frac{1}{S} \\sum_{s=1}^S \\mmu_s\n", "\\end{align}\n", "$\n", "\n", "$\n", "\\begin{align}\n", "&\\frac{\\partial \\mathcal{Q}}{\\partial \\SSigma_{ac}^{-1}}\n", "=\\color{red}{write\\ your\\ derivations\\ here}\\\\\n", "&\\implies \\SSigma_{ac} := \\frac{1}{S} \\sum_{s=1}^S \\SSigma_s + \\frac{1}{S} \\sum_{s=1}^S \\left(\\mmu_s-\\mmu\\right)\\left(\\mmu_s-\\mmu\\right)^T\n", "\\end{align}\n", "$\n", "\n", "$\n", "\\begin{align}\n", "&\\frac{\\partial \\mathcal{Q}}{\\partial \\SSigma_{wc}^{-1}}\n", "= \\color{red}{write\\ your\\ derivations\\ here}\\\\\n", "&\\implies \\SSigma_{wc} := \\frac{1}{\\sum_{s=1}^S N_s} \\sum_{s=1}^S N_s \\left(\\SSigma_s + \\frac{1}{N_s} \\sum_{n=1}^{N_s} \\left(\\xx_{sn}-\\mmu_s\\right)\\left(\\xx_{sn}-\\mmu_s\\right)^T\\right)\n", "\\end{align}\n", "$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Tip: \n", "Note that to obtain the updates of the covariance matrices, we suggest taking the derivative of the auxiliary function $\\mathcal{Q}$ with respect to the inverse of the covariance matrices. This results in a somewhat simpler derivation of the updates. \n", "But (if you want to show off), you can start with the derivative with respect to the (non-inverse) covariance matrices, which leads to the same result.\n", " \n", "
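When differentiating with respect to an inverse covariance (precision) matrix, two standard matrix-derivative identities are typically needed: the gradient of the log-determinant and the gradient of a trace. The sketch below checks both by finite differences. It is an illustration under assumptions: `Lam` and `S_mat` are arbitrary example matrices, `numerical_gradient` is a hypothetical helper, and every matrix entry is treated as an independent variable (symmetry is not enforced).

```python
import numpy as np

def numerical_gradient(f, M, eps=1e-6):
    # Central finite differences, one matrix entry at a time
    grad = np.zeros_like(M)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            step = np.zeros_like(M)
            step[i, j] = eps
            grad[i, j] = (f(M + step) - f(M - step)) / (2 * eps)
    return grad

# Illustrative (assumed) precision matrix and a second fixed matrix
Lam = np.array([[2.0, 0.4],
                [0.4, 1.5]])
S_mat = np.array([[1.0, 0.3],
                  [0.3, 2.0]])

# Gradient of ln|Lam| with respect to Lam; expected to match the transposed inverse
grad_logdet = numerical_gradient(lambda M: np.log(np.linalg.det(M)), Lam)

# Gradient of Tr(S_mat Lam) with respect to Lam; expected to match S_mat transposed
grad_trace = numerical_gradient(lambda M: np.trace(S_mat @ M), Lam)

print(np.allclose(grad_logdet, np.linalg.inv(Lam).T, atol=1e-5))
print(np.allclose(grad_trace, S_mat.T, atol=1e-5))
```

If you rely on these identities in your written derivation, cite the corresponding formulas from the reference you took them from rather than this numerical check.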
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 5\n", "Using the updates for the E and M step provided above, implement the EM algorithm for PLDA model training.\n", "\n", "The cell below provides some initialization for the PLDA parameters.\n", "Run the algorithm for 10 iterations and plot the obtained result, together with the ground truth PLDA model (plot the ground truth model for the sampled speakers with dotted lines for an better visualization).\n", "\n", "\n", "****Note that we might modify this task so that the algorithm fits some specific form or function*" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Make use of the following variables\n", "mu = np.array([0.0, 0.0]) #intial global mean\n", "Sigma_ac = np.array([[1.0, 0.0], #initial across-class covariance matrix\n", " [0.0, 1.0]])\n", "Sigma_wc = np.array([[1.0, 0.0], #initial within-class covariance matrix\n", " [0.0, 1.0]])\n", "\n", "\n", "#Your implementation of the EM algorithm goes here\n", "\n", "#E-step\n", "\n", "#M-step\n", "\n", "\n", "\n", "\n", "\n", "\n", "#Plotting instructions go here, reuse the function gellipse defined above\n", "\n", "\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" } }, "nbformat": 4, "nbformat_minor": 2 }