Fig. 3. Three types of distance/divergence with $x_{ij} = 1$: squared Euclidean distance, generalized KL divergence, and IS divergence.

Itakura-Saito (IS) divergence

$$d_{IS}(x_{ij}, \hat{x}_{ij}) = \frac{x_{ij}}{\hat{x}_{ij}} - \log\frac{x_{ij}}{\hat{x}_{ij}} - 1 . \tag{6}$$

Figure 3 shows examples of these distances/divergences with $x_{ij} = 1$. We observe that the KL and IS divergences are less sensitive to over-approximation than to under-approximation. From (6), we also observe that the IS divergence depends only on the ratio $x_{ij}/\hat{x}_{ij}$. Thus, for instance, $d_{IS}(900, 1000) = d_{IS}(9, 10)$. This property is favorable when analyzing most audio signals such as music and speech, where low frequency components have much higher energy than high frequency components. This is because, under this property, low and high frequency components are treated with similar importance.

B. Algorithm: Multiplicative Update Rules

We can minimize the distance/divergence according to (3) together with (4), (5), or (6) in the following manner. First, the elements of $T$ and $V$ are randomly initialized with non-negative values. Then, the following update rules [17], [19] are iteratively applied until convergence.

Squared Euclidean distance

$$t_{ik} \leftarrow t_{ik} \frac{\sum_j x_{ij} v_{kj}}{\sum_j \hat{x}_{ij} v_{kj}} , \quad v_{kj} \leftarrow v_{kj} \frac{\sum_i x_{ij} t_{ik}}{\sum_i \hat{x}_{ij} t_{ik}} \tag{7}$$

KL divergence

$$t_{ik} \leftarrow t_{ik} \frac{\sum_j \frac{x_{ij}}{\hat{x}_{ij}} v_{kj}}{\sum_j v_{kj}} , \quad v_{kj} \leftarrow v_{kj} \frac{\sum_i \frac{x_{ij}}{\hat{x}_{ij}} t_{ik}}{\sum_i t_{ik}} \tag{8}$$

IS divergence

$$t_{ik} \leftarrow t_{ik} \sqrt{\frac{\sum_j \frac{x_{ij}}{\hat{x}_{ij}} \frac{v_{kj}}{\hat{x}_{ij}}}{\sum_j \frac{v_{kj}}{\hat{x}_{ij}}}} , \quad v_{kj} \leftarrow v_{kj} \sqrt{\frac{\sum_i \frac{x_{ij}}{\hat{x}_{ij}} \frac{t_{ik}}{\hat{x}_{ij}}}{\sum_i \frac{t_{ik}}{\hat{x}_{ij}}}} \tag{9}$$

These update rules are called multiplicative, since each element is updated by multiplying it by a scalar value, which is guaranteed to be non-negative.

C. Probability distributions related to distance/divergence

There are relations between the three distances/divergences (4)-(6) and specific probability distributions [18], [20], namely a Gaussian distribution $\mathcal{N}$, a complex Gaussian distribution $\mathcal{N}_c$, and a Poisson distribution $\mathcal{PO}$, as shown below.
Studying these relationships helps us to consider multichannel extensions of NMF in the next section.

Minimizing the distance/divergence $D_*(X, \{T, V\})$ is equivalent to maximizing the log-likelihood $\log p(X|T, V)$ or $\log p(\tilde{X}|T, V)$, where $\tilde{X}$, $[\tilde{X}]_{ij} = \tilde{x}_{ij}$, is a matrix of STFT coefficients.

Squared Euclidean distance

$$p(X|T, V) = \prod_{i=1}^{I} \prod_{j=1}^{J} \mathcal{N}(x_{ij}|\hat{x}_{ij}, \tfrac{1}{2}) , \quad \mathcal{N}(x_{ij}|\hat{x}_{ij}, \tfrac{1}{2}) \propto \exp\left(-|x_{ij} - \hat{x}_{ij}|^2\right) . \tag{10}$$

KL divergence

$$p(X|T, V) = \prod_{i=1}^{I} \prod_{j=1}^{J} \mathcal{PO}(x_{ij}|\hat{x}_{ij}) , \quad \mathcal{PO}(x_{ij}|\hat{x}_{ij}) = \frac{\hat{x}_{ij}^{x_{ij}}}{\Gamma(x_{ij} + 1)} \exp(-\hat{x}_{ij}) , \tag{11}$$

where $\Gamma(x)$ is the Gamma function.

IS divergence

$$p(\tilde{X}|T, V) = \prod_{i=1}^{I} \prod_{j=1}^{J} \mathcal{N}_c(\tilde{x}_{ij}|0, \hat{x}_{ij}) , \quad \mathcal{N}_c(\tilde{x}_{ij}|0, \hat{x}_{ij}) \propto \frac{1}{\hat{x}_{ij}} \exp\left(-\frac{|\tilde{x}_{ij}|^2}{\hat{x}_{ij}}\right) . \tag{12}$$

Regarding the IS divergence, the likelihood $p(\tilde{X}|T, V)$ is calculated not for the matrix $X$ of preprocessed non-negative values but for the matrix $\tilde{X}$ of complex-valued STFT coefficients, and it is thus necessary to specify $x_{ij} = |\tilde{x}_{ij}|^2$ in a preprocessing step for the connection to (6).

When $x_{ij} = \hat{x}_{ij}$, the distance/divergence (4), (5) or (6) becomes 0 and each $ij$-term of the log-likelihood defined above is maximized. Therefore, the distance/divergence can be derived as the difference between the log-likelihoods of $x_{ij}$ and $\hat{x}_{ij}$. We show the IS divergence case as an example:

$$\begin{aligned}
d_{IS}(x_{ij}, \hat{x}_{ij}) &= \log \mathcal{N}_c(\tilde{x}_{ij}|0, x_{ij}) - \log \mathcal{N}_c(\tilde{x}_{ij}|0, \hat{x}_{ij}) \\
&= -\log x_{ij} - \frac{x_{ij}}{x_{ij}} - \left(-\log \hat{x}_{ij} - \frac{x_{ij}}{\hat{x}_{ij}}\right) \\
&= \frac{x_{ij}}{\hat{x}_{ij}} - \log\frac{x_{ij}}{\hat{x}_{ij}} - 1 .
\end{aligned} \tag{13}$$

III. MULTICHANNEL EXTENSIONS OF NMF

This section presents our multichannel extensions of NMF. Figure 4 shows an overview of the multichannel extensions (in red), in contrast with standard single-channel NMF (in blue). We begin with the multichannel extension of the IS divergence (6), since this extension is the most natural. We then extend the Euclidean distance (4) to a multichannel case. Unfortunately, we have not found a multichannel counterpart for the generalized KL divergence (5).
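As a single-channel reference point for the extensions that follow, the update rules (7)-(9) of Section II-B can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names (`divergence`, `model`, `ratio_terms`, `update`, `nmf`) are hypothetical, $X$ is assumed to be strictly positive so that the logarithms and ratios are defined, and the order of one sweep (update all $t_{ik}$, recompute $\hat{x}_{ij}$, then update all $v_{kj}$) is an assumption, since the text does not fix it.

```python
import math
import random

def divergence(x, xh, kind):
    """Element-wise distance/divergence (4)-(6) between x and its model xh."""
    if kind == "EU":
        return (x - xh) ** 2
    if kind == "KL":
        return x * math.log(x / xh) - x + xh
    if kind == "IS":
        return x / xh - math.log(x / xh) - 1.0
    raise ValueError("kind must be 'EU', 'KL' or 'IS'")

def model(T, V):
    """Low-rank model xh[i][j] = sum_k T[i][k] * V[k][j]."""
    K, J = len(V), len(V[0])
    return [[sum(t[k] * V[k][j] for k in range(K)) for j in range(J)] for t in T]

def total_divergence(X, T, V, kind):
    """Sum of the element-wise divergences over all (i, j), as in (3)."""
    Xh = model(T, V)
    return sum(divergence(X[i][j], Xh[i][j], kind)
               for i in range(len(X)) for j in range(len(X[0])))

def ratio_terms(X, T, V, kind):
    """Numerator weights A, denominator weights B and exponent used in (7)-(9)."""
    Xh = model(T, V)
    I, J = len(X), len(X[0])
    if kind == "EU":   # (7): x_ij over xh_ij, exponent 1
        return X, Xh, 1.0
    if kind == "KL":   # (8): x_ij / xh_ij over 1, exponent 1
        A = [[X[i][j] / Xh[i][j] for j in range(J)] for i in range(I)]
        return A, [[1.0] * J for _ in range(I)], 1.0
    if kind == "IS":   # (9): x_ij / xh_ij^2 over 1 / xh_ij, exponent 1/2
        A = [[X[i][j] / Xh[i][j] ** 2 for j in range(J)] for i in range(I)]
        B = [[1.0 / Xh[i][j] for j in range(J)] for i in range(I)]
        return A, B, 0.5
    raise ValueError("kind must be 'EU', 'KL' or 'IS'")

def update(X, T, V, kind):
    """One sweep: update every t_ik, recompute the model, then update every v_kj."""
    I, J, K = len(X), len(X[0]), len(V)
    A, B, g = ratio_terms(X, T, V, kind)
    T = [[T[i][k] * (sum(A[i][j] * V[k][j] for j in range(J)) /
                     sum(B[i][j] * V[k][j] for j in range(J))) ** g
          for k in range(K)] for i in range(I)]
    A, B, g = ratio_terms(X, T, V, kind)  # model recomputed with the new T
    V = [[V[k][j] * (sum(A[i][j] * T[i][k] for i in range(I)) /
                     sum(B[i][j] * T[i][k] for i in range(I))) ** g
          for j in range(J)] for k in range(K)]
    return T, V

def nmf(X, K, kind, n_iter=50, seed=0):
    """Random positive initialization followed by n_iter multiplicative sweeps."""
    rng = random.Random(seed)
    I, J = len(X), len(X[0])
    T = [[rng.uniform(0.1, 1.0) for _ in range(K)] for _ in range(I)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(J)] for _ in range(K)]
    history = [total_divergence(X, T, V, kind)]
    for _ in range(n_iter):
        T, V = update(X, T, V, kind)
        history.append(total_divergence(X, T, V, kind))
    return T, V, history
```

Calling, for example, `nmf(X, 2, "IS")` on a small positive matrix `X` (a list of rows) returns the two factors and the divergence after each sweep, which makes it easy to check that the objective does not increase. The same `divergence` helper can be used to confirm the ratio property noted for (6), since `divergence(900, 1000, "IS")` and `divergence(9, 10, "IS")` evaluate the same expression in the ratio 0.9.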