13.07.2015 Views

accelerating the mixing phase in studio recording productions by ...


4.1 Initial Alignment

The alignment problem is formulated as the tracking of an input data stream along a reference, using motion equations.

4.1.1 System State Representation

The system state is modeled as a two-dimensional random variable $x = (s, t)$, representing the current position in the reference audio and the tempo, respectively; $s$ is measured in seconds and $t$ is the speed ratio of the performances. The incoming signal processing frontend is based on spectral features extracted from the FFT analysis of an overlapping, windowed signal representation, with hop size $\Delta T$. In order to use sequential Monte Carlo methods to estimate the hidden variable $x_k = (s_k, t_k)$ using observation $z_k$ at time frame $k$, we assume that the state evolution is Markovian.

4.1.2 Observation Modeling

Let $p(z_k \mid x_k)$ denote the likelihood of observing an audio frame $z_k$ of the take given the current position along the reference performance $s_k$.
We consider a simple spectral similarity measure, defined as the Kullback-Leibler divergence between the power spectra at frame $k$ of the take and at time $s_k$ in the reference.

4.1.3 System State Transition Modeling

Let $p(x_k \mid x_{k-1})$ denote the pdf for the state transition; we make use of tempo estimation in the previous frame, assuming that it does not change too quickly:

$$p(x_k \mid x_{k-1}) = \mathcal{N}\left( \begin{bmatrix} s_k \\ t_k \end{bmatrix} \,\middle|\, \mu_k, \Sigma \right)$$

$$\mu_k = \begin{bmatrix} s_{k-1} + \Delta T \, t_{k-1} \\ t_{k-1} \end{bmatrix}$$

$$\Sigma = \begin{bmatrix} \sigma_s^2 \, \Delta T & 0 \\ 0 & \sigma_t^2 \, \Delta T \end{bmatrix}$$

Intuitively, this corresponds to a performance where tempo is rather steady but can fluctuate; the parameters $\sigma_t^2$ and $\sigma_s^2$ control respectively the variability of tempo and the possibility of local mismatches that do not affect the overall tempo estimate.

4.1.4 Inference Algorithm

Sequential Monte Carlo inference methods work by recursively approximating the current distribution of the system state using the technique of Sequential Importance Sampling: a random measure $\{x_k^i, w_k^i\}_{i=1}^{N_s}$ is used to characterize the posterior pdf with a set of $N_s$ particles over the state domain and associated weights, and is updated at each time step as in Algorithm 1. In particular, $q(x_k \mid x_{k-1}, z_k)$ is the particle sampling function.
In our implementation this corresponds to the transition probability density function; in this case the algorithm is known as the condensation algorithm. With this choice the transition and sampling densities cancel in the weight update of Algorithm 1, which reduces to $\hat{w}_k^i = w_{k-1}^i \, p(z_k \mid x_k^i)$. An optional resampling step is used to address the degeneracy problem, common to particle filtering approaches; this is discussed in detail in [1,5] and in the next paragraph. The decoding of position and tempo is carried out by computing the expected value of the resulting random measure (which is efficiently computed as $E[x_k] = \sum_{i=1}^{N_s} x_k^i w_k^i$).

Algorithm 1: SIS Particle Filter - Update step

    for $i = 1 \ldots N_s$ do
        sample $x_k^i$ according to $q(x_k^i \mid x_{k-1}^i, z_k)$
        $\hat{w}_k^i \leftarrow w_{k-1}^i \, \dfrac{p(z_k \mid x_k^i) \, p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{k-1}^i, z_k)}$
    $w_k^i \leftarrow \dfrac{\hat{w}_k^i}{\sum_j \hat{w}_k^j} \quad \forall i = 1 \ldots N_s$
    $N_{eff} \leftarrow \left( \sum_{i=1}^{N_s} (w_k^i)^2 \right)^{-1}$
    if $N_{eff} <$ resampling threshold then
        resample $x_k^1 \ldots x_k^{N_s}$ according to ddf $w_k^1 \ldots w_k^{N_s}$
        $w_k^i \leftarrow N_s^{-1} \quad \forall i = 1 \ldots N_s$

4.1.5 Initialization

Initialization plays a central role in the performance of the algorithm; in a probabilistic context this corresponds to an appropriate choice of the prior distribution $p(x_0)$.

In a real-time setup the player is expected to start the performance at a well-known point of the reference; this fact is exploited in the design of the algorithm by setting an appropriately shaped prior distribution, typically a low-variance one around the beginning.

In the proposed situation, however, the initial point is not known (finding it is precisely our aim). To cope with this, the prior distribution $p(x_0)$ is set to be uniform over the whole duration $L$ of the reference performance; the algorithm is expected to "converge" to the correct position after a few iterations. Figure 3 shows the evolution of the probability distribution for the position of the input at different moments of the alignment.

4.1.6 Degeneracy Issues w.r.t. Realtime Alignment

A relevant parameter of Algorithm 1 is the resampling threshold. The variable $N_{eff}$, commonly known as the effective sample size, is used to estimate the degree of degeneracy which affects the random measure; degeneracy is related to the variance of the weights $\{w_k^i\}_{i=1}^{N_s}$, which has been proven to always increase in the absence of resampling. In a degenerate situation most particles have close-to-zero weight, resulting in most of the computation being spent in updating particles which are subject to numerical approximation errors.

Resampling is introduced to obviate this issue. Intuitively, resampling replaces a random measure of the true distribution with an equivalent one (in the limit of $N_s \to \infty$) that is better suited for the inference algorithm. Since resampling
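
The update step of Algorithm 1, combined with the uniform prior over the reference duration and the expected-value decoding, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`init_particles`, `sis_update`, `expected_state`, `run_demo`) are hypothetical, the Gaussian tempo prior around 1.0 is an assumption (the text only specifies the prior over position), multinomial resampling is one possible resampling scheme, and the `likelihood` callable is a toy Gaussian bump standing in for the Kullback-Leibler based spectral similarity measure.

```python
import math
import random


def init_particles(n, duration, rng):
    """Uniform prior over position in [0, duration]; the tempo prior is an assumption."""
    particles = [(rng.uniform(0.0, duration), rng.gauss(1.0, 0.1)) for _ in range(n)]
    weights = [1.0 / n] * n
    return particles, weights


def sis_update(particles, weights, likelihood, dt, sigma_s, sigma_t, threshold, rng):
    """One SIS update step using the transition pdf as sampling function q."""
    n = len(particles)
    std_s = sigma_s * math.sqrt(dt)  # Sigma = diag(sigma_s^2 * dt, sigma_t^2 * dt)
    std_t = sigma_t * math.sqrt(dt)
    new_particles, new_weights = [], []
    for (s, t), w in zip(particles, weights):
        # Sample from N(mu_k, Sigma) with mu_k = (s + dt * t, t).
        s_new = rng.gauss(s + dt * t, std_s)
        t_new = rng.gauss(t, std_t)
        new_particles.append((s_new, t_new))
        # q equals the transition pdf, so p(x_k | x_{k-1}) / q cancels.
        new_weights.append(w * likelihood(s_new))
    total = sum(new_weights)
    if total <= 0.0:
        # Guard (not in Algorithm 1): all weights underflowed, fall back to uniform.
        new_weights = [1.0 / n] * n
    else:
        new_weights = [w / total for w in new_weights]
    n_eff = 1.0 / sum(w * w for w in new_weights)
    if n_eff < threshold:
        # Multinomial resampling according to the discrete weight distribution.
        new_particles = rng.choices(new_particles, weights=new_weights, k=n)
        new_weights = [1.0 / n] * n
    return new_particles, new_weights


def expected_state(particles, weights):
    """Decode position and tempo as the weighted mean of the particles."""
    s = sum(w * p[0] for p, w in zip(particles, weights))
    t = sum(w * p[1] for p, w in zip(particles, weights))
    return s, t


def run_demo(seed=0):
    """Toy run: the take starts 20 s into a 60 s reference at tempo ratio 1.0."""
    rng = random.Random(seed)
    duration, dt, n = 60.0, 0.05, 2000
    true_s = 20.0
    particles, weights = init_particles(n, duration, rng)
    for _ in range(40):
        true_s += dt  # the take advances by one hop

        def likelihood(s, centre=true_s):
            # Toy stand-in for p(z_k | x_k): peaks at the true position.
            return math.exp(-0.5 * ((s - centre) / 0.5) ** 2)

        particles, weights = sis_update(
            particles, weights, likelihood, dt, 0.1, 0.1, n / 2, rng
        )
    est_s, est_t = expected_state(particles, weights)
    return est_s, est_t, true_s, weights
```

Calling `run_demo()` returns the decoded position and tempo together with the true position of the toy take, so the two can be compared. The resampling threshold is set to half the number of particles here purely for illustration; the text does not state the value used.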
