Functional Decomposition as an Optimization Technique ∗

Edin H. Mulalić
Faculty of Electronic Engineering
University of Niš, Serbia
edinmulalic@yahoo.com

Miomir S. Stanković
Faculty of Occupational Safety
University of Niš, Serbia
miomir.stankovic@gmail.com

1 Introduction

A basic operation such as calculating the value of a function is at the heart of any problem-solving process. In specialized systems where the speed of calculation is of great importance, various optimization techniques are applied. There is no general recipe for successful optimization; it usually requires problem-dependent heuristics. One can express the function to be calculated in different ways, use various decomposition techniques, use a hardware implementation (or a combination of software and hardware implementations), improve the speed of the data structures used in the algorithm, use precomputed values, use approximate solutions, etc. In this paper, we explore the possibility of combining functional decomposition with the underlying probability distribution of the input variables to determine which values should be precomputed.

2 Preliminaries

Suppose that we are given a finite commutative ring $(R, +, *)$, where $R = \{r_1, r_2, \ldots, r_K\}$, $K \in \mathbb{N}$. We want to evaluate a function $f : R^N \to R$, $N \in \mathbb{N}$ ($f(x) = f(x_1, x_2, \ldots, x_N)$, where $x_i \in R$ for $1 \le i \le N$). Computing the function value for a specific input requires time $T_c(f)$. If we have a memory of limited size $M$, we can precompute and store the function value for up to $M$ combinations of input parameters (input vectors).
Assuming that reading a value from the memory requires time $T_M$ and that $T_M$ is significantly less than $T_c(f)$, this approach reduces the average time of evaluating the function $f$. Note that the term memory in this context can denote a physical memory or a convenient data structure.

Let us assume that there is an underlying probability distribution of the input variables, so that the probability of $x_i = r_j$ is denoted by $p_{ij}$, where $\sum_{k=1}^{K} p_{ik} = 1$ for $1 \le i \le N$. We also assume that the input parameters are independently distributed. The information about the distribution can be used to find the $M$ most probable combinations of input parameters and to precompute and store the function values for them. Among all ways of filling the memory with $M$ input vectors, this choice minimizes the expected time of evaluating the function, which is given by the formula

$$E_f[T] = T_c(f) - P(X_M)\,(T_c(f) - T_M) \tag{1}$$

where $X_M$ is the set of all input vectors used for precomputation and $P(X_M)$ is the probability that an input vector belongs to $X_M$.

Of course, this is not the only way of using the memory resource. Depending on the usage of the system, it might be a satisfactory solution, but it raises two questions. First, is it possible to use the memory resource in a different way to reduce the average evaluation time even more? Second, how can precomputation benefit more than $M$ input vectors? One way to approach these two problems is functional decomposition.
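The baseline strategy and formula (1) can be illustrated with a minimal Python sketch. It assumes a tiny hypothetical instance ($N = 2$, $K = 3$, made-up probabilities and timings), indexes ring elements by $0, \ldots, K-1$, and enumerates all $K^N$ input vectors by brute force, which is only feasible for small cases. The function names are illustrative and do not come from the paper.

```python
import heapq
from itertools import product
from math import prod

def most_probable_vectors(p, M):
    """Return the M most probable input vectors as (probability, vector) pairs.

    p[i][j] is the probability that input x_i equals the j-th ring element.
    Inputs are assumed independent, so a vector's probability is a product.
    Brute force over all K**N vectors (illustrative only).
    """
    N, K = len(p), len(p[0])
    scored = (
        (prod(p[i][x[i]] for i in range(N)), x)
        for x in product(range(K), repeat=N)
    )
    return heapq.nlargest(M, scored)

def expected_time(T_c, T_M, P_XM):
    """Formula (1): E_f[T] = T_c(f) - P(X_M) * (T_c(f) - T_M)."""
    return T_c - P_XM * (T_c - T_M)

# Hypothetical distribution for two inputs over a three-element ring.
p = [[0.7, 0.2, 0.1],
     [0.6, 0.3, 0.1]]
M = 3  # memory size (number of stored function values)

top = most_probable_vectors(p, M)
P_XM = sum(prob for prob, _ in top)           # probability of a memory hit
avg = expected_time(T_c=100.0, T_M=1.0, P_XM=P_XM)
print(top, P_XM, avg)
```

With these assumed numbers the three stored vectors cover about 75% of the probability mass, so the expected evaluation time drops from 100 to roughly 25.75 time units.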
A decomposition $\Delta(f)$ of a function $f$ is a set of functions $\Delta(f) = \{F, f_1, f_2, \ldots, f_D\}$ such that

$$f(x) = F(f_1(x_1), f_2(x_2), \ldots, f_D(x_D)) \tag{2}$$

where $x_i$ ($1 \le i \le D$) are vectors formed as subvectors of the initial vector $x$. In this paper, from now on, we consider decompositions of the form

$$f(x) = f_1(x_1) \oplus_1 f_2(x_2) \oplus_2 \cdots \oplus_{D-1} f_D(x_D) \tag{3}$$

where $\oplus_i \in \{+, *\}$, $1 \le i \le D - 1$. In other words, we do not examine hierarchical decompositions of the form $g(h(y))$.

∗ Supported by Ministry of Sci. & Techn. Rep. Serbia, the project III 044006
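To make form (3) concrete, here is a small Python sketch under assumed, illustrative choices that are not taken from the paper: the ring is $\mathbb{Z}_5$, the function is $f(x) = x_1 x_2 + x_3 x_4$, and it is split into $f_1(x_1, x_2) = x_1 x_2$ and $f_2(x_3, x_4) = x_3 x_4$ joined by $+$. The sketch checks that the decomposition is exact and counts how many full input vectors benefit when a memory of six cells is shared between the two sub-functions.

```python
from itertools import product

K = 5  # illustrative ring Z_5: addition and multiplication modulo 5

def f(x):
    """Original function on R^4 (hypothetical example)."""
    x1, x2, x3, x4 = x
    return (x1 * x2 + x3 * x4) % K

def f1(u):
    """Sub-function on the sub-vector (x1, x2)."""
    return (u[0] * u[1]) % K

def f2(v):
    """Sub-function on the sub-vector (x3, x4)."""
    return (v[0] * v[1]) % K

def f_decomposed(x):
    """Form (3) with D = 2 and the single operator '+'."""
    return (f1(x[:2]) + f2(x[2:])) % K

all_inputs = list(product(range(K), repeat=4))
decomposition_is_exact = all(f(x) == f_decomposed(x) for x in all_inputs)

# Share M = 6 memory cells: 3 stored sub-vectors per sub-function.
# Which sub-vectors are stored is arbitrary here (the first three).
sub_vectors = list(product(range(K), repeat=2))
stored1 = set(sub_vectors[:3])
stored2 = set(sub_vectors[:3])

# An input vector benefits if at least one of its sub-vectors is stored.
touched = sum(1 for x in all_inputs if x[:2] in stored1 or x[2:] in stored2)
print(decomposition_is_exact, touched, len(all_inputs))
```

In this toy case six memory cells speed up at least part of the evaluation for 141 of the 625 input vectors, whereas storing six values of $f$ directly would cover only six of them.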
The natural question here is how to find the decomposition which minimizes the average time of calculation for the function $f$. But before that, we need a way to efficiently find the average time for one particular decomposition $\Delta(f)$. The main issue is how to divide the available memory resource among the functions $f_i$ in the optimal way. The next section answers that question.

3 Optimal resource distribution

The average time for evaluating the function $f$ is given by the formula

$$E_{f,\Delta}[T] = T_c(\Delta(f)) - \sum_{j=1}^{D} P\big(X_M^{(j,m_j)}\big)\,\big(T_c(f_j(x_j)) - T_M\big) \tag{4}$$

We are looking to maximise $\omega(D)$, where

$$\omega(q) = \sum_{j=1}^{q} P\big(X_M^{(j,m_j)}\big)\,\big(T_c(f_j(x_j)) - T_M\big) \tag{5}$$

Each function $f_j$ from a decomposition $\Delta(f)$ can count on $m_j$ memory locations, with $\sum_{j=1}^{D} m_j \le M$ and $0 \le m_j \le M$. Since $P\big(X_M^{(j,m_j)}\big)$ depends on $m_j$, finding the minimal average time $E_{f,\Delta}[T]$ requires the optimal configuration $[m_1, m_2, \ldots, m_D]$. One way to find it is by brute force, but that approach has exponential complexity. Here, we propose an algorithm based on dynamic programming which solves the problem in polynomial time.

Step 1. For each $1 \le j \le D$, calculate $P_{ij} = P\big(X_M^{(j,i)}\big)$, where $m_j = i$, $0 \le i \le l_j$ and $l_j = \min\{M, \text{length of vector } x_j\}$. Once again, this can be done by brute force, but exponential complexity can be avoided by reducing this problem to finding the $M + 1$ best paths in a trellis. Therefore, the complexity of this procedure is $O(DMK^2N)$.

Step 2. Let us define $Q_{ij} = P_{ij}\,\big(T_c(f_j(x_j)) - T_M\big)$. Let us also define the matrix $\Omega$ with elements $\Omega[i, j, k] = \max\{\omega(j)\}$ such that $m_j = i$ and $\sum_{a=1}^{j} m_a = k$. Now we can find $\max\{\omega(D)\}$ by the following procedure with complexity $O(M^3 D)$.

Initialization. For $0 \le i \le l_1$, $0 \le k \le M$:

$$\Omega[i, 1, k] = \begin{cases} Q_{i1} & \text{if } i = k \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

Recursion. For $2 \le j \le D$, $0 \le i \le l_j$, $i \le k \le M$:

$$\Omega[i, j, k] = Q_{ij} + \max_{0 \le i' \le i} \Omega[i', j - 1, k - i] \tag{7}$$

$$\Psi[i, j, k] = \operatorname*{argmax}_{i',\ 0 \le i' \le i} \Omega[i', j - 1, k - i] \tag{8}$$

Stopping. Let us define $M' = \min\{M, \sum_{a=1}^{D} l_a\}$. Then

$$\max\{\omega(D)\} = \max_{0 \le i \le l_D} \Omega[i, D, M'], \qquad m_D = \operatorname*{argmax}_{i,\ 0 \le i \le l_D} \Omega[i, D, M'], \qquad k_D = M'$$

Reconstruction. For $D - 1 \ge j \ge 1$:

$$m_j = \Psi[m_{j+1}, j + 1, k_{j+1}], \qquad k_j = k_{j+1} - m_{j+1} \tag{9}$$
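The memory-allocation step of Section 3 can be sketched in Python as follows. This is a simplified variant, not the authors' exact procedure: instead of the three-index table $\Omega[i, j, k]$ it keeps a two-index table `best[j][k]` (the maximum of $\omega(j)$ when the first $j$ functions use exactly $k$ cells), which gives $O(DM^2)$ time. The input `Q[j][i]` plays the role of $Q_{ij}$ and is assumed to be already available from Step 1. The function name and all numeric values are hypothetical.

```python
def allocate_memory(Q, M):
    """Split M memory cells among D sub-functions to maximise total saving.

    Q[j][i] is the expected saving (Q_ij in the text) when sub-function j
    receives i cells, for i = 0 .. l_j, with Q[j][0] == 0.
    Returns (max saving, [m_1, ..., m_D]).
    """
    D = len(Q)
    NEG = float("-inf")
    # best[j][k]: max saving of the first j functions using exactly k cells.
    best = [[NEG] * (M + 1) for _ in range(D + 1)]
    # choice[j][k]: number of cells given to function j in that optimum.
    choice = [[0] * (M + 1) for _ in range(D + 1)]
    best[0][0] = 0.0

    for j in range(1, D + 1):
        l_j = min(M, len(Q[j - 1]) - 1)  # most cells function j can use
        for k in range(M + 1):
            for i in range(min(l_j, k) + 1):
                prev = best[j - 1][k - i]
                if prev == NEG:
                    continue  # k - i cells cannot be used by earlier functions
                cand = prev + Q[j - 1][i]
                if cand > best[j][k]:
                    best[j][k] = cand
                    choice[j][k] = i

    # Stopping: pick the best reachable total number of used cells.
    k = max(range(M + 1), key=lambda c: best[D][c])
    total = best[D][k]

    # Reconstruction: walk the stored choices backwards.
    alloc = [0] * D
    for j in range(D, 0, -1):
        alloc[j - 1] = choice[j][k]
        k -= alloc[j - 1]
    return total, alloc

# Hypothetical savings for D = 3 sub-functions, each usable with up to 3 cells.
Q = [[0, 5, 8, 9],
     [0, 4, 6, 7],
     [0, 6, 7, 7.5]]
total, alloc = allocate_memory(Q, M=4)
print(total, alloc)
```

By formula (4), the resulting expected evaluation time would be $T_c(\Delta(f))$ minus the returned total saving. The two-index table was chosen here only for brevity; it trades the explicit $\Omega[i, j, k]$ bookkeeping of the text for a smaller state space.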