arXiv:1702.00887v2

Published as a conference paper at ICLR 2017
procedure INSIDEOUTSIDE(θ)
    α, β ← −∞                                ⊲ Initialize log of inside (α) and outside (β) tables
    for i = 1, …, n do
        α[i, i, L, 1] ← 0
        α[i, i, R, 1] ← 0
    β[1, n, R, 1] ← 0
    for k = 1, …, n do                       ⊲ Inside step
        for s = 1, …, n − k do
            t ← s + k
            α[s, t, R, 0] ← ⊕_{u∈[s,t−1]} α[s, u, R, 1] ⊗ α[u + 1, t, L, 1] ⊗ θ_st
            α[s, t, L, 0] ← ⊕_{u∈[s,t−1]} α[s, u, R, 1] ⊗ α[u + 1, t, L, 1] ⊗ θ_ts
            α[s, t, R, 1] ← ⊕_{u∈[s+1,t]} α[s, u, R, 0] ⊗ α[u, t, R, 1]
            α[s, t, L, 1] ← ⊕_{u∈[s,t−1]} α[s, u, L, 1] ⊗ α[u, t, L, 0]
    for k = n, …, 1 do                       ⊲ Outside step
        for s = 1, …, n − k do
            t ← s + k
            for u = s + 1, …, t do
                β[s, u, R, 0] ←⊕ β[s, t, R, 1] ⊗ α[u, t, R, 1]
                β[u, t, R, 1] ←⊕ β[s, t, R, 1] ⊗ α[s, u, R, 0]
            if s > 1 then
                for u = s, …, t − 1 do
                    β[s, u, L, 1] ←⊕ β[s, t, L, 1] ⊗ α[u, t, L, 0]
                    β[u, t, L, 0] ←⊕ β[s, t, L, 1] ⊗ α[s, u, L, 1]
            for u = s, …, t − 1 do
                β[s, u, R, 1] ←⊕ β[s, t, R, 0] ⊗ α[u + 1, t, L, 1] ⊗ θ_st
                β[u + 1, t, L, 1] ←⊕ β[s, t, R, 0] ⊗ α[s, u, R, 1] ⊗ θ_st
            if s > 1 then
                for u = s, …, t − 1 do
                    β[s, u, R, 1] ←⊕ β[s, t, L, 0] ⊗ α[u + 1, t, L, 1] ⊗ θ_ts
                    β[u + 1, t, L, 1] ←⊕ β[s, t, L, 0] ⊗ α[s, u, R, 1] ⊗ θ_ts
    A ← α[1, n, R, 1]                        ⊲ Log partition
    for s = 1, …, n − 1 do                   ⊲ Compute marginals; note that p[s, t] = p(z_st = 1 | x)
        for t = s + 1, …, n do
            p[s, t] ← exp(α[s, t, R, 0] ⊗ β[s, t, R, 0] ⊗ −A)
            if s > 1 then
                p[t, s] ← exp(α[s, t, L, 0] ⊗ β[s, t, L, 0] ⊗ −A)
    return p

Figure 6: Forward step of the syntactic attention layer to compute the marginals, using the inside-outside algorithm (Baker, 1979) on the data structures of Eisner (1996). We assume the special root symbol is the first element of the sequence and that the sentence length is n. Calculations are performed in the log-space semifield, with ⊕ = logadd and ⊗ = + for numerical precision. a, b ← c means a ← c and b ← c; a ←⊕ b means a ← a ⊕ b.
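To make the recursions concrete, the following is a short NumPy sketch of the inside step and the log-partition computation. It is our illustration under stated assumptions, not the authors' released code: arrays are 0-indexed, so position 0 plays the role of the root symbol, theta[s, t] holds the log-potential θ_st, and np.logaddexp.reduce implements the ⊕ over split points u.

import numpy as np

L, R = 0, 1          # direction of the span's head: left or right
I, C = 0, 1          # span type: incomplete (0) or complete (1)

def inside(theta):
    """Inside step of Figure 6 with 0-indexed arrays.

    theta: (n, n) array, theta[s, t] = log-potential of the arc from
    head s to modifier t; row/column 0 is the root symbol.
    Returns the log partition A and the inside table alpha.
    """
    n = theta.shape[0]
    alpha = np.full((n, n, 2, 2), -np.inf)
    diag = np.arange(n)
    alpha[diag, diag, L, C] = 0.0    # single-word spans start complete
    alpha[diag, diag, R, C] = 0.0
    for k in range(1, n):            # span width
        for s in range(n - k):
            t = s + k
            # Incomplete spans: an arc s->t (R) or t->s (L) joins a
            # right-complete left half [s, u] with a left-complete
            # right half [u+1, t]; "inner" ranges over split points u.
            inner = alpha[s, s:t, R, C] + alpha[s + 1:t + 1, t, L, C]
            alpha[s, t, R, I] = np.logaddexp.reduce(inner) + theta[s, t]
            alpha[s, t, L, I] = np.logaddexp.reduce(inner) + theta[t, s]
            # Complete spans absorb a complete span of the same direction.
            alpha[s, t, R, C] = np.logaddexp.reduce(
                alpha[s, s + 1:t + 1, R, I] + alpha[s + 1:t + 1, t, R, C])
            alpha[s, t, L, C] = np.logaddexp.reduce(
                alpha[s, s:t, L, C] + alpha[s:t, t, L, I])
    return alpha[0, n - 1, R, C], alpha

A, alpha = inside(np.random.randn(5, 5))   # 1 root symbol + 4 words

The outside step and the marginal computation follow the same slicing pattern. Alternatively, since the marginals satisfy p = ∇_θ A for a CRF, they can also be recovered by running automatic differentiation through this function, which is the relationship that Figure 7 below exploits by hand.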

procedure BACKPROPINSIDEOUTSIDE(θ, p, ∇_p L)
    for s, t = 1, …, n; s ≠ t do             ⊲ Backpropagation uses the identity ∇_θ L = (p ⊙ ∇_p L) ∇_θ log p
        δ[s, t] ← log p[s, t] ⊗ log ∇_p L[s, t]   ⊲ δ = log(p ⊙ ∇_p L)
    ∇_α L, ∇_β L, log ∇_θ L ← −∞             ⊲ Initialize inside (∇_α L) and outside (∇_β L) gradients, and log of ∇_θ L
    for s = 1, …, n − 1 do                   ⊲ Backpropagate δ to ∇_α L and ∇_β L
        for t = s + 1, …, n do
            ∇_α L[s, t, R, 0], ∇_β L[s, t, R, 0] ← δ[s, t]
            ∇_α L[1, n, R, 1] ←⊕ −δ[s, t]
            if s > 1 then
                ∇_α L[s, t, L, 0], ∇_β L[s, t, L, 0] ← δ[t, s]
                ∇_α L[1, n, R, 1] ←⊕ −δ[t, s]
    for k = 1, …, n do                       ⊲ Backpropagate through outside step
        for s = 1, …, n − k do
            t ← s + k
            ν ← ∇_β L[s, t, R, 0] ⊗ β[s, t, R, 0]   ⊲ ν, γ are temporary values
            for u = t, …, n do
                ∇_β L[s, u, R, 1], ∇_α L[t, u, R, 1] ←⊕ ν ⊗ β[s, u, R, 1] ⊗ α[t, u, R, 1]
            if s > 1 then
                ν ← ∇_β L[s, t, L, 0] ⊗ β[s, t, L, 0]
                for u = 1, …, s do
                    ∇_β L[u, t, L, 1], ∇_α L[u, s, L, 1] ←⊕ ν ⊗ β[u, t, L, 1] ⊗ α[u, s, L, 1]
            ν ← ∇_β L[s, t, L, 1] ⊗ β[s, t, L, 1]
            for u = t, …, n do
                ∇_β L[s, u, L, 1], ∇_α L[t, u, L, 0] ←⊕ ν ⊗ β[s, u, L, 1] ⊗ α[t, u, L, 0]
            for u = 1, …, s − 1 do
                γ ← β[u, t, R, 0] ⊗ α[u, s − 1, R, 1] ⊗ θ_ut
                ∇_β L[u, t, R, 0], ∇_α L[u, s − 1, R, 1], log ∇_θ L[u, t] ←⊕ ν ⊗ γ
                γ ← β[u, t, L, 0] ⊗ α[u, s − 1, R, 1] ⊗ θ_tu
                ∇_β L[u, t, L, 0], ∇_α L[u, s − 1, R, 1], log ∇_θ L[t, u] ←⊕ ν ⊗ γ
            ν ← ∇_β L[s, t, R, 1] ⊗ β[s, t, R, 1]
            for u = 1, …, s do
                ∇_β L[u, t, R, 1], ∇_α L[u, s, R, 0] ←⊕ ν ⊗ β[u, t, R, 1] ⊗ α[u, s, R, 0]
            for u = t + 1, …, n do
                γ ← β[s, u, R, 0] ⊗ α[t + 1, u, L, 1] ⊗ θ_su
                ∇_β L[s, u, R, 0], ∇_α L[t + 1, u, L, 1], log ∇_θ L[s, u] ←⊕ ν ⊗ γ
                γ ← β[s, u, L, 0] ⊗ α[t + 1, u, L, 1] ⊗ θ_us
                ∇_β L[s, u, L, 0], ∇_α L[t + 1, u, L, 1], log ∇_θ L[u, s] ←⊕ ν ⊗ γ
    for k = n, …, 1 do                       ⊲ Backpropagate through inside step
        for s = 1, …, n − k do
            t ← s + k
            ν ← ∇_α L[s, t, R, 1] ⊗ α[s, t, R, 1]
            for u = s + 1, …, t do
                ∇_α L[s, u, R, 0], ∇_α L[u, t, R, 1] ←⊕ ν ⊗ α[s, u, R, 0] ⊗ α[u, t, R, 1]
            if s > 1 then
                ν ← ∇_α L[s, t, L, 1] ⊗ α[s, t, L, 1]
                for u = s, …, t − 1 do
                    ∇_α L[s, u, L, 1], ∇_α L[u, t, L, 0] ←⊕ ν ⊗ α[s, u, L, 1] ⊗ α[u, t, L, 0]
            ν ← ∇_α L[s, t, L, 0] ⊗ α[s, t, L, 0]
            for u = s, …, t − 1 do
                γ ← α[s, u, R, 1] ⊗ α[u + 1, t, L, 1] ⊗ θ_ts
                ∇_α L[s, u, R, 1], ∇_α L[u + 1, t, L, 1], log ∇_θ L[t, s] ←⊕ ν ⊗ γ
            ν ← ∇_α L[s, t, R, 0] ⊗ α[s, t, R, 0]
            for u = s, …, t − 1 do
                γ ← α[s, u, R, 1] ⊗ α[u + 1, t, L, 1] ⊗ θ_st
                ∇_α L[s, u, R, 1], ∇_α L[u + 1, t, L, 1], log ∇_θ L[s, t] ←⊕ ν ⊗ γ
    return signexp(log ∇_θ L)                ⊲ Exponentiate the log gradient, multiply by its sign, and return ∇_θ L

Figure 7: Backpropagation through the inside-outside algorithm to calculate the gradient with respect to the input potentials. ∇_b a denotes the Jacobian of a with respect to b (so ∇_θ L is the gradient of the loss L with respect to θ). a, b ←⊕ c means a ← a ⊕ c and b ← b ⊕ c.
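Because of the −δ terms, entries of ∇_θ L can be negative, so the accumulation cannot live purely in log space: as the final line of the figure indicates, each cell carries a sign alongside log |∇_θ L|, and signexp reattaches the sign on exit. Below is a minimal Python sketch of that bookkeeping; the helper names are ours, not the paper's.

import math

def signed_logadd(log_a, sign_a, log_b, sign_b):
    """(log|a|, sign a) (+) (log|b|, sign b) -> (log|a+b|, sign(a+b))."""
    if log_a == -math.inf:
        return log_b, sign_b
    if log_b == -math.inf:
        return log_a, sign_a
    if log_a < log_b:                      # keep the larger magnitude in a
        log_a, sign_a, log_b, sign_b = log_b, sign_b, log_a, sign_a
    if sign_a == sign_b:                   # same sign: magnitudes add
        return log_a + math.log1p(math.exp(log_b - log_a)), sign_a
    ratio = math.exp(log_b - log_a)        # in [0, 1]: magnitudes cancel
    if ratio == 1.0:                       # exact cancellation -> zero
        return -math.inf, 1.0
    return log_a + math.log1p(-ratio), sign_a

def signexp(log_g, sign_g):
    """Exponentiate a log magnitude and reattach its sign."""
    return sign_g * math.exp(log_g)

# Accumulate 2.0 + (-3.0) + 1.5 without ever leaving log space:
acc = (-math.inf, 1.0)
for g in (2.0, -3.0, 1.5):
    acc = signed_logadd(*acc, math.log(abs(g)), math.copysign(1.0, g))
print(signexp(*acc))                       # 0.5

Keeping magnitudes in log space until the very last step preserves the numerical-precision benefit of the ⊕/⊗ semifield even though the gradient itself is signed.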
