
Discrete Mathematics and Theoretical Computer Science 2, 1998, 35-47

Lower bounds for sparse matrix vector multiplication on hypercubic networks

Giovanni Manzini

Dipartimento Scienze e Tecnologie Avanzate, I-15100 Alessandria, Italy and Istituto di Matematica Computazionale, CNR, I-56126 Pisa, Italy.
E-mail: manzini@mfn.al.unipmn.it

In this paper we consider the problem of computing on a local memory machine the product $y = Ax$, where $A$ is a random $n \times n$ sparse matrix with $\Theta(n)$ nonzero elements. To study the average case communication cost of this problem, we introduce four different probability measures on the set of sparse matrices. We prove that on most local memory machines with $p$ processors, this computation requires $\Omega((n/p)\log p)$ time on the average. We prove that the same lower bound also holds, in the worst case, for matrices with only $2n$ or $3n$ nonzero elements.

Keywords: sparse matrices, pseudo expanders, hypercubic networks, bisection width lower bounds

1 Introduction

In this paper we consider the problem of computing $y = Ax$, where $A$ is an $n \times n$ random sparse matrix with $\Theta(n)$ nonzero elements. We prove that, on most local memory machines with $p$ processors, this computation requires $\Omega((n/p)\log p)$ time in the average case. We also prove that this computation requires $\Omega((n/p)\log p)$ time in the worst case for matrices with only $3n$ nonzero elements and, under additional hypotheses, also for matrices with $2n$ nonzero elements. We prove these results by considering only the communication cost, i.e. the time required for routing the components of $x$ to their final destinations.

The average communication cost of computing $y = Ax$ has also been studied in [10] starting from different assumptions. In [10] we restricted our analysis to the algorithms in which the components of $x$ and $y$ are partitioned among the processors according to a fixed scheme not depending on the matrix $A$. However, it is natural to try to reduce the cost of sparse matrix vector multiplication by finding a 'good' partition of the components of $x$ and $y$. Such a partition can be based on the nonzero structure of the matrix $A$, which is assumed to be known in advance. This approach is justified if one has to compute many products involving the same matrix, or involving matrices with the same nonzero structure. The development of efficient partition schemes is a very active area of research; for some recent results see [3, 4, 5, 14, 15].

In this paper we prove lower bounds which also hold for algorithms which partition the components of $x$ and $y$ on the basis of the nonzero structure of $A$. First, we introduce four probability measures on the class of $n \times n$ sparse matrices with $\Theta(n)$ nonzero elements. Then, we show that with high probability the cost of computing $y = Ax$ on a $p$-processor hypercube is $\Omega((n/p)\log p)$, no matter how we distribute the components of $x$ and $y$ among the processors. Since the hypercube can simulate with constant slowdown any $O(p)$-processor butterfly, CCC, mesh of trees and shuffle exchange network, the same bounds hold for these communication networks as well (see [7] for descriptions and properties of the hypercube, and of the other networks mentioned above).

Our results are based on the expansion properties of a bipartite graph associated with the matrix $A$. Expansion properties, as well as the related concept of separators, have played a major role in the study of the fill-in generated by Gaussian and Cholesky factorization of a random sparse matrix (see [1, 8, 11, 12]), but to our knowledge, this is the first time they have been used for the analysis of parallel matrix vector multiplication algorithms.

We establish lower bounds for the hypercube assuming the following model of computation. Processors have only local memory and communicate with one another by sending packets of information through the hypercube links. At each step, each processor is allowed to perform a bounded amount of local computation, or to send one packet of bounded length to an adjacent processor (we are therefore considering the so-called weak hypercube).

This paper is organized as follows. In section 2 we introduce four different probability measures on the set of sparse matrices, and the notion of the pseudo-expander graph. We also prove that the dependency graph associated with a random matrix is a pseudo-expander with high probability. In section 3 we study the average cost of sparse matrix vector multiplication, and in section 4 we prove lower bounds for matrices with $3n$ and $2n$ nonzero elements. Section 5 contains some concluding remarks.

2 Preliminaries

Let $X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_n\}$. Given an $n \times n$ matrix $A$, we associate with $A$ a directed bipartite graph $G(A)$, called the dependency graph. The vertex set of $G(A)$ is $X \cup Y$, and $(x_i, y_j)$ is an edge of $G(A)$ if and only if $a_{ji} \neq 0$. In other words, the vertices of $G(A)$ represent the components of $x$ and $y$, and $(x_i, y_j) \in G(A)$ if and only if the value $y_j$ depends upon $x_i$.
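As an illustration only (this code is not part of the original paper; the function names are illustrative), the dependency graph and the adjacency sets $\mathrm{Adj}(S)$ used throughout the paper can be materialized in a few lines of Python:

```python
# Illustrative sketch: build the dependency graph G(A) of a 0-1 matrix A
# and compute Adj(S) for a set S of x-indices.

def dependency_graph(A):
    """Return adj[i] = set of row indices j with A[j][i] != 0,
    i.e. the set of components y_j that depend on x_i."""
    n = len(A)
    adj = {i: set() for i in range(n)}
    for j in range(n):          # row j corresponds to y_j
        for i in range(n):      # column i corresponds to x_i
            if A[j][i] != 0:
                adj[i].add(j)
    return adj

def neighbourhood(adj, S):
    """Adj(S): the y-vertices adjacent to at least one x_i with i in S."""
    out = set()
    for i in S:
        out |= adj[i]
    return out

if __name__ == "__main__":
    A = [[1, 0, 0, 1],
         [0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 0, 1, 1]]
    adj = dependency_graph(A)
    print(neighbourhood(adj, {0, 2}))   # y-indices depending on x_0 or x_2 -> {0, 2, 3}
```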

Definition 2.1 Let $0 < \alpha < 1$ and $\beta \geq 1$ with $\alpha\beta \leq 1$. We say that the graph $G(A)$ is an $(\alpha, \beta, n)$-pseudo-expander if for each set $S \subseteq X$ the following property holds:
$$ \frac{\alpha n}{2} \leq |S| \leq \alpha n \;\Longrightarrow\; |\mathrm{Adj}(S)| \geq \beta\,|S|. $$

Note that we must have $\alpha\beta \leq 1$, since each set of $\alpha n$ vertices can be connected to at most $n$ vertices. The notion of a pseudo-expander graph is similar to the well known notion of an expander graph [2]. We recall that a graph is an $(\alpha, \beta, n)$-expander if $|S| \leq \alpha n$ implies $|\mathrm{Adj}(S)| \geq \beta|S|$ (hence, all expanders are pseudo-expanders).

The average case complexity of sparse matrix vector multiplication will be obtained combining two basic results: in this section, we prove that a random sparse matrix is a pseudo-expander with probability very close to one; in the next section we prove that computing $y = Ax$ on a $p$-processor hypercube requires $\Omega((n/p)\log p)$ time if $G(A)$ is a pseudo-expander.

To study the average cost of a sparse matrix vector multiplication, we introduce four probability measures on the set of $n \times n$ sparse matrices. Each measure represents a different type of sparse matrix with $\Theta(n)$ nonzero elements. Although the nonzero elements can be arbitrary real numbers, we assume the behaviour of the multiplication algorithms does not depend upon their numerical values. Hence, all measures are defined on the set of 0-1 matrices, where 1 represents a generic nonzero element.


1. Let $0 \leq d_1 \leq n$ be such that $d_1 n$ is an integer. We denote by $\mathcal{P}_1$ the probability measure such that $\Pr\{A\} \neq 0$ only if the matrix $A$ has exactly $d_1 n$ nonzero elements; moreover, all $\binom{n^2}{d_1 n}$ matrices with $d_1 n$ nonzero elements are equally likely.

2. Let $0 \leq d_2 \leq n$, and let $q = d_2/n$. We denote by $\mathcal{P}_2$ the probability measure such that for each entry $a_{ji}$, $\Pr\{a_{ji} \neq 0\} = q$. Hence, we have $\Pr\{A\} = q^{k}(1-q)^{n^2-k}$, where $k$ is the number of nonzero elements of $A$.

3. Let $d_3$ be an integer, $0 \leq d_3 \leq n$. We denote by $\mathcal{P}_3$ the probability measure such that $\Pr\{A\} \neq 0$ only if each row of $A$ contains exactly $d_3$ nonzero elements; moreover, all $\binom{n}{d_3}^{n}$ matrices with $d_3$ nonzero elements per row are equally likely.

4. Let $d_4$ be an integer, $0 \leq d_4 \leq n$. We denote by $\mathcal{P}_4$ the probability measure such that $\Pr\{A\} \neq 0$ only if each column of $A$ contains exactly $d_4$ nonzero elements; moreover, all $\binom{n}{d_4}^{n}$ matrices with $d_4$ nonzero elements per column are equally likely.
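For experimenting with the definitions above, the four measures can be sampled directly. The following sketch is illustrative only and is not part of the paper; it draws 0-1 matrices under $\mathcal{P}_1$-$\mathcal{P}_4$:

```python
# Illustrative samplers for the four measures (0-1 matrices only).
import random

def sample_P1(n, d1):
    """Exactly d1*n nonzeros, all placements equally likely."""
    A = [[0] * n for _ in range(n)]
    for c in random.sample(range(n * n), d1 * n):
        A[c // n][c % n] = 1
    return A

def sample_P2(n, d2):
    """Each entry is nonzero independently with probability d2/n."""
    q = d2 / n
    return [[1 if random.random() < q else 0 for _ in range(n)]
            for _ in range(n)]

def sample_P3(n, d3):
    """Exactly d3 nonzeros in every row."""
    A = [[0] * n for _ in range(n)]
    for j in range(n):
        for i in random.sample(range(n), d3):
            A[j][i] = 1
    return A

def sample_P4(n, d4):
    """Exactly d4 nonzeros in every column (transpose of a P3 sample)."""
    B = sample_P3(n, d4)
    return [[B[i][j] for i in range(n)] for j in range(n)]

if __name__ == "__main__":
    A = sample_P3(8, 3)
    print(sum(map(sum, A)))   # 24 nonzeros: 3 per row
```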

The above measures are clearly related since they all represent unstructured sparse matrices. We have chosen these measures since they are quite natural, but of course, they do not cover the whole spectrum of 'interesting' sparse matrices. In the following, the expression 'random matrix' will denote a matrix chosen according to one of these probability measures.

Given a random matrix $A$ we want to estimate the probability that the graph $G(A)$ is a pseudo-expander. In this section we prove that for the measures $\mathcal{P}_1$-$\mathcal{P}_4$ previously defined such probability is very close to one (Theorem 2.6). In the following, we make use of some well known inequalities. For $1 \leq k \leq n$,
$$ \binom{n}{k} \leq \left(\frac{en}{k}\right)^{k}, \qquad\qquad (1) $$
and, for any real number $x > 1$,
$$ \left(1 - \frac{1}{x}\right)^{x} \leq e^{-1}. \qquad\qquad (2) $$
In addition, a straightforward computation shows that for any triplet of positive integers $m, n, p$ such that $m + p \leq n$ we have
$$ \frac{\binom{n-m}{p}}{\binom{n}{p}} \leq \left(1 - \frac{m}{n}\right)^{p}. \qquad\qquad (3) $$
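A quick numeric check of (1) and (3), in the form in which they are reconstructed here, can be run as follows (illustrative only):

```python
# Numeric sanity check of inequalities (1) and (3) as stated above.
from math import comb, e

def check_1(n_max=40):
    # binom(n, k) <= (e*n/k)^k for 1 <= k <= n
    return all(comb(n, k) <= (e * n / k) ** k
               for n in range(1, n_max + 1) for k in range(1, n + 1))

def check_3(n_max=30):
    # binom(n-m, p) / binom(n, p) <= (1 - m/n)^p for m, p >= 1, m + p <= n
    ok = True
    for n in range(2, n_max + 1):
        for m in range(1, n):
            for p in range(1, n - m + 1):
                ok &= comb(n - m, p) / comb(n, p) <= (1 - m / n) ** p + 1e-12
    return ok

print(check_1(), check_3())   # expected output: True True
```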

Lemma 2.2 Let $A$ be a random matrix chosen according to the probability measure $\mathcal{P}_i$, $i = 1, \ldots, 4$. If $S \subseteq X$ and $T \subseteq Y$, we have
$$ \Pr[\,\mathrm{Adj}(S) \cap T = \emptyset\,] \leq \left(1 - \frac{d_i}{n}\right)^{|S|\,|T|}. \qquad\qquad (4) $$

Proof. We consider only $i = 1, 3$, since the cases $i = 2, 4$ are similar. Let $M = \{a_{ji} : x_i \in S,\ y_j \in T\}$. We have $|M| = |S|\,|T|$, and $\mathrm{Adj}(S) \cap T = \emptyset$ if and only if $a_{ji} = 0$ for all $a_{ji} \in M$. For the probability measure $\mathcal{P}_1$, using (3) we get
$$ \Pr[\,\mathrm{Adj}(S) \cap T = \emptyset\,] = \frac{\binom{n^2 - |S||T|}{d_1 n}}{\binom{n^2}{d_1 n}} = \frac{\binom{n^2 - d_1 n}{|S||T|}}{\binom{n^2}{|S||T|}} \leq \left(1 - \frac{d_1}{n}\right)^{|S||T|}. $$


For the probability measure $\mathcal{P}_3$, we have $\mathrm{Adj}(S) \cap T = \emptyset$ only if the rows corresponding to $T$ do not contain nonzero elements in the columns corresponding to $S$. By (3) we get
$$ \Pr[\,\mathrm{Adj}(S) \cap T = \emptyset\,] = \left(\frac{\binom{n - |S|}{d_3}}{\binom{n}{d_3}}\right)^{|T|} = \left(\frac{\binom{n - d_3}{|S|}}{\binom{n}{|S|}}\right)^{|T|} \leq \left(1 - \frac{d_3}{n}\right)^{|S||T|}. $$

Lemma 2.3 Let $S \subseteq X$ with $|S| = k$ and $\alpha n/2 \leq k \leq \alpha n$, and assume that (4) holds. If $0 < \gamma < 1$ and $d_i$ is larger than a suitable threshold, denoted (5) in the following, which depends only on $\alpha$, $\beta$ and $\gamma$, then
$$ \Pr[\,|\mathrm{Adj}(S)| \leq \beta k\,] \leq \left(1 - \frac{d_i}{n}\right)^{(1-\gamma)\,k\,(n - \beta k)}. $$

Proof. First note that $|\mathrm{Adj}(S)| \leq \beta k$ only if there exists a set $T \subseteq Y$, with $|T| = n - \lceil\beta k\rceil$, such that $\mathrm{Adj}(S) \cap T = \emptyset$. Since there are $\binom{n}{n - \lceil\beta k\rceil}$ such sets, if (4) holds we have
$$ \Pr[\,|\mathrm{Adj}(S)| \leq \beta k\,] \leq \binom{n}{n - \lceil\beta k\rceil}\left(1 - \frac{d_i}{n}\right)^{k(n - \lceil\beta k\rceil)}. $$
Hence, it suffices to show that $\binom{n}{n - \lceil\beta k\rceil} \leq (1 - d_i/n)^{-\gamma k (n - \lceil\beta k\rceil)}$. Since $n - \beta k \geq (1 - \alpha\beta)n$ and $k \geq \alpha n/2$, this follows from (1) and (2) by a straightforward computation using the hypothesis (5) on $d_i$.

Lemma 2.4 Suppose that the hypotheses of Lemma 2.3 hold. If, in addition, $d_i$ is larger than a second threshold, denoted (6), which again depends only on $\alpha$, $\beta$ and $\gamma$, then the probability that there exists a set $S \subseteq X$ with $|S| = k$ such that $|\mathrm{Adj}(S)| \leq \beta k$ is less than $2^{-k}$.

Proof. The probability that a set of size $k$ is connected to at most $\beta k$ vertices is bounded in Lemma 2.3. Since there exist $\binom{n}{k}$ such sets, we have
$$ \Pr[\,\exists S,\ |S| = k,\ |\mathrm{Adj}(S)| \leq \beta k\,] \leq \binom{n}{k}\left(1 - \frac{d_i}{n}\right)^{(1-\gamma)k(n - \beta k)}. $$
Therefore, we need to prove that
$$ \binom{n}{k}\left(1 - \frac{d_i}{n}\right)^{(1-\gamma)k(n - \beta k)} \leq 2^{-k}. \qquad\qquad (7) $$
Since $n - \beta k \geq (1 - \alpha\beta)n$ and $\alpha n/2 \leq k \leq \alpha n$, inequality (7) follows from (1) and (2) together with the hypothesis (6) on $d_i$.

Lemma 2.5 Let $A$ be an $n \times n$ random matrix for which (4) holds. If $d_i$ satisfies the thresholds (5) and (6) for every $k$ with $\alpha n/2 \leq k \leq \alpha n$, then the graph $G(A)$ is an $(\alpha, \beta, n)$-pseudo-expander with probability greater than $1 - 2^{\,1 - \alpha n/2}$.

Proof. By Lemma 2.4, for each $k$ with $\alpha n/2 \leq k \leq \alpha n$ the probability that some set $S$ of size $k$ has $|\mathrm{Adj}(S)| \leq \beta k$ is less than $2^{-k}$. Summing over $k$, the probability that $G(A)$ is not a pseudo-expander is less than $\sum_{k \geq \alpha n/2} 2^{-k} < 2^{\,1 - \alpha n/2}$.

Theorem 2.6 Let $A$ be a random matrix chosen according to the probability measure $\mathcal{P}_i$, $i = 1, \ldots, 4$. If $d_i$ is larger than a suitable threshold, denoted (11), which depends only on $\alpha$ and $\beta$, then the graph $G(A)$ is an $(\alpha, \beta, n)$-pseudo-expander with probability greater than $1 - 2^{\,1 - \alpha n/2}$.

Proof. By Lemma 2.2, inequality (4) holds for each of the measures $\mathcal{P}_1$-$\mathcal{P}_4$; the thesis follows from Lemma 2.5.

For example, if the $d_i$'s are larger than a moderate constant, Theorem 2.6 tells us that $G(A)$ is an $(\alpha, \beta, n)$-pseudo-expander with $\alpha = 1/2$ and $\beta$ bounded away from one, with probability exponentially close to one; larger values of the $d_i$'s yield pseudo-expanders with larger $\beta$.

If $G(A)$ is an $(\alpha, \beta, n)$-pseudo-expander, then, given $S \subseteq X$ such that $\alpha n/2 \leq |S| \leq \alpha n$, there are at least $\beta|S|$ values $y_j$ that depend on the values $x_i \in S$. Similarly, if $G(A^T)$ is an $(\alpha, \beta, n)$-pseudo-expander, then, given $T \subseteq Y$ with $\alpha n/2 \leq |T| \leq \alpha n$, the values $y_j \in T$ depend upon at least $\beta|T|$ values $x_i$. By symmetry considerations we have the following corollary of Theorem 2.6.

Corollary 2.7 Let $A$ be a random matrix chosen according to the probability measure $\mathcal{P}_i$, $i = 1, \ldots, 4$. If $d_i$ satisfies (11), then the graph $G(A^T)$ is an $(\alpha, \beta, n)$-pseudo-expander with probability greater than $1 - 2^{\,1 - \alpha n/2}$.

3 Study of the Average Case Communication Complexity

Let $A$ be an $n \times n$ sparse matrix, and let $p \leq n$ be the number of processors. In this section we prove a lower bound on the average case complexity of computing the product $y = Ax$ on a $p$-processor hypercube. Our analysis is based on the cost of routing the components of $x$ to their final destination. That is, we consider only the cost of routing, to the processor that computes $y_j$, the values $x_i$ for all $i$ such that $a_{ji} \neq 0$. A major difficulty for establishing a lower bound is that we can compute partial sums $s_j = a_{ji}x_i + \cdots + a_{ji'}x_{i'}$ during the routing process. In this case, by moving a single value $s_j$ we can 'move' several $x_i$'s. However, since the $a_{ji}$'s can be arbitrary, we assume that the partial sum $s_j = a_{ji}x_i + \cdots + a_{ji'}x_{i'}$ can be used only for computing the $j$-th component $y_j$.

In the previous section, we have shown that a random matrix is a pseudo-expander with high probability. Therefore, to establish an average case result it suffices to obtain a lower bound for this class of matrices. In the following, we assume that there exist two functions $\pi_x, \pi_y$ mapping $\{1, 2, \ldots, n\}$ into $\{0, \ldots, p-1\}$ such that

1. for $i = 1, \ldots, n$, the value $x_i$ is initially contained only in processor number $\pi_x(i)$;

2. for $j = 1, \ldots, n$, at the end of the computation the value $y_j$ is contained in processor number $\pi_y(j)$.

In the following, $\pi_x^{-1}(k)$ will denote the set $\{x_i : \pi_x(i) = k\}$, that is, the set of all components of $x$ initially contained in processor $k$. Similarly, $\pi_y^{-1}(k)$ will denote the set of components of $y$ contained in processor $k$ at the end of the computation.
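A concrete example of maps satisfying conditions 1-2, together with the even-distribution hypothesis used in the next theorems (each processor holding $\lfloor n/p\rfloor$ or $\lceil n/p\rceil$ components), is the cyclic distribution. The sketch below is only an illustration and is not prescribed by the paper:

```python
# Cyclic distribution pi(i) = i mod p: every processor receives either
# floor(n/p) or ceil(n/p) components, so it satisfies the even-distribution
# hypothesis of Theorems 3.1 and 3.2.  (Illustrative only.)
from collections import Counter

def cyclic_map(n, p):
    return [i % p for i in range(n)]

n, p = 37, 8
sizes = Counter(cyclic_map(n, p)).values()
print(sorted(sizes))                      # [4, 4, 4, 5, 5, 5, 5, 5]
assert all(s in (n // p, n // p + 1) for s in sizes)
```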

Theorem 3.1 Let $A$ be a matrix such that $G(A)$ is an $(\alpha, \beta, n)$-pseudo-expander with $\alpha = 1/2$ and $4/3 < \beta \leq 2$. If conditions 1-2 hold, and, for $0 \leq k < p$, $|\pi_y^{-1}(k)|$ equals $\lfloor n/p\rfloor$ or $\lceil n/p\rceil$, then any algorithm for computing the product $y = Ax$ on a $p$-processor hypercube requires $\Omega((n/p)\log p)$ time.

Proof. The two functions $\pi_x, \pi_y$ define a mapping of the vertices of $G(A)$ into the processors of the hypercube. If the edge $(x_i, y_j)$ belongs to $G(A)$ (that is, $a_{ji} \neq 0$) the value $y_j$ depends upon $x_i$. Hence, the value $x_i$ has to be moved from processor $\pi_x(i)$ to processor $\pi_y(j)$.

For $1 \leq k \leq \log p$ we consider the set of dimension-$k$ links of the hypercube (that is, the links connecting processors whose addresses differ in the $k$-th bit). If the dimension-$k$ links are removed, the hypercube is bisected into two sets $H_1, H_2$ with $p/2$ processors each. From the hypothesis on $\pi_y$ it follows that at the end of the computation each set contains at most $(p/2)\lceil n/p\rceil \leq 2n/3$ values $y_j$.

We can assume that $H_1$ initially contains at least $n/2$ values $x_i$ (if not, we exchange the roles of $H_1$ and $H_2$). A value $x_i \in H_1$ can reach $H_2$ either by itself or inside a partial sum $\sum_l a_{jl}x_l$. Note that a value $x_i$ that reaches $H_2$ by itself can be utilized for the computation of several $y_j$'s, but we assume that each partial sum can be utilized for computing only one $y_j$.

Let $n_1$ denote the number of values $x_i \in H_1$ that traverse the dimension-$k$ links by themselves, and let $n_2$ be the number of partial sums that traverse the dimension-$k$ links. We claim that $\max(n_1, n_2) \geq \gamma n$, where $\gamma = \frac{3\beta - 4}{6(\beta + 1)}$.

If $n_1 \leq \gamma n$, we consider the set $\bar S$ of the values $x_i \in H_1$ that do not traverse the dimension-$k$ links by themselves. We have
$$ |\bar S| \geq \frac{n}{2} - n_1 \geq \frac{n}{2} - \gamma n. \qquad\qquad (12) $$
Since $\gamma \leq 1/9$, we have $\alpha n/2 \leq n/2 - \gamma n \leq \alpha n$; since $G(A)$ is a pseudo-expander, at least $\beta(n/2 - \gamma n)$ values $y_j$ depend upon the values $x_i$ of a subset of $\bar S$ of size $\lceil n/2 - \gamma n\rceil$. At most $2n/3$ of these $y_j$'s are computed inside $H_1$; hence at least $\beta(n/2 - \gamma n) - 2n/3 \geq \gamma n$ values $y_j \in H_2$ depend upon values $x_i \in \bar S$. Moreover, the values $x_i \in \bar S$ can reach $H_2$ only inside the $n_2$ partial sums that traverse the dimension-$k$ links. Since each partial sum can be utilized for the computation of only one $y_j$, we have that $n_2 \geq \gamma n$, as claimed.

This proves that $\Omega(n)$ items must traverse the dimension-$k$ links. Since the same result holds for all $\log p$ dimensions, the sum of the lengths of the paths traversed by the data is $\Omega(n\log p)$. Since at each step at most $p$ links can be traversed, the computation of $y = Ax$ requires $\Omega((n/p)\log p)$ time.
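The flavour of this counting argument can be observed experimentally. The sketch below is only an illustration under simplifying assumptions (a random matrix with three nonzeros per row, cyclic distributions for $x$ and $y$, partial sums ignored): it counts how many distinct $x_i$ stored in one half of a bisection are needed by some $y_j$ assigned to the other half. The count is a constant fraction of $n$, in line with the $\Omega(n)$ traffic per dimension used in the proof.

```python
# Illustrative experiment: with x and y distributed cyclically over p processors,
# count the x_i stored in one half of a bisection that are needed by some y_j
# assigned to the other half.  Partial sums are ignored, so this only
# illustrates the flavour of the argument, not the full lower bound.
import random

def crossing_count(n, p, d, bit):
    rows = [random.sample(range(n), d) for _ in range(n)]  # d nonzeros per row (measure P3)
    proc = lambda i: i % p                      # cyclic distribution of x and y
    side = lambda k: (k >> bit) & 1             # bisection along one hypercube dimension
    needed = set()
    for j in range(n):                          # y_j lives on side(proc(j))
        for i in rows[j]:
            if side(proc(i)) != side(proc(j)):  # x_i is stored in the other half
                needed.add(i)
    return len(needed)

random.seed(0)
n, p, d = 4096, 64, 3
print(crossing_count(n, p, d, bit=0) / n)   # a constant fraction, roughly 0.7-0.8 here
```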

Theorem 3.2 Let $A$ be a matrix such that $G(A^T)$ is an $(\alpha, \beta, n)$-pseudo-expander with $\alpha = 1/2$ and $4/3 < \beta \leq 2$. If conditions 1-2 hold, and, for $0 \leq k < p$, $|\pi_x^{-1}(k)|$ equals $\lfloor n/p\rfloor$ or $\lceil n/p\rceil$, then any algorithm for computing the product $y = Ax$ on a $p$-processor hypercube requires $\Omega((n/p)\log p)$ time.

Proof. Let $H_1, H_2$ denote the sets obtained by removing the dimension-$k$ links of the hypercube. From the hypothesis on $\pi_x$ it follows that initially each set contains at most $(p/2)\lceil n/p\rceil \leq 2n/3$ values $x_i$. We can assume that at the end of the computation $H_2$ contains at least $n/2$ values $y_j$ (if not we consider $H_1$). Let $n_1, n_2$ be defined as in the proof of Theorem 3.1. Clearly, it suffices to prove that $\max(n_1, n_2) \geq \gamma n$ with $\gamma = \frac{3\beta - 4}{6(\beta + 1)}$.

If $n_2 \leq \gamma n$, we consider the set $\bar T$ of the values $y_j \in H_2$ that are computed without using partial sums crossing the bisection, i.e. such that every partial sum $\sum_l a_{jl}x_l$ contributing to $y_j$ is computed entirely inside $H_2$. We have
$$ |\bar T| \geq \frac{n}{2} - n_2 \geq \frac{n}{2} - \gamma n. $$
Since $G(A^T)$ is a pseudo-expander, the values $y_j$ of a subset of $\bar T$ of size $\lceil n/2 - \gamma n\rceil$ depend upon at least $\beta(n/2 - \gamma n)$ values $x_i$, and at most $2n/3$ of these $x_i$'s are initially stored in $H_2$. By construction, the remaining ones, at least $\beta(n/2 - \gamma n) - 2n/3 \geq \gamma n$ of them, must traverse the dimension-$k$ links by themselves. Hence $n_1 \geq \gamma n$.

By combining Theorems 3.1 and 3.2 with a result on the complexity of balancing on the hypercube, it is possible to prove that the computation of $y = Ax$ requires $\Omega((n/p)\log p)$ time no matter how the components of $x$ and $y$ are distributed among the processors. The balancing result we need is the following.

Lemma 3.3 ([13]) Suppose that $n$ items are distributed over the $p$ processors of the hypercube with at most $M$ items per processor. The algorithm BALANCE redistributes the items so that each processor contains $\lfloor n/p\rfloor$ or $\lceil n/p\rceil$ items; its running time is $O(M + (n/p)\log p)$.

We also need the converse of this lemma. Given a distribution of $n$ items over $p$ processors with no more than $M$ items per processor, there exists an algorithm UNBALANCE that constructs such a distribution starting from an initial setting in which each processor contains $\lfloor n/p\rfloor$ or $\lceil n/p\rceil$ items. The algorithm UNBALANCE is obtained by executing the operations of the algorithm BALANCE in the opposite order, hence its running time is again $O(M + (n/p)\log p)$.

Using BALANCE and UNBALANCE one can remove the even-distribution hypothesis from Theorems 3.1 and 3.2; together with Theorem 2.6 and Corollary 2.7, this establishes the average case lower bound stated in the introduction.

4 Worst Case Lower Bounds

The lower bounds of the previous section hold, with high probability, for random matrices with $\Theta(n)$ nonzero elements. It is natural to ask what is the minimum number of nonzero elements of $A$ for which this property holds. In this section we give a partial answer to this question.

Definition 4.1 Let $1 < \beta \leq 2$. Given a matrix $A$, we say that the graph $G(A)$ is a $\beta$-weak-expander if for each set $S \subseteq X$ the following property holds:
$$ |S| = \lfloor n/2\rfloor \;\Longrightarrow\; |\mathrm{Adj}(S)| \geq \beta\,|S|. $$

Obviously, all $(\alpha, \beta, n)$-pseudo-expanders with $\alpha \geq 1/2$ are weak-expanders, but the converse is not true. As we will see, weak-expanders can still be used to get lower bounds for matrix vector multiplication. In addition, the next lemma shows that there exist matrices with only $3n$ nonzero elements whose dependency graph is a weak-expander.

Lemma 4.2 For all sufficiently large $n$, there exists an $n \times n$ matrix $\bar A$ such that

1. $\bar A = P_1 + P_2 + P_3$, where $P_1, P_2, P_3$ are permutation matrices;

2. both $G(\bar A)$ and $G(\bar A^T)$ are $\beta_0$-weak-expanders, for a suitable constant $\beta_0 > 1$.

Proof. The existence of the matrix $\bar A$ is established in the proof of Theorem 4.3 in [11].

Lemma 4.3 Let $A$ be a matrix with a constant number of nonzero elements per row, and such that $G(A^T)$ is a $\beta$-weak-expander. If conditions 1-2 of section 3 hold, $p \leq n$, and, for $0 \leq k < p$, $|\pi_y^{-1}(k)| = n/p$, then any algorithm for computing the product $y = Ax$ on a $p$-processor hypercube requires $\Omega((n/p)\log p)$ time.

An analogous result (Lemma 4.4) holds with the roles of rows and columns exchanged, that is, for matrices with a constant number of nonzero elements per column such that $G(A)$ is a $\beta$-weak-expander, under the hypothesis $|\pi_x^{-1}(k)| = n/p$. Combining these lemmas with Lemma 4.2 we get the following result.

Theorem 4.5 For all sufficiently large $n$, there exists an $n \times n$ matrix with at most $3n$ nonzero elements, such that, if the components of one of the vectors $x$ or $y$ are evenly distributed among the processors, any algorithm for computing $y = Ax$ on a $p$-processor hypercube requires $\Omega((n/p)\log p)$ time.

The case of matrices with three nonzero elements per row appears to be the boundary between easy and difficult problems. In fact, if a matrix $B$ contains at most two nonzero elements in each row and in each column, we can arrange the components of $x$ and $y$ so that the product $y = Bx$ can be computed on the hypercube in $O(n/p)$ time. To see this, it suffices to notice that each vertex of $G(B)$ has degree at most two. Hence, the connected components of $G(B)$ can only be chains or rings and can be embedded in the hypercube with constant dilation and congestion.
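The structural fact used in this argument is easy to check programmatically: if every vertex of $G(B)$ has degree at most two, each connected component is a path (chain) or a cycle (ring). The following sketch (illustrative only) classifies the components of the dependency graph of a 0-1 matrix with at most two nonzeros per row and per column:

```python
# Classify the components of G(B) as chains (paths) or rings (cycles),
# assuming at most two nonzeros per row and per column of B.
# Vertices 0..n-1 are the x_i, vertices n..2n-1 are the y_j.  Illustrative only.
def classify_components(B):
    n = len(B)
    adj = {v: set() for v in range(2 * n)}
    for j in range(n):
        for i in range(n):
            if B[j][i]:
                adj[i].add(n + j)
                adj[n + j].add(i)
    assert all(len(nb) <= 2 for nb in adj.values()), "degree > 2: not chains/rings"
    seen, kinds = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:                    # collect the connected component of s
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        edges = sum(len(adj[v]) for v in comp) // 2
        kinds.append('ring' if edges == len(comp) else 'chain')
    return kinds

B = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1],
     [1, 0, 0, 1]]
print(classify_components(B))   # ['ring']: G(B) here is a single 8-cycle
```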

We conclude this section by studying the complexity of sparse matrix vector multiplication when we require that, for $i = 1, \ldots, n$, the value $y_i$ is stored in the same processor containing $x_i$. A typical situation in which this requirement must be met is when the multiplication algorithm is utilized for computing the sequence $x^{(k+1)} = Ax^{(k)}$ generated by an iterative method. Note that, using the notation of section 3, this requirement is equivalent to assuming that $\pi_x = \pi_y$.

Theorem 4.6 Let $\mathcal{C}$ be the class of matrix vector multiplication algorithms such that, for $i = 1, \ldots, n$, the value $y_i$ is stored in the same processor containing $x_i$. For all sufficiently large $n$, there exists an $n \times n$ matrix $\bar B$ such that

1. each row and each column of $\bar B$ contains at most two nonzero elements;

2. if the components of $x$ are evenly distributed, any algorithm in $\mathcal{C}$ for computing $y = \bar Bx$ on a $p$-processor hypercube requires $\Omega((n/p)\log p)$ time.

Proof. By Lemmas 4.2 and 4.4, we know that there exists a matrix $\bar A$ such that the communication cost of computing $y = \bar Ax$, with the components of $x$ evenly distributed, is $\Omega((n/p)\log p)$ (note that this bound holds also for the algorithms not in $\mathcal{C}$). Moreover, we know that $\bar A = P_1 + P_2 + P_3$, where $P_1, P_2, P_3$ are permutation matrices. Now consider the matrix $\bar B = (P_1 + P_2)P_3^{-1}$: each row and each column of $\bar B$ contains at most two nonzero elements, and $\bar A = (\bar B + I)P_3$. Since the multiplication by a permutation matrix does not require any communication (for the algorithms not in $\mathcal{C}$, which may leave the components of the result wherever they are stored), an algorithm in $\mathcal{C}$ for the product $\bar Bz$ yields an algorithm, not in $\mathcal{C}$, that computes $y = \bar Ax = \bar B(P_3x) + P_3x$ at the cost of computing $\bar Bz$ with $z = P_3x$ evenly distributed, plus $O(n/p)$ local additions. Hence, any algorithm in $\mathcal{C}$ for computing the product $y = \bar Bx$ requires $\Omega((n/p)\log p)$ time.

5 Concluding Remarks

A natural question is whether it is possible to circumvent the above lower bound by allowing processors to contain multiple copies of the components of $x$. To be fair, our algorithm should produce multiple copies of the components of $y$ so that it can be used to compute sequences of the kind $x^{(k+1)} = Ax^{(k)}$.

References

[1] A. Agrawal, P. Klein and R. Ravi. Cutting down on fill using nested dissection: Provably good elimination orderings. In A. George, R. Gilbert and J. Liu, editors, Graph Theory and Sparse Matrix Computation, 31-55. Springer-Verlag, 1993.

[2] B. Bollobás. Random Graphs. Academic Press, 1985.

[3] U. Catalyuerek and C. Aykanat. Decomposing irregularly sparse matrices for parallel matrix-vector multiplication. Parallel Algorithms for Irregularly Structured Problems (IRREGULAR '96), LNCS 1117, 75-86. Springer-Verlag, 1996.

[4] T. Dehn, M. Eiermann, K. Giebermann and V. Sperling. Structured sparse matrix-vector multiplication on massively parallel SIMD architectures. Parallel Computing 21:1867-1894, 1995.

[5] P. Fernandes and P. Girdinio. A new storage scheme for an efficient implementation of the sparse matrix-vector product. Parallel Computing 12:327-333, 1989.

[6] V. Kumar, A. Grama, A. Gupta and G. Karypis. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin/Cummings, Redwood City, CA, 1994.

[7] F. T. Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann, San Mateo, CA, 1991.

[8] R. Lipton, D. Rose and R. Tarjan. Generalized nested dissection. SIAM J. Numer. Anal. 16:346-358, 1979.

[9] G. Manzini. Sparse matrix computations on the hypercube and related networks. Journal of Parallel and Distributed Computing 21:169-183, 1994.

[10] G. Manzini. Sparse matrix vector multiplication on distributed architectures: Lower bound and average complexity results. Information Processing Letters 50:231-238, 1994.

[11] G. Manzini. On the ordering of sparse linear systems. Theoretical Computer Science 156(1-2):301-313, 1996.

[12] V. Pan. Parallel solution of sparse linear and path systems. In J. H. Reif, editor, Synthesis of Parallel Algorithms, 621-678. Morgan Kaufmann, 1993.

[13] C. G. Plaxton. Load balancing, selection and sorting on the hypercube. Proc. of the 1st Annual ACM Symposium on Parallel Algorithms and Architectures, 64-73, 1989.

[14] L. Romero and E. Zapata. Data distributions for sparse matrix vector multiplication. Parallel Computing 21:583-605, 1995.

[15] L. Ziantz, C. Oezturan and B. Szymanski. Run-time optimization of sparse matrix-vector multiplication on SIMD machines. In Parallel Architectures and Languages Europe (PARLE '94), LNCS 817, 313-322. Springer-Verlag, 1994.
