**Pointers and Subscripting**

When `A` is an array, the expression `A[i]` is converted to `(&A[0])[i]`, where `&A[0]` means a pointer to the first element.

When `p` is a pointer, the expression `p[i]` is the same as `*(p+i)`.

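When an integer is added to a pointer, the integer is first multiplied by the size of the type the pointer points to, and the result is then added to the address held in the pointer. For example, if `p` is an `int *` and `sizeof(int)` is 4, then `p + 3` is the address 12 bytes past `p`.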

Size of arrays and pointers

```c
int a[10];
sizeof(a)   // = 10 * sizeof(int) = 40 on our machines

int **b;
sizeof(b)   // = sizeof(int **) = 4 on our machines
```

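Both `a` and `b` can be subscripted (and `b[i][j]` acts like a two-dimensional array once `b` is set up), but `sizeof` reports the size of the whole array for `a` and only the size of the pointer itself for `b`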

Example - two-dimensional arrays

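C has only limited built-in two-dimensional arrays (for example, an array parameter must declare every dimension except the first)

Common practice is to make an array of pointers, each pointing to one row of elements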

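```c
int m = 3, n = 5;
int **a;
a = malloc(m * sizeof(int *));        // array of m row pointers (needs <stdlib.h>)
int i;
for (i = 0; i < m; i++) {
    a[i] = malloc(n * sizeof(int));   // each row: n ints, allocated separately
}
// now a[i][j] will work
```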

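Another way: allocate all m×n elements in one block and point each row into it

```c
int m = 3, n = 5;
int **a;
a = malloc(m * sizeof(int *));
int *p = malloc(m * n * sizeof(int));  // all m*n elements in one block
int i;
for (i = 0; i < m; i++) {
    a[i] = p + n * i;                  // row i starts n*i elements into p
    // or a[i] = &p[n * i];
}
```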

Tricks with pointers - casting

Casting a pointer lets us overlay a region of memory with an array or a struct

A FIFO Queue Example

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    int data;
    struct Node *next;
} Node;

struct Queue {
    Node *start;   // first node, or NULL when the queue is empty
    Node **end;    // address of the link to fill on the next enqueue
};

struct Queue *makeQueue() {
    struct Queue *q = malloc(sizeof(struct Queue));   // (malloc failure not checked)
    q->start = NULL;
    q->end = &q->start;   // empty queue: the next node goes into start
    return q;
}

void destroyQueue(struct Queue *q) {
    while(q->start) {
        Node *n = q->start->next;
        free(q->start);
        q->start = n;
    }
    free(q);
}

_Bool empty(struct Queue *q) {
    return !q->start;
}

void enqueue(struct Queue *q, int data) {
    Node *n = malloc(sizeof(Node));
    n->data = data;
    n->next = NULL;
    *q->end = n;          // link n after the current last node (or into start)
    q->end = &n->next;    // the next node will go into n's next field
}

int dequeue(struct Queue *q) {
    int data = -100; // error return for empty queue
    if(q->start) {
        data = q->start->data;
        Node *n = q->start;
        q->start = q->start->next;
        free(n);
        if(!q->start) { // queue is now empty
            q->end = &q->start;
        }
    }
    return data;
}

int main() {
    struct Queue *q = makeQueue();
    enqueue(q, 1);
    enqueue(q, 2);
    enqueue(q, 3);
    while(!empty(q)) {
        printf("%i\n", dequeue(q));
    }
    destroyQueue(q);
    return 0;
}
```

Matrix multiply

The product of an m×n matrix A and an n×p matrix B is an m×p matrix C given by

$$C_{ij} = \sum_{k=0}^{n-1} A_{ik} B_{kj}$$

This requires m×n×p multiplications (and about the same number of additions)

Code sketch for a straightforward serial implementation:

```c
for (i = 0; i < m; i++) {
    for (j = 0; j < p; j++) {
        double sum = 0.0;
        for (k = 0; k < n; k++) {
            sum += A[i][k] * B[k][j];   // accumulate row i of A times column j of B
        }
        C[i][j] = sum;
    }
}
```

How can we speed up matrix multiplication using parallel processing?

All of the multiplications are independent of each other, so they can be done in any order

The additions are independent across the m×p output elements (but dependent within the calculation for a single output element)

Possibilities

- Assign a processor to each element of C
  - Requires each processor to access one row of A and one column of B
  - One matrix is accessed by rows, the other by columns
- Partition each matrix into, say, 5×5 blocks and do a matrix multiply using the blocks as elements
  - Each multiply is now a 5×5 matrix multiply
  - All matrices are partitioned identically
  - More consistent access patterns
- Systolic array

Matrix multiplication

- The algorithm
- Cache behavior of the algorithm
  - column accesses to B may defeat the cache (consecutive elements of a column are a whole row apart in memory)
- Reformulating the algorithm for parallelism
  - each processor calculates one answer element
  - systolic arrays
  - block algorithms

How to measure the complexity of parallel programs

- Asymptotic algorithm speed
  - space complexity
  - time complexity
  - communication complexity
- Partitioning a problem across processors and memory
- Memory model
  - access model
  - reference time: reads and writes
  - degree of coupling
  - MIMD
  - SIMD
- Parallelism from architecture
  - CDC 6600
  - Pentium
  - vector processors
  - special ad hoc hardware
- Pipeline parallelism
  - branch prediction and pipeline flushes
- Caching behavior
  - cache miss times
  - a 2048×2048 matrix can be slower to process than a 2049×2049 one (with a power-of-two row length, the elements of a column map to the same few cache sets and evict one another)
- Loop unrolling

Convolution filtering

Convolution filtering performs a convolution operation between a data array D and a "filter" array F to produce a new data array E

$$E_i = \sum_{k=-a}^{b} D_k F_{i-k}$$

We "slide" the filter array under the data array, multiply corresponding terms, and sum the products

This is often done in two dimensions in image processing

Often the filter array is small in comparison with the data array

This means that only a few input data elements affect any output data element

Alternatives

- Divide the output array into regions and have a separate process calculate each region in parallel
  - input data elements near the boundary of a region might affect multiple output regions and must be duplicated in each affected process
- Block matrix
- Systolic array
  - build a processor that calculates a single output data element and pipeline input data through it (all multiplies for a single output data element can be done effectively in parallel)

FFT techniques

$$f * g = \mathcal{F}^{-1}\big(\mathcal{F}(f) \times \mathcal{F}(g)\big)$$

A Fourier transform can be done in O(n log n) steps, so the total calculation takes O(n log n), which is better than the O(n²) of direct convolution when the filter is about as long as the data

Example: echo cancellers for telephones, which must filter an effectively infinite time sequence

Factoring Large Numbers

- Trial division
  - Each machine can be given a range of trial divisors
  - All computations are completely independent
- Quadratic sieve
  - Each machine can find relations in different ranges
  - All computations are completely independent
  - When enough relations are obtained, they can be combined to give factors
  - Every relation can be checked, so we don't need to trust individual machines
  - No relation is critical: any subset of relations can be ignored as long as enough remain
  - The final computation needs to solve a large set of linear equations

Other Applications

- Partial differential equations
- Weather forecasting
- Wind tunnel simulation
- 3-D solid modeling
- Chemistry simulations
- Protein folding
- High-energy physics simulations