20.11.2012 Views

Contents Telektronikk - Telenor


[Figure 3 (a): segment size [byte], 0 to 8192, against time [msec], 0 to 15000]

quence, a timer generated acknowledgment can change the segment flow on the connection. An example of this effect is found in Figure 3 (a), which displays a log (Sparc2, SBA-100/2.2.6) of the size of the segments on a connection with a 4 kbyte window and a user data size of 4096 bytes. There are three different flow behaviors on the connection: 4096 byte segments, fluctuation between 1016 and 3080 byte segments, and fluctuation between 2028 and 2068 byte segments. Initially, 4096 byte segments are transmitted. After nearly 4 seconds a timer generated acknowledgment acknowledges all outstanding bytes and announces a window size of 3080 bytes. This acknowledgment is generated before all bytes have been copied out of the socket receive buffer, and the window is therefore reduced by the number of bytes still in the socket receive buffer. This affects the size of the following segments, which fluctuate between 1016 and 3080 bytes. Figure 3 (b) presents the segment flow before and after this timer generated acknowledgment, which acknowledges 4096 bytes and announces a window of 3080 bytes. A similar chain of events gets the connection into a fluctuation between 2028 and 2068 byte segments. This shows up as the horizontal line in the last part of the segment size graph in Figure 3 (a).
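The arithmetic behind the 1016/3080 fluctuation can be sketched as follows. This is a simplified model, not the TCP implementation that was measured: it assumes the announced window equals the free space in a 4096-byte receive buffer, and that the sender fills the announced window and sends the rest of the write as a second segment. The function names are hypothetical, and the figure of 1016 unread bytes is inferred from the numbers in the text (4096 - 3080).

```python
def advertised_window(buffer_size, unread_bytes):
    """Window a receiver can announce: the free space in its receive buffer."""
    if not 0 <= unread_bytes <= buffer_size:
        raise ValueError("unread_bytes must lie between 0 and buffer_size")
    return buffer_size - unread_bytes


def split_write(write_size, window):
    """Segment sizes for one user write under a limited window.

    Simplifying assumption: the sender first fills the announced window,
    then sends the remainder of the write as a second segment.
    """
    first = min(write_size, window)
    rest = write_size - first
    return [first] if rest == 0 else [first, rest]


# Inferred from the text: 1016 bytes were still unread when the timer fired.
window = advertised_window(4096, 1016)
print(window)                     # 3080
print(split_write(4096, window))  # [3080, 1016]
print(split_write(4096, 4096))    # [4096], the initial behavior
```

Under the same assumption, the second fluctuation pair is consistent as well, since 2028 + 2068 = 4096.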

3.3 Host architecture and network adapters

To establish how the difference between the Sparc2 and the Sparc10 bus architectures influences the achievable performance, we measured the time for the send and receive paths on both architectures. Using the SBA-100 adapters, the measured driver times are proportional to the segment size. The receive times of the SBA-100 adapter include the time it takes to read the cells from the receive FIFO on the adapter. The corresponding send times include the time it takes to write the cells to the transmit FIFO on the adapter. Using the SBA-200 adapters, the measurable driver times are more or less independent of the segment size. The driver times for the DMA-based SBA-200 adapter do not include the time to transfer the segment between host memory and the network adapter memory. (We do not have an Sbus analyzer.)

[Figure 3 (b): segment size [byte], 0 to 8192, against time [msec], 3930 to 3960]

Figure 3 Timer generated acknowledgments

Figure 4 presents, for different segment sizes and for both Sparc2 and Sparc10, the total send and receive times and the driver send and receive times as seen from the host:

- the total send time is the time from when the write call is issued until the driver has finished processing the outgoing segment,
- the driver send time is the time from when the driver processing starts until it has finished processing the outgoing segment,
- the total receive time is the time from when the host starts processing the network hardware interrupt until the read system call returns, and
- the driver receive time is the time from when the host starts processing the network hardware interrupt until the packet has been inserted into the IP input queue.
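The four definitions above can be written as differences between timestamps taken along the send and receive paths. The sketch below is illustrative only: the class and field names are hypothetical, and it says nothing about how the timestamps were actually collected in the kernel.

```python
from dataclasses import dataclass


@dataclass
class SendTrace:
    """Timestamps along the send path for one segment (hypothetical names)."""
    write_called: float   # the write call is issued
    driver_start: float   # driver processing starts
    driver_done: float    # driver has finished the outgoing segment

    def total_send_time(self):
        return self.driver_done - self.write_called

    def driver_send_time(self):
        return self.driver_done - self.driver_start


@dataclass
class ReceiveTrace:
    """Timestamps along the receive path for one segment (hypothetical names)."""
    interrupt_start: float  # host starts processing the hardware interrupt
    ip_enqueued: float      # packet inserted into the IP input queue
    read_returned: float    # the read system call returns

    def total_receive_time(self):
        return self.read_returned - self.interrupt_start

    def driver_receive_time(self):
        return self.ip_enqueued - self.interrupt_start


# Made-up timestamps in microseconds, for illustration only.
tx = SendTrace(write_called=0, driver_start=120, driver_done=300)
rx = ReceiveTrace(interrupt_start=0, ip_enqueued=150, read_returned=500)
print(tx.total_send_time(), tx.driver_send_time())        # 300 180
print(rx.total_receive_time(), rx.driver_receive_time())  # 500 150
```

By construction the driver time is always a part of the corresponding total time, because both intervals end (send) or start (receive) at the same event.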

Each measurement point is the average of 1000 samples. A client-server program was written to control the segment flow through the sending and receiving end systems. The client issues a request, which is answered by a response from the server. The request segment and the response segment are of the same size. The reported send and receive times are the averages of the send and receive times measured at both the client and the server. To be able to send single segments of sizes up to MSS bytes, we removed the 4096-byte copy limit of the socket layer (Section 3.1).

[Figure 3 (b), legend: timer generated acknowledgment, window size, unacknowledged bytes]
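The control structure of such a request-response measurement can be sketched as below. This is not the program used for the measurements: it assumes a local `socket.socketpair()` in place of an ATM connection, all function names are hypothetical, and the times it collects are user-level wall-clock times (the receive time includes waiting for the peer), so they are not comparable with the kernel-path times reported in the text. It only illustrates equal-size requests and responses, per-side sampling, and averaging over client and server.

```python
import socket
import threading
import time
from statistics import mean


def recv_exact(sock, n):
    """Read exactly n bytes; a stream socket may deliver fewer per recv call."""
    data = bytearray()
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data.extend(chunk)
    return bytes(data)


def run_side(sock, size, samples, is_client, send_times, recv_times):
    """Exchange `samples` equal-size messages, timing each send and receive."""
    payload = bytes(size)

    def timed_send():
        start = time.perf_counter()
        sock.sendall(payload)
        send_times.append(time.perf_counter() - start)

    def timed_recv():
        start = time.perf_counter()
        recv_exact(sock, size)
        recv_times.append(time.perf_counter() - start)

    for _ in range(samples):
        if is_client:   # client: issue the request, then wait for the response
            timed_send()
            timed_recv()
        else:           # server: wait for the request, then send the response
            timed_recv()
            timed_send()


def measure(size, samples=1000):
    """Return (mean send time, mean receive time) over client and server."""
    client_sock, server_sock = socket.socketpair()
    c_send, c_recv, s_send, s_recv = [], [], [], []
    server = threading.Thread(
        target=run_side,
        args=(server_sock, size, samples, False, s_send, s_recv),
    )
    server.start()
    try:
        run_side(client_sock, size, samples, True, c_send, c_recv)
    finally:
        client_sock.close()  # also unblocks the server if the client failed
        server.join()
        server_sock.close()
    # Both sides contribute the same number of samples, so pooling them
    # equals averaging the client mean and the server mean.
    return mean(c_send + s_send), mean(c_recv + s_recv)


if __name__ == "__main__":
    for size in (1024, 4096, 8192):
        send_avg, recv_avg = measure(size, samples=200)
        print(size, send_avg, recv_avg)
```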

As expected, the receive operation is the most time-consuming. The total send and receive processing times of the Sparc10 are shorter than those of the Sparc2 for all segment sizes. However, using the SBA-100 adapters, the driver processing times are in general shorter on the Sparc2. (The only exception is for small segments.) This is because the Sparc10 CPU does not have direct access to the Sbus, so the latency to access on-adapter memory is longer. Thus, a CPU that is 4.5 times as powerful does not guarantee higher performance with programmed I/O adapters. As mentioned above, the SBA-200 driver processing times do not include the moving of data between the host memory and the network adapter. The Sparc10 SBA-200 driver send times are longer than the driver receive times. For the Sparc2 it is the other way round. On transmit, the driver must dynamically set up a vector of DMA address-length pairs. On receive, only a length field and a pointer need to be updated. The Sparc2 must in addition invalidate the cache lines mapping the pages of the receive buffers, while the Sparc10 runs a single cache invalidation routine. The Sparc10 SBA-200 driver send time is slightly longer than the corresponding Sparc2 time, as the Sparc10 sets up DVMA mappings for the buffers to be transmitted.

The send and receive processing times reflect the average time to process a single segment. The processing times of the receive path do not include the time from when the network interface raises an interrupt until the interrupt is served by the network driver. Neither do the times include acknowledgment generation and reception. The numbers are therefore
