10.07.2015 Views

DTJ Number 3 September 1987 - Digital Technical Journals

DTJ Number 3 September 1987 - Digital Technical Journals

DTJ Number 3 September 1987 - Digital Technical Journals

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Netwo;king Products<strong>Digital</strong> <strong>Technical</strong>Journal<strong>Digital</strong> Equipment Corporation<strong>Number</strong> 3<strong>September</strong> I 986


Contents8 ForewordWilliam R. Johnson, Jr.10 <strong>Digital</strong> Network Architecture OverviewAnthony G. Lauck, David R. Oran, and Radia J. PerlmanNew Products2 5 Performance Analysis and Modeling of <strong>Digital</strong>'sNetworking ArchitectureRaj Jain and William R. Hawe35 The DECnetjSNA Gateway Product-A Case Studyin Cross Vendor NetworkingJohn P:.. ._orency, David Poner, Richard P. Pitkin, and David R. Oran54 The Extended Local Area Network Architectureand LANBridge 100William R. Hawe, Mark F. Kempf, and Alan). Kirby7 3 Terminal Servers on Ethernet Local Area NetworksBruce E. Mann, Colin Strutt, and Mark F. Kempf88 The DECnet-VAX Product-An Integrated Approachto NetworkingPaul R. Beck and James A. Krycka100 The DECnet-ULTRIX SoftwareJohn Forecast, James L. Jackson, and Jeffrey A. Schriesheim108 The DECnet-DOS SystemPeter 0. Mierswa, David). Mitton, and Maha L. Spence117 The Evolution of Network Management ProductsNancy R. La Pelle, Mark). Seger, and Mark W. Sylor129 The NMCCjDECnet Monitor DesignMark W. Sylor1


Editor's IntroductionRichard W. BeaneEditorThis third issue features papers about <strong>Digital</strong>'s networkingproducts. <strong>Digital</strong> was an early advocate ofdistributed interactive computing, a conceptallowing systems resources to be shared amongmany users over a network. Just as two personsfrorri different cultures have problems communi·eating, however, so do computers with differentdesigns. Some standard set of rules is needed toallow successful interaction.The <strong>Digital</strong> Network Architecture (DNA) is theset of rules that defines how <strong>Digital</strong>'s productscommunicate over a network. Being flexible, thisarchitecture allows many ways for design ·groups toimplement the DNA rules into various DECnetproducts.The first paper discusses the DNA structure andhow it has evolved. Tony Lauck, Dave Oran, andRadia Perlman describe DNA's design goals and thenew functions supported in its four developmentphases. The tasks performed by the eight DNA layersare explained, with particular emphasis on thenetwork management and routing layers.To achieve high performance, models and simulationswere used to test the DNA structure. Thepaper by Raj Jain and Bill Hawe relates some casestudies, one for each layer, that resulted in fastercommunication. These models helped to optimizehow data packets.are handled by simulating differenttraffic patterns.Although the DNA and SNA architectures arequite different, they can communicate through theDECnetjSNA Gateway product. John Morency,Dave Porter, Richard Pitkin, and Dave Orandescribe how the gateway's design accomplishesthis communication. The authors describe thecomponents in each architecture and how messagesare structured.The paper by Bill Hawe, Mark Kempf, and AIKirby reports how studies of potential new broadbandproducts led to the development of theExtended LAN Architecture. The design of theLANBridge 100, the first product incorporatingthat architecture, is described, along with thetrade-offs made to achieve high performance.The speed of communication between terminalsand systems depends on how they are connected.Bruce Mann, Colin Strutt, and Mark Kempf explainhow they developed the LAT protocol to connectterminals to hosts on an Ethernet. The EthernetTerminal Server, the DECserver 100, and theDECserver 200 all use this new protocol.The next three papers describe how DNA wasincorporated into three different operating systems.The first paper, by Paul Beck and Jim Krycka,explains how the DNA principles were built intothe VAXJVMS system. The authors describe howtransparency was achieved by a tight couplingbetween the VMS software and the DECnet structure.The ULTRIX software is <strong>Digital</strong>'s second operatingsystem for its VAX computers. In the secondpaper, John Forecast, Jim Jackson, and JeffSchriesheim describe how they blended DNA intothe ULTRIX software. Several unique tools weredeveloped to avoid changes to existing DNA implementations.The DNA architecture has also beenincorporated into the MS-DOS system in theDECnet-DOS product. The third paper, by PeterMierswa, Dave Mitton, and Marty Spence, describeshow they built communication services intoMS-DOS's background by writing new code andborrowing existing code from the DECnet-ULTRIXsoftware.The final two papers discuss an important aspectof any network: its management. Nancy La Pelle,Mark Seger, and Mark Sylor discuss how networkmanagement is built into many diverse DECnetproducts. They describe <strong>Digital</strong>'s common managementarchitecture and the need to meld themanagement of voice and data networks. TheNMCCJDECnet Monitor controls a DECnet networkfrom a central location. Mark Sylor relates how thismonitor functions, describing its database structureand reports for the network manager. Themonitor's analysis techniques to identify real-timeproblems are especially interesting.I thank John Adams, Andrea Finger, and WaltRonsicki for their help in preparing this issue._2) 02


BiographiesPaul R. Beck As a consulting software engineer, Paul Beck is currently thearchitect and was project leader for the DECnet-VAX product. He designedrecovery mechanisms for high-availability software in the VMS group andwas the network software architect for the DECdataway product. Beforecoming to <strong>Digital</strong> in 1977, he worked as a senior systems analyst at AppliedData Research, Inc. Paul earned a B.E.S. degree (1969) from Johns HopkinsUniversity and an M.S.E.E. degree (1970) from Stanford University. He is amember of Tau Beta Pi and Eta Kappa Nu.John Forecast John Forecast received his B.A. degree from the Universityof Lancaster in 1971 and his Ph.D. degree from the University of Essexin 1975. Joining <strong>Digital</strong> in the United Kingdom in 1974, he later movedto the United States to join the newly formed group that developed theDECnet-RSX Phase 2 products. John held various positions within this groupthrough the development of DECnet Phase IV, then worked on the DECnet­ULTRIX project. He is currently a consulting software engineer irt the LocalArea Systems Group.William R. Hawe A consulting engineer, Bill Hawe manages the DistributedSystems Architecture Group. He is designing new LAN interconnectsystem architectures and integrating ISO standards intj:> DNA. Bill helped todevelop the Extended LAN Architecture. For Corporate Research, he workedon the Ethernet design with Xerox and Intel Corporations and analyzed theperformances of new communications technologies. Before joining <strong>Digital</strong>in 1980, Bill taught networking and electrical enginering at SoutheasternMassachusetts University, where he earned his B.S.E.E.:and M.S.E.E. degrees.Bill is a member of the IEEE 802 Local Networks Standards Committee.James L. Jackson In 1976, Jim Jackson came to <strong>Digital</strong> after receiving aB.A.Sc . (EE.) degree (1973) from the University of Waterloo and anM.Eng.Sc. (E.E.jC.S.) degree (1976) from the University of Queensland. Hecontributed to several projects since DECnet Phase 1 ' 1 was conceived. Jimwas project leaderjmanager for DECnet-ULTRIX V1.0 and was developerand project leader for multiple versions of DECnet-RSX and DECnet-IAS. He.also supervised a release of the DECnet Router and s01ire advanced develop·ment work. Currently, Jim is a software development manager working ondistributed systems services projects. He is a member, of the IEEE.3


BiographiesRaj Jain Raj Jain graduated from A.P.S. University (B.E., 1972), the IndianInstitute of Science (M.E., 1974), and Harvard University (Ph.D., 1978) .Joining <strong>Digital</strong> in 1978, he worked on performance modeling of VAXclustersystems, and Ethernet and other DNA protocols. During a one-year sabbaticalat M.I.T., Raj taught a graduate course on modeling techniques. As a consultingengineer, he is now engaged in performance modeling for distributedsystems and networking architectures. A member of ACM and senior memberof IEEE, Raj has written over 15 papers on performane analyses and is writinga textbook on performance analysis techniques.Mark F. Kempf Mark Kempf is currently involved in planning <strong>Digital</strong>'snext generation of interconnect products. A consulting engineer, he was theproject manager for advanced development of the LANBridge 1 00 andDECserver 100. Coming to <strong>Digital</strong> in 1979, Mark worked on software for aDECnet front end and one of <strong>Digital</strong>'s first implementations of Ethernet. Earlier,he worked at Standard Oil of Indiana on real-time process control systems.Mark earned a B.S. degree from Northwestern University in 1972. Heholds three patents, including one in bridge technology.Alan J. Kirby As the manager of the Communications and Distributed SystemsAdvanced Development Group, Alan Kirby has managed and participatedin the development of products such as the DECserver 1 00 and theLANBridge 1 00. Before joining <strong>Digital</strong> in 1981, Alan was the manager of networkdevelopment at National CSS, Inc., where he helped to design a largescalepacket switching network. He received a B.S. degree (1974) in computerscience from Worcester Polytechnic Institute and an M.S. degree(1979) at the Polytechnic Institute of New York. Alan is a member of theIEEE.James A. Krycka A principal software engineer, Jim Krycka is a supervisorin the VMS Development Group and project leader of the VMS Batch/Print facility. Joining <strong>Digital</strong> in 1972 . as a software specialist, he providedtechnical support for PDP-1 1 and PDP-8 systems. In the VMS group, Jimdesigned and implemented the remote file access portion of the DECnet­VAX software. He also helped to design the data access protocol and represented<strong>Digital</strong> on the ANSI committee working on ISO networking standards.Jim earned a B.S. degree (1970) in computer science from Michigan StateUniversity.Nan cy R. La Pelle As a software engineering manager, Nancy La Pelleoversees the development of management software for network products.She chaired the task force to specify preliminary network managementrequirements for DNA Phase V. Joining <strong>Digital</strong> in 1977 as a senior softwareengineer, Nancy later worked in Software Services and Customer ServicesSystems Engineering. Earlier, she performed systems analysis and programmingfor several companies. She earned a B.A. degree ( 1966) in French fromthe University of Pennsylvania and an M.A. degree ( 1968) in linguistics fromCornell University and studied for a Ph.D.· degree at M.I.T.4


Anthony G. Lauck Tony Lauck is a corporate consulting engineer andgroup manager for architecture and advanced development of distributedsystems. He headed the development of Phases II, III, and N of the DNAarchitecture. A member of the task force that led toithe development ofEthernet, Tony also led the effort to standardize it. His group developed conceptsand built prototypes for the LANBridge 100 and ;DECserver products.Before joining <strong>Digital</strong> in 1974, Tony worked for Autex Inc., and the SmithsonianAstrophysical Observatory. He earned a B.A. degree ( 1964) in mathematicsfrom Harvard University.'IIiBruce E. Mann A consulting engineer, Bruce Mann[ is now studying theapplication of <strong>Digital</strong> architectures to commercial on•l'line transaction processing.He wrote the IAT architecture, creating its first prototypes andproducts. An early contributor to Ethernet projects, B d. ce helped to designthe system interfaces. In 1978 he used networking to Jmomate engine testsat the Volkswagon Research Laboratories. Before joinidg <strong>Digital</strong> in 1976, hedesigned medical computer systems at the Harvard Medical School. Bruceearned a B.S.E.E. degree in 1971 from Cornell Univtrsity and with threeother engineers has applied for a patent on the IAT p t otocol.IPeter 0. Mierswa A consulting engineer, Peter Mielwa is project leaderfor the DECnet-DOS and DECnet-Rainbow products. Hcl is currently studyingthe integration of OSI protocols into DECnet implemehtations. With <strong>Digital</strong>since 1977, Peter was a software specialist supporting sales, then a networksconsultant in the Large Computer Group. He was the project leader for theDATATRIEVE-20 system. Peter received a B.S.C.S. degree (Cum Laude) fromS.U.N.Y. at Stony Brook, where his faculty named him i t s best graduating student.He worked for S.U.N.Y. as a systems programm d r after graduation.David J. Mitton Educated at the University of Michigan (B.S. ComputerEngineering, 1977), Dave Mitton joined <strong>Digital</strong> after graduation. He firstworked as a software engineer on communications microcode, developingfirmware for microprocessors. Later, on the DECnet-RSX development team,Dave designed file access and transfer utilities. He was also the corporatearchitect for the data access protocol. After organizing the DECnet-DOS project,Dave was that product's principal designer and developer of the internalsarchitecture and implementation. He is currently a principal engineer.john P. Morency John Morency is a consulting engineer currently definingfuture communication server architectures and developing simulationmodels for IBM interconnect products. He was a principal contributor tothe DECnetjSNA Gateway. In other positions John performed worldwidetechnical support for DECnet, IBM, and X.25 products and supported salesof networking products to the banking and insurance industries. Prior tojoining <strong>Digital</strong> in 1978, John worked at IBM Corporation and at GeneralElectric Company. He earned a B.S. degree (Magna Cum Laude) in mathematicsand computer science from the University of New Hampshire in1974.5


BiographiesDavid R. Oran Dave Oran is a network architect working on the DNAnaming service. He also worked on the SNA Gateway architecture andsupported customers with large networks. Dave represents <strong>Digital</strong> on theANSI and ISO committees for the OSI network layer. Before coming to<strong>Digital</strong> in 1976, he designed a nationwide network for the largest bank inMexico and programmed at NASA. Earning a B.A. degree (1970) inEnglish and physics from Haverford College, Dave is a member of ACMand was vice chairman of the Ninth Data Communications Symposium in1985.Radia J. Perlman Radia Perlman is a consulting engineer responsiblefor the specification of the protocols and algorithms in DNA's routinglayer. On the LANBridge 100 project, she designed the spanning-treealgorithm. Before joining <strong>Digital</strong> in 1980, Radia was the network architecton the ARPA Packet Radio Network at BB&N, Inc. She earned her S.B.(1973) and S.M. (1976) degrees from M.I.T. , both in mathematics. Radiais on the editorial board of Computer Networks and ISDN Systems andhas published numerous papers on networks.Richard P. Pitkin As a principal engineer and project leader, RichardPitkin was a senior contributor to the DECnetjSNA Gateway and VMSjSNAprojects. His major work was in product development, testing, and performanceanalysis. Currently, Richard is assessing the IEEE 802.5 tokenring standard. In previous work, he was a principal software specialistinvolved in worldwide technical support for IBM interconnect products.Before coming to <strong>Digital</strong> in 1979, Richard supported large timesharingsystems for the State of Massachusetts. He earned his B.S. degree in mathematicsfrom the University of Massachusetts, Boston.David Porter Dave Porter joined <strong>Digital</strong> after earning his B.Sc. (Hons)degree with first-class honours in 1977 from Leeds University.In the U.K. , he designed X.25 interconnect products, and on U.S. assignment,he worked on the SNA Gateway V1.0. Returning to the U.K.,Dave developed a product connecting to IBM's DISOSS system. Currently,back in the U.S., he is a principal software engineer working on IBM interconnectproducts. Dave specified the architecture for the SNA GatewayAccess Protocol V2.0 and worked on gateway management and security.Jeffrey A. Schriesheim Educated at S. U.N.Y. at Binghamton, JeffSchriesheim came to <strong>Digital</strong> in 1976 after earning B.A. (Fine Arts, 1970)and M.S. (Systems Design, 1975) degrees. He worked as a design engineer,supervisor, and consultant on DECnet Phases II, III, and IV on softwaresupporting the RSX, lAS, PRO, and ULTRIX systems. Jeff also contributedto the design of Ethernet products, including the DECnet Routerand various terminal servers. Currently a consulting engineer, Jeff isworking on extending DECnet services. He is a member of the DECnetReview Group.6


ForewordWilliam R. Johnson, Jr. 'Vice President,Distributed SystemsDuring the 1970s, the concept of the mini.computerchanged from a small computing enginewith minimal software to an effective, efficientgeneral-purpose computer. While this changeoccurred, the need to exchange informationamong these computers became progres·sively greater. In most cases information wasexchanged using magnetic tape or, in the case ofmany DEC computers, DECtape. In the early sev·enties, data communications was in its infancy;the only widely used communications protocolwas 2780 BISYNC from IBM Corporation, a protocolfor remote job entry. At this time, <strong>Digital</strong>was becoming very successful at a new kind ofcomputing, called interactive computing. Itbecame clear that we needed a flexible way tointerconnect <strong>Digital</strong>'s systems, giving our customersthe ability to share resources among thesemachines.As a result of this realization, a small group ofpeople was asked to specify a network architecture.That architecture was intended to workacross multiple operating systems and to supportmultiple functions-file transfer, remoteresource access, and virtual terminals -andmultiple communication technologies -leasedlines, X.25, and dial-up networks. As it turnedout, that task was well beyond <strong>Digital</strong>'s abilityto complete; for that matter, it was well beyondthe existing state of the art. Therefore, DECnetPhase I fell far short of the ambitious goals set forit. The reality of DECnet Phase I proved to be aset of products confined to the RSX family ofoperating systems, having limited functionality,and poor performance and maintainability.Although creating serious problems in thefield, DECnet Phase I also forced us to make acomplete reappraisal of what it meant to be inthe network business. As a result of this painful,yet valuable, Phase I experience, network specialistswith direct communication to Engineeringwere placed in the field. And a strong architecturalprocess, managed by Tony Lauck, anda certification and verification process wereforged to ensure that products conformed tothe DECnet architecture. The result was DECnetPhase II, the first set of DECnet products that ranon multiple operating systems. DECnet Phase IIprovided more user functionality, much bettermaintainability, and a much more robust networkarchitecture than DECnet Phase I. However,DECnet Phase II was still only usefulforsmall networks since it did not support routing.And performance remained a problem, so that<strong>Digital</strong> was still not viewed in the industry as anetworking leader.That recognition came with DECnet Phase III.Phase III provided adaptive routing, making possiblethe building of networks with over 200nodes. Additional user functionality was providedwith the virtual terminal capability and therelease of the first SNA-interconnect product.Although difficult to accept now, a network ofover 200 nodes was considered very large in1980; there were severe reservations about itsability to be managed. Yet DECnet Phase III wasa major achievement. It was supported on sevenoperating systems and three hardware families:the 36-bit, 32-bit, and 16-bit CPUs. DECnetPhase III was also quite reliable. Our experiencewith the Phase III family of products was so goodthat many of the field controls erected to dealwith Phase I and Phase II problems wereremoved.All our problems, however, were not solvedyet. The cost of the network links with DECnetPhase III was still excessive. Thus the majorobjective of DECnet Phase IV was to reduce thiscost by supporting a low-cost, high-performancemultipoint interconnect: the Ethernet. Sincereducing the cost of networking was likely to8


increase the sizes of networks, we thought it wasalso important to eliminate the Phase III restrictionof 256 nodes. Therefore, the architecturalrestriction for Phase IV was increased to 64,000nodes, although the practical limit was abouthalf that number. At that time, there were seriousdebates about whether anyone would ever buildsuch a large network. We now know that <strong>Digital</strong>'sown internal network will be that large bythe end of June <strong>1987</strong>.Using the Ethernet concept was a major undertakingfor <strong>Digital</strong>. The only way to get an interconnectthat had both low cost and high speedwas to use a standard. Since no such standardexisted, we had to create one. As a company, wehad little experience in generating standards;many people confidently predicted that such aneffort would fail. Through hard work and persistence,however, we succeeded in that Ethernetstandardization effort. Today, Ethernet is both.the accepted market leader and, as IEEE 802.3,the only approved standard for local areanetworks.Shipments of DECnet Phase IV began in 1983.Almost immediately, customers started tomigrate from their old point-to-point networks tothe new multipoint Ethernet. By this time, aninstalled base of over 10,000 DECnet nodesexisted in the field. <strong>Digital</strong> was adding to thatbase at the rate of 3,000 per year. Since theannouncement of DECnet Phase IV, the growthin DECnet systems has been extremely rapid.From July 1986 to June <strong>1987</strong>, Digitafwill shipover 30,000 DECnet licenses. By the end of thatperiod, there will be over 100,000 computersrunning the DECnet system. Clearly, the DECnetconcept has been both a technical and commercialsuccess. From a difficult and problem-filledstart, it has evolved to become the standard bywhich other peer-to-peer networking productsare measured. The concepts embodied in theDECnet architecture have been incorporated inmany international standards. Much of that standardswork has been done by <strong>Digital</strong>'s DECnetarchitects and implementers.<strong>Digital</strong> has played a major role in the developmentof the OSI protocols, which will over timebecome the international standard for networking.Small working groups performed much ofthe technical work to develop these protocols. Inimany cases DECnet architects and developersIplayed a very significant part in ensuring that thestandard was technically exFellent. Today, <strong>Digital</strong>is recognized as a leader in OSI because of ourIwork in standards and the OSI products we arecurrently shipping. Had it npt been for the experiencewe gained from DECriet development, it isIquite likely that the OSI activity would be muchfurther behind. !By the standards of the cdmputer industry, tenyears is a very long time. Ye.t many of the peopleworking on the DECnet products today wereworking on them ten years ago. Most of theauthors who wrote the papers in this issue ofthe <strong>Digital</strong> <strong>Technical</strong> jo urnal were involvedwith DECnet Phase III; all of them were involvedwith Ethernet. The DECnet architecture, as weknow it today, is the result of many people workingtogether, trying to solve a problem that formany years was imperfectly understood. Eventoday, it seems barely credible. The papers containedwithin represent the work of many people,not just these authors. These papers describeIan environment that continues to evolve at arapid rate. That environment is now fundamen-1tally altering the way peope use computers.i9


Anthony G. LauckIDavid R. OranRadia J. PerlmanA <strong>Digital</strong> NetworkArchitecture OverviewThe <strong>Digital</strong> Network Architecture (DNA) defi nes the functions, structure,interfaces, and protocols used in DECnet computer networks. These networkscan be constructed from both local area and wide area communicationtechnologies. Although evolving through jour phases in ten years,the DNA design goals have remained constant. Each phase bas supportednew technologies and applications, yet bas retained backward compatibility.Phase W contains the latest architectural design. The DNA functionsare described with emphasis placed on the relationships betweenlayers and bow they cooperate to eliminate duplicate tasks. DNA's futureevolution is discussed, showing <strong>Digital</strong>'s commitment to the open architectureprinciple.Design Goals of the ArchitectureA small number of design goals have guided theevolution of the DNA architecture through itsinitial version, Phase I, to its current version,Phase N. These goals are described below.Common User In terfa ceA single interface to a computer network shouldsupport a broad range of applications, isolatingthem from the details of network configurationand communications technology. This isolationallows a network to accommodate new applicationsas they are developed, sharing communicationsfacilities with existing applications. Thusnetworks can expand to adapt to new communicationstechnologies without adversely affectingthose existing applications.Wide Range of CommunicationsFa cilitiesTo be cost effective, computer networks mustsupport a wide range of communications facilitieswith a variety of cost, performance, and distancetrade-offs. For example, an Ethernet localarea network (LAN) can economically supportdata communication in a building or on a campusat a data rate of 10 million bits per second.Leased lines, on the other hand, are currentlyeconomical at a data rate of 9600 bits per secondbut over thousands of miles. Since communicationresources should be shared among users,these trade-offs point out the need to use differingfacilities in tandem.Wide Range of TopologiesCost-effective computer networks have many differentconfigurations. Those differences reflectthe location of computer systems, the availabilityof communications facilities, the applicationtraffic patterns, and the performance requirements.These configurations usually change as anetwork grows, yet the results may not always beoptimum for changing traffic patterns. Theactual network configuration may differ from theintended configuration due to failures. A networkmust accommodate these changing configurations,or topologies, to provide a uniform serviceto its users.Available ServiceA computer network must provide an availablecommunications service that its userscan depend on to run their applications. Thisrequirement implies that networks must detectand recover from failures and isolate and bypassfaults. Single failures should minimally affect thenetwork's operation.ExtensibleA computer network must be able to evolve toadapt to new technologies. Older and newercomputer systems and communications facilities10<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 1986


New Productsmust be able to coexist in the same network.Thus the network architecture must adjust tonew technologies, some of which were not envisionedwhen the architecture was originallydeveloped.Easily ImplementedThe network architecture must be implementedon a range of computer systems, from small personalcomputers to superminicomputer ormainframe systems. The architecture must beimplemented over a range of communicationhardware and be cost effective so that eithersmall or large networks can be constructed. Thisneed implies that the architecture must permitsimple implementations as well as more complexones to conform to the needs of individualcomputer systems and network configurations.Cost EffectiveImplementations of the DNA architecture shouldbe cost effective compared to the alternative ofan application-specific network architecture.This attribute will encourage the use of a commonnetwork architecture, with the resultingeconomies of scale.Design PrinciplesIn addition to the goals described above, thedevelopment of DNA has been guided by anumber of important design principles. Wechose these principles in concert with thegoals and with the benefit of experience inresearch networks. Such networks includethe ARPA, National Physical Laboratory, andCyclades networks. These design principles aredescribed below. Of course, several generaldesign principles, such as simplicity and modularity,also guided the development of the DNAarchitecture.Distribute FunctionsFunctions should be distributed among thecomputer systems in a network to avoidsingle points of failure and encourage paralleloperation.Use a Hierarchically Layered StructureFunctions should be divided into layers to factorarchitecture complexity into easily understoodpieces and to facilitate the architecture's evolution.Lower layers should provide their definedservices without concerning themselves with<strong>Digital</strong> TecbntcalJournalNo. 3 <strong>September</strong> 1986upper layers. Upper layers should rely on the servicesprovided by lower laers without havingthe detailed knowledge of how they actuallyprovide those services.Address Computer Systems Un iformlyIt should be possible to communicate betweencomputer systems no matter where they arelocated in the network. This communicationcan be done if nodes are as$igned addresses thatcan be used anywhere in tll'e network to specifythe node as either a soure or destination ofmessages.Implement Functions at the Highest1Practical LevelWhen a function is implemented at a high level,it gains the use of lower-level functions, thussimplifying the implementation of the higherlevelfunction. If a function were implementedat a low level, it might haye to duplicate functionsalready provided at 1some intermediatelevel.·Use Dynamic AdaptaionThe configuration of a computer network willIconstantly change as the network expands and asits components fail and are testored to service. Itis highly desirable for the nFtwork to adapt to itscurrent configuration without manual intervention.This adaptability makes the network easierto install, easier to operate,'and more resistant tofailures.Use Stable, Robust AlgorithmsA computer network is a large, complex system.Like any such system, a etwork may exhibitunanticipated and undesirable behavior. WhenIdesigning critical algorithms, network reliabilityand availability must I not be compromisedin favor of small improements in averageIperformance. ,Evolution of DNA and DECnetProductsIn the ten years since it was first announced, DNAhas had four major phases, each bringing newcapabilities to the DECnet family. These capabilitieshave increased the rapge of computer systemsimplementing the architecture, the numberof applications and communications technologiessupported, and the sie and complexity ofnetwork topologies. In addition to functionalII11I


New ProductsOverview of the Phase W DNAArchitectureThe DNA structure, being hierarchically layered,supports communication between adjacent layerswithin a single system by using architecturallydefined interfaces.1 In addition, communicationtakes place between computer systemsby exchanging messages following architecturallydefined protocols. A protocol specificationdefines the messages to be exchanged andthe procedures for sending and receiving thosemessages.From an architecture perspective, a computernetwork is divided into a set of computer systems,each system being divided into layers. Eachlayer contains one or more protocol modules.Figure 1 illustrates the case of a two-nodenetwork in which communication betweenmodules takes place vertically, using interfaces.In other systems, communication betweenmodules can take place horizontally, usingprotocols.The DNA architecture specifies the functionalinterfaces between layers. These interfacesdefine the services each layer provides to higherlayers and thus firmly partitions the architectureinto layers. The interfaces defined by the standardare abstract specifications, concentrating onthe functions to be provided, not on the form ofthe interface implementation. The details ofinterface realization are left to the implementersof each DECnet product. These details typicallydepend on the structure and conventions establishedin each operating system.PROTOCOLSMODULESI•IIliNTER FACESNODE AFigure 1PROTOCOLSPROTOCOLSPROTOCOLS•I MODULES I LAYER nII INTERFACESNODE BBasic DNA StructureDNA provides a precise ,specification of theprotocols between layers. This precision is necessaryto ensure that separate DNA implementationscan intemperate. In some cases the detailsof protocol operation are left to the implementers.If incompatible dhoices are possible,the standard specifies rules for negotiation toselect compatible options.: The DNA specificationsinclude model protool implementationswith which all valid implementations must communicatecorrectly.·Protocols in the DNA arqhitecture are strictlylayered. These are peer protocols that define therelationships between modules in the same layerof two computer systems. 'the protocols in eachlayer operate independently, with communicationbetween layers taking b1ace only across thedefined interfaces. This structure allows theindependent replacement br addition of protocolsin each layer as the arthitecture evolves. Itshould be pointed out that strict layering is anarchitectural, not necess i rily an implementation,concept. Experien{ e in implementingDNA has shown that signficant performanceimprovements are possible by collapsing theimplementation of several l layers. However, thisperformance improvement ' is obtained at the costof reduced modularity and flexibility of that particularimplementation.Figure 2 defines the DNA layers and the interfacesbetween them. Each layer is discussedbelow in terms of its functions, interfaces, andprotocols.The user layer contains user-defined functionssuch as applications programs. Only one func:tion is specified by DNA for this layer: the networkcontrol program, or NCP. This is a networkmanagement module that implements thenetwork command language specified byDNA network management, thus providing auser interface to DNA network managementfunctions.Three interfaces to lower DNA layers arealso provided. Application programs in the userlayer can directly access the session layer forprogram-to-program communication. Theycan also use the application layer to gain accessto common network services, such as remotefile access. Finally, they can access the networkmanagement layer. Tha access allows programmersto write applications that enhance thebasic network management capabilities providedby NCP.<strong>Digital</strong> Tecbnlcal]ournalNo. 3 <strong>September</strong> 198613


A <strong>Digital</strong> Network Architecture OverviewIUSER MODULES+r=l NETWORK MANAGEMENT MODULESI NETWORK APPLICATION MODULESF=--1SESSION CONTROL MODULESEND COMMUNICATION MODULESROUTING MODULESDATA LINK MODULES!PHYSICAL LINK MODULES-CLIENT====- NETWORK MANAGEMENTFigure 2DNA Layers and InterfacesNetwork Management LayerThe network management layer provides decentralizedmanagement for a DECnet computer network.2The network management moduleswithin a node are responsible for two functions.First, they coordinate the management of thatparticular node; second, they communicate withpeer management modules in remote nodes asneeded to accomplish decentralized management.The network management module uses theservices of three lower layers to provide its functions.The network applications layer is used toprovide common network services, such asremote file access. The session control layer isused to communicate with peer entities asneeded for decentralized management. The modulehas a special interface to the data link layer sothat simple management functions can be providedbetween adjacent nodes. These functionsinclude remote restarting, down-line and up-lineloading, and loopback testing. Intermediate protocollayers are bypassed for these functions;such layers can be economically implemented inROMs.JJIJJJIIThree protocols are defined for the networkmanagement layer: the network information andcontrol exchange (NICE), the event logger, andthe maintenance operations protocols.NICE sends commands and responses betweenpeer network management modules in twonodes. The functions controlled include the following:• Loading and dumping remote systems• Changing and examining network parameters• Examining network counters and events thatindicate how the network is performing• Testing links at both the data link and sessioncontrol levels• Setting and displaying the states of nodes andlinesNICE is a simple request-response protocol thatuses the session control interface to permit network-widemanagement control.The event logger protocol sends significantevents from the nodes in which they are detectedto nodes in which these events can be logged.Examples of such events include lines coming upor going down, error counters reaching thresholdvalues, and nodes becoming unreachable.Events can be filtered at the source node, makingit possible to control the amount of network trafficgenerated by event logging. Like other networkmanagement functions, this filtering can becontrolled remotely using the NICE protocol.Event logging uses the session control interfaceto permit network-wide event logging. Eventsare sent to "logging sinks," which can be consoleprinters, disk files, or special applicationsprograms. .The maintenance operations protocol (MOP)performs loop back tests at the data link level,controls unattended systems remotely, anddown-line loads or up-line dumps computer systemshaving no mass storage.3 Like most data linkprotocols, MOP uses sequence numbers, timers,and acknowledgements to detect and recoverfrom data link failures. Unlike other DNA protocols,however, MOP is designed for extreme simplicityrather than performance. This designmakes it possible to implement MOP in smallROMs. MOP has also been implemented in communicationsdevices and bootstrap ROMs incomputer systems. Because MOP uses the data14<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 1986


New Productslink layer directly, its operation is restricted toadjacent nodes. A system to be down-line loadedmust be directly connected to its load server.That load server might be controlled by an NCPlocated elsewhere in a network (using NICE) .Moreover, the server might be reading the filecontaining the load image from yet anothernode (using the data access protocol, or DAP, inthe application layer) .In examining Figure 2, note the arrows on theleft side that connect the network managementlayer with each lower layer. Each arrow is a controlpath used by network management to coordinatethe activity of each layer in a node. Eachlower layer of the DNA structure specifies a networkmanagement interface that defines the networkmanagement functions provided by thatlayer.A basic principle followed in designing DNAnetwork management was to perform managementfunctions at.the highest level possible. Forexample, management functions use the regularapplications layer service for accessing filescontaining management information. Thesefunctions can also communicate using the normalservices of the session layer. The alternativeto this principle would have been to use somespecial-purpose mechanism to achieve the managementfunctions, thus adding unwarrantedcomplexity to the architecture and to its implementations.Out of practical necessity, however,the direct use of the data link layer by DNA managementrepresents a partial deviation from thisdesign principle. Implementing all the lowerlayers of DNA in ROM microcode was deemed tobe uneconomical. Although the sizes of low-costROM memories have expanded in the last tenyears, fixing all the DECnet layers into ROMremains undesirable. Changes to add new functionsor correct implementation or architecturalbugs would simply require too many costlyhardware updates.Another basic design principle followed indesigning DNA network management was to firstspecify primitive functions, then make themavailable to network managers or to specializedapplications programs. The goal was to have asimple, flexible structure implemented in allDECnet nodes while still providing the opportunityfor dedicating computing resources to themanagement of large networks. 4iiNetwork Applications: LayerThe network application lyer provides genericservices to the user and network managementlayers.5 These services include remote fileaccess and transfer, remote interactive terminalaccess, and gateway access to non-DNA systems.6Modules in this layer operare independently andasynchronously. A single DECnet node may supportmany different network applications modules,which communicate using many differentprotocols. This layer supports modules suppliedby both <strong>Digital</strong> and users. As new network applicationsare developed, they can easily be addedto this layer. One application layer protocol isdescribed below to illustrate a typical operationof the applications layer.1DAP provides remote file access and transfer.Two cooperating application layer modulesexchange DAP messages using the DNA sessioncoiurol service: the user :module at the noderequests the file operation, and the server moduleacts on the user's behaif at the remote node.Figure 3 depicts those services.These applications layer modules operateunder the control of either"a utility program or auser program residing i the user layer. Forexample, a file transfer oeration might be initiatedby a utility program. In some DECnetimplementations, such as the DECnet-VMS system,remote file operations are initiated by normalVMS user programs using VMS file systemcalls. File naming conventions will determinewhether or not a local or remote · file operationis implemented.7In a typical DAP protocol dialogue, the firstmessage exchange involves configuration messagesproviding information about the operatingand file systems. Those essages are followedby attribute messages that supply informationabout the file. An access-tequest message typi-1cally follows to open a picular file. A controlmessage then sets up th'e data stream. Afterfile transfer has completed, access-completemessages will terminate the data stream. Withthese messages, either an entire file can betransferred at one time or· portions of a file canbe transferred, either randomly, sequentially, orindexed.<strong>Digital</strong> <strong>Technical</strong> JournalNo. 3 <strong>September</strong> 198615


A <strong>Digital</strong> Network Architecture OverviewLOCAL NODEREMOTE NODEUSER LAYERTHE ACCESSING PROGRAM:•NF"i' UTILITY•A VAX/VMS COMMAND• A USER-WRITTEN PROGRAMNETWORK APPLICATION LAYER.----t DAP - SPEAKING ROUTINES:• NFARs, OR ROUTINES IN THELOCAL FILE SYSTEM (E.G.,RMS)SESSION CONTROL LAYEREND COMMUNICATION LAYERROUTING LAYERDATA LINK LAYERNETWORK APPLICATION LAYER·- ------ -·DAP - SPEAKING SERVER:• FILE ACCESS LISTENERSESSION CONTROL LAYEREND COMMUNICATION LAYERROUTING LA YEADATA LINK LAYERDAP - DATA ACCESS PROTOCOLFCS - FILE CONTROL SYSTEMRMS - RECORD MANAGEMENT SERVICESNFARs - NETWORK FILE ACCESS ROUTINESNFT - NETWORK FILE TRANSFER UTILITYFigure 3File Transfer across a NetworkDAP's principal design problem was accommodatingthe needs of diverse file systems. It wasnecessary to define a mapping between the featuresand functions of each different system. Thisdefinition was not always easy to make. Some systemshad differing capabilities (for example,some supported index files); others had differingmeans of providing similar capabilities (forexample, stream or record structures for textfiles). Moreover, it was very important for filetransfers between like systems to operate at maximumefficiency and to be completely transparent.For example, it should be possible to copy afile from one VMS system to another and stillretain exactly the same bit patterns in the copy.Two design approaches were studied to achievethese capabilities: using a common, or canonical,file format in protocol messages; and performingneeded translations. The canonical formatwas rejected because it was not transparentor efficient enough in the homogeneous case.The second approach, in which translation isperformed at the client DAP protocol module,was adopted.Session Control LayerThe session control layer resides directly abovethe end communications layer.8 The session controllayer provides system-dependent, process-toprocesscommunication functions for processesresiding in the user, network management, andnetwork application layers. These functionsbridge the gap between the pure communicationfunctions provided by the end communicationslayer and the functions required by processesrunning under an operating system. The communicationservice provided by the session controllayer is connection oriented: an initiating processrequests a connection to a destination process.The session control layer manages theseconnections. Once a connection is established,data flows between the processes without furtherintervention by the session control layer, usingthe facilities provided by the end communicationslayer.When establishing a connection, the higherlayer specifies th e destination process in twoparts: first, by destination node, then, by processwithin destination node. Destination nodes are16<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 1986


New Productsspecified by a six-character node name. Each sessioncontrol module contains a local copy of anode database that maps between the nodenames and the 16-bit node addresses used in theend communications and routing layers. Thisnode database is set up under the control of networkmanagement and can be updated in adecentralized fashion.Different operating systems employ differentconventions to identify their processes. Therefore,selecting a specific process in the destinationsystem depends on the particular operatingsystem being used. However, the DNA sessioncontrol layer provides a mechanism for specifyingprocesses generically by their function, usingan object-type field. Thus the session controlarchitecture specifies a mapping betweenreserved object-type field values and specificupper-layer protocol modules. For example, aspecific object-type code is reserved to designatethe process or processes implementing theserver end of the DAP file access protocol. Thiscode frees most network usage from having toknow the details of process addressing by theoperating system.The session control module at the destinationsystem will either map an incoming connectionrequest onto an existing active process, activate aprocess, or create a new process, whichever isappropriate. For example, consider two possibleimplementations of the DAP server process. Oneimplementation is multithreaded and supportsmultiple simultaneous connections. When thefirst connection is requested, the process wouldbe activated. Subsequent requests would maponto the existing process. The second implementationis not multithreaded and supports only asingle simultaneous connection. Each time aconnection request is received, the connectionmust be mapped onto a new process, which mayneed to be activated or created. Whether processesare activiated or created depends on operatingsystem conventions and reflects the costsof creating processes and keeping processesdormant.The session control layer provides one otherfunction: it validates incoming connectingrequests using access control information providedby the requesting session control module.The details of this validation information dependon the access control mechanisms provided bythe destination operating system. This informa-tion typically identifies the requesting user orrequested account and, opionally, a password.End Communications LayerThe end communications layer, residing immediatelyabove the routing layer, provides a standardizedcommunication service used by thehigher layers of the DNA software.9 The end communicationslayer provides a reliable, sequential,connection-oriented Service to the sessioncontrol layer. The formr layer isolates thehigher layers from any transient errors orreordering of data introduced by lower layers. Italso provides a multiplexing function, enablingmultiple connections to be established betweenpairs of nodes or between a node and multiplenodes. These connections are called logicallinks.The network services pr9tocol (NSP) providesthe logical link service to the session layer,exchanging protocol messages using the routinglayer. The originating session control moduleasks the local NSP module to set up a logical linkto a remote session control module. If the remotenode is accessible via the routing layer and hasresources to support an additional connection,the remote NSP module will indicate its desire toconnect to the remote session layer. The NSPmodule will transfer session control protocoldata, such as user identifiers, passwords, and theDECnet object type. The remote session controlmodule can either accept or reject the link. Onlywhen the logical link is finally established can itsupport the flow of data.The connection management algorithms andprotocol mechanisms of NSP are designed toensure that data on each logical link flows independentlyfrom data on every other logical link.This independence takes two forms. First, datareceived on each connection will never bemixed with data from any other connection, evenone between the same two nodes and processes.This restriction enables higher-layer protocols toestablish initial synchronization and to reestablishsynchronization if one computer hascrashed. Second, if data flow must be blocked onone connection (for exarhple, because there isnothing to send or no buffers are available toreceive) , data can still flow on other connections.This data flow takes place even if memory,processing, and communications resources areshared between the two connections.<strong>Digital</strong> <strong>Technical</strong> journalNo. 3 <strong>September</strong> 198617


A <strong>Digital</strong> Network Architecture OverviewData flow on a logical link can be modeled bya pair of message queues in each direction. Onequeue in each direction handles the transmissionof "normal data" between higher-layer protoc olmodules; that pair is used by all higher-level protocols.The other queue pair handles transmissionof occasional shon "interrupt" messages;this pair is used by some higher-level protocols.For example, the virtual terminal protocol usesinterrupt messages to transmit interrupt commands,suc h as those generated when a userenters the command CTRL-Y to a VMS system. Oneach queue, data flows on each queue independentlyof the other queue. Data is transferredwhen the requesting session control module providesa message to be transmitted and the receivingsession control module indicates its willingnessto receive, by providing a buffer forexample. NSP uses protocol messages and flowcontrol algorithms to ensure an orderly data flowon logical links. An orderly flow takes place evenif limited by the ability of the sources to providedata, the network to transmit data, or the destinationto receive data.In providing the reliable logical link service;NSP must exist in a hostile environment. In particular,NSP must operate correctly when therouting layer occasionally loses, reorders, orduplicates messages. Moreover, NSP must dealwith potential confusion created by computers'crashing at one or both ends of the logical link;this is the problem of "half-open connections." 10· 11 NSP deals with these problems byassigning logical link identifiers to each logicallink and by assigning sequence numbers toeach data or flow-control message sent on eachlink. Timers are used to detect errors and initiateretries of operations: Should excessive retriesappear to be required, NSP will repon this problemto the session control layer, whic hcan decide to break the connection or continueretrying.9NSP has evolved with eac h phase of DNA. In·· Phase II, the NSP protocol was revised to allowdynamic sharing of message buffers between logical links. This capability, called optimistic flowcontrol, must deal with the delay between the. time that data is requested and the time that dataarrives. For example, during this delay on onelogical link, data on other links might arrive,thus consuming all the available NSP buffers.The NSP protocol was designed to handle thiscase correctly without deadlock.The Phase II version of NSP ran right on top ofa data link protocol, the <strong>Digital</strong> Data CommunicationsMessage Protocol (DDC MP). The DDC MPprotocol provided a reliable point-to-point communications service, rendering unnecessaryNSP's use of timers to detect lower layer failures.The Phase III version of NSP was designed to runon top of the routing layer. This version includeda timer capability to detect and recover fromrouting layer failures, such as the loss of a messagewhen an alternate route must be selectedfollowing a node or link failure.Only minor changes were made to NSP inPhase IV. Two changes improved the prot oc olperformance by reducing the number of controlmessages exchanged to perform flow-control anderror-recovery functions. First, the protocol messageformats and procedures were allowed tocombine control messages with each other andwith data messages. Second, provision was madefor selectively delaying acknowledgement messages,making it possible to send many data messagesfor eac h acknowledgement. In a typicalimplementation of NSP, these changes make itpossible for more than 90 percent of the messagestransmitted by NSP to be data messages.Reducing the number of messages exchangedimproves the throuhput of DECnet implementationson Ethernet LANs by reducing the CPU timeneeded to generate, transmit, rec eive, anddecode control messages. Reducing the numberof messages decreases the common-c arriercharges when running NSP over X. 25 public datanetworks, which charge for each packet.Routing LayerThe routing layer provides a network-wide messagedelivery service. 12 This layer accepts messagesfrom the end communic ations layer in asource node and forwards the packets, possiblythrough intermediate nodes, to a destinationnode. The routing layer implements a datagramservice, which delivers packetS on a best-effonbasis.n The routing layer makes no absoluteguarantees against packets being lost, duplicated,or delivered out of order. Such guaranteesare made by the end communications layer. Toprovide this network-wide service, the routinglayer calculates routes, using them to forwardpackets. In the forwarding process, the routinglayer must attempt to avoid or at least control anycongestion that results from overloading the networkwith excessive traffic. The routing layer18<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 1986


New Productsalso ensures that packets do not wander aroundfor too long before being delivered. Such "old"packets might confuse the end communicationslayer.The routing layer determines routes by usingan adaptive, distributed algorithm that respondsto changing network configurations. Routesbetween all sources and destinations are automaticallycalculated or recalculated by the routinglayer whenever new nodes or links are addedto or removed from the network. These changesto the network topology can either be plannedby the network operators or result from theunplanned failures and recoveries of networknodes and links.Routes are calculated on the basis of link costs,which typically are inversely proportional to linkspeed. The route to each destination node isalong a path having the minimum total path cost,which is the sum of the link costs along the path.Link costs can be set either by network managers,if desired, or by using default values. This feature,called adaptive routing, is a key aspect ofthe DNA software, making it very easy to operatelarge DECnet networks. Without adaptive routing,network operators and users would have tocalculate paths between all pairs of data sourcesand destinations, including alternative paths tohandle failures. These calculations rapidlybecome impractical as netWorks grow beyond afew nodes. Figure 4, depicting a small DECnetnetwork, illustrates how routes are chosen usinglink costs.The routing layer forwards packets based ona uniform addressing scheme. Each node in aDECnet network is assigned a unique address,used by the routing algorithms to calculateroutes. These addresses indicate each packet'ssource and destination and guide the forwardingdecision. A uniform addressing scheme allowsthe higher layers and network applications totreat the network as a uniform resource. InPhase IV, addresses are 16 bits long and have twocomponents: a 6-bit area field and a 1 O-bit nodeaddress. As mentioned above , a network can bedivided into a maximum ,of 62 areas of up to1,023 nodes. Routes are calculated at twolevels for each area: Level I routes carry trafficwithin the area; Level II routes carry trafficbetween areas. This hierarchical scheme makes itpossible to build very large DECnet networksyet minimizes the memory, communications,and processing requirements of the routingalgorithm.®, @ . .. !J) CIRCUIT COSTS IN INCREASING ORDERNODE A WANTS TO SEND A PACKET TO NODE D. THERE ARE THREE POSSIBLE PATHS.PATH PATH COST PATH LENGTHA to B, B to C, C to D ®+®+®r 3 HOPSA to B, B to D®+!J)g2 HOPSA to B, B to F, F to E, E to D ®+ ®+ ®+® 13 4 HOPS'7 IS THE LOWEST PATH COST; NODE A THEREFORE ROUTES THE PACKET TO NODE D VIA THIS PATH.Figure 4Routing Paths and Costs<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 198619


---A <strong>Digital</strong> Network Architecture OverviewRo ute calculatio n is done in a distributedfashion. Two types of no des are defined in DNA:end nodes, and routing no des. End no des haveonly a single attachment to a network; therefore,they do not need to calculate routes or forwardpackets on behalf of other no des. On theother hand, ro uting no des support multiple linksand forward traffic on behalf of other nodes;therefore, routing no des must calculate routes.Route calculation is performed using three majorcomponents:• An initialization sublayer that determineswhich links interconnect with which no des• A decision process at each routing node thatcalculates routes to all destinations (withinone area for Level I routing or to each areafor Level II routing)• An update process at each routing no de bywhich routing nodes exchange informationabout their ro utesThe routing algorithm runs whenever the initializationsublayer at a routing node detects alo cal to po logy change. It also runs periodicallyto ensure that ro utes throughout the network areco rrect. This routing algorithm, ro bust and selfstabilizing,recovers automatically from corruptionoccurring in routing databases stored inrouting nodes or from any number of simultaneoustopological changes.12The routing layer supports a variety of communicationsfacilities for communicating betweenadjacent nodes. A complete path from a sourcenode to a destination no de can use a mixture oflink types. Three main types are supported: dedicatedlinks using the DDCMP protocol, X.25packet-switched networks, and Ethernet LANs.X.25 packet-switched networks are treated bythe routing layer as a co llection of po int-topointvirtual circuits; hence, these networks functionsimilarly to DDCMP point-to-po int links. Endno des have a particularly simple task on these·types of links since end no des make no decisionwhen sending a packet out; they simply send iton the link to the adjacent node.End no de ro uting is somewhat more complexon Ethernet LANs since each station can sendto any other station on the Ethern et; therefore,the end no des attached to an Ethernet must makea ro uting decision. When sending to no desremote fro m their Ethernet, the end nodes mustsend to a ro uter. When sending to no des on theirEthernet, the end no des send directly to the destinationno de. End no des fo llow a simple procedureto determine which path to follow. If arouter is present and the end no des do no t knowabout a particular destination, they forward theirpackets to the router. If no ro uter exists or if theyknow a particular destination is on the Ethern et,they send their packets directly to the destinationaddress, using 48-bit Ethernet addressesderived fro m the 16 -bit destination no deaddress.End no des learn that particular nodes are ontheir Ethernet by receiving packets directly fromthose no des or by being informed by a router.This approach was chosen to reduce the memoryand overhead in end no des while still permittingmultiple Ethernet LANs to reside in one DNAarea. The alternate approach was to limit eachDNA area to a single Ethernet, which wo uld havelimited the size of Phase IV networks.A DECnet network, like a complex network ofroads, is subject to co ngestion sho uld it be overloaded.The routing layer incorporates severaldesign decisions to reduce the potential for congestion,to prevent lo cal congestion from spreadingglobally, and to minimize the impact of congestionon network performance. To minimizecongestion, traffic sho uld be kept out of congestedportions of the network. To accomplishthat, each node restricts the number of buffersavailable to traffic originating at a no de, therebygiving priority to traffic transiting the no de.Two design decisions help to prevent theglobal spread of local co ngestion. The first decision was to keep ro uting as a function of the network topology, not of the network load. The seconddecision was to handle co ngestio n byallowing packets to be discarded at a no de whenan output queue has filled, instead of slowingdown input to the node. This second decisionminimizes the impact of traffic flowing throughthe node that does no t need the congested link.The discard policy also prevents buffer deadlock,which occurred in early research netwo rks, bypreventing circular buffer waiting conditions.The performance impact of co ng estion is minimizedby this policy fo r limited buffer sharingbetween co ngested links.14Perhaps the most important decision made indesigning the DNA routing layer was to provide a"best-effort" delivery service instead of a "reliable"service. This decision wa s made for a varietyof reasons. First, implementing functions at20<strong>Digital</strong> TecbnU:aiJournalNo. 3 <strong>September</strong> 1986


New Productsthe highest practical level suggested that deliveryguarantees should be provided by the endcommunications layer. In that way, reliable communicationcould be provided to user-writtenprograms. As we have seen, reliable delivery iseasily performed by the NSP protocol, involvingonly the cooperation of the two communicatingnodes.Second, providing reliable delivery in therouting layer would have been quite difficultsince it would require synchronizing state informationat many nodes. These nodes include allthose on the path between the communicatingsystems and possibly others that might be ormight have been used, since routes must changewhen the network topology changes. Furthercomplicating this problem is the fact that anystate information in intermediate nodes wouldbe lost following a crash.Third, providing reliable delivery would havecomplicated the congestion control problem andrequired complex algorithms to avoid bufferdeadlock. Fourth, a best-effort delivery servicewas a "least common denominator" among datalink protocols then in use or being developed.Ethernet provides such a data link service. WhenEthernet was added to the DNA architecture inPhase IV, its best-effort delivery service was aperfect match to the DNA routing layer.Data Link LayerThe data link layer creates a communicationspath between adjacent nodes. This layer framesmessages for transmission on the channel connectingthe nodes, checks the integrity ofreceived messages, manages the use of channelresources, and, when required, ensures theintegrity and proper sequence of transmitteddata. Currently, there are three protocols residingin the DNA data link layer: DDCMP, X.25,and Ethernet.DDCMP operates over synchronous or asynchronouscommunications links.15 It can operatein point-to-point configurations or in multipointconfigurations in which communication takesplace between a control station and each of severaltributary stations. DDCMP messages areframed as sequences of bytes, beginning with asingle control byte indicating the message's startingpoint and type (e.g., data or control). WhileDDCMP control messages have a fixed length,data messages have variable lengths, indicated bythe length field. On reception, this encodingallows the receiver to determine the beginningsand ends of messages. Incoming bits are assembledinto bytes by the communications hardware,using startjstop bits for asynchronous linksand synchronization characters for synchronouslinks.The DDCMP protocol uses a 16-bit cyclicredundancy check (CRC-16) to detect errors inheaders or user data. On half-duplex or multipointchannels, DDCMP executes link allocationprocedures to ensure that two or more stationsdo not conflict in their use of the channel. Thesetechniques are based on polling in which onestation extends permission to the other to transmit.DDCMP uses timers and sequence numbersto detect and recover from lost messages; it alsoprevents the process of error recovery from creatingduplicates. The routing layer uses the errordetection-and-retry capability of DDCMP toverify that links between nodes are operationaland to synchronize the operation of the routingprotocols.The X.25 specification developed by theInternational Telegraph and Telephone ConsultativeCommittee (CCITT) defines an interfacebetween a packet-switched network, such as apublic data network provided by a commoncarrier, and data terminal equipment, such as aDECnet node.16•17 The service provided bypacket-switched networks is a virtual circuitservice in which connections, called virtualcircuits, are established between pairs of nodes.DNA supports these virtual circuits for useby two functions: the DNA routing layer, andspecial applications, such as communicatingwith communication services built on top ofX.25 and offered by the common carriers.DNA defines procedures for allocating X.25virtual circuits between these two functionsand for providing access to X.25 networks byDNA nodes not directly connected to an X.25interface.Routing uses X.25 virtual circuits in much thesame way it uses point-to-point links. A singleX.25 virtual circuit can carry data between manydifferent nodes, and virtual circuits are used intandem with DDCMP links and Ethernet IANs.The Ethernet IAN provides a communicationsfacility for high-speed communication amongcomputers located within a moderately sizedgeographic area, such as a building or a campus.This IAN includes a data link layer and a physicallayer, which can send data at a rate of 1 0 million<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 19862 1


A <strong>Digital</strong> Network Architecture Overoiewbits per second. The Ethernet has a maximum stationseparation of 2.5 kilometers with a maximumof 1024 stations. A shielded coaxial cableis used as the physical medium. Ethernet alsouses a branching, nonrooted tree topology. TheEthernet IAN technology was jointly developedby <strong>Digital</strong> Equipment Corporation, Intel Corporation,and Xerox Corporation. 18 The Ethernetspecification, with minimal changes, has subsequentlybeen standardized by the IEEE 802 LocalArea Networks Committee as IEEE Standard802.3.19The Ethernet data link protocol provides abest-effort delivery service. Messages, calledframes, are transmitted over the physical chatmelin a broadcast fashion. Stations are assigned48-bit addresses, and each frame contains asource address and a destination address. A framecan be addressed to an individual station or to agroup of stations, using a 48-bit group address(called a multicast address in Ethernet terminology). A special multicast address, consisting of1 's, is used to denote the set of all stations on anEthernet and is typically used for maintenancepurposes. In the DNA architecture, the multicastcapability is used for network configuration purposesby the routing and network managementlayers. For example, a multicast address, specifiedby the architecture, is defined in the routinglayer specification as the set of all routing nodeson an Ethernet IAN.End nodes advenise their availability to routingnodes by periodically broadcasting "hello"messages to the multicast address. The large48-bit address space permits a unique address .tobe assigned to each Ethernet station when it ismanufactured. That address space permits stationsto be plugged in to a IAN and operate withouthaving addresses assigned manually. MOPuses this address when down-line loading computersystems, such as server systems with nomass storage. The 48-bit address can be used toselect the correct program and parameters to beloaded into the node, such as the 16-bit DECnetnode address.The Ethernet data link protocol frames messagesusing the propenies of the Manchester codingscheme employed by the physical channel tomark the beginning and end of each frame. Inaddition to source and destination addresses,frames employ a 16-bit protocol type field toidentify the higher-level protocol carried inthe frame. The protocol type field values areassigned in blocks to all vendors who manufactureEthernets, thus permitting different proprietaryand public protocols to coexist in a singleEthernet station. Ethernet frames also contain a32-bit CRC to ensure that frames received inerror are detected and discarded.Since the Ethernet physical channel can transmitdata only from one station at a time, the Ethernetdata link protocol must allocate the singlechannel among all the stations. This allocation isaccomplished by the technique of CSMA/CD(carrier sense multiple access with collisiondetection) . In this contention-based protocol,stations "listen" before transmitting (carriersense) and defer their transmissions to other stationsalready transmitting.Should several stations begin transmittingsimultaneously, a collision will occur, preventingcorrect reception of any transmission. In thiscase the physical channel hardware in each collidingstation will detect the collision (collisiondetection) and each station will rescheduletransmission after a randomly selected delay. Toensure efficient, stable operation of the networkunder both low- and high-load conditions, thisrandom delay is adjusted on subsequent collisionsby the back-off algorithm. This causes eachstation to reduce the load presented to the networkunder overload conditions. Studies haveshown that this procedure provides good performance(low delay and high throughput) over arange of Ethernet configurations and loads.20•21These studies have also shown that the procedureallocates resources fairly between competingstations and operates stably under high-loadand overload conditions.<strong>Digital</strong> recently introduced the concept ofbridges and extended IANs as a means to extendthe physical extent, number of stations, andthroughput capabilities of a single LAN . 22Extended IANs operate transparently to higherlevelprotocols, such as the.DNA protocols. Thus,although not a part of the DNA architecture,extended LANs - such as those built fromEthernets and LANBridge 1 00s (Ethernet-to­Ethernet) - can be components of a DECnetnetwork.Physical Link LayerThe physical link layer transmits bits of informationbetween adjacent nodes. The functions inthis layer include encoding and decoding signalson the connecting channel, performing clock22<strong>Digital</strong> TecbnlcalJournalNo. 3 <strong>September</strong> 1986


New Productsrec overy of received signals, and interfacingthe communications channel to any processorand memory used to implement higher-levelprotocol functions. Implementations of thislayer encompass hardware interface devicesand device drivers in operating systems, aswell as communic ations hardware suc h asmodems, transceivers, and the physical channelsthemselves.Protocols for the physical layer are rudimentary,emphasizing the specification of electricalinterfacing parameters. No special physical layerspecifications have been developed for DNA.Instead, it relies on industry standards for thephysical layer, thereby ensuring that DECnetproducts can operate over available communicationstechnologies and infrastructures. Physicallayer standards supported by DNA for wide areanetworks include the EIA RS-232C and RS-423specifications, and the CCITT V.24 and X.25Level 1 specific ations. Physical layer standardssupported by the DNA architecture forLANs include two baseband implementationsof ThinWire Ethernet, the original18 and thethinwire2 3 specifications, and a broadband24implementation.Future Directions fo r the DNAArchitectureFor ten years, DNA has evolved in four maindimensions: network applications, communicationstechnologies, network size and scale, anddiversity of supported computer systems. Thisability to evolve independentl y along fourdimensions has proven to be one of the principalbenefits of the architecture. It is reasonable toassume that evolution along these lines will continue.Local area and wide area communicationstechnologies continue to evolve, typically resul t­ing in higher communications data rates. Newapplications for computer networks and newapplications protocols will also continue toevolve. The DNA architecture will continue toaccommodate these trends.The 16-bit node addresses used by DNA PhaseIV currently limit the size of DECnet networks. ·<strong>Digital</strong>'s own internal DECnet network is nearingthe limits of this address space; over 10, 000nodes are currently registered. Clearly, the architecturemust be extended to support morenodes. From our experience with this network,there are two separate reasons why networkscontinue to grow rapidly. First, the availability oflow-cost computer systems allows individuals toown network nodes rather than sharing a singletimesharing system. Second, certain applications,such as network mail, need to operateacross whole organizations. This breadth makes asingle company-wide network, rather than separateindependent networks, highly desirable.Indeed, there is even a need for networks thatspan multiple organizations, adding further tothe problems of sc ale, complicating networkmanagement requirements, and creating newproblems of network security. DNA will have toevolve to adapt to much larger and more diversenetworks.As computer networks have beco me larger,users have developed increasing requirementsfor networks that interconnect computer systemsfrom multiple vendors. The International StandardsOrganization (ISO) has been developingstandards for such networks through their OpenSystems Interconnection program (OSI) . The OSlreference model defines a network architecturesimilar in many respec ts to that of DNA. 25 Mostmajor computer vendors, ncluding <strong>Digital</strong>, haveannounced their support tor OSI and are beginningto deliver OSI network products. <strong>Digital</strong> hasannounced its strategy to incorporate OSI protocolsinto its networking products, integratingthem into the DNA arc hitecture. Future versionsof the DNA arc hitec ture will correspondto a mixture of standardized and proprietaryprotocols.References1. DECnet <strong>Digital</strong> Ne twork Architecture(Phase IV) General Description (Bedford:<strong>Digital</strong> Equipment Corporation,Order No. AA-149A-TC, 1982) .2. DECnet <strong>Digital</strong> Network Architecture(Phase IV) Network Management FunctionalSp ecification (Bedford: <strong>Digital</strong>Equipment Corporation, Order No.AA-X437A-TK, 1983).3. DECnet <strong>Digital</strong> Network Architecture(Phase IV) Maintenance Op erationsFu nctional Specification (Bedford: <strong>Digital</strong>Equipment Corporation, Order No.AA-X436A-TK, 1983).4. N. La Pelle, M. Seger, and M. Sylor, "TheEvolution of Network Management Products," <strong>Digital</strong> <strong>Technical</strong> journal (<strong>September</strong>1986, this issue): 117-128.<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 198623


A <strong>Digital</strong> Network Architecture Overview5. DECnet <strong>Digital</strong> Network ArchitectureData Access Protocol (DAP) FunctionalSpecification (Bedford: <strong>Digital</strong> EquipmentCorporation, Order No. AA-K177A-TK,1983).6. J. Morency, D. Porter, R. Pitkin, and D.Oran , ''The DECnetjSNA GatewayProduct - A Case Study in Cross VendorNetworking,'' <strong>Digital</strong> <strong>Technical</strong> jo urnal(<strong>September</strong> 1986, this issue) : 35-53.7. P. Beck and J. Krycka, "The DECnet-VAXProduct -An Integrated Approach to Networking,"<strong>Digital</strong> <strong>Technical</strong> journal (<strong>September</strong>1986, this issue) : 88-99.8. DEC net <strong>Digital</strong> Network Architecture SessionControl Layer Functional Specification(Bedford: <strong>Digital</strong> Equipment Corporation,Order No. AA-K 182A-TK, 1980) .9. DECnet <strong>Digital</strong> Network Architecture(Phase IV) NSP Functional Specification(Bedford: <strong>Digital</strong> Equipment Corporation,Order No. AA-X439A-TK, 1983) .10. C. Sunshine and Y. Dalal, "ConnectionManagement in Transport Protocols,"Computer Networks , vol . 2 (December1978): 454-473.11. W. Lai, "Protocol Traps in Computer Networks-A Catalog,'' IEEE Tra nsactionson Communication , vol . COM-30, no. 6oune 1982): 1434-1449.12. DECnet <strong>Digital</strong> Network Architecture(Phase IV) Routing Layer FunctionalSpecification (Bedford: <strong>Digital</strong> EquipmentCorporation, Order No. AA-X43 5A-TK,1983).13. L. Pouzin, "Presentation and Major DesignAspects of the Cyclades Computer Net-.work," Th ird IEEEjA CM Data CommunicationSymposium (November 1973):80-87.14. R. Jain and W. Hawe, "Performance Analysisand Modeling of <strong>Digital</strong>'s NetworkingArchitecture," <strong>Digital</strong> <strong>Technical</strong> journal(<strong>September</strong> 1986, this issue) : 25-34.15. DECnet <strong>Digital</strong> Network Architecture<strong>Digital</strong> Data Communications MessageProtocol (DDCMP) Functional Specification,Version 4. 1. 0 (Bedford: <strong>Digital</strong>16.17.18.19.20.21.22.23.24.25.Equipment Corporation, Order No.AA-K175A-TK, 1984) .CCITT Recommendation X. 25, CCITTYellow Book , vol. VIII.2 (Geneva: lnternationalTelecommunications Union, 1981).Standard ISO 8208 fo r Information ProcessingSystems, "X.25 Packet Level Protocolfor Data Terminal Equipment," InternationalStandards Organization ( 198 5) .The Ethernet: A Local Area Network, DataLink Layer and Physical Layer Specifications,Version 2. 0 (<strong>Digital</strong> Equipment Corporation,Intel Corporation, and XeroxCorporation, Order No. AA-K759B-TK,1982).IEEE Project 802 Local Area NetworkStandards , "IEEE Standard 802.3CSMA/CD Access Method and PhysicalLayer Specifications," Approved IEEE Standard802.3-1985, ISO/DIS 8802/3 Ouly1983).J. Schoch and J. Hupp, "Measured Performanceof an Ethernet Local Area Network,''Communications of the ACM, vol . 23,no. 12 (December 1980) : 711-721.F. Tobagi and B. Hunt, "Performance Analysisof Carrier Sense Multiple Access withCollision Detection," Computer Networks, vol . 4, no. 5 (October/November1980): 245-259.W. Hawe, M. Kempf, and A. Kirby, "TheExtended Local Area Network Architectureand LANBridge 100," <strong>Digital</strong> <strong>Technical</strong>jo urnal (<strong>September</strong> 1986, this issue) :54-72.IEEE Project 802 Local Area Networks ,"Medium Attachment Unit and BasebandMedium Specifications, 1 OBASE2," IEEEP802.3/83j0.21E, Section 10.IEEE Project 802 Local Area Networks ,"Broadband Medium Attachment Unit andBroadband Medium Specifications," IEEEwP802.3j84j0.46B, Section 11.Standard ISC 7498 for Information ProcessingSystems , "Open International StandardsOrganization ( 1982).24. <strong>Digital</strong> <strong>Technical</strong> JournalNo. 3 <strong>September</strong> 1986


Raj]ainWilliam R. Hawe IPerfonmance Anarysand Modeling of <strong>Digital</strong>,sNetworking Architecture<strong>Digital</strong> has some of the highest performing networking products in theindustry today. Transfer rates of 3.2 megabits per second and higherhave been measured on an Ethernet. These high speeds result from carefulperformance analyses and planning at all stages of the developmentcycle. A set of case studies iUustrates these analyses. These studies includeperformance modeling for adapter placement in the physical layer;buffering in the data link layer; path splitting in the network layer; crosschannelpiggybacking, timeout and congestion algorithms in thetransport layer; and file transfer and terminal communications in theapplication layer. Completing the paper are studies on network trafficmeasurements and workload characterization.Performance analysis is an integral pan of thearchitectural design and implementation of networksat <strong>Digital</strong> Equipment Corporation. Thisdeliberate strategy has helped to make us theindustry leader in networking products. Some ofthese products have the highest performanceavailable today. Task-to-task transfer rates ofmore than 3.2 megabits (Mb) per second havebeen measured on an Ethernet local area network(LAN) connecting two MicroVAX II systems.1This paper describes a number of case studiesthat illustrate the analyses done to ' improve theperformance of <strong>Digital</strong>'s network products.These analyses are ongoing; they are planned forevery stage in the life cycles of products. Thedesign life cycle of a product consists of the followingstages: conceptualization, prototyping,marketing research, development, sale, and fieldsupport. Each stage takes place in a differentorganization within <strong>Digital</strong>. A research organizationusually conceives an idea for a new product.An advanced development team then developsthe architectural specification and builds a prototypeto demonstrate the feasibility of the idea.In tum, the marketing organization decides if theproduct can be sold and how competitive it willbe. If they decide that the idea should become aproduct, the development organization will performthat task.Each of those organizations has a team ofperformance analysts who ensure that the bestalternatives are chosen at each stage. The salesorganization also measures product performanceand develops capacity planning and performance-tuningtools. The field support organizationsmonitor performance at customer sites andfeed the information back to the developmentorganizations. They then improve the productthrough revisions, field changes, and updatedmodels.To conduct performance studies, we use analyticalmodeling, simulation, and the taking ofappropriate measurements. Which of those techniquesto use depends upon the product developmentstage and the time available to do thestudy. Queuing theory, including operationalanalysis, is extensively used in analytical modeling.2·3 Simulation models are usually developedto solve specific problems4•5•6 or often they aresolved via queuing network solvers. Measurementsof operational characteristics are taken ofthe system as well as the workload, using bothsoftware and hardware monitors. Traffic measurementsare taken on <strong>Digital</strong>'s own networks,as well as on those at customers' sites.1•7 Toolsfor capacity planning, monitoring, and modelingare also used by the teams doing these performance analyses.8·9 Sometimes we have to<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 198625


Performance Analysis and Modeling of <strong>Digital</strong>'s Networking Architecturedevelop new performance metrics10 or statisticalcomputation algorithms.11This paper presents the diversity of the performanceanalysis techniques used to ensure thatour networking products operate at high efficiencies.Many performance studies of our productshave been published; we do not intend toreproduce them here. We have selected a representativegroup of unpublished case studies toillustrate the diversity of our approach to performanceimprovement. One typical problem fromeach of the key layers of the networking architecturewill be discussed. A discussion of workloadcharacterization and traffic analysis will closethe paper.Physical Layer PeiformanceWe conducted many performance studies within<strong>Digital</strong> to help set the parameters of the 10 Mbper-secondEthernet LAN. This is the same Ethernetthat, with certain modifications, we proposedfor standardization and was later adoptedas the IEEE 802.3 standard. The two most interestingproblems in the physical layer design areclock synchronization (phase lock loop versuscounter) and the placement of adapters on theEthernet cable. We describe the latter problemand the proposed solution below.At each adapter, some fraction of the incomingsignal is reflected back along the cable. Ifadapters are placed in close proximity, theirreflections may reinforce each other and interferewith the signal.The adapter designers had specified that a totalnoise level of 25 percent of the true signal levelwas an acceptable limit. Since half of this noisenormally comes from other noise sources(sparks, radiation, etc.), the reflected voltagemust be less than 12. 5 percent of the signallevel.The cumulative reflection is actually strongestat the transmitter itself because of the attenuationof the signal and its reflection as they propagatethrough the cable. Since the transmitter isnot adversely affected by the reflection, however,adapters placed next to the transmitter arethe most sensitive to reflection problems. Therefore,those adapters were the best candidates foranalyzing problems caused by reflections.It is essential to maintain some minimumseparation between adapters. To assist networkinstallers, the Ethernet cable is marked a:t2.5-meter intervals; the specifications state thatthe adapters should be placed only at thosemarks. That spacing was determined from amodel that simulated many different r:andomplacements of a given number of adapters on the 'cable and determined the worst-case reflection.The simulation model showed that the worstcase occurs when approximately 1 00 adaptersare placed on the marked cable. With 100 nodes,the reflected voltage exceeded 10 percent of thetrue signal in only 24 of the 10,000 configurationsthat were simulated. In fact, the maximumreflection observed for any placement was 12.1percent, well below the 25 percent noiseallowance.It is easy to see why the 1 00-adapters caseperforms worse than other cases with bothmore and fewer adapters. With the cable markedat 2.5-meter intervals, a single Ethernet segment(500 meters) can accommodate up to200 adapters. When the number of adapters issmall, their reflections will be too small to causeany problem. On the other hand, if the numberof adapters is close to the maximum of 200, thereflections from neighboring adapters will tendto cancel each other out.The cable marking alone' is no guaranteeagainst experiencing reflection problems. Giventhis or any other marking guideline, it is still possibleto position adapters so that the reflectionsreinforce. This happens if the adapters areplaced A/2 apart, A being the wavelength atwhich transmissions are taking place. For example,for a 1 0-MHz signal traveling at a speed of234 meters per microsecond, A is 23.4 meters,the speed divided by the frequency. Hence, if theadapters are placed approximately 11.7 metersapart, their reflections will reinforce.Data Link Layer PerformanceA number of our studies about the performanceof the data link layer in Ethernet have alreadybeen published.12•13 M. Marathe compared fiveback-off algorithms and concluded that none wassignificantly better than the binary exponentialback-off algorithm.12 This simulation-based analysisalso showed that the number of retriesshould be increased from the original 8 to 16.Response times at the user level have also beenstudied. Such studies show that a 10 Mb-per-secondEthernet can support up to several thousandtimesharing users.14 A capacity planning tool wasdeveloped to study the system-level performancefor any given configuration.6 The performance26<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 1986


New Productsstudies of extended LANs are discussed in thisissue of the <strong>Digital</strong> <strong>Technical</strong> ]ournal.1 5•16The case study described here uses a very simpleanalytical model to assist in designing anEthernet adapter.Three common approaches used for interfacingmachines to a LAN are depicted in Figure 1 .Case A represents the approach used in the<strong>Digital</strong> UNIBUS Ethernet Adapter (DEUNA)product. Case B represents the approach used inthe Ethernet adapters made by 3COM Corporationfor UNIBUS and Q-bus systems. Case C representsthe approach used in adapters like the<strong>Digital</strong> Q-bus Ethernet Adapter (DEQ NA) product.Each approach has certain advantages anddisadvantages.In Case A the packets received are first bufferedon the adapter, then moved via direct memoryaccess to buffers in the host's memory. Packetsto be transmitted follow the same path,except in reverse. The throughput limit of thedevice is limited by how quickly it can movepackets between the adapter and the host' sbuffers.In Case B the packets received are also bufferedon the adapter. In this case, however, thepacket buffer memory is dual ported, with thehost side being mapped into the address space ofthe host. This scheme allows the host to examineABBACKPLANE HOSTLAN ADAPTER BUS MEMORY• • L _r-;;lCOMMON ADDRESS SPACEpackets in the buffers on the adapter, withoutusing any backplane bus bandwidth to receivethem. However, to receive the packets, the hostmust copy them to buffers elsewhere in memorywith a programmed move:In Case C the packets flow in real-time intobuffers in the host memory. The backplane is isolatedfrom the channel with a first-in, first-out(FIFO) control point. This approach reduces theoverhead on the host, as well as that on theadapter processors. In this case, however, excessiveDMA-request latencies may cause overflowin the FIFO control; hence a packet may be lostwhen received. When packets are transmitted,these latencies may cause an underflow in theFIFO control; hence a packet may be aborted.The performance of the DEUNA adapter is sensitiveto two factors: the number of receivebuffers in the adapter, and the number of wordsto be transferred per UNIBUS capture. The numberof receive buffers chosen affects the packetloss rate in the adapter. The number of wordstransferred affects the response to disks and otherdevices on the UNIBUS system. Transferring toomany words per UNIBUS capture may cause adisk to experience "data lates, " indicating thatthe disk could not get the bus for data transferwithin the required time.We wanted to know if the number of bufferschosen by the designers of the DEUNA adapterwould cause these problems to occur.We used a simple analytical model to determinethe packet loss rate an d a simulation modelto determine the words per UNIBUS capture. Thesimulations showed that the DEUNA adaptershould transfer only on e word per UNIBUS capture.The an alytical model is described here.The packet-arrival process is assumed to followa "bulk Poisson" distribution in whichbursts of packets arrive at a rate of A. bursts persecond. The number of packets per burst isassumed to be geometrically distributed with amean B. The burst size is thus described by theformulaFigure 1Three Ways of Organizing BuffersPr (k packets in burst) =(1- 1/B) X B (k - I) , k 1For mathematical convenience we assume thatall service times are exponentially distributed:this assumes that packet lengths are exponentiallydistributed, with a mean of L words perpacket. The UNIBUS bandwidth is approximately800,000 words per second. A fraction of thatbandwidth, J.L words per second, is available forDigiUU TecbnicalJournalNo. 3 <strong>September</strong> 198627


---Performance Analysis and Modeling of <strong>Digital</strong>'s Networking Architecturethe transfers between the DEUNA buffer andmemory. The rest of the bandwidth is used up bytransfers betWeen the disk and memory, and byother devices on the UNIBUS system.If the DEUNA buffer has a capacity for notmore than N packets, then any packet arriving ata full buffer will be lost. Let us first calculate theprobability P ( n), 0 < n < N, that there are npackets (including the one, if any, in servie)queued at the DEUNA adapter. The distributionof the number of packets in the queue has a relativelysimple form:P(n) = (1 - P)/(1 - p X a N ) for n = 00.04i:: 0.03::::;ii5a? 0.0211.(/)(/)0...J 0.01B =BURST SIZEandP(n) = P(O) X (pjB) X a


New Productsequiprobable splitting. Since the mean waitingtime on a saturated path is infinite, the averagewaiting time of all sources over all paths willinclude an infinite term and therefore will alsobe infinite.The performance impact of path splitting canbe seen from the following example. Assume thesimple configuration of two senders and threelines shown in Figure 3. Sender 1 has accessto paths 1 and 2; sender 2 to paths 2 and 3.Each sender selects either accessible path withequal probability. Without path splitting, bothsources might select the same path (path 2) withprobability 1/4, or select separate paths withprobability 3/4 . The mean waiting time (assumingM/M/1 servers) isWn = 3/4 X (1/( - X)) + 1/4 X (1/(-2 X X))8SERVICE RATE = 10 --------------0.0 0.1 0.2 0.3 0.4 0.5ARRIVAL RATENOTE: ALL TIMES ARE NORMALIZED BYTHE PACKET SERVICE TIMEX.JJI.)-;- PATH 1.JJI.)-;- ·PATH 2 L___:_j JII)-;;- PATH 3(a) Two Sources and Three Paths87 .JJI.)-;-! IIO-;Er-iFigure 3IPROBABILITY 3/4 : PROBABILITY 1/4(b) No Splitting(c) Equiprobable SplittingTwo Senders Transmitting onThree LinesIIFigure 4Mean Waiting TimeWith equiprobable path splitting, the input rateto paths 1 and 3 is X/2 and the rate to path 2 is X.The probability of a packet following path 1, 2,or 3 is 1j4, 1/2, and 1/4 respectively. The meanwaiting time isWs = (1/4 + 1/4) X (1/(JL-Xji)) + 1/2 X (1/(JL-X))The values of the mean waiting time both withand without splitting, Ws and Wn respectively,are illustrated in Figure 4. Observe that saturationoccurs much earlier when there is nosplitting.Another advantage of path splitting is that itmakes traffic less bursty. Bursty traffic presents aserious problem in performance control, both inaverage performance and in predictability. Withbursts, the mean waiting time can greatlyincrease; in fact, if there is an average B packetsper burst, then the mean waiting time will beabout B times that value predicted by an identicallyloaded M/M/ 1 queueY The waiting timevariance will similarly increase, since the firstand last packets in a burst will experience markedlydifferent waiting times. The overall performanceis very difficult to control or guarantee insuch a situation.Path splitting has a major advantage in this situationbecause it breaks up the bursts, sendingeach packet over a different path. The performanceof a network with bursty traffic is thusappreciably improved. In fact, if there are more<strong>Digital</strong> TecbnicaljournalNo. 3 <strong>September</strong> 198629


Performance Analysis and Modeling of <strong>Digital</strong>'s Networking Architecturepaths usable by a source than there are packetsper burst, then burstiness will have little effecton either the mean or the variance of waitingtime. On the other hand, only two or three alternativeequiprobable paths are enough todecrease the bursty-packet waiting time fromone-half to two-thirds for the first hop. Thepacket bursts will tend to spread apart as theypropagate, so that the improvement in subsequenthops will be somewhat less.Transport Layer PerformanceSeveral studies have been published on the performanceof the transport layer in the DNA structure.18·19·20·21 One of the published studies is ontimeout algorithms. We found that under sustainedloss, all adaptive timeout algorithmseither diverge or converge to values lower thanthe actual round-trip delay. 18 If an algorithmconverges to a low value, it may cause frequentunnecessary retransmissions, sometimes leadingto network congestion. Therefore, divergence ispreferable in the sense that the retransmissionsare delayed.HOSTNODETERMINALNODEAPPLICATION TRANSPORT TRANSPORT APPLICATIONLAYER LAYER LAYER LAYERFigure 5CREDIT = 1Eight Transport Level PacketsOne key lesson we learned from the timeoutalgorithm research was that a timeout is also anindicator of congestion in the network. Therefore,not only should the source retransmit thepacket on a timeout, but it should also takeaction to reduce future input into the network.There is a timeout-based congestion control policycalled CUTE (congestion control using timeoutsat the end-to-end layer) that manages theseactions. 19Among the new features of DNA Phase IV arecross-channel piggybacking, acknowledgmentwithholding, and larger flow-control windows.These features were introduced as the result of astudy that concluded that straightforward terminalcommunication over a DECnet networkwould be slow. This conclusion lead eventuallyto the development of a new local area transportprotocol, called LAT, for terminal communications.These enhancements were also addedto the DNA transport protocol . This study isdescribed below.In the DNA structure, each transport connectionhas two subchannels: one for the user, andone for control. The user subchannel carries userdata and their acknowledgments, called acks.The control subchannel is used for flow-controlpackets and their acks. Protocol verification canbe easily achieved if the two subchannels areindependent so that information on one channelis not sent on the other. In studying terminalcommunications over a IAN, we discovered thateach terminal read took eight transport protocoldata units (TPDUs) , as shown in Figure 5. Eachunit consists of two application level packets: aread request, and a data response. Each packetrequires a link service packet from the respectivereceiver; this service packet permits the senderto send one packet. The remaining four units aretransport level acks for these four packets.Given the CPU time required per packet, wecomputed that communication for remote terminalstakes four times as much CPU time as that forlocal terminals. Therefore, our goal was toimprove performance by a factor of four. We proceededin three ways to solve this problem. First,we modified application programs to utilizelarger flow-control windows; seco·n d, wesearched for ways to reduce the number of pack:'ets per I/0 operation; third, we tried to reducethe CPU time required per packet. The first goalwas achieved by multibuffering, discussed laterin the section "Application Layer Performance."30<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 1986


New ProductsThe second goal was achieved by• Cross-channel piggybacking-This techniqueallows transport control acks to be piggybackedon normal data packets or acks.• Delayed acks -The receiver can delay an ackfor a small interval. This delay increases theprobability of the ack being piggybacked onthe next data packet.• Ack withholding -The receiver does notacknowledge every packet, particularly ifexpecting more packets from the source. Thesource can explicitly tell the destination towithhold sending an ack by setting a "No AckRequired" bit in a packet.• No flow control -This option allows flowcontrol to be disabled for those applicationsoperating in request-response mode and thushaving a flow-control mechanism at the applicationlevel.• Multiple credits per link service packet ­Credits are not sent as soon as each bufferbecomes available. Unless the outstandingcredits are very low, a link service packet issent only when a reasonable number of buffersbecomes available.To achieve the third goal, reducing the CPUtime per packet, we used a hardware monitor tomeasure the time spent in various routines. Wefound that in a single-hop loopback experiment,only one third of the CPU time at the source wasattributable to DECnet protocol routines. Theremainder was associated with the driver for theline adapter; operating system functions, such asbuffer handling and scheduling; and miscellaneousoverheads associated with periodic events,such as timers, status updates. Of the time spentin the DECnet protocol, 30 percent was spent incounter updates and statistics collection. Similarly,21 percent of the time spent in the linkdriver was used in a two-instruction loop thatimplemented a small delay. The net result ofmodifying these routines and im plementing thearchitectural changes mentioned above is thatwe achieved our target of improving the performanceby a factor of four.Application Layer PerformanceThe three key network applications are file transfer,mail, and remote terminal communications.Earlier, we discussed some of the terminal com-munication performance issues. The new LATprotocol has been designed to provide efficientterminal communication. This protocol and itsperformance are described in this issue of the<strong>Digital</strong> <strong>Technical</strong>journa/ .22 In this section, wewill describe some performance issues in file·transfer.File transfer. in DNA takes place via a networkob ject called a file access listener (FAL); whichin turn uses an application level protocol calledthe disk access protocol (DAP). Measurements ofan initial version of FAL revealed that the remotefile transfer took an excessively long elapsedtime. A subsequent analysis showed that the single-block"send-and-wait" protocol used by FALwas responsible for that excessive ti me. Thelocal FAL waited for the remote write operationto finish before sending the next block. Thus theadvantage of larger flow-control windows ·offered by the transport protocols were ignoredby the application software.:The suggested remedieswere to allow multiblocking and multibuffering.·Multibuffering consists of allowing severalbuffer writes to proceed simultaneously, anaction similar to the window mechanism used atthe transport layer. Multibuffering allows paralleloperations at the source and destinationnodes and at the link, thus considerably reducingthe elapsed time and enhancing throughput.Experiments have shown that there is considerablegain in throughput as the buffering levelincreases from one to two. Further increases doresult in better performance, but the amount ofgain is smaller.Multiblocking consists of sending more thanone block per FAL write, which decreases CPUtime and the di sk rotational latency (the timespent in waiting for the disk to come under theheads at the start of each write). As with multibuffering,the elapsed time is considerablyreduced and the throughput is enhanced.Wo rkload Characterization andTraffic AnalysisThe results of a performance analysis studydepend very heavily on the workload used. Tokeep up with continuously changing load characteristics,we regularly conduct system and networkworkload measurements. Workload characterizationstudies enable us not only to use thecorrect workload for our analysis but also toimplement our products more efficiently.<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 198631


---Performance Analysis and Modeling of <strong>Digital</strong>'s Ne tworking ArchitectureA study of the system usage behavior at six differentuniversities showed that a significant portion(about 30 percent) of the user's time isspent in editing.23 This conclusion led us to useour text editor (EDT) as the key user level benchmarkfor network performance studies. The studyresults also led to the transport layer performanceimprovements discussed earlier.A study of network traffic at M.I.T. showed thatthe packets exhibit a "source locality."7 That is,given a packet going from A to 8, the probabilityis very high that the next packet on the link willbe going either from A to 8 or from 8 to A. Theseobservations helped us improve our packet-forwardingalgorithm in bridges. The forwardingdecision is cached for use with the packets arrivingnext. A two-entry cache has been found toproduce a hit rate of 60 percent, resulting in significantsavings in table lookup.The principal cause of source locality is theincreasing size of data objects being transportedover computer networks. The siz es of dataobjects have grown faster than packet sizes have.Packet sizes have generally been limited by thebuffer sizes and by the need to be compatiblewith older versions of network protocols. Transferof a graphic screen could involve data transfersof around two million bits. This increase ininformation size ·means that most communicationsinvolve a train of packets, not just onepacket. The commonly used Poisson arrivalmodel is a special case of the train modelJThe two major components of a networkingworkload are the packet size distribution andthe interarrival time distribution. ]. Shoch and]. Hupp made the classic measurements of thesecomponents for Ethernet traffic. 24 Their testshave been repeated many times at many places,including <strong>Digital</strong>. The bimodal nature of thepacket size distribution and the bursty nature ofthe arrivals are now well accepted facts; we willnot elaborate further on them.The utilization of networks is gen erally verylow. Measurements of Ethern et traffic at one ofour software en gineering facilities with 50 to 60active VAX nodes during normal working hoursshowed that the maximum utilization during any15-minute period was only 4 percent. Althoughhigher momentary peaks are certainly possible,the key observation, confirmed by other studiesas well, is that the network utilization is normallyvery low. While comparing two alternatives,say H and L in Figure 6, some analystswould choose alternative H, which performs betterthan L under heavy load but worse under lightload. Our view of this choice is quite different.We feel that, while high performance at heavyload is important, it should not be obtained atthe cost of significantly lower performance atnormal, light load levels. Therefore, the choicebetween L and H would also depend upon theperformance of Hat low loads.w()z


New ProductsTable 1DECnet Packet StatisticsAverage Maximum MinimumDECnet packets (percent of total packets)DECnet bytes (percent of total bytes)Routing packetsPercent of DECnet packetsPercent of DECnet bytes86684599 6099 3252 166 2Transport (ECL) packetsPercent of DECnet packetsPercent of DECnet bytes969599984834ECL data packetsPercent of DECnet packetsPercent of DECnet bytesUser transmitted dataPercent of DECnet bytesPercent of ECL data byteslntraEthernet ECL data packetsPercent of DECnet packetsPercent of DECnet bytes44603965797951 381 468 28450349193 24The table also shows that, typically, 80 percentof all packets and bytes are used in intranetworkcommunication. That is, only 20 percent ofthe observed traffic originated from or was destinedfor a node not in the facility.SummaryPerformance analysis is an integral part of thedesign and implementation of network architecturesat <strong>Digital</strong>. Analytical, simulation, and measurement techniqu es are used at every stage of anetwork product' s life cycle. This consciouseffort has made <strong>Digital</strong> the industry leader in networking.Over the past decade, the link speeds haveincreased by two orders of magnitude; however,the performance at the user application level hasnot increased in proportion, mainly because ofhigh protocol processing overhead. The key toprodu cing high performance networks in thefuture, therefore, lies in reducing the processoroverhead.We have described a number of case studiesthat have resulted in higher performance for the<strong>Digital</strong> Networking Architecture. This performanceincrease has come about by reducing thenu mber of packets, simplifying the packet processing,and implementing protocols efficiently.AcknowledgmentsThe studies reported in this paper were doneover a period of time by many different people,some of whom are no loqger with <strong>Digital</strong>. Wewould also like to acknowledge Dah-Ming Chiu(DEUNA bu ffers) and Stan Amway (traffic measurements). We would like to express ou r gratitudeto them and other analysts for allowing us toinclude their results in this paper.References1. W. Hawe, "Technqlogy Implications inLAN Workload Chracterization," Proceedingsof the 1985 InternationalWo rkshop on Workload Characterizationof Computer Sy stems and Co m­puter Networks (October 1985) : 111-130.2. K. Ramakrishnan and W. Hawe, "Performanceof an Extended LAN for ImageApplications," Proceedings of the FifthInternational Phoenix Conference onCo mputer Co mmunications (March1986): 314-320.3. M. Marathe and S. Kumar, "AnalyticalModels for an Ethernet-like Local AreaNetwork Link, " Proceedings of SIGMET­RICS 'Bl (1981) : 205-215.<strong>Digital</strong> TecbnlcaljournalNo. 3 <strong>September</strong> 198633


--Performance Analysis and Modeling of <strong>Digital</strong>'s Networking Architecture4. I. Chlamtac and R. Jain, "Building a Simula- tion Users Group Eighteenth Meetingtion Model for Efficient Design and Perfor- (CPEUG'82) (October 1982): 375-389.man ce Analysis of Loca1 Area Networks, "15. W. Hawe, M. Kempf, and A. Kirby "TheSimulation (February 1984): 55-66.Extended Local Area Network Architecture5. R. Jain, "Using Simulation to Design a Com- and LANBridge 100, " <strong>Digital</strong> <strong>Technical</strong>puter Network Congestion Control Proto- jo urnal (<strong>September</strong> 1986, this issue):col, " Proceedings of the Sixteenth Annual 54-72 0Modeling and Sim ulation Conference16. W. Hawe, A. Kirby, an d B. Stewart, "Trans-(April 1985): 987-993.parent Interconnection of Local Networks6. J. Spirn, J. Chien, and W. Hawe, "Bursty with Bridges, " jo urnal of Telecommunica-Traffic Local Area Network Modeling," tions Networks , vol. 2, no. 2 (<strong>September</strong>IEEE jo urnal on Selected Areas in Commu- 1984): 117-130.nications , vol. SAC-2, no. 1 (Jan uary17. J. Spirn, "Network Modeling with Bursty1984): 250-258. Traffic and Finite Buffer Space, " Proceed-R. Jain an d S. Routhier, "Packet Trains: ings of the ACM Computer Network Perjor-Measurements an d a New Model for Com- mance Symposium (April 1982): 21-28.puter Network Traffic, " IEEE jo urnal on 18. R. Jain, "Divergence of Timeout AlgorithmsSpecial Areas in Communications (Forth- for Packet Retran smissions, " Proceedingscoming, <strong>September</strong> 1986).of the Fifth International Phoenix Confer-8. N. La Pelle, M. Segar, an d M. Sylor, "Theence on Computer Communications(March 1986): 174 -179.Evolution of Network Man agement Products," <strong>Digital</strong> <strong>Technical</strong> jo urnal (Septem- 19. R. Jain, "A Timeout Based Congestion Con -ber 1986, this issue): 117-128 .trol Scheme for Window Flow ControlledNetworks, " IEEE jo urnal on Sp ecial Areas9. M. Sylor, "The NMCCjDECnet Monitorin Communications (Forthcoming, Oc totember1986, this issue): 129-141.Design,'' <strong>Digital</strong> <strong>Technical</strong> journal (Sep-ber 1986).20. K. Ramakrishn an, "Analysis of a Dynamic10. R. Jain, D. Chiu, an d W. Hawe, "A Quantita-Win dow Congestion Control Protocol intive Measure of Fairness and DiscriminationHeterogeneous Environments Includingfor Resource Allocation in Shared Com-Satellite Links," Proceedings of the ComputerSystems, " <strong>Digital</strong> Equipment CorpoputerNetworking Sy mposium (ForthcomrationInternal Research Report, TR-30 1ing, November 1986).(1984).11. R. Jain and I. Chlamtac, "The P 2 21. D. Chiu, "Simple Models of Packet ArrivalAlgorithm Control," <strong>Digital</strong> Equipment Corporationfor Dyn amic Calculation of Quantiles -and Internal <strong>Technical</strong> Report, TR-326 (1984).Histograms without Storing Observations,''Communications of the ACM, vol. 28 , 22. B. Mann, C. Strutt, and M. Kempf, "Termino.10 (October 1985): 1076-1085.nal Servers on Ethernet Local Area Networks,"<strong>Digital</strong> <strong>Technical</strong> jo urnal (Sep-12. M. Marathe, "Design Analysis of a Local tember 1986, this issue): 73-87.Area Network, " Proceedings of the Com-7.23. R. Jain and R. Turner, "Workload Charac-terization Using Image Accounting,'' Proceedingsof the Computer Performanceputer Networking Symposium (December1980): 67-81.13. W. Hawe and M. Marathe, "Performance of Evaluation Us ers Group Eighteentha Simulated Programming Environment, '' Meeting (CPEUG'82) (October 1982):Proceedings of Electro '82 (1982): 111-120.21/4/1-8.24 . J. Shoch and]. Hupp, "Performance of the14 . M. Marathe and W. Hawe, "Predictin g Eth- Ethernet Local Network, '' CommunicaernetCapacity-A Case Study, " Proceed- tions of the ACM, vol. 23, no. 12 (Decemingsof the Computer Performance Evalua- ber 1980): 711-721.34<strong>Digital</strong> <strong>Technical</strong>]ournalNo. 3 <strong>September</strong> 1986


john P. MorencyDavid PorterRichard P. PitkinDavid R. OranThe DECnet/SNAGateway ProductA Case Study in Cross Vendor NetworkingConnecting <strong>Digital</strong>'s network products with those from IBM Corporationhas been a problem, since the network architectures diifer. SNA has ahierarchical node structure and a subset architecture supporting logicalunit types. In contrast, the DNA architecture has a symmetric peer-topeerstructure, with all nodes free to communicate. The DECnetjSNAGateway product allows components from both networks to communicate.Its architecture enables cross-network connections and specifieshow messages will be structured. A significant feature is network managementto measure the performances of components. The software hasthree servers that facilitate the flow of data across the gateway.Recent technological trends in the computerindustry have rapidly brought networks to be theequivalent of systems. It is true that computernetworks have long been used to realize distributedapplications. Until recently, however,such networks generally represented isolatedpockets of computing power within organizations.Now, increased pressures to both reducecosts and achieve greater organizational productivityhave stimulated the drive for more effectivenetwork integration. In the future, companieswill be establishing single information"utilities" that will allow end users to accessneeded resources without regard to their physicallocations.Many issues must be resolved to create this single,integrated structure. One of the most significant,addressed in this paper, is how to deal withmulti-vendor computing equipment within a singleorganization. Here, the problem is not onlyone of establishing common communicationsprotocols, but also integrating that support intoend-user computing to minimize the disruptionof existing services. Clearly, the scope of thisproblem increases as a function of .the number ofincumbent vendors.Network Interconnection IssuesThe interconnection of systems into a networkhas been the subject of many studies. The goalsof these studies have tended to vary based uponorganizational need; hO\yever, the followingcommon questions can be i identified1:• What functions will be ' provided to the endusers?• Should those functions be equally accessibleby users in all the interconnected subnetworks?• What security constraints must be in effect toprevent network resources from being compromised?• What level of transparency can be provided toend users so that accss to new networkresources is accomplishfd via existing mechanisms?f• What level of network protocol compatibilitywill be required to allow effective interworkingbetween any two arbitrary subnetworks?• What levels of performance capability will berequired?• What "political" considerations have to betaken into account when interconnecting subnetworks?·• How can the combined network be mosteffectively managed?<strong>Digital</strong> <strong>Technical</strong> JournalNo. 3 <strong>September</strong> 1986 ·35


The DECnet/SNA Gateway Product• How effectively can component fault isolationbe accoryplished?'• How will resource utilization be crosscharged?• How effectively can the combined networkmigrate to new technologies?The most important prerequisite for answeringthese questions is a clear understanding of enduserneeds at all times. Failure to understandthose needs can result in a significant expenditureof effort to solve the wrong problem, usuallyat the wrong time.Possible Solutions to TheseQuestionsTwo primary approaches to answering thesequestions are possible. First, an organizationcould take upon itself the effort of building customhardware, software, and procedures to effectthe desired solution. Unfortunately, thisapproach is usually an enormous task with drawbacksin terms of cost, time, maintainability, andSyStem migration, to name but a few. Some organizationshave done it, however, with varyingdegrees of success.The second approach is to use standard productsas the means to the desired end. Thisapproach that can take several forms:1. An organization could acquire computingequipment from only one vendor. While certainlylimiting the intercommunication risk,this approach has potential drawbacks interms of flexibility and cost-effectiveness.Furthermore, it creates the risky situation ofa business's having only a single supplier fora key organizational resource.2. An organization could limit purchases ofequipment to several vendors. While offeringbetter flexibility and cost control, thisapproach can complicate an interworkingstrategy unless the ability of the equipmentfrom different vendors to operate together iscarefully scrutinized.Neither approach, however, is satisfactory for theorganization that owns equipment from morethan three or four vendors and does not wish toincur the risk and expense of building customsolutions. This organization must depend onsome external communications standard that issupported by all the equipment that it intends toacquire.The Advent of Open SystemsFortunately for this organization (and many ofthe world's major corporations fit into this category), the international standards process isbeginning to provide a framework to solve thisproblem. Substantive definition is now underwayof the services and protocols at each layer ofthe Open Systems Interconnect (OSI) model,shown in Figure 1. Common services spanning amultitude of vendor equipment will begin to berealized by the end of the present decade. Insome cases, subsets of OSI services are being utilizedmuch now by major users to bring aboutstandards in particular application areas. Twoprominent examples are General Motors with itsManufacturing Automation Protocol (MAP) forfactory applications, and Boeing Computer Serviceswith its <strong>Technical</strong> Office Protocol (TOP)for the office environment.These efforts are significant and over time arecertain to create a much more open environment.However, what types of solutions areavailable now for organizations whose applicationsdo not fit these examples and who cannotwait for full OSI implementations? Such organizationsare generally faced with either buildinga custom solution or making use of vendor-suppliedsolutions. Two prominent examples ofthe second approach are the Systems NetworkArchitecture (SNA) from IBM Corporation andthe <strong>Digital</strong> Network Architecture (DNA) from<strong>Digital</strong> Equipment Corporation. The next fewsections introduce the key properties of theseFigure 1APPLICATION (LAYER 7)PRESENTATION (LAYER 6)SESSION (LAYER 5)TRANSPORT (LAYER 4)NETWORK (LAYER 3)DATA LINK (LAYER 2)PHYSICAL (LAYER 1)Open Systems Interconnect Model36<strong>Digital</strong> TecbnlcalJournalNo. 3 <strong>September</strong> 1986


New Productsarchitectures and discuss some key considerationsto be addressed when interconnecting thetwo.Systems Network Architecture -An OverviewSNA has been in existence since 197 4. It isdefined by IBM Corporation as '' ... a totaldescription of the logical structure, formats, protocolsand operational sequences for transmittinginformation units through and controllingthe configuration and operation of networks. "2As such, SNA is an all-embracing networkarchitecture, implemented in products frommainframes to personal computers. Current estimatesare that from 30,000 to 40,000 networknodes in the world today operate some · form ofSNA interface.The layered structure of SNA is shown in Figure2. One property of SNA that makes it differentfrom most existing network architectures isits hierarchical node structure and subset architecture.It is also unique in that it accommodatesparticular function types (known as logical unittypes) .SNA Node Typ esShown in Figure 3 are both a sample SNA topologyand an illustration of all possible node typesthat can logically coexist within the network.The physical unit type 5 (PU_T5), or host node,is the functionally richest node. It is typicallybased on System-370 architecture and contains acomponent known as the system services controlpoint (SSCP) . SSCP is responsible for much ofthe control and management of up to all of anSNA network and usually contains the primaryapplication subsystems.Application subsystems are usually complexapplication programs which support both interactiveterminal access along with value-addedfunctions, such as transaction processing or generaltimesharing. These subsystems include theCustomer Information Control System (CICS) ,the Information Management System (IMS) , andthe Time Sharing Option (TSO) program products,all of which network users usually need toaccess. The implementation of SNA on a hostnode is typically split between the primary applicationsubsystems and the Advanced CommunicationFunctionjVirtual TelecommunicationsAccess Method (ACFjVTAM) program, whichimplements the SSCP function.TRANSACTION LAYER'PRESENTATION SERVICES LAYERDATA FLOW CONTROL LAYERTRANSMISSION CONTROL LAYERPATH CONTROL LAYERDATA LINK LAYERPHYSICAL LAYERFigure 2iLayers of SNASNA also makes heavy use of communicationsfront-end processors that typically areeither IBM 3705- or 3725-class machines.These processors are classed as physical unittype 4 (PU_T4). They typially perform all theclassic front-end tasks, such as line polling, datalink handling, message unit routing, flow control,and error recovery and notification. ThePU_T4 function is generally implemented in theAdvanced Communication FunctionjNetworkControl Program (ACFjNCP) software . GivenIcurrent SNA definitions, it is not possible to supportan SSCP function on a PU-T4.CLUSTERCONTROLLERNODE(PU_T2)Figure 3IHOSTNODE .(PU_T5)FRONT ENDNODE(PU_T4) ;ITERMINALNODE(PU_T1)ISNA Node TypesICLUSTERCONTROLLERNODE(PU_T2)<strong>Digital</strong> TecbntcalJournmNo. 3 <strong>September</strong> 198637


The DECnetjSNA Gateway ProductFrom an architecture standpoint, both thePU_T5 and PU_T4 nodes have a fairly highdegree of intelligence and possibly some massstorage. Thus their associated SNA definitions arefairly rich in function and rather complex in definition.On the other hand, such devices as theIBM 3274 information-display control unit andthe 3 7 7 6 remote job entry workstation areassumed to be limited in both intelligence andstorage. These operations are detailed in thephysical unit type 2 definition. The SNA nodedefinition for this class of device is more limitedin that both node and end-user communicationsoperate in the slave mode of a masterjslave relationship.The final node type in SNA terminology istypically associated with single-unit, limitedfunctionterminals, such as the 3 767 communicationsterminal and the 3 2 71 model-1 1 displayunit. This type is known as physical unit type 1.It was much more prominent in the early days ofSNA when the architecture was more orientedtoward terminal-mainframe communication thanis the case today. Given current technologytrends, it is likely that this particular node typewill become increasingly de-emphasized, exceptperhaps as a migration mechanism for the interconnectionof pre-SNA devices or non-IBMequipment.Program-to-Program Communicationwithin an SNA NetworkInterprogram communications within an SNAnetwork are realized via an architectural componentknown as a logical unit (LU). The LU can beenvisioned as a port from which an applicationprogram can obtain the services of an SNA network.SNA communications through a logicalunit are managed via an entity known as a logicalunit services manager. This entity is responsiblefor interfacing end-user communicationsrequests into the SNA network.Logical units are further classified by the typeof layered function the application programschoose to realize. There are specific logical unittypes that are pr edefined by IBM Corporation tocorrespond to particular layered functions that"standardize" the use of SNA capabilities in realizingmainstream usages. Most logical unit typespredefine terminal- and printer-to-host programfunctions; these LUs are called types 1, 2, 3, 4and 7. The exception to this classification is inthe definitions of logical unit types 0 and 6.2.Logical unit type 0 is unique by virtue of its"nondefinition" (that is, it can be defined by auser to implement any form of desired programto-programfunction). Logical-unit type 6.2 issignificantly different from the other LU types.Its definition changes the semantics of an LUfrom that of a network port to that of a distributedoperating system. LU6.2 is used mainlyfor transaction processing; it is a primary indicationof the future direction of SNA. 3An Example of LU-to-LUCommunicationThis section discusses briefly the LU-to-LU communicationwithin an SNA network. 4 Start withthe simple SNA topology as shown in Figure 4.Assume that an end user attached to a 3270 displaystation cluster controller node B wishes toenter into an SNA session, or communication dialogue,with a CICS subsystem executing on hostnode A. For this session to begin, one side mustinitiate the request. Typically, that is done at thedisplay station via either an unformatted log-onor a formatted SNA initiate-self message. Thismessage is issued by the logical unit servicesmanager at the cluster controller in response to3270CLUSTERCONTROLLERDISPLAYSFigure 4HOST SYSTEMCICSSUBSYSTEMFRONT ENDPROCESSOR3270CLUSTERCONTROLLERDISPLAYSSample SNA TopologytSNASESSION38<strong>Digital</strong> TecbnicaiJournalNo. 3 <strong>September</strong> 1986


New Productssome display-specific user action. Such an actioncould be pressing the system request (SYSREQ)key and typing some log-on text. The requestaction is then directed to the SSCP in the hostnode, which in this example is the controllingSSCP for both the cluster controller and the CICSsubsystem.After receiving the log-on request, SSCPinforms the CICS subsystem that a user at a particularLU in the network wishes to communicate.If the subsystem chooses to accept the connection,it does so via a request that causes anSNA message unit, known as a "bind," to flowover the network to the destination LU. A bindindicates the willingness of the subsystem tocommunicate with the terminal and includes alist of session parameters to which both sides areexpected to conform. If these parameters areacceptable to the display management logic inthe cluster controller, this logic then requeststhe logical unit services manager to transmit anSNA "positive" response to the bind message. Atthis point both sides are ready to begin exchanginguseful data.Useful data is exchanged by both partnersusing protocol sequences defined by the logicalunittype 2 definitions (the SNA logical unit typedefined for 3270-to-host program communication). The exchange continues until one partner(typically the user at the display) decides to terminatecommunication. Termination is generallyaccomplished via the transmission of either anunformatted log-off or a formatted terminate-selfmessage from the control unit to the SSCP. Uponreceiving this request, the SSCP logic informsCICS that the user wishes to terminate communications.At that point CICS requests that an SNAunbind message be sent to the control unit to terminatethe session properly and to deallocate allassociated resources. This generic protocolexchange is illustrated in Figure 5.<strong>Digital</strong> Network Architecture -An OverviewDNA (announced in 1975) has been in existencealmost as long as SNA and is implemented onapproximately the same number of networknodes. DNA was originally conceived as a meansto facilitate DEC-to-DEC communication inapplications areas such as program-to-programcommunication, remote file transfer and access,remote terminal access, and down-line loadingof diskless systems. DNA's scope has beenexpanded to include areas such as DEC-to-non­DEC communications, particularly terminal-hostaccess, access to non-<strong>Digital</strong> systems over X.25-based public data networks (PDN), and access tothe resources of SNA-based hosts. The structureof the DNA architecture is illustrated in Figure 6.Unlike the somewhat hierarchical SNA structure,the DNA structure is symmetric .and peerto-peer,with any two processes being free to!CICSBIND=.:c..:.::.... _______UNBINDFigure 5-DATA EXCHANGE -'ICLUSTER CONTROLLERUNFORMATTED LOGON/INITIATE-SELFUNFORMATTED LOGOFF/TERMINATE-SELFCICS- Cluster Controller SessionExchangeUSER LAYERNETWORK MANAGEMENT LAYERNETWORK APPLICATION LAYERSESSION CONTROL LAYEREND COMMUNICATIO LAYERROUTING LAYERDATA LINK LAYERPHYSICAL LA YEAFigure 6itLayers of DNADigittil Tecbnical]ournalNo. 3 <strong>September</strong> 198639


The DECnetjSNA Gateway Productcommunicate, provided that naming and securityconstraints have been satisfied. DNA has onlytwo node types: end and routing. The nature ofapplication process communication is determinedby using either <strong>Digital</strong>-defined protocolsat the network application layer or protocolsdefined by end users. A communication instancebetween two partners is called a logical link,with partners being identified via a networknode name (associated with a network-uniquenode address) and a particular object typeidentifier. 5Interconnection Issues betweenDNA and SNA NetworksAn interconnection between DNA and SNA networksinvolves a number of the questions raisedearlier in this paper about network interconnection.Some effective answers to these questionsare provided by a product called the DECnetjSNA Gateway. This product was developed by<strong>Digital</strong> Equipment Corporation to address thesemany issues. Other issues, such as the nature ofhardware interconnection used, the extent towhich common services are shared, the architecturalinterface of each network (and its associatedusers) to the other, the cross-network certificationof products, and installation factors,were all addressed as pan of the development ofthis product.6·7The remaining sections of this paper addressthe DECnetjSNA Gateway from two perspectives.The first examines the architectural philosophyand protocols upon which the gateway is based.The second examines the actual components ofthe gateway in terms of both the hardware packagingand software modules used to give thedesired degree of interconnection. In both sectionsattention is given to the question of howeffectively the design approach addressed manyof the aforesaid issues.The Need fo r a DNAjSNA GatewayA fundamental decision in attempting to bridgethe gap between two different network architecturesis whether to simply incorporate one architectureinto the other or to employ some form ofgateway. In the case of a DNAjSNA bridge, reproducingall of SNA into each DNA implementationwas not feasible, given SNA's size and complexity.On the other hand, implementing the DNAarchitecture into the confines of an SNA nodewas an attractive technical alternative. This con-cept was rejected, however, due to dauntingmaintenance and support considerations. Basedon these factors, we adopted the gatewayapproach.Building this gateway was a tricky businessbecause the two architectures differ not only intheir detailed protocol specifications but also inthe services provided at layer boundaries.Despite superficial similarities (both architectureshave seven layers, both support meshtopologies, and so forth) , SNA and DNA are aboutas dissimilar as can be.For example, at the network layer, DNA routingprovides a connectionless service using anadaptive routing algorithm; conversely, SNApath control provides a connection-oriented serviceusing quasi-fixed routing. At the transportlayer, the DNA structure uses a standard, symmetric,three-way handshake to establish connection;whereas SNA uses an asymmetric, threepartynegotiation. At the session layer, the DNAarchitecture provides simple process-bindingand access-control functions; whereas SNAprovides complex data-phase services, suchas chains, brackets, and multiple acknowledgmentschemes. At the application layer, the differencesare even more pronounced. A centralDNA application service is file transfer andaccess, which SNA does not support at all.Conversely, a widely used SNA application serviceis remote job entry, for which DNA has nocounterpart.Possible Gateway ArchitecturesThere were three possible architecturalapproaches to bridging this gap. First, a protocoltranslation gateway could be used to find a sufficientlysimilar pair (or pairs) of protocols andprovide a deterministic mapping between theirmessages. Second, a one-way encapsulation gatewaycould be used to select one protocol of onearchitecture and encapsulate all higher-layerprotocols of the other architecture. Third, a twoway,or mutual, encapsulation gateway could beused to operate as two one-way gateways to carrythe higher-layer protocols of each architectureover the lower-layer protocols of the other.Now, protocol translation gateways have the(at least theoretical) advantage of providingtransparent communication between two incompatiblenetwork architectures. They do that byhiding the differences inside the gateway box.Our initial attempts at designing an SNA gateway40<strong>Digital</strong> TecbnlcalJournalNo. 3 <strong>September</strong> I 986


New Productscentered around this model. We abandoned it,however, because the two architectures providedsuch different services that, even if theprotocols could be mapped, both user interfaceswould have to be altered, perhaps radically. Thethird alternative, a mutual-encapsulation gateway,suffered from these same maintenance andsupport difficulties; since DNA higher-layer protocolswould have to be built to run on SNAnodes.Therefore, we chose a one-way encapsulationgateway because it appeared relatively easy toimplement the SNA higher-level protocolswithin a DNA network. Moreover, this solutiondid not require any DECnet-specific software orhardware components to be introduced into theSNA environment.One-way encapsulation operates at the transportlayer. The SNA notion of a "session" ismapped onto the DNA notion of a logical link.This mapping is accomplished by carrying allSNA session data, including the connection establishmentmessages, over to the data phase of theDNA logical link. Choosing this mapping meantthat SNA-oriented applications on DNA nodescould be written as though they used directly theservices of the transmission control layer of SNA.This has worked quite well in practice since wehave implemented many mainstream SNA applicationprotocols successfully through the gateway,including 3270 data-stream and DISOSSaccess. The main disadvantage is that each newSNA application protocol requires a completeimplementation on the end-user's DNA nodebefore an application can be run in the DNAuniverse.In some cases the need to have applicationspecificcode on the same node can be avoidedby building a "server," an approach describedlater in this paper. In an architectural sense, theserver is just one user of the encapsulation gatewaymechanism described above.<strong>Digital</strong>'s SNA Product ArchitectureThe SNA product architecture has been developedby <strong>Digital</strong> to provide a framework within. which our network products can be designed.The most important objectives for this architectureare• To promote and support the idea of theDECnetjSNA product set as a fa mily of products,with each product being a part that fitswell into the whole• To allow for the easy incorporation ofDECnetjSNA functions into other <strong>Digital</strong> products,thus providing our Customers with integratedsolutions• To enable the modular development of functionalpieces so they can be re-used in severaldifferent products• To provide an architectural base that is com-1moo between the network interconnect (gatewaybox) and single-system interconnectproducts• To provide a structure that can accommodatefuture developments in both hardware andsoftware, without negating existing investments1None of the preceding objectives are surprising;the architecture defines a workable segmentationof the SNA function that we can and do usein product development. The SNA product architectureis the master architecture; we have alsodeveloped other architectural specifications thatprescribe specific aspects 1 of individual products.The SNA gateway-access and gateway-managementarchitectures are described later in thispaper.Layered Structure of the ArchitectureIThe SNA product architecture distinguishes fiveseparate layers in a product. In descendingorder, these layers are as follows :• The functional layer• The SNA interface layer i• The common network layer• The data link layer• The physical layerThe layers are shown in Figure 7.In a particular product, t he layers of the architecturecan be physically distributed in a numberof different ways . Currently, we use two distinctdistributions, called the gateway access modeland the server model. The important differencebetween the two models is how much of aproduct's function is physically located in thegateway node. !In the gateway access model, the functionaland SNA interface layers are contained in aprocess that executes in the end-user's node,whereas the common network layer and lower<strong>Digital</strong> <strong>Technical</strong> JournalNo. 3 <strong>September</strong> 198641


The DECnetjSNA Gateway ProductUSER 3270 DIA/PROG T.E. DCABASIC MODEJLUO I LU2 J LU6.2 1EXTENDED MODEjSTANDARD COMMON NET INTERFACEGATEWAYACCESS METHODISINGLE SYSTEMACCESS METHODCOMMON SESSION CONTROL ANDPATH CONTROL FUNCTIONSlSTANDARD DATA LINK INTERFACEI ETC.STANDARD DRIVER INTERFACEETC.TO SNA NETWORK (3725 FRONT END)Figure 7FUNCTIONALLAYERSNAINTERFACELAYERCOMMONNETWORKLAYERDATA LINKLAYERPHYSICALLAYERLayers of <strong>Digital</strong>'s SNA ProductArchitecturelayers execute in the gateway node. The transportmechanism by which the SNA interfacelayer communicates with the common networklayer is defined in the section SNA GatewayAccess Architecture.In the server model, all five layers execute inthe gateway node (more properly called a servernode in this role) . The way in which the endusergains access to the server depends on theserver itself; the SNA product architecture doesnot specify that way. Existing DNA protocols areused where appropriate.Functional LayerThe functional layer is the highest in the SNAproduct architecture and implements the actualend-user function. The translation from SNA presentationprotocols to DECnet presentationmedia and formats takes place in this layer. Thefunctional layer can contain programs suppliedeither by customers or by <strong>Digital</strong>'s applicationsoftware groups, as well as entities that are partof DECnetjSNA products. Two such productsare described later: the DECnetjSNA VMSDISOSS document exchange facility (DDXF) ,and the DECnetjSNA VMS remote job entry(RJE) .SNA Interfa ce LayerThe SNA interface layer provides access into theSNA network. Three different access levels ofinterface are offered: a basic interface, which isvery close to that offered by the common networklayer; a so-called extended interface,which offers generic support for the data flowcontrol and transmission control protocols; andseveral LU-mode interfaces, each of whichimplements a particular logical unit type (sessiontype) .The reason for the existence of three differentlevels of access is to some extent historical. Thatis, we fi rst gained design experience using thebasic level of interface and then were able toabstract and isolate the functions comprising thehigher levels.The boundary between the functional and theSNA interface layers is somewhat indistinct dueto the different levels of interface offered by thelatter. Nevertheless, in any particular casethe structural distinction is a useful one, providingas it does a standard model for programstructure.The choice of which interface level to useinvolves a compromise between ease of use andflexibility in manipulating the SNA protocol.It is analogous in some ways to the choice ofwhether to program in assembly language or ina higher-level language. The LU-mode interfacesoffer specialized, easier-to-use interfa ces;the lower-level interfaces allow greater controlover protocol operation at the cost of additionalprogramming.The SNA interface layer is the lowest that ispublicly accessible. Several programming interfaceproducts (basic mode, LUO, LU6.2 and3270 data stream) are available and form part ofthis layer. In fact, the definition of these prod-. ucts was derived from the definition of the SNAproduct architecture.42<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 1986


New ProductsCommon Network LayerThe common network layer provides routing andmultiplexing functions between the da ta linklayer below and the SNA interface layer above.Data units received are routed to the entity (inthe SNA interface layer) that owns the SNA sessionto which the data units belong. Data unitssent are routed to the appropria te data link layer.This layer implements the SNA path-controlprotocols and some - but not all - of thetransmission-control protocols, including thecommon session control. The PU and LU servicesmanagement functions are also pa rt of thislayer.Data Link LayerThe data link layer provides error-free transmissionof data units over a physical link. Currentimplementations of the data link layer supportonly the SDLC secondary station mode. However,the structure could allow the addition of otherdata link protocols (such as X. 25) in the future.This la yer corresponds exactly to the da ta linklayer in both the DNA and SNA architectures.Physica! LayerThe physical la yer provides the means to controland use the physical connections that transmitdata between systems. This la yer can support awide variety of device types and interface standards,and can be implemented in various hardwarejsoftware mixtures. This layer correspondsexactly to the physical la yer in both the DNA andSNA architectures.<strong>Digital</strong>'s SNA Gateway AccessArchitectureThe SNA gateway access architecture prescribes ·the transport mechanism that allows SNA inter·fa ce layer modules in a DECnet host node to gainaccess to common network layer modules in anSNA gateway node. This mechanism is at theheart of the gateway access model for productdesign. Hence it is used implicitly by most of thecurrent DECnetjSNA product set.Overview of the ArchitecturalStructureTwo modules are needed to implement the SNAgateway service: the SNA access module and theSNA gateway module. The two modules communicateby means of a protocol that operates overa DECnet logical link. The protocol is termed theSNA gateway access protocol, or GAP, and isdepicted in Figure 8. GAP is a fairly straightforwardDNA application layer protocol. It makesuse of the features provided by the DNA sessioncontrol and lower la yers, in , particu1ar, flow control,error control, and message segmentationand reassembly. Hence GAPineed not contain anysuch mechanisms itself.The SNA access module is! part of the SNA interface layer, implementing what was referred toabove as the ba sic level of interface. The SNAgateway module runs as a separate process in thega teway system, in which the module uses servicesthat provide it with the functions of pathcontrol and.common session control.A brief example is presen ted in the followingsection in lieu of listing the specific operationsof GAP in detail. This example illustrates some ofthe message flows that take place between theSNA access and gateway modules.Example of Message !flowsThis example describes the( exchanges needed toestablish a session and to transfer data, with thesession being terminated by the IBM application.Figure 9 illustrates the actions tha t take place. Inthe following description, the "user" is a higherlevelentity that utilizes the SNA gateway service.In the context of the SNA product architecture,such a user will in fact be part of the SNA interface layer.To initiate the connection, the user programissues a connect call. Included in the parametersof this call are the name of the ga teway node, thesecondary LU (SLU) address to be used, the nameof the primary LU (PLU) to be the session pa rtner,and sundry other SNA parameters requiredfor the connection. 1The SNA access module ihen allocates internalIresources and establishes a DECnet logical linkto the SNA gateway module in the specified gatewaynode. In turn, the SNA gateway allocatesresources for the session and waits.The SNA access module then transmits a GAPconnect message to the SNA gateway module.The SNA gateway module allocates the requestedSLU address and transmits an SNA initiate-selfmessage to the SSCP, which informs the PLU viaan SNA control-initiate message.Eventually, the PLU transmits a bind to thegateway. The bind is forwarded to the SNA access ·module as a bind-data mes1>age and ultimately tothe user program in resporyse to a read-bind call.I<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 198643


The DECnetjSNA Gateway ProductThe user program then agrees to the session byissuing an accept message, which causes the SNAaccess module to send a bind-accept message tothe SNA gateway module. The gateway modulethen acknowledges the bind (that is, transmits apositive response), and the LUs are now consideredto be in session.The user program can now exchange datamessages with the IBM application. To effectan exchange, the program uses the transmit-callsand receive-calls functions of the SNA accessmodule. Note that higher-level protocol initializationmay well be needed before true enduserdata exchange can begin. Such details,however, are not known to the SNA access andSNA gateway modules. During this data transferphase, the SNA gateway module operatesas a simple message switch, passing data toand from the SNA access module withoutinterpretation.At some point, the PLU will terminate the sessionby sending an unbind message to the gateway.At that point the SNA gateway module disconnectsthe logical link with the SNA accessmodule, supplying an appropriate reason code inthe disconnect message. The user program willread this reason code and issue a close-call tocause the SNA access module to deallocate itsresources.Relationship between Productsand ArchitectureThe SNA product architecture allows considerablefreedom with respect to the distribution offunctions when a particular product is beingdesigned. Various amounts of a product can belocated in the DECnetjSNA Gateway, whatever isappropriate for the design. <strong>Digital</strong>' s currentproducts conform to either the gateway accessmodel or the server model.DEC netSYS TEMI(USERMODULESNAACCESSMODULEIDEC netNETWORK)SNA GATEWAY ACCESSPROTOCOL (OPERATEDOVER A DNA LOGICALLINK)DECnetSYSTEMCONTAININGGATEWAYSNAGATEWAYMODULEISNAPROTOCOL EMULATIONMODULESNA/SDLCDATA LINK CONTROLMODULE(SNANETWORK)Figure 8Structure of <strong>Digital</strong>'s SNA Gateway44 <strong>Digital</strong> TecbnicaljournalNo. 3 <strong>September</strong> 1986


New ProductsThe DECnetjSNA VMS DISOSS documentexchange facility (DDXF) is an example of thegateway access model. DDXF allows documentsto be transferred between V AXJVMS systems andIBM DISOSS. On the other hand, the DECnetjSNAVMS remote job entry (RJE) is an example of theserver model. RJE is a traditional remote batchworkstation emulator that allows VAXjVMS usersto submit jobs to an IBM host for processing andto have the job output returned to the V AXJVMSsystem.Building a product to the gateway modelallows the greatest use of common code modules.The product designer need concern himselfonly with the functional layer and, if no standardinterface module is available, with the SNA interfacelayer. Also, no product-specific software hasto be included in the gateway node.Building a product to conform to the servermodel can re move some of the processingfrom the host node. The cost, however, will beCONNECTincreased use of gateway resources and the introductionof product-specific; support in the gateway.Whether those trade ,offs are acceptable1depends on the product.DDXF as an Exa mple of the GatewayAccess Model ·The IBM DISOSS system use three important IBMarchitectures:• The Document Content Architecture (DCA) ,which prescribes the format and content ofdocuments• The Document Intercnange Architecture(DIA) , which defines the ! format and protocolsused to transfer documents between office systems• The logical unit type 6.2 (LU6.2), whichdescribes the SNA session protocols used toimplement the communications functionf' JCONTROL--'---=---i INITIATE-SELFPOINTLcoNTROL-,Nmm jBIND IBIND-DATASEQUENCEOFEVENTSI BIND-ACCEPT-DATAj_I DATADATADATAUNBIND IUSERPROGRAMJABORTGAP OVER ADNA LOGICALLINKL JGA'EWAYSNA SESSIONPROTOCOLSlPR,MARYLOGICALUNITFigure 9Gateway Access Message Flow<strong>Digital</strong> TecbnicalJournal 45No. 3 <strong>September</strong> 1986


The DECnetjSNA Gateway ProductDECnetSYSTEMI RINPUT JOBFILESJOB WORKSTATION DECnet' AcE_-_______oPE__-RAoR-__________ --;---OUTPUT FROMPRINT/PUNCHSTREAMS1F LE--(DECnetNETWORK)ICDEC netSYS TEMco NTAININGGAT EWAYRJESERVERISNAPROTOCOLEMULATORIjSNA/SDLCDATA LINKCONTROLFigure 10Structure of Remote jo b Entry Fa cilityThe functional layer for DDXF handles both theDIA protocols and the conve rsion to and fromDCA document formats. The functional layeruses the LU6 .2 component of the SNA interfacelayer. Internally, the LU6.2 interface uses the servicesof the extended-mode interface, which inturn is supported by the basic-mode interface.The basic mode of the SNA interface layer must inturn communicate with the common networklayer; the latter is located in the DECnetjSNAGateway node, and communication takes placeusing the SNA gateway access mechanism discussedearlier.The common network layer uses the servicesof the SDLC module in the data link layer. Inturn, this layer uses whichever physical layermodule is appropriate to the communicationshardware.R]E as an Example of the Server ModelThe functional and SNA interface layers for RJEboth appear in the RJ E server process, which executesin the gateway. Figure 10 depicts the structureof RJE. The functional layer converts data toand from the formats used by SNA remote jobentry. This layer uses the DECnet remote fileaccess facility to read and write files located onthe DECnet host node. The SNA interface layerimplements the LUI protocol for remote workstations.The RJE SNA interface layer uses thecommon interface layer software in the gatewaynode, as does the SNA gateway module.The end user communicates with the RJ Eserver by using one of two command interfaces(workstation operator or unprivileged user) residenton the DECnet host node. These interfaceprograms use a private protocol, operated over aDECnet logical link, to pass commands to theserver .46<strong>Digital</strong> TecbnicaJJournaJNo. 3 <strong>September</strong> 1986


New Products<strong>Digital</strong>'s SNA Gateway ManagementArchitectureThe problem of network management is probablythe greatest impediment to effective intervendornetworking. By network management wemean configuration, performance monitoring,and fault diagnosis.In the DECnetjSNA Gateway product, wechose to partition the management problem intothree parts: management of the DECnet components,management of the SNA protocol components,and management of individual servers.This division mirrored both the logical andphysical structure of the gateway and made themanagement problem tractable.We did not attempt to implement a managementgateway. It is not possible for a networkmanager on a <strong>Digital</strong> system to display or modifyoperational parameters of the SNA network. Noris it possible for the manager of an SNA networkto display or modify parameters of the DECnetnetwork or of the DECnetjSNA Gateway itself.(It is possible, however, for either manager tolog in remotely to a system in the other networkand use its management utilities. A briefdescription of this capability is described in thesection Remote Access.)Management of DECnet ComponentsThe management of the DECnet components ofan SNA gateway node is performed according tothe DNA network management model, usingstandard DECnet management utilities from ahost DECnet node. (This paper does not discussDECnet management. See reference 8, theDEC net DNA Phase IV Network ManagementFunctional Specification , for more details.)Gateway Management EntitiesFor the SNA components of the gateway, weher;e define a model that is similar in a generalsense to th e model described in the DECnetDNA Phase IV Network Management FunctionalSpecification. Our model treats the SNAsoftware as a number of named entities, each ofwhich is the focal point fo r some managementoperation. An entity typically has pa rametricinformation that can be read and perhaps writtenand maintains current state information thatcan be read. An entity also maintains a set ofcounters that can be read andjor zeroed andsends event messages to a central logging facileity. The manageable entities are the line, the circuit,the PU, the LU, the access name, and theserver.Not all entities possess all the above properties.For example, the access name is a fairlystatic entity possessing only a name and someassociated parameters. However, the accessname has no state, no counters, and generatesno events. In contrast, a: circuit has a name,Iparameters, and a state anq also maintains countersand generates events. ( ·The line and circuit entities correspond to theidentically named entitie of DECnet management.The line entity represents the physicallayer, the circuit entity th data link layer. ThePU and LU entities represent the SNA physicaland logical units respectively. These have nodirect counterparts in DECnet management.There are certain similarities between the PUand DNA' s ECL, and the LU and DNA's session(both of which are subsued into the managemententity called executor node).An access name entity has no direct counterpanin the IBM environment. This entity is simplya shorthand form for specifying any parametersrequired to establish a session between theDECnet and SNA networks! A user process is ableIto specify an access name instead of providing. Ithe individual connection parameters. A serverentity is one that represents some son of serviceprovided to users. Two eamples of applicationservers in the SNA gateway node itself are theSNA gateway module and the RJ E server module.In a sense, the view of the gateway that resultsfrom these entity definitions is a simplificationof the gateway architecture. This simplified· view is desirable because it lessens the effortneeded to understand and hence to manage thegateway. This idea of having a simplified modelof the network for the purpose of managementis common to both SNA and DNA.Configuration ManagementConfiguration and oper,a ting parameters forthe SNA components in[ the gateway are setor modified by using al management utilityrunning on the DECnet qost node. This utilitycommunicates with the SNA network managementlistener in the gateway node.The general command syntax is derived fromthat of DECnet management but has beenchanged to accommodate the details of SNA<strong>Digital</strong> Tecbnlcal]ournalNo. 3 <strong>September</strong> 198647


The DECnetjSNA Gateway Productoperation. A few examples of the syntax aregiven as follows:SET LINE DSV-0 DUPLEX FULLSET CIRCUIT SDLC-0 ADDRESS 40STATION 1D OOOOODECSET ACCESS NAME CICS CIRCUIT SNA-0APPLICATION CICS6 LU LIST 1-16In these examples, LINE DSV-0, CIRCUITSDLC-0, and ACCESS NAME CICS identify theentities to which each SET command is to apply.The rest of each SET command string is asequence of parameter names, each followed bythe value to be set. Thus ADDRESS 40 indicatesthat the address parameter (the SDLC secondarystation address) is to be set to the value hexadecimal40.Performance MonitoringSNA gateway management maintains counters,associated with various entities, which recordstatistics. These include the amount of data transmitted,the number of CRC errors occurring on aparticular data link, any buffer availability problems,and so forth. By examining these countersperiodically, the DECnet network manager cansee how well the various components of the gatewayare performing and whether or not any problemsneed to be investigated.Fault DiagnosisThere is considerable overlap between performancemonitoring and fault diagnosis. Frequently,poor performance is the first indicationof a fault; thus counters can be viewed as faultdiagnosisaids. Event-logging messages are alsouseful in diagnosing faults; for example, frequentcircuit-down events could indicate hardwareproblems. Event messages are generated byvarious software components in the gateway andare sent to the DECnet host. Usually the eventsare displayed on the operator console of thehost; they may also be collected in a file for latei:analysis.Gateway management also supports commandsthat allow loopback testing to be performed.The system can isolate failing hardwarecomponents by using loopback at various levels.Management of ServersServer management is perhaps the least welldefined area of the SNA gateway managementarchitecture. The range of different servers thatmay be implemented makes it difficult toinclude sufficient support for all servers. Thus ifnecessary, each server implements its own managementutility.Remote AccessAlthough neither the DECnet network managernor the SNA network manager can directly controlthe other, it is possible to use remote terminalaccess mechanisms to effect some degree ofindirect control. The manager must log in to asystem in the "other" network, and hencebecome a user of that network, in order to gainaccess to the network management programs.The DECnetjSNA VMS 3270 terminal emulator(TE) allows the DECnet network manager to logon to IBM applications, including the SNA networkmanagement utilities, such as NCCF. Hecan thus control the SNA network to the extentthat he has the privilege to do so.The DECnetjSNA distributed host commandfacility (DHCF) allows the SNA network managerto log on to a VMS system through the gateway.Once logged in with sufficient privilege, he canissue NCP commands and hence control theDECnet network.DECnet/SNA Gateway ComponentsThe DECnetjSNA Gateway provides a protocolhandlinginterface between SNA and DECnet networks.The gateway, introduced in 1982, was thefirst product to provide remote functions over anetwork from a closed server system. The gatewayconsists of several software components;Figure 11 provides an overview of its majorparts. The base software is the RSX-1 1S operatingsystem with a communications supervisor calledthe CommExec. These two components providethe environment for both the DECnet-RSX softwareand the RSXjSNA protocol emulator.RSX- 11 S, Communications Executive,and DECnet-RSX SoftwareWe chose the RSX-1 1S software for the followingreasons:• The RSX-11S operating system was well documentedand provided a good developmentbase by means of the RSX- 11 M operating system,a well-tested one.• The RSX- 1 1 S system, being a memory-only system,required no expensive peripherals thatwould add to the cost of the gateway.48<strong>Digital</strong> TecbnlcaljournalNo. 3 <strong>September</strong> 1986


New Products• Th e RSX-l lS system could be down-lineloaded. This was proven and establishedtechnology.• RSX-l lS had host support for system images.Thus the RSX-l lS system provided a sound startingpoint. Furthermore, th e RSX communicationsexecutive and DECnet-RSX products alsoprovided a known base that worked well withthe host facilities provided in <strong>Digital</strong>'s operatingsystems.RSX Communications Executive9The communications executive is a group of softwaremodules that create an environment withinwhich data communication software can executein cooperation with an operating system. Tailoredto the needs of th e communications software,th is special environment shields data communicationsprograms from involvement withthe internal mechanisms of the host operatingsystem. Just as the operating system supervisesthe execution of user programs on the computer,so the communications executive supervises theexecution of data communications software.Together with the software it manages, the communicationsexecutive can be considered a dedicatedcommunications subsystem.RSXjSNA Protocol EmulatorThe RSXjSNA protocol emulator (PE), an existingproduct, provided a starting point for theIBM SNA network connection requi red by thegateway. Moreover, the PE fsed an early versionof the CommExec. We were able to reduce theengineering effort required to build the gatewayfrom a complete design of basic network functionsto an upgrade of the RSXjSNA PE th atwould work in the DECnet Phase IV environment.Updating the RSXjSNA PE was an easiertask than doing a complete design because theexisting RSXjSNA product was stable and its limitationswere clearly undestood.For example, th e RSXjSNA PE requiredsupport for the SNA message "unbind-bindforthcoming," used with IBM TSO sessions.Th e RSXjSNA PE also needed to supportthe pacing and segmenation of messages.Pacing is th e IBM SNA meth od of flowcontrol; it allows the n l etwork to regulatebuffer usage on an individual session basis.PDP-11/24 SYSTEMRSX-1 1S OPERATING SYSTEMCOMMUNICATIONS EXECUTIVE. SNA INITIAL LOAD- PROGRAM (SNAIL)IBM SNANETWORKSNAPROTOCOLEMULATORGATEWAY ACCESSSERVER (GAS)DECnet-RSXDEC netNETWORKREMOTE JOBSERVER (RJSRV)HOST COMMANDFACILITY SERVER(HCFSRV)Figure 11DECnetjSNA Components<strong>Digital</strong> Tecbnical]ournal 49No. 3 Septem ber 1986


The DECnetjSNA Gateway ProductSegmentation allows user code to send andreceive buffers larger than the buffers used at thedata link level.Philosophy behind the GatewayIn building the gateway, we wanted a system thatcould be developed completely within <strong>Digital</strong>'sengineering groups. We also wanted a systemthat would allow the values chosen by our engineersto be overwritten by customers with theirown rietwork values. This capability requiredthat data structures be allocated dynamically duringthe initial gateway startup.User Level ComponentsThe user level components are as follows:• The initial load program (SNAIL) , which managesSNA configuration with only a DECnetinterface• The gateway access server (GAS) , whichswitches messages from one network to theother• The remote job server (RJSRV) , which providesIBM remote job submission functions• The host command facility server (HCFSRV) ,which provides IBM terminals with a methodof reaching DECnet-VAX hostsSNAIL, the Initial Load ProgramWe chose to accommodate customers' networkvalues to the gateway during initialization bymeans of a plain text file (configuration file) .Such files are common; to all of <strong>Digital</strong>'s operatingsystems, and network routines are provided ··that can read these files.The text in the configuration files for the SNAnetwork is divided into three areas based on thethree SET commands used to configure the gateway.(A SET command in the SNA gateway is similarto the one used for DECnet NCP.)• The first SET command is used to determinethe type of modem signal control being used.This SET LINE command allows two choices:full duplex, meaning the. modem leads areheld "high"; and half duplex, meaning themodem leads are varied according to the sendingand receiving rules.• The second SET command defines the circuitparameters. This command sets up the datalink station address, the number of SNA logicalunits that the circuit can handle, an identificationvalue used in dial-up configurations, andother items in the SNA network.• The third SET command defines a shorthandname, SET ACCESS, for a number of SNA networkconnection items. This command definesa list of LUs, a circuit name, and an IBM application. The user can specify a single accessname rather than all the needed parameters.SNAIL is an RSX-privileged task that deciphersthe RSXjSNA PE data structures and reads theconfiguration file on the DECnet host node.SNAIL parses the commands from the configurationfi le, traces the RSXjSNA PE data structures,and then places the configuration information inthe correct location in the data structures. In thecase of LU databases and access names, eachstructure is allocated and linked into the existingdatabase.Part of the SNAIL code detects errors duringthe command parsing of the configuration filerecords . Not having a console, the gatewayneeds a means of reporting errors, and DECnetevent messages supply that means. The numberand contents of each line in error are mergedinto a line of text that is sent to the DECnet hostnode.After loading the gateway, the DECnet networkmanager must check that the software and informationfrom the configuration fi le have been. loaded correctly. This checking is done by moni·toring the DECnet event messages that appear onthe host. These messages provide status informationon successful steps and error information forfailed steps.Gateway Access Server (GAS)GAS provides support for the gateway access protocols(GAP). GAS is basically a messageswitcher that receives messages from the IBMSNA network session and sends them along thecorrect DECnet logical links. GAS also takes messagesreceived from DECnet logical links andsends them to the correct IBM SNA sessions. ADECnet logical link and an IBM SNA session areassociated with one another at connection time.Connections into the IBM applications are alwaysinitiated from the DECnet side of the gateway.GAS concerns itself only with the SNA bindmessage because it determines the buffe r sizethat the gateway will receive from and transmitto the IBM SNA network. These sizes are allo-50<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 1986


cated from a common buffer pool, and each sideof the connection has a maximum allocationlimit. The buffers are allocated until theallocation exceeds the maximum allowed.This method provides the best allocation ofbuffer memory, but it does not guarantee afixed number of sessions since a single buffercan be allocated that exceeds the allocationlimit for a session. Therefore, the 32 sessionsthat the gateway documentation discussesonly occur if the allocation limits are notexceeded by the sessions. The server must performprotocol work only at the start and end ofthe session.Remote job Server (R]SRV)RJSRV is by far the most complicated program inthe gateway. RJSRV supports multiple SNAremote job entry workstations. Each workstationcontains a DECnet control link, multipleIBM sessions, and multiple network files.This handling of many different linkages hasalmost transformed the server from a simplemessage switcher to a "micro-operating"system, since it performs these activities inreal time. This micro-operating system providesa scheduler for events, common terminationroutines, and common buffer allocationmethods.As with GAS, a DECnet host program initiatesthe connection with RJSRV, thus establishingthe workstation connection. The number ofworkstations and the sessions per workstationare limited only by the available memory inthe gateway. Because of this limitation,each workstation is treated as an RSX programlogical address space (PLAS) region. Sessionscan then be allocated from the PLASregion.The messages from the DECnet control linkare parsed, and the actions taken vary dependingon the current workstation state. At the sametime, IBM SNA sessions may be active, receivingprinter or punch records, or transmitting readerrecords. The server provides all the SNA protocolsfor transmission control (TC) , data flowcontrol (DFC) , and function management headers(FMH) . In addition, RJSRV provides supportfor SNA character strings (SCS) and LU I compression.These facilities and the permutationsof different states make R)SRV rich in functionalityand fairly complex in terms of its internalstructure.<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 1986Host Command Facillty Server(HCFSRV) !HCFSRV is a program lying1 midway in complex-1 .ity between GAS and RJSR:V. HCFSRV performssome SNA protocols fo the sessions thathave been established. It d,iffers, however, fromthe other servers in that the IBM applicationinitiates the connection. After the IBM applica-1tion session has been eStablished, HCFSRVreceives the VMS host name and establishes aIDECnet logical link with that node. HCFSRVthen continues to provide SNA protocol supportIafter the session to the VMS host has been established.This server can haddle multiple sessions. Ifrom the IBM network, but the number of sessionsis limited by the arriount of buffer spaceavailable.riIThe Gateway Hardw4reThe DECnetjSNA Gateway [software runs on twohardware configurations': a PDP-1 1 j24 withRX02 disks, DMRlls for (the DECnet connection,and DUPl ls for the sNA connection; and1the <strong>Digital</strong> Ethernet Communications Server(DECSA) . DECSA is the nerwork equivalent of acommunications controller, such as the DZ 11,I•DMF32, or DUPl l. A serv(lr is a shared resourcefor the hosts in an Ethernet andjor wide areaInetworks connected to an 1Ethernet. The seerperforms specific communications functions for, Ithese hosts. The hardware components arepackaged in a freestanding, table-top unit withself-contained power and cooling; it can operatein an office environment oi- in a computer room.IAt start-up, the unit performs a brief self-test.Then the appropriate senter software is downlineloaded from a Phase W DECnet host on the• Isame Ethernet, and the unit begins operations asa DECnetjSNA Gateway.SummaryWe ·have enumerated the! many diverse issuesthat need to be addressed as part of a networkIinterconnection process. This process is, to saythe least, a complex one. An effective networkinterconnection scheme Ican result only froman effective architectural! and implementationprocess. 1·Numerous aspects of cross-network interconnectmust be considered i'f the final result is tomeet the end-user's ne ,eds. The followingaspects should be considered.IIIIIIiI51New Products


The DECnetjSNA Gateway Product• One must clearly understand the properties ofall architectures to be interconnected to determinethe most effective level of interconnectionbetween them from a base services standpoint.• In implementing the interconnect, one musttake consistent approaches that take intoaccount both the turnkey functions to beimplemented as well as end-user requirementsconcerning those functions. (For example, is iteffective to split functions across multiple systems,and if so, what are the benefits?)A modular approach that uses effectively bothhardware and software "building blocks" is alsoimportant for reliability, maintainability, andreusability considerations. Thus it is as importantto provide a modular implementation consistingof known, proven software segments as it is touse a framework that allows "mixing and matching"pieces to facilitate the development of newfunctions for various base systems. Once a structurehas been defined, the turnkey functionsthemselves must be of a bidirectional nature,allowing users in one environment equal accessto the resources 'of the other (provided, ofcourse, they are authorized to do so) .Coincident with all this functionality is theneed to manage it effectively, either from a centralizedpoint in the network or at the distributedpoints closest to the actual work being done. Aninterconnect structure must be chosen thatallows the continuing use of existing mechanismswith convenient "hooks" to access otherenvironments, if needs so dictate. Finally, theapproach chosen must be flexible enough toallow existing structures to migrate convenientlyto more cost-effective technologies as theybecome available, all without disrupting the userinterface. The preservation of existing userinvestment must always be a key concern.All these goals were met in the existingDECnetjSNA Gateway product set. Our approachis the.result of a carefully considered structure,not of an ad-hoc collection of functionality. Thatstructure facilitates the rapid development ofnew functionality today and preserves existingapplication investments for the increasingly distributedprocessing world of tomorrow. Weexpect these products to be key components ofthe network that eventually becomes thesystem. ·AcknowledgmentsThe authors would like to acknowledge the contributionsof the following members of <strong>Digital</strong>'sIBM Interconnect Engineering team, withoutwhose diligent effort much of our successwould not have been possible: Scott Davidson,Dave Garrod, Bob Fleming, and LadanPooroshani of the Littleton team; Bob Ellis fromColorado Springs; Richard Benwell, Carol Charlton,and Chris Chapman of the Reading, U.K.,team; and Craig Dudley of Systems and CommunicationsSciences. Their , leadership efforts havetruly resulted in leadership products.References1. V. Cerf and P. Kirstein, "Issues in Packet­Network Interconnection,'' Proceedingsof the IEEE , vol. 66, no. 11 (November1978) : 1386-1408.2. Systems Network Architecture: <strong>Technical</strong>Overview (Armonk: IBM Corporation,Order No. GC30-3073, March 1982).3. Systems Network Architecture: TransactionProgrammer's Reference Manualfo r Logical Unit Type 6. 2 (Armonk: IBMCorporation, Order No. GC30-3084, May1983).4.Systems Network Architecture: Sessionsbetween Logical Units (Armonk: IBM Corporation,Order No. GC20-1868, April1981).5. <strong>Digital</strong> Network Architecture Phase IVGeneral Description (Maynard: <strong>Digital</strong>Equipment Corporation', Order No. AA­N149A-TC, May 1982) .6. J. Morency, "The SNA Gateway-The Foundationfor the Information Bridge," Proceedingsof INTERFA CE '83 (March1983): 146-154.7. ]. Morency and R. Flakes, "Gateways: AVital Link to SNA Network Environments,''Data Communications Oanuary 1984):159-166.8. DECnet <strong>Digital</strong> Network ArchitecturePhase IV Network Management FunctionalSpecification (Maynard: <strong>Digital</strong>Equipment Corporation, Order No. AA­X437A-TK, December 1983).52<strong>Digital</strong> TecbnlcalJournalNo. 3 <strong>September</strong> 1986


New Products9. J. Forecast, J. Jackson and J. Schriesheim,''Communications Executive ImplementsComputer Networks," Computer Design(November 1980): 71-75.Other ReferencesR. Bradley, "Interconnection Draws DEC, IBMNetworks Closer," Data Communications (May1985): 241-248.D. Korf, "New Ways of Communicating withIBM: A User View," Proceedings of INTERFA CE'85 (March 1985) 241-248.J. Martin, Design and Strategy fo r DistributedProcessing , (Englewood Cliffs: Prentice Hall,Inc., 1981).Systems Network Architecture Format and Protoea/References Manual: Architectural Logic(Armonk: IBM Corporation, Order No. SC30-3112-2, November 1980).<strong>Digital</strong> <strong>Technical</strong> journalNo. 3 <strong>September</strong> 198653


WiUiam R. HaweMark F. KempfAlanJ. Kirby IThe Extended Local AreaNetwork Architectureand LANBridge 100A study was conducted to identify the wide variety of application needsand environments fo r broadband local area networks. This study concludedthat no single local area network in isolation was capable of completelysolving the broad range of networking problems of interest. Theproject team investigated alternative ways to provide solutions to theseproblems, including various local network technologies and interconnectionschemes. From this investigation the team developed the ExtendedLAN Architecture, capable of incorporating a variety of LAN technologies.Using this architecture, the team designed a high-performance implementationof an Ethernet-to-Ethernet bridge, which led directly to the !AN­Bridge 100, a product satisfying the original goals.In early 1982, <strong>Digital</strong>'s Networks and Communi·cations Group in conjunction with CorporateResearch initiated an advanced developmenteffort called the Broadband Project. The project'soriginal goal was to recommend which broadbandproducts should be implemented duringthe next two years and which technologiesshould be contained in those products. Therewere several motivations behind this goal.First, there was significant uncertainty withincomputer companies, including <strong>Digital</strong>,about the most appropriate physical medium forlocal area network (IAN) products. At that time,<strong>Digital</strong> had - and still has - a strong commitmentto the Ethernet concept using basebandcoaxial cable. It was clear that while most applicationswere served very well by an Ethernetusing baseband coaxial cable, some applicationswere better served by other media, such as CATV,fiber-optic, or twisted-pair cables. The increasingnumber of installations using private broadbandtechnology, with its moderate bandwidth,led the team to focus on this technology.Second, the DECOM broadband Ethernetmedia access unit and related products wereunder development within <strong>Digital</strong> at that time.Therefore, an effective mechanism for interconnectingbroadband and baseband products wasneeded. There was a clear need to have LANproducts that could intemperate, at least at thenetwork level .Third, the project team agreed that some IANapplications would require significantly morephysical extent than could be offered with eitherthe baseband or broadband Ethernet products.Therefore, some means of offering a greaterextent was required. As will be shown later, theresults of the Broadband Project were very differentfrom the ones that had been anticipatedwhen it was initiated.Defining the ProblemThe project team first proceeded to investigatethe user environments in which these networkswould be utilized. There were three types ofenvironments ofconcern to the project: the businessoffice, the university campus, and the factory.Clearly, assumptions about these environmentswere not mutually exclusive, but thenames evoke the problems to be solved in eachone. The next step was to gather more input oncustomers' requirements, applications, andphysical environments.Some information had already been collectedby team members on previous visits to customersites, including a heavy-manufacturing facility54<strong>Digital</strong> Tecbnk4ljournalNo. 3 <strong>September</strong> 1986


lNew Productsand a university campus. This informationbel ped the team to construct a refined set ofquestions to be asked on visits to other customers.Subsequently, the team visited two moreuniversities and several commerCial sites wherecontinuous process monitoring and control, andresearch were performed. The team also examinedone of <strong>Digital</strong>'s sites that represented anextensive office environment.The team discovered several generalized typesof message traffic that were characteristic of theapplications studied. These types were terminalto-computer,computer-to-computer, and realtimetraffic.1 Unfortunately, most customerswere unable to deliver actual network loads andtraffic matrices for their environments. Therefore,the team had to derive models for thosegeneralized types of traffic, using previous measurementsof internal workloads and some educatedassumptions. These models were subsequentlyused to evaluate several architecturesoffered by the team as candidates to meet theproject's goals.The environmental model for each traffic typeshows particular characteristics. The terminalto-computermodel has a large number of terminals,all needing access to a small or moderatenumber of host computers. Although the aggregatethroughput is small, the traffic is bursty. Inaddition , the cost to connect each terminaldevice to the network must be small (i.e., notlarge compared with the cost of an inexpensiveterminal).Computer-to-computer traffic needs full logicalconnectivity and has higher throughput (upto several megabits per second per computer)Table 1EnvironmentOfficeDefinitions of EnvironmentsExtentLess than3 kilometers<strong>Number</strong> ofStationsLess than130than the previous model. The traffic for thismodel is also bursty. Furthermore, the projectIteam thought that, as workstations and personalcomputers became commbn, this class of trafficwould soon become muth more widespreadthan terminal-to-compute traffic.The real-time environment is characterized bya large number of devict!s (thousands) whose. Irequirements to communicate are quite hierarchicallystructured. The applied load for a realtimeenvironment is more accurately modeled bydeterministic arrivals. Mdreover, most applicationsin this environment xpect the variance ofthe access latency to be lrlw in the IAN .The team next defined hominal environmentsfor an office, a campus, ana a factory. These definitionsare summarized inl Table 1.In these definitions, harsh and benign environmentsrefer to the envirorlmental characteristicsin which the IAN needs to I operate. For example,in a harsh environment one might expect a widerange of operating temperluures or the presenceof strong electromagnetic lfields.Added to the definitions were . a number ofIfacts that customers stressed or that were ofgeneral use to the projec. These facts were asfollows:I . .. .• Many customers had a variety of standard andnonstandard higher-led protocols runningon their LANs. Clearl, any solution had totake those existing protocols into account.• Despite using nonstaJdard protocols, cus-1tomers generally implemented their LANsIwith subsystems compiant with a standard,such as one of the IEEE 802 'tandards.PhysicalEnvironmentsBenignIIIFrequency ofStation MovementI OccasionalCampusLess than25 kilometersLess than10,000· Benign withina buildingPossibly frequentHarsh betweenbuildingsFactoryLess than8 kilometersLess than2200HarshRare<strong>Digital</strong> TecbnlcalJournalNo. 3 <strong>September</strong> 198655


The Extended Local Area Network Architecture and LANBridge 100• In addition to the valid technological andenvironmental reasons fo r choosing a particularLAN technology, some customers hadbased their choices upon faulty assumptions.This was particularly noted in discussions onthe delay variance of token-based systems invarious normal recovery modes.• The importance of performance-monitoring ·and serviceability features were emphasizedalmost universally by customers.At this point it was clear that the original projectgoal of investigating only broadband technologywas too narrow. Using broadband technologyalone could not satisfy the broad requirements ofthe environments identified by the team . Therefore,the team expanded its scope to encompassthe larger problem of providing a wide variety ofservices (terminal-to-computer, computer-tocomputer,and real-time) in the three environments(office, campus, and factory) .It was also clear that there were two fundamentalapproaches to providing those services.First, the team could attempt to develop a LANarchitecture, or enhance an existing one, that· could cope with the wide range of nodes, distances, media, performance, and cost constraints. Second, the team could attempt todevelop a mechanism for interconnecting thevarious LAN technologies.LAN Tecbnology Alternatives· The team decided first to evaluate a variety ofsuitable media access methods. Each alternativeand the conclusions reached by the team aresummarized below.Carrier Sense Multiple Access with CollisionDetection (CSMAjCDj 2This was the alternative most familiar to the teammembers, since <strong>Digital</strong> was currently buildingproducts utilizing CSMAfCD for both basebandand broadband media. The performance ofCSMA/CD does not degrade rapidly as a functioof the number of connected nodes (see Figure1). However, its extent (maximum signal propagationpath length), transmission rate, and minimumpacket size are not independent because offi nite propagation delays .3·4 Therefore , toincrease the physical extent of CSMA/CD LAN,the minimum packet size or the transmission rateor both must be decreased to ensure that thereare no undetected collisions.w 10 2...J-


New ProductsToken Passing Bus5The characteristics of the token-bus accessmethod scale reasonably well with physicalextent (see Figure 1), but poorly with the numberof nodes.3 This situation is complementary tothat of CSMAjCD. The sensitivity of a token busto the number of nodes makes it unsuitable for asingle LAN with many nodes. The token bus, likeCSMAjCD, is well suited to implementation on aCA1V-like cabie plant.Token Ring6The performance characteristics of the IEEE802.5 token ring are somewhat similar to thoseof the token bus. However, an IEEE 802.5 tokenring station will not reissue a token until the previouslytransmitted frame has circulated completelyaround the ring. This characteristicmakes the ring more sensitive than a token bus toincreasing physical extent. Moreover, a tokenring cannot be applied directly to a branchingtreephysical topology, such as the one in aCA1V-like cable plant.Slotted RingThe design tradeoffs made in most slotted ringsresult in small slots, usually less than 20 bytes.Therefore, it is important to minimize the slotoverhead, such as source and destinationaddresses and error detection fields. Such operationsare usually associated with connectionorientedservices, such as voice transmission. Inslotted ring networks, mechanisms are oftenpresent to impose a measure of "fairness" in thenetwork. Those mechanisms make it difficult foran individual station to acquire a significant fractionof the instantaneous transmission rate. Suchnetworks are often inadequate for handling thebursty traffic expected in the environments ofinterest.Time Division Multiplexed (TDM) BusThe principal disadvantages of using a TDMstructure are that the number of time slots isfixed, and each time slot is assigned to only onestation. Thus, with a large number of stations,even with low network utilization, the meanwaiting time is large. Furthermore, since the busis allocated in tum to each station, the maximumthroughput of any station is limited to the datatransmitted in that station's slot. The TDM bus iswell suited to isochronous traffic, such as voiceor video.Frequency Division Multiplexed (FDM) BusThe characteristics of an FDM bus are somewhatsimilar to those of the TDM bus. The FDM bus hasan additional degree of freedom in that it couldhave slots of different bandwidths. The problemwith the FDM bus, however, is logical connectivity.To have full connectivity, each node mustmonitor each frequency band for messages destinedfor that node. In practice, this monitoringis prohibitively expensive.1As an alternative, onecould apply a reservation' system to either theTDM or FDM buses. The dharacteristics of suchan approach, however, arejmuch better suited toa connection-oriented ser\rice, such as voice orIvideo, rather than one with bursts of data..II.Hy brid of FDM and CSM1fCDA hybrid scheme utilizing multiple slow-speed(approximately 1 million bits per second, orMbps) CSMAjCD channels, each in its own6-MHz band, is another specific alternative thatwas considered. Without increasing the minimumpacket size used in an Ethernet, eachCSMA/CD channel can span an extent of approximately30 kilometers. Multiple CSMAjCD channelscould be used to increase the aggregatecapacity of the network. Unfortunately, logicalconnectivity cannot be achieved without somemechanism for switching packets between thesechannels. Furthermore, th bandwidth availableto any station is limited to a rate of 1 Mbps. Sincethere is no industry standard for a 1-Mbps, 6-MHzCSMAjCD LAN, selecting;this approach wouldmake necessary an attemp t to standardize it.These evaluations con}rinced the team thatnone of these access methods sufficed for buildinga single LAN capable b f successfully operatingin all dimensions of interest to the project.Not one of these alternatives was capable ofdirectly employing all the types of media that thecustomers wished to utilize. Furthermore, anychoice was constrained , by the desire for anaccess method with a defined standard havingthe appropriate parameters. The project teamwould have to find a way to interconnect at leasta subset of the standard LANs if the project wereto be successful.LAN Interconnection AlternativesThe team next investigated a variety of interconnectionmethods, each of which had certainadvantages and drawbacks.<strong>Digital</strong> TecbnlcalJournalNo. 3 <strong>September</strong> 198657


The Extended Local Area Network Architecture and LANBridge 100DECnet RouterThe architecture for DECnet Phase IV+ could beused to create a DECnet network for the interconnection.Such a network would be fully capableof handling the number of nodes needed inany of the three environments of interest. In factthis was quite an attractive alternative . Onecould choose the data links in such a network tooptimize the cost and performance. For example,Ethemets placed in local areas could offergood response at low and moderate networkloads. Then a token bus, with its capability tohandle high utilizations and large extents, couldbe used as a backbone to interconnect therouters. Those would in tum connect to the Ethemets.The sensitivity of the token bus to largenumbers of nodes would be minimized since theonly nodes on the token bus would be therouters. Unfortunately, not all customer nodesuse the DECnet routing protocol, making thisalternative useful for only a subset of the nodesin a network.Central SwitchTo complete the logical connectivity of a networkcomposed of multiple LANs, the team consideredan architecture organized around a centralswitch element. Conceptually, the switchcould be connected to all LANs in the networkand then selectively forward packets to the LANwith the destination end station. This alternativehas most of the advantages of the DECnet routersolution discussed above. Normally, however, allend stations need the same routing protocol. Toavoid this problem the switch must either supporta variety of routing protocols (and translateamong them) or somehow perform its switchingtask in a way transparent to the end stations. Asingle switch of sufficient capacity and reliabilityto do either task was likely to be fairly complexto design and manufacture_ It would alsoneed to scale in a cost-effective manner for awide range of networks.Bridge 7 •8•9A bridge, or MAC layer relay, is a device connectingtwo or more LANs so that a node on one LANmay communicate with a node on another' just asif they were on the same IAN. In operation, abridge is a store-and-forward switch that isolatestraffic to only those IANs on which the trafficmust appear. For example, in Figure 2, trafficbetween nodes X and Y would not appear on theLAN to which node Q is connected. Bridges makeuse of data link layer addresses to make forwardingdecisions. A bridge receives all frames from aparticular LAN and then decides, based upon thedestination address in the MAC header, whetherto forward each frame.A collection of LANs interconnected by bridgesis referred to as an extended IAN. In general,bridges may be used to connect LANs of differenttypes, as shown in Figure 2. Therefore, this alternativecan successfully utill.ze diverse LAN technologies,if appropriate, to optimize some function(e.g., low cost, high performance, or acombination of these) . Furthermore, a bridgeappears to be merely another station on each LANto which the bridge is connected. Therefore,multiple IANs, each fully configured, can beconnected to eliminate their practical constraintson distance, number of nodes, media,and utilization.Based on the reasoning above, the teamselected the bridge alternative as the one bestsuited to realize the expanded goals of the project.Thus the team began to develop an architecturalspecification for extended LANs andbridges. The team also began to develop a workingbreadboard model that eventually led to theIANBridge 100 product. The architecture thatwas evolved for extended IANs and bridges isdescribed in the next section. lOCALAREA NETWORKSFigure 2Bridged Network Configuration58<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 1986


New ProductsAdvantages of BridgesBridges used to connect IANs have several usefulproperties:• Traffic Filtering - Bridges isolate each IANfrom traffic that does not have to traverse thatIAN. For example, in Figure 2, traffic betweennodes A and B is not sent on the IANs to whichP and Q are connected. Because of this filtering,the load on a given IAN can be reduced,thus decreasing the delays experienced by allusers on the extended IAN.• Increased Physical Extent - IANs are limitedin physical extent (at least in a practicalsense) by either propagation delay or signalattenuation and distortion. Being a storeand-forwarddevice, a bridge forwards framesafter having gained access to the appropriateIAN via the normal access method. In thisway the extended LAN can cover a largerextent than an individual IAN. The penalty forthis coverage is a small store-and-forwarddelay.• Increased Maximum <strong>Number</strong> of Stations -Because of either physical layer limitations orstability and delay considerations, most IANarchitectures have a practical limit on thenumber of stations on a single IAN. Since thebridge contends for access to the IAN as a singlestation, one bridge may "represent" manynodes on another IAN or an extended IAN.• Use of Different Physical Layers - Some IANarchitectures support a variety of physicalmedia (baseband and broadband coaxialcables, and optical fiber cable) that cannot bedirectly connected at the physical layer.Bridges allow these media to coexist in thesame extended IAN.• Interconnection of Dissimilar IANs - IANs ofdifferent architectures are typically interconnectedwith routers or gateways. Often thesedevices are complex with only moderatethroughput, not an appropriate situation for aLAN environment. It is possible to build abridge connecting dissimilar LANs (withinconstraints discussed in the section ."PerformanceConsiderations"). For example, such abridge would allow stations on an IEEE 802.3(CSMAjCD) IAN to send frames to stations onIEEE 802.4 (token bus) or IEEE 802.5 (tokenring) LANs. 2• 5•6The Extended LAN ArchitectureGeneral GoalsAn ideal extended IAN should possess a nuinberof characteristics that translate into design goalsfor the architecture. These design goals are asfollows:• Minimize Traffic - The primary traffic on theindividual LANs should be generated by theuser stations. Traffic due to complex routingalgorithms should be eliminated or at leastminimized.• No Duplicates - The· bridges should notcause duplicate frames to be delivered to thedestinations during noral operation.• Sequentiality - The col:nbination of LANs andIbridges should not permute the frame orderingas transmitted by thb source station. ·I• High Performance --+ The extended LANshould preserve the cl!taracteristics of highthroughput and low dely that users expect inLAN environments. In !practice, this meansthat the bridges shoulp be able to processframes at the maximum rate they can bereceived. Since LANs operate in the multimegabit-per-secondrange, fulfilling this goalrequires a fast switching operation.• Frame Lifetime Limit - Frames should not beallowed to exist in the extended IAN for anunbounded time. Some higher-layer protocolsmay operate poorly if frames are undulydelayed. This fact is especially true for protocolsused for interactive applications. Theseprotocols depend on the low delay characteristicsof a LAN. An example is the Local AreaTransport (LAT) protocol.10• Low Error Rate - LANS typically have a loweffective bit error rate. igher-layer protocolsare often designed to take advantage of thisIlow rate, which allows hem to operate mol"eefficiently since they cn assume that errorsIare infrequent. Extendetl IANs should not significantlyincrease this rror rate.I• Low Congestion Loss -; Individual LANs minimizecongestion by employing access controlschemes that prevent excessive traffic fromentering the LAN. Extended LANs are morevulnerable to congestion loss since thebridges may be forced to drop frames whenDtgital Tecbnical]ournalNo. 3 <strong>September</strong> 198659


The Extended Local Area Network Architecture and LAN Bridge 100the ones queued to be transmitted match theavailable buffers. This phenomenon should beminimized by designing the bridges properlyso that they are not bottlenecks. It will also beminimized by configuring . (placement and sizing)the extended IAN properly.• Generalized Topology - To increase theavailability of the extended IAN, it would beuseful to allow arbitrary interconnections ofIANs by means of bridges. This interconnectionallows duplicate bridges and IANs to beconfigured in parallel, thus increasingavailability.Specific GoalsAlthough the ideal goals described above servedas a framework for development, other specificgoals were formulated for the architectureitse1f. I ,7,B,9• No modifications should be needed to stationsthat adhere to the existing IEEE 802 standards.2·5·6 Therefore, the extended IAN will betransparent to the end stations. This goal simplifiesthe end-station hardware and softwaredesigns.• The interconnection of all IEEE 802 MAC protocolsmust be accommodated.• Automatic recovery from state changes in theextended LAN , including LAN , bridge, andend-station fa ilures, must be accomplished.• Connectionless and connection-orientedIEEE 802.2 logical link control (LLC) protocolsshould be supported efficiently. Theextended IAN should also be independent ofall higher-layer protocols. Such independenceis needed to suppon the diverse set of protocolsthat wi ll exist. ll, I2, I3• A bridge should not require explicit notificationof station location by the end stations or amanagement entity. The bridge should learnautomatically of the stations' locations withoutcommunicating with other bridges. (Thebridges do communicate with each other tomaintain the logical tree topology. This communicationis independent of the activities ofend stations.)• Management intervention should not berequired to make the network operational; thebridges should autoconfigure. For example, it·should be possible to simply plug the IANsinto a bridge , then apply ac power to thebridge for the network to operate. No commandsfrom a network manager should berequired to achieve normal operation.• Growing from a single-segment LAN to anextended IAN should be accomplished withoutprior planning. The architecture mustprovide simple, efficient mechanisms (networkmanagement, etc.) to manage growtheasily. This goal means that the owner of a IANdoes not have to anticipate that he will, atsome time in the future, install bridges andmore IANs to build an extended IAN. Thus hisIAN can become an extended IAN without hishaving to plan the ultimate configuration ofthe extended LAN when the first LAN isinstalled. This fact is important since experiencehas shown that networks never growaccording to the plans made at the outset.• Predictable, stable performance, includingpredictable route selection for a given topology,should be provided under normal andfailure modes. In addition, to make diagnosis,maintenance, and management of the networkeasier, the routing algorithm should be deterministic.For example , it should compute agiven topology based on the current state ofthe network . If that state changes , thealgorithm will compute a new topology. If thenew state now changes back to the originalstate, the algorithm should produce the originaltopology, not a completely new one. Withoutthis feature it is very difficult to reproducefailure scenarios when diagnosing the networkor to plan the network predictably.• No overhead should be required in the endstations to communicate with stations on thesame or different IANs.• No overhead should be imposed on stations asa penalty for communicating with many panners(such as file servers or gateways to otherextended IANs) .• End-station and bridge MAC addresses can beassigned with any policy (global, local, flat,hierarchical, etc.) desired by the users. Thearchitecture should require only that each endstation and bridge have a unique addresswithin the extended LAN . If addresses areassigned globally, then the extended LAN60<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 1986


New Productsshould have the added advantage of requiringno management intervention when previouslydisjoint extended IANs are merged.• A general mesh topology including backupbridges and IANs should be supponed transparentto the end stations for increased reliabilityand availability.• End-to-end data integrity should be providedacross the extended IAN for all normal andmulticast/broadcast frames when the MACsare the same type. This capability holds acrossany connected subset of the topology that isthe same MAC type. Thus the frames are notmodified and the MAC frame check sequence(FCS) has end-to-end significance for theextended IAN.Architectural OverviewWe used the goals above to develop a uniquearchitecture for the extended IAN concept. Thisarchitecture is described in this section. Followingthat description are sections on bridge performanceand resources. These are two important,but often neglected, topics that should beconsidered when specifying an architecture. Performanceis especially critical to the properfunctioning of products that are eventually builtto use the architecture. For the first time, performanceanalyses were included as an integral panof the architectural specification.The algorithm used in the bridge is very simple.The algorithm maintains an associationbetween the end-station MAC address and theMAC entity on the bridge through which that stationhas been observed. The associations arestored in a table, also called a forwarding database,in the bridge . The bridge maintains thattable by observing traffic on the IANs to whichthe bridge is attached, operating in "promiscuous"mode. In this mode the bridge monitors allframes that appear on the IAN. For each framereceived , the bridge notes the source MACaddress and the MAC entity on which the framewas seen. The bridge also searches in the tablefor the destination MAC address. If that address isfound, the frame will be forwarded on the MACentity indicated in the table. Of course, if theindicated MAC entity is the same one thatreceived it, the frame will be dropped since thedestination is known to be on that "side" of thebridge. If no association is found, the frame willbe forwarded on each MAC entity except the onethat received the frame.Frames with group addresses (i.e., multicastaddresses) are always forwarded on each MACentity since the bridge has; no way of knowingwhich end stations should receive the framesthat are addressed to grdups. This concept,called protocol regionalization, can effectivelylimit the propagation of these messages throughthe extended IAN, thus allowing cenain applicationprotocols to be confined to variousregions.14 This confinement is done for reasonsof performance, management convenience, andprivacy.· IThe table is simply a cace of station addressto-MACentity associations! for stations that arecommunicating. As with any caching scheme,the problem of stale data exists. Therefore, thetable entries are aged out on a time scale that islong enough to minimize. overhead, yet shortenough to capture station rp.ovements.The algorithm learns the location of end stationsdynamically and assumes that few of themIsimply receive traffic without ever sendingIreplies. If the station location is not known, thenframes directed to it are forwarded on all MACentities. Our experience shows that in a typicaloperation only one frame from a station isrequired for most, if not all, bridges in theextended IAN to learn the stion's location. Typicalhigher-layer protocols more than satisfy thisrequirement. )The initialization phase qf a bridge is specifiedin a panicular fashion. The ' bridge is powered onand then passively observes· the traffic on its MACentities for a number of seconds. During thistime the bridge accumulates associations in itsforwarding database, after which it comes on lineand begins forwarding oprations. This initialpassive learning period prdvents the bridge fromunduly flooding the extened IAN with framesdestined to stations it hasn't yet heard from. Aswith all parameters in the algorithm, the durationof this learning period during power-up isnot critical. It should simply be long enough towitness frames from a large percentage of theactive stations.As specified so far, the algorithm will not modifyframes as they are passed through a bridgebetween IANs of the samejtype. This restrictionprovides the additional benefit of end-to-end FCScoverage for normal and broadcast frames within<strong>Digital</strong> TecbntcaiJournalNo. 3 <strong>September</strong> 198661


The Extended Local Area Network Architecture and LANBridge 100the extended LAN. When forwarding betweendissimilar MACs, the LLC protocol data units(PDU) are extracted from MAC data frames andforwarded on the next MAC. An FCS for that MACis computed normally by the bridge. Bridges ofthis type should guard against errors in memoryor on datapaths. Also note that the IEEE 802.5token ring may require byte reordering, which,however, can be dealt with at the controllerinterface.Because the bridge does not modify frames,there is an inherent mechanism for loop detectionencoded into them. Conventional networklayer routing algorithms detect loopswith hop counts or other frame lifetime (age)controls, losing transparency in the process.ll· 12• 1 3 The Extended LAN Architecturerestricts the logical topology to a tree that preventsloops from occurring, while preservingtransparency.It is desirable to maintain proper operationof the extended LAN if it is misconfigured.Therefore, we designed an algorithm to automaticallyand transparently transform a generalmesh topology into a spanning tree, thuspreventing packet looping.15 This algorithmalso allows redundant bridges and links to beused as backups, thus increasing the availabilityof the network. Availability is very importantsince the extended LAN is quite of the basis formuch, if not all, of the communications. Thisalgorithm was implemented in the LANBridge100.The spanning tree algorithm imparts the followingcharacteristics to an extended LAN:• A spanning, acyclic subset of a general meshtopology is maintained.• A very small, bounded amount of memory. per bridge is required, independent of thetotal number of LANs or the total number ofbridges.• A very small, bounded amount of communicationsbandwidth is required on each LAN ,independent of the total number of LANs orthe total number of bridges.• Lost messages are tolerated and the broadcastnature of multiaccess LANs is utilized efficiently.• Participation by the end stations is notrequired.• The computed topology converges in a maximumof twice the round-trip delay across theextended LAN.• The computed topology is deterministic,meaning that it can be calculated deterministicallyby the network designer.• Bridges implementing this algorithm cancoexist with simpler bridges not implementingit. Loops will still be broken, providedthat no loop exists composed solely of bridgesthat do not implement the algorithm.• Duplicate packets are not generated whenredundant bridges are used for backup.• An effectively unlimited number of bridges issupported. The only practical limit is the performancecharacteristics one wishes to havefor the extended LAN. We size the extent ofthe network based on the delay and throughputcharacteristics it achieves, not on arbitraryrestrictions. An example based on modelsdeveloped of the IAT protocol is given later inthis paper.• No a priori knowledge of the topology isrequired.• Optional user-defined "primary" routes or"backup" bridges are permitted; otherwise,the routing is automatic.Performance ConsiderationsThe performance of an extended LAN is determinedby a number of design parameters, includingthe expected capacity of the backbone andsubnets, the overall system capacity, the appliedload, apd the frame loss rates. The designer mustbe concerned not only with providing adequateperformance for current usage but also withallowing future growth .Ideally, a system is designed to be sufficientlyrobust to accommodate changes in its user populationas well as its characteristics. For example,studies have been conducted to measure userworkloads in a program development environment.1 6 The study results could be used directlyto estimate the applied load due to some numberof those users. To do that, however, requires thatthe designer also model all the protocol layersinvolved in transferring this information acrossthe extended LAN.It is important to size the capacity of anextended LAN. Given the characteristics of user62<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 1986


New Productsdemands, capacity is expressed in terms of thenumber of users supported. The difficulty withusing this number as the independent variable isthat the designer must account for the resourceconsumption from all layers of the protocol. Thatis generally hard to do .Another difficulty is that the performancerequirements may vary for different higher-levelprotocols. Some may be delay sensitive , Forexample, terminal access protocols that returnechoes end to end are quite sensitive to delay.Other terminal access protocols that allow localediting and echoing are not so delay sensitive.File transfer protocols are not sensitive to thedelay but require high throughput. Therefore, ·to determine the capacity ofan extended LAN,the .designer must investigate both delay andthroughput as applied to the requirements ofparticular protocols and applications that use theLAN.Certain LANs constrain the configuration,owing to either physical layer limitations (suchas the distance over which the line drivers canoperate) or the interaction between the accessmethod of the data link layer and the propagationdelay. For example, Ethernet places a limit onthe maximum number of repeaters betweenany two communicating stations. l7·18 This constraintassures that the propagation delay budget,which is assumed by the access method protocol,will not be exceeded in any configuration.In an extended LAN, the designer may also wishto constrain the configuration based on theperformance expectations of higher-layer protocols. For example, he may require that therebe no more than a certain number of bridgesbetween two stations that use a delay-sensitiveprotocol. In general, the constraints are morecomplex when an extended LAN is configuredwith dissimilar LANs since the individualLANs may provide diffe rent delay /throughputcharacteristics.Another problem when determining capacityis estimating the amounts of traffi c remaininglocal to a subhet and leaving that subnt. Theworst case occurs when all traffic must be forwardedfrom a subnet through one or more levelsof backbone, thus creating the largest demand onthe resources of the backbone. One way thedesigner can handle this situation is to assumethat all the locally generated traffic must also becarried by the backbone. Increasing the load willthen define the system saturation point at which·the resources of the subnets will likely be underutilized.The additional capacity of the subnetcan then be used only for local traffi c. This calculationdefines the limits for the system withrespect to the ratio of local ;traffic to total trafficthat is possible.Using the above principles we developeda capacity-sizing methodplogy for extendedLANs . The diameter of ah extended LAN isIsized in the following fa shion. The averageone-way delay across the l ongest path cannotexceed 10 milliseconds, chosen as the delaybudget based on analyses of higher-layerprotocols that ate delay sensitive. One such protocolis the LAT protocol used for terminalaccess. 1 0A detailed simulation of that protocol was usedto study different configurations and values ofan average delay budget. The 1 0-milliseconddelay budget allowed for ariance in the delayand kept the protocol operation in the normalstates (without timeouts, ec.) . In addition, theoperating point must be sei: so that none of thelinks in the extended LANI run at greater than90 percent utilization . (Note that this utilizationmay occur at an offered lo :i d of much less than90 percent.) On an Ethernet this limit occurs foroffered loads of anywhere from 4 5 percent to90 percent utilization. The difference betweenthe utilization and the offered load is the overheadon the link. On an Ethernet this differenceincludes delays caused by collisions; on tokenrings it includes delays for token passing and thelike.This methodology assures, that the componentlinks in an extended LAN: are all running instable operating regions, and that the delay isIsimilar to that on a single LAN. The fulfillmentof these conditions is impohant so that the per-'formance expectations of hgher-layer protocolsare still met. Depending on the type of LANs usedin an extended LAN , the number of bridgesallowed (in series) will b ' e different. TokenaccessLANs often have higher average delaysthan Ethernet LANs. These delays could consumesome of the delay budget, which averages 1 0milliseconds. In the case of all Ethernet links, thenumber of bridges allowed in series is somewherebetween seven and nine. Further discussionof the performance aspects of extendedLANs may be found in the paper "Performance ·Analysis and Modeling of <strong>Digital</strong> 's Networking11Architecture. "19·<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 198663


The Extended Local Area Network Architecture and LANBridge 100Bridge ResourcesThe major resources of concern in a bridge arethe buffering required to store and forwardframes , the table space for the forwardingdatabase, and the CPU cycles to executethe algorithm. Note that CPU cycles are alsorequired to perform network management. Typically,any bridge implementation must guaranteethat network management commands are eventuallyexecuted . For example, suppose a bridgewas heavily loaded because of a slow outboundIAN. A network manager wanting to disconnectthat bridge may be unable to do so if all receivedframes are being dropped because of buffer congestion.Therefore , one important aspect ofimplementing a network management architectureis that some amount of buffering must bepreallocated to handle those messages . Moreover,scheduling must be accomplished so thatthe network management process in the bridge isguaranteed to make progress . This guarantee is amatter of correctness and therefore should bestated in any effort to make the architecture astandard.Buffers are also required to hold frames whilethey are waiting to be either processed or fo r­warded. As depicted in Figure 3, bridge can bemodeled as a queuing system in which the servicecenters represent the forwarding processand the outbound LANs. Congestion can occur atthree places:1. Upon reception, owing to the lack of receivebuffers2. After reception, owing to queuing for theforwarding process3. After the forwarding process, because of congestionon the outbound LANProper bridge design can solve the first twosources of congestion. The third problem, however,is a general one for bridges, routers, andany store-and-forward device. 20 There are severalways that the bridge designer can address thisproblem. We first make a general observationabout the required service rate of the servicecenters in a queuing network. Steady-state congestionat the forwarding process can be avoidedcompletely if the network can always make forwardingdecisions faster than the summation ofthe interarrival times of the smallest framesacross all the inbound LANs. The forwardingdatabase must be consulted for each frame onwhich a forwarding decision is made. There aremany ways to do that very efficiently.The table discussed earlier is really only acache of station address-to-MAC entity associations;a search of that table is required to locatean entry. If the table is ordered, then a binarysearch can locate the entry in question. There areother alternative search methods, such as segmentedhashing. The implementation of this pro-BRIDGE,-------------------------------------------IIIIIIBUFFERST. DISCARDL--------------------------------------------j..L...L-L....I,IIIIIIFigure 3The Two Port Bridge Resource Model64<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 1986


New Productscess is one of the key aspects of the bridge technology.This facet is covered later in the paper inthe discussion of the technology used in the IAN­Bridge 100 product.A final point with respect to caching is inorder. Further enhancements in performance canbe obtained by recognizing something about thenature of the traffic on IANs. Extensive measurementson token rings, Ethernets, etc. have uncoveredseveral important facts. These are related tothe nature of higher-layer protocol and applicationoperation. One is that, given that a framefrom station S and station D has just beenobserved on the LAN , the probability that thenext frame observed is either from D to S or alsofrom S to D is very high.21 Thus, if the bridgekeeps the last few associations it has obtainedfrom the database, it is very likely that the nextframe will use one of those associations. Keepingthem further reduces table access rates. Itamounts to a two-level cache .With these observations we now focus on congestionat the receive or transmit buffers. Congestionat the receive buffers can- be avoidedthrough proper machine organization. For example,a bridge using separate controllers for eachIAN, each controller having its own local buffering,will have to assure that sufficient bufferingis available to maintain stability in the queue(particularly during transient bursts of frames) .Frames will have to be moved (out of the controllerbuffers or shared mmory) into anotherbuffer to queue for the forwrding process. WithIrespect to bridge delay, ths time must also beincluded in the forwarding process. With respectto bridge throughput, the bottleneck server willdetermine the peak.Therefore, the only place any congestion willoccur in these bridges is at the outbound IAN.This congestion will occur if that LAN is notfast enough for the volurrle of traffic it mustcarry. This problem is an isse of IAN speed, notbridge speed. The philosophy is to designbridges so that they will not be bottlenecks.Most of these comments apply to any routingalgorithm and hold true whether a table or aframe must be searched. AQ.d they hold true for·all the MACs.Effect of Bridges on Ethernet LinksIBridges have several effects' on the performanceof CSMAfCD IANs. One effect is due to the filteringfunction that prevents traffic from entering asubnet that it need not traverse . Recall thatbridges operate above the data link MAC layer asshown in Figure 4. Preventing this traffic flowAPPLICATIONLAYERAPPLICATIONLAYERNETWORKLAYERIINETWORKLAYER------- ·--------------------------------------------------------------------j _______ -------BRIDGEBRIDGE FUNCTIONSDATA LINKLAYERPHYSICALLAYERPHYSICAL MEDIUMDATA LINKLAYERDATA LINKLAYERPHYSICAL PHYSICALILAYERLAYERPHYSICAL MEDIUMpiDATA LINKLAYERPHYSICALLAYERFigure 4Bridges and Data Links<strong>Digital</strong> <strong>Technical</strong>journal 65No. 3 <strong>September</strong> 1986


The Extended Local Area Network Architecture and LAN Bridge 100reduces the applied load on the LAN , thusimproving performance for the local users.Another effect is more subtle. Consider aCSMAfCD system with an extent of D meters andN stations distributed uniformly over that extent.Without using bridges, all N stations have toshare the resources of that one LAN extendedover D meters. The delay and capacity are determinedby the applied load, as described above . Ifadded in the center of the system, the Ethernetwill be partitioned into two Ethernets, each withN/2 stations . Thus the collision windows oneach partition have been cut in half. The smallercollision windows cause less bandwidth to bewasted per collision. The net effect on the systemis not only to reduce the load applied to agiven Ethernet (through filtering) , but also toimprove the overall efficiency or capacity of thatEthernet since the extent it must cover is smaller.In effect, the Ethernet gets more efficient as theload applied to it is reduced. Given these factors,along with performance information characterizingthe behavior of the Ethernets under load, thebridge designer can investigate the performanceof bridged networks using these IANs.4 ,22The remaining section of this paper discussesthe IANBridge 100, which is an implementationof the Extended IAN Architecture .Development of the LANB ridge 100The bridge architecture was developed in parallelwith the first implementation of that architecture. An Ethernet-to-Ethernet bridge, wasdesigned as a prototype to demonstrate the usefulnessand practicality of the bridge concept.This was called the "Brooklyn Bridge ." The prototypehardware and software were operated in alaboratory environment. After some operatingfor a time, the prototype was installed in Tewksbury,Massachusetts, between an Ethernet and<strong>Digital</strong>'s Engineering Network (ENet) . This prototypeled to a full-scale product developmentproject, called Janus, that resulted in the IAN­Bridge 100.The prototype incorporated some , but not all,of the higher-level reliability and availability featuresof the final LANBridge 100. Many of thefinal features were incorporated by the productdevelopment groups as the final product designevo lved. Another feature added in developmentwas a fiber-optic Ethernet extension that allowsnetworks to be extended farther than is possiblewith the Ethernet cable alone .In the following sections, the design goals andprinciples of the IANBridge 100 are discussed,along with the trade-offs that had to be made inthe design.Design GoalsThe design of the IANBridge 100 was guided byone primary principle: The bridge characteristicsshould not cause the performance of theextended network to degrade . Network congestioncould cause such a degradation, but not thebridge. Therefore, the bridge had to have sufficientprocessing power to receive any possiblestream of incoming traffic and make the correctforwarding decisions. If some or all traffic is tobe forwarded, the bridge must queue it for forwarding.If the outbound Ethernet is congested,however, it may be impossible to fo rward thepackets and some or all may eventually have tobe discarded. This is a problem of network utilization,however, and not bridge design.Although a bridge must discard packets duringperiods of prolonged congestion, it should notdiscard traffic during periods of transient congestion.Therefore, the bridge must provide sufficientbuffering to prevent packet loss during thetraffic peaks occurring in any properly operatingIAN. When the individual Ethernets are operatingclose to saturation, any IAN 's performancewill be generally unsatisfactory regardless ofthe presence or performance of bridges. When aLAN is operating in the range in wh ich goodperformance can be expected, transient congestionmay still occur. In this case, packet lossin bridges can be avo ided with a boundedamount of buffer memory, an amount that can bepredicted.Since bridges can be installed in series, theissue of store-and-forward latency becomesimportant. In varying degrees, higher-level protocolsare intolerant of delay; therefore, storeand-forwarddelay must be kept to a minimum.Ideally, this delay should be equal to the packetreception time; however, some additional time isneeded to make the forwarding decision. Even ifthe decision process were partially overlappedwith the packet reception , this decision couldnot be made until after the frame checksequence (FCS) had been received at the end ofthe packet. The nature of the applications inwhich bridges were expected to be used led usto choose 100 microseconds as the maximumlatency for minimum-sized packets . Of course,66<strong>Digital</strong> TecbnicalJournalNo . 3 <strong>September</strong> 1986


New Productslonger packets will have a greater store-and-forwarddelay, proportional to their length.A bridge must be able to store informationabout the location of the stations attached to itsLAN. If insufficient room exists to store all thestations in the station database, the bridge mustforward messages for any station that did not fit.This situation leads to inefficiency in the operationof the LAN, since traffic may be forwardedunnecessarily. Based on trends in network applications, we judged it unlikely that a singleextended LAN would grow to over 8000 simultaneouslyactive stations within the lifetime of thisproduct. Therefore, the storage limit was set at8000 station addresses.In addition to providing a low packet loss rateand efficient filtering, a bridge should not contributesignificantly to the data error rate of theextended LAN. Even more importantly, an undetectedbit error corrupting a packet while it is ina bridge should be detected at the destinationstation. This detection can be ensured to a highprobability by forwarding the received cyclicredundancy check (CRC) instead of generating anew CRC in the bridge.It is essential that a bridge be reliable andavailable, since it is as important in an extendedLAN as in the Ethernet cable itself. Therefore, theLANBridge 100 had to power up and operate correctlywithout the intervention of any person orother machine. Since a bridge is important to theproper operation of an extended LAN, a mechanismto ensure high network availability was alsorequired. Parallel standby bridges provided thatmechanism. The bridges also had to be able todetect and correct for many unworkable configurations,such as looping topologies, that mightresult from installation errors.Although a bridge must operate without intervention,a network manager should be able toobserve parameters and counters associated withthe bridge's operation. He should also be able toalter some of those parameters. A centrallylocated bridge is an ideal place from which toobserve activity in order to isolate faults andgather information. That information can then beused to make decisions about changes andenhancements to the particular network configuration.Of course, all these goals should be met with acost as low as possible. Although a bridge providesmany valuable features, it neverthelesscompetes with single Ethernet cables, withirepeaters, and with routers. Jo be successful, theLANBridge 100 had to be perceived as having acost/performance advantae relative to thoseother options.IDesign PrinciplesDesigning the LANBridge 1 00 hardware startedIwith the premise that a! general-purposemicroprocessor is often th most cost-effectivemethod for implementing sveral complex functionsin a single system. Microprocessors are flexibleand economical because the same set ofhardware logic can be used to perform many differentfunctions under program control . For anygiven technology, however, 'they are rarely as fastas dedicated logic. Therefore, the bridge wasdesigned to implement as many functions as possiblein microcode; special hardware logic wasused only for time-critical functions.With a sufficiently fast microprocessor, applyingthe principles above to the LANBridge 100requirements resulted in the high-level blockdiagram shown in Figure 5. This diagram is use,ful for understanding the general principles ofbridge operation and represents the first pass atthe LANBridge 100 design.Of course, this design: was based on thehypothesis that some available microprocessorwas fast enough to do all the required work inthe allotted time. In reality, orne hardware assistwas required. Moreover, the memory sizes andthe implementations of th Ethernet interfacesIhad not been considered. [fhese complicatingfactors are now examined ip more detail, alongwith the trade-offs that were required.1IProcessor and Suppor( LogicIA preliminary performance analysis of thebridge design in Figure 5 showed that all bridgefunctions, except associating network addresseswith forwarding informtion, could be handledby several available microprocessors. Assumingthat this exception could be. performed by externallogic, it was possible to consider other priceand performance requirements and select themost suitable processor.In this design, the microprocessor is directlyinvolved in making forwarding decisions, butwith hardware assistance. It also coordinates thepacket-forwarding process, although actual datamovement is the responsibility of the Ethernetinterface logic. Thus the microprocessor hassome stringent real-time requirements. Further-I<strong>Digital</strong> <strong>Technical</strong> journalNo. 3 <strong>September</strong> 198667


The Extended Local Area Network Architecture and LANBridge 100MICRO-PROCESSORlPROGRAMMEMORYIETHERNETADDRESSMEMORYPACKETETHERNETETHERNETME M ORY INTERFACE INTERFACEETHERNETLANETHERNETLANFigure 5High-level Block Diagrammore , it must have additional time available toperform other functi(;)ns, such as updating timersand running the spanning-tree algorithm to correctpossible faulty network configurations. Atpower-up or system reset, the microprocessormust verify via diagnostic code that the entiresystem is operating correctly.The design of the real-time code paths wasfairly straightforward. It was written in a detailedoutline form for a generic processor. That. allowed us to understand the requirements andselect a processor with enough power. Fromthis design , it was quite clear that a highperformancemicroprocessor was required. The10-MHz MC68000 chip from Motorola, Inc., waschosen based on its available power and attractiveprice.In this design, the microprocessor has a privatememory. Thus instruction and local-data accesswill not conflict with packet data flowing to andfrom the two Ethernet interfaces. Some of thismemory is ROM, · which contains all the codeneeded by the bridge to be fully functional onpower-up. The bridge also contains RAM, used asa writeable data area. There is a small amount ofnonvolatile RAM (NVRAM) , which stores systemspecificparameters that must survive power failuresand be available on the next system start-up.Ethernet InterfaceThe Ethernet interface is a complex function thatis implemented most economically in VLSI. Theinterface can be implemented at the board level ,but only at considerably greater expense. Since aVLSI implementation was clearly the most attractiveoption, we explored a number of alternativesources for it.Data integrity is one of the more importantconsiderations in designing a bridge . In particular,the bridge should not cause undetectabledata errors in a packet delivered to a destinationstation. This injunction implies that either thepacket memory in the bridge must have a verylow probability of error or the original CRC generatedby the source station must be forwarded ·with the packet. If the original CRC travelsthrough a bridge with the packet, then anypacket memory errors will be detected as transmissionerrors at the destination station.The only available chip set that allowed packetsto be transmitted without a recalculated andappended CRC was one made by Advanced MicroDevices Corporation. This chip set was calledLANCE (Local Area Network Controller for Ethernet). Although other considerations were important,this very important capability was thedeciding factor in our selection process.Network Address Look- upThe network address look-up mechanism is oneof the most interesting aspects of the LAN­Bridge 100 design. Upon receiving a packet, thebridge must locate the information associatedwith its destination address so that a forwardingdecision can be made. In addition, the sourceaddress must be added to the database unless it68<strong>Digital</strong> TecbnicaljournalNo . 3 <strong>September</strong> 1986


New Productshas already been added. Therefore , two 48-bitnetwork addresses must be located for eachpacket received. It cannot be assumed that the48-bit source and destination addresses found invarious packets have any known relationship toeach other. On the other hand, the addresses arelikely to occur in groups because each of the variousequipment manufacturers has been assigneda block of addresses. The look-up function mustoccur quickly,-since only a small ponion of thetime available for processing each packet can bedevoted to this. one function.There are several possible techniques for lookingup network addresses. One straightforwardapproach is based on a software search performedby the microprocessor. With this technique,the microprocessor fetches a networkaddress from a packet and then searches the database.Even with efficient search algorithms andthe fastest available microprocessor, however,this technique is much too slow, especiallywhen.the database is filled to capacity.A second.•approach, also based on software, isto use the source or destination address as anindex into a: table. This technique has the advantageof.being fast; yet, it is quite impractical ,since the table length would be almost 280 billion(248) entries.A third solution is hashing, which might be fastenough in software and could also be easilyimplemented in hardware. In this technique,48-bit addresses are transformed by an arithmeticfunction to a hash address with a smaller maximumvalue of the address . For example, ifthe maximum hash value were .216, a directtable look-up could be ;performed, using thehash address as an index. The disadvantage ofhashing is that the distribution of-networkaddresses is not known a pFiod; therefore, manynetwork addresses could translate to the samehash address. This duplication could resultin either unnecessary forwarding or incorrectfilterig.'Thus all three software solutions were unusable.It became clear that some type of hardwareassist was required. The most attractive hardwareassist from the -standpoint of speed and ease ofuse was content addressable memory (CAM) .Unfortunately, the available CAMs were bestsuited for use in cache memory applications,since they are small and faster than needed (thus,more expensive) . These CAMs also do not scale.wt Hrin width; for example, 8-bit wide CAMs can-not be easily used in parallel to form a 16-bit ora 48-bit wide CAM. .:IThe only feasible alternative remaining was toemploy a hardware-assisted search using economical,commercially avaHable memories fordata storage. Binary search w:as cb.sen as thesearch algorithm. This search technique is fastsince it requires at most log( n) probes, where nis the number of entries i n the table. Unfortunately,the table must be kept soned in numericorder. That is not a severe disadvantage, however,since the table can be soned in place withoutinterrupting search operations.In the IANBridge 100, tl;ie search function isperformed entirely in hardWare at the request ofthe .microprocessor. The microprocessor loadsthe search hardware, or binary search engine,with network addresses fetched from a packet.The engine then runs in parallel while the processor,doesother work. Aftr 3.9 microseconds,the microprocessor logic [ returns to read the·results of the ·search.Although-searching is a hardware function, themicroprocessor uses software to order the tableof network addresses. Reordering must be doneonly when new stations are, added or when inactivestations are removed. these events happenrelatively infrequently, and analysis and experiencehave shown that software is fast enough notto hinder operations. If there are several changesin a shon time · (for example, during the initiallearning period) , they are cached and added, atlower priority, to the search table.IPacket Memory SizeUpon determining that a packet received on oneEthernet should be forwarded to another Ethernet,the IANBridge 1 00 must queue the packetfor transmission. Since Ethernet is a shared channel,the bridge must prov i de buffering to storeall packets that might be queued while the Ethernetis busy with traffic from other users. Theamount of buffer memory must be large enoughto avoid excessive packet loss resulting frombuffer exhaustion, yet smll enough to be costeffective.1Over the long term, if the average traffic generatedby a bridge and other users on a single Ethernetexceeds the total capacity, only/an infiniteamount of memory will prevent packet loss. Inthis case latency will incease without bound.This situation is rather uninteresting from the\Standpoint of bridge design. The system user willDJglltll•TecbnicoljournalNo. 3 ·september 198669


The Extended Local Area Network Architecture and LANBridge 100have exceeded not only the capabilities of anyrealizable bridge, but of the underlying networktechnology as well. However, there is the moreinteresting question of the magnitude and durationof traffic transients in real networks, and theamount of memory required to avoid packet lossduring those transients.The size of the bridge memory can be boundedif one notes that most high-level protocols builton a datagram service like Ethernet expect tiinelypacket delivery. If the delivery of a packet isdelayed excessively, these protocols treat thepacket like a lost datagram and retransmit it. Inthese circumstances forwarding the originalcopy has the undesirable effect of generatingmultiple copies, thus increasing network congestion.One way to avoid this problem is toemploy the concept of maximum packet lifetime.This concept is enforced by the bridge andc:in be used to place an upper limit on bridgememory requirements. Unfortunately, even arather short packet lifetime requires largeamounts of memory. For example, a lifetime oftwo seconds requires five megabytes of memory(lOMB/second X 2 seconds X 2 directions -:- 8) .Although declining costs may make memories ofthis size practical in the future, packet lifetimecould not be used to size memory for thisproduct.Another way to size bridge memory is to simulatethe behavior of the system under variousworkloads. The LANBridge 1 00 can be modeledrather easily if one notes that the bridge takes lesstime in deciding to discard or forward a packetthan the packet takes to transmit. In this case theforwarding operation will not be a bottleneck,and the receive buffers will never hold morethan one packet. The transmit buffers, however,will be emptied at a variable rate depending onthe load on the Ethernet.We considered four different workloads: thefirst based on an existing timesharing environment,the second on a file transfer application,the third on a process control system, and thefourth on a hybrid of office and process control .The results showed that 16KB of memory (ineach direction) was inadequate. Increasing thememory size beyond 64KB gained very little interms of performance . Therefore, since 64KBmemories were readily available, a 16-bit memorybus provided the required 128KB in a convenientand cost-effective manner.Packet Memory PerformanceThe packet memory system in the bridge wasdesigned to handle worst-case conditions withoutallowing overruns, underruns, or processormemory · cycle starvation. The worst case occurswhen both Ethernet interfaces are continuouslyreceiving but not forwarding Ethernet packets ofthe minimum size (64 bytes) . The packet sourceand destination addresses are examined onlywhen a packet is received. Therefore, if the packetswere forwarded, fewer memory cycles wouldbe required. If the packets were longer, thenumber of memory cycles needed to deal withbuffer descriptors would be reduced, sincemore data would be contained in each buffer.Under worst-case conditions, the memory musttransfer approximately 108 16-bit words perminimum packet time (63.3 microseconds) .When the required memory refresh cyclesare also included, one memory cycle takes580 nanoseconds.The packet memory system must also providelow-latency microprocessor access so that valuableCPU cycles are not lost because of packetmemory wait states. Since the LANCE chips areburst-transfer direct memory access (DMA)devices, any shared bus system would have aninherently unacceptable latency. Therefore, aftersome analysis, a four-port memory system waschosen: a port for each LANCE, a port for theCPU, and a port for the refresh operation. Thememory is fully buffered so that the RAMs themselvescycle every 300 nanoseconds, eventhough the LANCE has a 600-nanosecond minimumcycle time, and the CPU has a 400-nanosecondminimum cycle.The Final LANB ridge 100 DesignA high-level block diagram of the final !.AN­Bridge 100 design is shown in Figure 6. Its logicis quite similar to the original prototype designin Figure 5. The primary change was the additionof hardware to assist in locating forwarding information.In addition, the single bus has beendivided into several buses to increase the totalbandwidth of the system.SummaryThe LANBridge 1 00 is the first product to implement<strong>Digital</strong>'s Extended LAN Architecture. Withthis product, customers may easily expandbeyond the confines of a single Ethernet to an70<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 1986


New ProductsIMC6BOOOMICROPROCESSORII IBINARYROMMEMORYSEARCH LANCE LANCECONTROLCONTROLRAMNVRAMETHERNETADDRESSRAMIPACKETRAMFigure 6extended network with as many as 8000 activestations. Up to eight Ethernets may be interconnectedin series (22,400 meters of cable) . Aggregatebandwidth will increase proportionally withevery added Ethernet, and usable bandwidth willalso increase substantially in many applicationenvironments.The Extended IAN Architecture itself is nowa key part of <strong>Digital</strong>'s networking strategy.As exemplified in the LANBridge 100, thisarchitecture permits substantial expansion inthe physical limits of a given IAN technology.These physical dimensions include geographicextent, number of stations, and aggregate bandwidth.Bridges implementing the ExtendedLAN Architecture also provide increasedavailability as a result of the spanning treealgorithm and the standby operation mode. Thetransparent operation of high-performancebridges enhances significantly the capabilitiesand services offered by both <strong>Digital</strong> and non­<strong>Digital</strong> equipment.The Extended IAN Architecture also provides aunifying mechanism in networks composed ofmultiple homogeneous or heterogeneous IANs.The bridge concept may be extended to interconnectIANs with different physical layers, suchas baseband and broadband Ethernet. Within certainconstraints, it may also be used to interconnectdissimilar IANs.ETHERNETLAN'riETH E RNETLANHigh-level Block Diagram with MC68000 jIiThe origins of the LANBridge 100 and theExtended IAN Architecture may be traced to aproject whose original goals only vaguely recognizedthe need for such a mechanism. Both theproduct and architecture are the result of gatheringand analyzing customer requirements, followedby applying innovative design techniques.AcknowledgmentsThe authors would like to acknowledge the othermembers of the original p;roject team. In addition,the work of Bob Shelley and Tony Robillardwas instrumental in the deign and constructionof the bridge breadboard. Tony Lauck, RadiaPerlman, George Varghese, ! Mike Soha, as well asthe the entire LANBridge 1 00 development teamIprovided invaluable additions and insight to thearchitectural process.References1. B. Stewart and W. Hawe, "Local Area NetworkApplications," Telecommunications(<strong>September</strong>, 1984): 96f-96u.2. IEEE Project 802 Local Area NetworkStandards , "IEEE Standard 802.3 CSMA/CD Access Method and Physical LayerSpecifications," Approved IEEE Standard802.3-1985, ISO/DIS 8802/3 (July'1983).'IDtgttal TecbntcalJournalNo. 3 <strong>September</strong> 198671


The Extended Local Area Network Architecture and LANBridge 1003. W. Bux, "Local-Area Subnetworks: A Per- Open System Interconnection (DecemformanceComparison," IEEE Transac- ber 1983): 1388-1393.tions on Communications, COM-2914. W. Hawe and G. Varghese, "Extended(10) (October 1981): 1465-1473.Local Area Network Management Princi-4. W. Hawe and M. Marathe, "Predicting Eth- pies," <strong>Digital</strong> Equipment Corporationernet Capacity-A Case Study, " Proceed- technical paper ,submitted to IEEE 802ings of the Eighteenth Computer Perfo r-Standards Committee, IEEE Documentmance Evaluation Users Group Con- <strong>Number</strong> IEEE-802.85.1.98 (October 31,ference (October 1982): 375-388. 1984).5. IEEE Project 802 Local Area Network 15. R. Perlman, "An Algorithm for DistributedStandards , "IEEE Standard 802.4 TokenComputations of a Spanning Tree in anPassing Bus Access Method and PhysicalExtended LAN ," <strong>Digital</strong> Equipment Cor-Layer Specifications," Approved IEEEporation technical paper submitted toStandard 802.4-1985, ISO/DIS 8802/4IEEE 802 Standards Committee, IEEE Doc-(December 1985).ument <strong>Number</strong> IEEE-802.85.1.97 (October31, 1984).6. IEEE Project 802 Local Area NetworkStandards , "IEEE Standard 802.5 Token16. R. Jain and R. Turner, "Workload CharacterizationUsing Image Accounting," Pro-Ring Access Method and Physical Layerceedings of the Eigteenth Computer Per-Specifications,'' Approved IEEE Standardfo rmance Evaluation Users Group Con-802.5-1985, ISO/DIS 8802/5 (Decemberference (October 1982): 111-120.1985).17. The Ethernet: A Local Area Network,7. B. Stewart, W. Hawe, and A. Kirby, "LocalData Link Layer and Physical LayerArea Network Connection,'' Telecommu-Specification, Version 2. 0 (<strong>Digital</strong> EquipmentCorporation, Intel Corporation, andnications (April 1984): 54-66.8. W. Hawe, A. Kirby, and B. Stewart, "Trans- Xerox Corporation, Order No. AA-K759BparentInterconnection of Local Networks TK, November 1982).with Bridges,'' journal of Telecommuni-18. R.M. Metcalf and D.R. Boggs, "Ethernet:cations Networks , vol. 3, no. 2 (Summer,Distributed Packet Switching for Local1984): 116-130.:. ,,Computer Networks," Communications9. W. Hawe, A. Kirby, and A. Utuck, "An of the ACM, vol. 19 Ouly 1976): 395-Architecture for Transparently Intercon- 403.necting IEEE 802 Local Area Networks,"19. R. Jain and · Hawe, ''Performance Analy-<strong>Digital</strong> Equipment Corporation technicalsis and Modeling of <strong>Digital</strong>'s Networkingpaper submitted to IEEE 802 StandardsArchitecture," <strong>Digital</strong> <strong>Technical</strong>journalCommittee, IEEE Document <strong>Number</strong>(<strong>September</strong> 1986, this issue) : 25-34.IEEE-802.85.1 .96 (October 31, 1984).20. M. Gerla and L. Kleinrock, "Flow Control:10. B. Mann, C. Strutt, and M. Kempf, "Termi-A Comparative Survey," IEEE Transacworks,"<strong>Digital</strong> <strong>Technical</strong> journal (Sep-nal Servers on Ethernet Local Area Nettionson Communications, vol. COMtember1986, this issue) : 73-87.28, no. 4 (1980): 553-574.21. R. Jain and S. Routhier, "Packet Trains:11. DNA Phase IV Routing Layer Specifica-Measurements and a New Model for Comtion,Version 2. 0. 0, (Maynard: <strong>Digital</strong>puter Network Traffic," IEEE journal onEquipment Corporation, Order No. AA-Sp ecial Areas in CommunicationsX435A-TK, 1982).(Forthcoming, <strong>September</strong> 1986) .12. Internet Transport Protocols (Rochester:22. F. Tobagi and V. Hunt, "Performance:Xerox Corporation, Order No. XSIS.Analysis of Carrier Sense Multiple Access028112, 1981).with Collision Detection," Computer13. R. Call on, "Internetwork Protocol," Pro- Networks , vol. 4, no. 5 (OctoberjNovemceedingsof the IEEE, Special Issue on ber 1980): 245-259.72 <strong>Digital</strong> TecbntcalJournalNo. 3 <strong>September</strong> 1986


Te rminal Servers onEthernet Local Area NetworksBruce E. Mann[ Colin Strutt!Mark F. Kempf I<strong>Digital</strong>'s terminal servers provide flexible, cost-effective connectionsbetween terminals and host systems in a local area network (LAN). Theproduct developers tried several approaches before developing te LocalArea Transport (LAT) protocol as the basis for all terminal servers. TheLAT architecture supports connections to multiple hosts over: a highbandwidth Ethernet LAN. LAT establishes a single virtual drcuit 'betweena terminal server and each host, and individual sessions are multiplexedover a virtual drcuit. A unique directory service permits terminal serversto be configured automatically, learning about hosts as they becomeavailable. The latest implementations support mixed-vendor environmentsand <strong>Digital</strong>'s major operating systems.The Original ProblemIn 1981, <strong>Digital</strong> faced the task of designing amethod for connecting a few hundred "dumbterminals" and printers to a VAXcluster system.If, as in the past, the terminals were connected toa single computer, then many of the advantagesof clustering would be negated. Instead, it wasproposed that terminals be connected to a"front-end" terminal server shared by all membersof the cluster. This front end would thenallow more flexible connections. A user terminal,for example, could connect to any processorin the VAXcluster group, rather than directlyconnecting to just one. Our goal was to migrateour existing installed terminal base gracefullyfrom single-processor attachments to VAXclustersystems.The original effort to provide this server wascalled the CI-Mercury project by our developmentgroups. We aimed to attach this terminalserver directly to the high-speed cluster interconnect,called the CI, so that the server functionedas a switch. However, the cost of thisscheme proved to be excessive. (The cost for theinterface to the CI itself was about $20,000.)Moreover, a connection to the CI would haveresulted in a server that could connct only tonodes in a single cluster.We also studied other vendors' switch offeringsas front-end terminal switches. These productsfunction much as do the dataswitch prod-ucts available today; thar is, backplane multiplexerson the CPUs are switched to the terminals.The problems with! this approach wereexcessive cost, the lack ofl <strong>Digital</strong> technology inthis product area, and po9r availability.Because of these complexity and cost factors,the original CI-Mercury project was replacedwith one called Pluto. This project envisionedusing an Ethernet as the interconnect, thuslowering the attachment cost dramatically.This server was based on a PDP-11 central processor,and we chose a variant of the RSX- 11Soperating system for the initial kernel software.The lower-layer communiCations protocols usedbetween Pluto and the V Ax cluster nodes werethe DECnet protocols, suc b essfully used in otherproducts.j .We believed that Pluto could be cost effectivein large installations; how ver, its initial cost wastoo high to be competitive in smaller configurations.This cost factor was especially important asEthernet became an integral part of <strong>Digital</strong>'sstrategy. With Ethernet, it became practical andcost effective to distribute small terminal serversthroughout an office environment rather thanconcentrating all termina interfaces in a large,centrally located server. Therefore, in late 1981,work began on an eight-line terminal server, theIprimary goals being low 'cost and high performance.Internally, this pnhect was dubbed PlutoJunior, later called Poseidbn.<strong>Digital</strong> TecbnicaljournalNo. 3 <strong>September</strong> !98673


Terminal Servers on Ethernet Local Area NetworksLate in 1983, significant problems wereencountered in the design ofthe Pluto and Poseidonterminal servers. The CTERM protocol, anew design of a layered DECnet protocol offloadingcharacter-processing overhead from thehost to the terminal server, proved to be morecomplex than anticipated. Measurements of message-processingoverhead and estimates of theoverhead in the DECnet-VAX software showedthat CPU consumption in the host system wouldbe a problem for keystroke editors. Existing studiesshowed that terminals were used in keystrokemodes, rather than command-line modes, morethan fifty percent of the time. Moreover, thePluto server itself was experiencing severe performanceproblems. For example, CPU saturationoccurred when running less than six terminalsat 9600 baud, even when the terminalinterfaces used direct memory access (DMA) .Finally, a number of issues, not consideredduring the requirements phase, became moreapparent:• How could a VAXcluster system be viewed asa single system rather than as individuallyaddressable nodes?• How could the terminal load be balancedacross nodes in the V AXcluster system?• How could the management of the terminalservers be automated?Thus the use of the CTERM protocol for terminalservers in both Pluto and Poseidon was halted.(In fact, the Pluto project with an RSX kernelwas used successfully as the basis for a number ofdifferent servers in the Ethernet CommunicationsServer, or DECSA, family, including theDECnet Router, DECnet RouterjX.25 Gateway,and DECnetjSNA Gateway products. The samehardware base, though with a completely rewrittensoftware kernel, formed the basis for the finalEthernet Terminal Server.) 1However, the original task still remained;therefore, an alternative solution was proposed,based upon work done using a new architecturecalled local area transport (LAT) . The LAT solutioninvolved three essential components thatwere 'unique to that architecture:• A new transport and naming architecture toreplace the DNA routing, transport, and sessionlayers• A new operating system for the terminal server• A new "port" driver for the terminal driver ofthe VMS operating systemThe Development of LATIn late 1981, the prototype of the original LATserver was developed on a VT 1 0 3 terminalserver, which contained a small Q-bus backplanewith a PDP-1 1/23 system and an Ethernet controller.(An Ethernet controller made by 3COMCorporation was used since <strong>Digital</strong> had no Ethernetproducts available at that time.) This earlywork involved quantifying the maximum character-echodelay that a person could comfortablytolerate. We learned that an experienced touchtypist encounters difficulties when the echo timeexceeds 100 milliseconds. By extrapolating fromthis fact, we deemed that the network and CPUefficiency of the entire LAT subsystem should bedramatically improved. The approach was to"procrastinate" for up to 80 milliseconds aftercharacters were received from the terminals ateach server. This delay had the very desirableeffect of reducing the number of messages processedby the Ethernet, the host systems, and theterminal servers. (Eighty milliseconds is implementableas a multiple of either the 60-Hz linefrequencyclock common in the United States orthe 50-Hz line-frequency clock common inEurope and other countries.)In early 1982, we created a VMS driver(LTDRIVER) using a dedicated Ethernetcontroller to support the LAT server prototype.By April 1982, log-in to a VMS system froma server was achieved; about two weeks later,the performance relative to the then currentmultiplexer, the DZ-11, was measured. TheLAT connection was easily able to outperformthe DZ-11 (a programmed-interrupt controller)under a wide variety of loads. Under manyloads, the LAT connection was shown to outperformthe DMF-32 (one of a number of DMAcontrollers) .In early summer 1982, we convertedLTDRIVER to the shared Ethernet port driver.This conversion allowed a single Ethernet controllerto be used simultaneously for LAT software,and DECnet and other communicationssoftware. Unfortunately, this change yielded asignificant performance degradation. At thistime, however, the VMS Development Group wasdesigning a lower-level program interface to theEthernet driver that would allow system-levelVMS usage of the Ethernet. Currently, this interfaceis used to implement V AXcluster support viathe Ethernet.74<strong>Digital</strong> <strong>Technical</strong> JournalNo. 3 <strong>September</strong> 1986


New ProductsBy late 1982, we decided to include both IATand CTERM support in the Pluto terminal server,but only IAT support in Poseidon. In addition,the original code from the prototype Vf1 03 terminalserver was migrated to a UNIBUS PDP- 11'system; this code was called I.AT-11.By early 1983, a significant number of VMSdevelopers were using the prototype I.AT-11servers. This software was maintained by the IATdevelopers. It was important that the softwareworked reliably since the VMS developers wereusing it in developing the VAXcluster software.As noted earlier, the original developmentteam for the CTERM terminal server on Plutoexperienced a number of problems. Therefore,in early 1984, a new terminal server was implementedon Pluto, based on the I.AT- 11 code andnot on the RSX software. This new server, containingsoftware only from IAT, was referred tointernally as Plato.The prototype I.AT- 11 code was developedinto a product to run on version 3. 7 of the VMSsystem. This product became available in July1984, somewhat before VMS V AXcluster supportappeared in V AXfVMS version 4. 0. One monthlater, the Ethernet Terminal Server, the productname for the Pluto terminal server, became available.The risk of having the VAXcluster offeringadversely affected by an unproven terminalserver was limited by releasing it with the earlierversion of the VMS system. Thus we took advantageof extensive "free" testing from over 1000internal users.In March 1985, the DECserver 100, the productname for the Poseidon terminal server, wasreleased. The DECserver 100 implementationwas radically different from the other terminalservers.DECserver 100Although the Ethernet Terminal Server andI.AT- 1 1 products provided the benefits of serverbasedterminal interconnect, they did not fullyimplement <strong>Digital</strong>'s ter';Ilinal server strategy. Forserver technology to bcome pervasive, it mustcompete with other terminal connection methodson the basis of cost alone. In cluster andmulti-host systems, servers provide necessaryand desirable added functions. Therefore, theyshould be compared with other connectionmethods by assigning some value to the additionalfeatures and then using cost/performanceas the deciding factor. In small single-systemienvironments, the added fJatures of server technologyare not necessarily perceived as addingvalue; then cost becomes the sole factor for comparison.<strong>Digital</strong>'s servers are at a disadvantage inthis situation because they offer features that costmore. <strong>Digital</strong> must pursue dual path to developservers for some applications and to maintain andexpand backplane terminal interfaces for others.As noted earlier, we knew that the EthernetTerminal Server could compete effectively oncost alone for large numbers of terminals; forsmaller configurations, however, it could competeonly on the basis of grater functionality. Itsfixed cost is relatively high, although the incrementalcost for each terminal added is low. Thuswe started to design a low-cost terminal server.The first decision we made was an importantone: the product would i be a local terminalserver and nothing more. !Telephone data linesusually terminate inside computer rooms. Therefore,Pluto, which is suited to computer roomconfigurations, already filled the need for a terminalserver with modem control capabilities.Poseidon was specifically designed to be distributedalong an Ethernet throughout an officeenvironment, near the atached terminals. Ofcourse, multiple Poseidons could also be used inwiring closets and computer rooms.We also believed that Pluto already provided ahardware base for other communication serverapplications; therefore, P9seidon need not supportapplications other than terminal serving.I -Although often desirable (rom the standpoint ofthe company's total product set, generality isalso the archenemy of low cost. Hardware thatserves many functions also has capabilities thatare unused in some applications. Those unusedcapabilities represent a cot from whkh no benefitis derived when an iolated application is1viewed.On the other hand, hardware designed for aparticular application can optimize cost and performanceby eliminating any unnecessary capabilities.The Ethernet Terminal Server and DECserver100 illustrate both ;ends of this spectrum.The hardware base for the former functions in anumber of general roles related to communications,such as the DECnet Router or DECnetjSNAGateway products. Consequently, this producthas a high entry cost, but a low incremental costas each terminal is adde


Terminal Servers on Ethernet Local Area NetworksA second equally important decision was madeearly in the project: the product managersdefined and then enforced a very aggressive costgo:'ll in terms of dollars per connection. That goalwas set in two passes. In the first, the engineersdid a preliminary cost analysis, taking intoaccount competitive pressures and currentlyavailable technology. In the second, the productmanagers decided the original goal was too high,lowered it, and then challenged the engineers tomeet it. This challenge gave the engineers everyincentive to squeeze cost out of the design .Although some cost reductions seemed quiteinsignificant and not worth the effort, in the endthe old adage of "watch the pennies and i:he dollarswill watch themselves" proved to be true.The insistence on meeting the cost goal also preventedus from adding "bells and whistles," withtheir associated costs and complexity, to therequirements list as the project progressed.Starting system design, we immediately facedan inescapable trade-off in the design options. Inthe ideal case, the cost per terminal to connect asingle isolated terminal should be the same ascost per terminal to connect, say, 16 terminals.That is, the cost steps should be uniform as terminalsare added to the system. Unfortunately,some of the costs in a server system are essentiallyfixed. For example, the power and packagingcosts are approximately the same whether aserver accommodates one terminal or four.These fixed costs result in a relatively large initialcost step, followed by smaller steps as terminalsare added, followed by another large stepwhen an additional server is added. We realizedthat a compromise was needed between step sizeand the potential for amortizing fixed cost overseveral terminals. As the design progressed, wedecided that eight terminals per server providedan acceptable step size that allowed us to meetthe cost-per-line goal.Work started on the hardware design witha clear cost goal, but with no preconceived· requirements for the implementation. It seemedfairly obvious that an eight-line server could bebuilt on a single printed circuit board. Sincethere is a substantial expense simply in connectingmultiple boards, we decided very early thatdirectly incorporating any pieces of existingproducts was too expensive. The server wouldbe a single board designed from scratch,although we were free to borrow design ideasfrom other products. We also decided to use onlyhigh-volume, and therefore inexpensive, componentswhere possible - a decision driven partiallyby the desire to shorten the design time.Mter these decisions, work started in earnest.One of the most important issues was makingsure there was enough processing power. Sincewe had confined the problem to a specific application,we could size the processing requirementsquite accurately. Pluto had to deal withmany potential applications and an expandablenumber of terminals, Poseidon with exactly oneapplication and eight terminals. Pluto has onemain processor with assist processors added asterminals are added; Poseidon did not have toexpand and needed only one processor ifit hadsufficient power. At this time, several extremelypowerful 16-bit processors became generallyavailable. We evaluated them, including somefrom <strong>Digital</strong> as well as other vendors. SincePoseidon would not be programmed by customers,the extensive PDP- 11 and VAX instructionsets were not really needed. We decidedfinally to use the Motorola 68000 chip, whichwas the lowest cost, most readily availablemicroprocessor with sufficient power.As the design progressed, we considered everypossible cost reduction option. For example, thedynamic RAMs are refreshed by software sincesufficient processing time exists to do that; thecost of refresh hardware could thus be eliminated.Chips were selected to perform multiplefunctions whenever possible. For example, theterminal interface (UART) chips have integraltimers used to control the software refresh, thetimer interrupt, and the watchdog timer. Essentially,the interrupt logic uses very little externallogic to tum around the interrupt priority levelto generate the vector address.Thus the design resulted in an extremely lowcost,fixed-function terminal server, the DECserver 100, which has proven to be, by far, themost popular member of <strong>Digital</strong>'s terminalserver fa mily. Figure 1 depicts the initial LATproduct.The LAT ArchitectureThe LAT ProtocolOne initial goal of the LAT architecture. was toconnect terminals to host systems using the Ethernetas a data link. Even today, LAT is•still usedprimarily for connecting terminals to hosts.However, its application has spread to connect-76<strong>Digital</strong> TecbnicaljournalNo. 3 <strong>September</strong> 1986


New ProductsVAXcluster SYSTEMVAX/VMSHOSTDECserver 100ETHERNETTERMINALSERVERTERMINALSTERMINALSDIAL-INMODEMSFigure 1Initial LA T Producting other asynchronous devices, such as printersor links to hosts other than those directly connectedto an Ethernet.The goals of the LAT protocol are as follows:• To permit dumb terminals to be connected tomultiple hosts• To be a transparent character transport mechanism(implying that character echo must beperformed by the host and not by a server)• To support a high-bandwidth LAN technology(specifically the Ethernet)• To use a fixed maximum bandwidth that ismuch less than the total LAN bandwidth,which should be used in a fair and predictablemanner• To be an efficient data link protocol, relativeto the higher-layer DECnet protocols, such asCTERM operating in a LAN environment• To provide for low CPU loads and memory useon the host system at the expense of higherCPU and memory utilization on the terminalservers• To allow for simple terminal server implementations,which means low-cost and highperformancehardware implementations• To permit automatic configuration so that, forexample, servers can determine, without manualintervention, the names and addresses ofhosts on the EthernetThe LAT protocol maks certain simplifyingassumptions:• Communication is local to a single logical Ethernet(possibly connected by repeaters andbridges); thus no routing capability isrequired.• Communication is inherently asymmetric,which simplifies connection management andpermits straightforwa!rd host implementations..I. .• The bandwidth of the Ethernet ( 10 megabitsper second) is much greater than the bandwidthneeded for a given terminal (e.g., 9,600bits per second), so that a timer-based protocolis appropriate.The normal model of dumb terminal usage isone of low-speed data entry, say a: few charactersper second, and higher-speed display in bursts ofseveral hundred characters at a time, taking severalseconds to display. In addition, a user is usuallysitting at his terminal while a program operatesat the host. LAT takes advantage of thisasymmetrical relationship. Also, the terminalconnection normally take place at the explicitIrequest ofthe user rather than of the host system.LA T also takes advantag 1 e of this asymmetricaspect.The server does not cmmunicate charactersto a host system as they are entered by the user;rather, it collects characters and periodicallytransmits them to the host. The time interval ofthis period, the "circuit timer," is quite short -<strong>Digital</strong> TecbntcaljournalNo. 3 <strong>September</strong> 198677


Terminal Servers on Ethernet Local Area Networkstypically 80 milliseconds . With many usersconnected, a host is interrupted much less oftenby gathering together all the characters typedby those users and sending them as a singlemessage.The lAT . protocol is divided into two distinctlayers, the virtual circuit layer and the slot layer.Virtual Circuit LayerThe virtual circuit layer establishes and maintainsan error-free communications path (avirtual circuit) between two nodes, typically aterminal server and a host, that wish to communicate.The connection is initiated by one end ofthe communications path and operates under thecontrol of the initiator. However, the circuit canbe terminated by either end. Typically, the virtUalcircuit connection is initiated when the firstterminal user requests a connection to a host systemto which no virtual circuit yet exists. Theinitiator of the virtual circuit is referred to as the"master node," the other end as the "slavenode." Thus the terminal server is normally themaster and the host the slave.The establishment of a virtual circuit connectionrequires a single message exchange. Informationsuch as protocol versions, message sizes,and node names are included in these messages.Simplified View of VirtualCircuit OperationWe start with a simplified explanation of the virtualcircuit operation. Once established, the dataexchange occurs as follows:• Every 80 milliseconds, the master sends to theslave a message containing any data that mustbe sent.• On receiving this message , the slave processesany data in that message and sends back areply containing any data waiting to be sent inthat direction.have ignored errors that may occur in messagedelivery, and we have assumed message deliveryeven when there is no data to send. We willexamine the implications of these cases shortly.The protocol as defined is, in effect, a requestresponseone. Such a protocol has the characteristicthat only one data link buffer need be allocatedat each end of the virtual circuit. This factcan be important for hosts that need to supportlarge numbers of virtual circuits without dedicatinglarge quantities of buffer space to that task.The termination of a virtual circuit can occurfrom either end; under normal conditions, however,the master usually initiates the closing.The IAT protocol defines three messages atthe virtual circuit layer: the start, run, and stopmessages. Thus for a typical virtual circuit, wemight see the exchange of messages depicted inFigure 2 (again, making the stated simplifyingassumptions) .Knowing the built-in limits on maximum messagesize and the rate at which IAT messages areexchanged, we determined that the maximumamount of data that can be transferred across anyvirtual circuit is just under 150,000 bits per secondin each direction. (In fact, the IAT protocoldefines a method for increasing the availablebandwidth for a virtual circuit by using multipledata link messages. To date, there has been noTIME(MILLISECONDS)MASTER0 START510 RUN1590 RUN95170 RUNSLAVESTARTRUNRUN• On receiving this reply, the master processesany data that was in the message.175RUN• Eighty milliseconds after one message wassent, the next message is seiu from the master.The message round-trip time is typically lessthan 10 milliseconds. This operation is timerdriven on the master, the terminal server, andevent driven (by message receipt) on the slave,the host. The operation is simplified because wenRUNn+5RUNn+10 STOPFigure 2 Exchange of Messages78<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 1986


I!!!New Productsneed to implement this feature.) Once this maximumhas been reached, terminal users willexperience a degradation of service, sharedequally among them. As shown later, the minimummessage exchange rate may be much lessthan one exchange every circuit timer, due tooptimizations in the IAT protocol.Removing the SimplificationsSo far, our view of the virtual circuit protocol hasbeen constrained by two major simplifications.These two are concerned with errors that canoccur on the Ethernet, and with a mechanism forreducing traffic on the Ethernet when there )is no data to be sent. The following discussionexplains how the consequences of these simplificationsare taken into account.The IAT protocol, being based on Ethernet,presumes that the majority of packets will betransmitted and received without errors. Theseerrors can be due either to corruption of data(detected by CRC checking) or to bufferingproblems at the destination node. To account forany errors that do occur, a sequence numbermust be assigned to each virtual circuit message,and an acknowledgment of that message must bemade. No extra messages need be sent since thesequence number and acknowledgment fieldsare contained within the normal message formats.However, there is no negative acknowledgmentdefined by the IAT protocol for reasons ofsimplicity and the low error rates experiencedon Ethernet IANs.The Ethernet communications medium isinherently very reliable. Therefore, whenever amessage is unacknowledged within the SO-millisecondperiod before the next message is sent,the cause will normally be due to either a heavyCPU load on the host or a host crash. To avoidcompounding the problem of a transient overloadon the host CPU, IAT specifies that messagesare not retransmitted every 80 millis.econds.Rather, they are retransmitted onlywhen they have not been acknowledged withinapproximately one second. A given message willbe retransmitted a certain number of times; afterthat, the conclusion can be drawn that either thehost has crashed or the communications controllersor medium have failed. This number isknown as the "retransmit limit."If we can reduce the data sent over an idle virtualcircuit, the CPU load of the host will bereduced in tum. !AT employs a scheme wherebyeach end of the virtual circuit can agree to acquiescefor a time; a circuit in this mode is called"balanced." Once balanced, if no data needs tobe sent for a long time, the master will eventuallysend and the slave wiU then respond withsingle run messages. Thus! each end knows theother is still alive. This action is called a "keepalive"function, which taks place every 20 secondsby default.IIf data becomes availabl when the circuit isbalanced, then either end must be permitted to"unbalance" the circuit. If the master wishes tosend data, then this unbalancing operation is nodifferent from any normal run message that themaster may send. However, if the slave wishes tosend data, then it must send an "unsolicited" runmessage that is not explicitly solicited by themaster. As with any other run message, theunsolicited message is sequenced and must beacknowledged by the master before the slave isIpermitted to send another :run message.Thus by allowing virtual circuits to be balancedwhen there is no data to be sent, the IATprotocol uses much less Ethernet bandwidth andallows a corresponding reduction in the loadingof the CPU host.The virtual circuit layer provides reliable communicationbetween a pair of nodes. It also providesa datapath that is bidirectional, sequential,timely, and error free. All users desiring to communicateover that path are multiplexed over thesame virtual circuit, consquently lowering theCPU cost per user on the hbst. This multiplexingIfunction is the responsibility of the slot layer.!The Slot LayerThe slot layer establishes user sessions, transfersdata bidirectionally, and multiplexes and demultiplexessessions over virtual circuits. In thiscontext a session can be envisioned as a connectionfrom one user's terminal to one host system.In the simplified case, a terminal user firstidentifies the computer system with which hedesires to communicate. Aivirtual circuit is thenestablished - if one doe not already exist -from the terminal server t the chosen host system.A session is then established on top of thevirtual circuit. The servic ' e access point at thehost would normally be represented as a virtualterminal port into the host operating system.Thus the user would perceive the virtual terminalas being directly connected to the hostsystem. For example, on the VMS system , the<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 198679


Terminal Servers on Ethernet Local Area NetworksLOGIN OUT function can be run to allow the userto log in and continue with the normal interactiveuse of the system.At the slot layer, data is passed to the virtualcircuit layer as "slots," which are addressedunits of data. A number of different types of slotshave been defined. Each session has a unique slotnumber on the virtual circuit to aid in the multiplexingand demultiplexing of sessions over virtualcircuits. Slots are only sent over virtual circuitrun messages. Because slots all share theunderlying virtual circuit, no explicit errordetection and correction need be performed bythe slot layer.The establishment of a session is accomplishedusing one of the assigned slot types called a startslot. As with the start message (which causes thecreation of a virtual circuit) , the session establishmentoccurs with a single start slot exchange.First, the master sends a start slot requesting aconnection to the slave . If the slave is able toaccept the connection, it replies with a start slot;if not, due perhaps to lack of resources, the slavemay reject the connection with a reject slot containingan appropriate reason code. During sessionestablishment, various parameters are negotiated,one being the maximum quantity of datathat may be sent in a single data slot. This quantitycan be different in each direction, the largestbeing 255 bytes.As noted earlier, the virtual circuit layer providesan error-free, bidirectional datapathbetween two nodes. The slot layer takes advantageof this condition and passes data in eachdirection independently, mirroring the operationof a terminal as a full-duplex device. Owingto the mismatch of speed between terminal andhost, some flow-control mechanism is needed toprevent one end from overloading the other.(This mechanism is independent of the flow controlrequired between the terminal server andthe terminal itself. That control is normally handledby using the ANSI flow-control charactersXON and XOFF.)The LAT protocol defines a credit-based flowcontrolscheme at the slot layer. In this controlscheme, the receiver must give permission to atransmitter to send each data unit, containingone or a collection of bytes. Data may beexchanged in units of up to 255 bytes in aslot type called a data-A slot. The sending of adata-A slot (if it contains any data at all) usesup a single "credit." If one end of a sessiondesires to send some data, that end must have acredit outstanding. Typical implementationsnormally keep two credits outstanding at anytime. Thus each end of a session must be preparedto receive up to 510 bytes of data. A creditis not reissued until all the data contained inthe data slot that used the credit has been consumed.. That is, all the data must have been eitherdisplayed on the terminal or read by the hostapplication.The initial credit allocation is passed in a startslot. The slot header will contain a field forpassing credits to the other end of the session;that field is non-zero when credits are beingextended. In this way it is possible to send adata-A slot with no data but with the credit fieldnon-zero. Such a slot does not itself consume acredit since it is presumed to take no additionalbuffering at the slot layer at the other end to processthe slot.There are three additional slot types definedfor the slot layer. The first, the data-B slot, communicatesthe following information:• The physical port characteristics, such as baudrate (e .g., 9600 baud) , character size (e.g., 7or 8 bits) , and parity (e.g., none, odd, even)• The session characteristics, such as whetherthe ANSI flow-control characters (XOFF IXON) should be treated as data or flow-controlmessages• The in-band signaling of break conditions orsignaling errors (parity or framing errors)The data-B slot is subject to the same creditmechanism as the data-A slot and indeed sharesthe same credits.The next slot type, the attention slot, is notsubject to credits and is used for out-of-band signaling.This slot is currently used only for anabort-output operation; for example, discardingany output waiting to be sent to the terminalwhen a cancel-output ("O) character is typed.A session may be terminated by either end viathe final slot type, the stop-slot. Typically, thestop slot is sent by the host system after the userlogs out of the system.Directory ServiceOne goal of the LAT protocol is to permit theautomatic configuration of the LAN. The importantinformation that needs to be disseminatedthroughout the LAN is the name of each service80<strong>Digital</strong> TechnlcalJournalNo. 3 <strong>September</strong> 1986


New Productsthat may be used. Rather than requiring that eachterminal server possess this information a priori,LAT provides a mechanism that permits eachserver to "learn" about the configuration.To accomplish this learning process, an additionalmessage type is used, the "service advertisement."This message is multicast from eachslave node to all master nodes and gives thenames of all services that the slave node is currentlyoffering. (A multicast message is a singlemessage addressed to and received by multiplenodes.) An advertisement is transmitted periodically,typically every 60 seconds. Thus on startup,a server can "listen" for service advertisementsand build a directory of available services.This directory can then be presented to the user,on demand, enabling him to choose whicheverservices he wants from those available when aconnection to a host system is desired.Service names, the names used to gain accessto the appropriate service access points, are notlimited to the name of the node on which the serviceis offered. Indeed, there is no restrictionthat any node may offer just one single service.Instead, LAT allows a given node to offer multipleservices.One common use for multiple service names isin a VAXcluster environment. Here the clustermanager can choose to offer as a service a namerepresenting the logical name of the cluster, inaddition to (or instead of) each individual nodename. When a user requests a connection to theservice name representing the cluster, the terminalserver can select one of the available nodes.In this case all nodes offering the same servicewill be presumed to be offering identical capabilitiesto the user.To assist the terminal server in choosing anode, the service nodes provide a "rating" associatedwith each service offered. The rating is anumeric value from 0 to 255 that representssome measure of the resources available to applyto that service. For example, the current VMSLTDRIVER implementation takes into accountthe most recent CPU idle time, the CPU type, theamount of memory, and the number of remaininginteractive job slots. VMS LTDRIVER alsoallows the system manager to specify a rating.The terminal server can then choose, at anyinstant, the node that offers a requested servicewith the highest rating and use that node asthe one to which to form the connection. Thischoice ensures that the load can be shared amongthe nodes in a VAXcluster system. The users neednot be aware of the current configuration of thecluster in order to form a connection.By carefully managing the service advertisements,the server makes the service directoriesreflect the current service list and their associatedratings. If a server fails o hear from a serviceprovider for some period, t?e server can assumethat the service provider has failed, or crashed.IThe server can then remove the service from itsdirectory of available servies.Note that this multicast naming service is alsoasymmetric; the master nodes do not send multicastadvertisements to the slave nodes. A recentaddition to the LAT protocol allows a slave to utilizea different multicast message to determine ifa given node name exists on the LAN . This techniqueis used so that host sytems can find terminalservers (in order to solibt connections fromtheir ports, described later) I by knowing only thename, not the specific Ethernet address, of theserver.Some details of this naming service deservefurther discussion. For example, the LAT "loadbalancing"and "fail-over" features are mostoften associated with VAXcluster systems. However,although they enhance <strong>Digital</strong>'s VAXclusteroffering, these LAT features are independentof it.i"Equivalent services" my also be offered byImultiple nodes using the directory service. Considerservices that are net\vork based, such asvideotext and dial-out modems. With an EthernetLAN, many independent nodes might offer suchservices; typically, however, users can access theservice only through nodes on which they haveaccounts. If a user's system is down, he is deniedaccess to the service, even though the serviceremains available on other nodes. For example,consider a videotext-basbd service, such asLIVE_ WIRE (an in-house j electronic bulletinboard) , that can be offered by many independentLAT host systems. If a LAT user connects toLIVE_WIRE, the terminal server software willdetect that the service is offered from multiplesources. The software will then make a connectionto the source believed to be currently offeringthe best level of service . If that serviceshould fail (i.e., stops sending Ethernet LAT messages), the terminal server s 1bftware will automaticallyreconnect the user toi an alternate providerof the same service if one1 exists; this action isknown as fail-over.<strong>Digital</strong> TecbnicalJounaalNo. 3 <strong>September</strong> 198681


Terminal Servers on Ethernet Local Area NetworksFuture versions of <strong>Digital</strong>'s LAT products maymake more extensive use of the LAT service capability.That would make it possible to installapplications that are accessible to the extendedLAN but not to the wide area network. A form ofnondiscretionary access control is implicit inthis design.LAT group codes can be used to partition anEthernet logically when the number of nodesgets large. By large, we mean more than I 00 services.Having more than 20 services or so meansthat a server display with one line per servicewill no longer fit on a terminal display withoutscrolling.Product Implications of theLA T ArchitectureAl though not originally conceived as a distributedterminal switch, an Ethernet can be usedeffectively in that role if combined with the terminalserver products. This fact remains trueeven when the Ethernet and host system are runningother protocols simultaneously, such asDECnet and VAXcluster systems based on Ethernet.Our experience has shown that a single dedicatedEthernet segment, without bridges, caneasily support several thousand concurrentusers.Functioning as a distributed terminal switchin the <strong>Digital</strong> computing environment, LAToffers significant advantages over dataswitchesand backplane multiplexers. The most prominentof these advantages is that any terminalserver user can connect to any host system ."Blocking" connections to host systems (moreaccurately called "port contention") is not anissue because host-system ports are logical, notphysical. A VAX(VMS system is limited by theLAT architecture to about 6 million simultaneousconnections, or 32,000 terminal servers, eachwith up to 255 sessions. This large number representsa. significant cost advantage, especiallyconsidering that Ethernet controllers are standardoptions on many of <strong>Digital</strong>'s processors. Inthis case the host-processor terminal connectioncost then becomes negligible, making backplane-orientedterminal switches much lessattractive . This cost advantage improves as thesize of the system increases. Table I comparesthe requirements of LAT with those of a dataswitchfor different numbers of terminals andhosts.Some additional advantages afforded by usingLAT are as follows:• Multisession capability, not offered by dataswitches• Simplified installation and management(especially where users and computer systemsare often added or moved around)• Higher availability due to the lack of any singlepoint of system failure• Simplified, incremental expansion andmigration capabilities inherent in <strong>Digital</strong> 'sextended LAN architecture , utilizing bridgesLAT PerformanceLAT performance is measured in terms of CPUload per user, which decreases as the number ofusers performing terminal IjO increases. ThusLAT performance increases with increasing CPUloads. Under light loads, LAT uses a relativelylarge amount of CPU resources. This is understandableif the cost of processing an Ethernetpacket containing a single character is comparedwith the cost of servicing a single DZ- I I characterinterrupt. As more data is exchanged, however,the number of messages exchanged doesTable 1A Comparison of Host Connections for LAT and Dataswitch<strong>Number</strong> of Terminals,<strong>Number</strong> of HostsLAT RequirementsDataswitch Requirements8 terminals1 host64 terminals8 hosts512 terminals16 hosts8 server connections1 Ethernet adapter64 server connections8 Ethernet adapters5 ·1 2 server connections16 Ethernet adapters8 terminal connections8 host connections64 terminal connections512 host connections512 terminal connections4096 host connections82<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 1986


New Productssoles for management or system-level debugging.Moreover, reverse LAT can also be usedto provide shared access to a pool of dial-outmodems.fImplementations and Applications0


Terminal Servers on Ethernet Local Area NetworksFigure 4Communications ServerInternally, the DECserver 100 is radically differentfrom the other two members of theterminal server family, yet still retains the sameexternal characteristics. The DECserver 100 is alow-cost terminal server capable of connectingeight asynchronous ASCII terminals to an Ethernetusing the LAT protocol. This server is a verycompact unit and can be located in a computerroom, a communications closet, or in an officeenvironment. The server has no modem control.Modem control is implemented using an 8-MHzMotorola 68000 chip, with 128KB of RAM , and512 bytes of nonvolatile RAM (NVRAM). Like theEthernet Terminal Server software, the DECserver100 software is down-line loaded from aDECnet load host.Inc. Each PAM interface connects up to eight linecards, each of which is a dual RS232C interfacewith full modem-control capability. The serveralso has a console boot terminator (CBT) modulefor self-test code, bootstrap code, and remoteconsole support. The Ethernet Terminal Serveroffers a user interface similar to that on the DECserver100. Using the LAT protocol, the servercan connect up to 32 terminals (either locally orremotely via modems) to the Ethernet. The EthernetTerminal Server can be located in a computerroom environment or a communicationscloset. The software is always down-line loadedinto the unit from a DECnet load host across theEthernet.Extensions to the OriginalImplementationsThe initial implementations of the LAT protocolwere on the terminal servers described aboveand on VAX/VMS host systems. The serversimplemented only the master end of theLAT protocol, whereas the hosts implementedthe slave end . Follow-on implementationshave added similar support for additionalhost systems: the MicroVMS, RSX- 11M-PLUS,MicroRSX, ULTRIX-32, ULTRIX-32m, TOPS- 10,and TOPS-20 systems.Each system implementation offers access tothe command interpreter as the service accesspoint. Figure 5 illustrates this support.VAX/VMSSYSTEMRSX1 1 -M-PLUSSYSTEMUL TRIX-32SYSTEMTOPS-20SYSTEMMicroVMS MicroRSX TOPS-10SYSTEMSYSTEMSYSTEMIIIIIETHERNETDECserver 100TERMINAL// .\\. .TERMINALSSERVERIITERMINALS\\DIAL-INMODEMSFigure 5Additional LA T Host Support84 <strong>Digital</strong> TecbnicaljournalNo. 3 <strong>September</strong> 1986


New ProductsNON-LAT HOSTSVAX/VMSHOSTETHERNETTERMINALSERVERDECserver 100TERMINALSTERMINALS DIAL-IN PERSONALMODEMS COMPUTERSFigure 6Ethernet Configured as a Service NodeVersion 2_0 of the Ethernet Terminal Server,released in August 1985, added the reverse-IATimplementation, permitting a server to offeradditional services to which terminal userscan connect_ This implementation permitssessions to be created within the box as we llas across the network, thus fo rming a switchstyle of operation in a single server. The types ofservices that may be offered by the terminalserver can be grouped into the following threecategories.The first category is connections to non-IAThosts. In this mode , the server acts as the Ethernetconnection for systems (typically not madeby <strong>Digital</strong>) that cannot themselves offer IAT serviceson the Ethernet. Asynchronous ASCII portson these systems are connected to a terminalserver. Terminal users on the same or differentterminal servers can connect to the serviceoffered. They can then communicate with thenon-IAT host as though it were connected to theEthernet.The second category is service for dial-outmodems. Terminal users can connect to a port ina pool of dial-out modems. The users can thenuse the appropriate ASCII protocol to create adialed connection and then access the remotesystem via its own dial-in port.'The third category is service for personal computers(PC) . They can be cbnnected to terminalIservers and run in either of the terminal emulationmodes. Each PC thus acts as though it werea dumb terminal. A PC can ' also run in file transfermode when connected to another PC via .thesame , or another, terminal server. Figure 6 illustratesthe terminal server as a service node .Subsequent versions of the Ethernet TerminalServer, the DECserver 100, and the VMSLTDRIVER software all permit asynchronousprinters to connect to terlninat servers. Theseversions also allow print qu¢ues to be directed tothe printers from hosts. The IAT protocol hasbeen enhanced so that th connection mechanismremains under the ctintrol of the terminalj.server (for the reasons of efficiency mentionedpreviously) . That enhancement allows a host to"solicit" a connection from a port on a terminalserver. Once the connection has been made, datatransfer can occur as in the normal interactiveterminal case, except that the printer output isunder the direction of a VMS print symbiont. It ispossible, with these implementations, to directthe queues from multiple systems to a singleprinter or bank of printers being offered as acommon service. When a connection request ismade while the printer is eing used by anotherII<strong>Digital</strong> TechntcaljournalNo. 3 <strong>September</strong> 198685


Terminal Servers on Ethernet Local Area Networkssystem, the connection request can be queued.This queuing provides a basic mechanism forsharing printers among multiple systems.Some of <strong>Digital</strong>'s personal computers nowimplement the master end of the IAT protocoland can operate as simple single-session terminalservers. These servers are implemented as part ofthe DECnet-DOS and ProjDECnet releases andallow the PC to emulate a terminal connected toa terminal server. Combining this feature withthe servers that offer services, a PC user can connectto any PC that is connected to a terminalserver for file transfer applications, to a dial-outmodem, or to a non-LAT host system. Dataintegrity is provided "end-to-end" in PC-basedimplementations due to the lack of twisted pair,or simihir, wiring. Figure 7 shows the connectionsto asynchronous printers and IAT from personalcomputers.Within the IAT environment, the service nameoffered by a host system does not always have torepresent the command interpreter on a givensystem, though this is by far the most commonuse today. Instead, a service name could representan application program, which might be runautomatically when a connection request ismade. Alternatively, using the solicited-connectionmechanism currently employed for printers,applications programs could initiate connectionsto terminals (or other asynchronous devices)located within the LAN .DECserver 200The DECserver 1 00 interconnects terminals inan office environment at a very low price. Soonafter it was announced, it became clear thatmodem-controlled lines and connections to nonrAThost systems should also be priced justas low.Thus the DECserver 200 project was initiatedto produce a new server based on the DECserver1 00 design, but with modem control capabilities.Moreover, this product had to meet theoriginal cost goals of the DECserver 1 00. Thisproject involved a redesign of the printed circuitboard, yet retained the same system architecture.A faster version (1 0 MHz) of the same MC68000microprocessor was used, and memory wasincreased from 128KB to 384KB of RAM andfrom 512 bytes to 2KB of NVRAM. This increaseallowed room for the implementation of modemcontrol software and support for non-IAT hosts(i.e., reverse-rAT capabilities) . The increase alsoallowed a larger service directory database to bestored and an enhanced on-line help capabilityto be added.NON-LA T HOSTSVAX/VMSHOSTETHERNETTERMINALSERVERDECnet-DOSDECserver 100PRO/DECnetETHERNETTERMINALSERVER .TERMINALS ASYNCHRONOUS TERMINALS DIAL-IN ASYNCHRONOUSPRINTERS MODEMS PRINTERSFigure 7Asynchronous Printers and LA Ton PCs86 <strong>Digital</strong> TecbnicaljournalNo. 3 <strong>September</strong> 1986I


1 New Products,. •..1 _., - ·-. -- ::_Figure 8 DECserver 200Another feature of the DECserver 200 takesadvantage of the new DECconnect cablingscheme, allowing coQnections to be made usingDEC4 23 wiring. This feature allows communicationsat up to 19.2 Kbau.d over cable that is neithertwisted pair nor shielded, for relatively longdistances of up to 1000 feet. Figure 8 shows theDECserver 200 hardware.SummaryUnlike other existing packet-oriented transportlayer architectures, the LAT transport layerimplements asymmetric connection management,asymmetric data flows , and timer-basedmessage exchanges.The most unusual innovation of the IAT architectureis the use of multicasting as a presentationlevel naming service . On Ethernet, packetsare normally addressed to the adapter of a .specific system. However, the Ethernet specificationdescribes a form of logical addressing calledmulticast addressing. In this scheme a packetaddressed to a multicast address is receivednearly simultaneously by many independent systems.LAT uses these messages to completelyconfigure the topology automatically. Thisaction means that installing a terminal server is assimple as plugging it into the Ethernet and waitingfor services to be advertised.·Asymmetric connection management considerablysimplifies the complexity of the protocol inwhich terminal servers initiate connections tohost systems. If a host system wants to connect toa terminal server, that connection must be solicitedfrom the terminal server. This protocolsolves the problem of having many host systemscompeting independently for the same resource .The first "solicitation" is serviced by a connection,and subsequent requests are queued on afirst-in, first-out basis. IOn a particular terminal srver, all devices thatare logically connected to the same host systemshare messages both to and from that host .Within each message , each user's data is containedwithin slots. This ultiplexing, in conjunctionwith the delay timer, reduces furtherthe number of messages e:?cchanged. For example,as more users log in to a host system, thenumber of messages exchanged remains constantat approximately 12 per second in eachdirection, even as the lengths of the messagesincrease.The DECserver 100 and · DECserver 200 arelow-cost implementations of the IAT architecture,allowing terminals and other asynchronousdevices to be configured in a flexible and costeffectivemanner in a LAN ..Acknowledgments ,Over the years, a large number of people havecontributed to the architecture and productsdescribed in this paper. The authors acknowledge, with gratitude, all this work. In addition, anumber of people have taken the time to reviewthis paper and have made!many helpful suggestions;to these. people we a.lso extend our thanks.References1 . J. Morency et al ., ''The: DECnetjSNA Gateway.IProduct - A Case St;udy in Cross VendorNetworking,'' Digitl Te chnical journal(<strong>September</strong> 1986, this issue) : 35-53.<strong>Digital</strong> <strong>Technical</strong> journalNo. 3 <strong>September</strong> 198687


Paul R. Beckjames A. KryckaThe DECnet- VAX ProductAn Integrated App roachto NetworkingEarly DECnet implementations were completely layered above the servicesof the operating system. This loose bonding of network products tothe operating system resulted from separate development effo rts. From itsinception in 1976, the VMS operating system integrated networkingfunctionsadhering to the <strong>Digital</strong> Network Architecture (DNA). The DECnet­VAX product is the DECnet implementation most tightly coupled with itsparent operating system. This product provides an unprecedented degreeof transparency fo r network applications while remaining true to theDNA strategy. Transparency is achieved by providing access to networkcapabilities through system services, record management services, andthe standard 1/0 statements of high-level languages.When the first VAX processor and its VMS operatingsystem were designed a decade ago, theDECnet architecture was in its second majorphase . Several of <strong>Digital</strong>'s major operating systemshad already implemented DECnet Phase II.Therefore, a major goal of the VMS DevelopmentGroup was to provide networking capabilitieswith the initial release of that group's product.Both the VAX architecture and the VMS operatingsystem were completely new designs. However,the VMS system shares a common heritagewith the RSX- llM operating system. Some of theutilities in the first few VMS releases were actuallyimages of their RSX- llM equivalents runningin compatibility mode. That was not thecase with the DECnet-VAX product, the networkproduct in the VMS system.Previously, DECnet implementations had beenadd-ons to their host operating systems, whichpredated the development of the DECnet architecture.The VMS system, on the other hand, wasdesigned after the DECnet architecture had beenwell established. The VMS architects recognizedthat including networking capabilities was vitalto their system's success in the future. Thus theydecided to integrate those capabilities smoothlyinto the operating system itself rather than tolayer the architecture on top. Although sold as alayered product, DECnet-VAX was designed andimplemented by the same group that developedthe VMS software. This product was designedfrom the beginning to be a coherent pan of theVMS system . Its components are maintained withthe VMS source code and compiled as part ofeach VMS base level . This decision to integratethe DECnet-VAX development into the overallVMS project was instrumental in achieving thelevels of integration and transparency found intoday's product.In designing DECnet-VAX, a completely integratedapproach to networking was taken toachieve the following goals:• A high degree of transparency at many levels,allowing remote services to function in a waythat appears local to the system• The utilization of unique features in the VAXhardware and VMS software• High performance and efficiency• Ease of implementation of networkapplicationsTo build adequate DECnet capabilities into theVMS system, a model to view network functionshad to be developed. This model had to provideanswers to a number of strategic questions. How88<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 1986


New Productscould the network name space be built on thelocal name space of an individual node? Howwould network functions be accessed from withinthe operating system itself? To what extentshould a user be aware that he is specifying a networkfunction rather than a local one?This paper will describe how the design ofthe VMS system facilitates the integration of networkingcapabilities. The DECnet-VAX producttakes advantage of this design to provide networkingservices that are faithful to the DNAphilosophy while still tailored to the uniqueVA XfVMS environment.Foundations of the DEC net- VAXProductThe foundation of all networking applicationsis the ability of a program on one system to exchangedata with a program running on anothersystem. In the DECnet architecture this capabilityis called task-to-task communication. It is thebackbone upon which a wide range of VMS networkingfacilities are built. These facilitiesinclude• Remote file access and virtual terminalsuppon• Layered product extensions, such as distributedmail and remote database applications• Applications that rely heavily on file accessand task-to-task capabilities, developed byusers and third-pany companies• Distributed network management operationsThe DECnet-VAX implementation had to satisfythe needs of both end users and applicationdevelopers. Therefore , its main goals were toprovide remote file access and task-to-task communicationcapabilities that would be easy tolearn and use, functionally complete, and accessiblethrough the standard VMS 1/0 interfaces.Providing a high degree of transparency fornetwork activities was the key to achieving thesegoals. For example, transparency at the file levelmeans that accessing a file on a remote node isconceptually the same as accessing the file onthe local system. That access should not requirethe use of different commads or any changes toapplication programs.The VMS design was influenced by severalimportant concepts that laid the foundation forintegrating networking capa;bilities and evolvinga highly transparent user ip.terface to network.servtces.I1One fundamental concept is that the VMS systemtreats network operatiohs as a natural extensionof local 1/0 operations. The DECnet-VAXimplementation, from the session layer (providinglogical link services) down to the physicaldevice layer, is modeled after the file accessprimitives of the VMS file system. Both networkand file system operations use the assign channel(ASSIGN) , queue 1/0 (QIO) , and deassign channel(DASSGN) system serviCes as their programminginterface, and make u$e of the same subsetof QIO functions. Both ophittions divide theirwork between higher level fu nctions requiring aprocess context to provide a large address spaceand 1/0 handled through appropriate devicedrivers. At this level of abstraction, the programmingsteps required to engage in task-to-taskcommunication are quite similar to those neededto access a local file.Another design decisioq having a profoundeffect on the style of inte!rface to remote fileaccess was to integrate the! record managementservices (RMS) into the VMS system. RMS is usedfor all common file access operations by theoperating system as well as by most VMS utilities.These services provided a platform from whichto develop a common interface for both localand remote file access, as well as task-to-taskcommunication. The DEnet-VAX developersachieved transparent remot file access by incorporatingthe data access protocol (DAP) modulesin RMS to communicate wih a remote file accesslistener (FAL) . For local filJ access, RMS uses theQIO interface to the file system. For remote fileaccess, RMS uses the QIO interface to create alogical link to the FAL server program throughthe session layer of the network. FAL thenaccesses the file by acting as a local user of RMSon its system. The use of RMS is illustrated conceptuallyin Figure 1.The definition of a VS fi le specificationwas extended to include the node name with aprovision for an optional atcess control stririg topass authorization informaion to the remote system.The syntax of a node specifier is one of thefollowing:nodename::nodename"username password account''::<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 198689


The DEC net- VA X Product - An Integrated Approach to NetworkingLOCAL NODEFILEACCESSREQUESTRMSSERVICESILO cREMLVMSFILE '----SERVICESNETWORKSERVICES-DISKDRIVERCOMMUNICATIONSDRIVERIr---DISKNETWORKFILEACCESSCOMMUNICATIONSNETWORKFALREQUESTRMS'----DRIVER r-- SERVICES SERVER SERVICES. . .REMOTE NODEFigure 1RMS Interfa ce to Local and Remote FilesMoreover, the concept of a quoted file specificationwas introduced to allow file name informationon a non-VMS system (one not adhering tothe parsing rules specified for VMS files) to berepresented. In addition, two special forms ofquoted string were adopted to specify the targetentity in task-to-task communication. Thus thesyntax of a file specification for network accesscan be one of the following:nodespec::device:[directory]file.type;versionnodespec::' 'foreign-file-specifier' 'nodespec::"TASK= taskspec"nodespec::"n="where the latter two forms are used to identify anetwork task by name or object number.Another important early design decision was toprovide fu ll access to remote fi les, beyondremote file transfer and manipulation functions,through RMS. Currently, almost every RMS fu nctioncan be performed over the network on aremote VAX/VMS system. Thus most applica-tions using RMS can employ the network transparentlyto• Access sequential, relative , and indexed(ISAM) files• Utilize different access methods (sequential,random by relative record number, relative bykey, and record file address)• Operate in either record or block mode• Communicate with a network task as thoughreading and writing to a sequential fileRMS is used throughout the VMS system by the<strong>Digital</strong> Command Language (DCL) interpreter,VMS utilities, and the run-time library routinessupporting high-level languages. As a result,transparent remote fi le access and task-to-taskcommunication are available at the fo llowinginterface levels: DCL commands, high-level languager;o statements, RMS services, and I/Orelatedsystem services. Figure 2 illustrates theserelationships.90<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 1986


New ProductsIPROGRAMTOLOCALDIGITAL RECORD QUEUE ORCOMMANDMANAGEMENT 1/0LANGUAGE UTILITY SERVICES INTERFACE ! NETWORKDEVICEH1--HIGH-LEVELLANGUAGEHRUN TIMELIBRARYFigure 2All DECnet implementations provide task-totaskcommunication so that an application programcan exchange data with another programrunning on a remote system. In the VMS environment,task-to-task communication can be performedby the RMS services as if a file were beingaccessed. This capability is made possible by twodesign decisions.The first decision was to model task-to-taskcommunications within RMS as though it weresent to a bidirectional unit-record device. Thistype of device has many properties of a terminalor VMS mailbox. These properties allow an applicationprogram (or command procedure) toshare data with its remote counterpart throughsequential GET and PUT requests, just as if theprogram were processing a local data file. Furthermore,a CLOSE operation initiated by eitherpartner is signaled to the other as an end-of-filecondition.The second decision was to extend the syntaxof the quoted string form of an RMS file specificationwas extended to accommodate the identificationof a remote task, as described earlier.When the file specification passed to RMS onan OPEN request contains a quoted network taskspecifier, RMS will connect to the remote task orobject identified in the string instead of to theFAL object. The remote VMS process can thencomplete the connection by issuing an OPENrequest using the logical name SYS$NET. In subsequent1/0 requests from either cooperatingtask, data records are passed directly to and fro mthe remote task without using DAP, which isrequired when communicating with FAL.Interfa ce Levels fo r the VMS SystemIDECnet-VAX Building BlocksNetwork PrimitivesDECnet-VAX provides task-to-task communicationsbetween different nodes within a network.layered network applicati


The DECnet- VA X Product - An Integrated Approach to NetworkingTable 1Relationships between RMS and DECnet-VAXFile System RMS Programming DECnet-VAXQIO Function Service Language Operation Operation10$--ACC ESS $OPEN Open file Initiate or accept logical linkIO$_DEACCESS $CLOSE Close file Disconnect logical linkIO$_READVBLK $GET Read from file Read data across logical link10$-WRJTEVBLK $PUT Write to file Transmit data across logical link• The network ancillary control process ,NETACP, which handles those functions thatrequire a process context in which to executeNETDRIVER processes all QIO requests for thenetwork device. Network QIOs generally fallinto one of two categories: logical link traffic, ornetwork management requests. NETDRIVERforwards to NETACP any request for logical linkstart-up or shutdown (e.g., the IO LACCESS QIOused to create a logical link) , or for networkmanagement fu nctions. On the otherhand, NETDRIVER handles logical link transmitand receive requests (IQ $_WRITEVBLK andIOLREADVBLK) by itself.NETDRIVER also contains the bulk of the routinglayer of the DECnet-VAX software. Usinginformation provided by NETACP, NETDRIVERcan determine the optimal circuit on which tosend any received packet whose destination isanother node. By giving high priority to bothlogical link and routing-forwarding traffic,NETDRIVER eliminates the overhead of invokinga process to perform these high-throughputfunctions.NET ACP defines and provides access to thevolatile network database, which is the workingcopy of the permanent network database. Thevolatile database is allocated from NETACP's virtualaddress space. NETACP also controls thestate transitions of data links, the routing layer,and logical links. Both the start-up and shutdownof logical links are handled in NETACP, whichalso creates the process to receive an incominglogical link having no declared network task.In addition, NETACP provides support for <strong>Digital</strong>'s X.25 packet switch network product, VAXPSI. Through "data link mapping," NETACPmakes it possible to map the functions normallyprovided by a data link driver onto an X.25 con-nection. This mapping causes the packet switchnetwork (PSN) to act as though it were a DECnetdata link, thus allowing DECnet nodes connectedto the same PSN (but not to each other) to communicateusing DECnet protocols. These protocolsin turn permit any applications layered onDECnet to function correctly between the DECnetnodes.DECnet-VAX provides two other components,called the network control process (NCP) andthe network management listener (NML) , whichwork together to provide local and remote networkmanagement capabilities. 1 NCP providesthe user interface to network management functions.Network management is the process ofcontrolling those parameters that allow thevarious components of a network to functionefficiently. These parameters reside in two separatedatabases: a permanent database that establishesthe default parameter values upon nodestart-up, and a volatile database that containsthe current parameter values in a functioningnetwork.Both the volatile and permanent databases canbe accessed through NCP by using a commonuser interface . NCP in turn passes the parsedrequests to NML for actual processing. NCP thenformats the results returned by NML for the user.NML is a server whose function is to performnetwork management operations on behalf ofsome client. NML receives its commands in aprotocol called NICE, either from a local NCPcopy or over a logical link from another node(usually, but not necessarily, from that node'sNCP). NML then returns the results via the sameNICE protocol.NML is also the agent that owns and maintainsthe permanent database. Upon receiving arequest for a permanent database operation, NML92<strong>Digital</strong> TecbnlcaljournalNo. 3 <strong>September</strong> 1986


New Productswill access the appropriate file using normalRMS calls. For operations on the volatile database,NML will issue a QIO to NETDRIVER,which in turn forwards the request to NETACP,where the request is honored.Data Link DriversThere is a separate device driver for each communicationsdevice that can be used by DECnet­VAX . In most cases these communicationsdevices can be used by non-DECnet applicationsas we ll. The same device driver is used toprovide the data link interface to NETDRIVER, aswell as the QIO interface for user-written applicationsnot using the DECnet software.The data link drivers supply the needed support for the variety of lower level protocols providedin the DECnet-VAX product. These protocolsinclude the synchronous and asynchronous<strong>Digital</strong> Data Communications Message Protocol(DDCMP) , Ethernet, IEEE 802, and SystemsCommunications Architecture (SCA) for communicatingacross a VAXcluster communicationsinterface (CI) .File Access ListenerThe File Access Listener is the component of RMSthat is activated to service a request for access tothe local file system by a remote node . As such,FAL is an extension to RMS on the remote system.FAL uses RMS services to access local files andDAP to send data back to the requesting node .VAX/VMS Environment in theNeh.IJork KernelThe underlying structure of DECnet-V AX wasdesigned around the special environment providedby the VMS system. The manner in whichnetwork programs are created and the environment·in which they run are governed more bythe design of the surrounding operating systemthan by the network architecture. Some importantaspects of this design are the way networkobjects are identified and activated and the use ofVMS command procedures. This use provides asimple, transparent mechanism for creating anetwork task.The DECnet architecture defines two classes ofexecution entities that have addresses within anetwork and with which network communicationscan be established. The first are called network"objects," identified by node address andobject number. The second ilre network "tasks,"addressed by node and task name. Network tasksare actually a special case of network objects,with object number zero reserved for identifyingnetwork tasks. If a logical link specifies objectnumber zero, it also supplies a task name identifyinga particular network task on the targetnode.In the VMS system, execuion of each programimage takes place within the context of a process.One process will tyically run multipleIimages serially during its lifetime. A flexibleImechanism was needed to associate a request fora network object or task ith the right process'running the right image .DECnet-VAX has three mechanisms to identifythe execution entity that will be associated witha network object or task. In the fi rst, an imageregisters itself in the network database as thespecified object or task. The image then waits forI(or initiates) connections with other programs inthe network.The second mechanism I involves creating anentry in a local database tliat identifies networkobjects. The database information specifieseither a command proced h re or an executableimage to be run in response to a request to connectto the specified object,. Upon receiving sucha connect request, NETACP will create a processin a specified account that executes the commandprocedure or image . Setting up an entry inthe object database provipes the flexibility tospecify account informatibn and privileges forthe object when it is activ£ted.The third mechanism isl a catchall. This acti-'vates a command procedure to serve as a networktask in the absence of ether an entry in theobject database or a nontransparent declarationof the network task by a running image . Uponreceiving a connect request for a network tasknot identified by either of the first two methods,DECnet-VAX assumes that a command procedureresides in the default dirctory of the accountspecified with the reque;st. DECnet-VAX thencreates a process to execute this command procedure.For example, upn receiving a connectrequest for a task called NETISK, undeclared byany running image , DECne-VAX will assume thata command procedure tailed NETTSK.COMresides in the default direttory.The DECnet-VAX software also causes a logicalname, SYS $NET, to be created for this process.SYS $NET translates to a data structure containingconnection information .for this logical link.<strong>Digital</strong> TecbntcaiJournalNo. 3 <strong>September</strong> 198693


The DECnet- VAX Product - An Integrated Approach to Ne tworkingThe command procedure can either activate animage that accepts the connection specified bySYS $NET or complete the connection directlyfrom DCL by opening a channel using the logicalname.There are both advantages and disadvantages tothe way in which DECnet-V AX uses separate processesfor diffe rent network objects . Each VMSprocess carries its own protection, defined bythe authorization parameters of the accountspecified for the process and enforced by aspectsof the VAX architecture . Therefore , using a different process for each logical link is a convenientway to maintain security and provideaccounting information . It is also a simple meansto keep separate the context of each logical link.For example, FAL must maintain the file protectionappropriate for each user accessing filesover the network. A FAL process handles only onelogical link (for one file access) at a ti me. Tohandle more would require FAL to completelyand securely replicate the security context ofeach user for each logical link concurrentlymaintained. That is a formidable task in the VMSsystem, with its complex set of authorizationprivileges.The primary disadvantage of maintaining differentprocesses for different security profiles isreduced performance. This reduction resultsfrom having to create a new process when a logicallink is established. This disadvantage can beoffset by using large timeout constants for eachaccount's NETSERVER processes and FAL logicallink caching, both of which are described later.ASSIGN and DASSGN System ServicesThe ASSIGN system service creates a channel to aspecified device. This service provides transparentaccess to the DECnet-VAX software by recognizingwhen the device specification includes anode name . ASSIGN then issues an IO$_ACCESSQIO function to NETDRIVER on behalf of thecaller to establish a logical link transparently. Inthis way, simply supplying a network task specifierin place of a device name will create a logicallink when a channel is assigned.The internal data structures associated withthe channel are defined to resemble an openfile on the channel. As a result, when called todeassign the channel, the DASSGN systemservice wi ll issue a QIO IO $_DEACCESS requestto close the file. The logical link is thendisconnected.Transparent and NontransparentModesDECnet-VAX provides two mechanisms to establishnetwork communications between applicationson different nodes. In the nontransparentmode, an image executes a network call to declareitself as a particular network task byname or object number. To use this mode a programmermust have a thorough understanding ofnetwork primitives. However, the mode providesgreater flexibility than the transparentmode . That is, the same image can support multiplelogical links concurrently and can even act asmultiple network tasks.The transparent mode provides a very simplemeans to establish a correlation between anetwork task and a process. In this mode, theimage being executed need not even be awarethat it is operating as a network task. Upon creatinga process to handle an incoming logical linkfor an undeclared ne twork object or task,NETACP creates the logical name SYS $NET.This name contains the network control block ­needed to complete the connection with theoriginator of the link. Performing a normalRMS OPEN on this logical name will start a chainof events culminating in the establishment of thelogical link:• The RMS OPEN operation issues an ASSIGNfollowed by a QIO IO$_ACCESS to confirmthe connection.• The RMS GET and PUT operations translate toQIO IOLWRITEVBLK and IOLREADVBLKcalls to the same network device .• The network device translates to networktransmit and receive operations.The standard logical names SYS SINPUT andSYS $0UTPUT can be assigned to the logicalname SYS $NET before an image is activated. Thisaction will allow programs originally writtento perform 1/0 from eithe r te rminal or diskdevices to act like distributed applications in anetwork. The programs require absolutely norewriting.At this point the various layers of the designprovide an environment in which network communicationscan proceed without any specificnetwork calls issued by the programmer. A simpleexample, using DCL command procedures,wi ll demonstrate the nontransparent mode ofnetwork communication.94<strong>Digital</strong> TecbnicalJournal'No. 3 <strong>September</strong> 1986


New Products1. At a node called SOURCE::, the command$ TYPE TARGET::"O=TIME"is entered from a process running underaccount USER.2. The TYPE image issues ari RMS OPEN toTARGET::"O=TIME".3. The OPEN service issues an ASSIGN requeston the string, which results in a QIOIO $_A.CCESS being issued to DECnet-VAX .4. A connect request for network task TIME isrouted to node TARGET, where the DECnet­VAX software creates a process to run a commandprocedure called TIME.COM. DECnet­VAX then creates the logical name SYS $NET,whose translation contains a network taskspecifier identifying the source of the logicallink.5. TIME .COM issues the commands$ DEFINE SYS$0UTPUT SYS$NET$ SHOW TIME6. DCL issues an RMS OPEN using the logicalname SYS $0UTPUT for its output. OPEN inturn issues an ASSIGN. When one logicalname points to another, the names are translatedin an iterative fashion until no fu rthertranslation is required. Since the logicalname SYS $0UTPUT points to the logicalname SYS $NET, the latter translation is usedby ASSIGN. ASSIGN finds the network taskspecifier and issues a QIO 10 $-ACCESSrequest to DECnet-VAX. The formation ofthe logical link is then completed. TheTYPE image at the source node now issuesan RMS GET, which then translates to aQIO IO $_READVBLK request on the networkchannel.7. The time is sent as a string by an RMSPUT operation, which then issues aQIO IO$_WRITEVBLK request on the channelestablished for SYS $0UTPUT. Since thisis a network channel, the QIO is handled byDECnet-VAX and passed across the logicallink to the source node.8. At the source, the data satisfies the QIOIO $_READVBLK, which in turn satisfies theRMS GET, allowing the TYPE image to displaythe time sent from the target node.Throughout this example, only two networkfu nctions were performed at the applicationlevel: the use of a network destination name inthe TYPE fu nction, and the reference to theSYS $NET logical name in the TIME commandprocedure.IDECnet-VAX Features fo r theVMS Environment / .Besides implementing thoe functions definedfor all DECnet implementations, the DECnet­VAX product supplies added-value featuresdesigned for the VMS environment. These extensionsto the architecture enhance the wayDECnet-VAX blends into the VMS system. Severalexamples are illustrated in the following·paragraphs.Proxy Log-inOne traditional problem with password-basedaccess control is making the required passwordavailable to all users needing access to a restrictedresource . If the user membership needsto change (e.g., if someone changes jobs andthus no longer has the right to access theresource) , a new password, which must be communicatedto all current members, is required.To address this problem, tbe concept of "proxyIlog-in" was added to the DECnet-VAX software in1983.21With proxy log-in, each node maintains adatabase of those network users having proxyaccess to specific accountS on the local system.The database is used t() provide a one-toonemapping between the user, identified asNODE::USERNAME, and the target proxyaccount. For example, take the case of the arrivalof a logical link request having no explicit accesscontrol information from user whose name is inthe database. In this case the process created byNETACP to handle the logical link will be runusing the authorization context of the proxyaccount. This mechanism allows members to beadded to or deleted from a particular proxyaccount without their a priori knowledge of theaccount.Cluster Alias AddresfDECnet-VAX nodes can dperate on VAXclustersystems. A cluster is a loo$ely coupled, multipleprocessor network featuring full sharing of diskstorage and common user !environments on eachr<strong>Digital</strong> <strong>Technical</strong> JournalNo. 3 <strong>September</strong> 198695


The DEC net- VA X Product - An Integrated Approach to Networkingnode. Each member node within a cluster canbe directly addressed from any other node inthe network. At times, however, it is alsovery convenient to treat the cluster as a singleDECnet node . Among other advantages , thiscapability makes it possible for mail to be sent tousers with accounts in the VAXcluster systemwithout knowing which member nodes areactive .Associating the cluster with a DECnet addressis accomplished by supplying each node in thecluster with a second address, an "alias," representingthe cluster. Each router in the cluster (atleast one is required) adds the alias address tothe routing vector transmitted to other routersin the network. That makes the routing vectorappear to be the optimal path to the aliasaddress. As a result, the rest of the network cannotdistinguish the cluster alias from the addressof a physical node . This approach has an advantagein that it requires no unique support inother systems and no modifications to the DECnetarchitecture.As it is routed through the network, a messagewith the alias address will eventually arrive at arouter within the VAXcluster system. The routerwill recognize the destination address as its ownalias and select a node within the cluster toreceive the message. The selection process isbased on a weighted, round-robin algorithm. Theend communications layer within the router iscapable of identifying which node is associatedwith each logical link. Therefore, once a connectionhas been established, subsequent messageswill always be routed to the correct node withinthe cluster.Dy namic Asynchronous ConnectionsMany personal computers, ranging from IBM PCsto MicroVAX workstations, are now capable ofrunning the DECnet software over asynchronouslines. Thus has arisen the need for a more secureand easily managed mechanism for setting up terminallines to be used as DECnet communicationslines.Ordinarily, one terminal line must be dedicatedto DECnet use for each asynchronous lineneeded. When those terminal lines are not beingused for DECnet purposes, they cannot be usedas normal terminal lines. To solve this problem,DECnet-VAX introduced, in 1985, dynamic asynchronousconnections which allow an interactiveuser to dynamically convert the terminal linehe is using to a DECnet line. (This conversionrequires that access be from a PC using a terminalemulation package , such as SET HOSTjDTEunder the VMS software, and that the PC can runthe DECnet software.)Mter logging in via the terminal .emulator to anaccount on a routing node, the user directs theVMS system on the routing node to switch the terminalline to DECnet use. The VMS system sendsan escape sequence to the terminal emulator onthe PC. Recognizing the sequence, the emulatorconverts the line to a DECnet link at the localend. Meanwhile, the code in the router convertsthe line at that end to DECnet use. The design ofthe VMS terminal driver makes possible this conversion.The terminal driver separates its functionsbetween class drivers (implementinghigher-level functions) and port drivers (interfacingwith the hardware devices) . The DDCMPasynchronous device support in the DECnet-V AXproduct is implemented as a class driver. Thatmakes it possible to switch dynamically betweenDECnet and terminal use on a particular devicesimply by switching class drivers on the sameport driver.When both ends have switched to DECnet use,the normal routing layer initialization takesplace. Some additional checks happen duringrouting initialization on dynamic lines to ensurethat the node that just switched the line is permittedto do that by the router. These checksgive to the system manager on the router theopportunity to control which nodes should bepermitted to connect to his system.Performance IssuesAs DECnet-VAX has evolved, continuing effo rtshave been made to improve its performance.These efforts have run the gamut from restructuringthe basic modules to including supportaimed at improving specific areas of performance.The remaining sections discuss some ofthe areas that have yielded the greatest performanceincreases.Network Drivers and AncillaryControl ProcessesWherever possible, those functions having thegreatest effect on performance have been implementedin NETDRIVER. There they can be executedat high priority without changing the processcontext, which would be required forfunctions executed in NETACP.96<strong>Digital</strong> <strong>Technical</strong>]ournalNo . 3 <strong>September</strong> 1986


-Those functions that occur more infrequentlyhave been implemented in NETACP, which providesthe necessary process context. These functionsinclude state changes and others thatrequire a process context to allow access to alarge pageable database or system service. Theseinclude logical link creation and deletion, networkmanagement functions operating on thevolatile database, and routing table maintenance.Network Server ProcessesTransferring large files across a network oraccessing many remote files can consume considerableresources on a remote system. Thus theprovision of accurate accounting information forremote file access operations is quite desirable.This need led to the implementation of FAL as asingle-threaded server. That server runs in thecontext of a process logged in to the remote systemon an account that is accessible to the initiatorof the file access request.RMS, being procedure based, does not knowwhether or not an application program intendsto access additional files via the same account onthe remote system. Originally, the DAP implementationin RMS was designed to terminate thelogical link with FAL upon closing a file or finishinga file search sequence. Consequently, forexample, a wild-card operation transferring nfiles using the COPY command results in theinvocation of a total of n + 1 FAL processes.One process performs the RMS search sequence,each of the others transfers in serial fashion eachfile that is found. Unfortunately, this approachsignificantly reduced overall throughput, especiallywhen a large number of small files werebeing transferred.The primary disadvantage of using separateprocesses for individual network tasks lies in theoverhead required to create the process. Theincreasing complexity of authorization and protectionmechanisms within the VMS system hasincreased the start-up time during process creation.This increase is experienced by usersactivating network tasks on other nodes as anincrease in response time.Support for network server processes wasintroduced in 1983 to solve the overhead problem.A NETSERVER process can handle seriallymany logical links that require the same accounton the server node. NETACP maintains a list ofthose NETSERVER processes that have beenstarted for particular accounts but are now cur-IIIrently idle. When a new 'ogical link requestspecifying the same account is received, NETACPwill forward the request to an appropriate idleNETSERVER process instead of creating a newprocess to handle the request. On a busy system,this action can trim seconds off the start-up timefor the logical link. To pr¢vent the problem ofthe local system filling up with NETSERVERprocesses that no one nees to talk to, an idleNETSERVER will time out ahd delete itself after acertain amount of time.In 1986, FAL was extended to include thecapability to serially prodess multiple logicallinks (and as a result, multiple files) . This additionyielded a significant improvement in overallthroughput for file transfer activity, especiallyfor wild-card operations.Window-based Congestion ControlIn a large network, data packets must be routedthrough several nodes befre reaching their destinations.Congestion in ani intervening node canseverely decrease the thro 1ughput of all logicallinks using that path. Coninuing to send moredata through a congested nde only makes thingsworse. The systems at the ends of the logical linkhave no knowledge of which path is being used;therefore, they have no direct way of knowingwhere congestion may be occurring in the network.This problem was addressed through theimplementation of a winow-based "back-off"scheme designed to detect the presence of congestionsomewhere along lthe path. The rate atwhich data is sent will be reduced until theeffects of the congestion ate no longer seen.IiNode Database Structure<strong>Digital</strong> Equipment Corporation has a very largeinternal communications network. As that networkgrew in size, its volatile node databasebecame a performance bottleneck. Searchesthrough the database to locate a particular entryby name were causing exessive paging. Notingthat the node database isi frequently accessedusing either the node name or the address as asearch key, it was decide that the speed of alook-up should be the sae for either type ofsearch. To accomplish tht, the node databasewas augmented by two balanced binary searchtrees, one keying off the node address, the otheroff the node name. 3 Each entry in each tree containsa pointer to the node database entry referencedby that entry. It also has a separate pointerINew Products<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 198697


The DECnet- VAX Product - An Integrated Approach to Networkingto the node in the companion tree, making it possibleto parse the node database by either nameor address.Another problem with the volatile node databasedeveloped in 1984 as our internal networkgrew to over 2,500 nodes: the size of the databasecaused the NETACP process to exceed itspaging file quota. That quota was increased toaccommodate 5,000 nodes, a limit exceeded lessthan one year later. At that point the best solutionwas to reduce the number of pages required bythe node database rather than to continue takinga larger portion of the paging file.In a very large network, most nodes are representedmerely as names and addresses. Most ofthe other node parameters, such as routing initializationpasswords, are generally used only fora small subset of the total node population. Withthat in mind an optimization that "walked" thebinary trees was built into the search routines.Any node with only its name and address definedis completely represented by the binary treeentries; so no database entry is allocated. When atree search locates an entry with no associateddatabase entry, the name and address informationfrom the tree entries will be used to initialize atemplate database entry to return to the caller. Ina node database with 7, 000 entries, this optimizationresulted in a reduction of almost 2,500pages in the paging file, which cut NETACP'stotal page file utilization almost in half.Buffer Size OptimizationThe DECnet architecture does not allow the segmentationand reassembly of data packets at thedata link layer. The architecture requires that thebuffer size used by the NSP layer (the transportlayer) must be small enough to be handled byany data link in the network. Traditionally, thishas meant using 576-byte buffers in the NSPlayer.The 576-byte buffers limited the network'sperformance when Ethernet, with its 1500-bytedata link buffers, was supported. The NSP layerwas still constrained to use the smaller buffers,since lower capacity data links existed in the network.It was recognized that performance couldbe improved between nodes on the same Ethernetby using the larger Ethemet buffers. In thiscase there was no chance of the packets beingrouted through a node that could not handlethem. The problem was to recognize when thisoptimization could be safely used.Solving this problem was easy on a routingnode since it could determine that the destinationnode was exactly one "hop" distant on thesame Ethernet. On a nonrouting node, however,this information was not readily available.Fortunately, the NSP protocols establish thebuffer size as the smaller of those offered by thetwo parties involved. Furthermore, a nonroutingnode maintains (in its routing layer) a cached listof the nodes residing on the same Ethernet. Thatlist enables the nonrouting node to address packetsto other nodes directly without passing thepackets through a routing node. This action permitsa nonrouting node to always offer the use of1500-byte buffers when it initiates a logical linkrequest on an active Ethernet circuit. When therequest arrives at the target node, the cache therewill correctly reflect whether or not the sourcenode is on the same Ethernet. If so, the node caneither offer the larger buffers or demand the normalbuffer size.SummaryThe DECnet-VAX product makes possible theprovision of a comprehensive set of networkingcapabilities that are compatible with implementationsin other operating systems. DECnet-VAXdoes this while integrating a high proportionof those capabilities into the heart of the operatingsystem. That integration supplies servicesthat make local and remote operations appearindistinguishable.This integration was achieved by anticipatingthe need for integrated networking capabilitiesfrom the start. The necessary "hooks" were providedin a sufficiently general fashion to allowthe continued development and expansion ofthe networking product. This design allowedDECnet Phase II to be included with the earlyreleases of the VMS software, which did not haveto change as DECnet-VAX progressed throughPhases III and IV. Similarly, as the VMS systemitself evolved to support multinode VAXclusternetworks, DECnet-VAX was able to provide clusteraddressing through its coupling with theoperating system.As both the VMS operating system and DNAcontinue to evolve, the design of the DECnet­VAX software will permit it to follow both,providing transparent networking capabilitiesfor users of VMS.98<strong>Digital</strong> TecbntcalJournalNo. 3 <strong>September</strong> 1986


•New ProductsAcknowledgmentsAny discussion about the basic design of theVMS software should acknowledge the contributionsof its primary architects David Cutler andRichard Hustvedt. The design and implementationof DECnet-VAX owe much to the work ofScott Davis, Alan Eldridge, and Tim Halvorsen.References1. N. La Pelle, M. Seger, and M. Sylor, "TheEvolution of Network Management Products,"<strong>Digital</strong> Tecbnietil journal (<strong>September</strong>1986, this issue) : 117-128.2. P. Karger, "Security in DECnet: Authenticationand Discretionary Access Control ,'' <strong>Digital</strong>Equipment Corporation Internal <strong>Technical</strong>Report, DEC TR-121.3. D. Knuth, The Art of Computer Programming, Volume 3 (Reading: Addison Wesley,1973).<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 198699


john Forecastjames L. jacksonjeffrey A. SchriesheimIThe DECnet- ULTRIXSoftwareThe UL TRIX system is the second operating system approved by <strong>Digital</strong>for its VAX processors. Incorporating the <strong>Digital</strong> Networking Architecture(DNA) capabilities into this software was important to support distributedapplications. A key constraint was that no changes should berequired to existing DNA protocols or DECnet implementations. The4.2BSD socket interface was expanded to support the DECnet protocolsand a unique object spawner was created to simplify writing new servers.A network management structure incorporating DECnet's database conceptalso had to be built. The DECnet-UL TRIX software is the first productimplementing the DNA strategy on any variant of the UNIX software.Project GoalsThe DECnet-ULTRIX software is <strong>Digital</strong>'s firstproduct to be layered on the ULTRIX-32 softwareand is a key part of our ULTRIX strategy. Onemajor reason for developing DECnet-ULTRIX wasto bring the ULTRIX system into <strong>Digital</strong>'s computingenvironment. We believe that our customerswill better meet their computing needsby being able to use the VMS and ULTRIX operatingsystems together. Such a mixture of systemsrequires communications mechanisms thatare easy to use and manage, yet provide highthroughput. These mechanisms make possiblethe transportation of existing applications fromVMS systems to ULTRIX systems. Thus new distributedapplications can be built by takingadvantage of the strengths of each system.DECnet-ULTRIX Version 1.0 provides filetransfer, remote terminal access, mail, networkmanagement, and user programming interfaces.All these functions are completely compatiblewith all current implementations of the <strong>Digital</strong>Network Architecture (DNA) . The DECnet­ULTRIX software also makes possible a largenumber of other options, such as support for terminalservers, protocol gateways developed by<strong>Digital</strong>, layered applications, and managementtools. As we migrate the DECnet protocolstoward the Open Systems Interconnect (OSI)protocols, the DECnet-ULTRIX software willprovide the means for ULTRIX systems to communicatewith those of other vendors.Project ConstraintsIn planning the DECnet-ULTRIX design, wewanted to clearly identify our constraints at theoutset of the project. Thus we would have a consistentand well conceived framework for makingdesign decisions.We decided that the software should requireno changes to the currently available DNA protocolsand DECnet implementations. If problemswith other DECnet products were uncovered bythe DECnet-ULTRIX software, those problemswould be solved. This decision was made so thatthe software could be completely compatiblewith the large base of existing DECnet networkswithout requiring the upgrading or patching ofsoftware for any system. Our goal was to haveULTRIX systems simply "plug" into existingnetworks, thus adding new capabilities for ourcustomers.All features of the DECnet programming interfacehad to be provided even though some wouldnever be used by many customers. A DECnet­ULTRIX user should be able to write programs tocommunicate with any existing DECnet applicationprogram on any type of DECnet system.These features include passing optional datawith connection establishment and dissolution,passing access control information on a connectrequest, and rejecting a requested connectionwhile supplying a reason code.We decided that the DECnet-ULTRIX softwareshould be culturally compatible with the UNIX100<strong>Digital</strong> TecbnlcalJournalNo. 3 <strong>September</strong> 1986


New Productsprogramming environment and other networkingimplementations on the ULTRIX system. Suchcompatibility required that it be easy to portapplications that used other protocols, such asthe transmission control protocol (TCP) and theinternet protocol (IP) to use the DECnet system.Also, it should be possible to write applicationsthat would run over the DECnet and other protocolsat the same time. We felt that the DECnet­ULTRIX software should perform at least as wellas the TCP jiP implementation.Significant Design DecisionsFor several weeks at the project's start we examinedalternatives for the basic design. Two majorones were considered for the basis of the networkenvironment. The first was to extend the"socket" interface from the ULTRIX system,which had been developed as pan of Berkeley4. 2BSD for the Defense Advanced Research ProjectAgency (DARPA) TCP jiP project. A socket isan addressable end point of communicationswithin a process, directing data to a similarsocket in another process. This socket interfacehad many of the functions we needed, althoughsome additions would be required. It had a disadvantagein that the socket environment wouldbe difficult to port to another variant of theUNIX software, should we eventually decide todo that.The second alternative was to build a versionof a communications executive on the ULTRIXsoftware that would isolate the protocol modulesfrom depending on the operating system.1 Thisapproach had been used successfully in anotherproduct set and had the primary advantage ofmaking more of the implementation portable.For example, this alternative would make it easierfor us to port the DECnet-ULTRIX software tothe UNIX System V software.Our final decision was to implement theDECnet-ULTRIX software with the first alternative,using the 4.2BSD interprocess communications(IPC) mechanisms. This alternative providedthe most compatible interface with otherprotocols and took advantage of the servicesalready provided by the IPC code in the ULTRIXkernel. We knew that the socket interface wouldhave to evolve to support other protocols, suchas ISO transport, and that we could provide someleadership in managing its evolution. In the shortterm we would provide extensions since theDECnet system requires op(ions having no equivalentin the IPC socket interface.We also had to find ways to present thoseoptions to users without extensively modifyingthe IPC routines in the ULTRIX kernel. Modifyingthe kernel's IPC code would require changesto other protocol implementations and reduceour ability to port the DECnet code to other4.2BSD-based systems. In pmicular, a way had tobe found to allow a server process to reject arequested connection. We also had to supportDECnet's ability to pass user-supplied data withconnect, accept, or reject operations. The IPCinterface in 4.2BSD provides no means for programsto pass data or acces control informationwithin a connection request. Therefore, nomeans existed for a program to decide that arequested connection should be rejected. Thislimitation was not acceptable for a DECnetimplementation because certain applicationlevel protocols in the DNA structure depend onconnection data for versin coordination andaccess control.Another weakness found in the existing4. 2BSO mechanisms was in the area of networkmanagement. One of DECnet's strengths is itsmanagement and control functions provided bynetwork management tools across nodes within anetwork. 2·3 Using a single command interfacecalled the network command program (NCP), auser may examine and ch'ange the state of keyparameters on any system within his network. Inaddition, he may examine counters describingnetwork activity and errors. Many conditionsoccurring on systems within a network cantrigger the generation of events or notificationmessages. These messags can be directed toconsoles, files, and programs anywhere in thenetwork.To implement these network management features,we had to add program-level access tochange parameters and to read counters andother information kept in the ULTRIX kernel.The network device control needed an especiallyIlarge number of changes. :Berkeley 4.2BSD providesa very limited set of controls over networkinterfaces. These controls are insufficient to supportthe functions of DECnet network management.In particular, the DECnet software has tobe able to turn interfaces on and off, gather counterskept in the device, ad enable and disable·multicast addresses,<strong>Digital</strong> TecbntcaljournalNo. 3 <strong>September</strong> 1986101


The DECnet,ULTRIX SoftwareComponents of the DECnet-ULTRIXSoftwareProgramming InterfaceAs mentioned earlier, we decided to base theprogramming interface in the DECnet-ULTRIXcode on the 4.2BSD interprocess communicationsfacilities, which are modeled on the socketinterface. Operations on sockets are similar tooperations performed on "logical units" or "filedescriptors," which direct IjO operations to afile or another device. Programs make systemscalls to create, bind names to, connect to, sendand receive data over, and destroy sockets.The IPC interface in 4. 2BSD is designed sothat sockets exist in specified communicationsdomains. Sockets within a domain share commonproperties, such as their naming scheme, andmay communicate only with other sockets in thesame domain. To implement the DECnet-UtTRIXsoftware, we had to add the "DECnet" communicationsdomain to the existing Internet and UNIXdomains. The basis of this support was the additionof new modules implementing the DECnetprotocols (at OSI levels 2, 3, and 4). These moduleswould be linked into the ULTRIX kernelwhen the DECnet domain was installed. Theyallow programs to create sockets using the DECnetnetwork services (NSP), routing, and Ethernetdata link protocols to communicate.Within the DECnet domain, two types of socketsare provided: stream, .and sequenced packet.Stream sockets provide a bidirectional, reliable,and flow-controlled stream of data between twoprocesses. Sequenced packet sockets providethese same features while preserving the messageboundaries of the data as presented to: thesending socket interface. All existing DECnetapplications protocols use the sequenced packetinterface because the message boundaries areused to indicate the lengths of data within messages.The stream socket interface was providedto facilitate porting applications from the otherUNIX communications domains to the DECnetdomain. In stream sockets,, the data flowingthrough the stream must be self describing. Inthat way the applications programs using thestream know how long each data element is withoutrelying on message boundaries. Data deliveryis based on buffering and flow control considerationsrather than preserving information aboutthe way the sender presented the data to thestream.We had to provide many supporting routinesin addition to the DECnet protocol code linkedinto the ULTRIX kernel. Those routines are modulesarchived in the standard C-language libraryat DECnet installation time. They provide accessto the DECnet node and object databases,address-conversion routines, and several routinesproviding a simplified programming interface tothe kernel socket routines.Kernel ChangesOur goal was to minimize the number of kernelchanges required to support the DECnet system.All the new functions that reject connections andpass data or access control information on connectionrequests were implemented using theexisting "setsockopt" (set socket option) and"getsockopt" (get socket option) system calls.We modified the ULTRIX kernel to allow thosecalls to be dispatched to domain-dependentcode. That was something the 4.2BSD designershad documented but not fully implemented. Wealso increased from 112 to 1024 bytes the maximumamount of data that could be passed acrossthose interfaces. In that way we could accommodatepassing all the access control information ina single request.We found several bugs in the kernel supportfor this type of socket. Therefore, this DECnetimplementation appears to be the first networkingdomain to support sequenced packets. Allthese bugs were fixed in version 1. 2 of theULTRIX-32 software.Most of the kernel changes were made in thenetwork device drivers. The initial release of theDECnet-ULTRIX software was to act as a nonroutingnode on the Ethernet. Therefore, we wereconcerned only with the <strong>Digital</strong> Ethernet interfacesand the DEUNA and DEQNA networkadapters. As written, the drivers for those devicessupported only the internet protocol (IP) usedby TCP for routing. They had explicit informationabout the IP protocol types coded into thedevice interrupt routines. We also wanted to addsupport for additional protocols (e.g., the LocalArea Transport, IAT, and maintenance operationsprotocol, MOP) at a later date. Therefore, weadded kernel routines that could be. called fromany Ethernet driver. Those routines:dispatch todomain-dependent routines when a message is tobe transmitted or a new message is received. Wealso added a number of I/0 control (ioctl) functions:to those drivers. Those functions allow102<strong>Digital</strong> TecbnlcalJour.nalNo. 3 <strong>September</strong> 1986' .


New Productschanges to the physical address of the hardwareinterface, enable and disable the reception ofmulticast messages, and control more extensivesupport for device counters than had previouslybeen present.Object SpawnerWe thought that one area could be greatlyimproved over the standard socket support in the4.2BSD standard: the invocation of server, or"daemon," processes. When a client programconnects to a server program, software on thespecified node has to decode the address andinform the correct target program that it has apending connection. Calls from the 4.2BSDsocket kernel support that software in a wayrequiring all possible destination processes to berunning and listening for connections. This supporthas several bad effects. First, each of thoseservers consumes memory and slots in the processtable. Second, writing a new server processis more difficult since each process has to issuemultiple system and library calls to receive andbind its address to a socket.To solve these problems on the DECnet­ULTRIX software, we implemented an "objectspawner," which creates a socket to which theprocess binds a special address. That addressinforms the DECnet code in the kernel thatthe spawner should be given the connectionrequests for which no other process has declaredan interest. With this mechanism the existingmodel of server process is still supported and canbe used as desired. A process may choose to createits own socket and listen for connections. Itdoes.that if it wants to handle multiple socketsper process or to decrease the connection processingtime by the time required to create a newprocess and execute a file.Using the DECnet object spawner greatly simplifiesthe writing of a new server and providesseveral useful services. A new .server has to bedefined in the DECnet object database by usingNCP. Defining a server involves specifying itsaddress and the file that should be executedwhen a connection for the server arrives. Additionalparameters indicate the type of socket thatshould be created for the server (stream orsequenced packet) and the default user accountto run under if no access control information issupplied by the '.client process. The spawnerauthenticates ever.y.:t:hing for the server and executesthe process in the context of the specifieduser account. Once the process is executing, theserver simply needs to read and write from standardinput and output, set ;up by the spawner tobe directed to the created ocket.Network ManagementThe user interface to network management isprovided via NCP, which on the ULTRIX systemaccepts the same command syntax as that on allother DECnet systems. NCP communicates withthe network management listener (NML) , bothon the local system and on remote systems, toexecute management commands. Local commandscause NCP to con;tmunicate with NMLusing a UNIX ''pipe''; remote ,commands are executedthrough DECnet sockets. NML controls themanagement databases, implemented mostly asfiles with some parameters stored in the kerneland accessed through a special DECnet socketinterface.The access methods and file organization forDECnet databases are quite different from thoseprovided by the 4.2BSD TCPfiP implementation.The TCP fiP databases are organized as a set offiles constructed and mo d ified using any standardtext editor. Those files contain the hostname-to-address mapping and the service nameto-addressmapping. Program access to thosefiles is supported only for read operations. Thislimitation was unacceptable for the DECnet databases,which require full read and write accessusing the NCPfNML programs. NCP supportscommands to add and modify entries for manyDECnet entities. Moreover; the DECnet databasesmust support networks cbntaining many thousandof nodes.We explored several alternative ways of structuringthe databases to provide such programaccess to support write operations. Eventuallywe chose a file format organized as a simplesequential binary file. Reading an element fromthe database involves first allocating enough virtualmemory for the entire file. Then the file isread into virtual memory,' and a linear search isperformed for the desired element. Writing anIelement into the file involves reading the entirefile into the allocated virtual memory. Then thefile is .searched for the :position of the newelement, which is written to the file. Finallythe remainder of the existing portion of thefile is written into its new position in the file.While this "brute force" method was not particularlyelegant, its performance and reliability<strong>Digital</strong> Tecbnlcal]ournalNo. 3 <strong>September</strong> 1986103


The DECnet- ULTRIX Softwarehave proven to be very acceptable, even withextremely large databases. Other methods relyingon indexed and hashed file access proved tobe far more complicated than their marginal performancebenefits warranted.One idea we examined during the developmentof the DECnet-ULTRIX software was tobuild a new network management database thatcould include entries from TCPfiP, DECnet, andother protocols. This idea was abandoned for tworeasons. First, we found that existing calls to theTCP flP database routines did not contain all thenecessary parameters required to support networkaddresses of more than one format. Sinceone project goal was to leave the existing networkprogramming interfaces unchanged, thisidea made impossible the adding of new parametersto those function calls. Second, we felt thatcreating new routines that were to be linked intocustomers' programs would require a significantlydifferent database format, thus requiringthe relinking of existing TCP flP applications.We deemed this relinking to be unacceptable.What we did do, however, was to ensure thatthe DECnet and Internet database routines werecompatible in their naming and calling conventions.In that way a later release could changeboth sets of routines to be "stubs" that calledinto a common base of supporting routines. Weintend to explore this concept further when weadopt a name server-based mechanism for storingcertain network management information.File TransfersIn a DECnet system, file access operations areperformed using the data access protocol (DAP) .File access uses a clientjserver model in whichthe client program contacts a server program toaccomplish some task specified by a user. DAPsupports most of the common file system operations,such as reading, writing, deleting, and listingthe names of files. For the first release of theDECnet-ULTRIX software, we decided that DAPclient operations would be implemented using anew set of file operation commands called the d(for DECnet) commands. They are similar in conceptto the 4.2BSD rep command for remotecopy. The d commands are as follows:dcp - for DECnet copy files (as in the UNIX"cp" command)dis -for DECnet list file directory information(as in the UNIX "Is" command)drm - for DECnet remove files (as in theUNIX "rm" command)dcat - for DECnet type files (as in the UNIX"cat" command)We decided to provide only command-levelaccess to DAP file operations to shorten thedevelopment time for the DECnet-ULTRIX project.The d commands were implemented using aset of file access routines with a calling conven-­tion very similar to the normal C-standard 1/0routines (stdio). We decided not to make thed routines available to customers. We felt thatnetwork file access should be transparently availableto all programs, not just those using a specialset of 1/0 routines. Making the routinesavailable would require extensive changes to theULTRIX kernel, something not possible givenour tight development schedule.The server side of the DAP implementation is afile access listener (FAL) program that is invokedusing the standard DECnet object-spawningmechanism to handle user requests. FAL is astraightforward program that can sometimes fallshort of what users expect it to do. In fact thebiggest challenge we faced in implementing theDAP protocol on the UL TRIX system was to meetthe expectations of both UL TRIX and remoteoperating system users concerning what constitutesreasonable behavior. The DAP protocol hasmany options, but each DAP implementationincorporates a slightly different dialect. Theseslight differences exist because each operatingsystem's file operations are different. Each systemmust map its own way of performing thoseoperations into the DAP operations. Writing eachside, the client and the server, of a new DAPimplementation presents different problems.These problems are exacerbated when onerequires also that no .existing DAP implementationbe changed to work with the new implementation.The client side of DAP drives the fileoperations; the server side is passive, performingonly the operations requested. Most problemsoccur because the UL TRIX system has noenforced record structure within its files, whilemost of <strong>Digital</strong>'s other operating systems performtheir file access using a record orientation. TheULTRIX client code, as implemented in thed commands, cannot simply request other DAPservers to supply data in ULTRIX record format(stream). Instead, the d commands must interpretthe record formats of those other operating104<strong>Digital</strong> <strong>Technical</strong> JournalNo. 3 <strong>September</strong> 1986


New Productssystems and convert data to and from the formatthat ULTRIX users expect.Achieving this capability involved addingknowledge to the commands by · programmingseveral different record formats and attributes. Inmost cases the data is automatically convertedfor the user into the form desired. For cases inwhich that is undesirable, a means is providedfor the user to bypass that conversion. TheULTRIX FAL program must also perform dataconversions even though DAP, as a server, has nosuch responsibility. FAL is forced to convert theULTRIX format stream into the appropriate variablelength format files of the other operatingsystem so that it need not be modified to workwith the DECnet-ULTRIX code.Remote Terminal AccessIn our planning for the initial release of theDECnet-ULTRIX software , we decided not toinclude remote terminal access because itrequired too much development time . Once thebasic networking code ran in the kernel, however,we easily modified the 4.2BSD remote terminalaccess programs rlogin and rlogind to runover a DECnet system. Those programs provideULTRIX-to-ULTRIX terminal access. Later, withminor changes to the protocol , we used themodified rlogind (now called dtermd) to advertiseto other DECnet systems that it could communicatewith the TOPS-20 remote terminal protocol.Using this mechanism we provided accessto the ULTRIX system from non-ULTRIX DECnetsystems that had previously implemented supportfor the TOPS-20 software . This capabilityproved so useful that we decided to include fullremote terminal support in DECnet-ULTRIXVersion 1.0.In the DECnet system, remote terminal accessoperations are currently performed using thecommand terminal (CTERM) protocol. Remoteterminal access uses a hostjserver model inwhich the server (which controls the physicalterminal) contacts the host to request access tothe remote system. Once that connection hasbeen established, the host controls the terminalthrough the CTERM protocol.The ULTRIX system supports a "pseudoterminal"driver that allows a program to controlother programs through what appears to be anormal terminal interface. This control allowsthe daemon program ( dlogind) on the host toprovide a standard interface to users who areremotely logged in. We did, however, encountersome problems trying to use this capability.The CTERM protocol exports terminal I/0requests from the host to server, which executesthem, thus reducing! the host processingload. The pseudoterminal interface providestransparent buffering berJ.een the controllingprogram and the programs controlled. In thatway the controlling program never knows whenanother program has issued a read request; therefore,the controlling program cannot know whento ship a read request to the server. Fortunately,the protocol supports a notification function thatthe server sends to the host !if a user types a characterand there is no outstnding read request.Using this function we allowed the server toissue a "pseudoread" request when the firstcharacter is typed. Usually, the request is for afull line of input, thus allowing the server to performcharacter interrupt processing and localcharacter editing.Using the remote terminl access protocol, aterminal can connect logically to a remote systemhaving very different ontrol conventions,such as control characters and line terminators.For this reason the server program ( dlogin) disablesall special character processing by the localterminal driver. The program then processeseach character individually to perform any functionsrequested by the host ystem. A two-charactersequence (by default a filde [ -] followed bya carriage return) is reserved to allow entry to alocal command mode (the first character may bechanged by a command line switch) . This localcommand mode provides access to the shell onthe local system. This mode also provides commandsto log in the terminal session to a file andto suspend or terminate the current remote terminalsession.MailOf all the fu nctions in the bECnet-ULTRIX software,mail was the easiest to implement. Themail system included with the ULTRIX systemwas already quite sophisticated. This mail systemsupports multiple mail protocols and addressformats with a central mail program named sendmail.Sendmail is driven by a configuration filethat can be tailored on eath ULTRIX system todefine new address format and mail-forwardingrules. We decided to add upport for the mostcommon DECnet mail protocol, called mail- 11,supplied with DECnet-VAX and other DECnetjI<strong>Digital</strong> Tecbnlcal]ournalNo. 3 <strong>September</strong> 1986iIJ.105


The DECnet- ULTRIX Softwaresystems. Supporting the mail- 11 protocol was asimple matter of writing a mailer program adheringto the sendmail interface and speaking themail- 11 protocol . Once this program was written,we modified the sendmail configuration fileto handle DECnet mail addresses properly so thatthe DECnet mailer would be invoked- when necessary.Only a few minor problems were encountereddealing with different ways to parse mailaddresses and different comments contained inmail addresses between VMS systems and theULTRIX sendmail program.With this new capability, ULTRIX systems cannow act as mail gateways between DECnet networksand any other type of mail network supportedby UNIX systems.DEC net- UL TRIX PerformanceOne original goal for the DECnet-ULTRIX softwarewas to provide a level of performance similarto that of the TCP jiP domain. In general, wemet this goal. Both rep and dcp transfer files atapproximately the same rate, and while rep isslightly faster,. it requires a larger percentage ofthe available CPU time.The following measurements were takenbetween two VAX- 11/780 systems on a privatenetwork:dcp average filetranfer raterep average filetransfer rateftp average filetransfer rateDECnet maximumdata transfer rateProject Management51 kilobytes (KB)per second51KBper second27KBper second1200 kilobits (Kb)per secondThe DECnet-ULTRIX project began in early1984. The Berkeley 4.2 version of the UNIXsoftware had just been selected as the basis for<strong>Digital</strong>'s UNIX software for the VAX system. Version1.0 of the ULTRIX system was well on itsway to completion. Our task was to define theDECnet-ULTRIX project, build a project team,and deliver a Phase IV implementation for theULTRIX system in the shortest possible time.At the start, each team member had a lot ofDECnet and software development experience,but very little UNIX expertise. As we learned theintricacies of the UNIX software, we discussedmany of our development ideas with peoplefrom <strong>Digital</strong>'s UNIX Engineering Group. Wedecided our first project should be to implementan Ethernet end node using the DNA Phase IVprotocols. This implementation would includesupport for a programmable user interface, mail,and file transfer. Our decision to add a remoteterminal capability was made later. Experiencewith other DECnet implementations had shownus that these functions would be both necessaryand sufficient to satisfy the majority of mostusers' needs.We built prototypes whenever possible to getfunctions working quickly. These prototypestested the viability of the interfaces and variousimplementation approaches. Often we exploredseveral different designs before choosing onethat worked best as a prototype.These shortcuts can be very valuable, providedthey are followed by a thorough review of thework done. On this project this method workedvery well. The ULTRIX system ran as a DECnetend node in late summer 1984, after which timeseveral UNIX utilities were quickly converted touse the DECnet software.Our past experience helped us to gauge theamount of work required in each developmentarea. That experience allowed us to start workearly on network management since previousimplementations had shown that area to be oneof the largest bodies of work. Throughout theproject we succeeded in keeping work on eachcomponent from being blocked by dependencieson other components. With tight project managementwe put the DECnet-ULTRIX softwareinto field test less than one year after the projectbegan.The entire product was written in the C programminglanguage; a large amount of codewas later transported to other projects, notably toDECnet-DOS. Our experience transporting theimplementation improved the quality ofthe code since many components were testedusing additional interfaces and different codereviewers.SummaryThe DECnet-ULTRIX project provided manychallenges. The most constant one was how tobuild a product that appeared similar to other<strong>Digital</strong> products, yet acted like a natural extensionto the UNIX base upon which the productwas built. The compromises required to meet106<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 1986


New Productsthat challenge forced us to address many areas,from command formats to the structure of thewritten documentation.The DECnet-ULTRIX project met all itsgoals for functionality, performance, and schedule.The completed product was deliveredto <strong>Digital</strong>'s Software Distribution Center only16 months after the project's inception. TheDECnet-ULTRIX software will be followed byother releases, thus adding functions and followingthe migration of the DNA strategy to Phase V.Since the ULTRIX system supports TCPjiP, theaddition of DECnet has provided a natural basefor a DECnet-to-TCP jiP gateway. While notbeing the primary focus for this product, theessential fu nctions required for a gateway arenow present. This fact is significant becauseTCP/IP represents a de facto standard for communicationsprotocols in the UNIX community.The DECnet-ULTRIX product is thus able toprovide a level of integration for the UNIX productsof other vendors into <strong>Digital</strong>'s computingenvironment. Future standards in all areas of OSIwill provide a better degree of integration for theDECnet system and the UNIX community. Untilthey are widely implemented, however, DECnetcapabilities on the ULTRIX system provide avaluable bridge between the two environments.References1. J. Forecast, J. Jackson, J. Schriesheim, "CommunicationsExecutive Implements ComputerNetworks," Computer Design (November1980):'71-75.I2 .. N. La Pelle, M. Segar, arid M. Sylor, "The Evolutionof Network Management Products,"<strong>Digital</strong> <strong>Technical</strong> journal (<strong>September</strong>1986, this issue) : i 17128.3. M. Sylor, "The NMCCjDECnet MonitorDesign,'' <strong>Digital</strong> <strong>Technical</strong> journal (<strong>September</strong>1986, this issue): 129-1 41.AcknowledgmentsThe authors wish to acknowledge the help of themembers of the DECnet-ULTRIX developmentteam, who maintained a high degree of energyand enthusiasm throughout the project. Thesemembers were Kim Buxton, Ed Ferris, KarenGillin, and Bill Spencer. Thanks are also due toSteve Seufert, Faye Allen, Marie Rowntree, TerriBuckley, Pat Nelson, and those individuals in theULTRIX Engineering Group who worked with usthroughout the project.<strong>Digital</strong> <strong>Technical</strong> JournalNo. 3 <strong>September</strong> 1986107


Peter 0. MierswaIDavid J. MittonMartha L. Sp enceThe DECnet-DOS Sy stemThe DECnet-DOS system is an implementation of the <strong>Digital</strong> NetworkArchitecture standard fo r both <strong>Digital</strong>'s Rainbow personal computersand those of IBM Corporation. This system provides all the services associatedwith a DECnet implementation. These include a choice of communicationtechnologies, adaptive path routing over complex topologies,and network monitoring and management. DECnet-DOS also supp ortstask-to-task programming, remote file transfer and access, remote terminalservices, and network mail services. Those tasks are all performed ona family of low-speed, small-memory processors.Over the past few years the low cost and availabilityof appl ications for personal computers(PC) have been enticing an increasing number ofbusinesses to acquire them. At first, each PC userworked with his own computing resources in astand-alone manner. Eventually, however, theseusers found they wanted to share programs, data,and messages with each other. They also wantedto take advantage of the databases, processingpower, and larger applications on the large computersystems in their companies.Within <strong>Digital</strong> Equipment Corporation, thereis an engineering group responsible for theimplementation of DECnet software on <strong>Digital</strong>'ssmall systems. This group believed that a DECnetimplementation for personal computers couldeasily satisfy these users' desire for data and programsharing. In 1984, this grop initiated a projectto implement the <strong>Digital</strong> Network Architecture(DNA) on personal computers that used the·MS-DOS operating system.The implementation of DECnet software, amature, layered communications architecture,on the personal computers of both <strong>Digital</strong> andIBM Corporation presented a number of interestingproblems. The team had to work with asynchronousand Ethernet communications controllersand a number of different, relatively slowprocessors, all built by other companies. Theyhad to work within the confines of the MS-DOSsystem, a small operating system with few systemservices capable of supporting multiple communicationtasks in the background. Moreover, theresulting product had to be compatible withthousands of application programs already writ-ten by hundreds of different companies. Andbecause of the volatile nature of the PC business,this product had to provide a wide range of basicnetwork services and layered applications. Sinceproducts for PCs were being rapidly introduced,our goal was to design, implement, and test thisproduct in a fairly short time period.It was clear to the project team that the onlyway to meet the time-to-market goal was to followone strategy. First, the project had to adherestrictly to the DNA architecture, thus eliminatingany temptation to implement new and uniqueprotocols not supported by other existing implementations.Second, the project had to borrowsoftware freely from other products that hadbeen or were being developed.This paper presents a unique model for therapid development of a specific product by utilizingwork done on many other projects. Thecombination of original software and borrowedcode, all within the framework of DNA, allowedus to introduce the DECnet-DOS system in a relativelyshort time period. The overall problemswe encountered and how we solved them arefirst presented within the context of generalissues. Then each layer of the architecture isused as a springboard from which to discuss particularproblems encountered in that layer.General Development IssuesCoding Style and StandardsThe time-to-market goal dictated that both codingand debugging had to be done as quickly aspossible. We immediately agreed upon using a108<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 1986


New Productshigher-level langage and fairly strict codingstandards to shorten our development time. Wealso had to initiate a search for code fragmentsalready existing within <strong>Digital</strong> that were alsorequired in our product. We felt that incorporatingrelated code could also greatly reduce boththe development and debugging time.The MS-DOS environment is quite similar tothe UNIX environment. Many C compilers basedon MS-DOS machines offer libraries similar tothe one on the UNIX system. Therefore, it wasquite fortuitous that the DECnet-ULTRIX system,<strong>Digital</strong>'s DECnet implementation for the ULTRIXsystem, had just been completed. It was writtenalmost entirely in the C programming language.We felt that some of the DECnet-ULTRIX codecould be used successfully in DECnet-DOS. Ourstrategy was to do the following tasks:• Write as much code as possible in C. Do notpreclude the use of assembler languageif required to access devices or servicesunavailable in C or to reduce execution timewhere necessary.• Use common coding and style practices forall code.• Adopt the DECnet-ULTRIX programminginterface. The programmer's access to networkservices is not part of the architecturebut is specific to the operating system andthe DECnet implementation.• Port code from the DECnet-ULTRIX softwarewhenever it is applicable and easier thanwriting original code.• Include trace facilities in the basic driverand all utilities as part of the design.Training IssuesAt the start of this project, few engineers in<strong>Digital</strong>'s Network and Communications Grouphad extensive experience with MS-DOS internalsor C programming. To prevent a certain delay ofseveral months while people were trained, theproject team decided to pursue two avenues ofexternal assistance: temporary in-house consultants,and external engineering. Three consultantswith MS-DOS and C programming backgroundswere employed, being gradually replacedby <strong>Digital</strong> employees as they completedtheir training. An arrangement was made with theComputing Resources Department of the Univer-sity of Texas Health Science 1 Center, San Antonio,to implement the file transfer utility. They hadexpertise developing an in-house DECnet implementationon another small system.Background Task DesignThe MS-DOS system is a single-task operating system;no services are providd to support the concurrentexecution of two tasks. This fact raised aproblem because a number ! of the requirementsfor the DECnet-DOS system! demand the concurrentoperation of some network tasks with theuser's current application. Such tasks include the.transmission and reception of periodic routingand line confidence messages, and the concurrentoperation of multiple connections. Moreover,a particular constraint was that applicationprograms have to run as wrtten, without codingchanges, while the network! is active. We simplycould not require that the thousands of applica-1tions currently on the market for MS-DOS-basedsystems be changed.To solve this problem, we devised a schemethat would allow network tasks to run either atperiodic intervals or from an interrupt from acommunications controller. This design envisionedthat network tasks wre interruptable andcould run in the background completely transparentto the application prgram running in theforeground. This scheme I had to be designedquickly and work the first ! time for the projectteam to complete the product on schedule.Unfortunately, the design of task scheduling inthe DECnet-ULTRIX software was incompatiblewith our scheme. Therefore, that portion of theDECnet-ULTRIX code could not be ported tosolve this problem. However, the interrupt architectureof the PDP- 11 syst.em and those of theIntel processors in the target machines are verysimilar. Therefore, the intetrupt design from theDECnet-RSX software thatlruns on the PDP- 11system could be used for the , interrupt functionin DECnet-DOS.The DECnet-RSX CPU scheduler uses a set ofwork queues in which request packets calledcommunication control blocks (CCB) arequeued for processing. Any data buffers associatedwith the requests are pointed to by fieldswithin the CCBs. ;For example, if a messag is received from thecommunications controllet, a CCB will be createdby the device driver fr the controller. Thereceived data is placed in a. buffer pointed to by<strong>Digital</strong> TechnkalJournalNo. 3 <strong>September</strong> 1986109


The DECnet-DOS Systemthat CCB, which will be placed in the workqueue of the routing layer. This layer, when run,will process the buffer pointed to by the CCB. Iffurther processing is necessary, this layer willplace the CCB in the work queue of another networkprocess.This scheme would work perfectly well if theCPU scheduler were designed to scan the workqueues both at periodic intervals and after controllerinterrupts. Then the scheduler would dispatchthe network tasks to empty the workqueues. And all these actions must happen sothat they are transparent to the current foregroundapplication.To fulfill those requirements, we designed amemory-resident scheduler that performs allthose actions by using a technique called interruptshelling. To shell an interrupt, the schedulerfirst records the address of the currentinterrupt handler, then replaces it with thescheduler's own address. Thus when the interruptoccurs, the CPU state and the interruptreturn address are saved, and the scheduler,instead of the original interrupt handler, is calleddirectly.Upon entry for an interrupt, the schedulersaves the current context of the system and simulatesan interrupt to the original interrupt handler.When the interrupt processing completes,the interrupt handler will return to the scheduler- not the foreground task. Therefore, thescheduler now gains control. The interrupt processingis now complete and the time-criticalprocessing has finished. The scheduler can nowenable interrupts and examine all work queuesfor tasks that need to be run. After all tasks havebeen run, the scheduler finally returns to theinterrupted foreground task.Two examples will help to make this processmore clear.First, consider an interrupt from an Ethernetcontroller that signals the successful reception ofa message from some other node in the network.When the controller causes the interrupt, thescheduler gains control with interrupts disabled.The scheduler saves the return address and stateand dispatches to the "real" interrupt handler ofthe Ethernet controller. The interrupt handlerperforms the following series of actions:1. Analyzes the interrupt2. Determines that a message has beenreceived3. Allocates a receive buffer4. Copies the message from the Ethernetcontroller to the receive buffer5. Resets the controller to receive anothermessage6. Calls a subroutine to insert a CCB pointingto the message onto the work queue inthe background network processThe Ethernet interrupt controller then dismissesthe interrupt, and control returns to the scheduler.The scheduler now enables interrupts andscans the work queues for additional work.The CCB containing the received message isfound on the work queues and the routing layeris called to completely process the message.When all work queues with immediate work areempty, the scheduler finally returns to the originallyinterrupted code in the user's applicationprogram.The second example deals with handling aninterrupt from the clock. In this case exactly thesame code path as the one in the first example isfollowed. The clock interrupts and the schedulergains control with interrupts disabled. It savesthe return address and state and dispatches tothe "real" clock interrupt handler. This handlerwill update the date and time and dismiss theinterrupt. Control now returns to the scheduler,which enables the interrupts, scans the timerqueues, and dispatches any process whose timerhas expired. When all such processes havebeen completed, the scheduler returns to theoriginally interrupted code in the applicationprogram. ·Using the scheduler to handle interrupts andcontext switching allows network processing tobe performed in the background while an MS­DOS application is running in the foreground.The interrupt shell ensures that a minimumamount of code runs with interrupts disabled.The background process scheduling ensuresthere is no network performance loss due to apause between message receipt and message processing.Overview of the DNA ArchitetureTable 1 lists the layers of the ISO model for datacommunications, along with the correspondingDNA layers and the appropriate DECnet-DOScomponents within each layer.110<strong>Digital</strong> TecbnicaiJournalNo. 3 <strong>September</strong> 1986


New ProductsTable 1Data Communication LayersISO LayerApplicationPresentationSessionTransportNetworkData linkPhysical linkDNA LayerUser/network managementNetwork applicationSession controlEnd-to-end communicationRoutingData linkPhysical linkDECnet-DOS QomponentsIIProgramming libraryJob spawner INetwork control program (NCP)Network test utility (NTU)Network file transfer (NFT)Virtual terminal service (SETHOST)Virtual disk/printer service (NDU)Network mail (MAIL)File access server (FAL)SESSIONNSPROUTINGIAsynchronous :DATA LINKEthernet DATA LINKAsynchronous controllersEthernet controllersIIData Link ServicesAsynchronous Data Link LayerThe <strong>Digital</strong> Network Architecture standard speci- _fies a protocol providing a reliable data communicationspath between two processors oversynchronous and asynchronous serial communicationlines. This protocol is the <strong>Digital</strong> DataCommunications Message Protocol (DDCMP) .The asynchronous data link layer providesDDCMP protocol processing and device driversupport for the asynchronous controllers contained.inthe PCs.We found that no existing software could beborrowed for the DDCMP protocol modules.However, existing DDCMP software programsfrom other products were used as models to constructour own modules. We also had to designand code all device drivers for the various asynchronouscontrollers. At first we were not sureexactly how the asynchronous controller chipand the interrupt controller chip workedtogether. Reading the specifications from thechip manufacturers along with the documentationfrom the makers of the controller boardsresolved any questions we had. The code wethen developed worked properly at lowerspeeds; at 9600 baud, however, we found thatcharacters were being lost during reception.After calculating the bytes per second at9600 baud and the instrutions per second onthe lower-speed PCs, we raiized that very fewinstructions could be exetuted between eachreceived character. In this ase the advantages ofI .coding in a higher-level !language were outweighedby other considerations. After carefullyrecoding the interrupt handler for received charactersin assembler language, we reduced but didnot eliminate the character loss. Using debugtracing of the interrupt stack, we discovered thatthe PC BIOS code handling the clock interruptscould leave the interrupt; system disabled forlong periods. Changing this clock interrupt codesolved the character-loss problem.IEthernet Data Link LayerIThe Ethernet data link layer provides buffer managementservices, transmfts and receives messages,and dispatches the received messagesbased upon their protocol types. The goal forDECnet-DOS included support for a number ofEthernet controllers. Unfortunately, the code forthe device drivers is often the hardest to designand debug.Our search for existing code led us to two separateengineering groups lwithin <strong>Digital</strong>. Bothgroups had already written device drivers for PCbasedEthernet controllers.! We decided to use a<strong>Digital</strong> Techntcal]ournalNo. 3 <strong>September</strong> 1986Ill


The DECnet-DOS Systemdata link layer for buffer management andreceived-message dispatching that was commonto these drivers. We also borrowed several otherdevice drivers, all having consistent callingsequences. As a result, our team had to write thecode for only one device driver; the code for theother device drivers and the data link layer wasprovided to us complete and partially tested.Network Layer Services (Routing)The modules that perform routing functionshave well defined inputs and outputs that arealmost entirely independent of the operating systemtype. Messages arriving from the NSP layermust be passed to lower layers, depending uponinformation stored in the routing database. Messagesalso arrive from the data link layer and mustbe passed to higher layers, depending uponinformation stored in the routing database.Since the code is relatively independent of thesystem, we were able to use the routing modulesfrom the DECnet-ULTRIX system with very fewchanges.Transport Layer Services (NSP)The NSP layer is the one in which logical linksare created, maintained, and destroyed. Linkmaintenance includes all the timing and retransmissionof messages necessary to maintain logicallink integrity. NSP also segments large userbuffers into smaller network buffers and ensuresthat they are reassembled correctly.We studied the feasibility of porting the NSPmodules from the DECnet-ULTRIX system to theDECnet-DOS system. However, the differences inmemory management and process schedulingmade the conversion appear too costly. Therefore,we re'Yrote the modules in the NSP layer ofthe DECnet-ULTRIX software, but retained thesame names and functions.This code was the most difficult to develop.Manipulating dynamic memory, buffers, and timingfor multiple internal tasks is very specific tothe operating system. Code from other implementationsto perform these functions was notvery helpful.In addition, our initial device drivers for asynchronousand Ethernet connections often los'tcharacters or messages. Since NSP maintains theintegrity of each logical link, the retransmissionalgorithms had to be complete and correct veryearly for both low- and high-speed failures.These difficult problems, like many others,were solved by using the algorithms from otherimplementations.Session Layer ServicesThe session layer, where user requests arechecked and dispatched for processing, ishighly dependent on the type of operating system.The single-task environment of the MS-DOSsystem provides no process context identificationor integrity assurance for the user. As aresult, we could not use the traditional design fora DECnet session layer, in which logical linkownership is known by a process code assignedby the system.To design a new session layer, we chose tomake the logical links into system-wide entitiesand retain all information about those links inthe background network process. In that way theidentifiers for logical links would be uniqueacross the entire system. This solution ensuresthe integrity of the logical link database even if auser program creates a logical link and thenexits. One side effect of this design is that anapplication can create a logical link, exit, andthen be run again later to access the existing logicallink. This effect allows the SETIIOST applicationto interrupt a virtual terminal session,return the user to the MS-DOS system to performlocal tasks, and then resume the terminal sessionlater in the same state in which the session wasinterrupted.The actual session interface that we providedwas modeled after the UNIX 4.2BSD networksocket interface as implemented in the DECnet­ULTRIX system. Since the MS-DOS code is similarto the UNIX code, we felt the interface was themost appropriate model to use. Unfortunately,we could not use any of the UNIX code in thisarea, so we had to write our own implementation.By providing the same interface, however,we could easily share network application programswith the DECnet-ULTRIX project.Presentation Layer ServicesNetwork File TransferThe network file transfer utility (NFf) providesfile access services between the PC and remotenodes. NFf supports a number of activities.• Files can be copied in both directions.• Remote files can be displayed on the PC'sconsole.112Dtgital TecbntcaljournalNo. 3 <strong>September</strong> 1986


New Products• Listings of remote directories can also bedisplayed.• Remote files can be deleted.• Remote files can be queued to be printed orexecuted at the remote node.The files to be accessed can be specified usingwild cards and file lists.In addition to these services, NFf is responsiblefor reformatting data if it is copied to or froma remote system with a different file system format.To be a DECnet implementation, NFf had topass strict certification tests that ensured its compatibilitywith all other DECnet implementations.Passing those tests was our single biggesthurdle in this area.The project team again decided to use existingdesigns and code for NFT, the common parserand common message processer being used toparse NFf commands. We wrote the data formattingand protocol modules using DECnet-RSXNFf as a guide. Network 1/0 was done using theprogramming interface library, which providesprogrammers with network access.Using the NFf implementation in the DECnet­RSX software proved to be a wise idea. Thatimplementation had been in use for manyyears; therefore, its algorithms and designwere well tested. The DECnet-DOS NFT implementationwas so successful that it was one ofthe first applications to run in house duringdevelopment.The file access listener (FAL) provides thesame services as NFf but runs on the PC to giveother network nodes access to the PC files. TheFAL utility was begun very late in the project. Asa result we were able to port the completed DECnet-ULTRIXFAL to the MS-DOS system, a taskcompleted in under two weeks. This gave theproject team enough time to add a number ofattractive optional features, such as supportingsimultaneous multiple connections.Virtual Terminal ServiceThe SETHOST utility allows the keyboard andscreen of a PC to emulate a Vfl 00 terminal connecteddirectly to a remote DECnet node. To dothat, this utility must provide not only emulationsupport for the keyboard and screen but also protocol-handlingsupport for the remote terminalprotocol used to communicate with the node.Two protocols are currently used on <strong>Digital</strong>'sproducts to provide remote terminal support:CTERM and LAT. CTERM is layered onto theDECnet software and provides remote terminalsupport to any node in a DECnet network. TheLAT protocol is independent of the DECnet softwareand provides remote trminal support onlyamong the nodes on a single Ethernet.IWe constructed the SETOST utility entirelyout of existing code, the common parser beingused to process SETHOST c'ommands. The softwareto emulate the VT l 00 terminal wasobtained from another engineering group within<strong>Digital</strong> . The handling code for the CTERM protocolwas ported from the DECnet-ULTRIXsoftware with very few changes. The handlingcode for the LAT protocol was obtainedfrom still another engineering group. All network1/0 is done using the. programming interface·library.Virtual Disk and PrinterVirtual device services support disk and printerdevices that are located at remote nodes yetappear to be local to an application program.Our goal was to provide this:service in a transparentmanner so that no chanies had to be made toapplication programs or to the MS-DOS system.Our final design for these services was quitesimple. The services are provided by two components:the network device utility (NDU), and thevirtual device driver. The NDU accepts commandsfrom the user to establish either a virtualdisk volume or a virtual printer device at aremote node. The logical ;link is made to theremote system by NDU, and/the logical link ID ispassed to the virtual deviqe driver resident inmemory. This device driver s written to conformto the standards for MS-DOS device drivers. Itloads at system boot time by the standard MS­DOS-loadable driver technique and accepts standarddevice 1/0 requests from the MS-DOS filesystem. The driver then executes these requestsby performing the equivale.nt data access protocolsequences on the logical link established byNDU.'!The data access protocol 'chosen was the sameone used for file transfer. This choice allowed usto use existing DECnet implementations onlarger systems for the virtual device support.Thus the need to design and implement a specializedprotocol for a numbr of different operatingsystems was eliminated.<strong>Digital</strong> TecbnicaljournalNo. 3 <strong>September</strong> 1986113


Tbe DECnet-DOS SystemNDU and the virtual device drivers were builtfrom three subcomponents:• A common parser and common message processor(described in the next section)• A small library of subroutines that createremote files and open them using the dataaccess protocol (These subroutines weretaken directly from NFT.)• The programming interface libraryNetwork Mail<strong>Digital</strong> provides network-wide mail services for anumber of its systems. These utilities allow usersto compose messages directly within the mailutility or to use an editor or pre-existing text fileas the source . Text can be sent to one or moreusers on a multitude of systems, even using a distributionlist from another text file .The DECnet-DOS mail utility was completed ina short time by combining the common parserwith the mail utility from the DECnet-ULTRIXsoftware. The unique requirements of a smallsystem did present some problems. In a DECnetnetwork, node addressing is done by numericaddresses. However, users often prefer to usenames, which can be more easily remembered.To facilitate that use, each node maintains a databasethat maps names to addresses. Now, sucha database for a large network would be toolarge for a small PC to keep on disk or searchquickly. Yet the PC users may want to send mailto other users anywhere in such a network. Forexample, <strong>Digital</strong> has an in-house network withover 1 0, 000 nodes, any one of which can beaddressed by any other. To solve this problem weimplemented the mail-forwarding feature of themail protocol. That feature allows a PC user tosend a message to an unknown node by asking aknown node to forward the message. Thus the PCdatabase keeps a much smaller list of knownnodes, which is considerably more manageable.The project team originally had a requirementfor receipt and storage of network mail on thelocal disks of the PC. Our investigations showed;however, that the engineering cost of developinga background task that could record the receivedmail on disk was very high.Since 1/0 in the MS-DOS system is singlethreaded, the network software, running in thebackground, cannot perform 1/0 while an applicationprogram is performing it. To overcomethis restriction, our PC mail service automaticallytells the receiver of mail from the PC whichlarge system should be sent the replies.Application Layer ServicesApplication Program CommandParsingThe DECnet-DOS product would contain as manyas ten different application programs, includingone each for file access, virtual terminal support,virtual device support, and mail services; andtwo for network management. Our requirementscalled for common, easily translatable messagesand common command parsing, inCluding characterdelete, line and character recall, and abbreviation.These standards were important becauseit would be very costly to have ten differentengineers coding in ten different ways . Such anapproach, without standards, could have introduceddifferences in the applications, whichwould be difficult for the end user to learn andremember.To avoid this problem we designed and implementeda common parser and a common messageand help processor. The common parser isdriven by a parsing command file that supportsabbreviation, character delete, and line recall.The common message processor is also driven bya message fi le and performs fast look-ups of textstrings and blocks. This design greatly reducedthe time to code and debug the utility programsand made their translation to foreign languagesfairly straightforward.Network Ma nagementThe network management architecture in theDNA architecture re quires access to currentparameters, ·counters, and statistics, all kept inthe volatile database. It also needs the parameters,kept in the permanent database, that will bein effect the next time the network software isstarted. Data in both databases should be accessiblefrom either the local node or a remote node.However, the single-threaded restriction thatthe MS-DOS system imposes on file 1/0 makesaccess to the network management databasesvery difficult. This restriction not only affects themail service, as explained earlier, but preventsDECnet-DOS from providing remote access tolocal network management databases. It also preventsthe background network software fromaccessing disk files. Thus the project team felt114<strong>Digital</strong> TecbnicalJournal·No. 3 <strong>September</strong> 1986


New Productsthat the cost of providing remote access to localnetwork management data was too high in termsof resident memory usage and design complexity.Therefore, the DNA architects granted anexception to the DNA requirement that remotenodes have access to counters and parameterslocal to PCs.Solving the problem of database access by thebackground network task, however, was essentialto the success of our project. Therefore, weadopted the following design. On its first run,the network software performs as a foregroundtask. As such the software first performs initialization,then shifts to the background. Duringinitialization, the network software, running inthe foreground, can safely perform disk 1/0. Atthis time it reads the permanent database, whichestablishes the parameters necessary for runningthe network, such as buffer counts and sizes. Thevolatile database is kept in memory in the backgroundnetwork task. The network control program(NCP) queries the background networktask when access to the volatile database isrequired. Similarly, NCP performs disk 1/0 tothe permanent database when its access isrequired.Node name-to-address translations, usuallyperformed by resident DECnet code, wereassigned to the application layer in the DECnet­DOS software. This shift overcame the backgrounddisk 1/0 restriction. Also, even thoughremote nodes cannot access our local DECnet­DOS databases, they can perform loopback teststo the DECnet-DOS node. For Ethernet configurations,remote nodes can query the data link Jayerfor identification information and data link errorand traffic counters.NCP and the network test utility (NTU) werewritten using the common parsing package andthe programming interface library. The aCtionroutines performing network management andtesting could not be ported from other implementationsbecause of the MS-DOS restrictions.As a result, creating the routines to perform networkmanagement consumed a large part of ourdevelopment effort.Network Programming ServicesAll DECnet implementations make the basicprogram-to-program communication servicesavailable to programmers through some sort ofsystem call or subroutine library. That allowsprogrammers to develop their own networkapplications and services. DECnet-DOS providesan assembler language interface, a C programmingsubroutine library, and transparent MS-DOSfile access services.An assembler languag programmer canperform basic task-to-task ervices by filling aIdata structure with control information, thenissuing a software interrupt that is serviced bythe DECnet process resident in memory. Suchservices include logical link creation anddestruction, message transmission and reception,and status and control .A C language programmer can access the sameservices through a subroutine library. We choseto make this interface c9mpatible with theDECnet-ULTRIX programmer interface. Thischoice decreased the development costs of anumber of our application modules by makingthe DECnet-ULTRIX applications more portable.It also allowed the project team to begin todevelop MS-DOS-specific applications under theULTRIX system. Our customers would benefitfrom the same advantages. iUsing these assembler laq.guage or C services,however, requires a good uhderstanding of logicallink management and networking concepts.Many applications need only one simple connection·to access a file or program on another system.Unfortunately, existing applications oftencannot be easily rewritten to take advantage of·network calls.To solve this problem, we designed servicesfor transparent network aJcess. These servicesmake network access possible for the thousandsof PC applications already written and make thedevelopment of new network applications easier.Two tasks, one for remote file access and onefor remote program communication, are firstloaded into memory. These tasks then take controlof the software interrupt used by all programsfor services offered by the MS-DOS system.Each interrupt made by !an applications programrequesting an MS-DOS service is examined.If the request is a file OPEN or CREATE, the filespecification is also examined. File specificationsthat begin with a double backslash are notvalid MS-DOS fi le names and instead signal arequest for network servics. All other MS-DOSservice requests are passed on to the operatingIsystem for processing. The '!'intercepted" servicerequests are processed by :the memory-residenttasks that mimic the actions of the MS-DOS system.Thus an application to display a file on the<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> ·J986115


Th e DECnet-DOS Systemconsole can access either a local file with thespecification "local.fil" or a remote fi le atanother node.Although a wide range of programmer serviceswas offered to single application programs, thesingle-tasking nature of the MS-DOS system madeit difficult to run a PC offering multiple services.To solve this problem, we added a service to thesession layer . That service allows a single applicationto receive a request for any other service .Using this capability, we wrote a program calledthe job spawner, which receives all requests forservice. For each request, the job spawneraccesses a database for the name of the applicationthat must be run to service that particularrequest. Upon finding the application, the jobspawner runs it to completion and then waits forthe next request.SummaryThis implementation . of DECnet-DOS provides awide range of reliable communications servicesbetween personal computers and larger systems.The development of this software was successfu land completed close to schedule, made pos _siblebyGeneral ReferencesDECnet <strong>Digital</strong> Network Architecture (PhaseIV) General Description (Maynard: <strong>Digital</strong>Equipment Corporation, Order No. AA-N 149A­TC, 1982).J. Forecast, J. Jackson, and J. Schriesheim,"The DECnet-ULTRIX Software," <strong>Digital</strong> <strong>Technical</strong>journal (<strong>September</strong> 1986, this issue):100-107.J. Forecast, J. Jackson, and J. Schriesheim, "CommunicationsExecutive Implements ComputerNetworks," Computer Design (November1980): 71-75.S. Leffler, W. Joy, and R. Fabry, "A 4.2BSD InterprocessCommunication Primer," University ofCalifornia <strong>Technical</strong> Report, Berkeley, California(1983) .S. Leffler, W. Joy, and R. Fabry, "4.2BSD NetworkingImplementation Notes ," University ofCalifornia <strong>Technical</strong> Report, Berkeley, California(1983) .• Strict adherence to a proven communicationsarchitecture• Porting existing designs, algorithms, and codefrom other software projects• Strict, independent certification and performancetest proceduresAn adherence to company-wide architecturesalso ensures that· future communication technologies,both hardware and software, can beeasily integrated with the existing DECnetarchitecture.116<strong>Digital</strong> TecbnicalJournalNo . 3 <strong>September</strong> 1986


Nancy R. La PeUejMarkJ. Segerb:fark W. SylorIThe Evolution of NetworkAlanagent ProducThe management of data networks bas evolved at <strong>Digital</strong> since 1978,although the management of voice networks bas been a more recent phe·nomenon. <strong>Digital</strong>'s first data network management products mnagednetworks of DECnet nodes. Our capabilities now include the managementof diverse data network components by means of several different prod·ucts described in this paper. The integration of these data network man·agement capabilities through a common architecture, user interfaces,management databases, and protocols is a major short-term gal. Theintegration of voice and data network management is a much longertermgoal. The voice management product presented here wiU be 'part ofthat future integration.The size and complexity of networks have beengrowing at an accelerating rate. For example,over the last ten years the size of <strong>Digital</strong>'s internalnetwork has grown from a few communicatingsystems to over 10,000 computer nodes, distributedthroughout 250 sites in 37 countries.We are currently adding about a hundred newsystems per week to this private network. Thisrapid growth has led to a need for more-sophisticatednetwork management capabilities to controlsuch networks. This paper describes thechanging needs of network management, how<strong>Digital</strong>'s products and capabilities have evolvedto meet those needs, and some directions forfuture evolution.Traditionally, networks have come in two categories:data and voice. <strong>Digital</strong> supports manydata network architectures, including BISYNCH,SNA; X.25, Ethernet, and OSI. However, theprimary one supported is the <strong>Digital</strong> NetworkArchitecture, or DNA, which defines our DECnetproducts.The Network EvolutionSome basic network management capabilitieswere added to the DNA architecture early in itsevolution (1978) . These capabilities includedthe manual on-line observation and control ofboth local and remote network nodes. Includedin the DNA design was a network control programthat was to be implemented consistently<strong>Digital</strong> <strong>Technical</strong> JournalNo. 3 <strong>September</strong> 1986across all DECnet produCts.1 That programwould allow a network mahager to control theoperation and configuration of the network bymanipulating operational parameters. The programwould also allow him to observe how wellthe network operated by providing current statusinformation, and network traffic and error data.IBasic Management Capabilities fo r theWhole NetworkAs other architectures and protocols emerged,new products, such as X.2,5 and SNA gateways,and local area network (LAN) bridges, neededthe same capabilities to b managed as did theDECnet products. This requirement broughtabout the first major evolutionary trend in networkmanagement. It became clear that DNA hadto be extended to accommodate the managementof connections to these non-DECnet products.Adding Intelligence tO Management·FunctionsThe second evolutionary trend was driven by theincreased size and complexity of DECnet networksand the difficulty in finding qualified peopleto manage them and the: systems within them.This trend led to the need for more-intelligentnetwork management functions to support a centralizedstaff dedicated to managing the network.To partially meet this need, we developed a productthat automates the rponitoring of trafficiIII117


The Evolution of Network Management Productsand other events in the network. This productalso contains many event evaluation functions,which produce statistics made available throughboth interactive and hard-copy reports.Other products have also added intelligence tothe basic network control capabilities of theearly DNA architecture . These products performIAN testing and diagnosis, and X.25 accountingand enhanced protocol-tracing.Vo ice Network IntegrationAn evolutionary trend similar to that in data networkswas also happening in voice networks .Vo ice network users were becoming moresophisticated, requesting capabilities similar tothose seen in data networks. For example, liketheir data network counterparts, voice networkmanagers need the ability to control, optimize,configure, and monitor the network. In additionto collecting management data, users alsorequested its processing to provide managementinformation.At the present time, telecommunications networkmanagement has evolved beyond the scopeof the DNA design. Because of this rapid advance,product strategies have been adopted for telecommunicationsmanagement that identify anumber of directions data network managementproducts could pursue in the future . Specifically,we are expanding our management architectureto allow for the inclusion of additionalnetwork components. We also see the need tointegrate management user interfaces and informationwith other network applications. Thisintegration will support all the business-data andresource-management needs of users.The remainder of this paper covers the evolutionof network management in more detail as itrelates to the development of specific managementproducts. The fo llowing section discussesdata network management as a distributedapplication that provides operational controland observation of a variety of data networkproducts.Management as a DistributedApplicationNetwork management within <strong>Digital</strong> EquipmentCorporation began as the management of networksof DECnet nodes. These networks consistedof peer computer nodes with peer managementcapabilities; that is, each node had remoteaccess to the management capabilities of everyother node, subject only to access control. Eachnode also provided access to its own managementfunctions and data for its own local users.A common user-interface program, called thenetwork control program (NCP) , is implementedacross all DECnet products. This programis not simply a remote console interface tonetwork products. In addition, it allows remoteaccess to network management functions via apublished, proprietary protocol.The character of <strong>Digital</strong> 's networks haschanged significantly over the last ten years.They now have components that are neitherDECnet nodes nor peers, such as IAN bridges,gateways, and other servers of various kinds. Theapproach we used to integrate the managementof these products was based on the DNA managementstrategy. That is, the level of function andthe general presentation to the user were similarto those originally in DNA network management.A number of initial strategies were tried toextend the architecture in various ways toinclude the management of these products.For X.25 connections, our initial strategy wasto extend the arch itecture by adding X.25-specific capabilities. Unfortunately, this strategytightly coupled the management of the X.25product to that of the DECnet product by addingX. 25 management to NCP. That coupling madeupgrading the management implementation foreither product more difficult since it createdinterdependencies between product developmentschedules. We have found these interdependenciesto be difficult to manage with onlythese two products. This strategy would be completelyunworkable if we pursued it for the managementof all the different network products.Nothing would come to market if we had to coordinatethe development and release schedulesfor dozens of interrelated items.For SNA gateways, our temporary strategy wasto derive a parallel management architecture forSNA, based on the DNA management capabilities.Wh ile decoupling the two architectures andimplementations, this strategy only postponedthe integration of network management for SNAgateways. It also resulted in multiple managementprotocols and user interfaces, although theparallels between these were obvious. We couldreadily see that this approach would not workwell in the long term as the number of productsto be managed increased. We also knew that integrationof network management for all our net-118<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 1986


New Productswork products was very desirable to customersand would allow us to eliminate duplication ondevelopment projects.For IAN bridges, our strategy was to use a nonproprietaryprotocol based on the evolving IEEE80 2 .1 management protocol. This strategy wasthe first step in an evolution toward the adoptionof emerging international standards for networkmanagement. 2 Since the development of managementprotocol standards had q.ot been completed,using such protocols amounted toworking with prototypes. However, the needto evolve the strategy in this direction wasclear.NCP and other similar products developed forSNA gateway and IAN bridge management providethe network manager with various simplefunctions. Among these are configuration control, low-level testing, and snapshot views ofstate information and various traffic and errorcounters.While integrating these products, both thearchitecture and the subsequent implementationstrategy must be enhanced to allow the independentdevelopment of management capabilitiesfor these diverse components. We are currentlyworking on such enhancements, as discussed inthe section "Future Developments."Other factors were also highlighted while wedeveloped management capabilities for bridges,gateways, and servers. These more recent additionsto our network product set do not alwaysprovide local access to their own managementcapabilities or allow the initiation of managementaccess to other network components. Manyhave no locitlly connected device to support alocal management user interface. To allow networkmanagement for these components, sometype of remote management access from anotherstation that does provide a management userinterface is essential.Remote access is also the key for managingDECnet nodes. Without this access, no managercould see more than his own local node information,which is not sufficient to diagnose andsolve overall network problems. Without remoteaccess, service personnel would have to be sentto each node location that was experiencing theproblem in order to collect management information.This problem results in networks that aremuch more expensive to service.Remote access to the management applicationcan be provided in a number of ways. Access canbe centralized, in which one management stationprovides access to the entire network; distributedto a few management stations; or fullydistributed to all nodes, as ·access is in DECnetNCP. In any case remote accrss from one node tothe management functions ahd data from anothernode necessitates designin and implementingthe management application itself as a dis-1tributed function. The architecture for this distributedmanagement must specify a set ofdistributed software and ,database elementsthat will be needed to manage diverse networkcomponents.Distributed Managerrt,ent Architectureand Application ElemmtsThe basic elements that must be included in adistributed management architecture and in theapplications developed within it are• A user interface• A management agent in ech component to bemanaged, providing remote access to its man-•Iagement funcuons and dataII• A communications mechanism between thenode running the user irtterface and the agentsoftware modules in the network componentsbeing managed• A management database ·.• A set of simple managment functions onwhich more intelligent functions can belayered !Figure 1 illustrates the relationships betweenthese elements.The User InterfaceThe user interface and software to invoke managementfunctions genedlly reside on one ormore management stations i (more if distributed,as with NCP; or one if centralized) . The userinterface is the key to developing an integratedmanagement application from a network manager'sperspective. In developing new networkproducts, we have extended the command languagesyntax developed for DECnet managementto X.25 connections and SNA gateway products,as well as to bridge and seer products currentlyIunder development. Thus 1a common commandstyle, resembling English sntences as closely aspossible, has been used to manage our expandingset of network products.I<strong>Digital</strong> TecbnlcaiJournnlNo. 3 <strong>September</strong> 1986119


The Evolution of Network Management ProductsMANAGEMENT STATIONCOMMUNICATION OFMANAGEMENTFUNCTIONS, DATAOVER NETWORKNETWORKCOMPONENTSPROTOCOL/AGENTMANAGEMENTDATABASENODECOMMANDI--USERINTERFACEMANAGEMENTFUNCTIONSAGENTMANAGEMENTDATABASEGATEWAYIIlAGENT...MANAGEMENTDATABASEBRIDGEr'--_../MANAGEMENTDATABASEFigure 1Distributed Ma nagement Application ElementsA unified user interface giving managementaccess to all these network products from a singleprogram would have been extremely desirable.Such an interface, however, could not beprovided for the first release of all of them. A unifiedinterface would have to allow for the identificationof all products (DECnet, X.25, SNA,bridges, etc.) to which the desired managementfunction would apply. The architecture had notyet addressed this problem of selecting amongmultiple products to be managed, although ithad been extended for X.25 connections. Thusthe X.25 connections can be managed via thestandard NCP interface. However, the SNA gatewayhas its own SNA control program bundledinto the SNA products; and LAN bridges havetheir own management software, the remotebridge management software (RBMS) , sold as aseparate product. The network architecture iscurrently being extended to incorporate this taskof selecting among multiple products.Management AgentsNetWork components must contain managementagents before the components can be managedremotely. These agents perform the managementactions issued by users by way of the user interface.The agents also maintain the operationalmanagement data needed to support the manage-- ·ment application and provide remote access tothis data maintained by the component itself.An agent must be addressable across the networkand must respond to a set of functionrequests that provide adequate managementcapabilities for each particular type of component.Some components (e .g., hardware communicationscontrollers connecting computers tonetwork media) may have extremely simpleagents. Complex programmable network components,like bridges and servers, have agents thatmay respond to many more functions.3 Thusthese components may have a lot of the distributedmanagement application residing in thesoftware or firmware of the agent.Consistency of implementation of managementfunction across agents of diverse types isextremely important. 4 If the implementation offunctions in the agents of different networkproducts is inconsistent, · then these productswill not provide a uniform set of fu nctions to the120<strong>Digital</strong> Tecbntcal]ournalNo. 3 <strong>September</strong> 1986


New Productsusers. Furthermore, the same management functionmight result in different actions for differentnetwork products. Inconsistency of this typeresults in user confusion and dissatisfaction.Management ProtocolsA management protocol is the vehicle for communicatingmanagement functions and databetween the user interface and the managementagent. The NICE protocol was developed for thispurpose within standard DECnet components.NICE was later extended to include X.25 gatewaymanagement and was also used as the basefor adding extensions to manage DECnetjSNAgateways. As mentioned earlier, for non-DECnetservers and bridges, we are tracking the evolutionof a nonproprietary standard managementprotocol.The communication mechanism between theuser interface and the management agent mustalso provide reliable transmission and end-to-endintegrity of management function requests andmanagement data. These requirements have beensatisfied by the data link and end communication(OSI transport) layers of the <strong>Digital</strong> NetworkArchitecture for DECnet links. These layers havealso been used in the management of the gatewaysbetween DECnet and X.25 or SNA networkssince the layers are available on all componentsimplementing the DECnet standards.For LAN bridges and some other non-DECnetserver implementations, other transport mechanismshave been used with the management protocol.In the case of bridges, no transport mecha"nism is implemented at the bridge; simpledatagram-based management is used. For bridgemanagement, the node running RBMS mustassume total responsibility for the reliable transmissionand integrity of management messageson the LAN . The component being managedassumes no mutual responsibility for these messages.Based on product constraints, the responsibilityfor the reliable communication of managementinformation can thus be distributed in avariety of ways . Therefore, we must extend thenetwork management architecture to specify astandard solution to this problem for non­DECnet products.Management DatabaseThe database needed to support the managementapplication can be distributed in a number ofways. Some data must be maintained by the man-<strong>Digital</strong> <strong>Technical</strong>journalNo. 3 <strong>September</strong> 1986aged components themselve, including componentidentification, state infbrmation, traffic anderror counts, and other operational parameters. Apermanent copy of the operational parametersmust be maintained in nonvolatile storage torecover from human, computer, and networkfailures. This copy could exist either in nonvolatileRAM in the compom nt, on local externalstorage, or on a file at the l: emote managementstation. If the copy exists at ' the management station,then the component must depend on thatstation for correct operation unless the data isalso stored locally.Other data files that might be contained in amanagement database inclure the fo llowing:• Loadable software image 'files for both operationaland diagnostic imges associated withparticular network components• Event log files in which notifications of significantevents for network :components are col·'lected• Reference data files that,] though not essentialfor component operation, give informationrelevant to the physical identification, location,and servicing of network components• Management directoryThe management directqry contains informationneeded to identify and locate data relevantto a particular component and to gain access tonetwOrk addresses for the components themselves.Those addresses are needed for the purposeof remote management access.To provide reliable access to essential directoryinformation, the directory data must eitherbe part of a network-wide : directory (or namingservice) or be kept at the d.anagement station ofthe component. The other 1 files, however, couldreside at the management station or on externalstorage at the component, if such storage exists.(It does not exist for many components · likebridges, gateways, and some servers.) The directorycan be used to access idata no matter whereit resides. The directory ! allows managementdata distribution trade-off to be made to meeti'the needs of the network • products themselvesas we ll as those of the management stationsoftware.Management FunctionsSimple management fu nctions are network con­Itrol functions involving interactions with net-1tI121


The Evolution of Network Management Productswork components in which few or no analysesare made of the data or test results obtained.These interactions include• Collecting or modifying management datamaintained by components• Requesting management actions to occur atcomponents (enable, disable, test, etc.)• Loading operational code or parameters intocomponents• Collecting notifications of significant eventsfrom componentsAccess security to these operations is providedthrough a database of authorized network managersand their access rights. In this way differentlevels of access rights can be granted to differentusers for certain functions (e.g., privileges mightbe granted to collect management data but notmodify it from a remote node) .We are currently extending the four simplenetwork-control functions across our networkproduct set. More-intelligent management functionsare being devised to automate the collectionof management information and to provideanalyses of data and test results. Such analyseswould yield information about network performance,fault management, and accounting.These more intelligent functions are being introducedin items such as the DECnet monitor andPfFM products, described later, and ETHERnim,the LAN testing and diagnosis software. Theselection of simple and intelligent managementfunctions to be performed must be accommodatedby an integrated user interface, just as thatinterface must allow for the selection of componentsto be managed. The evolution in <strong>Digital</strong>'snetwork products toward this level of integrationhas just begun.Amount of the ManagementApplication to Be DistributedCertain distribution criteria affect the design ofnetwork components relative to. their managementcapabilities.5 These criteria concern componentpricing, component performance, andnetwork performance. Clearly, the cost of developingsophisticated, intelligent managementagents, such as those providing complexthreshold and statistical analyses, will bereflected in the price of the network product.Customers will normally purchase many networkproducts to be managed by one or a few managementstations. Therefore, an expensive managementstation would be preferable to many expensivenetwork products. One could make thedecision to put much of the complexity into themanagement station software if the productrequirements warrant this move; a more intelligentagent generally results in less traffic. However,such an agent may also require more overheadin the component, thus affecting theperformance of the component itself.The amount of the management application tobe distributed to the management agent alsoaffects the design of intelligent managementfunctions in the management station software.For example, an agent able to log events at settabletime intervals automatically provides thedata needed for analyses by intelligent managementfunctions. An agent not providing thesefunctions must be polled for this information bythat software, which means more code in themanagement station software and more traffic onthe network. The following sections describetwo products that provide some of these moreintelligent management functions in the manage·ment station software.The DECnet Monitor<strong>Digital</strong>'s monitoring product automates the collectionand analysis of network managementdata. This section discusses how the monitorevolved to include automated functions andenhanced management databases, managementprotocols, and user interfaces.Evolution of the MonitorHow to distribute the management responsibilitiesis a key question in developing a networkmanagement strategy. One answer is to distributethis responsibility to the manager at each component.That was done with DECnet networkmanagement, as we described earlier. Anotheranswer is to centralize network managementresponsibility at one or a few nodes .By 1979, <strong>Digital</strong>'s internal network was largeenough to require the centralization of its managementtasks. The group responsible for centralmanagement developed a tool to monitorthe network. This tool, called Observer, wasreleased in December 1982 and ran on a PDP- 1 1system under the RSX- 11M software. Eventuallythe size of the network and the subsequent mon-122<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 1986


New Productsitoring activities outgrew the capabilities of thissystem. The lack of memory address space andCPU speed. prevented adding new features, andthe forms-based user interface became quitecumbersome . Thus a new monitor developmentproject was initiated, which culminatedin the DECoct monitor, based on the VAXjVMSsoftware.6The DECoct monitor provides the capabilityto centralize network management, automatingmany of the intelligent monitoring fu nctions· discussed in the last section. It also enhancesthe databases, protocols, and user interfacesprovided by NCP, thus allowing the user tomonitor the state of his network more effectively.Centralized ManagementIn centralized management, a centrally locatedorganization (for example, corporate headquarters)assumes responsibility for the managementof key parts of the network. A centralgroup usually has the expertise and resourcesto deal with the more difficult problems ofnetwork management, which require peoplewhose skills are scarce and expensive. Thusthese people are more effective as a central,and therefore shareable, resource. Theirscarcity means that they need easy-to-use toolsto make them productive . There are manyaspects to ease-of-use; for the monitor, it meansproviding users with information in a formthat is easy to understand. Of course, easy-touseprograms often require more computerresources (disk space , memory, CPU time) ,but the overall cost of network managementcan be lowered if the tools make people moreefficient.Centralized DatabaseNetwork problems typically span many nodes. Asystem manager at a component may see thesymptoms of a problem but not have the informationneeded to fully understand it. Only the networkmanager can access all the informationneeded to diagnose and solve problems thataffect the whole network. To do this diagnosisfrom one location, the network manager mustgather and store data in a historical database athis central site . This centralized database is animmensely valuable resource in managing a network,operating even when some of the systemsbeing managed are not.;iShort, Medium, and Lng TermProblems :Network managers have short-, medium-, andlong-term needs for data and for analyses of thatdata. Over a short time period (hours or minutes), the network manager is concerned withdetecting and solving critical network problemsor failures. Such failures m 1 ight cause the com-'pany's network application Ito be unavailable tosupport its business needs for an indeterminateamount of time. To detect problems, the networkmanager needs intelligent analyses of the mostrecent state of the network..Network problems often:arise from multiplecomponent failures. For exmple, in a networkI .that makes use of alternate ]paths and automaticfail-over (such as provided' by the DECoct software), the failure of two or more circuitscan partition the. network. Tbis failure could alsodisrupt a business application that depends onthe network as a resource. Such a partition willnot occur if only one failure takes place . OnIthe other hand, a single faqure can be detectedin several locations. For e*mple, if a point-topointcommunication line fails, that failurewill be recorded at each end of the circuit. Thusthe network manager needs coordinated informationfrom all points noting the failure if he orshe is not to be confused by multiple indicationsof the same problem. Th monitor evaluatesthese communications failures, sorting outduplicate indications and ignoring redundantinformation.Error and traffic statistics are a key means ofdetecting problems in the network. Many ofthese statistics come from the counters built intothe DECoct software. Single counter readings,however, are not immediatly meaningful. To beuseful, counters must be ampled at the beginningand end of an interval, and their differencecan then be used to compute important statistics.For example, dividing the difference in counterreadings by the length of ,the time period willyield the rate (such as traffic or error rate) perinterval. The DECoct mohitor computes boththese statistics as well as niany others.In a medium time period (hours or days) , thenetwork manager must note developing trendsthat may predict incipient problems that mightbe prevented. Certainly, increasing error rates ortraffic levels could signal developing problems.The DECoct monitor provides displays to signalI<strong>Digital</strong> TecbnlcaljournalNo . 3 <strong>September</strong> 1986123


The Evolution of Network Management Productssuch trends, for example, an on-line histogram oftraffic or errors over the past week for a specifiedline.Over a long time period (weeks or months) ,the network manager is concerned with planningissues, such as how to configure the network sothat its performance and reliability meet theusers' needs. Information about the current network,its performance and reliability, and theworkload presented to it is invaluable. Thisinformation is all available from the DECnetmonitor via traffic and error reports over specifiedlong-term intervals for specific networkcomponents.Function DistributionA key to the design of the DECnet monitor was tobalance the functions that could be distributedmost advantageously with those that could I?ecentralized. The advantages of centralized informationwere described earlier. However, themonitor's design had to incorporate the flexibilityand other advantages of the existing distributednetwork management features built intothe DECnet software.As discussed earlier, each DECnet system has amanagement agent that maintains data about thatsystem and makes it available to a network managerat a remote location. That composite bodyof data must be collected, analyzed, and stored insome central database, after which it can be distributedto the system managers. The DECnetmonitor gathers this data at user-specified intervalsusing the standard DNA management protocol,NICE, originally used only by NCP. The datacollected concerns counter values, and statusand operational parameters for the lines, circuits,and nodes in the network, including thosefor DECnet, X.25, and SNA connections.The central database supports remote sharedaccess from users. The DECnet monitor provideseach network manager with a separate presentationinterface to the shared data. Data distributionis accomplished via an enhanced managementprotocol, an extension to NICE needed tosupport the additional functions provided by theDECnet monitor.Usage StylesEase of use through human engineering was amajor goal of the DECnet monitor, reflected inthe graphical presentation of information that isdifficult to express in other ways (e .g., the topo-logical map) . Yet the command syntax and presentationof information for the more basic capabilitieshave been derived from those in theoriginal DNA network control program.Network managers have different styles ofusing monitoring. The DECnet monitor supportsthree usage styles: batch, interactive, and alarm.In batch usage, reports are automatically generatedat set intervals. This style, oriented tomedium- and long-term planning needs, is usedmost often to produce summaries of networkmanagement information.Interactive usage is driven by user commands.The DECnet monitor's interactive user interfaceadds many new commands and dynamicallyupdated displays to the static displays availablefrom NCP. These capabilities provide automatedmonitoring capabilities on line. With theseadded functions, it was impossible to keep themonitor's syntax identical to that of NCP's; however,there is a definite family resemblance.Interactive usage is oriented toward short- andmedium-term management needs.In alarm usage, the monitor initiates a signal toindicate a problem to the user (for example,turning red the symbol of a system on a topologymap) . Alarms are oriented toward short-termproblem solving.Design of the DECnet MonitorThe most important decision we made in designingthe DECnet monitor was to limit the productto monitoring; we did not attempt to control thenetwork as well. Network managers reallyneeded more help in monitoring, since NCP'scapabilities in this area were primitive, althoughits control capabilities were perfectly sufficient.Including both monitoring and control was simplytoo much to attempt in a single developmenteffort.With these general requirements in mind, theDECnet monitor was designed to have the followingfive functions:• Collect management data from the networkcomponents• Store the management data in a centrallocation• Distribute data to users• Evaluate the collected data into meaningfulinformation (statistics, configuration description,etc.)124<strong>Digital</strong> <strong>Technical</strong> JournalNo. 3 <strong>September</strong> 1986


New Products• Present that information to users in easy-to-useformats (color graphics, topology maps,histograms, tables, lists, fo rms , and batchreports)Many of the lessons learned from designingmanagement software, such as the DECnet monitorfor data networks, also apply to voice networkmanagement.Telecommunications NetworkManagement<strong>Digital</strong>'s earliest telecommunications managementproduct provided simple cost-allocationand traffic-management reports. This capabilityhas evolved to allow users to track costs andexpenses and to generate billing invoices. Userscan now capture historical data and perform predictionanalyses on it.The products developed are based on an officeautomation system that also provides genericapplication-generation tools. Thus users can nowintegrate their telecommunications managementfunctions with the rest of their business communicationneeds. This integration means thatreports, which are really just document files, cannow, after processing, be annotated and shared(via electronic mail) With people from otherdepartments.Evolution of Vo ice ManagementThe TELEPRO product, introduced in 1981,was <strong>Digital</strong>'s first entry into the field of voicenetworks. This early telecommunicationsproduct was designed to collect station messagedetail recorder (SMDR) information fromPBX systems and to generate basic cost accountingreports . These reports included roll-upsof telephone charges for various subgroups,such as departments, cost centers, and the like.There were also reports providing networktraffic information, such as trunk usage andcall cost distribution. In the latter case, informationwas reported on a per-trunk basis andsubdivided by times of day over a monthlyperiod. For example, a manager could seehow many calls on a particular tieline exceededone minute, how many two minutes, etc.,for each day of the month. This informationgave the manager better analyses of the performanceof the network, helping him to makebetter decisions about proper network configuration.Digttal TecbnicaljournalNo . 3 <strong>September</strong> 1986This early product was intended to be a standalone,inexpensive, and easy-to-operate product.The processors initially chosen were the PDP- 1 1family. This choice gave end users the ability to.choose independently from; a variety of processors,disks, and memory configurations. Thusthey could tailor their systems to fit the require-'ments imposed by their data vo lumes.To further extend this prqduct's functionality,we added <strong>Digital</strong>'s databasd query language andreport writer, the DATATRIEVE system. This additiongave TELEPRO's developers the ability tobuild a powerfu l set of standard reports in asmall amount of time. With this set, users couldcreate their own custom,built reports. As aresult, the early product fit well with the needsof the then current market for telecommunicationsmanagement software.When this early product was introduced, manycompeting systems were based on personal computers,which did not have the same extendedfunctionality as the PDP-11 i system. These smallcomputers simply could no provide the storagecapacity or processing power required by largecustomers. Their applictions needed largetables for accurate call pricing, and reports ofvarying detail with information stored as low asthe call record level. Many PC-based productstried various ways to get around these limita:tions. Some approximated call prices (by regionrather than explicit area codejexchange) ; otherscreated summary data on the fly (thereby not savingthe raw data) ; still others used a combinationof the two . All these attempts meant that therequired information might not be available if auser wanted to create his o;wn reports.Meeting Market . NeedsIAround 1984, because of the breakup of AT&TCorporation, the requiremnts for voice managementbegan to expand rapidly, beyond TELE­PRO's capabilities. Owners, of large PBX systemswere now reselling telecommunication services,for example, to landlords of buildings, universities,and even hospitals. Unfortunately, TELEPROwas limited to tracking expenses rather than generatinginvoices. Thus the P/FM (PBX facilitiesmanagement) product was created to providethis additional functionality.As with the DECnet monitor, we soon realizedthe limitations of the PDP- 11 architecture for'supporting this additional functionality. To solvethis problem, we first debded to convert the. IIIi125


The Evolution of Network Management Pr,oductsbasic functions of the early product into thericher VAX architecture. We then added aninvoicing capability and built a common userinterface to yield the final product.Besides tracking telecommunications information,P jFM can track charges for nontelecommunicationsitems, such as monthly parkingfees, office cleaning, and equipment rentals in afacility environment. These capabilities allow alandlord of a building to present tenants with singleinvoices that charge not only for telephoneusage, but also for virtually . anything he hasdefined as a billable entity.Unlike the early product, used primarilyby telecommunications managers experiencedwith computers, the P jFM product is aimed at amore business-oriented market. Therefore, itneeded an extra degree of friendliness to besuccessful. End users can choose a wide varietyof processors from the VAX family, which providesdevelopers with an excellent base operatingsystem on which to add future functionality.Not only do users have access to bigger databases,but developers have all the VAX layeredproducts with which to construct their applications.One such product is the ALL-IN- I system.Although presented primarily as an officeautomation product, this system provides avery user-friendly environment for entering dataand paging through forms. When building P jFM,we took the office automation forms software..from the ALL-IN-I system and borrowed theremainder fo r the base software. The resultingproduct has the ease of use associated with officeautomation, although it is dearly not officeautomation in the electronic mail or calendarsense.To get the P jFM product into the marketplacequickly, we designed and wrote the first versionof the software in a fairly short time. This versionconsisted of two separate and distinct subsystems(expense tracking and invoicing) , even thoughthe user saw only one system interface. Unfortunately,in some cases information had to beentered twice to produce different reports usingthat information. Furthermore, the TELEPROalgorithms were based on a limited program sizeimposed by the original 16-bit PDP-1 1 architecture.Those parts of PjFM based on this architecturewere already at their limits. It was clear theexpanding market for this product wo uldquickly supersede P jFM's ability to go beyond itsoriginal goals.Evolution of the P jFM SoftwareTo meet these expanding needs, we redesignedP jFM with a new architecture that allowed certainareas to be easily modified in the developmentprocess, as well as in the field. In essence,th,is new architecture would provide P jFM'sbasic set of fu nctions. As needs arose for additionalfunctions, experienced customers or<strong>Digital</strong>'s Software Services personnel could writethe necessary code. The resulting product wouldhave more functionality and be better integratedand therefore easier to use. Furthermore, bybeing tailored to fit the VAXjVMS environment,the product's performance would improve aswell.As companies throughout the country expandedtheir telecommunications needs, we sawa continual flow of new product requirements. Itbecame obvious that a more formal strategy forchanges was needed to keep up with the constantlychanging market. To satisfy this need, wedecided to have frequent PjFM releases (e .g., thenext two releases came out about a year apart) .This schedule would allow us to add new functionalityto the base product, such as supportingchanges in FCC regulations or implementing specialrequirements from customers. Most importantly,it allowed us to offer in the base productsome of the custom programs written by SoftwareServices. These programs expanded thebasic P jFM functionality while removing theburden on external groups to support customcode.·Although customers could then get the functionalitythey really needed, the overall cost ofoperation was still a problem. A customer couldeither run PjFM on a stand-alone VAX-1 1/730system or share a larger VAX CPU (1 1/750 or11 j780) with other applications. This latterchoice could reduce the capability of the customer'sprimary processor . In essence, PjFM'sprocessing needs for large organizations requiredmore hardware than many users werewilling to pay for.This problem was quite effectively solved by<strong>Digital</strong>'s new low- and high-end VA X processors.The MicroVAX II system provides a cheaper andmore powerful low-end processor, while theVAX 8000 series can, when clustered, provideconfigurations with mainframe capabilities .P jFM can run efficiently on a Micro VAX II systembut now at a considerably reduced cost from theformer arrangement. If a customer wants to run126<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 1986


New Productson a bigger VAX system, there is now much moreCPU power available for the primary tasks sincethe PjFM software takes a correspondinglysmaller slice. In fact, it's quite feasible torun this software on the same hardware with theALL-IN- I system. The advantage here is that whenrunning P jFM, the user is presented with thesame type of user interface down to the individualkeystrokes. In fact , with simple modifications,a user could actually take P jFM reports andmail them to ALL-IN -I users, thus closing theloop between the telecommunications and MISdepartments.Parallels between Vo ice andData NetworksThere are some interesting parallels betweenmanaging voice networks and data networks.Both management schemes can potentiallyrequire tremendous amounts of storage, andmanipulating that data can require a lot of CPUpower as we ll. It also appears that no matterwhat capabilities are provided in the base package, customers always have additional requirements.The capability has always existed to buildcomplex systems to analyze massive amounts ofdata. However, they have been too expensive forall but a few companies. Hardware capabilitieshave now become large enough and cheapenough to allow the full integration of both typesof network management on a single system.The computers in the new generation of VAXsystems, especially the MicroVAX II system,provide the right amount of CPU power at theright time. As regards software customization, wecan accommodate the need for custom-builtproducts by creating base software that will supportthat strategy. With PjFM, the ALL-IN- I systemwas used as the base on which customerdesignedapplications could easily be added.That is a case study of how management applicationscan fit into a more global structure. As ournetwork management architecture evolves, itwill define the global structure to be used forintegrating and customizing the software forboth data and voice management applications.Future DevelopmentsThe lessons learned from developing the productsdescribed above are helping us to plan thecontinued evolution of <strong>Digital</strong>'s network managementeffort. This effort will address the managementneeds for the complex network environ-<strong>Digital</strong> <strong>Technical</strong>JournalNo. 3 <strong>September</strong> 1986ment now emerging. Developing products likeIthe DECnet monitor and PfFM are the fi rst step inthis direction. However, thy are only the beginningof a long-term commitment to produce anintegrated management arthitecture and managementsoftware based on an integrated model.Future versions of existing products will evolvewithin the framework of this model.Our proprietary management architecture isbeing extended beyond the range of the DECnetsoftware and toward international standardsas they evolve . This architecture will furtherintegrate the management of non-DECnet products,such as Ethernet bridges and SNA gateways,with the existing integrated management capabilitiesof DECnet and X.25 products. This integrationwill extend remote :access to the currentmanagement functions of all products (e.g.,simple network control funbions for all servers,and monitoring for brides , gateways , andIservers) .·While extending our management architecture,we are developing a more loosely coupledmanagement software desikn that will ease theaddition of new network products and managementfunctions. This design will include a unifieduser interface across network products andnetwork management functions and will providea choice of interface styles (e.g., command line,fo rms, graphics) . The design will also allowusers to customize their soare in several ways.A customer should be able to purchase as little oras much network manageqtent capability as hedesires. Customers want to elect which networkproducts are to be managt;d and which higherlevelmanagement functions are needed. Theirgoal is to tailor the managehtent application softwareappropriately for thir network environments.Customers' or Digftal's field personnelshould be easily able to add customized softwareenhancements, such as special reports and newintelligent management functions.Besides extending the architecture and developingan integrated software design, we are evaluatingthe market requirements for the additionof more intelligent management functions andmanagement for emerging technologies. Part ofthis evaluation is understanding the ISDN standardsand future products based on those standards.We need to determine how and when therequirements for integrated voice and data networksand network management will appear inthe customer environme!nt. We also want toIIIIiII27


The Evolution of Network Management Productsunderstand customers' needs for the integrationof network management with system and applicationmanagement.Throughout its evolution, network managementhas become increasingly essential to customerswhose businesses depend on the operationof their networks. One problem has beenthat network management fu nctions have highrequirements for processor power and databasestorage. However, since processing power isbecoming cheaper, customers can now takeadvantage of smaller, less expensive, yet morepowerful processors to fu lfill these needs. Theprimary evolutionary trend for network managementhas always been to make people mon; efficient.The affordability of increased processorpower will contribute enormously to <strong>Digital</strong>'sability to provide integrated, extensible, andmore-intelligent management functions. Theavailability of these functions will make peoplemore efficient and effective in the future.ConclusionSome important goals and guidelines haveemerged from the evolutionary processdescribed in this paper. They will serve as aguide for fu ture network management developmentin <strong>Digital</strong> Equipment Corporation .• New products to be used in DECnet networksshould incorporate basic network managementwhen those products are introduced.• Remote access to management functions isneeded to support both decentralized andcentralized management.• An integrated management architecture isdesirable, yet_ it must allow actual productimplementations that are not tightly coupled.• Commonality in management user interfaces,databases, protocols, and fu nctions reducescomplexity, makes the products easier to use,and reduces the duplication of developmentresources.• More intelligent and automated data-analysisand evaluation fu nctions are needed to facilitatethe network manager's job. These functionsshould address the network managementrequirements of all network products.• Customers should be able to tailor the managementapplication software appropriatelyfor their network environments.• Network management is a distributed applicationthat should be integrated into the overallsystem environment in support of users'businesses.AcknowledgmentsThe authors express their appreciation to JimCritser, Stan Goldfarb, Bernard Harris, Bill Keyworth,John Morency, Louise Potter, and DonnaRitter for their careful review and many suggestionsthat have enhanced the ideas presented inthis paper.References1. DECnet <strong>Digital</strong> Network Architecture(Phase IV) Network Management Fu nctionalSp ecification (Maynard : <strong>Digital</strong>Equipment Corporation, Order No. AA­X437A-TK, 1983).2. ]. Heffernan and D. Ritter, "Remote BridgeManagement," DECUS NETwords Newsletter(1986).3. N. La Pelle and K. Chapman, "BuildingBlocks for Remote LAN System Management,"FOCjLAN85 Proceedings (<strong>September</strong>1985): 137- 146.4. D. Thompson, "A Management Standard forLocal Area Networks," IEEE Fo urth InternationalConference on Computers andCommunications (March 1985): 390- 396 .5. N. La Pelle and K. Chapman, "Distribution ofthe Management Function in LAN Systems,"Second Annual ACM Northeast RegionalConference Proceedings (October 1985) :250-267.6. M. Sylor, "The NMCCjDECnet MonitorDesign,'' <strong>Digital</strong> Te chnical journal (<strong>September</strong>1986, this issue) : 129,-141.• The distinction between voice and data networksis becoming less distinct, and networkmanagement must consider both.128<strong>Digital</strong> Tecbnical]ournolNo. 3 <strong>September</strong> 1986


Mark W. Sylor 1The NMCC/DECnetMonitor DesignThe NMCCjDECnet Monitor system allows the monitoring of a .DECnetnetwork. Using the monitor at a central point allows the network managerto control the operation of the network. To be effective, e needsinfonnation about the network's current configuration, state, I performance,and errors. The monitor maintains and interprets a dat base ofnetwork infonnation, which is presented clearly and concisely to theuser through interactive graphics and other techniques. The interpretationand evaluation techniques analyze situations that may be problemsand alert the user to them in a real-time operation.Network management can be described as a controland feedback loop like the one shown in Figure1. In this loop, information is gathered fromthe network by the monitor function and presentedto the network manager. He then decidesif the situation in the network is satisfactory ornot. If not, the manager can initiate some controlaction - perhaps issue a correction, gather furtherinformation, or perform a test. The controlloop feedback cycle is "Look, Think, Act."It's clear that one key to network managementis the manager's having available the informationhe needs to make control decisions. In DECnetnetworks, the NMCCjDECnet Monitor system, orNMCC, can provide this information at one centralpoint.Figure 1MANAGER"THINK"MONITOR "LOOK" "ACT" CONTROLNETWORKMonitor Control Feedback LoopRequirements fo r a Network ManagerA network monitor like the NMCC system mustmeet many requirements.: The most imponantones to consider in designing such a product aredescribed as follows: :I• Multiple managers - A network may havemultiple network mam1gers, people who allaccess the monitor simultaneously. The monitormust allow performance data and calculationprograms to be shared among those managers,even though they will typically beasking for different types of information.• Multiple styles of usage - Network managersuse monitors for different purposes; hence,they have different styles of usage. The fivestyles of usage that are encountered are1. Batch, characterized by the automatic productionof periodid reponsI2. Routine, an intedctive style whereinI .monitoring is done · at fixd time periods(e.g., every morning when the user comesto work)3. Browse, an interactive style wherein monitoringis done on a random basis, whentime is available4. Alarm, in which a. monitor notifies theuser of problems W:hen they are detected(A notification could be to color a systemIred on a display, print a console message,signal a beeper, et.)<strong>Digital</strong> TecbnicaljournalNo. 3 <strong>September</strong> 1986129


The NMCCjDECnet Monitor Design5. Operational , in which the managerobserves a terminal on which informationabout the network is continuously displayedThe NMCC architecture supports all five usagestyles.• Variety of information - The complexity ofthe network is reflected in the variety of informationthat the network's components canpresent to a monitor. It must collect, store,and analyze configuration, status, performance,error, and reference information aboutthe network. Each component in the networkcan supply information about one or more ofthese categories. Moreover, a monitor musthave information to control its own behavior.• Real time and history - A monitor mustprovide information about current conditionsin the network. Of course, "current" is a relativeterm because changes occur in real timeas more recent information is gathered. Amonitor must also provide historical data,needed to compute trends over periods oftime. Network managers must be able to"replay" what occurred in the network, bothfor long-term reporting and for immediateproblem solving.• Ease of use and clarity of presentation - Theefficiency of information presentation is veryimportant, given that the manager interacts soclosely with the monitor. Often, graphics arethe best way to present complex statisticalinformation and topological relationships thatare difficult to display in any other way.• Universality -A typical DECnet network isimplemented across many diverse computerhardware and software systems and supports avariety of communications media. Thus amoitor must be able to collect and presentinformation from each and every one of them.High Level Design of theNMCC SoftwareTo meet the requirements discussed above, wedecided that NMCC had to provide five basicfunctions:• Collect data from the network• Store the data• Distribute that data to users upon request• Evaluate the data into meaningful information• Present that information to the network managerand end users upon requestWe also decided to support two usage modes: a:.1interactive user interface, which supports theroutine, browse, alarm, and operational styles ofusage; and a reporting user interface, which supportsthe batch usage style.These decisions led naturally to the overallMCC design shown in Figure 2. The monitorconsists of three major programs: the kernel, theinteractive user interface, and the reportspackage.The kernel collects data from the componentsin the network and stores that data in an on-linedatabase. The kernel distributes the stored databoth through the NMCC protocol used by theinteractive user interface and through the historyfiles used by the reports package. Running continuously,the kernel supports parallel activitiesfor multiple simultaneous users.The interactive user-interface (UI) programcan be run on demand by the manager or any userwith proper authorization. This program evaluatesthe data and returns the subsequent informa-. tion to the person requesting it. The UI programalso manages the operation of the monitor itself.The programs in the reports package also evaluatethe data, which is presented as hard-copyreports. The kernel periodically writes data fromits on-line database into history files, which arearchived copies of the data collected during eachday of operation.The design of NMCC separates the kernel,which is a management server, from the networkmanager's workstation, the user interfaces, andthe reports package. This separation allows thekernel to be run on one system, while the otherprograms can run on other systems.Common Design ThreadsThree common threads run through much of thedesign of the NMCCjDECnet Monitor system.These threads involve a data model, a request/response operation, and a news function.Data ModelEarly in the design, we focused on modeling thedata being manipulated by the managementfunctions rather than modeling the functionsthemselves. We felt that the organization of thedata was more complex than the functions.130<strong>Digital</strong> <strong>Technical</strong> journalNo. 3 <strong>September</strong> 1986


New ProductsThe data does not change its organizationwhen passing through the collect, store, and distributefunctions. It may change its form (e.g.,from binary to text) , but that is relatively minor.Within that portion of the monitor, the functions '·can be viewed as simple database actions (i.e.,read a record, wdte a record, etc.). As with adatabase, deciding how to organize the data isthe most important decision in the design of thesystem.We found we could organize the data so thatany record could be identified with three keys:the component, the informatiOti."type, and thetime of collection.ComponentsIn a logical sense, components are the variouspieces of the network that must be representedin NMCC. Fundamentally, a DECnet networkconsists of computer systems and the communicationsfacilities (wires) that join them. Thosesystems and wires are the main components modeledby the monitor, and, since it also has to manageitself, some of the monitor's components areincluded as well. In that way, we unified twoseparate functions within a single concept. Figure3 shows the component hierarchy that isbuilt into the monitor. The hierarchical relationshipshown in this Bachman diagram reflects thenaming relationships between the components.Each component located below other componentsin the hierarchy is considered to be part ofthose components. For example, a circuitlocated below a system is part of that system.Information TypeAll the component attributes collected and displayedby the monitor could be viewed as a singledata record. For practial reasons, however,the attributes are distinguished by a number ofdifferent information types'. The <strong>Digital</strong> NetworkArchitecture (DNA) structure that underlies allDNANETWORKMANAGEMENT..-· ·.-· ··.... .: ... ..., _:...... .. (. .. · .NICE•PROTOCOL••DNANETWORKMANAGEMENTFigure ,2NMCC DECnet Monitor, Top Level Structure<strong>Digital</strong> <strong>Technical</strong>journal 131No. 3 <strong>September</strong> 1986


Th e NMCCjDECnet Monitor DesignFigure 3Component HierarchyDECnet products provides three of these informationtypes:• Characteristics parameters that control thebehavior of the DECnet network• Status parameters that reflect the dynamicstate of the DECnet network• Counters that are incremented when animportant event occurs (e.g., a data packet isreceived)In addition, reference information provided byusers and definition information used in namingare two more information types. The NMCC systemstores data from all five types. From thatstored data, NMCC can compute three moreinformation types: statistical, topological, andsummary information .Time of CollectionThe monitor collects and stores historical data.This third key, the time of collection, is used todistinguish historical records. While data alwayshas a value, the monitor can collect only samplesof it.By examining the attributes of' the variousinformation types, we found that the data itselfcould also be classified. For example , parametricdata is fa irly constant over a period of time.Rather than store the values found in each sampletogether with the time the sample was collected,we store the values found plus the timesthose values were first and last seen, thus savingstorage space. Counters change much more frequentlythan parameters, however, and, in fact,more frequently than they can be sampled. Inthis case each sample taken is stored with a timestamp, indicating the time of collection. Thelocal clocks of the systems monitored cannot beused for the time stamps since they are not synchronized,nor can they be guaranteed to run atthe correct rate. Thus NMCC uses its own timestamps, calibrated in Universal Coordinate Time(Greenwich Mean Time) , which are generatedwithin the kernel.Request /Response OperationWithin the data model, only a few simple functionsare needed to operate on the data. Thosefunctions create and delete components, readcollections of records (defined by their keys) ,write records, and set one or more parameterswithin records.Each function can be modeled as a requestissued by the client software wanting that func-'tion performed, fo llowed by one or moreresponses to that client from the server performingthe function. This interaction is shown inFigure 4.News FunctionOnce each record has been appropriately timestamped, it is easy to access historical informationin the database . To support a real-time operation,changes in the data displayed have to becommunicated from the kernel to the user interface.To accomplish that transfer, we defined aspecial time value called "current." Reading thedatabase with the time key of current causes thedata responses to be returned in two phases. Inthe first phase, the most recent data is read fromthe on-line database. In the second phase, the132<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 1986


IiNew ProductsSERVERWAITING FOR SOMETHING TO DO.READ REQUEST.DO X, WHICH MAY INVOLVE ISSUINGREQUESTS OF OTHER SERVERS.SEND RESPONSE.REQUEST FOR X, M 7 IN TRANSIT.NSECLIENTI WANT TO DO X, AND THE SERVER CANDO IT, SO . .. SEND A REQUEST!GO ON DOING WHATEVER CAN BEDONE. (SOONER OR LATER, WE DO ALLTHAT CAN BE ACCOMPLISHED WITHOUTX HAVING BEEN DOE.)WAIT.WAITING FOR SOMETHING TO DO.READ RESPONSE. jCONTINUE WITH PROCESSING 'WHEREWE LEFT OFF.Figure 4RequestjResponse Interactionkernel will return a response whenever a new orchanged value to the data is written to the onlinedatabase. A response received in this secondphase is called "news." News can be generatedby the collection of more up-to-date informationor by other managers modifying the database.Data EvaluationAn important design choice was in what sectionof NMCC should the collected data be evaluated.This choice was imponant because data evaluationis a compute-intensive operation. Therewere three basic choices.1. Data could be evaluated immediately after itwas collected. This approach has twoadvantages:a. Processing has to be done only once. Thatprocessing, however, would take placewhether or not any user ever looked at theresults. Thus the CPU time spent on computationcould be wasted.b. The evaluated data would be reduced -and thus take less space - when stored inthe database. However, a careful analysisfound that in most cases the evaluateddata was no smaller than the raw data. Furthermore,in those cases where the datawas reduced, information had been lost.2.i .The approach also has I two disadvantages:Ic. Adding new ways of evaluating the datawould result in major changes to thesoftware.d. The compute-intenive evaluation couldnot be performed ort a separate machine.The data in the databe could be stored inraw form and evaluated in the kernel onlywhen requested specifically by a user. Whileavoiding the problems discussed in a. and b.above, this approach also suffers from thedisadvantages in c. anq d.3. The data could be evalated in the user inter'face immediately befote presentation to theuser.We chose to use the third approach becauseadding new evaluation functions is easy, becauseevaluation is perfqrmed only when requestedby a manager, and because the com:pute-intensive evaluatio could be moved toa separate machine, a network-managementworkstation.KernelThe major functional sections of the kernel aredepicted in Figure 5. The system is built in successivelayers around the bean of the kernel, a!<strong>Digital</strong> TecbntcalJournalNo. 3 <strong>September</strong> 1986133


The NMCCjDECnet Monitor DesignKERNEL INFORMATIONMANAGER(KIM)- --- - -- - --Figure 5NMCC Kernelphysical database that uses <strong>Digital</strong>'s RdBJVMSsoftware . This relational database system waschosen because it provides data integrity, its datamodel is similar to the NMCC data model, and itoffered a simple method for handling sets ofrecords.The physical database is contained within alogical database (LDB) system. LDB providestransaction services and abstracts the operationson the database, thus masking from the rest of thesystem the detailed knowledge of how the databaseis implemented. The interface to LDB isasynchronous, allowing the rest of the system toproceed with other actions while data is readfrom or written to the disk. Because the interfaceto the RdBJVMS software is synchronous, LDB isimplemented as multiple server processes separatefrom the kernel. Each server is synchronizedwith its database transaction.The logical database is contained within thekernel information manager (KIM) , to whichall requests to read or modify data are made .The actions performed by KIM are atomic, meaningthey act as a single unit even though composedof more primitive actions. KIM's clientsare thus freed from needing detailed knowledgeof the transactions. But KIM's most importanttask is providing a uniform way to request historicaland real-time data. This uniformity greatlysimplifies the design of all other parts of thecode. The user interface and reports packagedo not need special code to perform historicalor real-time functions. Instead, they only haveto perform some simple data manipulations;KIM handles all the intricacies of detailedprocessing. Many functions are clusteredaround KIM, all of which use it to access theirdata.134<strong>Digital</strong> Tecbnical]ournalNo. 3 <strong>September</strong> 1986


New ProductsData is collected from the netwoPk by the networkmanagement interface (NMI) , which pollsthe systems in the network periodically for data.As defined in the DNA architectural specification,which is the formal basis for the DECoctsoftware, each system in the network stores managementinformation and accommodates remoteaccess to it. u The protocol for accessing thisdata is called NICE, which NMI uses to request ·status, characteristic, and counter information.The components for which these types of informationcan be collected include the system, thelines, the circuits, and any other remote node inthe network.Counters have a limited range . When theyreach their maximum values, they latch, and anysubsequent events will not be counted. Therefore, if NMI detects any counters that havealready or may soon latch, it can zero theirvalues.The kernel can poll multiple systems simultaneously.The list of systems to poll and the frequencyof polling for each kind of informationfor each component (twe lve kinds in all, fourcomponents times three types) are all controlledby the network manager. This control data isstored in the on-line database.The data collected by NMI is passed to KIM,which determines if the data is news. If so, KIMwrites the news to LOB and notifies any user whohas requested to be notified when that particularnews arrives. Among the other facts that could bediscovered from the data collected is that newsystems, lines, or circuits have been added to thenetwork. When discovercd,l they arc added to theon-line database.· If allowed to, the database would grow withoutbounds with continuohs polling. The databaseadministration (DBA) software prevents thisproblem by periodically pUrging old data fromIthe database .'One unique attribute of the data collectedfrom the DECnet network is its extensibility.Each new implementatiop or upgrade of theDECnet software can define new fields in theIrecords returned from the polling operation.That is accomplished by a data format (calledNICE data blocks) , which is self describing andextensible. The kernel preserves this structureand also enhances it so thaf all data passed fromone major fu nction to another is carried in thisform.The log file writer (LFW) produces the historyfiles that are read to produce reports. At fixedperiods, LFW writes to a set of files the data collectedsince the last histort fi le was written.The NMCC protocol serVer (NPS) is responsiblefor the kernel's end of the protocol link bywhich the UI program communicates with KIM.In effect, NPS, the NMC protocol, and theNMCC protocol client (called NPC in the userIinterface) allow remote access to the data maintainedby KIM. Multiple protocol links can besupported by the kernel , thus allowing multipleusers to access the data.The need for the asynchronous operation of allIthese fu nctions posed a major design problem forthe NMCC development team. Without our goingITASK TASK TASK TASKTASK TASK ... TASK.. ,., ", ---''", 1", _,"', /.I ... .. .. ... ... / ---,... - .. _• ' ... _ _ -"


The NMCCjDECnet Monitor Designinto excessive detail, the kernel is structuredas multiple, cooperating tasks running asynchronously(e.g., a function could be one oreven one hundred tasks, as is the case with NMI).The tasks share resources to which they at timesneed exclusive access. The tasks must communicatewith each other, and they must be scheduled.The software that performs these chores iscalled the resource scheduling services (RSS) .The design of RSS is based on the processjmonitorstructure proposed by C.A.R. Hoare.3 Therelationship between the tasks, RSS, and the message-passingservices that allow communicationsbetween the tasks is described in Figure.6. Themain advantage of this approach is that the developerswriting the tasks did not have to deal withthe details of interrupts, synchronization, orscheduling.The User Interface (NMCCjUI)The user interface supports the interactive usageof the NMCCjDECnet Monitor. As shown in Figure7, the UI program has three main parts: dataaccess, action routines, and presentation. Thedata-access part controls the UI interfaces, andthe presentation part controls the interface to thekernel and the interface to the network manager'sterminal. The action routines, containingthe main routine of the UI program, execute usercommands and evaluate data.Data Access ModulesData access, interfacing the UI program with thekernel, is further divided into a protocol client, arequest manager, and two data managers. Theprotocol client implements the UI end of theNMCC protocol. This protocol, being based on aDECnet task-to-task logical link, allows remoteaccess by multiple users to the kernel database.The protocol client performs the same servicesas those provided at the KIM interface within thekernel. This capability "hides" the kernel andthe logical link from the remainder of the UI program.The basic functions of the protocol linkare to connect and disconnect the logical link,and to code and decode the protocol messages.The NMCC protocol supports the reading andmodification of records in the kernel's on-linedatabase. The protocol is an asynchronous, fullduplex,interleaved request-response protocol.It is asynchronous so that the UI program doesnot have to wait for a response to a request, andfull duplex so that requests and responses canflow simultaneously across the link. The protocolis interleaved so that a request can be issuedwhile an earlier request is still outstanding. Thusresponses can be returned in a different orderthan that in which the original requests wereissued. (This situation normally happens whennews data arrives while other data is beingreturned.) The protocol also allows multipleCONTROL DATAMANAGER (COM)MAIN ANDPARSERSCREENMANAGERDISPLAY DATA DISPLAY.GRAPHICSMANAGER (DDM) MODIFY ANDKERNELNMCC REQUEST OTHERPRESENTATION TERMINAL SYSTEMPROTOCOL MANAGER MISCELLANEOUS STYLE 1/0 (GKS)CLIENT (RM) ACTIONROUTINESCACHEA GRAPHICSDEVICE CONTROLLANGUAGEFOR VT240s(ReGIS)-- DATA ACCESS----.,. - ACTION -- -- PRESENTATION ---


New Productrequests and responses to be transmitted in asingle message across the logical link. The protocolhas proven to be very fast, yet not expensivein terms of CPU time. This fact has allowed us tosee a definite improvement in performancewhen the kernel and the UI program are run onseparate machines connected by an Ethernetcable.Layered on top of the protocol client is therequest manager (RM) . RM maintains a list of theoutstanding requests so that they can be matchedto their responses. RM can be viewed as a set ofservice routines used by the data managers. Theinterface can be either synchronous (RM"blocks" until the response is received) or asynchronous(RM notifies the main routine via anevent flag when a response is received) . Thecode to access the synchronous interface issimpler for the requesting person to program;this interface also behaves in a way that is moreintuitive to the user. On the other hand, the asynchronousinterface provides better performanceand must be used to implement real-timemonitoring.If the protocol link to the kernel should be disconnected(perhaps by a failure in the network) ,RM wi ll first wait and then attempt to reconnectto the kernel. These actions allow the UI programto run unattended as a permanent display.RM detects and eliminates duplicate readrequests, thus saving CPU time. It also cancelsold read requests when they are dropped fromthe cache because of "old age ." Finally, RMkeeps the pipeline flowing by receiving data atthe interrupt level and buffering that data forlater use.The display data manager (DDM) provides theaction routines with an interface to read from theIdatabase. DDM contains an internal cache of datathat has been recently read from the database .This cache improves the monitor's performanceby decreasing the flow of ata between the UIprogram and the kernel. The UI process also runsIfaster because a user reques for data can often besatisfied by a short access to the cache rather thana longer one to the on-line database in the kernel.The cache is purged accord ing to a leastrecently-usedalgorithm. I.n a read for currentinformation, the kernel has remembered theIrequest so that it can notify the UI program ofnews. Thus the cache purging logic will ask RMto cancel that outstanding request when the datais no longer needed. DDM also provides a consistentview of the data in the cache by locking itscontents so that cache up9ates (read responsesreceived by RM) do not change those contentswhile the action routines a ' re reading the cache.Among other benefits, this logical separationallows the display action routines to be quitesimple.The control data manager (COM) provides theaction routines with a synhronous interface tomodify data in the databas .Action RoutinesThe action routines control the Ul program andprovide most of the functionality visible to theIuser. The major action modules are the UI main,the parser, and the display, ! modify, and miscellaneousaction routines. UI' main is the highestlevelroutine in the program. Figure 8 shows thepseudocode of this routine's algorithm. The Ulprogram waits for one of two actions to occur:either the user enters data bn the keyboard, or aIresponse to a currently outstanding request isreceived from the kernel.IInitialize;current-display:=''N etw ork Summary Current Du rat ion 15 Minutes'' ;ILoopWa itFor(user-input DR response-arrived) ;If user-inputThenGe tCommand(command , current-display, new-display> ;Par se(command , current-display, new-disp lay> ;current-display:=new-display;Display(current-display> ;Unt il exi t ;Termin a te;Figure 8UI Main Pseudocode<strong>Digital</strong> <strong>Technical</strong> journal 13 7No. 3 <strong>September</strong> 1986


The NMCCj/JECnet Monitor DesignThe parser first performs a syntactical analysisof the command entered by the user. It then dispatchesto the correct action routine, which performsthe command. Table 1 lists the commandssi.1pported by the UI program.The parser is context sensitive, meaning thatthe current display on the user's terminal is usedto resolve ambiguity wherever possible. Forexample, if the user is currently viewing a displayfor "Network System BOSTON" and issuesthe command SHOW LINE UNA-0 TRAFFIC, theparser will conclude that the line referred to inthe command is part of a system called BOSTON.The parser accepts abbreviated commands andallows mosckeywords to be entered in any order.Since the main function of the UI program is topresent information to the user, the display commandsare the most important.Each display action routine presents a singledisplay to the user. The parser determines whichdisplay is the current one. Since any display canbe changed if new data arrives from the kernel,the correct display action routine is accessed oneach main cycle of the UI program . The currentdisplay is defined by the three key items mentionedearlier: the component, the page, and thetime of collection .The display action routines select the informationto be displayed and then copy it to the presentationroutines. The information displayedTable 1Commands Supported by LllDisplay CommandsSHOW display-idNEXTPREVIOUSPOPPAN direction [distance]FIND component-idMAGNIFY scale-multiplier•COMPRESS scale-multiplier•Modification CommandsADD component-idDELETE component-idSET [component-id] item-listMOVE component-id X coordinate Y coordinate •Miscellaneous CommandsHELP topicSPAWN DCL-commandREPORT report-commandCONNECT kernelEXIT*Only applies to the Network Mapcan come either directly from the database orfrom the evaluation routines, which evaluatedata stored in the database. The information displayedcan even come from different databaserecords, the intent being to display informationthat provides a clear, related viewpoint.The evaluation routines get their data from thecache in DDM. It's quite possible that the datamay not yet be in the cache (if this is the firsttime the data has been requested) . In that casethe evaluation routines have to either present nodata or take some reasonable default. Eventually,the data will appear in the cache, at which timethe response_arrived flag will be set and theupdated data will be placed on the terminalscreen .This approach was taken because we did notwant to block all actions while waiting for potentiallylarge amounts of data to be returned. Moreover,in a real-time display, we could not predictwhen news would arrive. In practice, this designchoice has proven to be sound because the userremains in control. He can issue another commandor change the current display at any time.We did find that early versions of the software,which sent no indication of progress to the user,tended to be confusing since the user was unsureif the software had completed its update of thedisplay. Currently, the software indicates when itis "working," (i.e., updating the screen). Infuture versions, clearer indications of progressmay be added.During the design we were concerned thatevaluating all the information on a display wouldconsume too much CPU time since it was possiblethat only one item might have changed. Weconsidered a number of ways to run the evaluationroutines "in reverse," so that only thoseitems that were changed by news data would bere-evaluated. However, we rejected all thoseways since they were too complex to implement.The problem was that a change in one piece ofdata would change items shown on many displays,only one of which was seen by the user.An important goal of the UI program was tomake as simple as possible the writing of a displayaction routine. These routines and theevaluation software that supports them are thelargest body of code in the monitor.Many evaluation routines are supported. Someare used to compute statistics from the data collectedfrom the counters. The method used iscalled normalization, which uses averaging and138<strong>Digital</strong> TecbnicalJournalNo. 3 <strong>September</strong> 1986


New ProductsFigure 9Photo of Computer Screeninterpolation techniques to estimate statisticsover any time period. Other routines determinethe current states of portions of the network,while still others determine the configuration ofthe network.The modify action routines change the contentsin the on-line database. Invoked by theparser, these routines use the services of CDMand are synchronous with respect to the database.They were made synchronous to avoid confusingusers whose commands were invalid. If anerror message indicating that a command wasinvalid was displayed long after the user issuedthe command, he might have proceeded withanother command and therefore might lose trackof the cause for the error.The remaining commands supported by the UIprogram perform such functions as providinghelp to the user, spawning a subprocess, invokingthe reports package, connecting to a differentkernel in the network, and the all-importantEXIT command.Presentation ModulesThe network manager interacts with NMCC viaeither a VT240 terminal or a VT241 terminal.The software managing that interaction is calledthe presentation software. It presents a consistentstructure for output on the screen and forkeystrokes entered by the user. The screen isdivided into four areas: an identification area,where the current display is shown ; the dataarea, where the information is shown; a commandarea; and a message rea. Figure 9 shows atypical screen format with' . the three areas.The UI program support,s a .number of presentationstyles. The information on any given pageis best displayed for the Oser in one particularstyle. Some of the styles upported are forms,tables, histograms, and maps. Each page isdesigned to present the data to the user in themost effective manner, given the limitations ofthe two terminals supported. We avoided the useof flashing warnings to a!Crt users to problems.Instead, each display presents data so that users<strong>Digital</strong> <strong>Technical</strong>JournalNo. 3 <strong>September</strong> 1986139


The NMCCjDECnet Monitor Designcan spot problems quickly or observe behaviorpatterns in an advantageous way.The forms may contain text, numbers, meters,and color-coded values for text or numbers.Meters are graphs of numeric values; they mayalso show thresholds that, if exceeded, indicateproblems. These thresholds can be preset by theuser, although reasonable defaults are provided.The network map is a plot showing the topologyof the network. Systems are shown assquares and wires as lines connecting systems.(The shape of each line indicates the type ofwire .) The user can also request that eachcomponent shown on the map be color codedwith its status. The map communicates a largeamount of information in a comprehensible fashionto the user. Statistics can be displayed in theform of histograms, which plot values againsttime.A table is used to display each succeedinglylower level of the component hierarchy depictedin Figure 3. Each row in the table describes onecomponent "owned" by the requested .component.For example, for each system, a table is displayedlisting all lines known to be


New Productsfixed format that are accessible through any programminglanguage.The first phase of report processing ·is controlledby a command file, which can be modifiedby the user. For example, the user can automaticallyproduce daily reports.SummaryNetworks are complex systems at the leadingedge of modern communications engineering.The NMCCjDECnet Monitor syste m creates amodel of a network through a database of informationthat reflects the complex relationshipsbetween the components of that network. Thechallenge in designing this monitor was topresent this complexity as simply as possible tothe network manager. He is ultimately responsiblefor the quality of service that the networkprovides.The monitor was designed to be a truly distributedapplication. Special monitor softwaredoes not have to reside on each node, yet themonitor can collect information about any nodein the network. By separating the I/O-intensivekernel from the compute-intensive user interface,yet allowing them to cooperate in monitoringthe network, the actual monitoring can bedivided between two machines. Both can beoptimized for the tasks assigned to them.This design is a framework within which manynew functions and additional data can fit. As wegain experience with using the monitor, andfeedback on the human engineering of simplepresentations of complex data, we are confidentthat this design can support an evolving managementsystem.References1. DECnet DIGITAL Network ArchitectureI{Phase IV) General Description (Maynard:<strong>Digital</strong> Equipment Corporation,Order No. AA-N149A-TC, 1982).2. DEC net <strong>Digital</strong> Network ArchitecturePhase IV Network Management FunctionalSpecification (Maynard: <strong>Digital</strong>Equipment Corpodtion, Order No. AA­X437A-TK, 1983). II13. C.A.R. Hoare, "MoQ.itors: An OperatingSystem Structuring Concept," Communicationsof the ACM, vol . 17, no. 10(October 1974): 549-557.4. B. Shneiderman, "Direct Manipulation: AStep Beyond Programming Languages,"Computer (August 983): 57-69.Other ReferencesR. Rubinstein and H. Hersh, The Human Factor(Bedford: <strong>Digital</strong> Press, 1984).NMCCjDECnet Monitor User's Guide (Maynard:<strong>Digital</strong> Equipment Cqrporation,.Order No.AA-EW35A-TE, 1986) .ijAcknowledgmentsThe author thanks Jim Critser, Nancy La Pelle,Linsey O'Brien, and Bruce Luhrs for theirthoughtful review. Most of all, he thanks thedevelopers of the NMCCjDECnet Monitor,whose long hours and hard effort made possiblethe realization of this software. Those developerswere Peter Burgess, Bill Gist, Matthew Guertin,Robert Merrifield, Dennis Rogers, ArundhatiSankar, Robert Schuchard, Evelyn Wang, and RiazZolfonoon.<strong>Digital</strong> TecbnlcaiJournalNo. 3 <strong>September</strong> 1986141


Editorial StaffEditor - Richard W. BeaneProduction StaffProduction Editor - Jane C. BlakeDesigner - Charlotte BellInteractive Page Makeup -Terry ReedAdvisory BoardSamuel H. Fuller, ChairmanRobert M. GloriosoJohn W. McCredieJohn F. MucciMahendra R. PatelF. Grant SaviersWilliam D. StreckerThe <strong>Digital</strong> <strong>Technical</strong> journal is published by <strong>Digital</strong>Equipment Corporation, 77 Reed Road, Hudson,Massachusetts 01749.Changes of address should be sent to <strong>Digital</strong>Equipment Corporation, attention: Media ResponseManager, 200 Baker Ave ., CFO I-l/M94, Concord, MA01742.Comments on the content of any paper are welcomed.Write to the editor at Mail Stop HL0 2-3/Kl l at thepublished-by address. Comments can also be sent onthe ENET to RDVAX::BEANE or on the ARPANET toBEANE%RDV AX.DEC@DECWRL.Copyright © 1986 <strong>Digital</strong> Equipment Corporation.Copying without fee is permitted provided that suchcopies are made for use in educational institutions byfaculry members and are not distributed for commercialadvantage . Abstracting with credit of <strong>Digital</strong>Equipment Corporation's authorship is permitted.Requests for other copies for a fee may be made to the<strong>Digital</strong> Press of <strong>Digital</strong> Equipment Corporation. Allrights reserved.The information in this journal is subject to changewithout notice and should not be construed as a commitmentby <strong>Digital</strong> Equipment Corporation. <strong>Digital</strong>Equipment Corporation assumes no responsibiliry forany errors that may appear in this document.ISBN 1-55558-000-9Cover DesignTh is issue features networking products. Our cover depictsthe veins of a leaf as a visual metaphor fo r the connectionsin a network. As the leaf grows to support the flow of nutrients,so the local area network expands with extended LANs,gateways, and terminal servers to support the flow of info r­mation. The image was created using the Lightspeed system.1 he cover was designed by Deborah Falck and Eddie Lee ofthe Graphic Design Department.Documentation <strong>Number</strong>EY-67 15 E-DPThe following are trademarks of <strong>Digital</strong> EquipmentCorporation: ALL-IN- I, DATATRIEVE, DDCMP, DEC,DECconnect, DEChealth, DECnet, DECnet-DOS,DECnet Router, DECnet RouterjX .25 Gateway,DECnet-RSX, DECserver, DECnetjSNA Gateway,DECnet-ULTRIX, DECnet-VAX, DEQNA, DEUNA, <strong>Digital</strong>Network Architecture (DNA) , lAS, LANBridge, the<strong>Digital</strong> logo, MicroRSX, MicroVAX, MicroVMS, NMCC,NMCCjDECnet Monitor, PDP-8, PDP-11, PjFM,PROjDECnet, Q-bus, Rainbow, ReGIS, RSTS, RSX,RSX- IIM, RSX- IIM-PLUS, RSX- IIS, RX02, TELEPRO,ThinWire, TOPS- 10, TOPS-20, ULTRIX, ULTRIX-32,ULTRIX-32m, VAX, VAX- 11/730, VAX- 11/780,VAXcluster, VMS, VT, VT 100, VTI03, VT240, VT241,UNIBUSAT&T and UNIX are trademarks of American Telephone& Telegraph Company.IBM is a registered trademark of International BusinessMachines, Inc.Intel is a trademark of Intel Corporation.Lightspeed is a trademark of Lightspeed Computers,Inc.Motorola is a registered trademark of Motorola, Inc.MS is a trademark of Microsoft Corporation.Xerox is a registered trademark of Xerox Corporation.3COM is a trademark of 3COM Corporation.68000 is a trademark of Motorola, Inc.Book production was done by Educational ServicesMedia Communications Group in Bedford, MA.


ISSN 0898-90 1XPrinted in USA EY-67 15E-DP Copyright© <strong>September</strong> 1986 <strong>Digital</strong> Equipment Corporation

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!