PCI Express* 3.0 Technology: Device Architecture ... - Intel

intel.cn

PCI Express* 3.0 Technology: Device Architecture ... - Intel

SF 2009PCI Express* 3.0 Technology:Device ArchitectureOptimizations on IntelPlatformsMahesh WaghIO ArchitectTCIS006


Agenda• Next Generation PCI Express* (PCIe*)Protocol Extensions Summary• Device Architecture Considerations– Energy Efficient Performance– Power Management• Software Development• Summary• Call to action2


PCI Express* (PCIe*) 2.1 Protocol Extensions SummaryExtensions Explanation BenefitTransactionLayer Packet(TLP)ProcessingHintsLatencyToleranceReportingOpportunisticBuffer Flushand FillAtomicsRequest hints toenable optimizedprocessing withinhost memory/cachehierarchyMechanisms forplatform to tune PMMechanisms forplatform to tune PMand to align deviceactivitiesAtomic Read-Modify-WritemechanismReduce access latency to system memory. Reduce SystemInterconnect & Memory Bandwidth & Associated PowerConsumptionApplication Class: NIC, Storage, Accelerators/GP-GPUReducing Platform Power based on device service requirements,Application Class: All devices/SegmentsReducing Platform Power based aligning device activity withplatform PM events to further reduce platform powerApplication Class: All devices/SegmentsReduced Synchronization overhead, software library algorithm anddata structure re-use across core and accelerators/devices.Application Class: (Graphics, Accelerators/GP-GPU))ResizableBARMulticastMechanism tonegotiate BAR sizeAddress BasedMulticastSystem Resource optimizations - breakaway from “All or Nothingdevice address space allocation”Application Class: – Any Device with large local memory(Example: Graphics)Significant gain in efficiency compared to multiple unicastsApplication Class: (Embedded, Storage & Multiple Graphicsadapters)3


Continued…Extensions Explanation BenefitI/O PageFaultsOrderingEnhancementsDynamicPowerAllocation(DPA)Internal ErrorReportingExtends IO addressremapping for page faults –(Address Translation Services1.1)New ordering semantic toimprove performanceMechanisms to allowdynamic power/performancemanagement of D0 (active)substates.Extend AER to reportcomponent internal errors(Correctable/ uncorrectable)and multiple error logsSystem Memory Management optimizationsApplication Class: Accelerators, GP-GPU usage modelsImproved performance (latency reduction) ~ (IDO) 20%read latency improvement by permitting unrelated readsto pass writes. Application Class: All Devices with twoparty communicationDynamic component power/thermal control, manageendpoint function power usage to meet new customer orregulatory operation requirementsApplication Class: GP-GPUEnables software to implement common and interoperableerror handling services. Improved error containment andrecovery.Application Class: RAS for SwitchesTLP PrefixMechanism to extend TLPheadersScalable Architecture headroom for TLP headers to growwith minimal impact to routing elements. Support VendorSpecific header formats.Application Class: MR-IOV, Large Topologies andprovisioning for future use models4New Features with Broad Applicability


Device Architecture Considerations• TLP Processing Hints (TPH)• Power Management Extensions5


TLP Processing Hints (TPH)Processor$$MemoryTLP Processing Hints (PCIe Base2.1 specification)– Memory Read, Memory Writeand Atomic OperationsSystem Specific Steering Tags(ST)– Identify targeted resource e.g.System Cache Location– 256 unique Steering TagsRoot ComplexPCI Express* (PCIe*)DeviceBenefitsEffective Use of System Resources• Reduce Access latency to systemmemory• Reduce Memory & systeminterconnect BW & PowerImproves System Efficiency6Effective use of System Fabric and Resources


Requirements ChecklistEcosystemPlatform support (Root Complex,Routing Elements)System specific Steering TagadvertisementDevice ArchitecturePCIe Capability supportPCIe spec. complianceCharacterize workloadsApplication processor AffinitySteering Tag to WorkloadassociationSelect Modes of OperationDevice DriverDevice SpecificAssignmentsOS/VMMPlatformSTMappingFirmwareSoftware DevelopmentBasic Capability Discovery,Identification and ManagementSystem specific Steering Tagadvertisement and assignmentDevice Driver enhancementsRoot ComplexTPH capabilityST TableTPH capabilityPlatformPCIe DeviceTPH7PCI Express* (PCIe*) Technology


No ST ModeNo Steering Tags usedRequest Steering is PlatformSpecificOS/VMMBasic Capability EnablementMinimal Implementationcost and complexityRoot ComplexTPH capabilityTPH capabilityPlatformPCIe DeviceTPH(No ST)Basic support8PCI Express* (PCIe*) Technology


Interrupt Vector ModeST associated with Interrupt(MSI/MSI-X)Firmware provides Platformspecific ST information toOS/HypervisorOS/Hypervisor assigns STalong with Interrupt vectorassignmentRoot ComplexTPH capabilityOS/VMMPlatformPlatformSTMappingFirmwareSuitable for devices withworkload/Interrupt affinityto coresST TableTPH capabilityPCIe DeviceTPHTTM Advantage9PCI Express* (PCIe*) Technology


Device Specific ModeDevice Specific STassociationBuilds upon Interrupt vectormode s/w supportDevice Driver determinesprocessor affinityDevice DriverDevice SpecificAssignmentsOS/VMMPlatformSTMappingFirmwareNew API required to requestST assignmentsRoot ComplexTPH capabilityPlatformIndependent of InterruptassociationST TableTPH capabilityPCIe DeviceTPHScalable & Flexible Solution10PCI Express* (PCIe*) Technology


TPH Aware Device ArchitectureClassify device initiatedtransactions:Bulk vs. Control‣ Select hints based onData Struct. Use modelsControl Struct. (Descriptors)Headers for Pkt. ProcessingData Payload (Copies)Steering Tag ModesNo ST: Basic Hints only, No STusedInterrupt Vector Mode:Faster TTM with InterruptassociationDevice Specific Mode:Scalable, Flexible & Dynamic,can provide TTM advantageSoftware Development Basic TPH Capability Identification, Discovery and Management Firmware Support to advertise ST assignments OS/Hypervisor ST assignment support Optional API supportTPH permits Device Architecture Specific Trade-Offs11


Device Architecture Considerations• TLP Processing Hints• Power Management Extensions12


System Perspective• Device behavior impacts power consumption of othersystem components– Devices should consider their system power impact, not justtheir own device level power consumption• Extreme example: Enhanced Host Controller Interface (EHCI)• Systems (and devices) are idle most of the time– There’s a big opportunity for devices and systems to takeadvantage of that• Latency Tolerance Reporting (LTR) enables lowerpower, longer exit latency system power states whendevices can tolerate it• Optimized Buffer Flush/Fill (OBFF) enables platformactivity alignment, resulting in system power savings13Opportunity for Devices to Differentiate onPlatform Power Savings


Platform Power Savings OpportunityPlatform Power in Watts4035302520151050Average power scenario running MM07 (Office* Productivity suite)On an Intel ® Core2 Duo platform running Windows* Vista*0 10 20 30 40 50 60 70 80 90 100 110Source: Intel CorporationTime in MinutesIdle workload power(~8W – 10W)PMOpportunity• Usage Analysis: Typical mobile platform in S0 state is ~90% idle• When idle, platform components are kept in high power state tomeet the service latency requirements of devices & applicationsPower consumption for idle workload is high14


Optimized Buffer Flush/Fill• Next several foils describe:– Platform activity alignment– PCI Express* (PCIe*) OBFF mechanisms– OBFF within context of platform activityalignment– Device implementation impacts for OBFF15


Platform Activity AlignmentCurrent PlatformsPlatforms with Activity AlignmentPower Management Opportunities16OS tickinterruptsDeviceinterrupts(critical)• Time Critical• Bufferreplenish• Performance/ThroughputDevice interrupts(deferrable)• Not time critical• Status Notifications• User CommandCompletions• Debug, StatisticsDevicetraffic(critical)• Buffer over-(under-) run• Throughput/PerformanceDevice traffic(deferrable)• No Bufferingconstraints• DebugdumpsCreates PM Opportunities for Semi-active workloads


17Optimized Buffer Flush/Fill (OBFF)ProcessorWake#OptionalOBFFMessageRoot ComplexPCI Express* (PCIe*)Device2MemoryOptimal Windows• Processor Active– Platform fullyactive. Optimal forbus mastering andinterrupts• OBFF – Platformmemory pathavailable formemory read andwrites• Idle – Platform isin low power stateGreatest Potential Improvement WhenImplemented by All Platform DevicesOBFF• Notify all Endpoints ofoptimal windows withminimal power impactSolution1: When possible,use WAKE# with newwire semanticsSolution2: WAKE# notavailable – Use PCIeMessageWAKE# WaveformsTransition EventIdle OBFFIdle Proc. ActiveOBFF/Proc. Active IdleOBFF Proc. ActiveProc. Active OBFFWAKE#


OBFF and Activity AlignmentPlatformPCIePCIePCIeDev ADev BDev CWAKE#Traffic Pattern with No OBFFDevice ADevice BDevice COS TimerTick (intrpt)InterruptsDMAsTraffic Pattern with OBFFTimeIdleWindowActiveWndwIdleOBFFIdleWindowActiveWndwIdleWindowOBFFWndwIdleWindowActiveWindow18PCI Express* (PCIe*) Technology18


OBFF Device ImplementationImpactsMaximize idle windowduration for platformClassify device initiatedtransactions:critical vs. deferrable•Align transactions withother devices in system•Coalesce transactions intogroups where possible– Perform groups oftransactions all atonce, don’t trickle allthe time•Perform criticaltransactions as necessary•Defer other transactionsto align with platformactivity– Decode platform idle/ active / OBFFwindow signalingSelect data buffer depths to tolerate platform activityalignment• 300µs of deferral buffering recommended for Intel mobileplatforms1919


Latency Tolerance Reporting• The next several foils describe:– Power vs. response latency– PCI Express* (PCIe*) LTR mechanisms,semantics– Examples of device implementation schemes• Application state driven LTR reporting• Data buffer depth driven LTR reporting• Software guided LTR reporting– Device implementation impact summary forLTR2020


Power Vs Response Latency(Mobile)Platform Response Latency500msec ~10sec~100usecMaxProc. C0Max powerSystem Active State (S0)WorkloadLow Service Latencyrequirements duringactive workloads givesreliable performanceProc. CX IdleDevices andApplications haveFixed ServiceLatencyexpectations froma platform in S0System Inactive State (Sx)High Service Latencyrequirements duringidle workloads givesmore PM opportunity8-10W 600mWPlatform Power Consumption DecreasingVariable Service Latency requirements in S0 is OptimalS3Response LatencyPenalty increaseswith lower powerstatesS4400mW2121


Latency Tolerance Reporting (LTR)ProcessorLTRMessageRoot ComplexMemoryLTR Mechanism• PCI Express* (PCIe*) Message sent byEndpoint with tolerable latency– Capability to report both snooped & nonsnoopedvalues– “Terminate at Receiver” routing, MFD &Switch send aggregated messageBenefits• Provides Device Benefit: Dynamically tuneplatform PM state as a function of Deviceactivity level• Platform benefit: Enables greater powersavings without impact toperformance/functionalityDynamic LTR22Buffer12PCI Express* DeviceLTR (Max)LTR enables dynamic power vs. performancetradeoffs at minimal cost impactBuffer1IdleLTR (ActivityAdjusted) Buffer 2 Active


Application State Driven LTRExample: WLAN Device Sending LTRLatency(100usec)Latency(1msec)Latency(100usec)Client AwakeClient DozingBeacon with TIM Data PS-Poll ACKLatency information with Wi-Fi Legacy Power SaveExample use of device PM states to give latency guidance2323


Data Buffer Utilization Guided LTRExample: Active Ethernet NIC Sending LTRInitialLatency500µsVery LowPowerA new LTR value is in effect no later than the value sent in the previous LTR msgLatency500µsecHighPowerPlatform Components Power StateLowHigh LowPowerPower PowerLowPowerVery LowPower500µs(PCIe) BusTransactions600µs100µs 100µsDeviceBufferDeviceLTR ThresholdBuffer100µs100usTimeoutIdleIdle24Network DataTimeoutLTR = 100µsLTR = 500µsExample use of buffering to give latency guidancePCI Express* (PCIe*) Technology24


Software Guided LatencySW Guided Latency• Three device categories Static: Device can alwayssupport max platform latency Slow Dynamic: Latencyrequirements changeinfrequently Fast Dynamic: Latencyrequirements change frequently• Static and Slow Dynamic types ofdevices may choose SW guidedmessaging Policy logic for determination ofwhen to send latency messages(and what values) in software E.g. use an MMIO register‣ A write to the register wouldtrigger an LTR messageDevice DriverLTR Policy LogicMMIO RegisterPlatformPowerMgtPolicyEngineLTRPCI Express* Device2525


LTR Device Implementation ImpactsWhen idle, let platform enter deep power saving states‣ Use MaxLatency (LTR Extended Capabilities field) when idle‣ Require low latencies only when necessary – don’t keepplatform in high power state longer than necessaryDynamic, hardwaredriven LTR‣ Leverage applicationbased opportunities totolerate more latency• E.g. WLAN radio offbetween beacons‣ Implement databuffering mechanismto comprehend LTRSWGuided?Software guidedLTR‣ Implementsimple MMIOregisterinterface• Registerwrites causeLTR messageto be sent2626


Software EnablingFeatures requiring basic software supportCapability Discovery,Identification andManagement8GT/s speed upgradeAtomicsTransaction Ordering RelaxationsInternal Error ReportingTLP PrefixFeatures requiring additional supportAbove and beyondcapability enablementResource Allocation,Enumeration & APITransaction Processing HintsLTR & OBFFResizable BARIO Page FaultsDynamic Power AllocationMulticast27


Summary• Next Generation PCI Express* (PCIe*)Protocol Extensions Deliver Energy EfficientPerformance– Protocol Extensions with Broad Applicability• Ecosystem Development is essential– Platform Support– Device architectures optimized around protocolfeatures– Software support and Enabling28


Call to Action• Device Architecture Considerations– Develop Device Architecture to make the most of the most ofproposed protocol extensions• Differentiate products utilizing TPH– Select Hints/ST modes based on device/market segmentrequirements• Differentiate products utilizing LTR and OBFF– Can differentiate by platform power impact not just device power– Power Savings opportunity is huge• Keep track of Next Generation PCI Express* Technologydevelopment– PCI-SIG www.pcisig.com• Engage with Intel on Next Generation PCI Express productdevelopment– www.intel.com/technology/pciexpress/devnet29


30Acknowledgements / Disclaimer• I would like to acknowledge the contributions of the followingIntel employees– Stephen Whalley, Intel Corporation– Jasmin Ajanovic, Intel Corporation– Anil Vasudevan, Intel Corporation– Prashant Sethi, Intel Corporation– Simer Singh, Intel Corporation– Miles Penner, Intel Corporation– Mahesh Natu, Intel Corporation– Rob Gough, Intel Corporation– Eric Wehage, Intel Corporation– Jim Walsh, Intel Corporation– Jaya Jeyaseelan, Intel Corporation– Neil Songer, Intel Corporation– Barnes Cooper, Intel CorporationAll opinions, judgments, recommendations, etc. thatare presented herein are the opinions of the presenterof the material and do not necessarily reflect theopinions of the PCI-SIG * .30


Additional Sources of Informationon This Topic• Other Sessions– TSIS007 –PCI Express* 3.0 Technology: PHYImplementation Considerations on IntelPlatforms– TSIS008 – PCI Express* 3.0 Technology:Electrical Requirements for Designing ASICs onIntel Platforms– TCIQ002: Q&A: PCI Express* 3.0 Technology• www.intel.com/technology/pciexpress/devnet31


Legal Disclaimer• INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NOLICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUALPROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMSAND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER,AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OFINTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR APARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OROTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE INMEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.• Intel may make changes to specifications and product descriptions at any time, without notice.• All products, dates, and figures specified are preliminary based on current expectations, and are subject tochange without notice.• Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, whichmay cause the product to deviate from published specifications. Current characterized errata are available onrequest.• Code names featured are used internally within Intel to identify products that are in development and not yetpublicly announced for release. Customers, licensees and other third parties are not authorized by Intel touse code names in advertising, promotion or marketing of any product or services and any such use of Intel'sinternal code names is at the sole risk of the user• Performance tests and ratings are measured using specific computer systems and/or components and reflectthe approximate performance of Intel products as measured by those tests. Any difference in systemhardware or software design or configuration may affect actual performance.• Intel, Intel Inside, and the Intel logo are trademarks of Intel Corporation in the United States and othercountries.• *Other names and brands may be claimed as the property of others.• Copyright © 2009 Intel Corporation.32


Risk FactorsThe above statements and any others in this document that refer to plans and expectations for the third quarter, the year and thefuture are forward-looking statements that involve a number of risks and uncertainties. Many factors could affect Intel’s actualresults, and variances from Intel’s current expectations regarding such factors could cause actual results to differ materially fromthose expressed in these forward-looking statements. Intel presently considers the following to be the important factors that couldcause actual results to differ materially from the corporation’s expectations. Ongoing uncertainty in global economic conditions posea risk to the overall economy as consumers and businesses may defer purchases in response to tighter credit and negative financialnews, which could negatively affect product demand and other related matters. Consequently, demand could be different fromIntel's expectations due to factors including changes in business and economic conditions, including conditions in the credit marketthat could affect consumer confidence; customer acceptance of Intel’s and competitors’ products; changes in customer orderpatterns including order cancellations; and changes in the level of inventory at customers. Intel operates in intensely competitiveindustries that are characterized by a high percentage of costs that are fixed or difficult to reduce in the short term and productdemand that is highly variable and difficult to forecast. Additionally, Intel is in the process of transitioning to its next generation ofproducts on 32nm process technology, and there could be execution issues associated with these changes, including product defectsand errata along with lower than anticipated manufacturing yields. Revenue and the gross margin percentage are affected by thetiming of new Intel product introductions and the demand for and market acceptance of Intel's products; actions taken by Intel'scompetitors, including product offerings and introductions, marketing programs and pricing pressures and Intel’s response to suchactions; and Intel’s ability to respond quickly to technological developments and to incorporate new features into its products. Thegross margin percentage could vary significantly from expectations based on changes in revenue levels; capacity utilization; start-upcosts, including costs associated with the new 32nm process technology; variations in inventory valuation, including variationsrelated to the timing of qualifying products for sale; excess or obsolete inventory; product mix and pricing; manufacturing yields;changes in unit costs; impairments of long-lived assets, including manufacturing, assembly/test and intangible assets; and thetiming and execution of the manufacturing ramp and associated costs. Expenses, particularly certain marketing and compensationexpenses, as well as restructuring and asset impairment charges, vary depending on the level of demand for Intel's products andthe level of revenue and profits. The current financial stress affecting the banking system and financial markets and the goingconcern threats to investment banks and other financial institutions have resulted in a tightening in the credit markets, a reducedlevel of liquidity in many financial markets, and heightened volatility in fixed income, credit and equity markets. There could be anumber of follow-on effects from the credit crisis on Intel’s business, including insolvency of key suppliers resulting in productdelays; inability of customers to obtain credit to finance purchases of our products and/or customer insolvencies; counterpartyfailures negatively impacting our treasury operations; increased expense or inability to obtain short-term financing of Intel’soperations from the issuance of commercial paper; and increased impairments from the inability of investee companies to obtainfinancing. The majority of our non-marketable equity investment portfolio balance is concentrated in companies in the flash memorymarket segment, and declines in this market segment or changes in management’s plans with respect to our investments in thismarket segment could result in significant impairment charges, impacting restructuring charges as well as gains/losses on equityinvestments and interest and other. Intel's results could be impacted by adverse economic, social, political andphysical/infrastructure conditions in countries where Intel, its customers or its suppliers operate, including military conflict and othersecurity risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. Intel'sresults could be affected by adverse effects associated with product defects and errata (deviations from published specifications),and by litigation or regulatory matters involving intellectual property, stockholder, consumer, antitrust and other issues, such as thelitigation and regulatory matters described in Intel's SEC reports. A detailed discussion of these and other risk factors that couldaffect Intel’s results is included in Intel’s SEC filings, including the report on Form 10-Q for the quarter ended June 27, 2009.Rev. 7/27/09

More magazines by this user
Similar magazines