12.07.2015 Views

Eliminating performance bottlenecks in Caché-based apps


Eliminating Performance Bottlenecks in Caché-based Applications
Alain Houf


Agenda
• Introducing Concepts and Metrics
• Identifying Worst Offenders
• Test for Performance
• Top Tricks to Improve Performance


Plan for Today
• Introduce metrics that reflect application performance
• Learn about tools to capture and analyze these metrics, including identifying the worst offenders
• Study some efficient ways to improve these metrics and therefore improve application performance


Addressing Performance Challenges
Options:
1) Identify a hardware and software configuration that would sustain the required load
2) Adjust the existing application to better utilize existing resources
Our focus is on #2


System Performance Limiting Factors
• CPU
• Input/Output (delivering data to/from the CPU)


System Performance Limiting Factors
• CPU
• I/O: Memory, Disk, Network


System-level Metrics
• CPU: CPU Utilization
• I/O (Memory, Disk, Network): iowait/Disk Queue


System Stats – Big Picture
• Snapshot (OS-level tools)
  • Perfmon.exe, Task Manager, Activity Monitor, top, vmstat
• Look for CPU utilization, top CPU-consuming processes, disk queue/iowait


System Stats – Process ID
• OS Process ID = Caché PID ($Job)
• Try to match the OS process ID with the Caché process in the Management Portal*
* If you are lucky, you'll catch your worst offending process right there


Caché-level Metrics
• CPU: Routine Commands
• Memory: Cache Efficiency
• Non-DB I/O: Files, Network
• DB I/O: CACHE.DAT, WIJ, Journals
• GloRefs
• Total Execution Time


Key Metrics Caché-side
• GloRef
  • Single access to a global (get, set, kill)
  • Roughly translates to I/O (DB files and memory)
• Routine Commands
  • Single COS instruction
  • Roughly reflects CPU load
• Execution Time
  • Overall measure of performance
  • Accounts for "not all GloRefs are created equal" and other factors


^GLOSTAT
• Caché high-level performance stats


^GLOSTAT
To run: DO ^GLOSTAT in the %SYS namespace
Enter 10 to count data for 10 seconds
Metrics to watch for:
• Global references ~ Disk + Memory activity
• Routine commands ~ CPU activity
• Cache Efficiency ~ Memory vs. Disk access (how good are your global buffers)


Capture Data Over Time
• It is important to capture data over a representative period of time
• OS-level
  • logman (Windows)
  • sar, vmstat, iostat (*nix)
• Caché-level
  • ^mgstat, SYS.History
• Enterprise tools
  • SNMP, WMI
  • BMC Patrol


Concepts and Metrics - Summary
• I/O and CPU are the two primary causes of performance problems
• GloRefs, Routine Commands and Execution Time are the most important Caché-side metrics to watch
• Decreasing these metrics per task on a test/development system would most likely help the production system as well


Agenda
• Introducing Concepts and Metrics
• Identifying Worst Offenders
• Test for Performance
• Top Tricks to Improve Performance


^PERFMON
• Command-line/API tool for collecting performance statistics arranged by globals and routines
• Can answer questions such as:
  • Which routine is responsible for the most lines of code executed?
  • Which global was accessed the most?
  • Directory block writes per routine…
  • etc.


Demo
PERFMON/%SYS.MONLBL investigation
Identify busiest line[s] of code


^PERFMON
%SYS>do ^PERFMON
• Start Monitor
• (default) Processes (not important)
• Routines (we want all of them)
• Globals (we want all of them)
• (default) Databases (not important)
• (default) Network Nodes (not important)
Wait 2 minutes…


^PERFMON
• Pause Monitor
• Report Statistics
• Custom Category Report
• All Metrics
• Sort by Routine
• Excel format
• Stop Monitor…


^PERFMON
• "Freeze Panes" on B2
• Sort by Global References, Routine Lines…


Understanding PERFMON Report
• Most Important Metrics
  • Global References/Routine Lines – overall activity
• Important Metrics
  • Data Block Read/Write – what triggers physical disk access
  • Journal Entries, Block Allocation – adding new data
  • Lock/Lock Fail – potential lock conflict


^%SYS.MONLBL
• Similar in UI and metrics to PERFMON, but collects data on a line-by-line basis
• Time – time spent executing that line of code
• Total Time – time spent executing that line of code and all calls it makes


Monitoring – Other Tools
• %Monitor.Process – low-footprint in-process monitoring tool
  • Collects the same set of metrics as PERFMON/MONLBL
  • No performance impact (safe for production)
  • Data accessible only within the process (need to use the API)
• ^PROFILE
  • Unix 'top'-like UI
  • "Hybrid" of PERFMON and %SYS.MONLBL
  • Good for a brief first look


^PROFILE


New Stuff!!! (New in 2012.2)
• GLOBUFF
  • Utility to identify Global Buffer usage per global
• BLKCOL
  • Utility to detect block collisions
  • Same machinery as cstat -D10,1026


^GLOBUFF (New in 2012.2)


^BLKCOL (New in 2012.2)


SQL Monitoring - %SYS.PTools
• SQL monitoring tool
• Attached to cached queries (requires a query purge)
• Works with Class Queries and Embedded SQL (requires recompile)
• Collects data about executed queries (time, glorefs, etc.)
• SQL interface to results (%SYS_PTools package)


Demo
SQL Monitoring with %SYS.PTools
Count the number of times each SQL query is executed and the total time spent within it
• Enable statistics collection: DO $SYSTEM.SQL.SetSQLStats(2)
• Purge queries/recompile embedded SQL: Do $SYSTEM.SQL.Purge()
• Run some SQL queries
• Examine %SYS_PTools.SQLStatsView


%SYS.PTools Example

    SELECT RoutineName, QueryText, COUNT(*) as CountTotal,
           SUM(TotalTime) as TimeTotal
    FROM %SYS_PTools.SQLStatsView
    WHERE namespace='PERF'
    GROUP BY RoutineName
    ORDER BY CountTotal


Caché and OS Tools

                            Windows              Unix                 Caché
System activity overview    Perfmon.exe, logman  vmstat, iostat, sar  GLOSTAT, mgstat
"Worst offenders"           Task Manager         top                  PERFMON, PROFILE, %SYS.PTools, ^BLKCOL
Process-level microscope    Process Explorer     truss, strace        %SYS.MONLBL, %Monitor.Process
Other stats                 SysInternals         Misc. tools          cstat


Identifying Bottlenecks - Summary
• Identifying worst offenders? There are tools for that:
  • PERFMON
  • %SYS.MONLBL
  • PROFILE
  • %SYS.PTools


Agenda
• Introducing Concepts and Metrics
• Identifying Worst Offenders
• Test for Performance
• Top Tricks to Improve Performance


What's Faster? (COS answer)
• Collecting metrics inside COS code:
  • $zh – time with maximum possible precision
  • %Monitor.Process
  • ##class(%SYS.ProcessQuery).%OpenId($j)
    • CommandsExecuted
    • GlobalReferences
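
The counters listed above can be captured before and after a block of code and subtracted. The following is a minimal sketch, not a definitive implementation: the routine name PerfDemo, the labels Snapshot and Measure, and the scratch global ^PerfDemo are hypothetical, and it assumes that opening %SYS.ProcessQuery for $Job succeeds and returns current counter values each time.

```objectscript
PerfDemo ; hypothetical routine: measure time, commands and global references of a code block
    Quit
    ;
Snapshot(stats) ; store the current counters of this process in stats (passed by reference)
    New proc
    Set proc=##class(%SYS.ProcessQuery).%OpenId($Job)
    Set stats("time")=$ZHorolog
    Set stats("commands")=proc.CommandsExecuted
    Set stats("glorefs")=proc.GlobalReferences
    Quit
    ;
Measure ; run a sample workload between two snapshots and print the differences
    New before,after,i
    Do Snapshot(.before)
    ; sample workload (assumption): 1000 sets to a scratch global
    For i=1:1:1000 { Set ^PerfDemo(i)=i }
    Do Snapshot(.after)
    Write !,"Time (s):    ",after("time")-before("time")
    Write !,"Commands:    ",after("commands")-before("commands")
    Write !,"Global refs: ",after("glorefs")-before("glorefs")
    Kill ^PerfDemo
    Quit
```

Calling Do Measure^PerfDemo prints the three deltas. The snapshots add a few commands of their own, so the numbers are best used to compare two variants of the same code rather than as absolute values.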


What's Faster? SQL Answer
• SQL – the System Management Portal exposes metrics for time and global references


Query Plan
• System Management Portal -> Execute SQL -> Show Query Plan
• Identify which indices and temp structures are used
• Relative cost is relative (no meaning outside of comparing two queries)
• You may also look at/parse the cached query source


Sample SQL Optimization
Before:

    SELECT int1, int2, oneplustwo, str1
    FROM perf.test
    WHERE str1 %startswith 'b'

After:

    SELECT int1, int2, int1+int2 as oneplustwo, str1
    FROM perf.test
    WHERE str1 %startswith 'b'

               Original   Avoid external call   Add Index
Time (sec.)    1.259      0.195                 0.157
Glorefs        136083     56236                 19015


Test for Performance - Summary
• Monitor Routine Commands, Global References, Time
• Tools to measure:
  • %SYS.ProcessQuery, %Monitor.Process, $zh in COS
  • System Management Portal for SQL, or wrap the SQL query in a COS routine to capture metrics
• With rare exceptions, if you manage to decrease ANY of these metrics, you'll improve the overall performance


Agenda
• Introducing Concepts and Metrics
• Identifying Worst Offenders
• Test for Performance
• Top Tricks to Improve Performance


Top 3* COS Performance Optimizations
1. Eliminate [External] Calls
2. Reduce # of System calls
3. Pick effective temp data structures


Avoid External and Internal Calls
Time required to execute a simple A+B operation 10k times:

External Function   $$Add^Library   0.018
Function            $$Add           0.008
Macro (no call)     $$$Add          0.001

Function:

    Set c=$$Add(a,b)
    Add(a,b) {Quit a+b}

Macro:

    #define Add(%a,%b) (%a+%b)
    Set c=$$$Add(a,b)
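
A comparison like the one above can be reproduced with a small timing loop. This is a sketch under stated assumptions: the routine name CallDemo is hypothetical, the loop count of 10000 mirrors the "10k times" on the slide, and the timings it prints will differ from the slide's figures depending on hardware and version.

```objectscript
CallDemo ; hypothetical routine: time a local function call against an inline macro
#define Add(%a,%b) (%a+%b)
    New i,c,start
    ; variant 1: call a function in the same routine
    Set start=$ZHorolog
    For i=1:1:10000 { Set c=$$Add(i,1) }
    Write !,"Function: ",$ZHorolog-start
    ; variant 2: macro, expanded inline at compile time (no call)
    Set start=$ZHorolog
    For i=1:1:10000 { Set c=$$$Add(i,1) }
    Write !,"Macro:    ",$ZHorolog-start
    Quit
    ;
Add(a,b) {
    Quit a+b
}
```

Because the macro is expanded by the preprocessor, the second loop contains the expression (i+1) directly, which is why it avoids the call overhead entirely.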


Case Study - Generators
• Complex logic accessing class metadata, with multiple superclasses. Each class implements a generator-based ReturnMeta() method that calls ReturnMeta() of its superclasses

                  SuperClasses   Inline
Total Time        8.762153       0.8962
Total Commands    1990989        1290963

• Avoiding calls to methods of superclasses: ~10x faster


Reduce # of System Calls
Inefficient:

    Set file=##class(%File).%New("file.txt")
    Do file.Open("WSN")
    for i=1:1:1000 {
        for j=0:1:255 {
            // inefficient: one write call per character
            Do file.Write($c(j))
        }
        do file.WriteLine()
    }
    do file.%Close()


Reduce # of System Calls

    Set file=##class(%File).%New("file.txt")
    Do file.Open("WSN")
    for i=1:1:1000 {
        set line=""
        for j=0:1:255 {
            set line=line_$c(j)
        }
        do file.WriteLine(line)
    }
    do file.%Close()


Case Study – Minimizing # of System Calls
• Original application: reading and processing a bunch of small files (3-5 records/file), using %File and %FileCharacterStream to access the filesystem
  • ~7k records/second
• Optimized: low-level API ($ZSearch, Open, Use, Read) to scan the directory and read data
  • ~56k records/second
• 8x faster
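
The low-level approach named in this case study might look roughly like the sketch below. It is not the code from the case study: the label CountRecords, the "*.txt" file mask, and the 5-second open timeout are assumptions, and it relies on the default behavior where reading past the end of a sequential file raises an <ENDOFFILE> error.

```objectscript
CountRecords(dir) ; hypothetical: count lines in all *.txt files in dir (dir must end with a path separator)
    New file,line,count,io
    Set count=0,io=$IO
    ; first call with a mask returns the first match, "" when nothing matches
    Set file=$ZSearch(dir_"*.txt")
    While file'="" {
        Open file:"R":5
        If $Test {
            Try {
                ; read line by line until the end-of-file error ends the loop
                For { Use file Read line Set count=count+1 }
            } Catch ex {
                ; end of file (or a read error): fall through and close the file
            }
            Close file
        }
        ; a call with "" returns the next match for the previous mask
        Set file=$ZSearch("")
    }
    Use io
    Quit count
```

It would be called as an extrinsic function, for example Set n=$$CountRecords^MyRoutine("/data/in/"), where the routine name and path are placeholders.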


Temp Data
• DOs:
  • Local Variables/Arrays
    • Starting with 2010.2, super fast even with a large # of nodes
    • More performance-related changes coming in 2013
    • Special optimization for single integer subscripts
  • CACHETEMP
    • Faster than local arrays with a large number of nodes
    • Prone to block contention
  • Process-Private globals
    • Currently CACHETEMP-based, but immune to block contention
• DON'Ts:
  • "Temporary" objects, tables
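
To make the three recommended options concrete, here is a minimal sketch that runs the same counting logic against each of them. The names TempDemo, local, ^CacheTempDemo and ^||demo are hypothetical, and the sketch assumes the default mapping in which globals whose names start with CacheTemp are stored in CACHETEMP.

```objectscript
TempDemo ; hypothetical: the same counting logic with three kinds of temporary storage
    New i,local
    ; 1. local array: private to the process, held in process memory
    For i=1:1:1000 { Set local(i#10)=$Get(local(i#10))+1 }
    ; 2. CACHETEMP global: visible to other processes,
    ;    so $Job is used as a subscript to keep processes apart
    For i=1:1:1000 { Set ^CacheTempDemo($Job,i#10)=$Get(^CacheTempDemo($Job,i#10))+1 }
    Kill ^CacheTempDemo($Job)
    ; 3. process-private global (^|| prefix): private to the process
    For i=1:1:1000 { Set ^||demo(i#10)=$Get(^||demo(i#10))+1 }
    Kill ^||demo
    Quit
```

The CACHETEMP variant needs an explicit Kill because the data outlives the process, while the local array and the process-private global disappear when the process ends.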


Misc. Tricks
High-Performance Data Load
- Direct Global Access/Embedded SQL
- $SortBegin()/$SortEnd() for unsorted data
- Defer building indices if feasible
- Use low-level API to read files
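
A load of unsorted data wrapped in $SortBegin()/$SortEnd() could be sketched as follows. The label LoadDemo and the scratch global ^PerfLoadDemo are hypothetical, and random subscripts stand in for real unsorted input.

```objectscript
LoadDemo ; hypothetical: load unsorted keys into a scratch global between $SortBegin and $SortEnd
    New i,ok
    ; start buffering sets to ^PerfLoadDemo
    Set ok=$SortBegin(^PerfLoadDemo)
    For i=1:1:100000 { Set ^PerfLoadDemo($Random(1000000))=i }
    ; write the buffered sets to the global in sorted order
    Set ok=$SortEnd(^PerfLoadDemo)
    Quit
```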


Misc Tricks
Relationships
- One-to-Many: always add an index on the "one" side


Misc Tricks
Batch Jobs
- ^%PRIO to manipulate process priority
- $PrefetchOn()/$PrefetchOff()


Misc Tricks
Reducing Lock Conflicts
- Shared vs. Exclusive (default) lock: L +^global#"S"
- Object Concurrency: .%OpenId(id,0), concurrency level 0-4
- SQL: %NOLOCK, Transaction Isolation Level
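
As an illustration of the shared-lock option, the sketch below reads one node under a shared lock so that concurrent readers do not block each other. The label LockDemo, the global ^PerfDemo and the 5-second timeout are assumptions made for the example.

```objectscript
LockDemo(id) ; hypothetical: read a node under a shared lock with a 5-second timeout
    New value
    ; #"S" requests a shared lock; :5 gives up after 5 seconds
    Lock +^PerfDemo(id)#"S":5
    If '$Test { Quit "" }   ; lock was not granted in time
    Set value=$Get(^PerfDemo(id))
    ; release with the same lock type it was acquired with
    Lock -^PerfDemo(id)#"S"
    Quit value
```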


Misc Tricks
$Piece vs. $List
- $List is usually faster for large structures
- $ListFromString() to convert from $Piece
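
A short sketch of the conversion mentioned above, using made-up sample data; the label ListDemo and the variable names are hypothetical.

```objectscript
ListDemo ; hypothetical: the same record as a delimited string and as a $List
    New str,list
    Set str="Smith,John,1970-01-01"
    ; delimited string: $Piece scans for the delimiter
    Write !,$Piece(str,",",2)      ; John
    ; convert once, then address elements by position
    Set list=$ListFromString(str,",")
    Write !,$List(list,2)          ; John
    Write !,$ListLength(list)      ; 3
    Quit
```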


Misc Tricks
Misc. COS
- $Case
- ||, &&
- $Order, $Data
- $Bit*()


Top 3* SQL Performance Optimizations
1. Add missing index
2. Enable/Enforce usage of right index
3. Effective use of functions


Identifying Index Usage
• System Management Portal -> Execute SQL -> Show Query Plan
• Look at the source code of the generated Cached Query and search for '$o(^'
• Check whether the process status constantly shows the ^TableD global


Case Study – Add Index
• Inefficient client-server code (up to 6 JDBC statements/round trips to update a single record): ~10 records/second in a lab
• Moving to a stored procedure (1 round trip): ~150 rec/sec in a lab
• In production, loading 150k records still takes 8 hours
• One of the SELECTs was missing an index: not noticeable in the lab, but extremely slow in production (subject to # of rows)
• Add index: 8 hours -> 2 min.


Enabling Index Usage
• TuneTable() to make sure stats are right
• Make sure the "where field" is used "as is" and not as an argument to a function

    SELECT * FROM packages
    WHERE dbo.ToUnixDate(starttime)>1262584800
    AND dbo.ToUnixDate(starttime)


Misc Tricks
SQL Hints
- %NOLOCK
- %NOCHECK
- %NOINDEX
- %INORDER
- %FULL
- %IGNOREINDICES
- …
See 2012.2 documentation for more


Misc Tricks
Advanced SQL Indices
- Bitmap: multi-condition selects
- BitSlice: aggregates
- Collection: list, arrays
- %Text: full text search


Case Study - Functions
• This query searches for "pattern" within a URL string (inefficient):

    select dpiau.*, %exact(dpias_step) as dpias_step, %exact(dpias_type) as dpias_type, dpicuh.*
    from dpi_client_url_history dpicuh
    left join dpi_application_urls dpiau on dpicuh_appurlid = dpiau_id
    left join dpi_application_steps on dpiau_apppatternid = dpias_patternid
    where dpias_step is not null
      and dpias_patternid = 1
      and ( dpiau_url like '%?'||dpias_step||'&%'
         or dpiau_url like '%?'||dpias_step
         or dpiau_url like '%&'||dpias_step||'&%'
         or dpiau_url like '%&'||dpias_step
         or dpiau_url like '%/'||dpias_step||'/%'
         or dpiau_url like '%/'||dpias_step||'?%'
         or dpiau_url like '%/'||dpias_step )


Case Study - Functions
• This query searches for "pattern" within a URL string:

    select dpiau.*, %exact(dpias_step) as dpias_step, %exact(dpias_type) as dpias_type, dpicuh.*
    from dpi_client_url_history dpicuh
    left join dpi_application_urls dpiau on dpicuh_appurlid = dpiau_id
    left join dpi_application_steps on dpiau_apppatternid = dpias_patternid
    where dpias_step is not null
      and dpias_patternid = 1
      and MyCustomPattern(dpiau_url, dpias_step)

• Solution: change the combination of conditions to a single function
• Result: ~100x faster


Optimizing Performance - Summary
• Most Common Optimizations COS:
  1. Eliminating [External] Calls
  2. Reducing # of System calls
  3. Picking effective temp data structures
  4. …
• Most Common Optimizations SQL:
  1. Add missing index
  2. Enable/Enforce usage of right index
  3. Effective use of functions
  4. …


Upgrade!!!
• Recent free-of-charge improvements
  • Per-process routine-vector caching
  • Faster object dispatch
  • C-level code for common object methods
  • New, faster!!! local arrays


Agenda
• Introducing Concepts and Metrics
• Identifying Worst Offenders
• Test for Performance
• Top Tricks to Improve Performance


Summary
• GloRefs, Routine Commands, Execution Time
  • Understand
  • Learn how to capture
  • Analyze
  • Improve
