
PGCon 2007
PGCluster-II
Clustering system of PostgreSQL using Shared Data
Atsushi MITANI


Agenda
Introduction
Requirement
PGCluster
New Requirement
PGCluster-II
Structure and Process sequence
Pros & Cons
Conclusion


As a background: Introduction


Status of DB
Broken: data would be lost. Sorry...
Stop: out of service, but the data remains.
Run: perfect! You can read and write data.
Between Run and Stop: "Hey, it's not working." / "Hmm, I can connect to it."


What is a DBA?
A not-good DBA breaks the DB with a wrong patch or restores the wrong data.
An ordinary DBA monitors, patches, and backs up the DB, and stops the DB before the data is broken.
A good DBA stops using such a funky DB. (Joke?)


High Availability (HA)
What is required? Downtime as short as possible, even across hardware failures, power outages, and DB maintenance.
Why is it required? To prevent data loss and service stops.
Who needs it? The data owner and the service user.


High Performance (HP)
What is required? Response time as short as possible.
Why is it required? Users dislike waiting, and the amount of data a system can process is its value.
Who needs it? The service user.


At the beginning: Requirement


Requirement
The target was a Web application.
High Availability: scheduled maintenance only.
High Performance: more than 200 accesses/sec
● 700,000/hr, 1,500,000/day
99.9% are data-reading queries.


As a solution: PGCluster


PGCluster (2002-)
Synchronous, multi-master replication system.
Query-based replication
● Values that differ per node, such as now() and random(), can still be replicated consistently.
No single point of failure
● Multiplexed load balancers, replication servers, and cluster DBs.
Automatic takeover
● Restores must be done manually.
Cluster DBs and replication servers can be added on the fly.
● Version upgrades as well.


Structure of PGCluster
[Diagram: client sessions go through multiplexed load balancers to the cluster DBs; replicators keep the cluster DBs synchronized]


Pros & Cons of PGCluster
Pros:
Enough HA.
Enough performance for a data-reading load.
Cost: normal PC servers and BSD-licensed software.
Cons:
Performance issue: very bad for a data-writing load.
Maintenance issue.
Documentation issue.


Demand changes with time: New Requirement


Current requirement
High Availability: 24/7 non-stop.
High Performance: not only reads but writes.
Reduced cost.


Coexistence of HA and HP
HA and HP conflict with each other: HA requires redundancy, while HP requires quick response.
From the performance point of view:
Replication scales for data reading (not writing).
Parallel query helps both.
● However, it is not easy to add redundancy (HA).
Shared-data clustering also scales for both.
● However, it is not suitable for large data.
● The shared disk needs redundancy.


Suitable solutions for HA and HP
[Chart: availability (low to high) versus performance (slow to fast), positioning synchronous replication, asynchronous replication, shared-data clustering, and parallel query]


Assumption of the performance
[Chart: PGCluster, PGCluster-II, pgpool, pgpool-II, and Slony positioned by request type (write vs. read), data instance size (small vs. large), and connection count (many vs. few)]


As a solution: PGCluster-II


What is PGCluster-II?
A data-shared clustering system.
Storage data is shared through a shared disk
● NFS, GFS, GPFS (AIX), etc.
● NAS
Cache and lock status are shared through virtual IPC
● Details on the following slides.


Concept of Shared Data
[Diagram: cluster DB nodes share cache and lock state through a virtual shared IPC layer and store their data on a common shared disk]


Inside of PGCluster-II: Structure and Process sequence


Virtual IPC
Shares semaphores and shared memory among the DB nodes.
Writes go to the remote nodes through the cluster process.
Reads come from the local node directly.
Signals and message queues are out of scope.
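A minimal sketch of that write path, assuming a hypothetical broadcast_sem_op() helper standing in for the forward to the peer pgcluster processes; the slides do not show PGCluster-II's actual interposition layer:

    #include <stddef.h>
    #include <sys/sem.h>

    /* Stub standing in for the network forward to peer nodes. */
    static int broadcast_sem_op(int semid, struct sembuf *ops, size_t nops)
    {
        (void)semid; (void)ops; (void)nops;
        return 0;   /* would send an "execute IPC req" to each peer */
    }

    /* Virtual-IPC pattern: apply locally, then replicate the write;
     * reads keep using the local System V objects directly. */
    int virtual_semop(int semid, struct sembuf *ops, size_t nops)
    {
        int rc = semop(semid, ops, nops);            /* local first   */
        if (rc == 0)
            rc = broadcast_sem_op(semid, ops, nops); /* then to peers */
        return rc;
    }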


Structure of PGCluster-II
[Diagram: each DB node runs a postmaster plus a pgcluster process; IPC reads stay local, IPC writes and requests are exchanged between the pgcluster processes, and every postmaster reads and writes the same shared disk]


Semaphore
Used for lock control.
How many semaphores are used? It depends on the max_connections setting; by default, 7 × 16 semaphores are used.
A mapping table is required, because the same logical semaphore gets a different semid on each node (see the sketch below):

Node 1                     Node 2
semid  sem_num  index      semid  sem_num  index
99     2        14         79     2        14
100    2        15         80     2        15

"Lock semid 100" on node 1 and "lock semid 80" on node 2 both refer to the same logical lock, index 2/15.
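A hypothetical sketch of that mapping table: replication carries the cluster-wide logical index, and each node resolves it against its own local table. The names here are invented for illustration:

    #include <stddef.h>

    typedef struct {
        int local_semid;   /* semid on this node            */
        int sem_num;       /* slot within the semaphore set */
        int index;         /* cluster-wide logical index    */
    } SemMapEntry;

    #define N_SEM_ENTRIES (7 * 16)   /* default sizing from the slide */
    static SemMapEntry sem_map[N_SEM_ENTRIES];

    /* Translate a local (semid, sem_num) pair to the logical index
     * sent to the other nodes; -1 means "not cluster-managed". */
    int semid_to_index(int semid, int sem_num)
    {
        for (size_t i = 0; i < N_SEM_ENTRIES; i++)
            if (sem_map[i].local_semid == semid &&
                sem_map[i].sem_num == sem_num)
                return sem_map[i].index;
        return -1;
    }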


Shared Memory
Used for communication among the backend processes.
Stores logs, caches, buffers, and so on.
A single shared memory segment is allocated, but it is divided into a number of pieces; more than 100 entry pointers exist.
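A rough sketch of how one segment is carved into named pieces, loosely modeled on PostgreSQL's ShmemIndex (subsystems look their region up by name, as with ShmemInitStruct()); this is a simplification, not the actual server code:

    #include <stddef.h>
    #include <string.h>

    typedef struct {
        char   name[48];   /* e.g. "BufferBlocks", "XLogCtl"        */
        size_t offset;     /* start of the piece inside the segment */
        size_t size;
    } ShmemEntry;

    static char       *segment;          /* the single attached block   */
    static ShmemEntry  shmem_index[128]; /* >100 entries in practice    */
    static int         n_entries;

    void *shmem_lookup(const char *name)
    {
        for (int i = 0; i < n_entries; i++)
            if (strcmp(shmem_index[i].name, name) == 0)
                return segment + shmem_index[i].offset;
        return NULL;
    }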


Shared Memory usage
90% of the usage is BufferBlocks. The entries include:
ShmemVariableCache, LWLockArray, ShmemIndex, ControlFile, XLogCtl, CLOG Ctl, SUBTRANS Ctl, TwoPhaseState, MultiXactOffset Ctl, MultiXactMember Ctl, BufferDescriptors, BufferBlocks, Shared Buffer Lookup Table, StrategyControl, LOCK hash, PROCLOCK hash, ProcGlobal, DummyProcs, procs, ProcStructLock, procArray, BackendStatusArray, shmInvalBuffer, FreeSpaceMap, Free Space Map Hash, FreeSpaceMap->arena, PMSignalFlags, BgWriterShmem, btvacinfo


Issues of Shared Memory
Activity issue: the segment is not big, but its update frequency is very high.
Contents issue: it contains memory addresses themselves. If the shared memory is copied verbatim to another server, that DB server may crash:

Node 1                              Node 2 (verbatim copy)
Address  Data   Type    Label       Address  Data   Type    Label
&1000    &1004  char *  Data        &2000    &1004  char *  Data
&1004    1      Oid     Oid         &2004    1      Oid     Oid
&1008    &1012  char *  Next        &2008    &1012  char *  Next
&1012    &1024  char *  Data        &2012    &1024  char *  Data

The copied pointers (&1004, &1012, &1024) still refer to node 1's address space.


Solution
Address data must not be copied as-is; a copy mask table is required.
All address data must be translated into each node's local addresses; every address datum needs its offset from the region base.
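A minimal sketch of the offset idea (illustrative, not the PGCluster-II source): before a region is copied to a peer, each embedded pointer is rewritten as an offset from the region base, and the receiver adds its own base back to rebuild a valid local pointer:

    #include <stdint.h>

    /* Mask: replace an embedded pointer with its offset from base. */
    void ptr_to_offset(void **field, char *base)
    {
        *field = (void *)(uintptr_t)((char *)*field - base);
    }

    /* Translate: rebuild a local pointer from the stored offset. */
    void offset_to_ptr(void **field, char *base)
    {
        *field = base + (uintptr_t)*field;
    }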


Mask & Translate Sequence

1. Address offsets added (sender):
Address  Data    Type    Label
&1000    '+12'   int     data_offset
&1004    '+20'   int     next_offset
&1008    &1012   char *  Data
&1012    1       Oid     Oid
&1016    &1020   char *  Next
&1020    '+32'   int     data_offset

2. Copy with mask; address data masked in transit:
Address  Data     Type    Label
&2000    '+12'    int     data_offset
&2004    '+20'    int     next_offset
&2008    (masked) char *  Data
&2012    1        Oid     Oid
&2016    (masked) char *  Next
&2020    '+32'    int     data_offset

3. Offsets changed to local addresses (receiver):
Address  Data    Type    Label
&2000    '+12'   int     data_offset
&2004    '+20'   int     next_offset
&2008    &2012   char *  Data
&2012    1       Oid     Oid
&2016    &2020   char *  Next
&2020    '+32'   int     data_offset
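Putting the two steps together, an illustrative "copy with mask" (names invented): the mask table lists the byte offsets of the pointer fields inside the region; the copy is taken verbatim, then every pointer field is rebased from the sender's base address to the receiver's:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    void copy_region_masked(char *dst_base, const char *src_base, size_t len,
                            const size_t *ptr_fields, size_t n_fields)
    {
        memcpy(dst_base, src_base, len);        /* raw copy first        */
        for (size_t i = 0; i < n_fields; i++) { /* fix each pointer      */
            void **field = (void **)(dst_base + ptr_fields[i]);
            uintptr_t off = (uintptr_t)((char *)*field - src_base);
            *field = dst_base + off;            /* offset -> local addr  */
        }
    }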


Shared Disk
Each node shares the whole database cluster:
base/, global/, pg_clog/, pg_multixact/, pg_subtrans/, pg_tblspc/, pg_twophase/, pg_xlog/
Each node has its own configuration files:
pg_hba.conf, pg_ident.conf, postgresql.conf, pgcluster.conf
Each node must use the same setting values for (see the fragment below):
connections (max_connections)
resource usage (memory, Free Space Map)
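For instance, the settings that size the shared structures have to be identical on every node; the values here are arbitrary examples, not recommendations:

    # postgresql.conf fragment that must match on all nodes
    max_connections = 100       # also sizes the semaphore sets
    shared_buffers = 32MB       # sizes BufferBlocks, 90% of shared memory
    max_fsm_pages = 204800      # sizes the Free Space Map (8.x-era setting)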


pgcluster.conf
Cluster table description:
hostname/IP & port
Multiple servers can be described; the server described first may act as master.
Self-node description:
hostname/IP & port
Only one node can be described.
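A hypothetical illustration of those two sections, styled after PGCluster's tag-based configuration files; the tag names here are invented, so check the released documentation for the real syntax:

    # Cluster table: one entry per node; the first entry may act as master.
    <Cluster_Server_Info>
        <Host_Name> node1.example.com </Host_Name>
        <Port> 7001 </Port>
    </Cluster_Server_Info>
    <Cluster_Server_Info>
        <Host_Name> node2.example.com </Host_Name>
        <Port> 7001 </Port>
    </Cluster_Server_Info>

    # Self-node description: exactly one entry identifying this node.
    <Self_Server_Info>
        <Host_Name> node1.example.com </Host_Name>
        <Port> 7001 </Port>
    </Self_Server_Info>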


Startup sequence
Node 1: postgres starts up, creates its semaphores (SEM) and shared memory (SHM), and sends a begin request to its pgcluster process.
Node 1: pgcluster searches for other nodes, creates the node table, answers the begin request, and starts listening.
Node 2: postgres starts up, creates its SEM and SHM, and sends a begin request.
Node 2: pgcluster searches for other nodes, finds node 1, and sends a sync request.
Node 1: pgcluster adds the new node and sends its semaphores (sync SEM req → copy SEM → sync SEM ans), its shared memory (sync SHM req → copy SHM → sync SHM ans), and its node table (sync SYS req → copy node table → sync SYS ans).
Node 2: pgcluster answers the begin request and starts listening.
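The req/ans pairs above, and in the IPC sync and stop sequences that follow, suggest a small message vocabulary between pgcluster processes. The declarations below are an invented illustration of that vocabulary, not the actual protocol definition:

    #include <stdint.h>

    typedef enum {
        MSG_BEGIN_REQ, MSG_BEGIN_ANS,           /* node startup         */
        MSG_SYNC_REQ,                           /* ask to join          */
        MSG_SYNC_SEM_REQ, MSG_SYNC_SEM_ANS,     /* copy semaphore state */
        MSG_SYNC_SHM_REQ, MSG_SYNC_SHM_ANS,     /* copy shared memory   */
        MSG_SYNC_SYS_REQ, MSG_SYNC_SYS_ANS,     /* copy node table      */
        MSG_REPLICATE_IPC_REQ, MSG_REPLICATE_IPC_ANS,
        MSG_EXECUTE_IPC_REQ, MSG_EXECUTE_IPC_ANS,
        MSG_STOP_REQ, MSG_STOP_ANS,             /* node shutdown        */
        MSG_END_REQ, MSG_END_ANS
    } PgcMsgType;

    typedef struct {
        PgcMsgType type;
        uint32_t   length;   /* bytes of payload following the header */
    } PgcMsgHeader;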


IPC sync sequence
Node 1: postgres issues a write-IPC call; its pgcluster process receives a replicate IPC request.
Node 1: pgcluster searches the other nodes, applies the write to the local IPC, and sends an execute IPC request to node 2.
Node 2: pgcluster applies the write to its local IPC and returns an execute IPC answer.
Node 1: pgcluster returns the replicate IPC answer, and the write-IPC call completes.
The same flow runs in the opposite direction when node 2 initiates the write.


Stop sequence
Node 1: postgres runs exit_proc() and sends a stop request to its pgcluster process.
Node 1: pgcluster searches the other nodes and sends an end request; node 2 updates its node table and answers.
Node 1: pgcluster answers the stop request, and node 1 deletes its IPC objects.
Node 2 follows the same sequence when it shuts down.


As a result: Pros & Cons


Pros & Cons
Pros:
Easy to add a node for redundancy or replacement.
Data-writing performance does not slow down as nodes are added.
Big improvement for data-reading and many-connection loads.
Cost: nothing grows except CPU and network I/O.
Cons:
Requires large RAM.
Data writing does not become faster by adding nodes; writing performance is not good.
Cost: requires a shared disk.


Possibility
Suitable place: it will be one of the solutions for systems with high CPU load and high network load.
● Most Web systems and a part of Online Transaction Processing (OLTP) systems.
Combination of PGCluster-II and pgpool-II: PGCluster-II might also gain performance with large data.


From now: Conclusion


TODO
Performance should improve further:
Some writes (and erases) of memory data do not need to be synced.
The conversion method (from offset to local address) should be improved.
Release the source code ASAP.
Documentation as well.


Thank you
Ask us about PGCluster: pgcluster-general@pgfoundry.org
Ask me about PGCluster-II: mitani@sraw.co.jp
You can download these slides from http://pgfoundry.org/docman/?group_id=1000072
