
PGCon 2007
PGCluster-II
Clustering system of PostgreSQL using Shared Data
Atsushi MITANI


Agenda
Introduction
Requirement
PGCluster
New Requirement
PGCluster-II
Structure and Process sequence
Pros & Cons
Conclusion


As a background: Introduction


Status of DB
Broken: data would be lost. Sorry...
Stop: out of service, but the data remains.
Run: perfect! You can read and write data.
Between Run and Stop: "Hey, it's not working." / "Hmm, I can connect to it."


What is a DBA?
A not-good DBA breaks the DB with a wrong patch or restores the wrong data.
An ordinary DBA monitors, patches, and backs up the DB, and stops the DB before the data is broken.
A good DBA stops using such a funky DB. (Joke?)


High Availability (HA)
What is required? Downtime as short as possible, even across hardware failures, power outages, and DB maintenance.
Why is it required? To prevent data loss and service stops.
Who needs it? The data owner and the service user.


High Performance (HP)
What is required? Response time as short as possible.
Why is it required? Users dislike waiting, and the amount of data a system can process is its value.
Who needs it? The service user.


At the beginning: Requirement


Requirement
The target was a Web application.
High Availability: scheduled maintenance only.
High Performance: more than 200 accesses/sec
● 700,000/hr, 1,500,000/day
99.9% are data-reading queries.


As a solution: PGCluster


PGCluster (2002-)
Synchronous, multi-master replication system.
Query-based replication
● Values that differ per node, such as now() and random(), can still be replicated consistently.
No single point of failure
● Multiplexed load balancers, replication servers, and cluster DBs.
Automatic takeover
● Restores must be done manually.
Cluster DBs and replication servers can be added on the fly.
● Version upgrades as well.


Structure of PGCluster
[Diagram: client sessions go through multiplexed load balancers to the cluster DBs; replicators keep the cluster DBs synchronized]


Pros & Cons of PGCluster
Pros:
Enough HA.
Enough performance for a data-reading load.
Cost: normal PC servers and BSD-licensed software.
Cons:
Performance issue: very bad for a data-writing load.
Maintenance issue.
Documentation issue.


Demand changes with time: New Requirement


Current requirement
High Availability: 24/7 non-stop.
High Performance: not only reads but writes.
Reduced cost.


Coexistence of HA and HP
HA and HP conflict with each other: HA requires redundancy, while HP requires quick response.
From the performance point of view:
Replication scales for data reading (not writing).
Parallel query helps both.
● However, it is not easy to add redundancy (HA).
Shared-data clustering also scales for both.
● However, it is not suitable for large data.
● The shared disk needs redundancy.


Suitable solutions for HA and HP
[Chart: availability (low to high) versus performance (slow to fast), positioning synchronous replication, asynchronous replication, shared-data clustering, and parallel query]


Assumption of the performance
[Chart: PGCluster, PGCluster-II, pgpool, pgpool-II, and Slony positioned by request type (write vs. read), data instance size (small vs. large), and connection count (many vs. few)]


As a solution: PGCluster-II


What is PGCluster-II?
A data-shared clustering system.
Storage data is shared through a shared disk
● NFS, GFS, GPFS (AIX), etc.
● NAS
Cache and lock status are shared through virtual IPC
● Details on the following slides.


Concept of Shared Data
[Diagram: cluster DB nodes share cache and lock state through a virtual shared IPC layer and store their data on a common shared disk]


Inside of PGCluster-II: Structure and Process sequence


Virtual IPC
Shares semaphores and shared memory among the DB nodes.
Writes go to the remote nodes through the cluster process.
Reads come from the local node directly.
Signals and message queues are out of scope.
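A minimal sketch of that write path, assuming a hypothetical broadcast_sem_op() helper standing in for the forward to the peer pgcluster processes; the slides do not show PGCluster-II's actual interposition layer:

    #include <stddef.h>
    #include <sys/sem.h>

    /* Stub standing in for the network forward to peer nodes. */
    static int broadcast_sem_op(int semid, struct sembuf *ops, size_t nops)
    {
        (void)semid; (void)ops; (void)nops;
        return 0;   /* would send an "execute IPC req" to each peer */
    }

    /* Virtual-IPC pattern: apply locally, then replicate the write;
     * reads keep using the local System V objects directly. */
    int virtual_semop(int semid, struct sembuf *ops, size_t nops)
    {
        int rc = semop(semid, ops, nops);            /* local first   */
        if (rc == 0)
            rc = broadcast_sem_op(semid, ops, nops); /* then to peers */
        return rc;
    }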


Structure of PGCluster-II
[Diagram: each DB node runs a postmaster plus a pgcluster process; IPC reads stay local, IPC writes and requests are exchanged between the pgcluster processes, and every postmaster reads and writes the same shared disk]


Semaphore
Used for lock control.
How many semaphores are used? It depends on the max_connections setting; by default, 7 × 16 semaphores are used.
A mapping table is required, because the same logical semaphore gets a different semid on each node (see the sketch below):

Node 1                     Node 2
semid  sem_num  index      semid  sem_num  index
99     2        14         79     2        14
100    2        15         80     2        15

"Lock semid 100" on node 1 and "lock semid 80" on node 2 both refer to the same logical lock, index 2/15.
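A hypothetical sketch of that mapping table: replication carries the cluster-wide logical index, and each node resolves it against its own local table. The names here are invented for illustration:

    #include <stddef.h>

    typedef struct {
        int local_semid;   /* semid on this node            */
        int sem_num;       /* slot within the semaphore set */
        int index;         /* cluster-wide logical index    */
    } SemMapEntry;

    #define N_SEM_ENTRIES (7 * 16)   /* default sizing from the slide */
    static SemMapEntry sem_map[N_SEM_ENTRIES];

    /* Translate a local (semid, sem_num) pair to the logical index
     * sent to the other nodes; -1 means "not cluster-managed". */
    int semid_to_index(int semid, int sem_num)
    {
        for (size_t i = 0; i < N_SEM_ENTRIES; i++)
            if (sem_map[i].local_semid == semid &&
                sem_map[i].sem_num == sem_num)
                return sem_map[i].index;
        return -1;
    }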


Shared Memory
Used for communication among the backend processes.
Stores logs, caches, buffers, and so on.
A single shared memory segment is allocated, but it is divided into a number of pieces; more than 100 entry pointers exist.
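A rough sketch of how one segment is carved into named pieces, loosely modeled on PostgreSQL's ShmemIndex (subsystems look their region up by name, as with ShmemInitStruct()); this is a simplification, not the actual server code:

    #include <stddef.h>
    #include <string.h>

    typedef struct {
        char   name[48];   /* e.g. "BufferBlocks", "XLogCtl"        */
        size_t offset;     /* start of the piece inside the segment */
        size_t size;
    } ShmemEntry;

    static char       *segment;          /* the single attached block   */
    static ShmemEntry  shmem_index[128]; /* >100 entries in practice    */
    static int         n_entries;

    void *shmem_lookup(const char *name)
    {
        for (int i = 0; i < n_entries; i++)
            if (strcmp(shmem_index[i].name, name) == 0)
                return segment + shmem_index[i].offset;
        return NULL;
    }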


Shared Memory usage
90% of the usage is BufferBlocks. The entries include:
ShmemVariableCache, LWLockArray, ShmemIndex, ControlFile, XLogCtl, CLOG Ctl, SUBTRANS Ctl, TwoPhaseState, MultiXactOffset Ctl, MultiXactMember Ctl, BufferDescriptors, BufferBlocks, Shared Buffer Lookup Table, StrategyControl, LOCK hash, PROCLOCK hash, ProcGlobal, DummyProcs, procs, ProcStructLock, procArray, BackendStatusArray, shmInvalBuffer, FreeSpaceMap, Free Space Map Hash, FreeSpaceMap->arena, PMSignalFlags, BgWriterShmem, btvacinfo


Issues of Shared Memory
Activity issue: the segment is not big, but its update frequency is very high.
Contents issue: it contains memory addresses themselves. If the shared memory is copied verbatim to another server, that DB server may crash:

Node 1                              Node 2 (verbatim copy)
Address  Data   Type    Label       Address  Data   Type    Label
&1000    &1004  char *  Data        &2000    &1004  char *  Data
&1004    1      Oid     Oid         &2004    1      Oid     Oid
&1008    &1012  char *  Next        &2008    &1012  char *  Next
&1012    &1024  char *  Data        &2012    &1024  char *  Data

The copied pointers (&1004, &1012, &1024) still refer to node 1's address space.


Solution
Address data must not be copied as-is; a copy mask table is required.
All address data must be translated into each node's local addresses; every address datum needs its offset from the region base.
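A minimal sketch of the offset idea (illustrative, not the PGCluster-II source): before a region is copied to a peer, each embedded pointer is rewritten as an offset from the region base, and the receiver adds its own base back to rebuild a valid local pointer:

    #include <stdint.h>

    /* Mask: replace an embedded pointer with its offset from base. */
    void ptr_to_offset(void **field, char *base)
    {
        *field = (void *)(uintptr_t)((char *)*field - base);
    }

    /* Translate: rebuild a local pointer from the stored offset. */
    void offset_to_ptr(void **field, char *base)
    {
        *field = base + (uintptr_t)*field;
    }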


Mask & Translate Sequence

1. Address offsets added (sender):
Address  Data    Type    Label
&1000    '+12'   int     data_offset
&1004    '+20'   int     next_offset
&1008    &1012   char *  Data
&1012    1       Oid     Oid
&1016    &1020   char *  Next
&1020    '+32'   int     data_offset

2. Copy with mask; address data masked in transit:
Address  Data     Type    Label
&2000    '+12'    int     data_offset
&2004    '+20'    int     next_offset
&2008    (masked) char *  Data
&2012    1        Oid     Oid
&2016    (masked) char *  Next
&2020    '+32'    int     data_offset

3. Offsets changed to local addresses (receiver):
Address  Data    Type    Label
&2000    '+12'   int     data_offset
&2004    '+20'   int     next_offset
&2008    &2012   char *  Data
&2012    1       Oid     Oid
&2016    &2020   char *  Next
&2020    '+32'   int     data_offset
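Putting the two steps together, an illustrative "copy with mask" (names invented): the mask table lists the byte offsets of the pointer fields inside the region; the copy is taken verbatim, then every pointer field is rebased from the sender's base address to the receiver's:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    void copy_region_masked(char *dst_base, const char *src_base, size_t len,
                            const size_t *ptr_fields, size_t n_fields)
    {
        memcpy(dst_base, src_base, len);        /* raw copy first        */
        for (size_t i = 0; i < n_fields; i++) { /* fix each pointer      */
            void **field = (void **)(dst_base + ptr_fields[i]);
            uintptr_t off = (uintptr_t)((char *)*field - src_base);
            *field = dst_base + off;            /* offset -> local addr  */
        }
    }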


Shared Disk
Each node shares the whole database cluster:
base/, global/, pg_clog/, pg_multixact/, pg_subtrans/, pg_tblspc/, pg_twophase/, pg_xlog/
Each node has its own configuration files:
pg_hba.conf, pg_ident.conf, postgresql.conf, pgcluster.conf
Each node must use the same setting values for (see the fragment below):
connections (max_connections)
resource usage (memory, Free Space Map)
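For instance, the settings that size the shared structures have to be identical on every node; the values here are arbitrary examples, not recommendations:

    # postgresql.conf fragment that must match on all nodes
    max_connections = 100       # also sizes the semaphore sets
    shared_buffers = 32MB       # sizes BufferBlocks, 90% of shared memory
    max_fsm_pages = 204800      # sizes the Free Space Map (8.x-era setting)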


pgcluster.conf
Cluster table description:
hostname/IP & port
Multiple servers can be described; the server described first may act as master.
Self-node description:
hostname/IP & port
Only one node can be described.
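A hypothetical illustration of those two sections, styled after PGCluster's tag-based configuration files; the tag names here are invented, so check the released documentation for the real syntax:

    # Cluster table: one entry per node; the first entry may act as master.
    <Cluster_Server_Info>
        <Host_Name> node1.example.com </Host_Name>
        <Port> 7001 </Port>
    </Cluster_Server_Info>
    <Cluster_Server_Info>
        <Host_Name> node2.example.com </Host_Name>
        <Port> 7001 </Port>
    </Cluster_Server_Info>

    # Self-node description: exactly one entry identifying this node.
    <Self_Server_Info>
        <Host_Name> node1.example.com </Host_Name>
        <Port> 7001 </Port>
    </Self_Server_Info>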


Startup sequence
Node 1: postgres starts up, creates its semaphores (SEM) and shared memory (SHM), and sends a begin request to its pgcluster process.
Node 1: pgcluster searches for other nodes, creates the node table, answers the begin request, and starts listening.
Node 2: postgres starts up, creates its SEM and SHM, and sends a begin request.
Node 2: pgcluster searches for other nodes, finds node 1, and sends a sync request.
Node 1: pgcluster adds the new node and sends its semaphores (sync SEM req → copy SEM → sync SEM ans), its shared memory (sync SHM req → copy SHM → sync SHM ans), and its node table (sync SYS req → copy node table → sync SYS ans).
Node 2: pgcluster answers the begin request and starts listening.
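The req/ans pairs above, and in the IPC sync and stop sequences that follow, suggest a small message vocabulary between pgcluster processes. The declarations below are an invented illustration of that vocabulary, not the actual protocol definition:

    #include <stdint.h>

    typedef enum {
        MSG_BEGIN_REQ, MSG_BEGIN_ANS,           /* node startup         */
        MSG_SYNC_REQ,                           /* ask to join          */
        MSG_SYNC_SEM_REQ, MSG_SYNC_SEM_ANS,     /* copy semaphore state */
        MSG_SYNC_SHM_REQ, MSG_SYNC_SHM_ANS,     /* copy shared memory   */
        MSG_SYNC_SYS_REQ, MSG_SYNC_SYS_ANS,     /* copy node table      */
        MSG_REPLICATE_IPC_REQ, MSG_REPLICATE_IPC_ANS,
        MSG_EXECUTE_IPC_REQ, MSG_EXECUTE_IPC_ANS,
        MSG_STOP_REQ, MSG_STOP_ANS,             /* node shutdown        */
        MSG_END_REQ, MSG_END_ANS
    } PgcMsgType;

    typedef struct {
        PgcMsgType type;
        uint32_t   length;   /* bytes of payload following the header */
    } PgcMsgHeader;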


IPC sync sequence
Node 1: postgres issues a write-IPC call; its pgcluster process receives a replicate IPC request.
Node 1: pgcluster searches the other nodes, applies the write to the local IPC, and sends an execute IPC request to node 2.
Node 2: pgcluster applies the write to its local IPC and returns an execute IPC answer.
Node 1: pgcluster returns the replicate IPC answer, and the write-IPC call completes.
The same flow runs in the opposite direction when node 2 initiates the write.


Stop sequence
Node 1: postgres runs exit_proc() and sends a stop request to its pgcluster process.
Node 1: pgcluster searches the other nodes and sends an end request; node 2 updates its node table and answers.
Node 1: pgcluster answers the stop request, and node 1 deletes its IPC objects.
Node 2 follows the same sequence when it shuts down.


As a result: Pros & Cons


Pros & Cons
Pros:
Easy to add a node for redundancy or replacement.
Data-writing performance does not slow down as nodes are added.
Big improvement for data-reading and many-connection loads.
Cost: nothing grows except CPU and network I/O.
Cons:
Requires large RAM.
Data writing does not become faster by adding nodes; writing performance is not good.
Cost: requires a shared disk.


Possibility
Suitable place: it will be one of the solutions for systems with high CPU load and high network load.
● Most Web systems and a part of Online Transaction Processing (OLTP) systems.
Combination of PGCluster-II and pgpool-II: PGCluster-II might also gain performance with large data.


From now: Conclusion


TODO
Performance should improve further:
Some writes (and erases) of memory data do not need to be synced.
The conversion method (from offset to local address) should be improved.
Release the source code ASAP.
Documentation as well.


Thank you
Ask us about PGCluster: pgcluster-general@pgfoundry.org
Ask me about PGCluster-II: mitani@sraw.co.jp
You can download these slides from http://pgfoundry.org/docman/?group_id=1000072
