Large-Scale Graph-Structured Machine Learning - GraphAnalysis.org

Graphs are Essential to Data-Mining and Machine Learning
• Identify influential people and information
• Find communities
• Target ads and products
• Model complex data dependencies


Matrix Factorization: Alternating Least Squares (ALS)
• Netflix-style setting: a sparse Users x Movies ratings matrix is approximated by the product of User Factors (U) and Movie Factors (M)
• Each rating r_ij ≈ f(i) · f(j), the factor for user i dotted with the factor for movie j
• Alternating Least Squares (ALS): iterate, alternately solving for the user factors and the movie factors
[Figure: ratings matrix ≈ U x M, with user factors f(1)…f(5) and movie factors]
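
The factorization in the figure can be sketched in a few lines of NumPy. This is a hedged, dense-matrix illustration of the alternating update, not the GraphLab implementation; the regularization constant lam, the rank k, and the convention that zero entries mean "unobserved" are assumptions made only for the example.

  import numpy as np

  def als(R, k=5, iters=10, lam=0.1):
      """Alternating Least Squares on a dense ratings matrix R (users x movies).
      Zero entries are treated as unobserved."""
      n_users, n_movies = R.shape
      U = np.random.rand(n_users, k)
      M = np.random.rand(n_movies, k)
      observed = R > 0
      for _ in range(iters):
          # Fix M, solve a small ridge regression for each user factor U[i]
          for i in range(n_users):
              Mi = M[observed[i]]                      # movies rated by user i
              A = Mi.T @ Mi + lam * np.eye(k)
              b = Mi.T @ R[i, observed[i]]
              U[i] = np.linalg.solve(A, b)
          # Fix U, solve for each movie factor M[j]
          for j in range(n_movies):
              Uj = U[observed[:, j]]                   # users who rated movie j
              A = Uj.T @ Uj + lam * np.eye(k)
              b = Uj.T @ R[observed[:, j], j]
              M[j] = np.linalg.solve(A, b)
      return U, M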


PageRank
• Rank of user i = weighted sum of neighbors' ranks
• Everyone starts with equal rank
• Update ranks in parallel
• Iterate until convergence
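
A minimal sketch of this iteration, assuming the graph is a dict of out-neighbor lists and that the edge weight w_ji is 1/outdegree(j); the 0.15 constant matches the GraphLab pseudocode later in the deck.

  def pagerank(graph, iters=20):
      """graph: dict {vertex: list of out-neighbors}.
      Start equal, update every rank from its in-neighbors, repeat."""
      in_nbrs = {v: [] for v in graph}
      for j, outs in graph.items():
          for i in outs:
              in_nbrs[i].append(j)
      R = {v: 1.0 for v in graph}
      for _ in range(iters):
          # All ranks are recomputed from the previous values (parallel update)
          R = {i: 0.15 + sum(R[j] / len(graph[j]) for j in in_nbrs[i]) for i in graph}
      return R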


How should we program graph-parallel algorithms?
High-level abstractions!
- Me, now


The Graph-Parallel Abstraction
• A user-defined Vertex-Program runs on each vertex
• Graph constrains interaction along edges
  – Using messages (e.g. Pregel [PODC'09, SIGMOD'10])
  – Through shared state (e.g., GraphLab [UAI'10, VLDB'12])
• Parallelism: run multiple vertex programs simultaneously
"Think like a Vertex." - Malewicz et al. [SIGMOD'10]


Graph-Parallel Abstractions: Better for Machine Learning
[Figure: messaging with synchronous execution (Pregel) vs. shared state with dynamic, asynchronous execution (GraphLab)]


The GraphLab Vertex Program
Vertex programs directly access adjacent vertices and edges.

GraphLab_PageRank(i)
  // Compute sum over neighbors
  total = 0
  foreach (j in neighbors(i)):
    total = total + R[j] * w_ji
  // Update the PageRank
  R[i] = 0.15 + total
  // Trigger neighbors to run again
  priority = |R[i] – oldR[i]|
  if R[i] not converged then
    signal neighborsOf(i) with priority
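
The "signal with priority" step above is what makes execution dynamic. Below is a hedged, sequential Python sketch of the idea (illustrative only, not the GraphLab scheduler): vertices are re-queued with a priority equal to how much their rank just changed, so vertices that have converged stop being recomputed. The graph representation and w_ji = 1/outdegree(j) are assumptions of the sketch.

  import heapq

  def dynamic_pagerank(graph, tol=1e-3):
      """graph: dict {vertex: list of out-neighbors}."""
      in_nbrs = {v: [] for v in graph}
      for j, outs in graph.items():
          for i in outs:
              in_nbrs[i].append(j)
      R = {v: 1.0 for v in graph}
      # Max-heap of (priority, vertex); negated because heapq is a min-heap.
      heap = [(-1.0, v) for v in graph]
      heapq.heapify(heap)
      while heap:
          _, i = heapq.heappop(heap)          # scheduler picks the next vertex
          old = R[i]
          R[i] = 0.15 + sum(R[j] / len(graph[j]) for j in in_nbrs[i])
          priority = abs(R[i] - old)
          if priority > tol:                  # "signal neighborsOf(i) with priority"
              for k in graph[i]:
                  heapq.heappush(heap, (-priority, k))   # duplicates are harmless here
      return R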


Benefit of Dynamic PageRank
[Figure: histogram of the number of vertices (log scale) vs. number of updates (0 to 70); 51% of vertices were updated only once!]


GraphLab Asynchronous Execution
The scheduler determines the order in which vertices are executed.
[Figure: two CPUs pulling vertices from a shared scheduler queue]
The scheduler can prioritize vertices.


Asynchronous Belief Propagation
Challenge = boundaries.
[Figure: synthetic noisy image and its graphical model, with a heat map of cumulative vertex updates ranging from many updates at region boundaries to few updates elsewhere]
The algorithm identifies and focuses on the hidden sequential structure.


GraphLab Ensures a Serializable Execution
• Enables: Gauss-Seidel iterations, Gibbs Sampling, Graph Coloring, …


Never Ending Learner Project (CoEM)
• Language modeling: named entity recognition
  – Hadoop (BSP): 95 cores, 7.5 hrs
  – GraphLab: 16 cores, 30 min (6x fewer CPUs, 15x faster)
  – Distributed GraphLab: 32 EC2 machines, 80 secs (0.3% of Hadoop time)


Thus far…
GraphLab provided a powerful new abstraction.
But…
We couldn't scale up to the Altavista webgraph from 2002: 1.4B vertices, 6.6B edges.


Natural Graphs
Graphs derived from natural phenomena.


Properties of Natural Graphs
[Figure: a regular mesh vs. a natural graph]
• Power-law degree distribution


Power-Law Degree Distribution
[Figure: log-log plot of vertex count vs. degree for the AltaVista web graph (1.4B vertices, 6.6B edges)]
• More than 10^8 vertices have one neighbor.
• The top 1% of vertices (the high-degree vertices) are adjacent to 50% of the edges!


Power-Law Degree Distribution
["Star-like" motif: e.g., President Obama and his followers]


Challenges of High-Degree Vertices
• Sequentially process edges
• Sends many messages (Pregel)
• Touches a large fraction of the graph (GraphLab)
• Edge meta-data too large for a single machine
• Asynchronous execution requires heavy locking (GraphLab)
• Synchronous execution prone to stragglers (Pregel)


Graph Partitioning
• Graph-parallel abstractions rely on partitioning:
  – Minimize communication
  – Balance computation and storage
• Communication cost is O(# cut edges)
[Figure: a graph split across Machine 1 and Machine 2]


Power-Law Graphs are Difficult to Partition
• Power-law graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04]
• Traditional graph-partitioning algorithms perform poorly on power-law graphs [Abou-Rjeili et al. 06]


Random Partitioning
• GraphLab resorts to random (hashed) partitioning on natural graphs
  – 10 machines: 90% of edges cut
  – 100 machines: 99% of edges cut!
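
The percentages above follow from a short calculation: with each vertex hashed independently to one of p machines, an edge is uncut only when both endpoints land on the same machine, which happens with probability 1/p.

  def expected_cut_fraction(p):
      # Each endpoint is hashed to one of p machines independently,
      # so an edge stays local with probability 1/p.
      return 1 - 1 / p

  print(expected_cut_fraction(10))    # 0.9  -> 90% of edges cut
  print(expected_cut_fraction(100))   # 0.99 -> 99% of edges cut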


Formal GraphLab2 Semantics
• Gather(SrcV, Edge, DstV) → A
  – Collect information from neighbors
• Sum(A, A) → A
  – Commutative, associative sum
• Apply(V, A) → V
  – Update the vertex
• Scatter(SrcV, Edge, DstV) → (Edge, signal)
  – Update edges and signal neighbors


PageRank in GraphLab2

GraphLab2_PageRank(i)
  Gather(j → i): return w_ji * R[j]
  sum(a, b): return a + b
  Apply(i, Σ): R[i] = 0.15 + Σ
  Scatter(i → j):
    if R[i] changed then trigger j to be recomputed
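
A hedged toy driver for the Gather/Sum/Apply decomposition above, written in plain Python rather than the GraphLab2 API. The graph representation, the uniform weight w_ji = 1/outdegree(j), and the omission of Scatter-driven signaling (every vertex is simply re-run each sweep) are assumptions of the sketch.

  def run_gas(graph, init, gather, sum_, apply_, iters=20):
      """graph: dict {vertex: list of out-neighbors}. Synchronous GAS sweeps."""
      in_nbrs = {v: [] for v in graph}
      for j, outs in graph.items():
          for i in outs:
              in_nbrs[i].append(j)
      data = {v: init for v in graph}
      for _ in range(iters):
          new_data = {}
          for i in graph:
              acc = None
              for j in in_nbrs[i]:
                  g = gather(j, i, data, graph)          # Gather phase
                  acc = g if acc is None else sum_(acc, g)  # Sum phase
              new_data[i] = apply_(i, acc, data)         # Apply phase
          data = new_data
      return data

  # The slide's PageRank expressed as the three user functions:
  pagerank = lambda graph: run_gas(
      graph, init=1.0,
      gather=lambda j, i, R, g: R[j] / len(g[j]),
      sum_=lambda a, b: a + b,
      apply_=lambda i, acc, R: 0.15 + (acc or 0.0))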


Minimizing Communication in PowerGraph
• Communication is linear in the number of machines each vertex spans.
• A vertex-cut minimizes the number of machines each vertex spans.
• New theorem: for any edge-cut we can directly construct a vertex-cut which requires strictly less communication and storage.
• Percolation theory suggests that power-law graphs have good vertex cuts. [Albert et al. 2000]


Constructing Vertex-Cuts
• Evenly assign edges to machines
  – Minimize machines spanned by each vertex
• Assign each edge as it is loaded
  – Touch each edge only once
• Propose two distributed approaches:
  – Random vertex cut
  – Greedy vertex cut


Random Vertex-Cut
• Randomly assign edges to machines
[Figure: a balanced vertex-cut across three machines; vertex Y spans 3 machines, vertex Z spans 2 machines, and no edge is cut]


Random Vertex-Cuts vs. Edge-Cuts
• Expected improvement from vertex-cuts:
[Figure: reduction in communication and storage vs. number of machines (0 to 150), showing an order-of-magnitude improvement]
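
A back-of-the-envelope view of where the improvement comes from (a hedged estimate in the spirit of the PowerGraph analysis, not a number from the chart): when edges are hashed to p machines, a degree-d vertex is expected to span roughly p(1 - (1 - 1/p)^d) machines, which is far smaller than d for high-degree vertices.

  def expected_span(degree, p):
      # Each of the vertex's edges lands on one of p machines uniformly at random;
      # the vertex is replicated on every machine that received at least one edge.
      return p * (1 - (1 - 1 / p) ** degree)

  for d in (1, 10, 100, 10000):
      print(d, round(expected_span(d, p=64), 1))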


Streaming Greedy Vertex-Cuts
• Place edges on machines which already have the vertices in that edge.
[Figure: example placement of edges among vertices A, B, C, D, E across Machine 1 and Machine 2]
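
A hedged sketch of a streaming greedy placement rule in the spirit of this slide; the exact rule and tie-breaking used by PowerGraph may differ, so treat the heuristic below (prefer machines that already hold both endpoints, then one endpoint, then the least-loaded machine) as illustrative.

  from collections import defaultdict

  def greedy_vertex_cut(edges, p):
      """Stream edges (u, v) onto p machines, trying to keep each vertex's
      edges on as few machines as possible. Returns the edge assignment and
      the set of machines spanned by each vertex."""
      span = defaultdict(set)      # vertex -> machines that already hold it
      load = [0] * p               # number of edges per machine
      assignment = {}
      for (u, v) in edges:
          both = span[u] & span[v]
          either = span[u] | span[v]
          if both:
              candidates = both            # a machine already has both endpoints
          elif either:
              candidates = either          # a machine already has one endpoint
          else:
              candidates = set(range(p))   # fall back to the least-loaded machine
          m = min(candidates, key=lambda x: load[x])
          assignment[(u, v)] = m
          span[u].add(m)
          span[v].add(m)
          load[m] += 1
      return assignment, span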


Greedy Vertex-Cuts Improve Performance
[Figure: runtime relative to random partitioning for PageRank, collaborative filtering, and shortest path, comparing random vs. greedy placement]
Greedy partitioning improves computation performance.


System Design
PowerGraph (GraphLab2) System, layered on MPI/TCP-IP, PThreads, and HDFS, running on EC2 HPC nodes.
• Implemented as a C++ API
• Uses HDFS for graph input and output
• Fault tolerance is achieved by check-pointing
  – Snapshot time < 5 seconds for the Twitter network


Implemented Many Algorithms
• Collaborative Filtering: Alternating Least Squares, Stochastic Gradient Descent, SVD, Non-negative MF
• Statistical Inference: Loopy Belief Propagation, Max-Product Linear Programs, Gibbs Sampling
• Graph Analytics: PageRank, Triangle Counting, Shortest Path, Graph Coloring, K-core Decomposition
• Computer Vision: Image stitching
• Language Modeling: LDA


PageRank on the Twitter Follower Graph
Natural graph with 40M users, 1.4 billion links.
[Figure: total network traffic (GB) and runtime (seconds) for GraphLab, Pregel (Piccolo), and PowerGraph]
PowerGraph reduces communication and runs faster.
32 nodes x 8 cores (EC2 HPC cc1.4x)


PageRank on the Twitter Follower Graph
Natural graph with 40M users, 1.4 billion links.
[Figure: runtime per iteration (0 to 200 s) for Hadoop, Twister, Piccolo, GraphLab, and PowerGraph]
Order-of-magnitude gains by exploiting the properties of natural graphs.
Hadoop results from [Kang et al. '11]; Twister (in-memory MapReduce) [Ekanayake et al. '10]


GraphLab2 is Scalable
Yahoo Altavista Web Graph (2002): one of the largest publicly available web graphs, with 1.4 billion webpages and 6.6 billion links.
• 7 seconds per iteration; 1B links processed per second
• 64 HPC nodes, 1024 cores (2048 HT)
• 30 lines of user code


Topic Modeling
• English-language Wikipedia
  – 2.6M documents, 8.3M words, 500M tokens
  – Computationally intensive algorithm
[Figure: throughput in million tokens per second]
• Smola et al.: 100 Yahoo! machines, specifically engineered for this task
• PowerGraph: 64 cc2.8xlarge EC2 nodes, 200 lines of code & 4 human hours


Triangle Counting
• For each vertex in the graph, count the number of triangles containing it
• Measures both the "popularity" of the vertex and the "cohesiveness" of the vertex's community:
  – Fewer triangles: weaker community
  – More triangles: stronger community
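
A hedged single-machine sketch of the per-vertex count described above, using neighbor-set intersection; the distributed implementations benchmarked on the next slides work differently.

  def triangles_per_vertex(graph):
      """graph: dict {vertex: set of neighbors} for an undirected graph.
      A triangle containing v is a pair of v's neighbors that are adjacent."""
      counts = {}
      for v, nbrs in graph.items():
          t = 0
          for u in nbrs:
              # Common neighbors of v and u close a triangle {v, u, w};
              # each triangle is found twice from v (via u and via w), so halve below.
              t += len(nbrs & graph[u])
          counts[v] = t // 2
      return counts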


Triangle Counting on the Twitter Graph
Identify individuals with strong communities.
Counted: 34.8 billion triangles.
• Hadoop [WWW'11]: 1536 machines, 423 minutes
• PowerGraph: 64 machines, 1.5 minutes (282x faster)
Why? The wrong abstraction broadcasts O(degree^2) messages per vertex.
S. Suri and S. Vassilvitskii, "Counting triangles and the curse of the last reducer," WWW'11


Machine Learning and Data-Mining Toolkits
Graph Analytics, Graphical Models, Computer Vision, Clustering, Topic Modeling, Collaborative Filtering
GraphLab2 System, layered on MPI/TCP-IP, PThreads, and HDFS, running on EC2 HPC nodes.
http://graphlab.org (Apache 2 License)


GraphChi: Going Small with GraphLab
Solve huge problems on small or embedded devices?
Key: exploit non-volatile memory (starting with SSDs and HDs).


GraphChi – Disk-Based GraphLab
Novel Parallel Sliding Windows algorithm
• Single machine
  – Parallel, asynchronous execution
• Solves big problems
  – That are normally solved in the cloud
• Efficiently exploits disks
  – Optimized for streaming access
  – Efficient on both SSDs and hard drives


Triangle Counting on the Twitter Graph
40M users, 1.2B edges. Total: 34.8 billion triangles.
• Hadoop: 1536 machines, 423 minutes
• GraphChi: 1 Mac Mini, 59 minutes!
• PowerGraph: 64 machines (1024 cores), 1.5 minutes
Hadoop results from [Suri & Vassilvitskii '11]


Apache 2 License
http://graphlab.org
Documentation… Code… Tutorials… (more on the way)


Active Work
• Cross-language support (Python/Java)
• Support for incremental graph computation
• Integration with graph databases
• Declarative representations of GAS decomposition:
  – my.pr := nbrs.in.map(x => x.pr).reduce( (a,b) => a + b )

http://graphlab.org
Joseph E. Gonzalez
Postdoc, UC Berkeley
jegonzal@eecs.berkeley.edu
jegonzal@cs.cmu.edu


Why not use Map-Reduce for Graph-Parallel algorithms?


Data Dependencies are Difficult
• Difficult to express dependent data in MapReduce
  – Substantial data transformations
  – User-managed graph structure
  – Costly data replication
[Figure: MapReduce operates on independent data records]


Iterative Computation is Difficult
• The system is not optimized for iteration:
[Figure: each iteration pays a startup penalty and a disk penalty as data is re-read and re-written across CPUs]


The Pregel Abstraction
Vertex programs interact by sending messages.

Pregel_PageRank(i, messages):
  // Receive all the messages
  total = 0
  foreach (msg in messages):
    total = total + msg
  // Update the rank of this vertex
  R[i] = total
  // Send new messages to neighbors
  foreach (j in out_neighbors[i]):
    Send msg(R[i]) to vertex j

Malewicz et al. [PODC'09, SIGMOD'10]
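
A hedged Python sketch of the bulk-synchronous message-passing loop implied by this pseudocode (not the real Pregel API): each superstep delivers the previous superstep's messages, runs every vertex program, and buffers new messages until the barrier. The damping constant and the per-edge message value are assumptions carried over from the earlier PageRank sketches.

  def pregel_pagerank(graph, supersteps=20):
      """graph: dict {vertex: list of out-neighbors}."""
      R = {v: 1.0 for v in graph}
      inbox = {v: [0.0] for v in graph}               # dummy messages for step 0
      for _ in range(supersteps):
          outbox = {v: [] for v in graph}
          for i in graph:                              # --- Compute ---
              R[i] = 0.15 + sum(inbox[i])
              for j in graph[i]:                       # --- Communicate ---
                  outbox[j].append(R[i] / len(graph[i]))
          inbox = outbox                               # --- Barrier ---
      return R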


Pregel Synchronous Execution
Compute → Communicate → Barrier


Communication Overhead for High-Degree Vertices
Fan-in vs. fan-out


Pregel Message Combiners on Fan-In
[Figure: vertices A, B, C on Machine 1 send messages to vertex D on Machine 2; a Sum combiner merges them before they cross the network]
• User-defined commutative, associative (+) message operation
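
A hedged sketch of what a combiner buys: because the user-defined (+) is commutative and associative, messages bound for the same vertex can be pre-summed on the sending machine, so only one value per (machine, target vertex) pair crosses the network. Names and data layout here are assumptions for the example.

  from collections import defaultdict

  def combine_outgoing(messages):
      """messages: list of (target_vertex, value) produced on one machine.
      Returns one combined (summed) message per target vertex."""
      combined = defaultdict(float)
      for target, value in messages:
          combined[target] += value     # user-defined commutative, associative (+)
      return dict(combined)

  # Many messages to the same high-degree vertex collapse into a single number:
  print(combine_outgoing([("D", 0.2), ("D", 0.3), ("E", 0.1)]))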


Pregel Struggles with Fan-Out
[Figure: vertex A on Machine 1 sends to vertices B, C, D on Machine 2]
• Broadcast sends many copies of the same message to the same machine!


Fan-In and Fan-Out Performance
• PageRank on synthetic power-law graphs
  – Piccolo was used to simulate Pregel with combiners
[Figure: total communication (GB) vs. power-law constant α from 1.8 to 2.2; smaller α means more high-degree vertices]


GraphLab Ghosting
[Figure: vertices A, B, D are replicated as ghosts on the neighboring machine; C stays local]
• Changes to the master are synced to the ghosts


GraphLab Ghosting
[Figure: ghosts of high-degree vertices replicated across machines]
• Changes to the neighbors of high-degree vertices create substantial network traffic


Fan-In and Fan-Out Performance
• PageRank on synthetic power-law graphs
• GraphLab is undirected
[Figure: total communication (GB) vs. power-law constant α from 1.8 to 2.2; smaller α means more high-degree vertices]


Comparison with GraphLab & Pregel
• PageRank on synthetic power-law graphs:
[Figure: communication (GB) and runtime (seconds) vs. power-law constant α for Pregel (Piccolo), GraphLab, and GraphLab2; smaller α means more high-degree vertices]
GraphLab2 is robust to high-degree vertices.


GraphLab on Spark

The full C++ GraphLab PageRank program (as recoverable from the slide):

#include <graphlab.hpp>

struct vertex_data : public graphlab::IS_POD_TYPE {
  float rank;
  vertex_data() : rank(1) { }
};
typedef graphlab::empty edge_data;
typedef graphlab::distributed_graph<vertex_data, edge_data> graph_type;

class pagerank :
    public graphlab::ivertex_program<graph_type, float>,
    public graphlab::IS_POD_TYPE {
  float last_change;
public:
  float gather(icontext_type& context, const vertex_type& vertex,
               edge_type& edge) const {
    return edge.source().data().rank / edge.source().num_out_edges();
  }
  void apply(icontext_type& context, vertex_type& vertex,
             const gather_type& total) {
    const double newval = 0.15*total + 0.85;
    last_change = std::fabs(newval - vertex.data().rank);
    vertex.data().rank = newval;
  }
  void scatter(icontext_type& context, const vertex_type& vertex,
               edge_type& edge) const {
    if (last_change > TOLERANCE) context.signal(edge.target());
  }
};

struct pagerank_writer {
  std::string save_vertex(graph_type::vertex_type v) {
    std::stringstream strm;
    strm …
  }
};
…
  graphlab::omni_engine<pagerank> engine(dc, graph, clopts);
  engine.signal_all();
  engine.start();
  graph.save(saveprefix, pagerank_writer(), false, true, false);
  graphlab::mpi_tools::finalize();
  return EXIT_SUCCESS;
}

The slide also shows an interactive Spark version; only fragments survive in the extraction:
  … abs(e.source.data._2 - e.source.data._1) > 0.01)
  pr.vertices.saveAsTextFile("results")
Interactive!
