GPU Technology Conference
Best Practices for Managing and Maintaining Large-Scale Visualization Clusters
Vijay Kalivarapu, Postdoctoral Research Associate
Glen Galvin, Manager of Information Technology
Virtual Reality Applications Center (VRAC)
Iowa State University
Ames, IA
Thursday, March 21, 13
Overview
• Who we are
• VRAC facilities
• Best Practices
Who we are
• Interdisciplinary Research Center
• ~350 Students, ~50 Faculty Investigators
• Home of the HCI Graduate Program
• Over three million USD a year in sponsored projects
VRAC Facilities - C6
• 10 ft. x 10 ft. x 10 ft. cube
• 24 Sony SRX-S105 4K-resolution projectors with Beacon active/passive stereo
• 49-node six-core Xeon cluster
• Two Nvidia Quadro 6000s per node
• Ultrasonic InterSense tracking and 8-channel audio
VRAC Facilities - MIRAGE
• 33 ft. x 11 ft. power wall
• Six DP Titan WUXGA 1080p active stereo projectors
• Seven-node Xeon Dell cluster
• One Nvidia Quadro 5000 (Fermi) per node
• 24-camera Motion Analysis IR tracking
VRAC Facilities - METaL
• Multimodal Experience Testbed and Laboratory
• Three-wall CAVE, a configuration commonly used in industry
• Seven-node Xeon Dell cluster
• Quadro Plex 2200 D2 GPUs on render nodes, Quadro FX 5800 on the master node
• DP Titan WUXGA-3D stereo projectors
• 4-camera ART IR tracking system
VRAC Facilities - Auditorium
• 30 ft. x 20 ft. front-projected auditorium (seats 250)
• Three-node Intel Xeon Dell cluster
• Two Quadro Plex 7000s
• Front-projected by two Sony 4Ks
• Beacon Active/Passive Stereo
VRAC Facilities - Development cluster
• Eight 30” monitors
• Rack-mounted three-node Intel Xeon Dell cluster
• Two Quadro Plex 7000s
• Quadro 5000 on the master node
• Used for development and testing
Towards better practices ...
• Viz. systems entail unique maintenance and management challenges
• Hardware complexity demands a special skill set
• We have spent years honing our viz. systems
• Disseminating tricks of the trade
• Applicable to academia and industry
Best Practices Overview
• Viz. Computer Cluster
• Projectors
• Stereo emitters and glasses
• Tracking System
• Software/Scripts - geared towards Linux and Nvidia
• For the developer
Viz. Cluster - Planning/Building
• Identify your requirements instead of having vendors plan for you
• Viz. requirements differ from those of HPC
• Budget for
• A few spare cluster nodes
• A smaller development cluster
• G-Sync II/Quadro Sync cards ...
Viz. Cluster - Planning
• Framelock provides frame sync and swap sync
• Keeps the frame timing (vertical refresh) of all displays in step
• Genlock syncs frame timing to an external reference signal
• Framelock + Genlock recommended
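Framelock and swap sync are handled in hardware by the sync cards, and the slides give no code for them. As a software analogy only, swap sync behaves like a barrier: no node swaps buffers until every node has finished its frame. The sketch assumes four hypothetical render nodes simulated as Python threads, with `render_frame` a hypothetical stand-in for drawing; it does not call any Nvidia API.

```python
import random
import threading
import time

NUM_NODES = 4     # hypothetical number of render nodes
NUM_FRAMES = 3

swap_barrier = threading.Barrier(NUM_NODES)   # all nodes must arrive before any swaps
swap_log = []                                  # (frame, node) in the order swaps happen
log_lock = threading.Lock()

def render_frame(node, frame):
    """Hypothetical stand-in for drawing; nodes finish at different times."""
    time.sleep(random.uniform(0.001, 0.01))

def render_node(node):
    for frame in range(NUM_FRAMES):
        render_frame(node, frame)
        swap_barrier.wait()                    # swap sync: wait for the slowest node
        with log_lock:
            swap_log.append((frame, node))     # "swap buffers"

threads = [threading.Thread(target=render_node, args=(n,)) for n in range(NUM_NODES)]
for t in threads:
    t.start()
for t in threads:
    t.join()

frames_in_order = [frame for frame, _ in swap_log]
# No node swaps frame k+1 before every node has swapped frame k.
print(frames_in_order == sorted(frames_in_order))   # True
```

Without the barrier, a fast node would run ahead of a slow one, which is what shows up on a multi-projector wall as tearing at the seams between nodes.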
Viz. Cluster - Maintenance
• Set up all cluster nodes identically
• Applies to spare nodes and the dev cluster
• Use script-based software installs/updates
• Kickstart system
• Change once, deploy on all cluster nodes
• Use a common IP subnet for the viz. cluster
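Keeping every node identical is easier to enforce when drift is checked automatically. A minimal sketch, assuming package versions have already been collected from each node (for example with `rpm -q`; the collection step, node names, versions, and subnet below are all hypothetical): it flags any node whose versions differ from the cluster majority, and any node outside the common subnet.

```python
import ipaddress
from collections import Counter

def find_drift(node_versions):
    """Compare package versions across nodes.

    node_versions maps node -> {package: version}. For each package the most
    common version is the reference (ties go to the first node seen);
    returns {node: {package: (found, expected)}} for nodes that differ.
    """
    packages = sorted({pkg for versions in node_versions.values() for pkg in versions})
    expected = {
        pkg: Counter(v.get(pkg) for v in node_versions.values()).most_common(1)[0][0]
        for pkg in packages
    }
    drift = {}
    for node, versions in node_versions.items():
        diffs = {pkg: (versions.get(pkg), expected[pkg])
                 for pkg in packages if versions.get(pkg) != expected[pkg]}
        if diffs:
            drift[node] = diffs
    return drift

def nodes_outside_subnet(node_ips, subnet):
    """Nodes whose address is not in the cluster's common subnet."""
    net = ipaddress.ip_network(subnet)
    return [node for node, ip in node_ips.items() if ipaddress.ip_address(ip) not in net]

# Hypothetical inventory of three render nodes.
versions = {
    "node01": {"nvidia": "310.40", "kernel": "2.6.32-358"},
    "node02": {"nvidia": "310.40", "kernel": "2.6.32-358"},
    "node03": {"nvidia": "304.64", "kernel": "2.6.32-358"},
}
ips = {"node01": "10.0.1.11", "node02": "10.0.2.12", "node03": "10.0.1.13"}
print(find_drift(versions))                       # {'node03': {'nvidia': ('304.64', '310.40')}}
print(nodes_outside_subnet(ips, "10.0.1.0/24"))   # ['node02']
```

Using the majority as the reference means one stray node is reported without having to maintain a separate "golden" manifest; a Kickstart-based setup could compare against its manifest instead.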
Viz. Cluster - Management
• Facility scheduling
• A point person handles facility usage
• Compute scheduling - evenings or weekends
• PBS scheduler (open-source/commercial), Maui scheduler, etc.
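PBS and Maui have their own mechanisms for restricting when jobs run; the sketch below only expresses the policy itself, for example for a wrapper script that holds compute jobs until the facility is free. The 08:00-18:00 weekday window is a hypothetical value.

```python
from datetime import datetime

# Hypothetical policy: weekdays 08:00-18:00 are reserved for interactive use.
DAY_START_HOUR = 8
DAY_END_HOUR = 18

def in_compute_window(when):
    """True if batch compute jobs may run at `when` (evenings or weekends)."""
    if when.weekday() >= 5:   # Saturday is 5, Sunday is 6
        return True
    return not (DAY_START_HOUR <= when.hour < DAY_END_HOUR)

print(in_compute_window(datetime(2013, 3, 21, 14, 0)))   # False: Thursday afternoon
print(in_compute_window(datetime(2013, 3, 23, 10, 0)))   # True: Saturday
```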
Projectors - Planning
• LCD/LED screens vs. projectors
• TVs use proprietary 3D technologies
• Bezels, albeit small
• 120 Hz does not mean 3D
• Screen resolution
• Projectors recommended
Projectors - Planning
• Blend zones
• Not an exact science
• Account for the overlap when picking projectors
• Mirrors
• Glass (heavy, expensive) or Mylar (handle with care)
• Allow a shorter throw distance
• Useful if projectors are set up horizontally
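Two numbers help when picking projectors with blend zones in mind: how many pixels survive the overlap, and how the blend ramp attenuates each projector. A minimal sketch, assuming projectors tiled side by side with equal overlaps and a simple power-law display gamma of 2.2 (both assumptions; in practice blends are still tuned by eye).

```python
def effective_width(num_projectors, width_px, overlap_px):
    """Horizontal pixels left when adjacent projectors overlap by overlap_px."""
    if num_projectors < 1 or not 0 <= overlap_px < width_px:
        raise ValueError("need >= 1 projector and 0 <= overlap < width")
    return num_projectors * width_px - (num_projectors - 1) * overlap_px

def blend_ramp(overlap_px, gamma=2.2):
    """Pixel-value weights across one blend zone for the left projector.

    Light falls linearly from ~1 to ~0 across the zone; raising it to 1/gamma
    turns that into pixel values for a display with power-law gamma. The
    right projector uses the reversed ramp, so the two lights sum to 1.
    """
    return [(1.0 - (i + 0.5) / overlap_px) ** (1.0 / gamma) for i in range(overlap_px)]

print(effective_width(3, 1920, 240))   # 5280: three 1920-px projectors, 240-px overlaps
```

Applying the ramp in linear light and then gamma-encoding it is what keeps the overlap from looking like a bright band.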
Projectors - Maintenance
• Keep spare bulbs, transmitters, and receivers
• Periodic calibration and color balancing
• Keep external lights pointed away from the screens
• Keep screen patches on hand
• Not all screens preserve polarization or work well with stereo
• Use high-gain screens if possible
Projectors - Management
• Programmable control panel
• Time delay
• Auto shut-off at midnight
• Cooling cycle
• Disable panel communications during maintenance
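The slides do not say how the control panel is programmed. Assuming the time delay is used to stagger power-on and to keep a projector from restarting before its cooling cycle ends, the scheduling logic can be sketched as follows; the 5-second stagger and 3-minute cool-down are hypothetical values, not vendor figures.

```python
from datetime import datetime, timedelta

# Hypothetical values; take the real ones from the projector manuals.
STAGGER = timedelta(seconds=5)     # delay between projectors at power-on
COOLDOWN = timedelta(minutes=3)    # lamp cooling cycle after power-off

def power_on_times(projectors, requested, last_off=None):
    """Return {projector: power-on time}, staggered and after any cooldown."""
    start = requested
    if last_off is not None:
        # Never restart before the cooling cycle has finished.
        start = max(start, last_off + COOLDOWN)
    return {p: start + i * STAGGER for i, p in enumerate(projectors)}

def next_auto_shutoff(now):
    """The next midnight after `now`, when the panel switches everything off."""
    return datetime.combine(now.date() + timedelta(days=1), datetime.min.time())
```

For example, a power-on request at 09:00 for projectors switched off at 08:59 is pushed to 09:02, with the second projector following 5 seconds later.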
Projectors - Management
• Hook up projectors via a private network
• They can email status updates
• Prefer projectors that support both active and passive stereo
• Recommended video routing: copper → fiber → copper
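Once projectors are on a private network they can also be polled for status. The slides do not name a protocol; the sketch below swaps in PJLink, a common projector-control standard (TCP port 4352), and assumes the projectors support it with authentication disabled. Whether the Sony and DP units listed earlier speak PJLink is not stated in the source.

```python
import socket

PJLINK_PORT = 4352   # TCP port assigned to PJLink
POWER_STATES = {"0": "standby", "1": "on", "2": "cooling", "3": "warming up"}

def parse_response(line):
    """Split a class-1 reply such as '%1POWR=1' into ('POWR', '1')."""
    line = line.strip()
    if not line.startswith("%1") or "=" not in line:
        raise ValueError(f"unexpected PJLink reply: {line!r}")
    command, value = line[2:].split("=", 1)
    if value.startswith("ERR"):          # ERR1..ERR4 report a failed query
        raise RuntimeError(f"{command} query failed: {value}")
    return command, value

def parse_lamp_hours(value):
    """LAMP value '1200 1 300 0' -> [(1200, True), (300, False)]: (hours, lit)."""
    fields = value.split()
    return [(int(fields[i]), fields[i + 1] == "1") for i in range(0, len(fields), 2)]

def _read_reply(sock):
    """PJLink lines end with a bare carriage return, so read up to that byte."""
    data = b""
    while not data.endswith(b"\r"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("projector closed the connection")
        data += chunk
    return data.decode("ascii").strip()

def query(host, command, timeout=5.0):
    """Send one class-1 query (e.g. 'POWR', 'LAMP', 'ERST') and parse the reply.
    Assumes PJLink authentication is disabled on the projector."""
    with socket.create_connection((host, PJLINK_PORT), timeout=timeout) as sock:
        if _read_reply(sock) != "PJLINK 0":
            raise RuntimeError("projector requires PJLink authentication")
        sock.sendall(f"%1{command} ?\r".encode("ascii"))
        return parse_response(_read_reply(sock))
```

A monitoring script could call `query(host, "LAMP")` on each projector nightly and email when lamp hours near the replacement interval, complementing the projectors' own status emails.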
Tracking System - Planning
Feature | Ultrasonic | Magnetic | Infrared
Line of sight? | Yes | No | Yes
Invasive? | Yes and No | No | Yes, need to cut holes
Issues | Jitter, removed by calibration (takes time and people) | Lesser range; distortions from ferrous metals, jitter from non-ferrous metals | Need reflective markers
Tracking System - Maintenance
• Keep spares for tracking gear and update firmware
• Test hardware that gets rough usage (e.g., head tracker, wand)
• Keep calibration programs handy
• All but InterSense (IS) have standalone consoles with video output to configure/calibrate
• IS requires a Windows station to run ‘IS Demo’
Stereo Emitters and Glasses
• IR tracking systems typically require syncing with the IR stereo emitters
• The sync signal passed to the emitters should also be passed to the tracker
• Multiple IR emitters + multiple IR trackers = interference
• Vendors (e.g., XPAND) began using encrypted IR signals
• Problems in systems with multiple emitters
• The emitters themselves do not sync with each other, causing occasional eye swap
Stereo Emitters and Glasses - Maintenance
• Store IR glasses sensor-down or in a cabinet
• Storing them within the IR field increases battery drain
• Lessons learned
• Use a digital camera to test IR emitters
• Stereo may not work with older IR stereo glasses under 120 V fluorescent lighting
Software - Planning
[Figure: viz. software options on axes of Flexibility (low to high) vs. Complexity (low to high), grouped as custom systems, hybrid systems, turnkey systems, and native application; tools shown include VR Juggler, OSG, Vizard, Quest 3D, Conduit, TechViz, ICIDO, Virtools, and EON]
• Test different viz. tools and stay faithful to a select few
• Open-source tools: VR Juggler, OpenSceneGraph, OpenSG
• Commercial tools: CaveLib, Vizard, Unity 3D, Virtools, ICIDO
• Project sponsors may require other commercial tools
Software - Video Drivers
• Exclude Mesa and the kernel from automatic updates on Linux systems
• Rebuild the Nvidia drivers after updating either
• Be consistent across all cluster nodes and dev clusters
• Nvidia Mosaic utility - eliminates threading issues with multiple GPUs on a cluster node
• One thread per app with Mosaic vs. one thread per GPU without Mosaic
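On Red Hat-style systems, one way to keep Mesa and the kernel out of automatic updates is yum's `exclude` option in the `[main]` section of `/etc/yum.conf`, which takes a space-separated list of package globs. The helper below is a hypothetical sketch that adds those globs to existing yum.conf text without dropping comments or other settings; on Debian/Ubuntu, `apt-mark hold` plays the same role.

```python
import re

def add_yum_excludes(conf_text, patterns=("kernel*", "mesa*")):
    """Return yum.conf text whose [main] exclude= line covers `patterns`."""
    lines = conf_text.splitlines()
    in_main = False
    main_idx = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("["):
            in_main = stripped == "[main]"
            if in_main:
                main_idx = i
            continue
        if in_main and re.match(r"exclude\s*=", stripped):
            # Merge with the existing list, keeping its order.
            current = stripped.split("=", 1)[1].split()
            merged = current + [p for p in patterns if p not in current]
            lines[i] = "exclude=" + " ".join(merged)
            return "\n".join(lines) + "\n"
    if main_idx is None:
        # No [main] section yet: create one at the top.
        lines = ["[main]"] + lines
        main_idx = 0
    lines.insert(main_idx + 1, "exclude=" + " ".join(patterns))
    return "\n".join(lines) + "\n"
```

When an update is deliberate, `yum --disableexcludes=main update` lifts the exclusion for that run, after which the Nvidia driver is rebuilt on every node.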
Software - Disable Nouveau
1. Add the following to the file /etc/modprobe.d/disablenouveau.conf
    blacklist nouveau
    options nouveau modeset=0
2. Add a kernel parameter to the boot entry in the bootloader menu (e.g., GRUB)
    rdblacklist=nouveau
3. Reboot and install the video drivers with X disabled
http://us.download.nvidia.com/XFree86/Linux-x86_64/310.40/README/commonproblems.html
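After the reboot, a quick check confirms that the blacklist took effect before the Nvidia installer runs. A minimal sketch, assuming a Linux node where `/proc/modules` lists loaded modules (module name in the first field) and `/proc/cmdline` holds the kernel parameters; newer dracut versions spell the parameter `rd.driver.blacklist=nouveau`, so both spellings are accepted. The function names are hypothetical.

```python
BLACKLIST_PARAMS = ("rdblacklist=nouveau", "rd.driver.blacklist=nouveau")

def loaded_modules(proc_modules_text):
    """Module names from /proc/modules text (first field of each line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}

def nouveau_disabled(proc_modules_text, cmdline_text):
    """True if nouveau is not loaded and is blacklisted on the kernel command line."""
    params = cmdline_text.split()
    return ("nouveau" not in loaded_modules(proc_modules_text)
            and any(p in params for p in BLACKLIST_PARAMS))
```

On a node: `nouveau_disabled(open("/proc/modules").read(), open("/proc/cmdline").read())`.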
Software - Xorg.conf options
Section "Extensions"
    Option "Composite" "Disable"
EndSection
Section "Device"
    ...
    ...
    Option "stereo" "3"
    Option "allowdfpstereo" "on"
EndSection
http://us.download.nvidia.com/XFree86/Linux-x86_64/310.40/README/xconfigoptions.html
Software - Xorg.conf options
• EDID - Extended Display Identification Data
• Query it from the display and save it to a file
Section "Screen"
    ...
    Option "UseEdidFreqs" "True"
    Option "UseEdid" "True"
    Option "CustomEDID" "DFP-0:/etc/X11/edid.bin"
    Option "ConstantFrameRateHint" "1"
EndSection
http://en.wikipedia.org/wiki/Extended_display_identification_data
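Before pointing `CustomEDID` at a saved file, it is worth checking that the dump is intact. A sketch based on the EDID layout: the 8-byte header `00 FF FF FF FF FF FF 00`, byte 126 giving the number of 128-byte extension blocks, every block summing to 0 modulo 256, and bytes 8-9 encoding a three-letter manufacturer ID. The function name is hypothetical.

```python
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])
BLOCK_SIZE = 128

def check_edid(data):
    """Validate an EDID dump and return its three-letter manufacturer ID."""
    if len(data) < BLOCK_SIZE or data[:8] != EDID_HEADER:
        raise ValueError("not an EDID: too short or bad header")
    extensions = data[126]            # number of 128-byte extension blocks
    if len(data) != BLOCK_SIZE * (1 + extensions):
        raise ValueError(f"expected {1 + extensions} block(s), got {len(data)} bytes")
    for start in range(0, len(data), BLOCK_SIZE):
        # Each block's bytes must sum to 0 modulo 256 (last byte is the checksum).
        if sum(data[start:start + BLOCK_SIZE]) % 256 != 0:
            raise ValueError(f"checksum error in block {start // BLOCK_SIZE}")
    # Bytes 8-9: big-endian word holding three 5-bit letters, where 1 = 'A'.
    word = (data[8] << 8) | data[9]
    return "".join(chr(ord("A") - 1 + ((word >> shift) & 0x1F)) for shift in (10, 5, 0))
```

Usage: `with open("/etc/X11/edid.bin", "rb") as f: print(check_edid(f.read()))`. A truncated or corrupted file raises `ValueError` instead of silently feeding bad timings to the driver.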
Software - Xorg.conf options
• App launches but crashes after drawing the first frame
• Typically happens when not using Mosaic and the code is not multi-threaded
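The threading difference can be sketched as follows. Without Mosaic each GPU typically drives its own X screen, and an OpenGL context can be current in only one thread at a time, so the app needs one render thread per GPU; with Mosaic all GPUs appear as one X screen and a single thread suffices. The screen names `:0.0`, `:0.1` and the `render_loop` body are hypothetical placeholders for real GL code.

```python
import threading

def render_loop(screen, frames, log, lock):
    """Hypothetical render loop. A real app would open X screen `screen`,
    create an OpenGL context there, and keep it current in this thread only."""
    for frame in range(frames):
        with lock:
            log.append((screen, frame))   # stands in for draw + swap

def start_renderers(num_gpus, frames, mosaic):
    """Mosaic: one X screen spanning all GPUs, so one render thread.
    Without Mosaic: one X screen per GPU, so one render thread per GPU."""
    screens = [":0.0"] if mosaic else [f":0.{gpu}" for gpu in range(num_gpus)]
    log, lock = [], threading.Lock()
    threads = [threading.Thread(target=render_loop, args=(s, frames, log, lock))
               for s in screens]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```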
For the Developer
• Use a cluster script to launch one instance of the app per node
• Scripts can also be used
• To kill apps, restart X, and reboot the cluster
• SSH keys for automatic login to nodes without password prompts
• Use Preboot Execution Environment (PXE) scripts to reboot the entire cluster into a specified OS
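A per-node launch script can be sketched along these lines, assuming SSH keys are already installed. The host names are hypothetical, and `ssh -o BatchMode=yes` makes ssh fail rather than prompt when a key is missing. This is a sketch, not VRAC's own scripts.

```python
import os
import shlex
import subprocess

# Hypothetical render-node host names; replace with your cluster's.
NODES = ["render01", "render02", "render03"]

def ssh_argv(node, remote_cmd):
    """ssh argv for `remote_cmd` on `node`. BatchMode=yes makes ssh fail
    instead of prompting, so a node without the key errors out, not hangs."""
    return ["ssh", "-o", "BatchMode=yes", node, remote_cmd]

def launch_all(app, args=(), nodes=NODES, display=":0.0", dry_run=False):
    """Start one instance of `app` per node on that node's local X display."""
    # `env VAR=value cmd` works whether the remote login shell is sh or csh.
    remote_cmd = " ".join(shlex.quote(a) for a in ("env", f"DISPLAY={display}", app, *args))
    commands = [ssh_argv(n, remote_cmd) for n in nodes]
    if dry_run:
        return commands
    return [subprocess.Popen(cmd) for cmd in commands]

def kill_all(app, nodes=NODES):
    """Kill the app on every node by exact process name (pkill -x).
    Linux truncates process names to 15 characters, so keep names short."""
    name = os.path.basename(app)
    for n in nodes:
        subprocess.run(ssh_argv(n, "pkill -x " + shlex.quote(name)), check=False)
```

Calling `launch_all(..., dry_run=True)` prints nothing and starts nothing; it just returns the ssh commands, which is handy for checking a new node list before a demo.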
For the Developer
• A developer might not always work at the same workstation
• Use floating licenses for software installs if possible
• Major upgrades and updates (e.g., kernel, video drivers)
• Likely to break existing apps - give a heads-up before and after
• Create a dev-help mailing list and encourage its use
For the Developer
• Training & Education
• Periodic training on using the viz. system
• Development and initial testing on a dev cluster
• No code changes on the day of project demonstrations
• Make production code self-contained, including dependencies
• Portability - inherit env. variables from a launch script rather than from ~/.cshrc
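The portability point can be sketched as a launch wrapper that builds the app's environment explicitly instead of relying on whatever each developer's `~/.cshrc` sets. The variable names and paths below are hypothetical.

```python
import os
import subprocess

# Hypothetical settings that would otherwise be set in ~/.cshrc.
APP_ENV = {
    "VIZ_APP_HOME": "/opt/viz/myapp",
    "LD_LIBRARY_PATH": "/opt/viz/myapp/lib",
}
# Variables still taken from the caller's environment.
PASS_THROUGH = ("PATH", "HOME", "USER", "DISPLAY")

def launch_env(app_env=APP_ENV, base=None):
    """Environment for the app: a few pass-through variables plus app_env.
    Anything else in the caller's environment is deliberately ignored."""
    base = os.environ if base is None else base
    env = {k: base[k] for k in PASS_THROUGH if k in base}
    env.update(app_env)
    return env

def launch(argv):
    """Start the app with the explicit environment, not the login shell's."""
    return subprocess.Popen(argv, env=launch_env())
```

Because the wrapper ships with the app, the same environment follows it to every node and every developer account.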
Handling Developer Woes
• A user complains that their viz app worked before, but now it ...
• Seg faults
• Does not start
• Runs too slowly
• Shows up only on certain parts of the screen
• Shows ghosting and blurry images
• Has tearing between frames
Conclusions
• Maintaining and managing large-scale viz. systems is non-trivial
• Summarized tricks of the trade
• By no means complete
• vkk2@iastate.edu
• Questions?