Option - GPU Technology Conference

gputechconf.com

Best Practices Managing and Maintaining Large Scale Visualization Clusters

Vijay Kalivarapu, Postdoctoral Research Associate

Glen Galvin, Manager of Information Technology

Virtual Reality Applications Center (VRAC)

Iowa State University

Ames, IA

Thursday, March 21, 2013



Overview

• Who we are

• VRAC facilities

• Best Practices



Who we are

• Interdisciplinary Research Center

• ~350 Students, ~50 Faculty Investigators

• Home of the HCI Graduate Program

• Over three million USD a year in sponsored projects



VRAC Facilities - C6

• 10 ft. x 10 ft. x 10 ft. cube

• 24 Sony SRX-S105 4K resolution projectors with Beacon active/passive stereo

• 49-node six-core Xeon cluster

• Two Nvidia Quadro 6000s per node

• Ultrasonic Intersense tracking and 8-channel audio



VRAC Facilities - MIRAGE

• 33 ft. x 11 ft. power wall

• Six DP Titan WUXGA1080p active stereo projectors

• Seven-node Xeon Dell cluster

• Nvidia Quadro Fermi 5000 per node

• 24-camera Motion Analysis IR tracking



VRAC Facilities - METaL

• Multimodal Experience Testbed and Laboratory

• 3-wall cave - commonly used in industry

• Seven-node Xeon Dell cluster

• Quadro Plex 2200-D2 GPU on render nodes, FX 5800 on master node

• DP Titan WUXGA-3D stereo projectors

• 4-camera ART IR tracking system



VRAC Facilities - Auditorium

• 30 ft. x 20 ft. front-projected auditorium (seats 250)

• Three-node Intel Xeon Dell cluster

• Two Quadro Plex 7000s

• Front-projected by two Sony 4Ks

• Beacon Active/Passive Stereo



VRAC Facilities - Development cluster

• Eight 30” monitors

• Rack-mounted three-node Intel Xeon Dell cluster

• Two Quadro Plex 7000s

• Quadro 5000 - master

• Development and testing



Towards better practices ...

• Viz. systems entail unique maintenance and management challenges

• Hardware complexity demands a special skill set

• Spent years honing our viz. systems

• Disseminating tricks of the trade

• Applicable to academia and industry



Best Practices Overview

• Viz. Computer Cluster

• Projectors

• Stereo emitters and glasses

• Tracking System

• Software/Scripts - geared towards Linux and Nvidia

• For the developer



Viz. Cluster - Planning/Building

• Identify your requirements instead of having vendors plan for you

• Viz. capabilities different from HPC

• Budget for

  • A few spare cluster nodes

  • A smaller development cluster

  • GSync 2/Quadro Sync cards ...



Viz. Cluster - Planning

• Framelocking provides frame sync and swap sync

• Makes sure frame timestamps are synced

• Genlock syncs frame timings

• Framelock + Genlock recommended



Viz. Cluster - Maintenance

• Set up all cluster nodes identically

  • Applies to spare nodes and the dev cluster

• Use script-based software installs/updates

  • Kickstart system

  • Change once, deploy on all cluster nodes

• Use a common IP subnet for the viz. cluster
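
The "change once, deploy on all cluster nodes" idea can be sketched in a few lines of Python. This is only an illustration, not the tooling used at VRAC: the subnet `192.168.10.0/28`, the node count, and the script path `/opt/viz/install.sh` are hypothetical, and key-based SSH login is assumed. The sketch only builds the command lines so they can be reviewed before anything runs.

```python
import ipaddress


def node_addresses(subnet, count):
    """Return the first `count` usable host addresses of a subnet as strings."""
    hosts = ipaddress.ip_network(subnet).hosts()
    return [str(ip) for _, ip in zip(range(count), hosts)]


def deploy_commands(nodes, script):
    """Build one ssh command line (as an argv list) per node, all running the same script."""
    return [["ssh", node, script] for node in nodes]


# Hypothetical values: a small private subnet and an install script shared by all nodes.
nodes = node_addresses("192.168.10.0/28", 3)
for argv in deploy_commands(nodes, "/opt/viz/install.sh"):
    print(" ".join(argv))
```

Keeping the cluster on one subnet is what makes the node list this easy to generate; spare nodes and the dev cluster can be covered by widening the count or adding a second subnet.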



Viz. Cluster - Management

• Facility scheduling

  • Point person handles facility usage

• Compute scheduling - evenings or weekends

  • PBS scheduler (open source/commercial), Maui scheduler, etc.



Projectors - Planning

• LCD/LED screens vs Projectors

• TVs - proprietary 3D technologies

• Bezels, albeit small

• 120 Hz does not mean 3D

• Screen resolution

• Projectors - recommended



Projectors - Planning

• Blend zones

  • Not an exact science

  • Pick projectors accounting for it

• Mirrors

  • Glass (heavy/$$$), Mylar (careful)

  • Small throw distance

  • If projectors are set up horizontally
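
Blend zones cost pixels: where two projectors overlap, both draw the same part of the image. A rough worked example, assuming a single row of identical projectors with the same overlap between each adjacent pair (the 20% overlap below is illustrative, not a VRAC specification):

```python
def blended_width(projectors, native_width, overlap_px):
    """Effective horizontal pixels for a row of edge-blended projectors."""
    if projectors < 1:
        raise ValueError("need at least one projector")
    if not 0 <= overlap_px < native_width:
        raise ValueError("overlap must be smaller than one projector's width")
    return projectors * native_width - (projectors - 1) * overlap_px


# Illustrative numbers: three 1920-pixel-wide projectors in a row, each
# adjacent pair overlapping by 384 pixels (20% of 1920).
print(blended_width(3, 1920, 384))  # 3*1920 - 2*384 = 4992
```

The same arithmetic run in reverse gives the native resolution needed to hit a target wall resolution, which is one way to account for blending when picking projectors.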



Projectors - Maintenance

• Keep spare bulbs, transmitters and receivers

• Periodic calibration and color balancing

• Keep external lights pointed away

• Recommend having screen patches around

• Not all screens hold polarity or work well with stereo

• Use high gain screens if possible



Projectors - Management

• Programmable control panel

  • Time delay

  • Auto-shut off at midnight

  • Cooling cycle

  • Disable panel communications for maintenance



Projectors - Management

• Hook up projectors via private network

• They can email status updates

• Good idea for projectors to support active and passive stereo

• Recommended video routing: Copper > Fiber > Copper



Tracking System - Planning

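| Feature        | Ultrasonic                                             | Magnetic                                                                  | Infrared                |
|----------------|--------------------------------------------------------|---------------------------------------------------------------------------|-------------------------|
| Line of sight? | Yes                                                    | No                                                                        | Yes                     |
| Invasive?      | Yes and No                                             | No                                                                        | Yes, need to cut holes  |
| Issues         | Jitter, removed by calibration (takes time and people) | Lesser range; distortions - ferrous metals; jitter - non-ferrous metals   | Need reflective markers |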



Tracking System - Maintenance

• Keep spares for tracking gear and update firmware

• Test hardware for rough usage (e.g., head tracker, wand)

• Keep calibration programs handy

  • All but Intersense (IS) have standalone consoles with video to configure/calibrate

  • IS requires a Windows station to run ‘IS Demo’



Stereo Emitters and Glasses

• IR tracking systems typically require syncing with IR emitters

• Sync signal that is passed to emitters should be passed to the tracker

• Multiple IR emitters + multiple IR trackers = interference

• Vendors (e.g., xpand) began using encrypted IR signals

• Problems in systems with multiple emitters

• Emitters themselves do not sync - causes occasional eye-swap



Stereo Emitters and Glasses - Maintenance

• Store IR glasses sensor-down or in a cabinet

• Storing within IR field causes increased battery drain

• Lessons learned

• Use a digital camera to test IR emitters

• Stereo may not work when using older IR stereo glasses under 120 V fluorescent lighting



Software - Planning

• Test different viz. tools and be faithful to a select few

• Open source tools - VR Juggler, OpenSceneGraph, OpenSG

• CaveLib, Vizard, Unity 3D, Virtools, ICIDO

• Project sponsors may require other commercial tools

[Figure: software options plotted by Flexibility (low to high) against Complexity (low to high). Categories shown: Custom systems (VR Juggler, OSG); Hybrid systems (Vizard, Quest 3D); Turnkey systems (ICIDO, Virtools, EON); Native application (Conduit, TechViz)]



Software - Video Drivers

• Block Mesa and Kernel from auto-update on Linux systems

• Re-build Nvidia drivers after you update either

• Be consistent across all cluster nodes and dev clusters

• Nvidia Mosaic utility - eliminates threading issues with multiple GPUs on a cluster node

• One thread per app with Mosaic vs One thread per GPU w/o Mosaic
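
One way to check that drivers stay consistent across nodes is to compare the version each node reports. The sketch below assumes the text of `/proc/driver/nvidia/version` has already been collected from every node (for example over SSH); the node names and sample strings are made up for illustration, and the regular expression assumes the version number follows the words "Kernel Module".

```python
import re
from collections import defaultdict

VERSION_RE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def driver_version(proc_text):
    """Extract the driver version from /proc/driver/nvidia/version text, or None."""
    match = VERSION_RE.search(proc_text)
    return match.group(1) if match else None


def group_by_version(reports):
    """Map each driver version (or None if unparsable) to the nodes reporting it."""
    groups = defaultdict(list)
    for node, text in sorted(reports.items()):
        groups[driver_version(text)].append(node)
    return dict(groups)


# Made-up sample data: node names and strings are illustrative only.
reports = {
    "node01": "NVRM version: NVIDIA UNIX x86_64 Kernel Module  310.40  Sun Mar  3 2013",
    "node02": "NVRM version: NVIDIA UNIX x86_64 Kernel Module  310.40  Sun Mar  3 2013",
    "node03": "NVRM version: NVIDIA UNIX x86_64 Kernel Module  304.64  Tue Oct 30 2012",
}
groups = group_by_version(reports)
if len(groups) > 1:
    print("Driver mismatch:", groups)
```

A node that shows up under `None` has no parsable driver version, which after a kernel or Mesa update usually means the driver needs rebuilding.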



Software - Disable Nouveau

1. Add the following to a file /etc/modprobe.d/disablenouveau.conf

    blacklist nouveau
    options nouveau modeset=0

2. Add an option in the bootloader menu (e.g., grub)

    rdblacklist=nouveau

3. Reboot and install video drivers with X disabled

http://us.download.nvidia.com/XFree86/Linux-x86_64/310.40/README/commonproblems.html
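
After the reboot it is worth confirming that nouveau really is gone before installing the Nvidia driver. A minimal sketch, assuming a Linux node where the kernel lists loaded modules in `/proc/modules` (module name first on each line); the sample text is illustrative:

```python
def module_loaded(proc_modules_text, name):
    """Return True if `name` appears as a loaded module in /proc/modules text."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def read_proc_modules(path="/proc/modules"):
    """Read the kernel's module list (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Illustrative sample in /proc/modules format; on a real node, pass
# read_proc_modules() instead.
sample = (
    "nvidia 11309370 40 - Live 0xffffffffa0c00000 (P)\n"
    "i2c_core 31084 2 nvidia,i2c_i801, Live 0xffffffffa0010000\n"
)
print("nouveau loaded:", module_loaded(sample, "nouveau"))
```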



Software - Xorg.conf options

Section "Extensions"
    Option "Composite" "Disable"
EndSection

Section "Device"
    ...
    ...
    Option "stereo" "3"
    Option "allowdfpstereo" "on"
EndSection

http://us.download.nvidia.com/XFree86/Linux-x86_64/310.40/README/xconfigoptions.html



Software - Xorg.conf options

• EDID - Extended Display Identification Data

• Query from display and save it in a file

Section "Screen"
    ...
    Option "UseEdidFreqs" "True"
    Option "UseEdid" "True"
    Option "CustomEDID" "DFP-0:/etc/X11/edid.bin"
    Option "ConstantFrameRateHint" "1"
EndSection

http://en.wikipedia.org/wiki/Extended_display_identification_data
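
A saved EDID file that is truncated or corrupted is hard to diagnose once X is using it, so a quick sanity check before pointing `CustomEDID` at it can help. The sketch below relies only on two properties of the EDID base block: it is 128 bytes long and starts with a fixed 8-byte header, and all 128 bytes sum to 0 modulo 256. The function names are illustrative, and the test data is synthetic rather than taken from a real display.

```python
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])


def edid_problems(data):
    """Return a list of problems found in the first 128-byte EDID block."""
    problems = []
    if len(data) < 128:
        return ["file is shorter than one 128-byte EDID block"]
    block = data[:128]
    if block[:8] != EDID_HEADER:
        problems.append("missing the fixed 8-byte EDID header")
    if sum(block) % 256 != 0:
        problems.append("base block checksum does not sum to 0 mod 256")
    return problems


def make_test_block():
    """Build a synthetic, otherwise empty EDID base block with a valid checksum."""
    body = EDID_HEADER + bytes(119)
    checksum = (-sum(body)) % 256
    return body + bytes([checksum])


# On a real system one would read the saved file instead, e.g.
# open("/etc/X11/edid.bin", "rb").read()  (the path used on the slide).
print(edid_problems(make_test_block()))  # []
```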



Software - Xorg.conf options

• App launches but crashes after first frame draw

• Typically happens if not using Mosaic and code not multi-threaded

Section "Screen"
    ...
    Option "UseEdidFreqs" "True"
    Option "UseEdid" "True"
    Option "CustomEDID" "DFP-0:/etc/X11/edid.bin"
    Option "ConstantFrameRateHint" "1"
EndSection



For the Developer

• Cluster script to launch an instance of the app per node

• Can further use scripts to kill apps, restart X, or reboot the cluster

• SSH keys for auto login to nodes without password prompts

• Use Preboot Execution Environment (PXE) scripts for auto reboot to a specified OS on the entire cluster
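
A cluster launch script along these lines might look as follows. This is a sketch under stated assumptions, not the script used at VRAC: the node names and the application path are hypothetical. It passes `BatchMode=yes` to `ssh` so that a node without working key-based login fails immediately instead of stalling on a password prompt, and it starts all instances in parallel.

```python
import subprocess
import sys


def ssh_argv(node, remote_command):
    """ssh command line that fails instead of prompting when key login is not set up."""
    return ["ssh", "-o", "BatchMode=yes", node, remote_command]


def launch_all(commands):
    """Start one process per node in parallel and return {node: exit code}."""
    procs = {node: subprocess.Popen(argv) for node, argv in commands.items()}
    return {node: proc.wait() for node, proc in procs.items()}


# Hypothetical node names and application path.
nodes = ["render01", "render02"]
commands = {node: ssh_argv(node, "/opt/viz/bin/myapp") for node in nodes}
for argv in commands.values():
    print(" ".join(argv))

# Local stand-in so the sketch runs without a cluster: each "node" just runs
# a Python one-liner. Swap in `commands` above to go through ssh.
demo = {node: [sys.executable, "-c", "pass"] for node in nodes}
print(launch_all(demo))
```

The same `launch_all` helper can carry the kill, X-restart, and reboot commands; only the remote command string changes.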



For the Developer

• Developer might not always work on the same workstation

• Use floating licenses for software installs if possible

• Major upgrades and updates (e.g., Kernel, video drivers)

• Likely to break existing apps - give a heads up before and after

• Create and encourage using dev-help mailing list



For the Developer

• Training & Education

• Periodic training on using the viz. system

• Development and initial testing on a dev-cluster

• No code changes the day of project demonstrations

• Make production code self-contained, including dependencies

• Portability - inherit env. variables from a launch script rather than from ~/.cshrc
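
The portability point can be illustrated with a launch script that sets the variables the app needs itself, so the app behaves the same regardless of whose login shell started it. A minimal sketch; the variable name `VIZ_APP_DATA` and the path are hypothetical, and a Python one-liner stands in for the real application:

```python
import os
import subprocess
import sys


def launch_env(overrides):
    """Copy the launcher's environment and apply the app's own settings on top."""
    env = dict(os.environ)
    env.update(overrides)
    return env


# Hypothetical variable name and path; a real app would list its own.
env = launch_env({"VIZ_APP_DATA": "/opt/viz/myapp/data"})

# Stand-in for the application: a child process that reports what it received.
child = [sys.executable, "-c", "import os; print(os.environ['VIZ_APP_DATA'])"]
result = subprocess.run(child, env=env, capture_output=True, text=True, check=True)
print(result.stdout.strip())
```

Because the settings live in the launch script, they travel with the production code instead of depending on each developer's ~/.cshrc.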



Catering for Developer woes

• User complains that the viz. app worked before but ...

• Seg faults

• Did not start

• Too slow

• Shows up only on certain parts of the screen

• Has ghosting on the screen and sees blurry images

• Has tearing between frames



Conclusions

• Non-trivial to maintain/manage large scale viz. systems

• Summarized tricks of the trade

• By no means complete

• vkk2@iastate.edu

• Questions?
