
Pvr - High-performance Volume Rendering



    SCIENTIFIC VISUALIZATION

    PVR: High-Performance Volume Rendering
    Cláudio T. Silva and Arie E. Kaufman
    State University of New York at Stony Brook
    Constantine Pavlakos
    Sandia National Laboratories

    Volume rendering is a powerful computer graphics technique for visualizing three-dimensional data.[1] While much visualization creates a rendering only of surfaces (though they may be surfaces of 3D objects), volume rendering lets us also see inside, beneath the surface of the object being represented. This technique models a volume as cloudlike cells of semitransparent material. Each cell emits light, partially transmits light from other cells, and absorbs some incoming light (see the "Volume Rendering" sidebar). For instance, while a surface rendering of the human body might show the skin, a complete volume rendering also shows the bones and internal organs, visible from any side in proper perspective.

    Volume rendering began with medical visualization but has migrated to other fields, including visualization and graphics for nonscience uses. Objects of visualization need not be tangible; in fact, volume rendering is especially well suited for representing the 3D volumetric scalar and vector fields that frequently arise in computational science and engineering.

    Volume rendering is a nontrivial technique and can be slow. To use it effectively in studying complex physical and abstract structures, researchers and engineers need a coherent, powerful, easy-to-use visualization tool. This tool should allow for interactive visualization, ideally with support for user-defined computational steering, that is, the ability to change parameters during simulation.

    But such a visualization tool presents development issues and challenges. First, even with the latest volume-rendering acceleration techniques running on top-of-the-line workstations, it still takes up to several minutes to volume render an image, which is far from interactive. The large parallel computers that create the most detailed scientific simulations can generate data sets typically on the order of 32 to 512 megabytes and ranging up to 16 gigabytes. Second, even if rendering time is not a concern, large data sets may be too expensive to store and extremely slow to transfer over network links to typical workstations.

    This raises the question of whether visualization should be performed directly on the parallel machine generating the simulation data or sent to a high-performance graphics workstation for postprocessing in the traditional manner. If the visualization and simulation software were integrated, we would need no extra storage, and visualization could be an active part of the simulation. Also, integrating simulation and visualization in one tool allows for the possibility of interactively steering the simulation. This developing methodology of computational steering lets a user observe and modify a simulation as it progresses, rather than wait for painfully long runs on expensive machines only to discover during postprocessing that the simulations were wrong or uninteresting.

    1070-9924/96/$5.00 © 1996 IEEE
    IEEE Computational Science & Engineering

    Volume Rendering
    Volume rendering accumulates information from voxels (volumetric pixels) in a 3D data set to produce a 2D image, allowing structures in the data to be examined carefully. The technique models the volumetric data set as cloudlike material that scatters, emits, and absorbs light. Several algorithms can be used. With the ray-casting algorithm, a ray is cast in object (or volume) space for every pixel in the image. Roughly, for each ray the rendering equation

    $$ I = \int_0^x e^{-\int_0^t \sigma(s)\,ds}\, I(t)\,dt $$

    is integrated, where I(t) represents the light intensity emanating from a given portion of the volume and σ(s) is the differential absorption of light, used to calculate attenuation along the viewing direction. The integral is calculated by a simple numerical quadrature scheme from a set of uniform samples. I(t) and σ(t) are calculated by assigning transfer functions (table lookups) based on the original volume data f(x, y, z), computed by trilinearly interpolating the eight values defined at the volume's closest points. Each sample s contains the color and opacity at a certain distance from the eye.

    Figure A. Ray casting combines color and intensity of voxels along each line of sight in 3D data to produce a pixel in the 2D visualization of the data. Black voxels have been composited, blue is being worked on, and red voxels have not been done yet. (Diagram labels: eye, ray, voxel samples, current sample.)

    With color and opacity known, we easily accumulate the final pixel either back to front or front to back in a process called compositing. For instance, Figure A depicts back-to-front compositing. If the current voxel has color C, opacity α, and incoming intensity of color I, the outgoing intensity I' is given by what computer graphics people call the over operator (since it lays down one voxel over another):

    $$ I' = C + I(1 - \alpha) = C \;\mathrm{over}\; I $$

    The colors are saved premultiplied by the opacities (the actual color is C/α), which saves one multiplication per compositing operation. Compositing is associative, that is, (A over B) over C gives the same result as A over (B over C), which is important for parallelization.

    Transfer functions specify what portions of the volume are relevant for visualization. Like color maps, transfer functions specify color and opacity for each voxel. To locate interesting properties in data, researchers must often try different combinations of transfer functions (see Figure B) and viewing parameters. For instance, our eyes are well trained to see patterns in moving scenes, such as rotations. Thus, especially with complex data lacking visible hard edges, we would like to be able to animate the visualization. Unfortunately, volume rendering is typically slow even for small data sets, especially when the volume is relatively transparent. For example, using VolVis, an advanced but nonparallel volume renderer, it takes hours to generate animations of the data sets shown in this article. With our PVR system, we can generate even the largest animations in a few seconds to a few minutes, because the system scales easily.

    Figure B. Volume rendering of MRI data.

    References
    1. N. Max, "Optical Models for Direct Volume Rendering," IEEE Trans. Visualization and Computer Graphics, Vol. 1, No. 2, June 1995, pp. 99-108.
    2. R. Avila et al., "VolVis: A Diversified Volume Visualization System," Proc. Visualization '94, IEEE CS Press, Los Alamitos, Calif., 1994, pp. 31-38.
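    The over operator from the "Volume Rendering" sidebar can be tried out numerically. The following is a minimal sketch, not PVR's code: the function names and the three sample values are hypothetical, colors are scalars for brevity, and each sample is assumed to carry a premultiplied color and an opacity. It accumulates one ray back to front and checks that grouping the samples differently gives the same result, which is the associativity property that parallel compositing relies on.

```python
def over(front, back):
    """Composite front over back. Each argument is (premultiplied_color, opacity)."""
    c_f, a_f = front
    c_b, a_b = back
    return (c_f + (1.0 - a_f) * c_b, a_f + (1.0 - a_f) * a_b)


def composite_back_to_front(samples):
    """Accumulate one ray. `samples` is ordered from the eye outward."""
    intensity = 0.0  # assumption: nothing is lit behind the volume
    for color, alpha in reversed(samples):
        intensity = color + intensity * (1.0 - alpha)  # I' = C + I(1 - alpha)
    return intensity


# Three hypothetical samples along one ray: (premultiplied color, opacity).
a, b, c = (0.2, 0.4), (0.3, 0.5), (0.1, 0.25)

left = over(over(a, b), c)    # (A over B) over C
right = over(a, over(b, c))   # A over (B over C)
ray = composite_back_to_front([a, b, c])
```

    Because both groupings agree, two nodes can each composite their own stretch of a ray and a third can merge the two partial results, provided the front-to-back order of the pieces is preserved.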

    Winter 1996

    PVR: PARALLEL VOLUME RENDERING

    In this article, we describe the PVR (Parallel Volume Rendering) system that we have developed in a collaboration between the State University of New York at Stony Brook and Sandia National Laboratories. PVR is an attempt to provide an easy-to-use, portable system for high-performance visualization with the speed required for interactivity and steering. The current version of PVR consists of about 25,000 lines of C and Tcl/Tk code. We have used it at Stony Brook, Sandia, and Brookhaven National Labs to visualize large data sets for over a year.

    Overview of PVR
    Our original goals were to achieve portability and performance for rendering beyond that of available systems and to provide a platform for further development. In a way, PVR is more than a rendering system; its components have been specially designed to enable user-defined computational steering. With PVR, it is much easier to build portable, high-performance, complex distributed visualization environments, or DVEs.

    Unlike several other approaches to parallel volume rendering (see the "Parallel Volume Rendering" sidebar), PVR uses a component approach to building an interactive distributed system. At its topmost level, it has a flexible and high-performance client-server architecture for volume rendering. The PVR system has the following key features:

    • Transparency. PVR hides most of the hardware dependencies from the distributed visualization environment and the user.
    • Performance. PVR provides high-speed, pipelined ray casting with a load-balancing scheme that enables performance fine-tuning for any given machine configuration.
    • Scalability. All system algorithms are gracefully scalable. Scalability concerns machine size as well as growth in data set and image size.
    • Extensibility. The PVR architecture can be easily extended, making it easy for the DVE to add new functionality. Also, new functionality can be easily added to the PVR shell and its corresponding kernel to accommodate user-defined computational steering coupled with visualization.

    Parallel Volume Rendering
    The need to render very large data sets faster, coupled with more widely available parallel and distributed machines, is the force behind parallel volume rendering research. Good starting points in the literature are the recent survey by Tom Crockett[1] and the proceedings of the Parallel Rendering Symposia ('93, '95), the ACM Volume Visualization Symposia, and the IEEE Visualization conferences. Parallel volume rendering can exploit three main types of parallelism:

    • Object-space parallelism, where each rendering node gets a portion of the data set.
    • Image-space parallelism, where different nodes compute disjoint parts of the image.
    • Time-space or temporal parallelism, where different portions of the rendering pipeline are divided, pipeline fashion, among independent sets of nodes.

    In addition to our group's efforts on the PVR system, other researchers have developed several different algorithms and systems based on these types of parallelism. A project at Purdue has developed tools for distributed and collaborative visualization.[2] The system implements volume visualization with a mix of image- and object-space load balancing. These researchers [...] up to four processors for computation but give [...], making it hard to evaluate the system's usability in a [...] parallel environment.

    Rowlan[3] and his colleagues describe a distributed volume rendering system implemented on the IBM SP1. The system apparently shares some characteristics with our PVR system. In particular, it runs on a massively parallel machine, uses object-space partitioning, uses separate rendering and compositing nodes, and provides a front-end graphical user interface. Unfortunately, Rowlan provides few details on the architectural design and implementation and describes [...] only briefly. As far as we know, their system does not provide the flexibility, portability, and performance of PVR. For instance, it does not support multiple rendering or compositing clusters.

    Another similar system is Discover,[4] developed at National Cheng Kung University, Taiwan. Researchers developed Discover, which can use remote processor pools, for custom medical imaging applications. It offers a client-server architecture, including support for Microsoft Windows.

    Our group is particularly interested in ray-casting methods that run on distributed-memory machines such as the Intel Paragon and the ASCI teraflops machine. In these machines, which limit each node's memory access to its local memory, we must divide the data set among computing nodes. This requires that we later group volume samples back together in an image. All parallel ray-casting methods differ primarily in the way they handle this division and regrouping. Our PVR system's parallelization method is based on a combination of data set load balancing[5] and compositing schemes proposed elsewhere.

    In our volume rendering implementation, we divide the processors into two distinct groups of nodes: rendering and compositing nodes. The rendering nodes get portions of the data set; the compositing nodes are responsible for turning a collection of subray images into a complete and correct image for viewing.

    In PVR, every rendering node receives part of the data set with approximately the same number of nonempty voxels, as shown in Figure C. (Other approaches, such as giving the same amount of volume to each node, are also feasible. Dynamic load-balancing schemes have been tried[7] but are harder to implement.) The PVR rendering nodes sample and composite their part of a ray. To avoid global communication, each subvolume region assigned to a rendering node is convex and belongs to a global BSP tree, which makes compositing simpler.

    Figure C. PVR volume rendering illustrates content-based load balancing. A subdivision of the MRI head for eight processors is shown.

    The compositing nodes regroup all subrays together consistently to keep image correctness. This calculation is only possible because composition is associative, so if we have two subray samples, where one ends and the other starts, we can combine their samples into one subray, recursively, until we have a value that constitutes the full ray contribution to a pixel.

    Ma et al.[6] approached compositing differently, switching the rendering nodes between rendering and compositing. Our method is more efficient because we can use the special structure of the subray composition to yield a high-performance pipeline, where multiple nodes implement the complete pipeline (see Figure 4 in the main text). Also, the structure of compositing requires synchronization and lightweight computation, making it much less attractive for parallelization over many processors. In a more recent paper, Ma does divide the nodes into two classes.[8]

    The PVR structure lets us exploit all three types of parallelism mentioned above. By using more than one rendering cluster to compute an image, we use object-space parallelism and image-space[9] parallelism (we can specify that each cluster in PVR compute disjoint scanlines of the same image). The clustering approach, coupled with the inherent pipeline parallelism available in the recursive compositing process, gives rise to time-space parallelism. In the latter, we can exploit multiple clusters by calculating subrays for several images concurrently, which are then sent down the compositing pipeline concurrently. We perform each composition step in lockstep to avoid mixing of images.

    References
    1. T.W. Crockett, "Parallel Rendering," in Encyclopedia of Computer Science and Technology, A. Kent and J.G. Williams, eds., Vol. 34, Supp. 19, Marcel Dekker, 1996, pp. 335-371. Also available as ICASE Report No. 95-31 (NASA CR-195080), April 1995.
    2. V. Anupam et al., "Distributed and Collaborative Visualization," Computer, Vol. 27, No. 7, July 1994, pp. 37-43.
    3. J. Rowlan et al., "A Distributed, Parallel, Interactive Volume Rendering Package," Proc. Visualization '94, IEEE CS Press, Los Alamitos, Calif., 1994, pp. 21-30.
    4. P.W. Liu et al., "Distributed Computing: New Power for Scientific Visualization," IEEE Computer Graphics and Applications, Vol. 16, No. 3, May 1996, pp. 42-51.
    5. C. Silva and A. Kaufman, "Parallel Performance Measures for Volume Ray Casting," Proc. Visualization '94, IEEE CS Press, Los Alamitos, Calif., 1994, pp. 196-203.
    6. K. Ma et al., "Parallel Volume Rendering Using Binary-Swap Compositing," IEEE Computer Graphics and Applications, Vol. 14, No. 4, July 1994, pp. 59-68.
    7. U. Neumann, "Parallel Volume Rendering Algorithm Performance on Mesh-Connected Multicomputers," Proc. 1993 Parallel Rendering Symp., ACM Press, New York, pp. 97-104.
    8. K. Ma, "Parallel Volume Rendering for Unstructured-Grid Data on Distributed Memory Machines," Proc. IEEE/ACM Parallel Rendering Symposium '95, ACM Press, New York, pp. 23-30.
    9. J.P. Singh, A. Gupta, and M. Levoy, "Parallel Visualization Algorithms: Performance and Architectural Implications," Computer, Vol. 27, No. 7, July 1994, pp. 45-55.
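    The content-based load balancing described in the "Parallel Volume Rendering" sidebar (each rendering node receives roughly the same number of nonempty voxels, inside convex regions of a global BSP tree) can be illustrated with a small sketch. This is not PVR's algorithm: the function name, the axis-cycling rule, the choice of cutting plane, and the synthetic spherical data set are all assumptions made for the example. It recursively cuts the set of nonempty voxels with axis-aligned planes placed near the median, so every leaf is a convex box.

```python
def bsp_partition(voxels, depth, axis=0):
    """Split nonempty voxel coordinates into 2**depth convex, axis-aligned regions."""
    if depth == 0:
        return [voxels]
    if not voxels:
        return [[] for _ in range(2 ** depth)]
    ordered = sorted(voxels, key=lambda v: v[axis])
    plane = ordered[len(ordered) // 2][axis]           # cutting plane near the median
    near = [v for v in ordered if v[axis] < plane]     # strictly on one side of the plane
    far = [v for v in ordered if v[axis] >= plane]     # on the plane or beyond it
    nxt = (axis + 1) % 3                               # cycle x, y, z
    return bsp_partition(near, depth - 1, nxt) + bsp_partition(far, depth - 1, nxt)


# Hypothetical data: a dense ball placed off-center in an otherwise empty 16^3 volume.
nonempty = [(x, y, z)
            for x in range(16) for y in range(16) for z in range(16)
            if (x - 5) ** 2 + (y - 8) ** 2 + (z - 8) ** 2 <= 36]

parts = bsp_partition(nonempty, depth=3)   # eight regions, one per rendering node
sizes = [len(p) for p in parts]
```

    Splitting the same volume into eight equal-sized octants would leave several nodes with almost nothing to render, since most of the ball lies on one side; cutting by voxel count instead keeps the per-node workload closer to even.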

    System complexity limits the reliability of large software systems. Distributed systems exacerbate this problem with asynchronous and nonlocal communication. PVR attempts to provide just enough functionality in the basic system (through a component approach) to allow development of large, complex visualization and steering applications. Our client-server architecture has coupled rendering and computing servers on one side and the client user workstation on the other.

    We implemented the PVR client-server architecture in two main components:

    • the PVR shell (often abbreviated pvrsh), which runs on the user workstation, and
    • the PVR renderer (pvrren), which runs on large parallel machines.

    Figure 1. The PVR architecture, with an emphasis on the PVR shell. The Tcl/Tk core acts as glue for the client components. Everything except the renderers runs on the user workstation. The renderers run remotely on parallel machines.

    The PVR shell: pvrsh
    The PVR shell, an augmented Tcl/Tk shell, gives you a single new object, the PVR session. Tcl/Tk, which is a well-designed, debugged script application language and powerful graphical environment, has helped reduce the system complexity.

    The PVR session is an object (in the Tk sense) that contains attributes. A key attribute is the one that binds a session to a particular parallel machine. Figure 1 shows some of the PVR shell's internal architecture and its multiple-session capabilities. It shows three sessions: two on an Intel Paragon XP/S with over 1,840 nodes running Sunmos (Sandia/University of New Mexico Operating System) installed at Sandia, and one on an Intel Paragon with 110 nodes running an Intel version of OSF/1 installed at Stony Brook. The system uses a single protocol to handle multiple sessions on machines running different operating systems. A session specifies the number of nodes it needs and the parameters passed to those nodes. The pvrsh and the pvrren interactively exchange, for example, rendering configuration information, rendering commands, image sequences, and performance and debugging information.

    The flexible rendering specification means you can specify simple rendering elements such as changing transformation matrices, transfer functions, image sizes, and data sets. Moreover, with commands (see Table 1) in a high-level format, you can specify the complete parallel rendering pipeline. With these parameters, you can use the pvrsh to specify almost arbitrary scalable rendering configurations.

    We implemented the PVR shell as a single process (which simplifies porting to other operating systems) in about 5,000 lines of C code. We augmented our version of the Tcl/Tk interpreter with TCP/IP connection capabilities. To support several concurrent sessions, the system performs all communication asynchronously. We use the Tk_CreateFileHandler routine to arbitrate between the different sessions' input. (We could have used a Unix select call and polling instead, but that makes the code more complex.) Sessions work as interrupt-driven commands, responding to requests one at a time. Every session can receive events from two sources at once: the user keyboard and the remote machine. The system needs locking and disabling of interrupts to ensure consistency inside critical sections.

    Our code structure lets the user augment session functionality either externally or internally. External augmentation occurs without recompilation, such as that performed by the user interface to show images as they are received asynchronously from the remote parallel server. Internal augmentation requires source code changes. The source code structure allows easy additions of functionality.
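    The idea of arbitrating between several sessions' input with registered handlers, rather than hand-written polling, can be illustrated outside Tcl/Tk. The sketch below is an analogy only, not the pvrsh implementation: it uses Python's standard selectors module in place of Tk's file handlers, local socket pairs in place of TCP connections to parallel machines, and hypothetical session names and messages.

```python
import selectors
import socket


def serve_sessions(sessions, expected):
    """Deliver incoming data to per-session callbacks, one event at a time."""
    sel = selectors.DefaultSelector()
    for name, (sock, callback) in sessions.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, (name, callback))
    handled = 0
    while handled < expected:
        events = sel.select(timeout=1.0)
        if not events:              # nothing arrived in time; stop waiting
            break
        for key, _ in events:
            name, callback = key.data
            callback(name, key.fileobj.recv(4096))
            handled += 1
    sel.close()


log = []


def on_data(name, data):
    """Hypothetical callback; a real client might display an incoming image."""
    log.append((name, data))


# Two stand-in "remote machines", each connected through a local socket pair.
local_a, remote_a = socket.socketpair()
local_b, remote_b = socket.socketpair()
remote_a.sendall(b"frame from session-a")
remote_b.sendall(b"frame from session-b")

serve_sessions({"session-a": (local_a, on_data),
                "session-b": (local_b, on_data)}, expected=2)

for s in (local_a, remote_a, local_b, remote_b):
    s.close()
```

    Each callback runs to completion before the next event is dispatched, which mirrors the one-request-at-a-time behavior described above and keeps the per-session code free of explicit polling loops.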


    The PVR renderer: pvrren
    The PVR renderer runs remotely on a parallel machine (see Figure 1) and has several components, the most complex being the rendering code itself. To start up multiple parallel processes at the remote machine, we use the PVR daemon, pvrd. On the remote machine, the handling process allocates the computing nodes and runs the renderer code on them. One PVR daemon can allocate several processes.

    Table 1. Some external PVR commands. They can be typed interactively, placed in execution files, or embedded in applications.

    Command                               Description
    s open M N                            M is an Internet address; N is a port number.
    s image window W                      W is a Tk photo widget.
    s image callback F                    F is a procedure to be called every time a new image is received.
    s image file F                        F is the name of the local file where the video stream is saved.
    s set Option Val                      Changes system status.
    s set cluster C                       Sets cluster size.
    s set group C                         Groups multiple clusters to exploit image-based parallelism.
    s set imagesz X Y                     Sets the desired image resolution.
    s render rotate X Y Z S E N           Sends a rendering request. Specifies the axis of rotation and the initial, end, and incremental angles.
    s performance memory cluster          Returns the amount of data set memory in each cluster.
    s performance comp cluster latency    Estimates how long it will take to composite images in the current cluster configuration.

    The renderer is the code that actually runs on the parallel nodes. The overall code structure resembles a SIMD (single-instruction, multiple-data) machine, with high-level and low-level commands. There is one master node (similar to the microcontroller on a Thinking Machines CM-2) and several slave nodes. Slave functions depend completely on the master. The master receives commands from the PVR shell, translates them, and takes actions such as changing the slaves' states and sending them detailed instructions.

    For flexibility and performance, instructions are sent to the nodes through action tables (similar to SIMD microcode). To ask the nodes to perform some action, the master broadcasts the address of the function to be executed. On receiving that instruction, the slaves execute the function. With this method, it's simple to add new functionality, because the added functionality can be performed locally, without changing global files. Also, every function can be optimized independently, with its own communication protocol. One shortcoming of this communication method, as with SIMD machines, is that you must be careful with nonuniform execution, in particular because the Intel NX communication library (both OSF and Sunmos have support for NX) has limited functionality for handling nodes as groups. For example, in setting up barriers with NX, it's impossible to select a group from all allocated nodes. Newer communication libraries such as MPI solve this shortcoming by introducing groups of nodes.

    Figure 2. In PVR, the master node receives high-level commands that are converted into virtual microcode by action tables. When rendering is the task at hand, the high-level commands are for generating animations by rotations and translations. The rendering clusters work in parallel. The collector groups images together and sends an ordered image sequence to the client. (Diagram labels: rendering clusters, compositing clusters, rendering pipeline, low-bandwidth image sequence.)

    The master node divides other nodes into clusters. Each cluster has a specialized computational task; multiple clusters can cooperate in groups to perform larger tasks. Cluster configurations require only that the basic functions be specified in user-defined libraries linked in a single binary. During runtime, you can use the master to reconfigure clusters according to immediate goals, and use the PVR shell to interactively send such commands. Figure 2 shows how the configuration for the PVR system's high-performance volume renderer makes use of such a clustering scheme.

    Figure 3. Data partitioning shown in a 2D cross section of a volume. The seven lines (planes in 3D) marked with numbers partition the volume into eight pieces, A to H, in a canonical hierarchical binary manner. A line-of-sight ray, with discrete samples shown as dots, passes through the volume. The ray samples get composited into a single value, as shown in Figure 4.

    Figure 4. The internal structure of one compositing cluster, one rendering cluster, and their interconnection. In PVR, communication between the compositing and rendering clusters is flexible: several rendering clusters can work together in the same image, since the first level of the compositing tree handles a set of tokens to guarantee the consistency. Because of its tree structure, one properly synchronized compositing cluster can work on several images at once, depending on its depth. The compositing cluster shown relates to the decomposition in Figure 3.

    This clustering paradigm could help in implementing user-defined computational steering. This would usually be done by adding the functionality to the action tables (for example, linking the computational code with PVR's dispatching code) and also adding extra options to the PVR shell to interactively modify the relevant parameters. PVR's volume-rendering code was the inspiration for this overall code organization and is a very good application to demonstrate its features. However, because this article focuses on describing the PVR system, not on the volume-rendering code, in the next section we only sketch the implementation.
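    The action-table mechanism described for the PVR renderer (the master broadcasts which function to run, and every slave looks it up locally and executes it) can be sketched in a single process. Everything here is hypothetical: the class, the two actions, and the loop standing in for a real broadcast over NX or MPI are assumptions for illustration and do not reproduce pvrren's code. An index into a shared table plays the role of the broadcast function address.

```python
class Slave:
    """Stand-in for one slave node; it holds only local state."""

    def __init__(self, rank):
        self.rank = rank
        self.cluster = None
        self.log = []


def set_cluster(slave, cluster_size):
    """Hypothetical action: assign the node to a cluster by rank."""
    slave.cluster = slave.rank // cluster_size


def render(slave, angle):
    """Hypothetical action: record the work the node would do."""
    slave.log.append(f"node {slave.rank} (cluster {slave.cluster}) renders angle {angle}")


# The table is identical on every node, so a small index identifies an action.
ACTION_TABLE = [set_cluster, render]
ACTION_INDEX = {fn.__name__: i for i, fn in enumerate(ACTION_TABLE)}


def broadcast(slaves, action, *args):
    """Master side: translate a high-level command into an index plus arguments."""
    index = ACTION_INDEX[action]
    for slave in slaves:                  # a loop stands in for the real broadcast
        ACTION_TABLE[index](slave, *args)


slaves = [Slave(rank) for rank in range(4)]
broadcast(slaves, "set_cluster", 2)
broadcast(slaves, "render", 30)
```

    Adding a new capability in this scheme means appending one function to the table, which is the property the text highlights: new actions can be added locally without touching the dispatch code.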
    Volume rendering pipeline

    Besides the master node, the PVR rendering pipeline is composed of rendering nodes, compositing nodes, and a collector node (usually just one, as in Figure 2). Optimal rendering performance and flexibility require this specialization. All the clusters work in a simple dataflow mode, where data moves from top to bottom in a pipeline fashion. Every cluster has its own fan-in and fan-out number and type of messages (see Figures 3 and 4). The master configures and reconfigures the overall dataflow with user-defined and automatic load-balancing parameters.

    Rendering clusters reside at the top level. The clusters' nodes resample and shade a given volume data set. Generally, the input is a view matrix and the output is a set of subimages, each related to a node in the compositing binary tree. The master can use multiple rendering clusters working on the same image (but on disjoint scanlines) to speed up rendering. Once PVR computes the subimages, they are passed down the pipeline to the compositing clusters.

    The compositing clusters are organized in a binary tree structure, matching that of the compositing tree that corresponds to the decomposition of the volume on the rendering nodes. The number of processors doing compositing can differ from the number of nodes in the compositing tree, as we can apply virtualization to fake more processors than allocated. We pipeline images down the tree, with every iteration combining the compositing results until all the pixels are a complete depth-ordered sequence. At the root of the compositing tree, pixels are converted to red-green-blue (RGB) format and sent to the collector node(s).

    The collector node receives RGB images from the compositing nodes and compresses them with a simple, fast run-length encoding scheme. Finally, the system either sends the images to the PVR shell for user viewing or saving, or locally caches them on the disk. Details more completely describing our system and performance issues related to CPU speed, synchronization, and memory usage appear elsewhere.[3,4]
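    The article does not spell out the collector's run-length encoding, so the following is only a generic sketch of the technique under assumed conventions: runs of identical RGB pixels are stored as (count, pixel) pairs, with the count capped at 255 so that it would fit in one byte. The function names and the sample scanline are hypothetical.

```python
def rle_encode(pixels, max_run=255):
    """Collapse consecutive identical pixels into (count, pixel) runs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][1] == p and runs[-1][0] < max_run:
            runs[-1][0] += 1           # extend the current run
        else:
            runs.append([1, p])        # start a new run
    return [(n, p) for n, p in runs]


def rle_decode(runs):
    """Expand (count, pixel) runs back into the original pixel list."""
    out = []
    for n, p in runs:
        out.extend([p] * n)
    return out


# Hypothetical scanline: mostly black background with a short bright span.
black, tissue = (0, 0, 0), (200, 180, 160)
scanline = [black] * 300 + [tissue] * 2 + [black] * 98

encoded = rle_encode(scanline)
restored = rle_decode(encoded)
```

    Volume-rendered frames often contain large uniform background regions, which is the case where a scheme this simple pays off; here a 400-pixel scanline collapses to four runs.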

    Figure 5. A simple PVR program with a set of PVR rendering commands. This program renders images of a human brain. The commands can be put in a file and executed in batch, or can be typed interactively on the keyboard, or mixed. You can write Tcl/Tk code (for example, stat.tcl) to take care of portions of the actions.

    Rendering with PVR
    Figure 5 shows a simple PVR program, which demonstrates the seamless integration with Tcl/Tk, the flexible load-balancing scheme, and the interactive specification of parameters. The set command can have several options (in Figure 5, options are usually specified in multiple lines but could be specified in a single line). For instance, imagesz specifies the size of images output by the system.

    A cluster of multiple nodes and a group of clusters are the two basic components of the PVR system's load-balancing scheme. Together they specify flexible configurations of image-space, object-space, and time-space parallelism. The master node assigns different image scanlines to rendering clusters and assigns each group of clusters a complete image. The cluster and group options are used to specify this unique capability of the PVR system's load-balancing scheme. With both options, you can specify the relative sizes of the rendering and compositing clusters, together with the image calculation allocation.

    Several scalability strategies are possible. A rendering cluster must be large enough to hold the entire data set and at least a copy of the image. By increasing the cluster size (its number of nodes), the memory needed per node decreases. By grouping clusters (splitting the image computation across multiple clusters), the number of scanlines per given cluster decreases, lowering both the image memory requirements and the computational cost, thus speeding up image calculation.

    You can use the same commands to configure compositing clusters. These don't scale at the same rate as rendering clusters because compositing is a relatively light-computation, high-synchronization operation, unlike rendering. Compositing nodes need memory to hold two copies of the images, which can be quite large (our parallel machine nodes only have between 16 and 32 Mbytes of RAM). The compositing latency increases as the number of nodes increases; the actual rate of increase depends on the height of the compositing tree.
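    The two scaling knobs described above (cluster size and grouping of clusters) can be explored with a back-of-the-envelope model. This is a deliberately simplified sketch, not PVR's memory accounting: it assumes the data set is spread evenly over the nodes of a cluster, that every node holds one copy of its cluster's share of the image at 16 bytes per pixel (four floating-point numbers), and it ignores all other overhead. The function and parameter names are hypothetical.

```python
import math

BYTES_PER_PIXEL = 16  # four floating-point numbers per pixel


def per_node_memory(dataset_bytes, cluster_nodes, width, height, clusters_per_group):
    """Return (bytes needed per node, scanlines handled by each cluster)."""
    scanlines = math.ceil(height / clusters_per_group)  # image split across the group
    data_share = dataset_bytes / cluster_nodes           # even split, by assumption
    image_bytes = width * scanlines * BYTES_PER_PIXEL
    return data_share + image_bytes, scanlines


dataset = 500 * 2**20  # a 500-Mbyte data set, counted here in binary megabytes

for nodes, clusters in [(32, 1), (64, 1), (64, 2), (128, 4)]:
    need, lines = per_node_memory(dataset, nodes, 400, 400, clusters)
    print(f"{nodes:4d} nodes/cluster, {clusters} cluster(s)/group: "
          f"{need / 2**20:6.2f} Mbytes per node, {lines} scanlines per cluster")
```

    Under these assumptions, larger clusters shrink the data share per node, while grouping shrinks the image share and the per-cluster scanline count, which is the trade-off the cluster and group options expose.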

    H ow PVR can be used
    P VR is a flexible system that can be used for visualization in many ways For example the PVR system architecture facilitates the visualization of time varying data such as the time step volumes computed during a computational fluid dynamics simulation When rendering time varying data we add a permanent c aching ter to the pipeline in Figure 2 that efficiently distributes the volume data to the rendering nodes We use the caching nodes only as m art m emory they hide I O latency fi om disk or other sources and are used as buffer nodes to optimize the computation during our content based load balancing data distribution You can thus visualize a data set for as long as it takes to receive updated data Handling data that changes too rapidly that is faster than

    W INTER1996

    25

    P VR PARALLEL VOLUME RENDERING

    t ttttt t 1

    t otype GUI written in Tcl Tk developed at Sandia Necessary rendering parameters such as image size and transfer function are specified in the right window and the load balancing parameters in the left window This simple interface uses only a single session but we will be adding more functionality With the prototype GUI written in a well documented interface language users can straightforwardly add functionality to the PVR GUI as needed

    P VR performance

    results

    F igure 6 The simple PVR GUI The user specifies general rotations in the main window right Clusters are configured at upper left At lower left is a volume rendering of a 100 x 110 x 92 data set showing terface T cell receptor happening density on the surface of a T cell B cell inThis lets biologists clearly check that chemical interactions by immuno

    P VR has let us visualize numerous scientific data sets giving us useful performance information Our biggest challenge thus far is the limited memory on our Intel Paragon nodes It diffis cult horn the software engineering point of view to consistently and reliably allocate memory especially for visualization of very large data sets
    V isible Human

    are actually

    The data sets were generated

    fluorescence microscopy and prepared for visualization by deconvolution on Sandia Intel Paragon Volume rendering s animations were generated at multiple frames per second using PVR 6

    we can move and render it is impossible because it would require excessivebuffering Another use for our parallel renderer is as a visualization server for large computational parallel jobs For this you would preallocate nodes that can be shared somewhat by multiple users for visualizing their data To implement this server effectively you also need a caching clusd ter as described above The cluster in this case would cache alternate user data sets PVR can be used to develop distributed visualization environments by means of the client server metaphor A DVE developed with T Tk is very portable as Tcl Tk versions exist for almost all of the operating systems available and TCP IP which underlies our communication PVR protocol is virtually universal Table 1 listed more details on F igure 7 PVR volume rendering s ome of the primitives from o f the 512 x 512 x 1 877 voxel w hich DVEs can be built V isible Human data set F igure 6 shows a simple pro

    At the Supercomputing '95 conference in San Diego, we demonstrated PVR's ability to volume render a 500-Mbyte data set, the 512 x 512 x 1,877 voxel Visible Human from the National Institutes of Health (see Figure 7). This is only a subset of the full Visible Human data set. We did this with approximately 128 rendering nodes and 127 compositing nodes of the Intel Paragon at Sandia, remotely displaying in San Diego. Rendering a 512 x 512 image takes about 5 seconds per frame. The main bottleneck is reading the 500 Mbytes of data from the Paragon disks, which currently takes around 15 minutes.

    Figure 8 shows rendering times for each frame of a 72-frame animation sequence of the Visible Human data set. This is a full 360-degree rotation along the y axis. The times are wall-clock times, calculated at the collector node as it receives the images and saves them to a local disk. Each image is 400 x 400 with three color channels. For rendering, PVR represents the images as an array of pixels, each represented as four floating-point numbers, amounting to 16 bytes per pixel. At 400 x 400, each image is over 2.5 Mbytes. The system transmits images from the rendering nodes to the compositing nodes until they reach the root node of the compositing tree. There we convert images to RGB format, with one byte per color channel, and transmit them to the collector node. The collector saves the final images, each 480,000 bytes, to disk. Computing the complete animation takes 129.23 seconds, or


    IEEE COMPUTATIONAL SCIENCE ENGINEERING

    1.79 seconds per frame, resulting in 32 Mbytes of data being saved to disk. The noticeable peaks in the image generation time deserve further study. We believe the source of the pipeline stalls is load imbalance and also contention in writing the images to disk: the collector node stalls the pipeline whenever an image is received before the previous image is saved. The first image takes considerably longer than the others; this is the pipeline initialization cost.

    Our next step is to extend the system to render the full RGB Visible Human data set (14 Gbytes) with high temporal resolution, that is, many frames for the rotational animation. A 72-frame rotation uses 5-degree increments. Smaller increments are highly desirable, but a 0.5-degree increment would expand the animation files to more than 300 Mbytes. This project would require the use of parallel I/O, a capability that we currently lack, and dedicated use of a very large parallel machine, such as the entire 1,840-node Intel Paragon at Sandia.
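The image and animation sizes quoted above follow from simple arithmetic. The sketch below reproduces them in Python. It assumes 4-byte floating-point values (implied by the 16 bytes per pixel stated in the text), and the function names are illustrative only; they are not part of PVR.

```python
# Size arithmetic for the Visible Human animation.
# Input numbers come from the article; the helper names are hypothetical.

def render_image_bytes(width, height, floats_per_pixel=4, bytes_per_float=4):
    """Bytes in one image in the internal format (four floats per pixel).

    bytes_per_float=4 is an assumption consistent with 16 bytes per pixel.
    """
    return width * height * floats_per_pixel * bytes_per_float


def rgb_image_bytes(width, height, channels=3):
    """Bytes in one final image with one byte per color channel."""
    return width * height * channels


def frames_for_rotation(step_degrees, total_degrees=360.0):
    """Number of frames in a rotation sampled every step_degrees."""
    return round(total_degrees / step_degrees)


internal = render_image_bytes(400, 400)   # 2,560,000 bytes, "over 2.5 Mbytes"
final = rgb_image_bytes(400, 400)         # 480,000 bytes
frames = frames_for_rotation(5.0)         # 72 frames
animation = frames * final                # 34,560,000 bytes on disk
seconds_per_frame = 129.23 / frames       # about 1.79 s
fine_frames = frames_for_rotation(0.5)    # 720 frames
fine_animation = fine_frames * final      # 345,600,000 bytes

print(internal, final, frames, animation)
print(round(seconds_per_frame, 2), fine_frames, fine_animation)
```

The 72-frame total of 34,560,000 bytes is about 33 x 2^20 bytes, in line with the roughly 32 Mbytes quoted, and the 0.5-degree case lands above the 300-Mbyte figure given in the text.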
    Figure 8. PVR rendering times for a 72-frame animation sequence of the 512 x 512 x 1,877 Visible Human data set. Each image is 400 x 400. (Horizontal axis: frame number, out of 72 frames.)

    Scaling experiments

    To show PVR's scalability, we used a 256 x 256 x 937 version of the Visible Human data set. Table 2 shows the rendering times for five different configurations, varying the number of rendering and compositing nodes. While rendering scales reasonably well, a comparison of rows 2 and 4 and rows 3 and 5 in the table indicates that it is apparently not cost-effective to increase the size of the compositing cluster for relatively small images. [3]

    PVR introduces a new level of interactivity to high-performance visualization. Larger distributed visualization environments can be built on top of PVR and yet be portable across several architectures. These DVEs that use PVR have the opportunity to effectively use available processing power, up to a few hundred processors, giving a range of cost/performance to end users. PVR is a strong foundation for building cost-effective DVEs.

    PVR also introduces a simple way to create user interfaces. No longer must users spend time coding in X/Motif or Windows to create the desired user interface. The Tcl/Tk combination is much simpler, gives more flexibility, and is nearly as powerful as the other alternatives.

    Even though we have completed a usable, efficient system, much work remains. We are, for example, making the system stable enough for general distribution, and we are creating a more complete DVE, using the VolVis system [7] as a model, on top of PVR.

    Functionality now missing from PVR must be incorporated. The most important element is probably the support for multiple data sets in a session. Implementing this capability may complicate the load-balancing scheme, and simple heuristics might not generate well-balanced decomposition schemes. If the volumes were allowed to overlap, as in VolVis, the problem would be even harder, and the solution would require heavier processing on the compositing end. It might be necessary to have a reconfiguration phase whenever a new volume is introduced, although how to do so efficiently is unclear.

    Research is ongoing to incorporate irregular-grid rendering in PVR. Moreover, we are considering adding a recent algorithm that exploits a high level of locality, which should ultimately lead to more efficient communication schemes. Finally, we are porting PVR to use MPI as the communication layer instead of NX.
    Table 2. Scalability of PVR rendering times on a 72-frame animation sequence of a 256 x 256 x 937 version of the Visible Human. Images were 250 x 250.

    Rendering nodes and clusters    Compositing nodes    Total rendering time (s)    Mean time per frame (s)
    16 (1 cluster)                  15                   104.10                      1.44
    32 (2 clusters of 16)           15                    67.24                      0.93
    64 (4 clusters of 16)           15                    56.73                      0.78
    32 (1 cluster)                  31                    71.42                      0.99
    64 (1 cluster)                  63                    58.79                      0.81
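The comparison of rows 2 and 4, and of rows 3 and 5, can be checked directly from the Table 2 figures. The Python sketch below uses only the published numbers. The "node-seconds" cost measure (total nodes multiplied by wall-clock time) is an assumption introduced here for illustration, not a metric used in the article.

```python
# Table 2 rows as (rendering nodes, compositing nodes, total seconds).
FRAMES = 72
TABLE2 = [
    (16, 15, 104.10),
    (32, 15, 67.24),
    (64, 15, 56.73),
    (32, 31, 71.42),
    (64, 63, 58.79),
]


def mean_time_per_frame(total_seconds, frames=FRAMES):
    """Average wall-clock seconds per frame."""
    return total_seconds / frames


def speedup_vs_first_row(table):
    """Speedup of each configuration relative to the 16-node row."""
    base = table[0][2]
    return [base / total for _, _, total in table]


def node_seconds(table):
    """Hypothetical cost measure: (rendering + compositing nodes) * time."""
    return [(r + c) * total for r, c, total in table]


means = [mean_time_per_frame(t) for _, _, t in TABLE2]
speedups = speedup_vs_first_row(TABLE2)
costs = node_seconds(TABLE2)

for row, m, s, c in zip(TABLE2, means, speedups, costs):
    print(row, f"mean={m:.3f}s speedup={s:.2f} node-seconds={c:.0f}")
```

With the same number of rendering nodes, the configurations with larger compositing clusters (rows 4 and 5) take slightly longer and occupy more nodes than rows 2 and 3, which is consistent with the observation that growing the compositing cluster does not pay off for small images.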

    WINTER 1996


    For More Information
    More PVR-related information, including publications, images, and several animations of the data described in this article, can be found at:

    http://www.cs.sunysb.edu/vislab
    http://cg.ams.sunysb.edu/pvr
    http://www.cs.sandia.gov/VIS

    PVR source code (Intel Paragon version) is available to users willing to provide feedback to our beta-testing program. Contact csilva@ams.sunysb.edu.

    Acknowledgments
    We thank Maurice Fan Lok for co-writing the first version of PVR, Brian Wylie for project support and user interface development, and Dirk Bartz, Tzi-cker Chiueh, Pat Crossno, Steve Dawson, Juliana Freire, Tong Lee, Ron Peierls, and Amitabh Varshney for helpful discussions. For assistance with the PVR-to-Sunmos port, thanks to Kevin McCurley, Rolf Riesen, and Lance Shuler from Sandia, and Edward J. Barragy from Intel. Image data is courtesy of the following: MRI head, Siemens; Visible Human, the National Institutes of Health; cell, Colin Monks from the National Jewish Center for Immunology and Respiratory Medicine and George Davidson from Sandia. C. Silva is partially supported by CNPq-Brazil under a PhD fellowship, Sandia National Labs and the Dept. of Energy Mathematics, Information, and Computer Science Office, and by the National Science Foundation, grant CDA 9626 70. A. Kaufman is partially supported by the NSF under grants CCR-9205047, DCA-9303181, and MIP-9527694, and by the Dept. of Energy under the PICS grant.

    References
    1. A.E. Kaufman, ed., Volume Visualization, IEEE Computer Soc. Press, Los Alamitos, Calif., 1991.
    2. M. Snir et al., MPI: The Complete Reference, MIT Press, Cambridge, Mass., 1995.
    3. C. Silva, Parallel Volume Rendering of Irregular Grids, doctoral dissertation, State University of New York at Stony Brook, Dept. of Computer Sci., 1996.
    4. C. Silva, A.E. Kaufman, and C. Pavlakos, The PVR System, Tech. Report TR96 06 10, Dept. of Computer Sci., State University of New York at Stony Brook, 1996.
    5. C. Pavlakos, L. Schoof, and J. Mareda, "A Visualization Model for Supercomputing Environments," IEEE Parallel & Distributed Technology, Vol. 1, No. 4, Nov. 1993, pp. 16-22.
    6. C. Monks et al., "Three-Dimensional Visualization of Proteins in Cellular Interactions," Proc. Visualization 96, ACM Press, New York, pp. 363-366.
    7. R. Avila et al., "VolVis: A Diversified Volume Visualization System," Proc. Visualization 94, IEEE Computer Soc. Press, Los Alamitos, Calif., 1994, pp. 31-38.
    8. C. Silva, J.S.B. Mitchell, and A.E. Kaufman, "Fast Rendering of Irregular Grids," in 1996 Symp. Volume Visualization, ACM Press, New York, pp. 15-22.

    Cláudio Silva is a research associate in the Department of Applied Mathematics and Statistics, State University of New York at Stony Brook. His research interests are in computer graphics, scientific visualization, and high-performance computing. He received a BS in mathematics from the Federal University of Ceará, Brazil, and an MS and PhD in computer science from SUNY at Stony Brook. He is a member of ACM, IEEE Computer Society, and the Society for Industrial and Applied Mathematics.

    Arie E. Kaufman is director of the Center for Visual Computing and Leading Professor of computer science and radiology at the State University of New York at Stony Brook. He received a BS in mathematics and physics from the Hebrew University of Jerusalem, an MS in computer science from the Weizmann Institute of Science, Rehovot, and a PhD in computer science from the Ben-Gurion University, Israel. He is editor-in-chief of IEEE Transactions on Visualization and Computer Graphics and is a director of the IEEE Computer Society Technical Committee on Computer Graphics. Kaufman received a 1995 IEEE Outstanding Contribution Award and a 1996 IEEE Computer Society Golden Core Member recognition.

    Constantine Pavlakos is a senior member of the technical staff at Sandia National Laboratories. His interests include distributed visualization architectures for high-performance computing environments, volume visualization, parallel visualization algorithms, visualization of very large data sets, hierarchical data techniques, virtual reality, and multimedia. He received an MS in computer science and a BS in mathematics, both from the University of New Mexico. He is a member of ACM.

    Corresponding author: Cláudio T. Silva, Dept. of Applied Math. and Statistics, State Univ. of New York at Stony Brook, Stony Brook, NY 11794; e-mail csilva@ams.sunysb.edu; WWW http://cg.ams.sunysb.edu/csilva
