Parallel Processors (e.g., Identical Processors) Patents (Class 345/505)

Computer hardware architecture and data structures for a grid traversal unit to support incoherent ray traversal

Patent number: 8847959

Abstract: A new hardware architecture defines an indexing and encoding method for accelerating incoherent ray traversal. Accelerating multiple ray traversal may be accomplished by organizing the rays for minimal movement of data, hiding latency due to external memory access, and performing adaptive binning. Rays may be binned into coarse grain and fine grain spatial bins, independent of direction.

Type: Grant

Filed: February 13, 2014

Date of Patent: September 30, 2014

Assignee: Raycast Systems, Inc.

Inventor: Alvin D. Zimmerman
Graphics processing unit with command processor

Patent number: 8842122

Abstract: Aspects of the disclosure relate to a method of controlling a graphics processing unit. In an example, the method includes receiving one or more tasks from a host processor, and scheduling, independently from the host processor, the one or more tasks to be selectively executed by a shader processor and one or more fixed function hardware units, wherein the shader processor is configured to execute a plurality of instructions in parallel, and the one or more fixed function hardware units are configured to render graphics data.

Type: Grant

Filed: December 15, 2011

Date of Patent: September 23, 2014

Assignee: QUALCOMM Incorporated

Inventors: Petri Olavi Nordlund, Jukka-Pekka Arvo, Robert J. Simpson
Computer hardware architecture and data structures for lookahead flags to support incoherent ray traversal

Patent number: 8842117

Abstract: A new hardware architecture defines an indexing and encoding method for accelerating incoherent ray traversal. Accelerating multiple ray traversal may be accomplished by organizing the rays for minimal movement of data, hiding latency due to external memory access, and performing adaptive binning. Rays may be binned into coarse grain and fine grain spatial bins, independent of direction.

Type: Grant

Filed: February 13, 2014

Date of Patent: September 23, 2014

Assignee: Raycast Systems, Inc.

Inventor: Alvin D. Zimmerman
Stream compaction for rasterization

Patent number: 8842121

Abstract: A single instruction multiple data (SIMD) processor with a given width may operate on registers of the same width completely filled with fragments. A parallel set of registers are loaded and tested. The fragments that fail are eliminated and the register set is refilled from the parallel set.

Type: Grant

Filed: February 3, 2011

Date of Patent: September 23, 2014

Assignee: Intel Corporation

Inventors: Tomas Akenine-Möller, Jon N. Hasselgren, Carl J. Munkberg, Robert M. Toth, Franz P. Clarberg
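
The compaction idea in the abstract of patent 8842121 above can be illustrated in software. This is only a rough sketch, not the patented hardware method: Python lists stand in for the SIMD register sets, and the names `compact_batches`, `passes`, and `width` are hypothetical. Fragments are loaded into a staging set and tested, failures are dropped, and survivors refill the working set so that every emitted batch is completely full (except possibly the last).

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def compact_batches(
    fragments: Iterable[T],
    passes: Callable[[T], bool],
    width: int,
) -> Iterator[List[T]]:
    """Yield batches of surviving fragments, each exactly `width` long
    except possibly the final one."""
    working: List[T] = []   # stands in for the register set that gets executed
    staging: List[T] = []   # stands in for the parallel set that is loaded and tested

    def drain_full_batches() -> Iterator[List[T]]:
        nonlocal working
        while len(working) >= width:
            yield working[:width]
            working = working[width:]

    for frag in fragments:
        staging.append(frag)
        if len(staging) == width:
            # Test the staging set and keep only fragments that pass.
            working.extend(f for f in staging if passes(f))
            staging.clear()
            yield from drain_full_batches()

    # Flush whatever is left in a partially filled staging set.
    working.extend(f for f in staging if passes(f))
    yield from drain_full_batches()
    if working:
        yield working
```

For example, `list(compact_batches(range(10), lambda x: x % 2 == 0, 2))` keeps only the even values and groups them two at a time.
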
Buffers for display acceleration

Patent number: 8842133

Abstract: Embodiments enable a graphics processor to more efficiently process graphics and compositing processing commands. In certain embodiments, a client application submits client graphics commands to a graphics driver. The client in certain embodiments can notify a window server that client graphics commands have been submitted. In response, the window server can generate compositing processing commands and provide these commands to the graphics driver. Advantageously, a graphics processor can execute the client graphics commands while the window server generates compositing processing commands. As a result, processing resources can be used more efficiently.

Type: Grant

Filed: June 26, 2013

Date of Patent: September 23, 2014

Assignee: Apple Inc.

Inventors: John Harper, Kenneth C. Dyke
Device for the parallel processing of a data stream

Patent number: 8836708

Abstract: A device for processing a data stream originating from a device generating matrices of Nl rows by Nc columns of data includes K computation tiles and interconnection means for transferring the data stream between the computation tiles. At least one computation tile includes: one or more control units to provide instructions, n processing units, each processing unit carrying out the instructions received from a control unit on a neighborhood of Vl rows by Vc columns of data, a storage unit to place the data of the stream in the form of neighborhoods of Vl rows by (n+Vc−1) columns of data. The storage unit includes a block of shaping memories of dimension Vl×Nc and a block of neighborhood registers of dimension Vl×(n+Vc−1), an input/output unit to convey the data stream between the interconnection means and the storage unit on the one hand, and between the processing units and the interconnection means on the other hand.

Type: Grant

Filed: June 8, 2009

Date of Patent: September 16, 2014

Assignee: Commissariat a l'Energie Atomique et aux Energies Alternatives

Inventors: Laurent Letellier, Mathieu Thevenin
System on Chip Having Processing and Graphics Units

Publication number: 20140253565

Abstract: System on chip comprising a general purpose processing element, a graphics processing unit and a display interface, supporting graphics visualization on mobile computing devices and on embedded systems.

Type: Application

Filed: May 19, 2014

Publication date: September 11, 2014

Inventor: Reuven Bakalash
Non-linear image mapping using a plurality of non-linear image mappers of lesser resolution

Patent number: 8830268

Abstract: A display system and method for displaying an image on a non-planar display that allows the images to be mapped by image mappers while encompassing image data of an adjacent sub-image or sub-images. This allows a single unified image to be displayed in real time without any tearing or positional/angular artifacts at the image boundaries.

Type: Grant

Filed: November 7, 2008

Date of Patent: September 9, 2014

Assignee: Barco NV

Inventors: Robert M. Clodfelter, Jeff Bayer, Paul McHale, Brad Smith
Image processing system utilizing plural parallel processors and image processing method utilizing plural parallel processors

Patent number: 8830506

Abstract: An image processing system includes intermediate-data generating apparatuses and one or more drawing-data generating apparatuses. The intermediate-data generating apparatuses interpret data of pages forming PDL document data, the pages being assigned to the corresponding intermediate-data generating apparatuses, to generate elements of intermediate data of the pages. The drawing-data generating apparatuses each obtain assigned elements of the intermediate data and each draw the obtained elements to generate drawing data including information concerning pixels forming each obtained element. The drawing-data generating apparatuses each include a memory that stores intermediate data or drawing data of a common element used in the obtained elements. If the intermediate data or the drawing data of the common element is stored in the memory, the drawing-data generating apparatuses generate drawing data of the obtained elements using the stored intermediate data or drawing data.

Type: Grant

Filed: August 4, 2011

Date of Patent: September 9, 2014

Assignee: Fuji Xerox Co., Ltd.

Inventor: Michio Hayakawa
Image formation processing apparatus and image processing method

Patent number: 8824010

Abstract: To realize effective load distribution and improve the performance in image formation processing, an image processing apparatus includes a first image processing unit configured to perform image processing on a drawing area, a second image processing unit configured to be differentiated from the first image processing unit, a load analysis unit configured to analyze a composition processing load of an object in the drawing area, a rotational angle analysis unit configured to analyze a rotational angle of the object in the drawing area, and a load distribution determination unit configured to determine whether to distribute a part of image formation processing to be applied on the drawing area from the first image processing unit to the second image processing unit based on the analyzed composition processing load of the object and the analyzed rotational angle of the object.

Type: Grant

Filed: October 23, 2012

Date of Patent: September 2, 2014

Assignee: Canon Kabushiki Kaisha

Inventor: Hiroshi Mori
FINE-GRAINED CPU-GPU SYNCHRONIZATION USING FULL/EMPTY BITS

Publication number: 20140240327

Abstract: A heterogeneous computing system includes a central processing unit (CPU) and a graphics processing unit (GPU). The CPU and the GPU are synchronized using a data-based synchronization scheme, wherein offloading of a kernel from the CPU to the GPU is coordinated based upon the data associated with the kernel transferred between the CPU and the GPU. By using a data-based synchronization scheme, additional synchronization operations between the CPU and the GPU are reduced or eliminated, and the overhead of offloading a process from the CPU to the GPU is reduced.

Type: Application

Filed: February 22, 2013

Publication date: August 28, 2014

Applicant: THE TRUSTEES OF PRINCETON UNIVERSITY

Inventor: THE TRUSTEES OF PRINCETON UNIVERSITY
Distributed stream output in a parallel processing unit

Patent number: 8817031

Abstract: A technique for performing stream output operations in a parallel processing system is disclosed. A stream synchronization unit is provided that enables the parallel processing unit to track batches of vertices being processed in a graphics processing pipeline. A plurality of stream output units is also provided, where each stream output unit writes vertex attribute data to one or more stream output buffers for a portion of the batches of vertices. A messaging protocol is implemented between the stream synchronization unit and the plurality of stream output units that ensures that each of the stream output units writes vertex attribute data for the particular batch of vertices distributed to that particular stream output unit in the same order in the stream output buffers as the order in which the batch of vertices was received from a device driver by the parallel processing unit.

Type: Grant

Filed: September 29, 2010

Date of Patent: August 26, 2014

Assignee: NVIDIA Corporation

Inventors: Ziyad S. Hakura, Rohit Gupta, Michael C. Shebanow, Emmett M. Kilgariff
GPGPU systems and services

Patent number: 8817030

Abstract: Graphics processing units (GPUs) deployed in general purpose GPU (GPGPU) units are combined into a GPGPU cluster. Access to the GPGPU cluster is then offered as a service to users who can use their own computers to communicate with the GPGPU cluster. The users develop applications to be run on the cluster and a profiling module tracks the applications' resource utilization and can report it to the user and to a subscription server. The user can examine the report to thereby optimize the application or the cluster's configuration. The subscription server can interpret the report to thereby invoice the user or otherwise govern the users' access to the cluster.

Type: Grant

Filed: September 30, 2010

Date of Patent: August 26, 2014

Assignee: CreativeC LLC

Inventors: Greg Scantlen, Gary Scantlen
Allocation of GPU resources across multiple clients

Patent number: 8803892

Abstract: Methods, apparatuses and systems directed to hosting, on a computer system, a plurality of application instances, each application instance corresponding to a remote client application; maintaining a network connection to each of the remote client applications for which an application instance is hosted; allocating resources of a graphics processing unit of the computer system between at least two of the remote client applications; concurrently rendering, utilizing the resources of the graphics processing unit of the computer system, the graphical output of the application instances corresponding to the at least two of the remote client applications; and transmitting the rendered graphical output to the at least two of the remote client applications over the respective network connections.

Type: Grant

Filed: June 10, 2010

Date of Patent: August 12, 2014

Assignee: Otoy, Inc.

Inventor: Julian Michael Urbach
Image data processing apparatus

Patent number: 8803893

Abstract: An image data processing apparatus includes: a plurality of operational processing circuits each of which is configured to have a variable circuit configuration and to execute operational processing on image data; and a control section that controls each of the operational processing circuits such that each of the operational processing circuits executes one of a plurality of types of operational processing performed on image data in a predetermined order. The control section controls each of the operational processing circuits so that when image data to be newly given to one of the operational processing circuits is interrupted, said one of the operational processing circuits and another one of the operational processing circuits execute operational processing by taking partial charge of the operational processing.

Type: Grant

Filed: March 8, 2010

Date of Patent: August 12, 2014

Assignee: Fuji Xerox Co., Ltd.

Inventors: Makoto Shimamura, Susumu Kimura
Method for preempting graphics tasks to accommodate compute tasks in an accelerated processing device (APD)

Patent number: 8803891

Abstract: Embodiments described herein provide a method of arbitrating a processing resource. The method includes receiving a command to preempt a task and preventing additional wavefronts associated with the task from being processed. The method also includes evicting currently executing wavefronts associated with the task from being processed based upon predetermined criteria.

Type: Grant

Filed: November 30, 2011

Date of Patent: August 12, 2014

Assignee: Advanced Micro Devices, Inc.

Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Sebastien Nussbaum, Rex McCrary, Mark Leather, Philip J. Rogers, Thomas R. Woller
Graphics scenegraph rendering for web applications using native code modules

Patent number: 8797337

Abstract: One embodiment provides a system that facilitates the execution of a web application. During operation, the system loads a native code module that includes a scenegraph renderer into a secure runtime environment. Next, the system uses the scenegraph renderer to create a scenegraph from a graphics model associated with the web application and generate a set of rendering commands from the scenegraph. The system then writes the rendering commands to a command buffer and reads the rendering commands from the command buffer. Finally, the system uses the rendering commands to render, for the web application, an image corresponding to the graphics model by executing the rendering commands using a graphics-processing unit (GPU).

Type: Grant

Filed: July 2, 2009

Date of Patent: August 5, 2014

Assignee: Google Inc.

Inventors: Antoine Labour, Matthew Papakipos
Facilitating efficient switching between graphics-processing units

Patent number: 8797334

Abstract: The disclosed embodiments provide a system that facilitates seamlessly switching between graphics-processing units (GPUs) to drive a display. In one embodiment, the system receives a request to switch from using a first GPU to using a second GPU to drive the display. In response to this request, the system uses a kernel thread which operates in the background to configure the second GPU to prepare the second GPU to drive the display. While the kernel thread is configuring the second GPU, the system continues to drive the display with the first GPU and a user thread continues to execute a window manager which performs operations associated with servicing user requests. When configuration of the second GPU is complete, the system switches the signal source for the display from the first GPU to the second GPU.

Type: Grant

Filed: January 6, 2010

Date of Patent: August 5, 2014

Assignee: Apple Inc.

Inventors: Thomas W. Costa, Simon M. Douglas, David J. Redman
Image processing apparatus having enhanced display mode and image processing method thereof

Patent number: 8792108

Abstract: An image processing apparatus includes an image-processing designating unit that allows a user to designate predetermined image processing to be applied to image data for generating a preview image that represents a state of an output image before image output; a preview-image generating unit that generates a preview image in accordance with the designated image processing; a preview-image display unit that displays the preview image generated by the preview-image generating unit; and a display-mode switching control unit that, when the preview image is displayed, switches to a display mode with an enhanced viewability relative to a power-saving display state in accordance with a content of the designated image processing.

Type: Grant

Filed: July 29, 2011

Date of Patent: July 29, 2014

Assignee: Ricoh Company, Limited

Inventor: Tomoyuki Yoshida
Parallel processing for distance transforms

Patent number: 8786616

Abstract: Parallel processing for distance transforms is described. In an embodiment a raster scan algorithm is used to compute a distance transform such that each image element of a distance image is assigned a distance value. This distance value is a shortest distance from the image element to the seed region. In an embodiment two threads execute in parallel with a first thread carrying out a forward raster scan over the distance image and a second thread carrying out a backward raster scan over the image. In an example, a thread pauses when a cross-over condition is met until the other thread meets the condition after which both threads continue. In embodiments distances may be computed in Euclidean space or along geodesics defined on a surface. In an example, four threads execute two passes in parallel with each thread carrying out a raster scan over a different quarter of the image.

Type: Grant

Filed: December 11, 2009

Date of Patent: July 22, 2014

Assignee: Microsoft Corporation

Inventors: Toby Sharp, Antonio Criminisi
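
The forward and backward raster scans mentioned in the abstract of patent 8786616 above can be sketched as a classic two-pass distance transform. This is a minimal, sequential illustration under stated assumptions: it uses the city-block (4-neighbour) metric, runs the two scans one after the other rather than in parallel threads, and the name `distance_transform` and the 0/1 seed-mask input are hypothetical choices, not taken from the patent.

```python
from typing import List


def distance_transform(seed: List[List[int]]) -> List[List[int]]:
    """Return the city-block distance from every cell to the nearest seed cell.

    `seed` is a rectangular grid where truthy cells belong to the seed region.
    """
    rows, cols = len(seed), len(seed[0])
    inf = rows + cols  # larger than any possible city-block distance in the grid
    dist = [[0 if seed[r][c] else inf for c in range(cols)] for r in range(rows)]

    # Forward raster scan: propagate distances from the top and left neighbours.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)

    # Backward raster scan: propagate distances from the bottom and right neighbours.
    for r in reversed(range(rows)):
        for c in reversed(range(cols)):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)

    return dist
```

With a single seed in the centre of a 3×3 grid, the corners end up at distance 2 and the edge midpoints at distance 1. The abstract's cross-over synchronization between threads is not modelled here.
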
Chaining image-processing functions on a SIMD processor

Patent number: 8786614

Abstract: In a single-instruction-multiple-data (SIMD) processor having multiple lanes, and local memory dedicated to each lane, a method of processing an image is disclosed. The method comprises mapping consecutive rasters of the image to consecutive lanes such that groups of consecutive rasters form image strips, and vertical stacks of strips comprise strip columns. Local memory allocates memory to the image strips. A sequence of functions is processed for execution on the SIMD processor in a pipeline implementation, such that the pipeline loops over portions of the image in multiple iterations, and intermediate data processed during the functions is stored in the local memory. Data associated with the image is traversed by first processing image strips from top to bottom in a left-most strip column, then progressing to each adjacent unprocessed strip column.

Type: Grant

Filed: May 2, 2013

Date of Patent: July 22, 2014

Assignee: Calos Fund Limited Liability Company

Inventors: Donald James Curry, Ujval J. Kapasi
Parallelization of random number generation processing by employing GPU

Patent number: 8786617

Abstract: A method of carrying out random number generation processing uses a GPU including a plurality of blocks each including at least one core, the random number generation processing including update processing of updating state vectors and conversion processing of converting the updated state vectors into random numbers having another distribution. The method includes carrying out, by one of the plurality of blocks, the update processing (S3), and carrying out, by the plurality of blocks, the conversion processing in parallel based on results of the update processing (S9). Therefore, it is possible to more efficiently generate a random number sequence which is the same as the one obtained through random number generation processing performed in a serial manner, by parallelizing a single random number generator in a GPU.

Type: Grant

Filed: March 2, 2011

Date of Patent: July 22, 2014

Assignee: Mizuho-DL Financial Technology Co. Ltd.

Inventor: Tomohisa Yamakami
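
The split described in the abstract of patent 8786617 above, a serial state update followed by a parallel conversion step, can be sketched on a CPU. Everything concrete here is an assumption made for illustration: a textbook linear congruential generator stands in for the state update, inverse-transform sampling to an exponential distribution stands in for the conversion, and a thread pool stands in for the GPU blocks (Python threads show the structure only and will not speed up this CPU-bound work).

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List

MODULUS = 2 ** 31  # illustrative LCG modulus, not taken from the patent


def update_states(seed: int, count: int) -> List[int]:
    """Serial step: each state depends on the previous one."""
    states = []
    state = seed
    for _ in range(count):
        state = (1103515245 * state + 12345) % MODULUS
        states.append(state)
    return states


def convert(state: int) -> float:
    """Independent step: map one state to an exponentially distributed value."""
    u = (state + 0.5) / MODULUS  # strictly inside (0, 1)
    return -math.log(1.0 - u)


def generate_serial(seed: int, count: int) -> List[float]:
    return [convert(s) for s in update_states(seed, count)]


def generate_parallel(seed: int, count: int, workers: int = 4) -> List[float]:
    states = update_states(seed, count)           # done by a single worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert, states))    # map() preserves input order
```

Because the conversion of each state is independent and `map` preserves order, `generate_parallel` returns the same sequence as `generate_serial` for the same seed, which mirrors the property the abstract emphasizes.
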
Parallel Image Processing System

Publication number: 20140192065

Abstract: System and method for a parallel image processing mechanism for applying mask data patterns to a substrate in a lithography manufacturing process are disclosed. In one embodiment, the parallel image processing system includes a graphics engine configured to partition an object into a plurality of trapezoids and form an edge list for representing each of the plurality of trapezoids, and a distributor configured to receive the edge list from the graphics engine and distribute the edge list to a plurality of scan line image processing units. The system further includes a sentinel configured to synchronize operations of the plurality of scan line image processing units, and a plurality of buffers configured to store image data from corresponding scan line image processing units and output the stored image data using the sentinel.

Type: Application

Filed: March 10, 2014

Publication date: July 10, 2014

Applicant: PINEBROOK IMAGING, INC.

Inventors: BARRY KEANE, THOMAS LAIDIG
Video multiviewer system with serial digital interface and related methods

Patent number: 8773469

Abstract: A video multiviewer system may include a plurality of video scalers operating in parallel for generating initially scaled video streams by performing video scaling in at least one dimension on a plurality of video input streams. The video multiviewer system may also include at least one video cross-point switcher coupled downstream from the video scalers, and a processing unit coupled downstream from the video cross-point switcher for generating additionally scaled video streams by performing additional video scaling on the initially scaled video stream. The video scalers and the processing unit may communicate through the video cross-point switcher using a serial digital interface.

Type: Grant

Filed: April 9, 2008

Date of Patent: July 8, 2014

Assignee: Imagine Communications Corp.

Inventors: Marcin Andrzej Komorowski, Cristian Camer, Anthony Singh
Rendering of stereoscopic images with multithreaded rendering software pipeline

Patent number: 8773449

Abstract: A circuit arrangement and program product render stereoscopic images in a multithreaded rendering software pipeline using first and second rendering channels respectively configured to render left and right views for the stereoscopic image. Separate transformations are applied to received vertex data to generate transformed vertex data for use by each of the first and second rendering channels in rendering the left and right views for the stereoscopic image.

Type: Grant

Filed: September 14, 2009

Date of Patent: July 8, 2014

Assignee: International Business Machines Corporation

Inventors: Russell Dean Hoover, Eric Oliver Mejdrich, Paul Emery Schardt, Robert Allen Shearer
Synchronous parallel pixel processing for scalable color reproduction systems

Patent number: 8773446

Abstract: What is disclosed is a novel system and method for parallel processing of intra-image data in a distributed computing environment. A generic architecture and method are presented which collectively facilitate image segmentation and block sorting and merging operations with a certain level of synchronization in a parallel image processing environment which has been traditionally difficult to parallelize. The present system and method enables pixel-level processing at higher speeds thus making it a viable service for a print/copy job document reproduction environment. The teachings hereof have been simulated on a cloud-based computing environment with a demonstrable increase of ~2× with nominal 8-way parallelism, and an increase of ~20×-100× on a graphics processor. In addition to production and office scenarios where intra-image processing is likely to be performed, these teachings are applicable to other domains where high-speed video and audio processing are desirable.

Type: Grant

Filed: February 9, 2011

Date of Patent: July 8, 2014

Assignee: Xerox Corporation

Inventors: Shanmuga-Nathan Gnanasambandam, Lalit Keshav Mestha
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR IDENTIFYING A FAULTY PROCESSING UNIT

Publication number: 20140184616

Abstract: A system, process, and computer program product are provided for identifying a faulty processing unit. A shader program that configures a plurality of processing units to generate data is executed and the data is compared with verification data to produce a test result. The test result is examined to identify a faulty processing unit of the plurality of processing units, where a unique identifier corresponding to each processing unit is encoded into the data generated by the respective processing unit.

Type: Application

Filed: December 28, 2012

Publication date: July 3, 2014

Applicant: NVIDIA CORPORATION

Inventors: Apoorv Gupta, David William Crowe, Carl William Davies
Method and system for dynamically adding and removing display modes coordinated across multiple graphics processing units

Patent number: 8766989

Abstract: The present invention provides a method and system for coordinating graphics processing units in a single computing system. A method is disclosed which allows for the construction of a list of shared display modes that may be employed by both of the graphics processing units to render an output in a display device. By creating the list of shared commonly supportable display modes, the output displayed in the display device may advantageously provide a consistent graphical experience persisting through the use of alternate graphics processing units in the system. One method builds a list of shared display modes by compiling a list from a GPU specific base mode list and dynamic display modes acquired from an attached display device. Another method provides the ability to generate graphical output configurations according to a user-selected display mode that persists when alternate graphics processing units in the system are used to generate graphical output.

Type: Grant

Filed: July 29, 2009

Date of Patent: July 1, 2014

Assignee: Nvidia Corporation

Inventors: David Wyatt, Linda Glanville
SYSTEM AND METHOD FOR GRAPHICAL PROCESSING OF MEDICAL DATA

Publication number: 20140176576

Abstract: The invention provides a computer server with a graphical processor that can process data from multiple medical imaging systems simultaneously. Data sets can be provided by any suitable imaging system (x-ray, angiography, PET scans, MRI, IVUS, OCT, cath labs, etc.) and a processing system of the invention allocates resources in the form of a virtual machine, processing power, operating system, applications, etc., as-needed. Embodiments of the invention may find particular application with cath labs due to the particular processing requirements of typical cath lab systems.

Type: Application

Filed: December 16, 2013

Publication date: June 26, 2014

Applicant: VOLCANO CORPORATION

Inventor: Jason Spencer
SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR TILED DEFERRED SHADING

Publication number: 20140176575

Abstract: A system, method, and computer program product are provided for tiled deferred shading. In operation, a plurality of photons associated with at least one scene are identified. Further, a plurality of screen-space tiles associated with the at least one scene are identified. Additionally, each of the plurality of screen-space tiles capable of being affected by a projection of an effect sphere for each of the plurality of photons are identified. Furthermore, at least a subset of photons associated with each of the screen-space tiles from which to compute shading are selected. Moreover, shading for the at least one scene is computed utilizing the selected at least a subset of photons.

Type: Application

Filed: August 30, 2013

Publication date: June 26, 2014

Applicant: NVIDIA Corporation

Inventors: Morgan McGuire, Michael Thomas Mara, David Patrick Luebke, Jacopo Pantaleoni
Method and Apparatus for Interprocessor Communication Employing Modular Space Division

Publication number: 20140176574

Abstract: A novel method and system for distributed database ray tracing is presented, based on modular mapping of scene data among processors. Its inherent properties include scattering data among processors for improved load balancing, and matching geographical proximity in the scene with communication proximity between processors. High utilization is enabled by a unique mechanism of cache sharing. The resulting improved performance enables a deep level of ray tracing for real-time applications.

Type: Application

Filed: December 26, 2012

Publication date: June 26, 2014

Inventor: Reuven Bakalash
Restart index that sets a topology

Patent number: 8760455

Abstract: One embodiment of the present invention sets forth a technique for reducing overhead associated with transmitting primitive draw commands from memory to a graphics processing unit (GPU). Command pairs comprising an end draw command and a begin draw command associated with a conventional graphics application programming interface (API) are selectively replaced with a new construct. The new construct is a reset topology index, which implements a combined function of the end draw command and begin draw command. The new construct improves efficiency by reducing total data transmitted from memory to the GPU.

Type: Grant

Filed: October 4, 2010

Date of Patent: June 24, 2014

Assignee: NVIDIA Corporation

Inventors: Jerome F. Duluk, Jr., Thomas Roell, James C. Bowman
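
The general idea behind the abstract of patent 8760455 above, replacing an end-draw/begin-draw command pair with a reserved index value inside the index stream, resembles the primitive-restart technique found in graphics APIs. The sketch below shows only that general splitting behaviour and does not model the patent's topology-setting aspect. The function names and the reserved value `0xFFFF` are illustrative assumptions, and the winding flip on odd triangles is one common strip convention.

```python
from typing import List, Tuple


def split_on_restart(indices: List[int], restart_index: int) -> List[List[int]]:
    """Split one index stream into separate strips at each reserved restart value."""
    strips: List[List[int]] = []
    current: List[int] = []
    for index in indices:
        if index == restart_index:
            # Acts like "end draw" followed by "begin draw" in a single token.
            if current:
                strips.append(current)
            current = []
        else:
            current.append(index)
    if current:
        strips.append(current)
    return strips


def strip_to_triangles(strip: List[int]) -> List[Tuple[int, int, int]]:
    """Expand a triangle strip into individual triangles with consistent winding."""
    triangles = []
    for k in range(len(strip) - 2):
        a, b, c = strip[k], strip[k + 1], strip[k + 2]
        # Swap the first two vertices on odd triangles to keep the winding order.
        triangles.append((a, b, c) if k % 2 == 0 else (b, a, c))
    return triangles
```

A stream such as `[0, 1, 2, 3, 0xFFFF, 4, 5, 6]` then describes two strips in one submission instead of two separate draw commands.
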
ASYNCHRONOUS COMPUTE INTEGRATED INTO LARGE-SCALE DATA RENDERING USING DEDICATED, SEPARATE COMPUTING AND RENDERING CLUSTERS

Publication number: 20140168230

Abstract: An asynchronous computing and rendering system includes a data storage unit that provides storage for processing a large-scale data set organized in accordance to data subregions and a computing cluster containing a parallel plurality of asynchronous computing machines that provide compute results based on the data subregions. The asynchronous computing and rendering system also includes a rendering cluster containing a parallel multiplicity of asynchronous rendering machines coupled to the asynchronous computing machines, wherein each rendering machine renders a subset of the data subregions. Additionally, the asynchronous computing and rendering system includes a data interpretation platform coupled to the asynchronous rendering machines that provides user interaction and rendered viewing capabilities for the large-scale data set. An asynchronous computing and rendering method is also provided.

Type: Application

Filed: December 19, 2012

Publication date: June 19, 2014

Applicant: NVIDIA CORPORATION

Inventors: Marc Nienhaus, Joerg Mensmann, Hitoshi Yamauchi
FINE-GRAINED PARALLEL TRAVERSAL FOR RAY TRACING

Publication number: 20140168228

Abstract: Techniques are disclosed for tracing a ray within a parallel processing unit. A first thread receives a ray or a ray segment for tracing and identifies a first node within an acceleration structure associated with the ray, where the first node is associated with a volume of space traversed by the ray. The thread identifies the child nodes of the first node, where each child node is associated with a different sub-volume of space, and each sub-volume is associated with a corresponding ray segment. The thread determines that two or more nodes are associated with sub-volumes of space that intersect the ray segment. The thread selects one of these nodes for processing by the first thread and another for processing by a second thread. One advantage of the disclosed technique is that the threads in a thread group perform ray tracing more efficiently in that idle time is reduced.

Type: Application

Filed: December 13, 2012

Publication date: June 19, 2014

Applicant: NVIDIA Corporation

Inventors: David LUEBKE, Timo AILA, Jacopo PANTALEONI, David TARJAN
CPU-GPU PARALLELIZATION

Publication number: 20140168229

Abstract: Embodiments described herein relate to improving throughput of a CPU and a GPU working in conjunction to render graphics. Time frames for executing CPU and GPU work units are synchronized with a refresh rate of a display. Pending CPU work is performed when a time frame starts (a vsync occurs). When a prior GPU work unit is still executing on the GPU, a parallel mode is entered. In the parallel mode, some GPU work and some CPU work are performed concurrently. The parallel mode may be exited, for example, when there is no CPU work to perform.

Type: Application

Filed: December 14, 2012

Publication date: June 19, 2014

Applicant: Microsoft

Inventor: Microsoft
Apparatus and method for selectable hardware accelerators

Patent number: 8754893

Abstract: A method and apparatus employing selectable hardware accelerators in a data driven architecture are described. In one embodiment, the apparatus includes a plurality of processing elements (PEs). A plurality of hardware accelerators are coupled to a selection unit. A register is coupled to the selection unit and the plurality of processing elements. In one embodiment, the register includes a plurality of general purpose registers (GPR), which are accessible by the plurality of processing elements, as well as the plurality of hardware accelerators. In one embodiment, at least one of the GPRs includes a bit to enable a processing element to access a selected hardware accelerator via the selection unit.

Type: Grant

Filed: October 19, 2011

Date of Patent: June 17, 2014

Assignee: Intel Corporation

Inventors: Louis A. Lippincott, Patrick F. Johnson
Silicon chip of a monolithic construction for use in implementing multiple graphic cores in a graphics processing and display subsystem

Patent number: 8754897

Abstract: A silicon chip of a monolithic construction for use in implementing a multiple core graphics processing and display subsystem in a computing system having a CPU, a system memory, an operating system (OS), a CPU bus, and a display device with a display surface. The computing system supports (i) one or more software applications for issuing graphics commands, and (ii) one or more graphics libraries for storing data used to implement said graphics commands. The silicon chip comprises multiple graphic pipeline cores, a partial frame buffer for buffering pixels corresponding to image fragments, a routing center, a control unit, and a display interface for displaying composited images on the display surface of the computing system.

Type: Grant

Filed: November 15, 2010

Date of Patent: June 17, 2014

Assignee: Lucidlogix Software Solutions, Ltd.

Inventors: Reuven Bakalash, Offir Remez, Efi Fogel
Internet-based graphics application profile management system for updating graphic application profiles stored within the multi-GPU graphics rendering subsystems of client machines running graphics-based applications

Patent number: 8754894

Abstract: A multi-user computer network, in which graphics performance of client machines running graphics-based applications is optimized using an automated Internet-based graphics application profile management system. The automated Internet-based graphics application profile management system includes an Internet-based communication server, operably connected to the infrastructure of the Internet, and to a central database server, through an application server. The central database server stores graphic application profiles (GAPs) for different graphics-based applications that are capable of running on the client machines. The graphics application profiles are stored in a profile database in the multi-GPU graphics rendering subsystem of each client machine. The Internet-based communication server communicates with each client machine over the Internet, and automatically programs updated graphics application profiles (GAPs) in the profile database of each client machine.

Type: Grant

Filed: November 8, 2010

Date of Patent: June 17, 2014

Assignee: Lucidlogix Software Solutions, Ltd.

Inventors: Reuven Bakalash, Yaniv Leviathan
Memory Cell Array with Dedicated Nanoprocessors

Publication number: 20140160135

Abstract: A processing architecture uses stationary operands and opcodes common on a plurality of processors. Only data moves through the processors. The same opcode and operand is used by each processor assigned to operate, for example, on one row of pixels, one row of numbers, or one row of points in space.

Type: Application

Filed: December 28, 2011

Publication date: June 12, 2014

Inventor: Scott A. Krig
Multi-thread graphics processing system

Patent number: 8749563

Abstract: A graphics processing system comprises at least one memory device storing a plurality of pixel command threads and a plurality of vertex command threads. An arbiter coupled to the at least one memory device is provided that selects a pixel command thread from the plurality of pixel command threads and a vertex command thread from the plurality of vertex command threads. The arbiter further selects a command thread from the previously selected pixel command thread and the vertex command thread, which command thread is provided to a command processing engine capable of processing pixel command threads and vertex command threads.
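
The two-level selection described in this abstract can be illustrated with a small sketch. This is not the patented design: the class name `Arbiter`, the FIFO queues, and the "oldest candidate wins" rule at the second level are all assumptions made for illustration, since the abstract does not state a selection policy.

```python
from collections import deque


class Arbiter:
    """Hypothetical two-level arbiter over pixel and vertex command threads."""

    def __init__(self):
        self.pixel = deque()   # pending pixel command threads
        self.vertex = deque()  # pending vertex command threads
        self._seq = 0          # submission counter used as an age stamp

    def _queue(self, kind):
        if kind == "pixel":
            return self.pixel
        if kind == "vertex":
            return self.vertex
        raise ValueError(f"unknown thread kind: {kind!r}")

    def submit(self, kind, name):
        self._queue(kind).append((self._seq, kind, name))
        self._seq += 1

    def select(self):
        # Level 1: pick one candidate per thread type (assumed: queue head).
        candidates = [q[0] for q in (self.pixel, self.vertex) if q]
        if not candidates:
            return None
        # Level 2: pick between the candidates (assumed: the older one).
        _, kind, name = min(candidates)
        self._queue(kind).popleft()
        return kind, name  # handed to the shared command processing engine


arbiter = Arbiter()
arbiter.submit("pixel", "p0")
arbiter.submit("vertex", "v0")
arbiter.submit("pixel", "p1")
order = [arbiter.select() for _ in range(4)]
```

A real arbiter would likely weigh resource availability or thread readiness rather than age alone; the sketch only shows the shape of the per-type selection followed by a final selection.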

Type: Grant

Filed: March 18, 2013

Date of Patent: June 10, 2014

Assignee: ATI Technologies ULC

Inventors: Laurent Lefebvre, Andrew Gruber, Stephen Morein
Hybrid graphic display

Patent number: 8736617

Abstract: A method of displaying graphics data is described. The method involves accessing the graphics data in a memory subsystem associated with one graphics subsystem. The graphics data is transmitted to a second graphics subsystem, where it is displayed on a monitor coupled to the second graphics subsystem.

Type: Grant

Filed: August 4, 2008

Date of Patent: May 27, 2014

Assignee: Nvidia Corporation

Inventors: Stephen Lew, Bruce R. Intihar, Abraham B. de Waal, David G. Reed, Tony Tamasi, David Wyatt, Franck R. Diard, Brad Simeral
Parallel array architecture for a graphics processor

Patent number: 8730249

Abstract: A parallel array architecture for a graphics processor includes a multithreaded core array including a plurality of processing clusters, each processing cluster including at least one processing core operable to execute a pixel shader program that generates pixel data from coverage data; a rasterizer configured to generate coverage data for each of a plurality of pixels; and pixel distribution logic configured to deliver the coverage data from the rasterizer to one of the processing clusters in the multithreaded core array. A crossbar coupled to each of the processing clusters is configured to deliver pixel data from the processing clusters to a frame buffer having a plurality of partitions.

Type: Grant

Filed: October 7, 2011

Date of Patent: May 20, 2014

Assignee: NVIDIA Corporation

Inventors: John M. Danskin, John S. Montrym, John Erik Lindholm, Steven E. Molnar, Mark French
BOOT DISPLAY DEVICE DETECTION AND SELECTION TECHNIQUES IN MULTI-GPU DEVICES

Publication number: 20140132612

Abstract: Techniques for selecting a boot display device in a multi-GPU configured computing device include a graphic initialization routine for determining a topology of a plurality of GPUs. It is then determined if a display is coupled to any of the plurality of GPUs. The determination of whether the display is coupled to a GPU is communicated to the other of the plurality of GPUs based upon the determined topology. Thereafter, selection of a given GPU as a primary boot device, by a system initialization routine, is influenced by representing each GPU not coupled to the display as a graphics device and the GPUs coupled to a given display as the primary boot device if one or more displays are coupled to GPUs, and by representing the given GPU as the primary boot device and all other GPUs as graphics devices when the display is not coupled to any of the GPUs.

Type: Application

Filed: April 19, 2013

Publication date: May 15, 2014

Applicant: NVIDIA Corporation

Inventor: NVIDIA Corporation
Automated Latency Management And Cross-Communication Exchange Conversion

Publication number: 20140125683

Abstract: A system and method for communication in a parallel computing system is applied to a system having multiple processing units, each processing unit including processor(s), memory, and a network interface, where the network interface is adapted to support virtual connections. The memory has at least a portion of a parallel processing application program and a parallel processing operating system. The system has a network fabric between processing units. The method involves identifying a need for communication by the first processing unit with a group of processing units, creating virtual connections between the processing units, and transferring data between the first processing units.

Type: Application

Filed: January 14, 2014

Publication date: May 8, 2014

Applicant: Massively Parallel Technologies, Inc.

Inventor: Kevin D. Howard
METHOD OF DYNAMIC LOAD-BALANCING WITHIN A PC-BASED COMPUTING SYSTEM EMPLOYING A MULTIPLE GPU-BASED GRAPHICS PIPELINE ARCHITECTURE SUPPORTING MULTIPLE MODES OF GPU PARALLELIZATION

Publication number: 20140125682

Abstract: A hub mechanism for use in a multiple graphics processing unit (GPU) system includes a hub routing unit positioned on a bus between a controller unit and multiple GPUs. The hub mechanism is used for routing data and commands over a graphic pipeline between a user interface and one or more display units. The hub mechanism also includes a hub driver for issuing commands for controlling the hub routing unit.

Type: Application

Filed: January 13, 2014

Publication date: May 8, 2014

Applicant: Lucidlogix Software Solutions, Ltd.

Inventors: Reuven BAKALASH, Offir REMEZ, Gigy BAR-OR, Efi FOGEL, Amir SHAHAM
METHOD AND APPARATUS FOR ENABLING PARALLEL PROCESSING OF PIXELS IN AN IMAGE

Publication number: 20140125681

Abstract: A method, non-transitory computer readable medium, and apparatus for enabling parallel processing of pixels in an image are disclosed. For example, the method performs, via a multiple core processor, a one-dimensional error diffusion on the pixels in the image to reduce a number of bits per pixel to a value lower than an initial number of bits per pixel and greater than one, and performs a two-dimensional error diffusion on the pixels in the image that have undergone the one-dimensional error diffusion, to reduce the number of bits per pixel to one bit per pixel.
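
The two-pass reduction in this abstract can be sketched as follows. The bit depths (8 to 4 to 1), the function names, and the use of the Floyd-Steinberg kernel for the two-dimensional pass are assumptions chosen for illustration; the abstract does not name a specific kernel or intermediate depth.

```python
def diffuse_rows_1d(image, in_bits=8, out_bits=4):
    """First pass: quantize each row on its own, carrying error to the next pixel."""
    in_max = (1 << in_bits) - 1
    out_max = (1 << out_bits) - 1
    result = []
    for row in image:  # rows do not depend on each other, so cores can split them
        err = 0.0
        out_row = []
        for value in row:
            target = value * out_max / in_max + err  # rescale, add carried error
            level = min(out_max, max(0, round(target)))
            err = target - level
            out_row.append(level)
        result.append(out_row)
    return result


def diffuse_2d_to_1bit(image, in_bits=4):
    """Second pass: Floyd-Steinberg diffusion (an assumed kernel) down to 1 bit."""
    in_max = (1 << in_bits) - 1
    height = len(image)
    width = len(image[0]) if height else 0
    work = [[v / in_max for v in row] for row in image]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 1 if old >= 0.5 else 0
            out[y][x] = new
            e = old - new
            if x + 1 < width:
                work[y][x + 1] += e * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += e * 3 / 16
                work[y + 1][x] += e * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += e * 1 / 16
    return out


gray = [[128] * 4 for _ in range(4)]           # small uniform mid-gray test image
intermediate = diffuse_rows_1d(gray)           # values in 0..15
binary = diffuse_2d_to_1bit(intermediate)      # values in {0, 1}
```

Only the first pass is trivially parallel across rows in this sketch; how the second pass is distributed across cores is not covered here.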

Type: Application

Filed: November 6, 2012

Publication date: May 8, 2014

Applicant: Xerox Corporation

Inventor: Xing Li
Method and system for automatically analyzing GPU test results

Patent number: 8717370

Abstract: A method and system for automatically analyzing graphics processing unit (“GPU”) test results are disclosed. Specifically, one embodiment of the present invention sets forth a method, which includes the steps of identifying the GPU test results associated with a first register type, creating a template document associated with the same first register type, wherein the template document is pre-configured to store and operate on the GPU test results of the first register type, filling the GPU test results in the template document, aggregating the GPU test results associated with the first register type to establish a common output, and determining a suitable register value from a passing range of register values based on the common output without human intervention.

Type: Grant

Filed: November 30, 2007

Date of Patent: May 6, 2014

Assignee: Nvidia Corporation

Inventor: James Chen
BARRIER COMMANDS IN A CACHE TILING ARCHITECTURE

Publication number: 20140118362

Abstract: One embodiment of the present invention includes a graphics subsystem. The graphics subsystem includes a first processing entity and a second processing entity. Both the first processing entity and the second processing entity are configured to receive first and second batches of primitives, and a barrier command in between the first and second batches of primitives. The barrier command may be either a tiled or a non-tiled barrier command. A tiled barrier command is transmitted through the graphics subsystem for each cache tile. A non-tiled barrier command is transmitted through the graphics subsystem only once. The barrier command causes work that is after the barrier command to stop at a barrier point until a release signal is received. The back-end unit transmits a release signal to both processing entities after the first batch of primitives has been processed by both the first processing entity and the second processing entity.

Type: Application

Filed: July 3, 2013

Publication date: May 1, 2014

Inventors: Ziyad S. HAKURA, Dale L. KIRKLAND
MANAGING DEFERRED CONTEXTS IN A CACHE TILING ARCHITECTURE

Publication number: 20140118363

Abstract: A method for managing bind-render-target commands in a tile-based architecture. The method includes receiving a requested set of bound render targets and a draw command. The method also includes, upon receiving the draw command, determining whether a current set of bound render targets includes each of the render targets identified in the requested set. The method further includes, if the current set does not include each render target identified in the requested set, then issuing a flush-tiling-unit-command to a parallel processing subsystem, modifying the current set to include each render target identified in the requested set, and issuing bind-render-target commands identifying the requested set to the tile-based architecture for processing. The method further includes, if the current set of render targets includes each render target identified in the requested set, then not issuing the flush-tiling-unit-command.
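
The decision logic in this abstract can be sketched as a small state tracker. The class name `RenderTargetBinder`, the command tuples, and the step of forwarding the draw command afterward are hypothetical; only the flush-or-skip rule comes from the abstract.

```python
class RenderTargetBinder:
    """Hypothetical tracker that decides when a tiling-unit flush is needed."""

    def __init__(self):
        self.current = set()  # render targets currently bound
        self.commands = []    # commands issued to a simulated subsystem

    def draw(self, requested, draw_command):
        requested = set(requested)
        if not requested <= self.current:
            # Some requested target is not bound yet: flush, extend, rebind.
            self.commands.append(("flush_tiling_unit",))
            self.current |= requested
            self.commands.append(("bind_render_targets", tuple(sorted(requested))))
        # If every requested target is already bound, no flush is issued.
        # Forwarding the draw itself is an assumption of this sketch.
        self.commands.append(("draw", draw_command))


binder = RenderTargetBinder()
binder.draw(["rt0", "rt1"], "d0")  # first use: flush and bind
binder.draw(["rt0"], "d1")         # subset of the bound set: no flush
binder.draw(["rt2"], "d2")         # new target: flush and bind again
```

Skipping the flush when the requested targets are already bound is the point of the scheme, since a flush ends the current batch of tiled work.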

Type: Application

Filed: October 1, 2013

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Ziyad S. HAKURA, Jeffrey A. BOLZ, Amanpreet GREWAL, Matthew JOHNSON, Andrei KHODAKOVSKY
DISTRIBUTED TILED CACHING

Publication number: 20140118364

Abstract: One embodiment of the present invention sets forth a graphics subsystem configured to implement distributed cache tiling. The graphics subsystem includes one or more world-space pipelines, one or more screen-space pipelines, one or more tiling units, and a crossbar unit. Each world-space pipeline is implemented in a different processing entity and is coupled to a different tiling unit. Each screen-space pipeline is implemented in a different processing entity and is coupled to the crossbar unit. The tiling units are configured to receive primitives from the world-space pipelines, generate cache tile batches based on the primitives, and transmit the primitives to the screen-space pipelines. One advantage of the disclosed approach is that primitives are processed in application-programming-interface order in a highly parallel tiling architecture. Another advantage is that primitives are processed in cache tile order, which reduces memory bandwidth consumption and improves cache memory utilization.

Type: Application

Filed: October 18, 2013

Publication date: May 1, 2014

Applicant: NVIDIA CORPORATION

Inventors: Ziyad S. HAKURA, Cynthia Ann Edgeworth ALLISON, Dale L. KIRKLAND, Walter R. STEINER
