Parallel Processors (e.g., Identical Processors) Patents (Class 345/505)
  • Patent number: 8847959
    Abstract: A new hardware architecture defines an indexing and encoding method for accelerating incoherent ray traversal. Accelerating multiple ray traversal may be accomplished by organizing the rays for minimal movement of data, hiding latency due to external memory access, and performing adaptive binning. Rays may be binned into coarse grain and fine grain spatial bins, independent of direction.
    Type: Grant
    Filed: February 13, 2014
    Date of Patent: September 30, 2014
    Assignee: Raycast Systems, Inc.
    Inventor: Alvin D. Zimmerman
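A minimal Python sketch of the direction-independent binning idea in the abstract above. It assumes rays are (origin, direction) pairs with origins normalized to the unit cube, and it uses uniform grids whose resolutions (`coarse`, `fine`) are hypothetical parameters. Rays are grouped only by where they start. The patent's actual indexing and encoding scheme is not described in the abstract and is not reproduced here.

```python
from collections import defaultdict

def bin_rays(rays, coarse=4, fine=4):
    """Group rays into coarse spatial bins, then fine sub-bins, by origin only.

    rays: list of (origin, direction) pairs, origin coordinates in [0, 1).
    The direction is deliberately ignored, so binning is direction-independent.
    Returns {coarse_cell: {fine_cell: [ray indices]}}.
    """
    bins = defaultdict(lambda: defaultdict(list))
    for i, (origin, _direction) in enumerate(rays):
        # Coarse cell: which of the coarse**3 grid cells holds the origin.
        c = tuple(min(int(x * coarse), coarse - 1) for x in origin)
        # Fine cell: position inside that coarse cell, split fine**3 ways.
        f = tuple(min(int((x * coarse - ci) * fine), fine - 1)
                  for x, ci in zip(origin, c))
        bins[c][f].append(i)
    return bins
```

Two rays that start close together land in the same fine bin even when they point in different directions, which is the property the abstract emphasizes.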
  • Patent number: 8842122
    Abstract: Aspects of the disclosure relate to a method of controlling a graphics processing unit. In an example, the method includes receiving one or more tasks from a host processor, and scheduling, independently from the host processor, the one or more tasks to be selectively executed by a shader processor and one or more fixed function hardware units, wherein the shader processor is configured to execute a plurality of instructions in parallel, and the one or more fixed function hardware units are configured to render graphics data.
    Type: Grant
    Filed: December 15, 2011
    Date of Patent: September 23, 2014
    Assignee: QUALCOMM Incorporated
    Inventors: Petri Olavi Nordlund, Jukka-Pekka Arvo, Robert J. Simpson
  • Patent number: 8842117
    Abstract: A new hardware architecture defines an indexing and encoding method for accelerating incoherent ray traversal. Accelerating multiple ray traversal may be accomplished by organizing the rays for minimal movement of data, hiding latency due to external memory access, and performing adaptive binning. Rays may be binned into coarse grain and fine grain spatial bins, independent of direction.
    Type: Grant
    Filed: February 13, 2014
    Date of Patent: September 23, 2014
    Assignee: Raycast Systems, Inc.
    Inventor: Alvin D. Zimmerman
  • Patent number: 8842121
    Abstract: A single instruction multiple data (SIMD) processor with a given width may operate on registers of the same width completely filled with fragments. A parallel set of registers is loaded and tested. The fragments that fail are eliminated and the register set is refilled from the parallel set.
    Type: Grant
    Filed: February 3, 2011
    Date of Patent: September 23, 2014
    Assignee: Intel Corporation
    Inventors: Tomas Akenine-Möller, Jon N. Hasselgren, Carl J. Munkberg, Robert M. Toth, Franz P. Clarberg
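A Python sketch of the fragment-compaction idea for patent 8842121. It models a SIMD register as a list of `width` fragments and the per-fragment test as a caller-supplied predicate; both are illustrative assumptions, and `compact_fragments` is a hypothetical name. Each staged load is tested, failures are dropped, and survivors refill the working register so that downstream work is issued in full-width batches.

```python
from typing import Callable, Iterable, List

def compact_fragments(fragments: Iterable[int], width: int,
                      passes: Callable[[int], bool]) -> List[List[int]]:
    """Return SIMD-width batches containing only fragments that pass the test.

    Fragments are loaded `width` at a time into a staging (parallel) register,
    tested, and failures are discarded; survivors refill the working register
    so every emitted batch is full, except possibly the last one.
    """
    frags = list(fragments)
    working: List[int] = []
    full_batches: List[List[int]] = []
    for start in range(0, len(frags), width):
        staging = frags[start:start + width]           # parallel register set
        survivors = [f for f in staging if passes(f)]  # test, drop failures
        for f in survivors:                            # refill working register
            working.append(f)
            if len(working) == width:
                full_batches.append(working)
                working = []
    if working:
        full_batches.append(working)                   # final partial batch
    return full_batches
```

For example, with width 4 and a test that keeps even-numbered fragments, 16 incoming fragments yield two full registers of survivors instead of four half-empty ones.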
  • Patent number: 8842133
    Abstract: Embodiments enable a graphics processor to more efficiently process graphics and compositing processing commands. In certain embodiments, a client application submits client graphics commands to a graphics driver. The client in certain embodiments can notify a window server that client graphics commands have been submitted. In response, the window server can generate compositing processing commands and provide these commands to the graphics driver. Advantageously, a graphics processor can execute the client graphics commands while the window server generates compositing processing commands. As a result, processing resources can be used more efficiently.
    Type: Grant
    Filed: June 26, 2013
    Date of Patent: September 23, 2014
    Assignee: Apple Inc.
    Inventors: John Harper, Kenneth C. Dyke
  • Patent number: 8836708
    Abstract: A device for processing a data stream originating from a device generating matrices of Nl rows by Nc columns of data includes K computation tiles and interconnection means for transferring the data stream between the computation tiles. At least one computation tile includes: one or more control units to provide instructions, n processing units, each processing unit carrying out the instructions received from a control unit on a neighborhood of Vl rows by Vc columns of data, a storage unit to place the data of the stream in the form of neighborhoods of Vl rows by (n+Vc−1) columns of data. The storage unit includes a block of shaping memories of dimension Vl×Nc and a block of neighborhood registers of dimension Vl×(n+Vc−1), an input/output unit to convey the data stream between the interconnection means and the storage unit on the one hand, and between the processing units and the interconnection means on the other hand.
    Type: Grant
    Filed: June 8, 2009
    Date of Patent: September 16, 2014
    Assignee: Commissariat a l'Energie Atomique et aux Energies Alternatives
    Inventors: Laurent Letellier, Mathieu Thevenin
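The register width Vl×(n+Vc−1) follows from letting n processing units work on overlapping Vl×Vc neighborhoods: unit k needs columns k through k+Vc−1, so the last unit reaches column n+Vc−2. A small Python illustration of that arithmetic (the function name `neighborhoods` is hypothetical; only the dimension relationship comes from the abstract):

```python
from typing import List

Matrix = List[List[int]]

def neighborhoods(block: Matrix, n: int, vc: int) -> List[Matrix]:
    """Split a Vl x (n + Vc - 1) register block into n overlapping Vl x Vc windows.

    Window k (one per processing unit) covers columns k .. k + vc - 1, so
    adjacent units share vc - 1 columns and no column is stored twice.
    """
    width = len(block[0])
    if width != n + vc - 1:
        raise ValueError(f"expected {n + vc - 1} columns, got {width}")
    return [[row[k:k + vc] for row in block] for k in range(n)]
```

With Vl=2, Vc=3 and n=2 units, a 2×4 register block is enough to feed both units.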
  • Publication number: 20140253565
    Abstract: System on chip comprising a general purpose processing element, a graphics processing unit and a display interface, supporting graphics visualization on mobile computing devices and on embedded systems.
    Type: Application
    Filed: May 19, 2014
    Publication date: September 11, 2014
    Inventor: Reuven Bakalash
  • Patent number: 8830268
    Abstract: A display system and method for displaying an image on a non-planar display that allows the images to be mapped by image mappers while encompassing image data of an adjacent sub-image or sub-images. This allows a single unified image to be displayed in real time without any tearing or positional/angular artifacts at the image boundaries.
    Type: Grant
    Filed: November 7, 2008
    Date of Patent: September 9, 2014
    Assignee: Barco NV
    Inventors: Robert M. Clodfelter, Jeff Bayer, Paul McHale, Brad Smith
  • Patent number: 8830506
    Abstract: An image processing system includes intermediate-data generating apparatuses and one or more drawing-data generating apparatuses. The intermediate-data generating apparatuses interpret data of pages forming PDL document data, the pages being assigned to the corresponding intermediate-data generating apparatuses, to generate elements of intermediate data of the pages. The drawing-data generating apparatuses each obtain assigned elements of the intermediate data and each draw the obtained elements to generate drawing data including information concerning pixels forming each obtained element. The drawing-data generating apparatuses each include a memory that stores intermediate data or drawing data of a common element used in the obtained elements. If the intermediate data or the drawing data of the common element is stored in the memory, the drawing-data generating apparatuses generate drawing data of the obtained elements using the stored intermediate data or drawing data.
    Type: Grant
    Filed: August 4, 2011
    Date of Patent: September 9, 2014
    Assignee: Fuji Xerox Co., Ltd.
    Inventor: Michio Hayakawa
  • Patent number: 8824010
    Abstract: To realize effective load distribution and improve the performance in image formation processing, an image processing apparatus includes a first image processing unit configured to perform image processing on a drawing area, a second image processing unit configured to be differentiated from the first image processing unit, a load analysis unit configured to analyze a composition processing load of an object in the drawing area, a rotational angle analysis unit configured to analyze a rotational angle of the object in the drawing area, and a load distribution determination unit configured to determine whether to distribute a part of image formation processing to be applied on the drawing area from the first image processing unit to the second image processing unit based on the analyzed composition processing load of the object and the analyzed rotational angle of the object.
    Type: Grant
    Filed: October 23, 2012
    Date of Patent: September 2, 2014
    Assignee: Canon Kabushiki Kaisha
    Inventor: Hiroshi Mori
  • Publication number: 20140240327
    Abstract: A heterogeneous computing system includes a central processing unit (CPU) and a graphics processing unit (GPU). The CPU and the GPU are synchronized using a data-based synchronization scheme, wherein offloading of a kernel from the CPU to the GPU is coordinated based upon the data associated with the kernel transferred between the CPU and the GPU. By using a data-based synchronization scheme, additional synchronization operations between the CPU and the GPU are reduced or eliminated, and the overhead of offloading a process from the CPU to the GPU is reduced.
    Type: Application
    Filed: February 22, 2013
    Publication date: August 28, 2014
    Applicant: THE TRUSTEES OF PRINCETON UNIVERSITY
    Inventor: THE TRUSTEES OF PRINCETON UNIVERSITY
  • Patent number: 8817031
    Abstract: A technique for performing stream output operations in a parallel processing system is disclosed. A stream synchronization unit is provided that enables the parallel processing unit to track batches of vertices being processed in a graphics processing pipeline. A plurality of stream output units is also provided, where each stream output unit writes vertex attribute data to one or more stream output buffers for a portion of the batches of vertices. A messaging protocol is implemented between the stream synchronization unit and the plurality of stream output units that ensures that each of the stream output units writes vertex attribute data for the particular batch of vertices distributed to that particular stream output unit in the same order in the stream output buffers as the order in which the batch of vertices was received from a device driver by the parallel processing unit.
    Type: Grant
    Filed: September 29, 2010
    Date of Patent: August 26, 2014
    Assignee: NVIDIA Corporation
    Inventors: Ziyad S. Hakura, Rohit Gupta, Michael C. Shebanow, Emmett M. Kilgariff
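The ordering guarantee described for patent 8817031 resembles an in-order commit buffer. A minimal Python sketch, assuming batches carry consecutive integer IDs assigned in the order the driver submitted them; `StreamSync` is a hypothetical stand-in for the stream synchronization unit, and the heap-based hold queue is an illustrative choice, not the patented messaging protocol.

```python
import heapq
from typing import List, Tuple

class StreamSync:
    """Hypothetical synchronizer that commits vertex batches in driver order.

    Stream output units may finish batches out of order; a finished batch is
    held until every earlier batch has been written, so the output buffer
    receives vertex data in the order the batches were received.
    """

    def __init__(self) -> None:
        self.next_batch = 0
        self.pending: List[Tuple[int, List[str]]] = []
        self.output_buffer: List[str] = []

    def batch_done(self, batch_id: int, vertex_data: List[str]) -> None:
        heapq.heappush(self.pending, (batch_id, vertex_data))
        # Drain every held batch that is now next in sequence.
        while self.pending and self.pending[0][0] == self.next_batch:
            _, data = heapq.heappop(self.pending)
            self.output_buffer.extend(data)
            self.next_batch += 1
```

If batch 2 finishes before batches 0 and 1, its data waits until both earlier batches have been written.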
  • Patent number: 8817030
    Abstract: Graphics processing units (GPUs) deployed in general purpose GPU (GPGPU) units are combined into a GPGPU cluster. Access to the GPGPU cluster is then offered as a service to users who can use their own computers to communicate with the GPGPU cluster. The users develop applications to be run on the cluster and a profiling module tracks the applications' resource utilization and can report it to the user and to a subscription server. The user can examine the report to thereby optimize the application or the cluster's configuration. The subscription server can interpret the report to thereby invoice the user or otherwise govern the users' access to the cluster.
    Type: Grant
    Filed: September 30, 2010
    Date of Patent: August 26, 2014
    Assignee: CreativeC LLC
    Inventors: Greg Scantlen, Gary Scantlen
  • Patent number: 8803892
    Abstract: Methods, apparatuses and systems directed to hosting, on a computer system, a plurality of application instances, each application instance corresponding to a remote client application; maintaining a network connection to each of the remote client applications for which an application instance is hosted; allocating resources of a graphics processing unit of the computer system between at least two of the remote client applications; concurrently rendering, utilizing the resources of the graphics processing unit of the computer system, the graphical output of the application instances corresponding to the at least two of the remote client applications; and transmitting the rendered graphical output to the at least two of the remote client applications over the respective network connections.
    Type: Grant
    Filed: June 10, 2010
    Date of Patent: August 12, 2014
    Assignee: Otoy, Inc.
    Inventor: Julian Michael Urbach
  • Patent number: 8803893
    Abstract: An image data processing apparatus includes: a plurality of operational processing circuits each of which is configured to have a variable circuit configuration and to execute operational processing on image data; and a control section that controls each of the operational processing circuits such that each of the operational processing circuits executes one of a plurality of types of operational processing performed on image data in a predetermined order. The control section controls each of the operational processing circuits so that when image data to be newly given to one of the operational processing circuits is interrupted, said one of the operational processing circuits and another one of the operational processing circuits execute operational processing by taking partial charge of the operational processing.
    Type: Grant
    Filed: March 8, 2010
    Date of Patent: August 12, 2014
    Assignee: Fuji Xerox Co., Ltd.
    Inventors: Makoto Shimamura, Susumu Kimura
  • Patent number: 8803891
    Abstract: Embodiments described herein provide a method of arbitrating a processing resource. The method includes receiving a command to preempt a task and preventing additional wavefronts associated with the task from being processed. The method also includes evicting currently executing wavefronts associated with the task from being processed based upon predetermined criteria.
    Type: Grant
    Filed: November 30, 2011
    Date of Patent: August 12, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Sebastien Nussbaum, Rex McCrary, Mark Leather, Philip J. Rogers, Thomas R. Woller
  • Patent number: 8797337
    Abstract: One embodiment provides a system that facilitates the execution of a web application. During operation, the system loads a native code module that includes a scenegraph renderer into a secure runtime environment. Next, the system uses the scenegraph renderer to create a scenegraph from a graphics model associated with the web application and generate a set of rendering commands from the scenegraph. The system then writes the rendering commands to a command buffer and reads the rendering commands from the command buffer. Finally, the system uses the rendering commands to render, for the web application, an image corresponding to the graphics model by executing the rendering commands using a graphics-processing unit (GPU).
    Type: Grant
    Filed: July 2, 2009
    Date of Patent: August 5, 2014
    Assignee: Google Inc.
    Inventors: Antoine Labour, Matthew Papakipos
  • Patent number: 8797334
    Abstract: The disclosed embodiments provide a system that facilitates seamlessly switching between graphics-processing units (GPUs) to drive a display. In one embodiment, the system receives a request to switch from using a first GPU to using a second GPU to drive the display. In response to this request, the system uses a kernel thread which operates in the background to configure the second GPU to prepare the second GPU to drive the display. While the kernel thread is configuring the second GPU, the system continues to drive the display with the first GPU and a user thread continues to execute a window manager which performs operations associated with servicing user requests. When configuration of the second GPU is complete, the system switches the signal source for the display from the first GPU to the second GPU.
    Type: Grant
    Filed: January 6, 2010
    Date of Patent: August 5, 2014
    Assignee: Apple Inc.
    Inventors: Thomas W. Costa, Simon M. Douglas, David J. Redman
  • Patent number: 8792108
    Abstract: An image processing apparatus includes an image-processing designating unit that allows a user to designate predetermined image processing to be applied to image data for generating a preview image that represents a state of an output image before image output; a preview-image generating unit that generates a preview image in accordance with the designated image processing; a preview-image display unit that displays the preview image generated by the preview-image generating unit; and a display-mode switching control unit that, when the preview image is displayed, switches to a display mode with an enhanced viewability relative to a power-saving display state in accordance with a content of the designated image processing.
    Type: Grant
    Filed: July 29, 2011
    Date of Patent: July 29, 2014
    Assignee: Ricoh Company, Limited
    Inventor: Tomoyuki Yoshida
  • Patent number: 8786616
    Abstract: Parallel processing for distance transforms is described. In an embodiment a raster scan algorithm is used to compute a distance transform such that each image element of a distance image is assigned a distance value. This distance value is a shortest distance from the image element to the seed region. In an embodiment two threads execute in parallel with a first thread carrying out a forward raster scan over the distance image and a second thread carrying out a backward raster scan over the image. In an example, a thread pauses when a cross-over condition is met until the other thread meets the condition after which both threads continue. In embodiments distances may be computed in Euclidean space or along geodesics defined on a surface. In an example, four threads execute two passes in parallel with each thread carrying out a raster scan over a different quarter of the image.
    Type: Grant
    Filed: December 11, 2009
    Date of Patent: July 22, 2014
    Assignee: Microsoft Corporation
    Inventors: Toby Sharp, Antonio Criminisi
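A sequential Python sketch of the two raster-scan passes that patent 8786616 runs on separate threads. For simplicity it uses the city-block (4-neighbour) metric, for which two passes give exact distances. The Euclidean and geodesic variants and the cross-over pause between threads are not shown.

```python
from typing import List

INF = float("inf")

def distance_transform(seed: List[List[bool]]) -> List[List[float]]:
    """Two-pass raster-scan distance transform using the city-block metric.

    Each element receives its shortest distance to the seed region. The
    forward pass propagates distances from the neighbours above and to the
    left; the backward pass from the neighbours below and to the right.
    """
    h, w = len(seed), len(seed[0])
    d = [[0.0 if seed[y][x] else INF for x in range(w)] for y in range(h)]
    # Forward raster scan: top-to-bottom, left-to-right.
    for y in range(h):
        for x in range(w):
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + 1)
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + 1)
    # Backward raster scan: bottom-to-top, right-to-left.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + 1)
            if x < w - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + 1)
    return d
```

Since the forward scan starts at the top of the image and the backward scan at the bottom, two concurrent threads work on opposite ends until they meet, which is presumably where the cross-over condition in the abstract applies.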
  • Patent number: 8786614
    Abstract: In a single-instruction-multiple-data (SIMD) processor having multiple lanes, and local memory dedicated to each lane, a method of processing an image is disclosed. The method comprises mapping consecutive rasters of the image to consecutive lanes such that groups of consecutive rasters form image strips, and vertical stacks of strips comprise strip columns. Local memory allocates memory to the image strips. A sequence of functions is processed for execution on the SIMD processor in a pipeline implementation, such that the pipeline loops over portions of the image in multiple iterations, and intermediate data processed during the functions is stored in the local memory. Data associated with the image is traversed by first processing image strips from top to bottom in a left-most strip column, then progressing to each adjacent unprocessed strip column.
    Type: Grant
    Filed: May 2, 2013
    Date of Patent: July 22, 2014
    Assignee: Calos Fund Limited Liability Company
    Inventors: Donald James Curry, Ujval J. Kapasi
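A Python sketch of the visit order described for patent 8786614. It assumes each strip is exactly `lanes` rasters tall (one raster per lane) and that strip columns have a fixed width `strip_width`, a hypothetical parameter. It only enumerates the traversal order; local-memory allocation and the function pipeline are not modeled.

```python
from typing import List, Tuple

def strip_order(height: int, width: int, lanes: int,
                strip_width: int) -> List[Tuple[int, int, List[int]]]:
    """Visit order for image strips on a SIMD processor with `lanes` lanes.

    Consecutive rasters (rows) map to consecutive lanes, so each strip spans
    `lanes` rows; the image is also cut into strip columns `strip_width`
    pixels wide. Strips are visited top to bottom in the left-most strip
    column, then in each next column to the right.
    Returns (column_x0, strip_index, rows_in_strip) tuples.
    """
    order = []
    for x0 in range(0, width, strip_width):              # left-most column first
        for s, y0 in enumerate(range(0, height, lanes)):  # top to bottom
            rows = list(range(y0, min(y0 + lanes, height)))  # row r -> lane r - y0
            order.append((x0, s, rows))
    return order
```

For a 4×8 image on a 2-lane processor with 4-pixel strip columns, both strips of the left column are visited before either strip of the right column.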
  • Patent number: 8786617
    Abstract: A method of carrying out random number generation processing uses a GPU including a plurality of blocks each including at least one core, the random number generation processing including update processing of updating state vectors and conversion processing of converting the updated state vectors into random numbers having another distribution. The method includes carrying out, by one of the plurality of blocks, the update processing (S3), and carrying out, by the plurality of blocks, the conversion processing in parallel based on results of the update processing (S9). Therefore, it is possible to more efficiently generate a random number sequence which is the same as the one obtained through random number generation processing performed in a serial manner, by parallelizing a single random number generator in a GPU.
    Type: Grant
    Filed: March 2, 2011
    Date of Patent: July 22, 2014
    Assignee: Mizuho-DL Financial Technology Co. Ltd.
    Inventor: Tomohisa Yamakami
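The split in patent 8786617 (serial state update, parallel conversion) can be sketched in Python. Assumptions: xorshift32 stands in for the generator's state update, and an inverse-CDF map to the exponential distribution stands in for the conversion step; the abstract names neither. Because conversion is a pure per-element function and `ThreadPoolExecutor.map` returns results in input order, the parallel output matches the serial output.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List

def xorshift32_states(seed: int, count: int) -> List[int]:
    """Update step: advance one xorshift32 state serially (seed must be nonzero)."""
    states, x = [], seed & 0xFFFFFFFF
    for _ in range(count):
        x ^= (x << 13) & 0xFFFFFFFF
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        states.append(x)
    return states

def to_exponential(state: int) -> float:
    """Conversion step: map a 32-bit state to an Exp(1) sample via the inverse CDF."""
    u = state / 2**32                 # uniform in (0, 1) for nonzero states
    return -math.log(1.0 - u)

def generate(seed: int, count: int, workers: int = 4) -> List[float]:
    states = xorshift32_states(seed, count)                # serial update
    with ThreadPoolExecutor(max_workers=workers) as pool:  # parallel conversion
        return list(pool.map(to_exponential, states))
```

Python threads here illustrate the structure rather than deliver a speedup; on a GPU the conversion would run across many blocks.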
  • Publication number: 20140192065
    Abstract: System and method for a parallel image processing mechanism for applying mask data patterns to substrate in a lithography manufacturing process are disclosed. In one embodiment, the parallel image processing system includes a graphics engine configured to partition an object into a plurality of trapezoids and form an edge list for representing each of the plurality of trapezoids, and a distributor configured to receive the edge list from the graphics engine and distribute the edge list to a plurality of scan line image processing units. The system further includes a sentinel configured to synchronize operations of the plurality of scan line image processing units, and a plurality of buffers configured to store image data from corresponding scan line image processing units and output the stored image data using the sentinel.
    Type: Application
    Filed: March 10, 2014
    Publication date: July 10, 2014
    Applicant: PINEBROOK IMAGING, INC.
    Inventors: BARRY KEANE, THOMAS LAIDIG
  • Patent number: 8773469
    Abstract: A video multiviewer system may include a plurality of video scalers operating in parallel for generating initially scaled video streams by performing video scaling in at least one dimension on a plurality of video input streams. The video multiviewer system may also include at least one video cross-point switcher coupled downstream from the video scalers, and a processing unit coupled downstream from the video cross-point switcher for generating additionally scaled video streams by performing additional video scaling on the initially scaled video stream. The video scalers and the processing unit may communicate through the video cross-point switcher using a serial digital interface.
    Type: Grant
    Filed: April 9, 2008
    Date of Patent: July 8, 2014
    Assignee: Imagine Communications Corp.
    Inventors: Marcin Andrzej Komorowski, Cristian Camer, Anthony Singh
  • Patent number: 8773449
    Abstract: A circuit arrangement and program product render stereoscopic images in a multithreaded rendering software pipeline using first and second rendering channels respectively configured to render left and right views for the stereoscopic image. Separate transformations are applied to received vertex data to generate transformed vertex data for use by each of the first and second rendering channels in rendering the left and right views for the stereoscopic image.
    Type: Grant
    Filed: September 14, 2009
    Date of Patent: July 8, 2014
    Assignee: International Business Machines Corporation
    Inventors: Russell Dean Hoover, Eric Oliver Mejdrich, Paul Emery Schardt, Robert Allen Shearer
  • Patent number: 8773446
    Abstract: What is disclosed is a novel system and method for parallel processing of intra-image data in a distributed computing environment. A generic architecture and method are presented which collectively facilitate image segmentation and block sorting and merging operations with a certain level of synchronization in a parallel image processing environment which has been traditionally difficult to parallelize. The present system and method enables pixel-level processing at higher speeds thus making it a viable service for a print/copy job document reproduction environment. The teachings hereof have been simulated on a cloud-based computing environment with a demonstrable increase of ~2× with nominal 8-way parallelism, and an increase of ~20×-100× on a graphics processor. In addition to production and office scenarios where intra-image processing is likely to be performed, these teachings are applicable to other domains where high-speed video and audio processing is desirable.
    Type: Grant
    Filed: February 9, 2011
    Date of Patent: July 8, 2014
    Assignee: Xerox Corporation
    Inventors: Shanmuga-Nathan Gnanasambandam, Lalit Keshav Mestha
  • Publication number: 20140184616
    Abstract: A system, process, and computer program product are provided for identifying a faulty processing unit. A shader program that configures a plurality of processing units to generate data is executed and the data is compared with verification data to produce a test result. The test result is examined to identify a faulty processing unit of the plurality of processing units, where a unique identifier corresponding to each processing unit is encoded into the data generated by the respective processing unit.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Apoorv Gupta, David William Crowe, Carl William Davies
  • Patent number: 8766989
    Abstract: The present invention provides a method and system for coordinating graphics processing units in a single computing system. A method is disclosed which allows for the construction of a list of shared display modes that may be employed by both of the graphics processing units to render an output in a display device. By creating the list of shared commonly supportable display modes, the output displayed in the display device may advantageously provide a consistent graphical experience persisting through the use of alternate graphics processing units in the system. One method builds a list of shared display modes by compiling a list from a GPU specific base mode list and dynamic display modes acquired from an attached display device. Another method provides the ability to generate graphical output configurations according to a user-selected display mode that persists when alternate graphics processing units in the system are used to generate graphical output.
    Type: Grant
    Filed: July 29, 2009
    Date of Patent: July 1, 2014
    Assignee: Nvidia Corporation
    Inventors: David Wyatt, Linda Glanville
  • Publication number: 20140176576
    Abstract: The invention provides a computer server with a graphics processor that can process data from multiple medical imaging systems simultaneously. Data sets can be provided by any suitable imaging system (x-ray, angiography, PET scans, MRI, IVUS, OCT, cath labs, etc.) and a processing system of the invention allocates resources in the form of a virtual machine, processing power, operating system, applications, etc., as needed. Embodiments of the invention may find particular application with cath labs due to the particular processing requirements of typical cath lab systems.
    Type: Application
    Filed: December 16, 2013
    Publication date: June 26, 2014
    Applicant: VOLCANO CORPORATION
    Inventor: Jason Spencer
  • Publication number: 20140176575
    Abstract: A system, method, and computer program product are provided for tiled deferred shading. In operation, a plurality of photons associated with at least one scene are identified. Further, a plurality of screen-space tiles associated with the at least one scene are identified. Additionally, each of the plurality of screen-space tiles capable of being affected by a projection of an effect sphere for each of the plurality of photons are identified. Furthermore, at least a subset of photons associated with each of the screen-space tiles from which to compute shading are selected. Moreover, shading for the at least one scene is computed utilizing the selected at least a subset of photons.
    Type: Application
    Filed: August 30, 2013
    Publication date: June 26, 2014
    Applicant: NVIDIA Corporation
    Inventors: Morgan McGuire, Michael Thomas Mara, David Patrick Luebke, Jacopo Pantaleoni
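A Python sketch of the tile-identification step for publication 20140176575. It assumes each photon's effect sphere has already been projected to a screen-space circle (centre and radius in pixels) and bounds that circle conservatively by its square. The photon-subset selection and shading steps are omitted, and `bin_photons` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def bin_photons(photons: List[Tuple[float, float, float]], tile: int,
                width: int, height: int) -> Dict[Tuple[int, int], List[int]]:
    """Map each screen-space tile to the photons whose effect sphere may touch it.

    photons: (cx, cy, r) = projected sphere centre and radius in pixels.
    The circle is bounded by its enclosing square, so a tile can be listed
    for a photon whose circle only reaches the square's corner (conservative).
    """
    tiles: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    max_tx, max_ty = (width - 1) // tile, (height - 1) // tile
    for i, (cx, cy, r) in enumerate(photons):
        # Clamp the bounding square's tile range to the screen.
        tx0 = max(0, int((cx - r) // tile))
        tx1 = min(max_tx, int((cx + r) // tile))
        ty0 = max(0, int((cy - r) // tile))
        ty1 = min(max_ty, int((cy + r) // tile))
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                tiles[(tx, ty)].append(i)
    return dict(tiles)
```

Shading for a tile then only needs to consider the photons listed for that tile.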
  • Publication number: 20140176574
    Abstract: A novel method and system for distributed-database ray tracing is presented, based on modular mapping of scene data among processors. Its inherent properties include scattering data among processors for improved load balancing, and matching geographical proximity in the scene with communication proximity between processors. High utilization is enabled by a unique cache-sharing mechanism. The resulting performance improvement enables a deep level of ray tracing for real-time applications.
    Type: Application
    Filed: December 26, 2012
    Publication date: June 26, 2014
    Inventor: Reuven Bakalash
  • Patent number: 8760455
    Abstract: One embodiment of the present invention sets forth a technique for reducing overhead associated with transmitting primitive draw commands from memory to a graphics processing unit (GPU). Command pairs comprising an end draw command and a begin draw command associated with a conventional graphics application programming interface (API) are selectively replaced with a new construct. The new construct is a reset topology index, which implements a combined function of the end draw command and begin draw command. The new construct improves efficiency by reducing total data transmitted from memory to the GPU.
    Type: Grant
    Filed: October 4, 2010
    Date of Patent: June 24, 2014
    Assignee: NVIDIA Corporation
    Inventors: Jerome F. Duluk, Jr., Thomas Roell, James C. Bowman
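A Python sketch of the command-stream rewrite described for patent 8760455, assuming commands are tuples whose first element is a hypothetical opcode (`BEGIN`, `END`, `RESET_TOPOLOGY`). The abstract describes the construct as a reset topology index, so the real hardware encoding likely differs; this only shows how folding END/BEGIN pairs shortens the stream sent to the GPU.

```python
from typing import List

Command = tuple  # (opcode, *operands); the opcodes used here are hypothetical

def fold_draw_pairs(commands: List[Command]) -> List[Command]:
    """Replace each adjacent END, BEGIN pair with a single RESET_TOPOLOGY command."""
    out: List[Command] = []
    i = 0
    while i < len(commands):
        if (i + 1 < len(commands) and commands[i][0] == "END"
                and commands[i + 1][0] == "BEGIN"):
            # Combined end+begin; carry over the new BEGIN's operands.
            out.append(("RESET_TOPOLOGY",) + tuple(commands[i + 1][1:]))
            i += 2
        else:
            out.append(commands[i])
            i += 1
    return out
```

Each folded pair saves one command, so streams with many short draws shrink the most.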
  • Publication number: 20140168230
    Abstract: An asynchronous computing and rendering system includes a data storage unit that provides storage for processing a large-scale data set organized in accordance with data subregions and a computing cluster containing a parallel plurality of asynchronous computing machines that provide compute results based on the data subregions. The asynchronous computing and rendering system also includes a rendering cluster containing a parallel multiplicity of asynchronous rendering machines coupled to the asynchronous computing machines, wherein each rendering machine renders a subset of the data subregions. Additionally, the asynchronous computing and rendering system includes a data interpretation platform coupled to the asynchronous rendering machines that provides user interaction and rendered viewing capabilities for the large-scale data set. An asynchronous computing and rendering method is also provided.
    Type: Application
    Filed: December 19, 2012
    Publication date: June 19, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Marc Nienhaus, Joerg Mensmann, Hitoshi Yamauchi
  • Publication number: 20140168228
    Abstract: Techniques are disclosed for tracing a ray within a parallel processing unit. A first thread receives a ray or a ray segment for tracing and identifies a first node within an acceleration structure associated with the ray, where the first node is associated with a volume of space traversed by the ray. The thread identifies the child nodes of the first node, where each child node is associated with a different sub-volume of space, and each sub-volume is associated with a corresponding ray segment. The thread determines that two or more nodes are associated with sub-volumes of space that intersect the ray segment. The thread selects one of these nodes for processing by the first thread and another for processing by a second thread. One advantage of the disclosed technique is that the threads in a thread group perform ray tracing more efficiently in that idle time is reduced.
    Type: Application
    Filed: December 13, 2012
    Publication date: June 19, 2014
    Applicant: NVIDIA Corporation
    Inventors: David LUEBKE, Timo AILA, Jacopo PANTALEONI, David TARJAN
  • Publication number: 20140168229
    Abstract: Embodiments described herein relate to improving throughput of a CPU and a GPU working in conjunction to render graphics. Time frames for executing CPU and GPU work units are synchronized with a refresh rate of a display. Pending CPU work is performed when a time frame starts (a vsync occurs). When a prior GPU work unit is still executing on the GPU, a parallel mode is entered. In the parallel mode, some GPU work and some CPU work are performed concurrently. The parallel mode may be exited, for example, when there is no CPU work to perform.
    Type: Application
    Filed: December 14, 2012
    Publication date: June 19, 2014
    Applicant: Microsoft
    Inventor: Microsoft
  • Patent number: 8754893
    Abstract: A method and apparatus employing selectable hardware accelerators in a data driven architecture are described. In one embodiment, the apparatus includes a plurality of processing elements (PEs). A plurality of hardware accelerators are coupled to a selection unit. A register is coupled to the selection unit and the plurality of processing elements. In one embodiment, the register includes a plurality of general purpose registers (GPR), which are accessible by the plurality of processing elements, as well as the plurality of hardware accelerators. In one embodiment, at least one of the GPRs includes a bit to enable a processing element to access a selected hardware accelerator via the selection unit.
    Type: Grant
    Filed: October 19, 2011
    Date of Patent: June 17, 2014
    Assignee: Intel Corporation
    Inventors: Louis A. Lippincott, Patrick F. Johnson
  • Patent number: 8754897
    Abstract: A silicon chip of a monolithic construction for use in implementing a multiple core graphics processing and display subsystem in a computing system having a CPU, a system memory, an operating system (OS), a CPU bus, and a display device with a display surface. The computing system supports (i) one or more software applications for issuing graphics commands, and (ii) one or more graphics libraries for storing data used to implement said graphics commands. The silicon chip comprises multiple graphics pipeline cores, a partial frame buffer for buffering pixels corresponding to image fragments, a routing center, a control unit, and a display interface for displaying composited images on the display surface of the computing system.
    Type: Grant
    Filed: November 15, 2010
    Date of Patent: June 17, 2014
    Assignee: Lucidlogix Software Solutions, Ltd.
    Inventors: Reuven Bakalash, Offir Remez, Efi Fogel
  • Patent number: 8754894
    Abstract: A multi-user computer network, in which graphics performance of client machines running graphics-based applications is optimized using an automated Internet-based graphics application profile management system. The automated Internet-based graphics application profile management system includes an Internet-based communication server, operably connected to the infrastructure of the Internet, and to a central database server, through an application server. The central database server stores graphic application profiles (GAPs) for different graphics-based applications that are capable of running on the client machines. The graphics application profiles are stored in a profile database in the multi-GPU graphics rendering subsystem of each client machine. The Internet-based communication server communicates with each client machine over the Internet, and automatically programs updated graphics application profiles (GAPs) in the profile database of each client machine.
    Type: Grant
    Filed: November 8, 2010
    Date of Patent: June 17, 2014
    Assignee: Lucidlogix Software Solutions, Ltd.
    Inventors: Reuven Bakalash, Yaniv Leviathan
  • Publication number: 20140160135
    Abstract: A processing architecture uses stationary operands and opcodes common to a plurality of processors. Only data moves through the processors. The same opcode and operand are used by each processor assigned to operate on, for example, one row of pixels, one row of numbers, or one row of points in space.
    Type: Application
    Filed: December 28, 2011
    Publication date: June 12, 2014
    Inventor: Scott A. Krig
  • Patent number: 8749563
    Abstract: A graphics processing system comprises at least one memory device storing a plurality of pixel command threads and a plurality of vertex command threads. An arbiter coupled to the at least one memory device is provided that selects a pixel command thread from the plurality of pixel command threads and a vertex command thread from the plurality of vertex command threads. The arbiter then selects one command thread from the previously selected pixel command thread and vertex command thread and provides it to a command processing engine capable of processing both pixel command threads and vertex command threads.
    Type: Grant
    Filed: March 18, 2013
    Date of Patent: June 10, 2014
    Assignee: ATI Technologies ULC
    Inventors: Laurent Lefebvre, Andrew Gruber, Stephen Morein
  • Patent number: 8736617
    Abstract: A method of displaying graphics data is described. The method involves accessing the graphics data in a memory subsystem associated with a first graphics subsystem. The graphics data is transmitted to a second graphics subsystem and displayed on a monitor coupled to the second graphics subsystem.
    Type: Grant
    Filed: August 4, 2008
    Date of Patent: May 27, 2014
    Assignee: Nvidia Corporation
    Inventors: Stephen Lew, Bruce R. Intihar, Abraham B. de Waal, David G. Reed, Tony Tamasi, David Wyatt, Franck R. Diard, Brad Simeral
  • Patent number: 8730249
    Abstract: A parallel array architecture for a graphics processor includes a multithreaded core array including a plurality of processing clusters, each processing cluster including at least one processing core operable to execute a pixel shader program that generates pixel data from coverage data; a rasterizer configured to generate coverage data for each of a plurality of pixels; and pixel distribution logic configured to deliver the coverage data from the rasterizer to one of the processing clusters in the multithreaded core array. A crossbar coupled to each of the processing clusters is configured to deliver pixel data from the processing clusters to a frame buffer having a plurality of partitions.
    Type: Grant
    Filed: October 7, 2011
    Date of Patent: May 20, 2014
    Assignee: NVIDIA Corporation
    Inventors: John M. Danskin, John S. Montrym, John Erik Lindholm, Steven E. Molnar, Mark French
  • Publication number: 20140132612
    Abstract: Techniques for selecting a boot display device in a multi-GPU computing device include a graphics initialization routine for determining a topology of a plurality of GPUs. It is then determined whether a display is coupled to any of the plurality of GPUs, and this determination is communicated to the other GPUs based upon the determined topology. Thereafter, the selection of a primary boot device by a system initialization routine is influenced as follows: if one or more displays are coupled to GPUs, each GPU coupled to a given display is represented as the primary boot device and each GPU not coupled to a display is represented as a graphics device; if no display is coupled to any of the GPUs, a given GPU is represented as the primary boot device and all other GPUs are represented as graphics devices.
    Type: Application
    Filed: April 19, 2013
    Publication date: May 15, 2014
    Applicant: NVIDIA Corporation
    Inventor: NVIDIA Corporation
  • Publication number: 20140125683
    Abstract: A system and method for communication in a parallel computing system is applied to a system having multiple processing units, each processing unit including one or more processors, memory, and a network interface adapted to support virtual connections. The memory holds at least a portion of a parallel processing application program and a parallel processing operating system. The system has a network fabric between the processing units. The method involves identifying a need for a first processing unit to communicate with a group of processing units, creating virtual connections between the processing units, and transferring data between the first processing unit and the group.
    Type: Application
    Filed: January 14, 2014
    Publication date: May 8, 2014
    Applicant: Massively Parallel Technologies, Inc.
    Inventor: Kevin D. Howard
  • Publication number: 20140125682
    Abstract: A hub mechanism for use in a multiple graphics processing unit (GPU) system includes a hub routing unit positioned on a bus between a controller unit and multiple GPUs. The hub mechanism is used for routing data and commands over a graphics pipeline between a user interface and one or more display units. The hub mechanism also includes a hub driver for issuing commands that control the hub routing unit.
    Type: Application
    Filed: January 13, 2014
    Publication date: May 8, 2014
    Applicant: Lucidlogix Software Solutions, Ltd.
    Inventors: Reuven BAKALASH, Offir REMEZ, Gigy BAR-OR, Efi FOGEL, Amir SHAHAM
  • Publication number: 20140125681
    Abstract: A method, non-transitory computer readable medium, and apparatus for enabling parallel processing of pixels in an image are disclosed. For example, the method performs, via a multiple core processor, a one-dimensional error diffusion on the pixels in the image to reduce a number of bits per pixel to a value lower than an initial number of bits per pixel and greater than one, and performs a two-dimensional error diffusion on the pixels in the image that have undergone the one-dimensional error diffusion, to reduce the number of bits per pixel to one bit per pixel.
    Type: Application
    Filed: November 6, 2012
    Publication date: May 8, 2014
    Applicant: Xerox Corporation
    Inventor: Xing Li
  • Patent number: 8717370
    Abstract: A method and system for automatically analyzing graphics processing unit ("GPU") test results are disclosed. Specifically, one embodiment of the present invention sets forth a method that includes the steps of identifying the GPU test results associated with a first register type, creating a template document associated with the same first register type, wherein the template document is pre-configured to store and operate on the GPU test results of the first register type, filling the template document with the GPU test results, aggregating the GPU test results associated with the first register type to establish a common output, and determining a suitable register value from a passing range of register values based on the common output without human intervention.
    Type: Grant
    Filed: November 30, 2007
    Date of Patent: May 6, 2014
    Assignee: Nvidia Corporation
    Inventor: James Chen
  • Publication number: 20140118362
    Abstract: One embodiment of the present invention includes a graphics subsystem. The graphics subsystem includes a first processing entity and a second processing entity. Both processing entities are configured to receive first and second batches of primitives, with a barrier command between the two batches. The barrier command may be either a tiled or a non-tiled barrier command. A tiled barrier command is transmitted through the graphics subsystem for each cache tile. A non-tiled barrier command is transmitted through the graphics subsystem only once. The barrier command causes work following it to stop at a barrier point until a release signal is received. A back-end unit transmits the release signal to both processing entities after the first batch of primitives has been processed by both the first processing entity and the second processing entity.
    Type: Application
    Filed: July 3, 2013
    Publication date: May 1, 2014
    Inventors: Ziyad S. HAKURA, Dale L. KIRKLAND
  • Publication number: 20140118363
    Abstract: A method for managing bind-render-target commands in a tile-based architecture. The method includes receiving a requested set of bound render targets and a draw command. The method also includes, upon receiving the draw command, determining whether a current set of bound render targets includes each of the render targets identified in the requested set. The method further includes, if the current set does not include each render target identified in the requested set, then issuing a flush-tiling-unit-command to a parallel processing subsystem, modifying the current set to include each render target identified in the requested set, and issuing bind-render-target commands identifying the requested set to the tile-based architecture for processing. The method further includes, if the current set of render targets includes each render target identified in the requested set, then not issuing the flush-tiling-unit-command.
    Type: Application
    Filed: October 1, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Ziyad S. HAKURA, Jeffrey A. BOLZ, Amanpreet GREWAL, Matthew JOHNSON, Andrei KHODAKOVSKY
  • Publication number: 20140118364
    Abstract: One embodiment of the present invention sets forth a graphics subsystem configured to implement distributed cache tiling. The graphics subsystem includes one or more world-space pipelines, one or more screen-space pipelines, one or more tiling units, and a crossbar unit. Each world-space pipeline is implemented in a different processing entity and is coupled to a different tiling unit. Each screen-space pipeline is implemented in a different processing entity and is coupled to the crossbar unit. The tiling units are configured to receive primitives from the world-space pipelines, generate cache tile batches based on the primitives, and transmit the primitives to the screen-space pipelines. One advantage of the disclosed approach is that primitives are processed in application-programming-interface order in a highly parallel tiling architecture. Another advantage is that primitives are processed in cache tile order, which reduces memory bandwidth consumption and improves cache memory utilization.
    Type: Application
    Filed: October 18, 2013
    Publication date: May 1, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Ziyad S. HAKURA, Cynthia Ann Edgeworth ALLISON, Dale L. KIRKLAND, Walter R. STEINER
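The NVIDIA application on tracing a ray within a parallel processing unit (filed December 13, 2012) describes a thread that expands a node, clips the ray to each child's sub-volume, and splits the intersected children between itself and a second thread. The Python sketch below illustrates that step under stated assumptions: sub-volumes are axis-aligned boxes clipped with a standard slab test, the current thread keeps the nearest intersected child, and the remaining hits go to a shared list that other threads in the group could draw from. `Node`, `clip_segment`, and `expand_node` are hypothetical names, not taken from the application.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Segment = Tuple[float, float]  # (t_enter, t_exit) along the ray


@dataclass
class Node:
    lo: Vec3                      # min corner of this node's axis-aligned sub-volume
    hi: Vec3                      # max corner
    children: List["Node"] = field(default_factory=list)
    name: str = ""


def clip_segment(origin: Vec3, direction: Vec3, seg: Segment,
                 lo: Vec3, hi: Vec3) -> Optional[Segment]:
    """Slab test: return the part of the ray segment inside the box, or None."""
    t0, t1 = seg
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < a or o > b:    # parallel to this slab and outside it
                return None
            continue
        ta, tb = (a - o) / d, (b - o) / d
        if ta > tb:
            ta, tb = tb, ta
        t0, t1 = max(t0, ta), min(t1, tb)
        if t0 > t1:
            return None
    return (t0, t1)


def expand_node(node: Node, origin: Vec3, direction: Vec3, seg: Segment,
                shared_queue: list) -> Optional[Tuple[Node, Segment]]:
    """Keep the nearest intersected child for this thread; hand the rest off."""
    hits = []
    for child in node.children:
        sub = clip_segment(origin, direction, seg, child.lo, child.hi)
        if sub is not None:
            hits.append((child, sub))   # each child gets its own ray segment
    if not hits:
        return None
    hits.sort(key=lambda h: h[1][0])    # order by entry distance along the ray
    shared_queue.extend(hits[1:])       # other threads in the group may pick these up
    return hits[0]
```

Keeping the nearest child for the current thread is one plausible policy; the abstract states only that one intersected node stays with the first thread and another goes to a second thread.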
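Patent 8749563 (ATI Technologies) describes a two-level arbiter: it first picks one pixel command thread and one vertex command thread, then picks one of those two for a shared command processing engine. The sketch below shows that two-stage structure in Python. The selection policies are assumptions, since the abstract does not state them: oldest-first within each thread type, and alternation between types in the second stage. `TwoLevelArbiter` is a hypothetical name.

```python
from collections import deque
from typing import Iterable, Optional


class TwoLevelArbiter:
    """Stage 1 picks one candidate per thread type; stage 2 picks between them."""

    def __init__(self, pixel_threads: Iterable[str], vertex_threads: Iterable[str]) -> None:
        self.pixel = deque(pixel_threads)    # pending pixel command threads, oldest first
        self.vertex = deque(vertex_threads)  # pending vertex command threads, oldest first
        self.prefer_pixel = True             # assumed round-robin state for stage 2

    def select(self) -> Optional[str]:
        # Stage 1: oldest pending thread of each type (assumed policy).
        pixel_candidate = self.pixel[0] if self.pixel else None
        vertex_candidate = self.vertex[0] if self.vertex else None
        if pixel_candidate is None and vertex_candidate is None:
            return None
        # Stage 2: alternate between the two candidates when both exist.
        if vertex_candidate is None or (pixel_candidate is not None and self.prefer_pixel):
            chosen = self.pixel.popleft()
        else:
            chosen = self.vertex.popleft()
        self.prefer_pixel = not self.prefer_pixel
        return chosen
```

For example, `TwoLevelArbiter(["p0", "p1"], ["v0"])` yields `p0`, `v0`, `p1`, then `None`; each returned thread would go to the unified command processing engine.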
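The boot-device rule in NVIDIA publication 20140132612 reduces to a small decision over which GPUs have a display attached. The Python sketch below encodes only that final role assignment; topology discovery and inter-GPU communication are not modeled. The role strings, the function name `assign_boot_roles`, and the `fallback_gpu` parameter (standing in for "a given GPU") are hypothetical.

```python
from typing import List


def assign_boot_roles(display_attached: List[bool], fallback_gpu: int = 0) -> List[str]:
    """Return a role per GPU index: 'primary_boot' or 'graphics'."""
    if any(display_attached):
        # GPUs driving a display are presented as the primary boot device;
        # GPUs without a display are presented as plain graphics devices.
        return ["primary_boot" if attached else "graphics" for attached in display_attached]
    # No display on any GPU: one chosen GPU becomes the primary boot device.
    return ["primary_boot" if i == fallback_gpu else "graphics"
            for i in range(len(display_attached))]
```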
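Xerox publication 20140125681 splits halftoning into a row-wise one-dimensional error diffusion to an intermediate bit depth, which parallelizes across cores because rows are independent, followed by a two-dimensional error diffusion down to one bit per pixel. The Python sketch below is one possible instance under stated assumptions: 8-bit input, four intermediate levels (2 bits per pixel), and Floyd-Steinberg weights for the second stage. The application does not specify these choices, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import List

Row = List[float]


def quantize(value: float, levels: int, max_value: float = 255.0) -> float:
    """Snap value to the nearest of `levels` evenly spaced levels in [0, max_value]."""
    step = max_value / (levels - 1)
    q = min(max(round(value / step), 0), levels - 1)
    return q * step


def error_diffuse_1d(row: Row, levels: int) -> Row:
    """Stage 1: carry quantization error to the right neighbour only.

    Rows do not depend on each other, so each row can go to a different core.
    """
    out, err = [], 0.0
    for v in row:
        target = v + err
        q = quantize(target, levels)
        out.append(q)
        err = target - q
    return out


def error_diffuse_2d(image: List[Row]) -> List[List[int]]:
    """Stage 2: Floyd-Steinberg diffusion down to 1 bit per pixel."""
    h, w = len(image), len(image[0])
    buf = [list(row) for row in image]
    bits = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 255.0 if old >= 127.5 else 0.0
            bits[y][x] = 1 if new else 0
            err = old - new
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return bits


def two_stage_halftone(image: List[Row], intermediate_levels: int = 4,
                       workers: int = 4) -> List[List[int]]:
    # Threads keep the sketch simple; CPython's GIL means a process pool or
    # native code would be needed for a real multi-core speed-up.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        stage1 = list(pool.map(partial(error_diffuse_1d, levels=intermediate_levels), image))
    return error_diffuse_2d(stage1)
```

The design point is that the serial, row-to-row dependency of two-dimensional diffusion only has to operate on an already reduced image, while the expensive first reduction runs row-parallel.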
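Patent 8717370 (Nvidia) groups GPU test results by register type, aggregates them into a common output, and picks a register value from the passing range. The Python sketch below illustrates that flow under loudly stated assumptions: each result is a `(register_type, value, passed)` tuple, a value counts as passing only if every run at that value passed, and the chosen value is the middle of the passing values. The template document is not modeled, and `choose_register_values` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (register_type, register_value, passed) for one test run; this layout is assumed.
Result = Tuple[str, int, bool]


def choose_register_values(results: List[Result]) -> Dict[str, Optional[int]]:
    by_type: Dict[str, Dict[int, List[bool]]] = defaultdict(lambda: defaultdict(list))
    for reg_type, value, passed in results:      # group results per register type
        by_type[reg_type][value].append(passed)
    chosen: Dict[str, Optional[int]] = {}
    for reg_type, outcomes in by_type.items():
        # Common output: a value passes only if every run at that value passed.
        passing = sorted(v for v, runs in outcomes.items() if all(runs))
        # Assumed policy: take the middle passing value to maximise margin.
        chosen[reg_type] = passing[len(passing) // 2] if passing else None
    return chosen
```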
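Publication 20140118362 describes a barrier between two batches of primitives, released by a back-end unit once both processing entities have finished the first batch. The Python sketch below models the non-tiled case with `threading.Event` as the release signal; tiled barriers (one per cache tile) are not modeled. As a simplification, each entity logs the whole of both batches rather than a share of the work. `BackEnd`, `processing_entity`, and `run` are hypothetical names.

```python
import threading
from typing import List, Tuple

LogEntry = Tuple[str, int, str]  # (entity, batch number, primitive)


class BackEnd:
    """Counts entities that finished the pre-barrier batch, then sends the release."""

    def __init__(self, n_entities: int) -> None:
        self.n_entities = n_entities
        self.finished = 0
        self.lock = threading.Lock()
        self.release = threading.Event()

    def batch_done(self) -> None:
        with self.lock:
            self.finished += 1
            if self.finished == self.n_entities:
                self.release.set()        # release signal to every processing entity


def processing_entity(name: str, batch1: List[str], batch2: List[str],
                      backend: BackEnd, log: List[LogEntry],
                      log_lock: threading.Lock) -> None:
    for prim in batch1:
        with log_lock:
            log.append((name, 1, prim))
    backend.batch_done()
    backend.release.wait()                # barrier point: later work stalls here
    for prim in batch2:
        with log_lock:
            log.append((name, 2, prim))


def run(batch1: List[str], batch2: List[str], n_entities: int = 2) -> List[LogEntry]:
    backend, log, log_lock = BackEnd(n_entities), [], threading.Lock()
    threads = [threading.Thread(target=processing_entity,
                                args=(f"entity{i}", batch1, batch2, backend, log, log_lock))
               for i in range(n_entities)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because every batch-2 entry is logged only after the release event is set, and the event is set only after both entities report, no second-batch work can appear before all first-batch work.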
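The decision in publication 20140118363 is whether a draw needs render targets that are not already bound: only then are a flush-tiling-unit command and new bind-render-target commands issued. The Python sketch below records the resulting command stream. Command names, the class name `RenderTargetBinder`, and the choice to extend (rather than replace) the current set are assumptions; the abstract says only that the current set is modified to include each requested target.

```python
from typing import Iterable, List, Set, Tuple

Command = Tuple[str, Tuple[str, ...]]


class RenderTargetBinder:
    """Issue a flush plus rebinding only when a draw needs targets not already bound."""

    def __init__(self) -> None:
        self.current: Set[str] = set()
        self.stream: List[Command] = []

    def draw(self, requested: Iterable[str]) -> None:
        wanted = set(requested)
        if not wanted <= self.current:
            # A requested target is missing: flush the tiling unit before rebinding.
            self.stream.append(("FLUSH_TILING_UNIT", ()))
            self.current |= wanted          # include every requested target
            self.stream.append(("BIND_RENDER_TARGETS", tuple(sorted(wanted))))
        # If all targets were already bound, no flush is issued.
        self.stream.append(("DRAW", tuple(sorted(wanted))))
```

Skipping redundant flushes matters in a tile-based design because each flush forces the tiling unit to drain its buffered cache-tile work.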
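Publication 20140118364 relies on binning primitives into cache tiles while keeping application-programming-interface order, then processing one cache tile at a time. The Python sketch below shows only that binning and tile-ordered traversal for a single tiling unit; the distribution across multiple world-space pipelines and the crossbar are not modeled. Bounding boxes as the coverage test, row-major tile order, and the function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BBox = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels, inclusive
TileKey = Tuple[int, int]          # (tile_x, tile_y)


def bin_into_cache_tiles(primitives: List[Tuple[str, BBox]],
                         tile: int) -> Dict[TileKey, List[str]]:
    """Assign primitives, given in API order, to every cache tile their box touches."""
    bins: Dict[TileKey, List[str]] = defaultdict(list)
    for prim_id, (x0, y0, x1, y1) in primitives:   # API order is preserved per tile
        for ty in range(y0 // tile, y1 // tile + 1):
            for tx in range(x0 // tile, x1 // tile + 1):
                bins[(tx, ty)].append(prim_id)
    return dict(bins)


def screen_space_order(bins: Dict[TileKey, List[str]]) -> List[Tuple[TileKey, str]]:
    """Visit cache tiles in row-major order; within a tile, keep API order."""
    work = []
    for ty, tx in sorted((key[1], key[0]) for key in bins):
        for prim_id in bins[(tx, ty)]:
            work.append(((tx, ty), prim_id))
    return work
```

Processing all primitives for one tile before moving on keeps that tile's pixels resident in cache, which is the bandwidth benefit the abstract cites.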