Including Coprocessor Patents (Class 712/34)
  • Patent number: 8914618
    Abstract: In one embodiment, the present invention includes a method for directly communicating between an accelerator and an instruction sequencer coupled thereto, where the accelerator is a heterogeneous resource with respect to the instruction sequencer. An interface may be used to provide the communication between these resources. Via such a communication mechanism a user-level application may directly communicate with the accelerator without operating system support. Further, the instruction sequencer and the accelerator may perform operations in parallel. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 29, 2005
    Date of Patent: December 16, 2014
    Assignee: Intel Corporation
    Inventors: Hong Wang, John Shen, Hong Jiang, Richard Hankins, Per Hammarlund, Dion Rodgers, Gautham Chinya, Baiju Patel, Shiv Kaushik, Bryant Bigbee, Gad Sheaffer, Yoav Talgam, Yuval Yosef, James P. Held
  • Patent number: 8891757
    Abstract: A cryptographic integrated circuit including a programmable main processor for executing cryptographic functions, an internal memory, and a data transmission bus to which the main processor and the internal memory are electrically connected. The cryptographic integrated circuit also includes a programmable arithmetic coprocessor that has specific hardware arithmetic units each being designed to carry out a predetermined arithmetical operation. The programmable arithmetic coprocessor is separate from the main processor and is also electrically connected to the data transmission bus.
    Type: Grant
    Filed: February 17, 2012
    Date of Patent: November 18, 2014
    Assignee: Bull SAS
    Inventor: Patrick Le Quéré
  • Patent number: 8893126
    Abstract: A heterogeneous processing element model is provided where I/O devices look and act like processors. In order to be treated like a processor, an I/O processing element, or other special purpose processing element, must follow some rules and have some characteristics of a processor, such as address translation, security, interrupt handling, and exception processing, for example. The heterogeneous processing element model puts special purpose processing elements on the same playing field as processors, from a programming perspective, operating system perspective, and power perspective. The operating system can get work to a security engine, for example, in the same way it does to a processor.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: November 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Lakshminarayana B. Arimilli, Ravi K. Arimilli, Guy L. Guthrie, Charles F. Marino, William J. Starke
  • Publication number: 20140337677
    Abstract: A mechanism is provided for merging, in a network processor, results from a parser and results from an external coprocessor providing processing support requested by said parser. The mechanism enqueues in a result queue both parser results needing to be merged with a coprocessor result and parser results which have no need to be merged with a coprocessor result. An additional queue is used to enqueue the addresses of the result queue where the parser results are stored. The result from the coprocessor is received in a simple response register. The coprocessor result is read by the result queue management logic from the response register and merged with the corresponding incomplete parser result read from the result queue at the address enqueued in the additional queue.
    Type: Application
    Filed: May 10, 2013
    Publication date: November 13, 2014
    Applicant: International Business Machines Corporation
    Inventors: Claude Basso, Jean L. Calvignac, Chih-jen Chang, Philippe Damon, Natarajan Vaidhyanathan, Fabrice J. Verplanken, Colin B. Verrilli
  • Patent number: 8880850
    Abstract: One embodiment of the present invention includes a heterogeneous, high-performance, scalable processor having at least one W-type sub-processor capable of processing W bits in parallel, W being an integer value, at least one N-type sub-processor capable of processing N bits in parallel, N being an integer value smaller than W by a factor of two. The processor further includes a shared bus coupling the at least one W-type sub-processor and at least one N-type sub-processor and shared memory coupled to the at least one W-type sub-processor and the at least one N-type sub-processor, wherein the W-type sub-processor rearranges memory to accommodate execution of applications allowing for fast operations.
    Type: Grant
    Filed: February 25, 2013
    Date of Patent: November 4, 2014
    Assignee: Icelero Inc
    Inventors: Amit Ramchandran, John Reid Hauser
  • Patent number: 8874805
    Abstract: A mechanism is provided for offloading an input/output (I/O) completion operation. Responsive to a second processor identifying that a flag has been set by a first processor requesting assistance in completing an I/O operation, the second processor copies an I/O response from a first I/O response data structure associated with the first processor to a second I/O response data structure associated with the second processor. The second processor deletes the I/O response from the first I/O response data structure, clears the flag, and processes the I/O operation by addressing the I/O response in the second I/O response data structure. Responsive to completing the I/O operation, the second processor deletes the I/O response from the second I/O response data structure.
    Type: Grant
    Filed: November 19, 2012
    Date of Patent: October 28, 2014
    Assignee: International Business Machines Corporation
    Inventors: Bruce G. Mealey, Greg R. Mewhinney, Mysore S. Srinivas, Suresh E. Warrier
  • Patent number: 8868848
    Abstract: A computer system may comprise a computer platform and input-output devices. The computer platform may include a plurality of heterogeneous processors comprising a central processing unit (CPU) and a graphics processing unit (GPU) and a shared virtual memory supported by a physical private memory space of at least one heterogeneous processor or a physical shared memory shared by the heterogeneous processors. The CPU (producer) may create shared multi-version data and store such shared multi-version data in the physical private memory space or the physical shared memory. The GPU (consumer) may acquire or access the shared multi-version data.
    Type: Grant
    Filed: December 21, 2009
    Date of Patent: October 21, 2014
    Assignee: Intel Corporation
    Inventors: Ying Gao, Hu Chen, Shoumeng Yan, Xiaocheng Zhou, Sai Luo, Bratin Saha
  • Patent number: 8843728
    Abstract: In one embodiment, the present invention includes a method for communicating an assertion signal from a first instruction sequencer to a plurality of accelerators coupled to the first instruction sequencer, detecting the assertion signal in the accelerators and communicating a request for a lock, and registering an accelerator that achieves the lock by communication of a registration message for the accelerator to the first instruction sequencer. Other embodiments are described and claimed.
    Type: Grant
    Filed: November 20, 2012
    Date of Patent: September 23, 2014
    Assignee: Intel Corporation
    Inventors: Perry Wang, Jamison Collins, Hong Wang
  • Patent number: 8843673
    Abstract: A mechanism is provided for offloading an input/output (I/O) completion operation. Responsive to a second processor identifying that a flag has been set by a first processor requesting assistance in completing an I/O operation, the second processor copies an I/O response from a first I/O response data structure associated with the first processor to a second I/O response data structure associated with the second processor. The second processor deletes the I/O response from the first I/O response data structure, clears the flag, and processes the I/O operation by addressing the I/O response in the second I/O response data structure. Responsive to completing the I/O operation, the second processor deletes the I/O response from the second I/O response data structure.
    Type: Grant
    Filed: February 20, 2013
    Date of Patent: September 23, 2014
    Assignee: International Business Machines Corporation
    Inventors: Bruce G. Mealey, Greg R. Mewhinney, Mysore S. Srinivas, Suresh E. Warrier
  • Patent number: 8839256
    Abstract: A novel and useful system and method of improving the utilization of a special purpose accelerator in a system incorporating a general purpose processor. In some embodiments, the current queue status of the special purpose accelerator is periodically monitored using a background monitoring process/thread and the current queue status is stored in a shared memory. A shim redirection layer added a priori to a library function task determines at runtime and in user space whether to execute the library function task on the special purpose accelerator or the general purpose processor. At runtime, using the shim redirection layer and based on the current queue status, it is determined whether to execute the library function task on the special purpose accelerator or on the general purpose processor.
    Type: Grant
    Filed: June 9, 2010
    Date of Patent: September 16, 2014
    Assignee: International Business Machines Corporation
    Inventors: Heather D. Achilles, Giora Biran, Amit Golander, Nancy A. Greco
  • Publication number: 20140244975
    Abstract: A multi-core processor includes M cores. If the multi-core processor is operated under a non-multiprocessing support operating system, only a single core is configured as a central processing unit and N cores are configured as co-processors, wherein M and N are positive integers, and N is smaller than M.
    Type: Application
    Filed: April 12, 2013
    Publication date: August 28, 2014
    Applicant: RDC Semiconductor Co., Ltd.
    Inventors: Chang-Cheng Yap, Ming-Chi Shih
  • Patent number: 8803891
    Abstract: Embodiments described herein provide a method of arbitrating a processing resource. The method includes receiving a command to preempt a task and preventing additional wavefronts associated with the task from being processed. The method also includes evicting currently executing wavefronts associated with the task based upon predetermined criteria.
    Type: Grant
    Filed: November 30, 2011
    Date of Patent: August 12, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Robert Scott Hartog, Ralph Clay Taylor, Michael Mantor, Sebastien Nussbaum, Rex McCrary, Mark Leather, Philip J. Rogers, Thomas R. Woller
  • Patent number: 8780120
    Abstract: Techniques for GPU self throttling are described. In one or more embodiments, timing information for GPU frame processing is obtained using a timeline for the GPU. This may occur by inserting callbacks into the GPU processing timeline. An elapsed time for unpredictable work that is inserted into the GPU workload is determined based on the obtained timing information. A decision is then made regarding whether to “throttle” designated optional/non-critical portions of the work for a frame based on the amount of elapsed time. In one approach the elapsed time is compared to a configurable timing threshold. If the elapsed time exceeds the threshold, work is throttled by performing light or no processing for one or more optional portions of a frame. If the elapsed time is less than the threshold, heavy processing (e.g., “normal” work) is performed for the frame.
    Type: Grant
    Filed: October 19, 2011
    Date of Patent: July 15, 2014
    Assignee: Microsoft Corporation
    Inventors: Nicholas P. Sagall, Christopher J. Tector, Orest B. Zborowski
  • Patent number: 8761188
    Abstract: In the provided architecture, one or more multi-threaded processors may be combined with hardware blocks. The resulting combination allows for data packets to undergo a processing sequence having the flexibility of software programmability with the high-performance of dedicated hardware. For example, a multi-threaded processor can control the high-level tasks of a processing sequence, while the computationally intensive events (e.g., signal processing filters, matrix operations, etc.) are handled by dedicated hardware blocks.
    Type: Grant
    Filed: April 30, 2008
    Date of Patent: June 24, 2014
    Assignee: Altera Corporation
    Inventors: Anargyros Krikelis, Martin Roberts
  • Patent number: 8756402
    Abstract: A processing module, a processor circuit, an instruction set for processing data, and a method for synchronizing the processing of codes are provided. In an embodiment of the invention, a processing module is provided for processing instructions relating to user data and control data according to a communication protocol. The processing module includes a first processing circuit configured to process the instructions relating to the control data, and a second processing circuit configured to process the instructions relating to the user data.
    Type: Grant
    Filed: September 14, 2007
    Date of Patent: June 17, 2014
    Assignee: Intel Mobile Communications GmbH
    Inventors: Mario Steinert, Werner Hein, Ralf Itjeshorst
  • Patent number: 8751774
    Abstract: A system and method for controlling messaging between a first processor and a second processor is disclosed. The second processor controls one or more peripheral devices on behalf of a plurality of predetermined tasks being executed by the first processor. The system includes a message control module that receives an input message intended for the second processor from the first processor and maintains a message history based on the received input message and previously received input messages. The message history indicates which peripheral devices of the system are to be on and which tasks of the plurality of tasks requested the peripheral devices to be on. The message control module is further configured to generate an output message that includes output instructions for the second processor based on the message history and an output duration based on the message history. The second processor executes the output instructions.
    Type: Grant
    Filed: March 31, 2011
    Date of Patent: June 10, 2014
    Assignees: DENSO International America, Inc., Denso Corporation
    Inventors: Wan-ping Yang, Koji Shinoda, Hiroaki Shibata
  • Patent number: 8731071
    Abstract: A system for performing finite impulse response (FIR) filtering. The system includes an array of random access memories (RAMs) for storing at least one two-dimensional (2D) block of pixel data. The pixel data is stored such that one of each type of column or row from the 2D block of pixel data is stored per RAM. A control block provides address translation between the 2D block of pixel data and corresponding addresses in the array of RAMs. An input crossbar writes pixel data to the array of RAMs as directed by the control block. An output crossbar simultaneously reads pixel data from each of the array of RAMs and passes the data to an appropriate replicated data path, as directed by the control block. A single instruction multiple data path block includes a plurality of replicated data paths for simultaneously performing the FIR filtering, as directed by the control block.
    Type: Grant
    Filed: December 15, 2005
    Date of Patent: May 20, 2014
    Assignee: Nvidia Corporation
    Inventor: Scott A. Kimura
  • Patent number: 8698817
    Abstract: A video processor for executing video processing operations. The video processor includes a host interface for implementing communication between the video processor and a host CPU. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A scalar execution unit is coupled to the host interface and the memory interface and is configured to execute scalar video processing operations. A vector execution unit is coupled to the host interface and the memory interface and is configured to execute vector video processing operations.
    Type: Grant
    Filed: November 4, 2005
    Date of Patent: April 15, 2014
    Assignee: Nvidia Corporation
    Inventors: Shirish Gadre, Ashish Karandikar, Stephen D. Lew, Christopher T. Cheng
  • Patent number: 8667254
    Abstract: In one embodiment, a network device is disclosed. For example, in one embodiment of the present invention, the device comprises a processor and a core memory having a receive buffer and a transmit buffer. The device comprises a bus coupled to the processor and the core memory. The device comprises at least one co-processor coupled to the core memory via a direct link, wherein the at least one co-processor is capable of accessing at least one of: the receive buffer, or the transmit buffer, without assistance from the processor.
    Type: Grant
    Filed: May 15, 2008
    Date of Patent: March 4, 2014
    Assignee: Xilinx, Inc.
    Inventors: Carl F. Rohrer, Patrick J. Smith, Stacey Secatch
  • Patent number: 8610725
    Abstract: Among other things, dynamically selecting or configuring one or more hardware resources to render a particular display data includes obtaining a request for rendering display data. The request includes a specification describing a desired rendering process. Based on the specification and the display data, hardware is selected or configured. The display data is rendered using the selected or configured hardware.
    Type: Grant
    Filed: December 14, 2007
    Date of Patent: December 17, 2013
    Assignee: Apple Inc.
    Inventors: Jeremy Todd Sandmel, John Stuart Harper, Kenneth Christian Dyke
  • Patent number: 8578133
    Abstract: Direct injection of data to be transferred in a hybrid computing environment that includes a host computer and a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module. Each accelerator includes a Power Processing Element (‘PPE’) and a plurality of Synergistic Processing Elements (‘SPEs’). Direct injection includes reserving, by each SPE, a slot in a shared memory region accessible by the host computer; loading, by each SPE into local memory of the SPE, a portion of data to be transferred to the host computer; executing, by each SPE in parallel, a data processing operation on the portion of the data loaded in local memory of each SPE; and writing, by each SPE, the processed data to the SPE's reserved slot in the shared memory region accessible by the host computer.
    Type: Grant
    Filed: October 31, 2012
    Date of Patent: November 5, 2013
    Assignee: International Business Machines Corporation
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Gary R. Ricard, Brian E. Smith
  • Patent number: 8578132
    Abstract: Direct injection of data to be transferred in a hybrid computing environment that includes a host computer and a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module. Each accelerator includes a Power Processing Element (‘PPE’) and a plurality of Synergistic Processing Elements (‘SPEs’). Direct injection includes reserving, by each SPE, a slot in a shared memory region accessible by the host computer; loading, by each SPE into local memory of the SPE, a portion of data to be transferred to the host computer; executing, by each SPE in parallel, a data processing operation on the portion of the data loaded in local memory of each SPE; and writing, by each SPE, the processed data to the SPE's reserved slot in the shared memory region accessible by the host computer.
    Type: Grant
    Filed: March 29, 2010
    Date of Patent: November 5, 2013
    Assignee: International Business Machines Corporation
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Gary R. Ricard, Brian E. Smith
  • Patent number: 8564803
    Abstract: An image forming apparatus includes: plural processing units which execute plural processing functions that are different from each other; an execution-in-progress information acquiring unit which acquires execution-in-progress function information that is information about a first processing unit which is executing processing, of the plural processing units; a discrimination unit which discriminates a second processing unit that cannot execute processing when the first processing unit indicated by the execution-in-progress function information acquired by the execution-in-progress information acquiring unit is executing processing, from among the plural processing units; and an executability information generating unit which generates inexecutable function information that is information about the second processing unit, based on a result of determination by the discrimination unit.
    Type: Grant
    Filed: March 23, 2011
    Date of Patent: October 22, 2013
    Assignees: Kabushiki Kaisha Toshiba, Toshiba Tec Kabushiki Kaisha
    Inventor: Kanako Asari
  • Patent number: 8555251
    Abstract: A signal processing apparatus for performing signal processing including a plurality of steps in data units by software signal processing includes signal processing modules performing the steps, a circuit configuration information storing and managing unit storing the signal processing modules and circuit configuration information, a signal processing order determining unit determining a signal processing order by performing path routing, a signal processing executing unit executing the signal processing in the determined order, and a circuit configuration changing unit changing circuit configuration information and causing the signal processing order determining unit to re-execute path routing to determine a signal processing order for the changed circuit configuration information during a period from the end of the software signal processing in the data unit to the beginning of the subsequent data unit.
    Type: Grant
    Filed: March 21, 2006
    Date of Patent: October 8, 2013
    Assignee: Sony Corporation
    Inventor: Kosei Yamashita
  • Publication number: 20130238878
    Abstract: One embodiment of the present invention includes a heterogeneous, high-performance, scalable processor having at least one W-type sub-processor capable of processing W bits in parallel, W being an integer value, at least one N-type sub-processor capable of processing N bits in parallel, N being an integer value smaller than W by a factor of two. The processor further includes a shared bus coupling the at least one W-type sub-processor and at least one N-type sub-processor and shared memory coupled to the at least one W-type sub-processor and the at least one N-type sub-processor, wherein the W-type sub-processor rearranges memory to accommodate execution of applications allowing for fast operations.
    Type: Application
    Filed: February 25, 2013
    Publication date: September 12, 2013
    Applicant: ICELERO INC
    Inventors: Amit Ramchandran, John Reid Hauser
  • Patent number: 8489861
    Abstract: A processing architecture includes a first CPU core portion coupled to a second embedded dynamic random access memory (DRAM) portion. These architectural components jointly implement a single processor and instruction set. Advantageously, the embedded logic on the DRAM chip implements the memory intensive processing tasks, thus reducing the amount of traffic that needs to be bussed back and forth between the CPU core and the embedded DRAM chips. The embedded DRAM logic monitors and manipulates the instruction stream into the CPU core. The architecture of the instruction set, data paths, addressing, control, caching, and interfaces are developed to allow the system to operate using a standard programming model. Specialized video and graphics processing systems are developed. Also, an extended very long instruction word (VLIW) architecture implemented as a primary VLIW processor coupled to an embedded DRAM VLIW extension processor efficiently deals with memory intensive tasks.
    Type: Grant
    Filed: March 28, 2008
    Date of Patent: July 16, 2013
    Assignee: Round Rock Research, LLC
    Inventor: Eric M. Dowling
  • Patent number: 8489860
    Abstract: A wireless data platform comprises a plurality of processors. Channels of communication are set up between processors such that they may communicate information as tasks are performed. A dynamic cross compiler executed on one processor compiles code into native processing code for another processor. A dynamic cross linker links the compiled code for the other processor. Native code may also be downloaded to the platform through use of a JAVA Bean (or other language type) which encapsulates the native code. The JAVA Bean can be encrypted and digitally signed for security purposes.
    Type: Grant
    Filed: December 22, 1997
    Date of Patent: July 16, 2013
    Assignee: Texas Instruments Incorporated
    Inventors: Michael McMahon, Marion C. Lineberry, Matthew A. Woolsey, Gerard Chauvel
  • Patent number: 8473717
    Abstract: A data processing apparatus is provided, configured to carry out data processing operations on behalf of a main data processing apparatus, comprising a coprocessor core configured to perform the data processing operations and a reset controller configured to cause the coprocessor core to reset. The coprocessor core performs its data processing in dependence on current configuration data stored therein, the current configuration data being associated with a current processing session. The reset controller is configured to receive pending configuration data from the main data processing apparatus, the pending configuration data associated with a pending processing session, and to store the pending configuration data in a configuration data queue. The reset controller is configured, when the coprocessor core resets, to transfer the pending configuration data from the configuration data queue to be stored in the coprocessor core, replacing the current configuration data.
    Type: Grant
    Filed: February 3, 2010
    Date of Patent: June 25, 2013
    Assignee: ARM Limited
    Inventors: Ola Hugosson, Erik Persson, Pontus Borg
  • Patent number: 8464025
    Abstract: A signal processing apparatus able to raise processing capability in processing that accompanies access to a storing means is provided. Stream control units (SCU) 203_0 to 203_3 access data at an external memory system or local memories 204_0 to 204_3 according to a thread under control from a host processor. Processor unit (PU) arrays 202_0 to 202_3 perform image processing by a different thread from the thread of the SCUs 203_0 to 203_3.
    Type: Grant
    Filed: May 22, 2006
    Date of Patent: June 11, 2013
    Assignee: Sony Corporation
    Inventors: Yuji Yamaguchi, Masatoshi Imai, Toshiharu Noda, Naosuke Asari, Tomoo Mitsunaga, Mitsuharu Ohki, Kazumasa Ito, Hidetoshi Nagano, Sumito Arakawa, Kei Ito
  • Publication number: 20130138921
    Abstract: A de-coupled co-processor interface (CPIF) is provided. The de-coupled CPIF transfers endian information along with the dispatching of co-processor (COP) instructions. The de-coupled CPIF divides the status report provided by a COP into an early status report and a late status report. The de-coupled CPIF may disable the late status report in order to improve the performance. The de-coupled CPIF further provides multiple early flush interfaces (EFIs) to transfer early flush events from a main processor (MP) to a corresponding COP. As a result, the de-coupled CPIF can improve the performance of the processing of data endian, status reports and early flush events between an MP and a COP.
    Type: Application
    Filed: November 28, 2011
    Publication date: May 30, 2013
    Applicant: ANDES TECHNOLOGY CORPORATION
    Inventors: Yuan-Yuan Shih, Chuan-Hua Chang, Chi-Chang Lai
  • Patent number: 8442927
    Abstract: A coprocessor and method for processing convolutional neural networks includes a configurable input switch coupled to an input. A plurality of convolver elements are enabled in accordance with the input switch. An output switch is configured to receive outputs from the set of convolver elements to provide data to output branches. A controller is configured to provide control signals to the input switch and the output switch such that the set of convolver elements are rendered active and a number of output branches are selected for a given cycle in accordance with the control signals.
    Type: Grant
    Filed: February 1, 2010
    Date of Patent: May 14, 2013
    Assignee: NEC Laboratories America, Inc.
    Inventors: Srimat Chakradhar, Murugan Sankaradas, Venkata S. Jakkula, Srihari Cadambi
  • Patent number: 8429383
    Abstract: A system comprises a first processor, a second processor coupled to the first processor, memory coupled to, and shared by, the first and second processors, and a synchronization unit coupled to the first and second processors. The second processor preferably comprises stack storage that resides in the core of the second processor. Further, the second processor executes stack-based instructions while the first processor executes one or more tasks including, for example, managing the memory via an operating system that executes only on the first processor. Associated methods are also disclosed.
    Type: Grant
    Filed: July 31, 2003
    Date of Patent: April 23, 2013
    Assignee: Texas Instruments Incorporated
    Inventors: Gerard Chauvel, Serge Lasserre, Maija Kuusela, Dominique D'Inverno
  • Patent number: 8386751
    Abstract: One embodiment of the present invention includes a heterogeneous, high-performance, scalable processor having at least one W-type sub-processor capable of processing W bits in parallel, W being an integer value, at least one N-type sub-processor capable of processing N bits in parallel, N being an integer value smaller than W by a factor of two. The processor further includes a shared bus coupling the at least one W-type sub-processor and at least one N-type sub-processor and shared memory coupled to the at least one W-type sub-processor and the at least one N-type sub-processor, wherein the W-type sub-processor rearranges memory to accommodate execution of applications allowing for fast operations.
    Type: Grant
    Filed: May 18, 2010
    Date of Patent: February 26, 2013
    Assignee: Icelero LLC
    Inventors: Amit Ramchandran, John Reid Hauser, Jr.
  • Patent number: 8359462
    Abstract: In one embodiment the present invention includes a method and apparatus for enabling a main core and one or more co-processors to operate in a de-coupled mode, thereby facilitating the execution of two or more instruction threads in parallel. A co-processor, according to an embodiment of the invention, has a coupling manager including a loop buffer for storing instructions which can be independently fetched and executed by the co-processor when operating in de-coupled mode. In addition, the coupling manager includes a loop descriptor and a counter/condition descriptor. The loop descriptor and condition descriptor work in conjunction with one another to determine what, if any, action should be taken when a co-processor is in a particular processing state, for example, as indicated by a counter keeping track of loop processing.
    Type: Grant
    Filed: November 21, 2008
    Date of Patent: January 22, 2013
    Assignee: Marvell International Ltd.
    Inventors: Moinul H. Khan, Mark N. Fullerton, Arthur R. Miller, Anitha Kona
  • Publication number: 20130013892
    Abstract: A hierarchical multi-core processor includes a core group for each hierarchy of a hierarchy group constituting a series of communication functions divided according to communication protocol, where a first core group of a given hierarchy among the hierarchy group is connected to a second core group of another hierarchy constituting a first communication function to be executed following a second communication function of the given hierarchy.
    Type: Application
    Filed: September 13, 2012
    Publication date: January 10, 2013
    Applicant: FUJITSU LIMITED
    Inventors: Koichiro Yamashita, Hiromasa Yamauchi, Kiyoshi Miyazaki, Takahisa Suzuki, Koji Kurihara
  • Patent number: 8307194
    Abstract: A method and apparatus to provide specifiable ordering between and among vector and scalar operations within a single streaming processor (SSP) via a local synchronization (Lsync) instruction that operates within a relaxed memory consistency model. Various aspects of that relaxed memory consistency model are described. Further, a combined memory synchronization and barrier synchronization (Msync) for a multistreaming processor (MSP) system is described. Also described is a global synchronization (Gsync) instruction that provides synchronization even outside a single MSP system. Advantageously, the pipeline or queue of pending memory requests does not need to be drained before the synchronization operation, nor is it required to refrain from determining addresses for and inserting subsequent memory accesses into the pipeline.
    Type: Grant
    Filed: August 18, 2003
    Date of Patent: November 6, 2012
    Assignee: Cray Inc.
    Inventors: Steven L. Scott, Gregory J. Faanes, Brick Stephenson, William T. Moore, Jr., James R. Kohn
  • Publication number: 20120254587
    Abstract: An apparatus and method of submitting hardware accelerator engine commands over an interconnect link such as a PCI Express (PCIe) link. In one embodiment, the mechanism is implemented inside a PCIe Host Bridge which is integrated into a host IC or chipset. The mechanism provides an interface compatible with other integrated accelerators, thereby eliminating the overhead of maintaining different programming models for local and remote accelerators. Co-processor requests issued by threads requesting a service (client threads) and targeting a remote accelerator are queued and sent to a PCIe adapter and remote accelerator engine over a PCIe link. The remote accelerator engine performs the requested processing task and delivers results back to host memory, and the PCIe Host Bridge performs the co-processor request completion sequence (status update, write to flag, interrupt) included in the co-processor command.
    Type: Application
    Filed: March 31, 2011
    Publication date: October 4, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Giora Biran, Ilya Granovsky
  • Patent number: 8281315
    Abstract: Exemplary embodiments include a system and storage medium for managing computer processing functions in a multi-processor computer environment. The system includes a physical processor, a standard logical processor, an assist logical processor sharing the same logical partition as the standard logical processor, and a single operating system instance associated with the logical partition, the single operating system instance including a switch-to service and a switch-from service. The system also includes a dispatch component managed by the single operating system instance. When standard code invokes the switch-to service, the switch-to service checks whether an assist logical processor is online and, if so, updates an integrated assist field of a work element block associated with the task to indicate that the task is eligible to be executed on the assist logical processor. The switch-to service also assigns a work queue to the work element block.
    Type: Grant
    Filed: April 3, 2008
    Date of Patent: October 2, 2012
    Assignee: International Business Machines Corporation
    Inventors: Donald F. Ault, Jose R. Castano, Jeffrey P. Kubala, Robert J. Maddison, Bernard R. Pierce, Gary S. Puchkoff, Peter J. Relson, Robert R. Rogers, Donald W. Schmidt, Leslie W. Wyman
  • Patent number: 8250341
    Abstract: A pipeline accelerator includes a bus and a plurality of pipeline units, each unit coupled to the bus and including at least one respective hardwired-pipeline circuit. By including a plurality of pipeline units in the pipeline accelerator, one can increase the accelerator's data-processing performance as compared to a single-pipeline-unit accelerator. Furthermore, by designing the pipeline units so that they communicate via a common bus, one can alter the number of pipeline units, and thus alter the configuration and functionality of the accelerator, by merely coupling or uncoupling pipeline units to or from the bus. This eliminates the need to design or redesign the pipeline-unit interfaces each time one alters one of the pipeline units or alters the number of pipeline units within the accelerator.
    Type: Grant
    Filed: May 2, 2008
    Date of Patent: August 21, 2012
    Assignee: Lockheed Martin Corporation
    Inventors: Kenneth R Schulz, John W Rapp, Larry Jackson, Mark Jones, Troy Cherasaro
  • Patent number: 8250578
    Abstract: A method of pipelining hardware accelerators of a computing system includes associating hardware addresses to at least one processing unit (PU) or at least one logical partition (LPAR) of the computing system, receiving a work request for an associated hardware accelerator address, and queuing the work request for a hardware accelerator using the associated hardware accelerator address.
    Type: Grant
    Filed: February 22, 2008
    Date of Patent: August 21, 2012
    Assignee: International Business Machines Corporation
    Inventors: Rajaram B. Krishnamurthy, Thomas A. Gregg
  • Publication number: 20120159462
    Abstract: Techniques are described that enable restoring interrupted program execution from a checkpoint without the need for cooperation from the computer's operating system. These techniques can be implemented by modifying existing code using an automated tool that adds instructions for enabling restoring interrupted program execution.
    Type: Application
    Filed: December 20, 2010
    Publication date: June 21, 2012
    Applicant: Microsoft Corporation
    Inventors: Stephen Leibman, Jonathon Michael Stall, Parry Jones Reginald Husbands
  • Patent number: 8205066
    Abstract: A co-processor is provided that comprises one or more application engines that can be dynamically configured to a desired personality. For instance, the application engines may be dynamically configured to any of a plurality of different vector processing instruction sets, such as a single-precision vector processing instruction set and a double-precision vector processing instruction set. The co-processor further comprises a common infrastructure that is common across all of the different personalities, such as an instruction decode infrastructure, memory management infrastructure, system interface infrastructure, and/or scalar processing unit (that has a base set of instructions). Thus, the personality of the co-processor can be dynamically modified (by reconfiguring one or more application engines of the co-processor), while the common infrastructure of the co-processor remains consistent across the various personalities.
    Type: Grant
    Filed: October 31, 2008
    Date of Patent: June 19, 2012
    Assignee: Convey Computer
    Inventors: Tony Brewer, Steven J. Wallach
  • Patent number: 8205097
    Abstract: To improve its security, a microprocessor (1) in a security-sensitive computing system, which processes an operand according to an instruction, is provided with modulo-based check hardware (2) that performs operations in parallel with the microprocessor (1) and compares both results for congruence.
    Type: Grant
    Filed: May 9, 2008
    Date of Patent: June 19, 2012
    Assignee: NXP B.V.
    Inventors: Ralf Malzahn, Li Tao
  • Publication number: 20120144158
    Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate and optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
    Type: Application
    Filed: February 9, 2012
    Publication date: June 7, 2012
    Inventors: Matthew N. Papakipos, Brian K. Grant, Christopher G. Demetriou, Morgan S. McGuire
  • Patent number: 8190858
    Abstract: There is disclosed an interface device for interfacing between a main processor and one or more processing engines. The interface device is configurable, so that it may be used with a wide range of processing engines without being redesigned.
    Type: Grant
    Filed: February 25, 2003
    Date of Patent: May 29, 2012
    Assignee: Topside Research, LLC
    Inventors: Yaxin Shui, Phil Terry, Kevin Robertson, Quang Hong, Bao K. Vuong
  • Patent number: 8145822
    Abstract: One aspect relates to a computer system including a first data processing unit, a second data processing unit and a data transmission/memory device. The data transmission/memory device can transmit sets of data from the first data processing unit to the second data processing unit. The data transmission/memory device includes a first memory region and a second memory region.
    Type: Grant
    Filed: March 10, 2005
    Date of Patent: March 27, 2012
    Assignee: Infineon Technologies AG
    Inventors: Ulrich Hachmann, Christian Sauer
  • Patent number: 8145882
    Abstract: A system implemented in hardware includes a main processing core decoding instructions for out-of-order execution. The instructions include template-based user-defined instructions. A user execution block executes the template-based user-defined instructions. An interface is positioned between the main processing core and the user execution block. A computer-readable medium includes executable instructions to describe a processing core supporting execution of a proprietary instruction set and decoding of customized instructions that adhere to a specified pattern. The specified pattern includes a source, a destination and a latency period. A user execution block is connected to the processing core to execute the customized instructions.
    Type: Grant
    Filed: May 25, 2006
    Date of Patent: March 27, 2012
    Assignee: MIPS Technologies, Inc.
    Inventors: Karagada Ramarao Kishore, Gideon Intrater, Xing Xu Jiang, Maria Ukanwa
  • Patent number: 8131904
    Abstract: A processing module, interface, and information handling system are disclosed. According to an aspect, a processing module can include a plurality of components coupled to a circuit card operable to be coupled to a host processing system. The processing module can also include a processing module interface configured to be coupled to a host interface of the host processing system. According to an aspect, the processing module interface can include a plurality of contacts operable to couple a plurality of signals between the host processing system and the circuit card to enable or disable use of resources of the circuit card during a reduced operating state of the host processor.
    Type: Grant
    Filed: August 8, 2008
    Date of Patent: March 6, 2012
    Assignee: Dell Products, LP
    Inventors: James R. Utz, Andrew T. Sultenfuss
  • Patent number: 8125487
    Abstract: A game console system capable of parallelizing the operation of multiple graphics processing units (GPUs) supported on a game console board, using a graphics hub device and a multi-mode parallel graphics rendering subsystem that supports multiple modes of parallel operation and has software- and hardware-implemented components. The game console system includes (i) CPU memory space for storing one or more graphics-based applications, (ii) one or more CPUs for executing the graphics-based applications, (iii) a plurality of graphics processing pipelines (GPPLs), implemented using the GPUs, and (iv) an automatic mode control module. During the run-time of the graphics-based application, the automatic mode control module automatically controls the mode of parallel operation of the multi-mode parallel graphics rendering subsystem so that the GPUs are driven in a parallelized manner.
    Type: Grant
    Filed: September 26, 2007
    Date of Patent: February 28, 2012
    Assignee: Lucid Information Technology, Ltd
    Inventors: Reuven Bakalash, Yaniv Leviathan
  • Patent number: RE45097
    Abstract: An input/output processor for speeding the input/output and memory access operations for a processor is presented. The key idea of an input/output processor is to functionally divide input/output and memory access tasks into a compute-intensive part that is handled by the processor and an I/O- or memory-intensive part that is then handled by the input/output processor. An input/output processor is designed by analyzing common input/output and memory access patterns and implementing methods tailored to efficiently handle those commonly occurring patterns. One technique that an input/output processor may use is to divide memory tasks into high-frequency or high-availability components and low-frequency or low-availability components.
    Type: Grant
    Filed: February 2, 2012
    Date of Patent: August 26, 2014
    Assignee: Cisco Technology, Inc.
    Inventors: Sundar Iyer, Nick McKeown