Array Processor Patents (Class 712/10)
  • Patent number: 9043802
    Abstract: Embodiments provide various techniques for dynamic adjustment of a number of threads for execution in any domain based on domain utilizations. In a multiprocessor system, the utilization for each domain is monitored. If a utilization of any of these domains changes, then the number of threads for each of the domains determined for execution may also be adjusted to adapt to the change.
    Type: Grant
    Filed: January 8, 2014
    Date of Patent: May 26, 2015
    Assignee: NetApp, Inc.
    Inventors: Gokul Nadathur, Manpreet Singh, Grace Ho
  • Patent number: 9037833
    Abstract: A High Performance Computing (HPC) node comprises a motherboard, a switch comprising eight or more ports integrated on the motherboard, and at least two processors operable to execute an HPC job, with each processor communicably coupled to the integrated switch and integrated on the motherboard.
    Type: Grant
    Filed: December 12, 2012
    Date of Patent: May 19, 2015
    Assignee: RAYTHEON COMPANY
    Inventors: James D. Ballew, Gary R. Early
  • Patent number: 9032407
    Abstract: In a multiprocessor system, in general, a processor assigned with a larger amount of tasks is apt to perform a larger amount of communication with other processors assigned with tasks, than a processor assigned with a smaller amount of tasks. Thus in order for each processor to be able to perform the routing process efficiently, tasks are assigned such that, when there are a first processor and a second processor, the number of processors each assigned with one or more tasks and directly connected with the second processor being smaller than the number of processors each assigned with one or more tasks and directly connected with the first processor, the amount of tasks assigned to the first processor is equal to or larger than the amount of tasks assigned to the second processor.
    Type: Grant
    Filed: May 20, 2010
    Date of Patent: May 12, 2015
    Assignee: Panasonic Intellectual Property Corporation of America
    Inventor: Masahiko Saito
  • Patent number: 9032077
    Abstract: Methods and apparatus for client-allocatable bandwidth pools are disclosed. A system includes a plurality of resources of a provider network and a resource manager. In response to a determination to accept a bandwidth pool creation request from a client for a resource group, where the resource group comprises a plurality of resources allocated to the client, the resource manager stores an indication of a total network traffic rate limit of the resource group. In response to a bandwidth allocation request from the client to allocate a specified portion of the total network traffic rate limit to a particular resource of the resource group, the resource manager initiates one or more configuration changes to allow network transmissions within one or more network links of the provider network accessible from the particular resource at a rate up to the specified portion.
    Type: Grant
    Filed: June 28, 2012
    Date of Patent: May 12, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Matthew D. Klein, Michael David Marr
  • Patent number: 9032185
    Abstract: A command engine for an active memory receives high level tasks from a host and generates corresponding sets of either DCU commands to a DRAM control unit or ACU commands to a processing array control unit. The DCU commands include memory addresses, which are also generated by the command engine, and the ACU commands include instruction memory addresses corresponding to an address in an array control unit where processing array instructions are stored.
    Type: Grant
    Filed: May 23, 2012
    Date of Patent: May 12, 2015
    Assignee: Micron Technology, Inc.
    Inventor: Graham Kirsch
  • Patent number: 8997113
    Abstract: A computing platform may include heterogeneous processors (e.g., CPU and a GPU) to support sharing of virtual functions between such processors. In one embodiment, a CPU-side vtable pointer used to access a shared object from the CPU 110 may be used to determine a GPU vtable if a GPU-side table exists. In another embodiment, a shared non-coherent region, which may not maintain data consistency, may be created within the shared virtual memory. The CPU-side and GPU-side data stored within the shared non-coherent region may have the same address as seen from the CPU and the GPU side. However, the contents of the CPU-side data may be different from that of GPU-side data as shared virtual memory may not maintain coherency during the run-time. In one embodiment, the vptr may be modified to point to the CPU vtable and GPU vtable stored in the shared virtual memory.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: March 31, 2015
    Assignee: Intel Corporation
    Inventors: Shoumeng Yan, Sai Luo, Xiaocheng Zhou, Ying Gao, Hu Chen, Bratin Saha
  • Patent number: 8966461
    Abstract: A medium, method, and apparatus are disclosed for eliding superfluous function invocations in a vector-processing environment. A compiler receives program code comprising a width-contingent invocation of a function. The compiler creates a width-specific executable version of the program code by determining a vector width of a target computer system and omitting the function from the width-specific executable if the vector width meets one or more criteria. For example, the compiler may omit the function call if the vector width is greater than a minimum size.
    Type: Grant
    Filed: September 29, 2011
    Date of Patent: February 24, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Benedict R. Gaster, Lee W. Howes, Mark D. Hummel
  • Patent number: 8966222
    Abstract: Technologies pertaining to cluster-on-chip computing environments are described herein. More particularly, mechanisms for supporting message passing in such environments are described herein, where cluster-on-chip computing environments do not support hardware cache coherency.
    Type: Grant
    Filed: December 15, 2010
    Date of Patent: February 24, 2015
    Assignee: Microsoft Corporation
    Inventors: Alexey Pakhunov, Ajith Jayamohan, Suyash Sinha
  • Patent number: 8954721
    Abstract: Mechanisms, in a multi-chip data processing system, for performing a boot process for booting each of a plurality of processor chips of the multi-chip data processing system are provided. With these mechanisms, a multi-chip agnostic isolated boot phase operation is performed, in parallel, to perform an initial boot of each of the plurality of processor chips as if each of the processor chips were an only processor chip in the multi-chip data processing system. A multi-chip aware isolated boot phase operation of each of the processor chips is performed in parallel, where each of the processor chips has its own separately configured address space. In addition, a unified configuration phase operation is performed to select a master processor chip from the plurality of processor chips and configure other processor chips in the plurality of processor chips to operate as slave processor chips that are controlled by the master processor chip.
    Type: Grant
    Filed: December 8, 2011
    Date of Patent: February 10, 2015
    Assignee: International Business Machines Corporation
    Inventors: Eberhard Amann, Frank Haverkamp, Thomas Huth, Jan Kunigk
  • Patent number: 8952727
    Abstract: Systems and methods for clock generation and distribution are disclosed. Embodiments include arrangements of synchronization signals implemented using a mesh circuit. The mesh circuit is comprised of a plurality of null convention logic (NCL) gates organized into rings. Each ring shares at least one NCL gate with an adjacent ring. The rings are configured in such a way that each ring in the mesh operates synchronously with the other rings in the mesh.
    Type: Grant
    Filed: August 19, 2013
    Date of Patent: February 10, 2015
    Assignee: Wave Semiconductor, Inc.
    Inventors: Scott E Johnston, Karl Michael Fant
  • Patent number: 8949576
    Abstract: An apparatus for processing operations in an adaptive computing environment is provided. The adaptive computing environment includes at least one processing node. A node includes a memory configured to receive and store data. The data is received from a programmable interconnection network and stored. The node also includes an execution unit configured to perform a signal processing operation. The operation is performed using data retrieved from the memory and an output result is generated. The output result may be used for further computations or sent directly to the programmable interconnection network for transfer to another processing node in the adaptive computing environment.
    Type: Grant
    Filed: February 13, 2003
    Date of Patent: February 3, 2015
    Assignee: NVIDIA Corporation
    Inventor: Eugene B. Hogenauer
  • Patent number: 8935510
    Abstract: For flexibly setting up an execution environment according to contents of processing to be executed while taking stability or a security level into consideration, the multiple processor system includes the execution environment main control unit 10 which determines CPU assignment at the time of deciding CPU assignment, the execution environment sub control unit 20 which controls starting, stopping and switching of an execution environment according to an instruction from the execution environment main control unit 10 to synchronize with the execution environment main control unit 10, and the execution environment management unit 30 which receives input of management information or reference refusal information of shared resources for each CPU 4 or each execution environment 100 to separate the execution environment main control unit 10 from the execution environment sub control units 20a through 20n, or the execution environment sub control units 20a through 20n from each other.
    Type: Grant
    Filed: November 1, 2007
    Date of Patent: January 13, 2015
    Assignee: NEC Corporation
    Inventors: Hiroaki Inoue, Junji Sakai, Tsuyoshi Abe, Masato Edahiro
  • Patent number: 8904152
    Abstract: Efficient computation of complex multiplication results and very efficient fast Fourier transforms (FFTs) are provided. A parallel array VLIW digital signal processor is employed along with specialized complex multiplication instructions and communication operations between the processing elements which are overlapped with computation to provide very high performance operation. Successive iterations of a loop of tightly packed VLIWs are used allowing the complex multiplication pipeline hardware to be efficiently used. In addition, efficient techniques for supporting combined multiply accumulate operations are described.
    Type: Grant
    Filed: May 26, 2011
    Date of Patent: December 2, 2014
    Assignee: Altera Corporation
    Inventors: Nikos P. Pitsianis, Gerald George Pechanek, Ricardo Rodriguez
  • Patent number: 8897293
    Abstract: In a media access control (MAC) processor, a programmable controller is configured to execute machine readable instructions for implementing MAC functions corresponding to data received by a communication device. A tightly coupled memory is associated with the programmable controller. A system memory is coupled to the programmable controller via a system bus, and a hardware processor is coupled to the system bus and the tightly coupled memory. The hardware processor is configured to implement MAC functions on data received in a communication frame, store, in the tightly coupled memory, processed data corresponding to data in the communication frame that indicates a structure of downlink data in the communication frame, and store, in the system memory, processed data corresponding to other data in the communication frame.
    Type: Grant
    Filed: May 7, 2012
    Date of Patent: November 25, 2014
    Assignee: Marvell International Ltd.
    Inventors: Bhaskar Chowdhuri, Srikanth Shubhakoti, Vinod Ananth, Hongyu Xie, Shui Cheong Lee
  • Patent number: 8892624
    Abstract: A cooperative data stream processing system is provided that utilizes a plurality of independent, autonomous and possibly heterogeneous sites in a cooperative arrangement to process user-defined job requests over dynamic, continuous streams of data. A method is provided to organize the distributed sites into a plurality of virtual organizations that can be further combined and virtualized into virtualized virtual organizations. These virtualized virtual organizations can also include additional distributed sites and existing virtualized virtual organizations and all members of a given virtualized virtual organization can share data and processing resources in order to process jobs on either a task-based or goal-based allocation mechanism. The virtualized virtual organization is created dynamically using ad-hoc collaborations among the members and is arranged in either a federated or cooperative architecture. Collaborations between members are either tightly coupled or loosely coupled.
    Type: Grant
    Filed: May 11, 2007
    Date of Patent: November 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Michael J. Branson, Frederick Douglis, Bradley W. Fawcett, Zhen Liu, William Waller, Fan Ye
  • Patent number: 8880809
    Abstract: Embodiments are described for a method for controlling access to memory in a processor-based system comprising monitoring a number of interference events, such as bank contentions, bus contentions, row-buffer conflicts, and increased write-to-read turnaround time caused by a first core in the processor-based system that causes a delay in access to the memory by a second core in the processor-based system; deriving a control signal based on the number of interference events; and transmitting the control signal to one or more resources of the processor-based system to reduce the number of interference events from an original number of interference events.
    Type: Grant
    Filed: October 29, 2012
    Date of Patent: November 4, 2014
    Assignee: Advanced Micro Devices Inc.
    Inventors: Gabriel Loh, James O'Connor
  • Patent number: 8874837
    Abstract: An integrated circuit can include programmable circuitry operable according to a first clock frequency and a block random access memory. The block random access memory can include a random access memory (RAM) element having at least one data port and a memory processor coupled to the data port of the RAM element and to the programmable circuitry. The memory processor can be operable according to a second clock frequency that is higher than the first clock frequency. Further, the memory processor can be hardwired and dedicated to perform operations in the RAM element of the block random access memory.
    Type: Grant
    Filed: November 8, 2011
    Date of Patent: October 28, 2014
    Assignee: Xilinx, Inc.
    Inventors: Christopher E. Neely, Gordon J. Brebner
  • Patent number: 8861434
    Abstract: A system for providing multi-cell support within a single SMP partition in a telecommunications network is disclosed. The system typically includes a modem board and a multi-core processor having a plurality of processor cores, wherein the multi-core processor is configured to disable non-essential interrupts arriving on a plurality of data plane cores and route the non-essential interrupts to a plurality of control plane cores. Optionally, the multi-core processor may be configured so that all non-real-time threads and processes are bound to processor cores that are dedicated for all control plane activities and processor cores that are dedicated for all data plane activities will not host or run any threads that are not directly needed for data path implementation or Layer 2 processing.
    Type: Grant
    Filed: November 29, 2010
    Date of Patent: October 14, 2014
    Assignee: Alcatel Lucent
    Inventors: Mohammad R. Khawer, Mugur Abulius
  • Patent number: 8825986
    Abstract: A switch includes at least one input configured to receive data and at least two outputs configured to send data to at least two further switches in a network via at least two output links. Each output link has a known hop value. The switch further includes a direction determinator that determines a routing direction for the data from information identifying a relative location of the switch in the network and information identifying a destination of said data. A distributor within the switch processes the routing direction and direction information about each output link in order to select one of said at least two outputs for outputting said data. The selection that is made prioritizes output links for selection which have relatively higher known hop values.
    Type: Grant
    Filed: October 13, 2011
    Date of Patent: September 2, 2014
    Assignee: STMicroelectronics (Grenoble 2) SAS
    Inventors: Antonio-Marcello Coppola, Riccardo Locatelli, Jose Flich Cardo, Jose Cano Reyes, Jose Francisco Duato Marin
  • Patent number: 8825924
    Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. A plurality of read lines (18), write lines (20) and data lines (22) interconnect the computers (12). When one computer (12) sets a read line (18) high and the other computer sets a corresponding write line (20) then data is transferred on the data lines (22). When both the read line (18) and corresponding write line (20) go low this allows both communicating computers (12) to know that the communication is completed. An acknowledge line (72) goes high to restart the computers (12).
    Type: Grant
    Filed: March 4, 2011
    Date of Patent: September 2, 2014
    Assignee: Array Portfolio LLC
    Inventor: Charles H. Moore
  • Patent number: 8799914
    Abstract: Managing processes in a computing system comprising one or more cores includes generating an object in an operating system running on at least one core. A reference to the object is distributed to each of at least one and fewer than all of a plurality of processes to be executed on the at least one core. The operating system controls access to a resource such that processes to which the reference to the object was distributed have access to the resource and processes to which the reference to the object was not distributed do not have access to the resource.
    Type: Grant
    Filed: September 20, 2010
    Date of Patent: August 5, 2014
    Assignee: Tilera Corporation
    Inventor: Christopher D. Metcalf
  • Patent number: 8788441
    Abstract: An intelligent control system based on an explicit model of cognitive development (Table 1) performs high-level functions. It comprises up to O hierarchically stacked neural networks, Nm, . . . , Nm+(O−1), where m denotes the stage/order tasks performed in the first neural network, Nm, and O denotes the highest stage/order tasks performed in the highest-level neural network. The type of processing actions performed in a network, Nm, corresponds to the complexity for stage/order m. Thus N1 performs tasks at the level corresponding to stage/order 1. N5 processes information at the level corresponding to stage/order 5. Stacked neural networks begin and end at any stage/order, but information must be processed by each stage in ascending order sequence. Stages/orders cannot be skipped. Each neural network in a stack may use different architectures, interconnections, algorithms, and training methods, depending on the stage/order of the neural network and the type of intelligent control system implemented.
    Type: Grant
    Filed: November 3, 2009
    Date of Patent: July 22, 2014
    Inventors: Michael Lamport Commons, Mitzi Sturgeon White
  • Patent number: 8787255
    Abstract: A system for providing multi-cell support within a single SMP partition in a telecommunications network is disclosed. The system typically includes a modem board and a multi-core processor having a plurality of processor cores, wherein the multi-core processor is configured to disable non-essential interrupts arriving on a plurality of data plane cores and route the non-essential interrupts to a plurality of control plane cores. Optionally, the multi-core processor may be configured so that all non-real-time threads and processes are bound to processor cores that are dedicated for all control plane activities and processor cores that are dedicated for all data plane activities will not host or run any threads that are not directly needed for data path implementation or Layer 2 processing.
    Type: Grant
    Filed: November 29, 2010
    Date of Patent: July 22, 2014
    Assignee: Alcatel Lucent
    Inventors: Mohammad R. Khawer, Mugur Abulius
  • Patent number: 8768642
    Abstract: The present systems and methods facilitate configuration of functional components included in a remotely located integrated circuit die. In one exemplary implementation, a die functional component reconfiguration request process is engaged in wherein a system requests a reconfiguration code from a remote centralized resource. A reconfiguration code production process is executed in which a request for a reconfiguration code and a permission indicator are received, validity of the permission indicator is analyzed, and a reconfiguration code is provided if the permission indicator is valid. A die functional component configuration process is performed on the die when an appropriate reconfiguration code is received by the die. The functional component configuration process includes directing alteration of a functional component configuration. Workflow is diverted from disabled functional components to enabled functional components.
    Type: Grant
    Filed: December 18, 2003
    Date of Patent: July 1, 2014
    Assignee: Nvidia Corporation
    Inventors: Michael B. Diamond, John S. Montrym, James M. Van Dyke, Michael B. Nagy, Sean J. Treichler
  • Patent number: 8745604
    Abstract: An integrated circuit includes a plurality of tiles. Each tile includes a processor, a switch including switching circuitry to forward data over data paths from other tiles to the processor and to switches of other tiles, and a switch memory that stores instruction streams that are able to operate independently for respective output ports of the switch.
    Type: Grant
    Filed: February 25, 2008
    Date of Patent: June 3, 2014
    Assignee: Massachusetts Institute of Technology
    Inventor: Anant Agarwal
  • Patent number: 8725961
    Abstract: Disclosed are methods and devices, among which is a method for configuring an electronic device. In one embodiment, an electronic device may include one or more memory locations having stored values representative of the capabilities of the device. According to an example configuration method, a configuring system may access the device capabilities from the one or more memory locations and configure the device based on the accessed device capabilities.
    Type: Grant
    Filed: March 20, 2012
    Date of Patent: May 13, 2014
    Assignee: Micron Technology Inc.
    Inventor: Harold B Noyes
  • Patent number: 8719839
    Abstract: A computer system may comprise a computer platform and input-output devices. The computer platform may include a plurality of heterogeneous processors comprising a central processing unit (CPU) and a graphics processing unit (GPU), for example. The GPU may be coupled to a GPU compiler and a GPU linker/loader and the CPU may be coupled to a CPU compiler and a CPU linker/loader. The user may create a shared object in an object oriented language and the shared object may include virtual functions. The shared object may be fine grain partitioned between the heterogeneous processors. The GPU compiler may allocate the shared object to the CPU and may create a first and a second enabling path to allow the GPU to invoke virtual functions of the shared object. Thus, the shared object that may include virtual functions may be shared seamlessly between the CPU and the GPU.
    Type: Grant
    Filed: October 30, 2009
    Date of Patent: May 6, 2014
    Assignee: Intel Corporation
    Inventors: Shoumeng Yan, Xiaocheng Zhou, Ying Gao, Mohan Rajagopalan, Rajiv Deodhar, David Putzolu, Clark Nelson, Milind Girkar, Robert Geva, Tiger Chen, Sai Luo, Stephen Junkins, Bratin Saha, Ravi Narayanaswamy, Patrick Xi
  • Patent number: 8676506
    Abstract: Methods and systems for identifying missing signage are described herein. The method includes generating a route from an origin to a destination, the route having a plurality of maneuvers. The method further includes receiving missing signage information from a first device, the missing signage information relating to one or more maneuvers of the plurality of maneuvers, and providing the missing signage information and at least one of the one or more related maneuvers to a second device.
    Type: Grant
    Filed: November 15, 2011
    Date of Patent: March 18, 2014
    Assignee: Google Inc.
    Inventor: Daniel M. LaLiberte
  • Patent number: 8661226
    Abstract: A system, method, and computer program product are provided for performing a scan operation on a sequence of single-bit values using a parallel processing architecture. In operation, a scan operation instruction is received. Additionally, in response to the scan operation instruction, a scan operation is performed on a sequence of single-bit values using a parallel processor architecture with a plurality of processing elements.
    Type: Grant
    Filed: November 15, 2007
    Date of Patent: February 25, 2014
    Assignee: NVIDIA Corporation
    Inventors: Michael J. Garland, Samuli M. Laine, Timo O. Aila, David Patrick Luebke
  • Patent number: 8656143
    Abstract: A serial array processor may have an execution unit, which is comprised of a multiplicity of single bit arithmetic logic units (ALUs), and which may perform parallel operations on a subset of all the words in memory by serially accessing and processing them, one bit at a time, while an instruction unit of the processor is pre-fetching the next instruction, a word at a time, in a manner orthogonal to the execution unit.
    Type: Grant
    Filed: February 3, 2010
    Date of Patent: February 18, 2014
    Inventor: Laurence H. Cooke
  • Patent number: 8656141
    Abstract: An integrated circuit includes a plurality of tiles. Each tile includes a pipelined processor configured to process multiple streams of instructions for the processor; and a switch including switching circuitry to forward data over data paths from other tiles to one or more pipeline stages of the processor and to switches of other tiles. At least some of the data is forwarded based on one or more streams of instructions for the switch.
    Type: Grant
    Filed: December 13, 2005
    Date of Patent: February 18, 2014
    Assignee: Massachusetts Institute of Technology
    Inventor: Anant Agarwal
  • Patent number: 8650338
    Abstract: Fencing direct memory access (‘DMA’) data transfers in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two en
    Type: Grant
    Filed: March 4, 2013
    Date of Patent: February 11, 2014
    Assignee: International Business Machines Corporation
    Inventors: Michael A. Blocksome, Amith R. Mamidala
  • Patent number: 8639912
    Abstract: A data processor and a method for processing data is disclosed. The processor has an input port for receiving packets of data to be processed. A master controller acts to analyze the packets and to provide a header including a list of processes to perform on the packet of data and an ordering thereof. The master controller is programmed with process related data relating to the overall processing function of the processor. The header is appended to the packet of data. The packet with the appended header information is stored within a buffer. A buffer controller acts to determine for each packet stored within the buffer based on the header within the packet a next processor to process the packet. The controller then provides the packet to the determined processor for processing. The processed packet is returned with some indication that the processing is done. For example, the process may be deleted from the list of processes.
    Type: Grant
    Filed: November 16, 2009
    Date of Patent: January 28, 2014
    Assignee: Mosaid Technologies Incorporated
    Inventors: Arthur John Low, Stephen J. Davis
  • Patent number: 8638805
    Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: January 28, 2014
    Assignee: LSI Corporation
    Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
  • Patent number: 8631415
    Abstract: Embodiments provide various techniques for dynamic adjustment of a number of threads for execution in any domain based on domain utilizations. In a multiprocessor system, the utilization for each domain is monitored. If a utilization of any of these domains changes, then the number of threads for each of the domains determined for execution may also be adjusted to adapt to the change.
    Type: Grant
    Filed: August 25, 2009
    Date of Patent: January 14, 2014
    Assignee: NetApp, Inc.
    Inventors: Gokul Nadathur, Manpreet Singh, Grace Ho
  • Patent number: 8601176
    Abstract: Techniques for providing improved data distribution to and collection from multiple memories are described. Such memories are often associated with and local to processing elements (PEs) within an array processor. Improved data transfer control within a data processing system provides support for radix 2, 4 and 8 fast Fourier transform (FFT) algorithms through data reordering or bit-reversed addressing across multiple PEs, carried out concurrently with FFT computation on a digital signal processor (DSP) array by a DMA unit. Parallel data distribution and collection through forms of multicast and packet-gather operations are also supported.
    Type: Grant
    Filed: July 10, 2012
    Date of Patent: December 3, 2013
    Assignee: Altera Corporation
    Inventors: Edwin Franklin Barry, Nikos P. Pitsianis, Kevin Coopman
  • Patent number: 8572353
    Abstract: Communicating among cores in a computing system comprising a plurality of cores, each core comprising a processor and a switch, includes: routing a packet from an origin core to a destination core over a route including multiple cores; and at each core in the route before the destination core, routing the packet to the next core in the route according to a respective symbol in a sequence of multiple symbols. The respective symbol has a first symbol value indicating a single likely direction and the respective symbol has a second symbol value indicating multiple less likely directions.
    Type: Grant
    Filed: September 20, 2010
    Date of Patent: October 29, 2013
    Assignee: Tilera Corporation
    Inventors: Ian Rudolf Bratt, Carl G. Ramey, Matthew Mattina
  • Patent number: 8532288
    Abstract: A cryptographic engine for modulo N multiplication, which is structured as a plurality of almost identical, serially connected Processing Elements, is controlled so as to accept input in blocks that are smaller than the maximum capability of the engine in terms of bits multiplied at one time. The serially connected hardware is thus partitioned on the fly to process a variety of cryptographic key sizes while still maintaining all of the hardware in an active processing state.
    Type: Grant
    Filed: December 1, 2006
    Date of Patent: September 10, 2013
    Assignee: International Business Machines Corporation
    Inventors: Camil Fayad, John K. Li, Siegfried K. H. Sutter, Phil C. Yeh
  • Patent number: 8527672
    Abstract: Fencing direct memory access (‘DMA’) data transfers in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two en
    Type: Grant
    Filed: November 5, 2010
    Date of Patent: September 3, 2013
    Assignee: International Business Machines Corporation
    Inventors: Michael A. Blocksome, Amith R. Mamidala
  • Patent number: 8521989
    Abstract: A microcontroller includes a program memory, data memory, central processing unit, at least one register module, a memory management unit, and a transport network. Instructions are executed in one clock cycle via an instruction word. The instruction word indicates the source module from which data is to be retrieved and the destination module to which data is to be stored. The address/data capability of an instruction word may be extended via a prefix module. If an operation is performed on the data, the source module or the destination module may perform the operation during the same clock cycle in which the data is transferred.
    Type: Grant
    Filed: May 18, 2006
    Date of Patent: August 27, 2013
    Assignee: Maxim Integrated Products, Inc.
    Inventors: Jeffrey Dean Owens, Edward Tang K. Ma, Don Loomis, Tom Chenot
  • Patent number: 8516222
    Abstract: An integrated circuit includes a plurality of processor cores. Processing instructions in the integrated circuit includes: managing a plurality of sets of processor cores, each set including one or more processor cores assigned to a function associated with executing instructions; and reconfiguring the number of processor cores assigned to at least one of the sets during execution based on characteristics associated with executing the instructions.
    Type: Grant
    Filed: December 9, 2011
    Date of Patent: August 20, 2013
    Assignee: Massachusetts Institute of Technology
    Inventors: Anant Agarwal, David Wentzlaff
  • Patent number: 8516179
    Abstract: A processing system on an integrated circuit includes a group of processing cores. A group of dedicated random access memories are severally coupled to one of the group of processing cores or shared among the group. A star bus couples the group of processing cores and random access memories. Additional layer(s) of star bus may couple many such clusters to each other and to an off-chip environment.
    Type: Grant
    Filed: November 30, 2004
    Date of Patent: August 20, 2013
    Assignee: Digital RNA, LLC
    Inventor: Joel Henry Hinrichs
  • Patent number: 8516461
    Abstract: A method provides efficient dispatch/completion of an N Dimensional (ND) Range command in a data processing system (DPS). The method comprises: a compiler generating one or more commands from received program instructions; ND Range work processing (WP) logic determining when a command generated by the compiler will be implemented over an ND configuration of operands, where N is greater than one (1); automatically decomposing the ND configuration of operands into a one (1) dimension (1D) work element comprising P sequentially ordered work items that each represent one of the operands; placing the 1D work element within a command queue of the DPS; enabling sequential dispatching of 1D work items in ordered sequence to one or more processing units; and generating an ND Range output by mapping the 1D work output result to an ND position corresponding to an original location of the operand represented by the 1D work item.
    Type: Grant
    Filed: September 15, 2012
    Date of Patent: August 20, 2013
    Assignee: International Business Machines Corporation
    Inventors: Gregory Howard Bellows, Brian H. Horton, Joaquin Madruga, Barry L. Minor
  • Patent number: 8495345
    Abstract: A computing apparatus and method of handling an interrupt are provided. The computing apparatus includes a coarse-grained array, a host processor, and an interrupt supervisor. When an interrupt occurs in the coarse-grained array while performing a loop operation, the host processor processes the interrupt, and the interrupt supervisor may perform mode switching between the coarse-grained array and the host processor.
    Type: Grant
    Filed: December 16, 2009
    Date of Patent: July 23, 2013
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dong-hoon Yoo, Soo-jung Ryu, Yeon-gon Cho, Bernhard Egger, Il-hyun Park
  • Patent number: 8495604
    Abstract: A system provides efficient dispatch/completion of an N Dimensional (ND) Range command in a data processing system (DPS). The system comprises: a compiler generating one or more commands from received program instructions; ND Range work processing (WP) logic determining when a command generated by the compiler will be implemented over an ND configuration of operands, where N is greater than one (1); automatically decomposing the ND configuration of operands into a one (1) dimension (1D) work element comprising P sequentially ordered work items that each represent one of the operands; placing the 1D work element within a command queue of the DPS; enabling sequential dispatching of 1D work items in ordered sequence to one or more processing units; and generating an ND Range output by mapping the 1D work output result to an ND position corresponding to an original location of the operand represented by the 1D work item.
    Type: Grant
    Filed: December 30, 2009
    Date of Patent: July 23, 2013
    Assignee: International Business Machines Corporation
    Inventors: Gregory H. Bellows, Brian H. Horton, Joaquin Madruga, Barry L. Minor
  • Patent number: 8489857
    Abstract: A parallel processing architecture comprising a cluster of embedded processors that share a common code distribution bus. Pages or blocks of code are concurrently loaded into respective program memories of some or all of these processors (typically all processors assigned to a particular task) over the code distribution bus, and are executed in parallel by these processors. A task control processor determines when all of the processors assigned to a particular task have finished executing the current code page, and then loads a new code page (e.g., the next sequential code page within a task) into the program memories of these processors for execution. The processors within the cluster preferably share a common memory (1 per cluster) that is used to receive data inputs from, and to provide data outputs to, a higher level processor. Multiple interconnected clusters may be integrated within a common integrated circuit device.
    Type: Grant
    Filed: November 5, 2010
    Date of Patent: July 16, 2013
    Assignee: Schism Electronics, L.L.C.
    Inventors: Richard F. Hobson, Bill Ressl, Allan R. Dyck
  • Patent number: 8484276
    Abstract: Techniques are disclosed for converting data into a format tailored for efficient multidimensional fast Fourier transforms (FFTs) on single instruction, multiple data (SIMD) multi-core processor architectures. The technique includes converting data from a multidimensional array stored in a conventional row-major order into SIMD format. Converted data in SIMD format consists of a sequence of blocks, where each block interleaves s rows such that SIMD vector processors may operate on s rows simultaneously. As a result, the converted data in SIMD format enables smaller-sized 1D FFTs to be optimized in SIMD multi-core processor architectures.
    Type: Grant
    Filed: March 18, 2009
    Date of Patent: July 9, 2013
    Assignee: International Business Machines Corporation
    Inventors: David G. Carlson, Travis M. Drucker, Timothy J. Mullins, Jeffrey S. McAllister, Nelson Ramirez
  • Patent number: 8464025
    Abstract: A signal processing apparatus is provided that can raise processing capability in processing that involves access to a storage means. Stream control units (SCUs) 203_0 to 203_3 access data in an external memory system or in local memories 204_0 to 204_3 according to a thread under control of a host processor. Processor unit (PU) arrays 202_0 to 202_3 perform image processing according to a thread different from the thread of the SCUs 203_0 to 203_3.
    Type: Grant
    Filed: May 22, 2006
    Date of Patent: June 11, 2013
    Assignee: Sony Corporation
    Inventors: Yuji Yamaguchi, Masatoshi Imai, Toshiharu Noda, Naosuke Asari, Tomoo Mitsunaga, Mitsuharu Ohki, Kazumasa Ito, Hidetoshi Nagano, Sumito Arakawa, Kei Ito
  • Patent number: 8453003
    Abstract: A communication method is provided to reduce the overhead of inter-processor synchronization for a communication phase in collective communication and to speed up the collective communication. Each of the processors in a parallel computer starts a previous process before a collective communication phase in which communications are performed at the same time among the processors through an inter-processor network. Each processor executes a synchronization command in advance, at a time when a portion of the previous process for a predetermined time t is left. The inter-processor synchronization control section transmits a synchronization completion notice to each processor if a synchronization condition is met. For the period, each processor executes the previous process in parallel. Then, the plurality of processors enter the collective communication phase.
    Type: Grant
    Filed: April 9, 2008
    Date of Patent: May 28, 2013
    Assignee: NEC Corporation
    Inventor: Yasushi Kanoh
  • Patent number: 8438512
    Abstract: Disclosed is an improved method and system for implementing parallelism for execution of electronic design automation (EDA) tools, such as layout processing tools. Examples of EDA layout processing tools are placement and routing tools. Efficient locking mechanisms are described for facilitating parallel processing and minimizing blocking.
    Type: Grant
    Filed: August 30, 2011
    Date of Patent: May 7, 2013
    Assignee: Cadence Design Systems, Inc.
    Inventors: David Cross, Eric Nequist