Operation Patents (Class 712/30)

Master/slave (Class 712/31)

3-D STACKED MULTIPROCESSOR STRUCTURES AND METHODS TO ENABLE RELIABLE OPERATION OF PROCESSORS AT SPEEDS ABOVE SPECIFIED LIMITS

Publication number: 20140006750

Abstract: A three-dimensional (3-D) processor system includes a first processor chip and a second processor chip in a stacked configuration. The first processor chip includes a first processor having a first set of state registers. The second processor chip includes a second processor having a second set of state registers that corresponds to the first set of state registers. The first and second processors are connected through vertical connections between the first and second processor chips. A mode control circuit operates the processor system in one of a plurality of operating modes. In one mode of operation, the first processor is active and the second processor is inactive, and the first processor operates at a speed greater than a maximum safe speed of the first processor, and the first processor uses the second set of state registers of the second processor to checkpoint a state of the first processor.

Type: Application

Filed: September 4, 2012

Publication date: January 2, 2014

Applicant: International Business Machines Corporation

Inventors: Alper Buyuktosunoglu, Philip G. Emma, Allan M. Hartstein, Michael B. Healy, Krishnan K. Kailas
Lock wait time reduction in a distributed processing environment

Patent number: 8607238

Abstract: Aspects of the present invention reduce a lock wait time in a distributed processing environment. A plurality of wait-for dependencies between a first plurality of transactions and a second plurality of transactions in a distributed processing environment is identified. The first plurality of transactions waits for the second plurality of transactions to release a plurality of locks on a plurality of shared resources. An amount of time the first plurality of transactions will wait for the second plurality of transactions in the distributed processing environment is determined based on the plurality of wait-for dependencies between the first plurality of transactions and the second plurality of transactions. Historical transaction data related to the plurality of wait-for dependencies between the first plurality of transactions and the second plurality of transactions is analyzed.

Type: Grant

Filed: July 8, 2011

Date of Patent: December 10, 2013

Assignee: International Business Machines Corporation

Inventors: Abhinay Ravinder Nagpal, Sri Ramanathan, Sandeep Ramesh Patil, Matthew Bunkley Trevathan
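The wait-for dependency analysis in patent 8607238 can be illustrated with a small Python sketch. This is not the patented method. It assumes an acyclic wait-for graph and a hypothetical table, `remaining_hold`, of expected remaining lock-hold times that stands in for the historical transaction data the abstract mentions. Each transaction's wait is then estimated as the longest chain of lock holders ahead of it.

```python
from functools import lru_cache

def estimate_waits(waits_for, remaining_hold):
    """Estimate how long each transaction waits for locks.

    waits_for: dict mapping a transaction to the transactions holding locks it needs.
    remaining_hold: hypothetical expected remaining hold time per transaction
        (e.g. derived from historical data); an assumption, not from the patent.
    Assumes the wait-for graph is acyclic (no deadlock).
    """
    @lru_cache(maxsize=None)
    def wait(txn):
        # A transaction can proceed only after every holder it depends on has
        # finished its own wait and then released its locks.
        return max(
            (wait(holder) + remaining_hold[holder] for holder in waits_for.get(txn, ())),
            default=0,
        )

    return {txn: wait(txn) for txn in waits_for}
```

For example, if T1 waits for T2 and T2 waits for T3, T1's estimate includes both T3's and T2's remaining hold times.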
SYSTEM AND METHOD FOR DISTRIBUTED COMPUTING

Publication number: 20130326191

Abstract: The invention relates to tightly coupled multiprocessor distributed computing systems. The proposed solution makes it possible to develop distributed applications in the same way as ordinary monolithic applications, using standard compilers and build tools. These applications support complex logic for interaction between elements executed on different nodes while keeping development complexity limited. The invention defines the requirements for a distributed application and the method of its execution, its memory organization, and the manner in which system nodes interact.

Type: Application

Filed: October 4, 2011

Publication date: December 5, 2013

Inventor: Alexander Yakovlevich Bogdanov
Performing a deterministic reduction operation in a parallel computer

Patent number: 8601237

Abstract: Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data.

Type: Grant

Filed: November 9, 2012

Date of Patent: December 3, 2013

Assignee: International Business Machines Corporation

Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
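The point of the dummy-contribution handshake in patent 8601237 is that the root combines real data in a fixed order, whatever order the processors check in. The Python sketch below collapses the message exchange into one function; the dictionary layout and the choice of ascending rank as the "predefined order" are assumptions for illustration only.

```python
def deterministic_reduce(contributions, dummy_arrival_order):
    """Simulate a root CAU reducing child contributions in a predefined order.

    contributions: dict mapping child rank -> float contribution (hypothetical layout).
    dummy_arrival_order: order in which the children's dummy messages reached the root.
    """
    # Phase 1: dummy contributions may arrive in any order; the root only notes them.
    checked_in = set(dummy_arrival_order)
    if checked_in != set(contributions):
        raise ValueError("every child must send a dummy contribution first")
    # Phase 2: the root acknowledges children in a predefined order (ascending rank here).
    # A child sends real data only after its acknowledgement, so real contributions
    # reach the root, and are added, in that same order.
    total = 0.0
    for rank in sorted(contributions):
        total += contributions[rank]
    return total
```

Because floating-point addition is not associative, adding contributions in arrival order could give run-to-run differences; fixing the order is what makes the result reproducible.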
COMPOSITE PROCESSORS

Publication number: 20130318325

Abstract: In one example, a composite processor (100) includes a circuit board (1200), a first processor element package (1230), and a second processor element package (1240). The circuit board has an optical link (1211) and an electrical link (1221). The first processor element package (1230) includes a substrate (1231) with an integrated circuit (240), a sub-wavelength grating optical coupler (1232), and an electrical coupler (1233) coupled to the electrical link (1221) of the circuit board (1200). The second processor element package (1240) includes a substrate (1241) with an integrated circuit (240), a sub-wavelength grating optical coupler (1242), and an electrical coupler (1243) coupled to the electrical link (1221) of the circuit board (1200).

Type: Application

Filed: January 20, 2011

Publication date: November 28, 2013

Inventors: Raymond G. Beausoleil, Marco Fiorentino, Moray McLaren, Greg Astfalk, Nathan Lorenzo Binkert, David A. Fattal
Power-efficient sensory recognition processor

Patent number: 8588555

Abstract: This invention provides a computer processor architecture optimized for power-efficient computation of certain sensory recognition (e.g. vision) algorithms on a single computer chip. Illustratively, the architecture is optimized to carry out low-level routines and a special class of high-level sensory recognition routines derived from research into human brain perception processes. In an illustrative embodiment, the processor includes a plurality of processing nodes, arranged in a hierarchy of layers, and the processor resolves features from sensory information input and provides the feature information as input to a lowest hierarchy layer thereof. The hierarchy simultaneously recognizes multiple components of the features, which are transferred between the layers so as to build likely recognition candidates. Each node can further include memory constructed and arranged to refresh and retain features determined to be likely recognition candidates by a thresholding process.

Type: Grant

Filed: June 11, 2010

Date of Patent: November 19, 2013

Assignee: Cognitive Electronics, Inc.

Inventors: Andrew C. Felch, Richard H. Granger
PERFORMING A CYCLIC REDUNDANCY CHECKSUM OPERATION RESPONSIVE TO A USER-LEVEL INSTRUCTION

Publication number: 20130305011

Abstract: In one embodiment, the present invention includes a method for receiving incoming data in a processor and performing a checksum operation on the incoming data in the processor pursuant to a user-level instruction for the checksum operation. For example, a cyclic redundancy checksum may be computed in the processor itself responsive to the user-level instruction. Other embodiments are described and claimed.

Type: Application

Filed: July 12, 2013

Publication date: November 14, 2013

Inventors: Steven R. King, Frank L. Berry, Michael E. Kounavis
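As a minimal sketch of what a checksum instruction of this kind computes, the following Python function performs a CRC-32C bit by bit in software. The choice of CRC-32C (Castagnoli polynomial 0x1EDC6F41) is an assumption: the abstract names no polynomial, but that is the polynomial used by the SSE4.2 `crc32` instruction. A user-level hardware instruction would replace the per-bit inner loop.

```python
# Reflected (LSB-first) form of the CRC-32C polynomial 0x1EDC6F41.
CRC32C_POLY_REFLECTED = 0x82F63B78

def crc32c(data: bytes, crc: int = 0) -> int:
    """Bit-at-a-time software CRC-32C; pass a previous result as `crc` to continue it."""
    crc ^= 0xFFFFFFFF  # standard initial value (or undo the final XOR when chaining)
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final XOR
```

The optional `crc` argument lets a checksum be continued across buffers, so `crc32c(b"6789", crc32c(b"12345"))` equals `crc32c(b"123456789")`.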
DYNAMIC CORE SWAPPING

Publication number: 20130297909

Abstract: An embodiment of the present invention is a technique to dynamically swap processor cores. A first core has a first instruction set. The first core executes a program at a first performance level. The first core stops executing the program when a triggering event occurs. A second core has a second instruction set compatible with the first instruction set and has a second performance level different than the first performance level. The second core is in a power down state when the first core is executing the program. A circuit powers up the second core after the first core stops executing the program such that the second core continues executing the program at the second performance level.

Type: Application

Filed: July 9, 2013

Publication date: November 7, 2013

Inventors: Brian V. Belmont, Animesh Mishra, James P. Kardach
Direct injection of data to be transferred in a hybrid computing environment

Patent number: 8578133

Abstract: Direct injection of data to be transferred in a hybrid computing environment that includes a host computer and a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module. Each accelerator includes a Power Processing Element (‘PPE’) and a plurality of Synergistic Processing Elements (‘SPEs’). Direct injection includes reserving, by each SPE, a slot in a shared memory region accessible by the host computer; loading, by each SPE into local memory of the SPE, a portion of data to be transferred to the host computer; executing, by each SPE in parallel, a data processing operation on the portion of the data loaded in local memory of each SPE; and writing, by each SPE, the processed data to the SPE's reserved slot in the shared memory region accessible by the host computer.

Type: Grant

Filed: October 31, 2012

Date of Patent: November 5, 2013

Assignee: International Business Machines Corporation

Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Gary R. Ricard, Brian E. Smith
Direct injection of data to be transferred in a hybrid computing environment

Patent number: 8578132

Abstract: Direct injection of data to be transferred in a hybrid computing environment that includes a host computer and a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module. Each accelerator includes a Power Processing Element (‘PPE’) and a plurality of Synergistic Processing Elements (‘SPEs’). Direct injection includes reserving, by each SPE, a slot in a shared memory region accessible by the host computer; loading, by each SPE into local memory of the SPE, a portion of data to be transferred to the host computer; executing, by each SPE in parallel, a data processing operation on the portion of the data loaded in local memory of each SPE; and writing, by each SPE, the processed data to the SPE's reserved slot in the shared memory region accessible by the host computer.

Type: Grant

Filed: March 29, 2010

Date of Patent: November 5, 2013

Assignee: International Business Machines Corporation

Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Gary R. Ricard, Brian E. Smith
PERFORMING A DETERMINISTIC REDUCTION OPERATION IN A PARALLEL COMPUTER

Publication number: 20130290673

Abstract: Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data.

Type: Application

Filed: November 9, 2012

Publication date: October 31, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: International Business Machines Corporation
Parallel computing system, synchronization device, and control method of parallel computing system

Patent number: 8572615

Abstract: A synchronization device includes a receiver that receives data from at least two synchronization devices establishing synchronization and extracts synchronization information and register selection information from the received data; a transmitter that transmits data to each of the at least two synchronization devices establishing synchronization among a plurality of synchronization devices; a first receiving state register and a second receiving state register, each of which stores the extracted synchronization information; and a controller that stores the extracted synchronization information into the first receiving state register and the second receiving state register alternately based on the register selection information, and controls the transmitter to transmit data including the register selection information to each of the at least two synchronization devices when the extracted synchronization information is completed in one of the first a…

Type: Grant

Filed: December 14, 2011

Date of Patent: October 29, 2013

Assignee: Fujitsu Limited

Inventors: Tomohiro Inoue, Yuichiro Ajima, Shinya Hiramoto
3-D STACKED MULTIPROCESSOR STRUCTURES AND METHODS FOR MULTIMODAL OPERATION OF SAME

Publication number: 20130283009

Abstract: Three-dimensional (3-D) processor devices are provided, which are constructed by connecting processors in a stacked configuration. For instance, a processor system includes a first processor chip comprising a first processor and a second processor chip comprising a second processor. The first and second processor chips are connected in a stacked configuration with the first and second processors connected through vertical connections between the first and second processor chips. The processor system further includes a mode control circuit to selectively operate the processor system in one of a plurality of operating modes. For example, in one mode of operation, the first and second processors are configured to implement a run-ahead function, wherein the first processor operates a primary thread of execution and the second processor operates a run-ahead thread of execution.

Type: Application

Filed: April 20, 2012

Publication date: October 24, 2013

Applicant: International Business Machines Corporation

Inventors: Alper Buyuktosunoglu, Philip G. Emma, Allan M. Hartstein, Michael B. Healy, Krishnan Kunjunny Kailas
3-D STACKED MULTIPROCESSOR STRUCTURES AND METHODS FOR MULTIMODAL OPERATION OF SAME

Publication number: 20130283010

Abstract: Three-dimensional (3-D) processor devices are provided, which are constructed by connecting processors in a stacked configuration. For instance, a processor system includes a first processor chip comprising a first processor and a second processor chip comprising a second processor. The first and second processor chips are connected in a stacked configuration with the first and second processors connected through vertical connections between the first and second processor chips. The processor system further includes a mode control circuit to selectively operate the processor system in one of a plurality of operating modes. For example, in one mode of operation, the first and second processors are configured to implement a run-ahead function, wherein the first processor operates a primary thread of execution and the second processor operates a run-ahead thread of execution.

Type: Application

Filed: August 31, 2012

Publication date: October 24, 2013

Applicant: International Business Machines Corporation

Inventors: Alper Buyuktosunoglu, Philip G. Emma, Allan M. Hartstein, Michael B. Healy, Krishnan Kunjunny Kailas
Managing thread affinity on multi-core processors

Patent number: 8561073

Abstract: Embodiments of the invention intelligently associate processes with core processors in a multi-core processor. The core processors are asymmetrical in that the core processors support different features or provide different resources. The features or resources are published by the core processors or otherwise identified (e.g., via a query). Responsive to a request to execute an instruction associated with a thread, one of the core processors is selected based on the resource or feature supporting execution of the instruction. The thread is assigned to the selected core processor such that the selected core processor executes the instruction and subsequent instructions from the assigned thread. In some embodiments, the resource or feature is emulated until an activity limit is reached upon which the thread assignment occurs.

Type: Grant

Filed: September 19, 2008

Date of Patent: October 15, 2013

Assignee: Microsoft Corporation

Inventors: Yadhu Nandh Gopalan, John Mark Miller, Bor-Ming Hsieh
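The emulate-then-reassign behaviour described for patent 8561073 can be sketched as a small dispatch policy. Everything here is hypothetical: the `Core` and `Thread` types, feature names, the `dispatch` function, and the `emulation_limit` counter are illustrative stand-ins for "published features" and "activity limit", not the patented implementation.

```python
from dataclasses import dataclass

@dataclass
class Core:
    core_id: int
    features: frozenset  # features/resources the core publishes

@dataclass
class Thread:
    name: str
    core: Core
    emulation_count: int = 0

def dispatch(thread, feature, cores, emulation_limit):
    """Decide how an instruction needing `feature` runs for `thread` (hypothetical policy)."""
    if feature in thread.core.features:
        return f"native on core {thread.core.core_id}"
    # Emulate the missing feature until the activity limit is reached.
    thread.emulation_count += 1
    if thread.emulation_count < emulation_limit:
        return f"emulated on core {thread.core.core_id}"
    # Limit reached: assign the thread to a core that supports the feature natively.
    for core in cores:
        if feature in core.features:
            thread.core = core
            thread.emulation_count = 0
            return f"migrated to core {core.core_id}"
    return f"emulated on core {thread.core.core_id}"  # no core supports it
```

With a limit of 3, a thread on a core lacking the feature is emulated twice, reassigned on the third request, and runs natively afterwards.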
NON-INVASIVE SAFETY WRAPPER FOR COMPUTER SYSTEMS

Publication number: 20130269044

Abstract: A processing system comprising: a first processor adapted to perform one or more tasks according to a predetermined schedule and generate one or more first outputs; and a second processor synchronised with the first processor; wherein the second processor is adapted to receive the one or more first outputs and generate one or more corresponding second outputs when the timing of the one or more first outputs corresponds with the predetermined schedule.

Type: Application

Filed: April 19, 2011

Publication date: October 10, 2013

Applicant: TTE Systems Limited

Inventor: Michael Pont
Parallel computing apparatus and parallel computing method

Patent number: 8549261

Abstract: Computational unit area selecting units, each of which is provided in one of multiple cores, sequentially select uncomputed computational unit areas in a computational area. Computing units, each of which is provided in one of the multiple cores, perform computation for the selected computational unit areas. In addition, the computing units write computational results in a memory device that is accessible from each of the multiple cores. A computational result transmitting unit of a core performs computational result acquisition and transmission processing in a different time period for each of multiple computational result transmission areas. The computational result acquisition processing acquires, from the memory device, computational results related to the computational result transmission areas.

Type: Grant

Filed: April 30, 2012

Date of Patent: October 1, 2013

Assignee: Fujitsu Limited

Inventor: Yoshie Inada
Apparatus for processing data and method for generating manipulated and re-manipulated configuration data for processor

Patent number: 8549260

Abstract: Some embodiments comprise an apparatus for processing data, the apparatus having a second configurable processor configured to process data using second configuration data, and a configuration data re-manipulator configured to retrieve manipulated second configuration data and first data of a first processor, to re-manipulate the manipulated second configuration data depending on the first data, and to feed the re-manipulated second configuration data to the second configurable processor as the second configuration data.

Type: Grant

Filed: January 29, 2009

Date of Patent: October 1, 2013

Assignee: Infineon Technologies AG

Inventor: Steffen Marc Sonnekalb
Processing System With Interspersed Processors and Communication Elements Having Improved Wormhole Routing

Publication number: 20130254515

Abstract: A processing system includes processors and dynamically configurable communication elements (DCCs) coupled together in an interspersed arrangement. A source device may transfer a data item through an intermediate subset of the DCCs to a destination device. The source and destination devices may each correspond to different processors, DCCs, or input/output devices, or mixed combinations of these. In response to detecting a stall after the source device begins transfer of the data item to the destination device and prior to receipt of all of the data item at the destination device, a stalling device is operable to propagate stalling information through one or more of the intermediate subset towards the source device. In response to receiving the stalling information, at least one of the intermediate subset is operable to buffer all or part of the data item.

Type: Application

Filed: May 29, 2013

Publication date: September 26, 2013

Applicant: Coherent Logix, Incorporated

Inventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
Method and apparatus for QR-factorizing matrix on a multiprocessor system

Patent number: 8543626

Abstract: A method and apparatus for QR-factorizing a matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, comprises the steps of: iteratively factorizing each panel in the matrix until the whole matrix is factorized; wherein in each iteration, the method comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result.

Type: Grant

Filed: July 27, 2012

Date of Patent: September 24, 2013

Assignee: International Business Machines Corporation

Inventors: Hui Li, Bai Ling Wang
PROCESSOR, ELECTRONIC CONTROL UNIT AND GENERATING PROGRAM

Publication number: 20130246736

Abstract: A processor in which plural cores perform respective programs includes: a first own core execution point acquiring part configured to acquire first code block information if a first core executes an execution history recording instruction described at an execution history recording point in the program, the first code block information indicating, with a single address, a series of instructions executed by the first core; a first other core execution point acquiring part configured to acquire first execution address information of an instruction, the instruction being executed by a second core, if the first core executes the execution history recording instruction; and a first execution point information recording part configured to record the first code block information and the first execution address information in a shared memory in time series such that they are associated with each other.

Type: Application

Filed: November 25, 2010

Publication date: September 19, 2013

Applicant: TOYOTA JIDOSHA KABUSHIKI KAISHA

Inventor: Kenji Hontani
PARTITION-FREE MULTI-SOCKET MEMORY SYSTEM ARCHITECTURE

Publication number: 20130246719

Abstract: A technique to increase memory bandwidth for throughput applications. In one embodiment, memory bandwidth can be increased, particularly for throughput applications, without increasing interconnect trace or pin count by pipelining pages between one or more memory storage areas on half cycles of a memory access clock.

Type: Application

Filed: March 5, 2013

Publication date: September 19, 2013

Inventor: Eric Sprangle
System for dedicating a number of processors to a network polling task and disabling interrupts of the dedicated processors

Patent number: 8539489

Abstract: Techniques for improving the performance of multitasking processors are provided. For example, a subset of M processors within a Symmetric Multi-Processing (SMP) system with N processors is dedicated to a specific task. The M (M>0) of the N processors are dedicated to the task, leaving (N-M) processors for running the normal operating system (OS). The processors dedicated to the task may have their interrupt mechanism disabled to avoid interrupt-handler switching overhead. These processors therefore run in an independent context and can communicate and cooperate with the normal OS to achieve higher network performance.

Type: Grant

Filed: May 7, 2012

Date of Patent: September 17, 2013

Assignee: Fortinet, Inc.

Inventor: Jianzu Ding
Cache-aware thread scheduling in multi-threaded systems

Patent number: 8533719

Abstract: The disclosed embodiments provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores. During operation, the system executes a first thread in a processor core that is associated with a shared cache. During this execution, the system measures one or more metrics to characterize the first thread. Then, the system uses the characterization of the first thread and a characterization of a second thread to predict a performance impact that would occur if the second thread were to execute simultaneously in a second processor core that is also associated with the shared cache. If the predicted performance impact indicates that executing the second thread on the second processor core will improve performance for the multi-threaded processor, the system executes the second thread on the second processor core.

Type: Grant

Filed: April 5, 2010

Date of Patent: September 10, 2013

Assignee: Oracle International Corporation

Inventors: Alexandra Fedorova, David Vengerov, Kishore Kumar Pusukuri
Iterative process partner pairing scheme for global reduce operation

Patent number: 8527739

Abstract: Distributing a computing operation among a plurality of processes and gathering results of the computing operation from the plurality of processes. An exemplary method includes the operations of pairing a plurality of processes such that each process has at most one interaction partner, selecting half of the data located at a process, dividing the selected half of the data into a plurality of data segments, transmitting a first data segment resulting from the dividing operation from the process to the interaction partner of the process, receiving a second data segment at the process from the interaction partner, concurrently with the transmitting and receiving operations, performing a computing operation on a third data segment previously received from a previous interaction partner and a fourth data segment from the data segments, and iterating over the transmitting, receiving and computing operations until all the data segments have been exchanged.

Type: Grant

Filed: October 9, 2011

Date of Patent: September 3, 2013

Assignee: International Business Machines Corporation

Inventor: Bin Jia
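The pairing idea in patent 8607238's neighbour above, patent 8527739, resembles the well-known recursive-halving reduce-scatter, sketched below in Python as a swapped-in, simplified technique rather than the patented method. Each round pairs every process with exactly one partner (`r ^ d`), each keeps half of its current range and "sends" the other half. The per-segment pipelining that overlaps computation with communication is not modeled, and all processes are simulated in lockstep within one Python process. It assumes a power-of-two process count and a data length divisible by it.

```python
def recursive_halving_reduce_scatter(data):
    """Each of p = 2**k simulated processes ends up owning one fully summed slice.

    data: list of p equal-length lists; the length must be divisible by p.
    Returns, per process, (start_index, reduced_slice).
    """
    p, n = len(data), len(data[0])
    assert p & (p - 1) == 0 and n % p == 0
    bufs = [list(v) for v in data]
    lo, hi = [0] * p, [n] * p  # index range each process is still responsible for
    d = p // 2
    while d >= 1:
        new_bufs = [b[:] for b in bufs]
        new_lo, new_hi = lo[:], hi[:]
        for r in range(p):
            partner = r ^ d  # exactly one interaction partner per round
            mid = (lo[r] + hi[r]) // 2
            # Keep one half; the partner keeps (and receives data for) the other.
            keep_lo, keep_hi = (lo[r], mid) if r & d == 0 else (mid, hi[r])
            for i in range(keep_lo, keep_hi):
                new_bufs[r][i] = bufs[r][i] + bufs[partner][i]
            new_lo[r], new_hi[r] = keep_lo, keep_hi
        bufs, lo, hi = new_bufs, new_lo, new_hi
        d //= 2
    return [(lo[r], bufs[r][lo[r]:hi[r]]) for r in range(p)]
```

After log2(p) rounds each process holds n/p fully reduced elements; a subsequent gather (for example, the same pairing run in reverse) would give every process the whole result.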
NETWORK ON CHIP PROCESSOR WITH MULTIPLE CORES AND ROUTING METHOD THEREOF

Publication number: 20130219148

Abstract: An exemplary embodiment of the present disclosure illustrates a network-on-chip processor including multiple cores and a Kautz NoC. Each core is assigned an addressing string of L base-D words in which no two neighboring words are identical, where L, the length of the addressing string, is an integer greater than 1, and D, the number of possible word values, is an integer greater than 2. Each core is unidirectionally linked to (D-1) other cores through the Kautz NoC, and for any two connected cores, the last (L-1) words of the addressing string of one core are the same as the first (L-1) words of the addressing string of the other core.

Type: Application

Filed: August 30, 2012

Publication date: August 22, 2013

Applicant: NATIONAL TAIWAN UNIVERSITY

Inventors: Liang-Gee Chen, Chuan-Yung Tsai
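The Kautz addressing rule in application 20130219148 can be made concrete with a short Python sketch that enumerates addresses and one-hop links. The abstract does not describe the routing method, so `kautz_route` uses the standard shift-in routing for Kautz graphs (reuse the longest suffix of the source address that is a prefix of the destination, then shift in the remaining words) as an assumption; the function names are hypothetical.

```python
from itertools import product

def kautz_nodes(D, L):
    """All length-L addresses over D word values with no two neighbouring words equal."""
    return [s for s in product(range(D), repeat=L)
            if all(a != b for a, b in zip(s, s[1:]))]

def kautz_successors(addr, D):
    """One-hop targets: drop the first word and append any word different from the last."""
    return [addr[1:] + (w,) for w in range(D) if w != addr[-1]]

def kautz_route(src, dst):
    """Shift-in routing (assumed): overlap the longest suffix of src with a prefix of dst."""
    L = len(src)
    k = next(k for k in range(L, -1, -1) if src[L - k:] == dst[:k])
    path, cur = [src], src
    for w in dst[k:]:
        cur = cur[1:] + (w,)  # each hop follows one unidirectional Kautz link
        path.append(cur)
    return path
```

With D word values and length L there are D(D-1)^(L-1) cores, each with D-1 outgoing links, and any route needs at most L hops.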
Self-similar processing network

Patent number: 8504800

Abstract: Self-similar processing by unit processing cells may together solve a problem. A unit processing cell may include a processor, a memory and a plurality of Input/Output (IO) channels coupled to the processor. The memory may include a dictionary having one or more instructions that configure the processor to perform at least one function. The plurality of IO channels may be used to communicably couple the unit processing cell with a plurality of other unit processing cells each including their own respective dictionary. The unit processing cell and the plurality of other unit processing cells may be independent of one another and may perform together without a centralized control. The processor may update the dictionary so that the unit processing cell builds a different dictionary from the plurality of other unit processing cells, thereby being self-similar to the plurality of other unit processing cells.

Type: Grant

Filed: September 21, 2010

Date of Patent: August 6, 2013

Assignee: Hilbert Technology, Inc.

Inventor: Bjorn J. Gruenwald
PROCESSOR CONTROL APPARATUS AND METHOD THEREFOR

Publication number: 20130191613

Abstract: Whether each of a plurality of processor cores is in a suspend state or operation state is detected. The processor utilization of a processor core of interest in the operation state is acquired. The number of processes assigned to the processor core of interest is obtained. The stop control or startup control of a processor core is performed based on the suspend state or operation state, the processor utilization, and the number of processes.

Type: Application

Filed: December 12, 2012

Publication date: July 25, 2013

Applicant: CANON KABUSHIKI KAISHA

Inventor: CANON KABUSHIKI KAISHA
INTERFERENCE-DRIVEN RESOURCE MANAGEMENT FOR GPU-BASED HETEROGENEOUS CLUSTERS

Publication number: 20130191612

Abstract: Systems and methods are disclosed that share coprocessor resources between two or more applications in a computing cluster using a job selector to receive jobs from a job queue; a node selector coupled to the job selector; an off line profiler with an interference prediction model; a coprocessor dynamic interference detection module; and a coprocessor interference response module.

Type: Application

Filed: October 6, 2012

Publication date: July 25, 2013

Applicant: NEC LABORATORIES AMERICA, INC.

Inventors: Cheng-Hong Li, Srihari Cadambi, Srimat T. Chakradhar, Rajat Phull
Performing a deterministic reduction operation in a compute node organized into a branched tree topology

Patent number: 8489859

Abstract: Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data.

Type: Grant

Filed: May 28, 2010

Date of Patent: July 16, 2013

Assignee: International Business Machines Corporation

Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Patent number: 8484440

Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes; iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for that node, a global allreduce operation using contribution data for the cores assigned to that ring or any global allreduce results from previous global allreduce operations, yielding current global allreduce results for each core; and performing, for each node, a local allreduce operation using the global allreduce results.

Type: Grant

Filed: May 21, 2008

Date of Patent: July 9, 2013

Assignee: International Business Machines Corporation

Inventor: Ahmad Faraj
MULTIPROCESSOR SYSTEM AND SYNCHRONOUS ENGINE DEVICE THEREOF

Publication number: 20130166879

Abstract: The invention discloses a multiprocessor system and a synchronous engine device thereof.

Type: Application

Filed: August 30, 2011

Publication date: June 27, 2013

Inventors: Ninghui Sun, Fei Chen, Zheng Cao, Kai Wang, Xuejun An
Dynamic priority queuing

Patent number: 8468534

Abstract: Techniques are provided for dynamically re-ordering operation requests that have previously been submitted to a queue management unit. After the queue management unit has placed multiple requests in a queue to be executed in an order that is based on priorities that were assigned to the operations, the entity that requested the operations (the “requester”) sends one or more priority-change messages. The one or more priority-change messages include requests to perform operations that have already been queued. For at least one of the operations, the priority assigned to the operation in the subsequent request is different from the priority that was assigned to the same operation when that operation was initially queued for execution. Based on the change in priority, the operation whose priority has changed is placed at a different location in the queue, relative to the other operations in the queue that were requested by the same requester.
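
A minimal Python sketch of re-queuing on a priority-change message follows. It assumes that a lower number means higher priority and that a resubmitted operation keeps its original submission sequence as a tie-breaker; the class `PriorityQueueManager` and its methods are hypothetical.

```python
import itertools

class PriorityQueueManager:
    """Queue of (requester, operation) entries; lower priority value runs first."""

    def __init__(self):
        self._entries = {}             # (requester, op) -> (priority, seq)
        self._seq = itertools.count()  # tie-breaker: earlier submissions first

    def submit(self, requester, op, priority):
        key = (requester, op)
        if key in self._entries:
            # Priority-change message for an already-queued operation:
            # reposition it instead of adding a duplicate entry.
            _, seq = self._entries[key]
        else:
            seq = next(self._seq)
        self._entries[key] = (priority, seq)

    def order(self):
        # Current execution order, sorted by (priority, seq).
        return [key for key, _ in sorted(self._entries.items(), key=lambda kv: kv[1])]

    def pop(self):
        key = self.order()[0]
        del self._entries[key]
        return key

q = PriorityQueueManager()
for op in ("read A", "read B", "read C"):
    q.submit("app", op, priority=5)
q.submit("app", "read C", priority=1)   # priority-change message
print(q.order())
```

Keeping the original sequence number means that if a priority is later restored, the operation returns to its old position among equal-priority peers; assigning a fresh sequence number would be an equally valid design choice.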

Type: Grant

Filed: April 5, 2010

Date of Patent: June 18, 2013

Assignee: Apple Inc.

Inventor: Brian R. Tunning
MULTI-CORE PROCESSOR

Publication number: 20130151814

Abstract: A multi-core processor includes a monitored processor core whose process result is to be monitored; a monitoring processor core group including two or more monitoring processors which can perform a process for monitoring the monitored processor core; an evaluating part configured to evaluate a processing load of the monitoring processor core group; and a controlling part configured to make the monitoring processor core group perform the process for monitoring the monitored processor core in a distributed manner if the processing load of the monitoring processor core group evaluated by the evaluating part is low, and make the monitoring processor of the monitoring processor core group perform the process for monitoring the monitored processor core if the processing load of the monitoring processor core group evaluated by the evaluating part is high, the monitoring processor performing a process whose priority is relatively low.
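
The load-dependent decision in this abstract can be sketched in a few lines of Python. The averaging of loads, the 0.5 threshold, and the convention that a larger number means a more important process are all assumptions of this sketch; `plan_monitoring` is a hypothetical name.

```python
def plan_monitoring(monitor_loads, monitor_priorities, threshold=0.5):
    """Return the monitoring cores that should run the monitoring process.

    monitor_loads: dict core -> utilization in [0, 1]
    monitor_priorities: dict core -> priority of the process it is running
        (larger number = more important, an assumption of this sketch)
    threshold: hypothetical cut-off between "low" and "high" group load
    """
    group_load = sum(monitor_loads.values()) / len(monitor_loads)
    if group_load < threshold:
        # Low load: spread the monitoring work across the whole group.
        return sorted(monitor_loads)
    # High load: give it to the single core whose current process has the
    # lowest priority, so that more important work is disturbed least.
    return [min(monitor_priorities, key=monitor_priorities.get)]

print(plan_monitoring({"c1": 0.2, "c2": 0.3}, {"c1": 5, "c2": 1}))
print(plan_monitoring({"c1": 0.9, "c2": 0.8}, {"c1": 5, "c2": 1}))
```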

Type: Application

Filed: December 13, 2011

Publication date: June 13, 2013

Applicant: Toyota Jidosha Kabushiki Kaisha

Inventor: Koji Ueda
Method and apparatus for computing massive spatio-temporal correlations using a hybrid CPU-GPU approach

Patent number: 8464026

Abstract: A CPU may select a variable from a variable set as a dependent variable. The variable set may be part of a data structure that includes a plurality of vector values, a vector value being associated with a variable set of n variables, and each variable of the variable set having a variable value. The number of dependent variable steps for the dependent variable may be determined. The number of vector values in a dependent variable step is determined as the number of independent variables. A function is mapped to a plurality of thread processors, and each thread processor is assigned so that the function is performed on each one of the independent variables for each of the dependent variable steps.

Type: Grant

Filed: February 17, 2010

Date of Patent: June 11, 2013

Assignee: International Business Machines Corporation

Inventors: Rajesh Ramkrishna Bordawekar, Ravishankar Rao
Performing a local reduction operation on a parallel computer

Patent number: 8458244

Abstract: A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
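
A simplified Python sketch of splitting a local reduction across two cores by interleaved chunks follows. It assumes the four input buffers (two reduction cores, network write core, network read core) have equal length and that the reduction is an elementwise sum; the shared-memory copies are not modeled, and `local_reduce` is a hypothetical name. Each loop iteration over `core` stands for one core's share, which hardware would run in parallel.

```python
def local_reduce(buffers, chunk, num_cores=2):
    """Elementwise sum of equal-length buffers.

    Reduction core k handles chunks k, k + num_cores, ..., which is
    "every other chunk" when num_cores == 2.
    """
    n = len(buffers[0])
    if any(len(b) != n for b in buffers):
        raise ValueError("buffers must have equal length")
    result = [0] * n
    for core in range(num_cores):  # one iteration = one core's share of the work
        for start in range(core * chunk, n, num_cores * chunk):
            for i in range(start, min(start + chunk, n)):
                result[i] = sum(b[i] for b in buffers)
    return result

buffers = [list(range(10)), [1] * 10, [2] * 10, [10] * 10]
print(local_reduce(buffers, chunk=3))
```

With a chunk size of 3, core 0 reduces indices 0-2 and 6-8 while core 1 reduces 3-5 and 9, so together they cover the whole buffer without overlap.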

Type: Grant

Filed: August 15, 2012

Date of Patent: June 4, 2013

Assignee: International Business Machines Corporation

Inventors: Michael A. Blocksome, Daniel A. Faraj
METHOD AND APPARATUS FOR PACKET PROCESSING AND A PREPROCESSOR

Publication number: 20130138920

Abstract: An apparatus for packet processing is provided. The apparatus is to be implemented in a server and includes: a preprocessor and at least two processors which are respectively connected with the preprocessor. The preprocessor is to classify packets received externally from the server, and to distribute the classified packets to the respective processors, wherein packets in a same flow are distributed to a same processor. Each of the processors is to receive and process a packet distributed by the preprocessor.
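
Keeping a flow on one processor is commonly done by hashing the flow identifier. The Python sketch below assumes the classic 5-tuple as the flow key and CRC-32 as the hash; neither is stated in the abstract, and `flow_key` and `pick_processor` are hypothetical names.

```python
import zlib

def flow_key(pkt):
    # Classic 5-tuple: packets that share it belong to the same flow.
    return (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"], pkt["proto"])

def pick_processor(pkt, num_processors):
    # crc32 is stable across runs, unlike Python's salted str hash().
    key = "|".join(map(str, flow_key(pkt))).encode()
    return zlib.crc32(key) % num_processors

p1 = {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 4321,
      "dst_port": 80, "proto": "tcp", "payload": b"GET /"}
p2 = dict(p1, payload=b"more data")   # same flow, different payload
print(pick_processor(p1, 4), pick_processor(p2, 4))
```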

Type: Application

Filed: August 11, 2011

Publication date: May 30, 2013

Applicant: Hangzhou H3C Technologies, Co., Ltd.

Inventor: Changzhong Ge
Workflow control of reservations and regular jobs using a flexible job scheduler

Patent number: 8453152

Abstract: A scheduler receives at least one flexible reservation request for scheduling in a computing environment comprising consumable resources. The flexible reservation request specifies a duration and at least one required resource. The consumable resources comprise at least one machine resource and at least one floating resource. The scheduler creates a flexible job for the at least one flexible reservation request and places the flexible job in a prioritized job queue for scheduling, wherein the flexible job is prioritized relative to at least one regular job in the prioritized job queue. The scheduler adds a reservation set to a waiting state for the at least one flexible reservation request.
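
A minimal Python sketch of wrapping a reservation in a "flexible job" that competes with regular jobs in one priority queue follows. It assumes a lower number means higher priority and uses the hypothetical states `WAITING` and `ACTIVE`; the class names and the example resources (a node name and a software license as the floating resource) are illustrative only.

```python
import heapq
import itertools
from dataclasses import dataclass, field

@dataclass(order=True)
class Job:
    priority: int                      # lower value = dispatched earlier
    seq: int                           # FIFO tie-breaker
    name: str = field(compare=False)
    duration: int = field(compare=False)
    resources: tuple = field(compare=False)
    flexible: bool = field(compare=False, default=False)

class Scheduler:
    def __init__(self):
        self.queue = []
        self.reservations = {}          # reservation name -> state
        self._seq = itertools.count()

    def submit_job(self, name, priority, duration, resources):
        heapq.heappush(self.queue, Job(priority, next(self._seq), name, duration, resources))

    def request_flexible_reservation(self, name, priority, duration, resources):
        # The reservation becomes a flexible job in the same prioritized queue.
        job = Job(priority, next(self._seq), name, duration, resources, flexible=True)
        heapq.heappush(self.queue, job)
        self.reservations[name] = "WAITING"

    def dispatch(self):
        job = heapq.heappop(self.queue)
        if job.flexible:
            self.reservations[job.name] = "ACTIVE"
        return job.name

s = Scheduler()
s.submit_job("etl", priority=5, duration=60, resources=("node01",))
s.request_flexible_reservation("maint", priority=2, duration=30,
                               resources=("node02", "license:compiler"))
print(s.reservations, s.dispatch(), s.reservations)
```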

Type: Grant

Filed: February 1, 2011

Date of Patent: May 28, 2013

Assignee: International Business Machines Corporation

Inventors: Alexander Druyan, Wei Li, Kailash N. Marthi, Yun T. Xiang, Linda C. Cham
Information processing device, information processing method, and recording medium

Patent number: 8448174

Abstract: An information processing device which has a plurality of process units for performing various kinds of processes includes a detecting unit that detects the processing loads of the process units; a determining unit that determines whether a total amount of the processing loads detected by the detecting unit is equal to or larger than a specific value; a designating unit that designates a process unit having a process state to be controlled, based on the processing loads of the process units detected by the detecting unit, when the determining unit determines that the total amount is equal to or larger than the specific value; a process identifying unit that identifies a process having an execution state to be controlled among processes being performed by the process unit designated by the designating unit; and a control unit that controls the execution state of the process identified by the process identifying unit.
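
The detect, determine, designate, and identify chain can be sketched in Python as below. It assumes the busiest unit is designated and its heaviest process is chosen; the caller would then suspend or deprioritize that process. `select_process_to_control` and the load figures are hypothetical.

```python
def select_process_to_control(loads, processes, limit):
    """loads: dict unit -> load; processes: dict unit -> list of (pid, cpu_share)."""
    if sum(loads.values()) < limit:
        return None                                   # total load acceptable: no control
    unit = max(loads, key=loads.get)                  # designate the busiest process unit
    pid, _ = max(processes[unit], key=lambda p: p[1]) # heaviest process on that unit
    return unit, pid

loads = {"u0": 0.7, "u1": 0.9}
processes = {"u0": [(10, 0.5)], "u1": [(20, 0.3), (21, 0.6)]}
print(select_process_to_control(loads, processes, limit=1.5))
```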

Type: Grant

Filed: January 22, 2010

Date of Patent: May 21, 2013

Assignee: Fujitsu Limited

Inventors: Ryo Miyamoto, Ryuichi Matsukura, Takashi Ohno
Microprocessor with first processor for debugging second processor

Patent number: 8443175

Abstract: A microprocessor integrated circuit includes first and second processors, an internal memory accessible by the first and second processors, and a bus interface unit configured to interface to a bus external to the microprocessor for providing access to a memory external to the microprocessor. The bus interface unit, external bus, and external memory are accessible by the second processor but are inaccessible by the first processor. The first processor writes debug information to the internal memory. The first processor detects an event and provides a notification of the event to the second processor. The second processor, coupled to the bus interface unit, executes microcode in response to the event notification received from the first processor. The microcode reads the debug information from the internal memory and writes the debug information to the external memory via the bus interface unit and external bus for use in debugging the second processor.

Type: Grant

Filed: March 29, 2010

Date of Patent: May 14, 2013

Assignee: VIA Technologies, Inc.

Inventors: G. Glenn Henry, Jui-Shuan Chen
COPROCESSOR HAVING TASK SEQUENCE CONTROL

Publication number: 20130117533

Abstract: A coprocessor has: a processing unit for processing tasks in a data-processing system subject to at least one master processor; at least one storage module having memory areas, assignable in each case to the tasks, for storing data assigned to the tasks; and a buffer area for buffering instructions assigned to the tasks, the instructions including processing instructions, and upon retrieval of the processing instructions from the buffer area, the data stored in the storage module being processed on the basis of the processing instructions.

Type: Application

Filed: April 6, 2011

Publication date: May 9, 2013

Inventor: Jan Hayek
Main processing element for delegating virtualized control threads controlling clock speed and power consumption to groups of sub-processing elements in a system such that a group of sub-processing elements can be designated as pseudo main processing element

Patent number: 8438404

Abstract: The disclosure is applied to a generic microprocessor architecture with a set (e.g., one or more) of controlling elements (e.g., MPEs) and a set of groups of sub-processing elements (e.g., SPEs). Under this arrangement, MPEs and SPEs are organized in a way that a smaller number of MPEs control the behavior of a group of SPEs using program code embodied as a set of virtualized control threads. The arrangement also enables MPEs to delegate functionality to one or more groups of SPEs such that those group(s) of SPEs will act as pseudo MPEs. The pseudo MPEs will utilize pseudo virtualized control threads to control the behavior of other groups of SPEs. In a typical embodiment, the apparatus includes a MCP coupled to a power supply coupled with cores to provide a supply voltage to each core (or core group) and controlling-digital elements and multiple instances of sub-processing elements.

Type: Grant

Filed: September 30, 2008

Date of Patent: May 7, 2013

Assignee: International Business Machines Corporation

Inventors: Karl J. Duvalsaint, Harm P. Hofstee, Daeik Kim, Moon J. Kim
CHARACTERIZATION AND VALIDATION OF PROCESSOR LINKS

Publication number: 20130103927

Abstract: A processor link that couples a first processor and a second processor is selected for validation and a plurality of communication parameter settings associated with the first and the second processors is identified. The first and the second processors are successively configured with each of the communication parameter settings. One or more test data pattern(s) are provided from the first processor to the second processor in accordance with the communication parameter setting. Performance measurements associated with the selected processor link and with the communication parameter setting are determined based, at least in part, on the test data pattern as received at the second processor. One of the communication parameter settings that is associated with the highest performance measurements is selected. The selected communication parameter setting is applied to the first and the second processors for subsequent communication between the first and the second processors via the processor link.
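
The sweep-and-select procedure can be sketched in Python. This sketch assumes that the "performance measurement" is the bit-error count of the received test pattern (fewer errors is better), and it replaces real hardware with a hypothetical simulated link, `fake_link`, whose error rate grows with distance from an ideal setting of 3.

```python
def characterize_link(settings, send_pattern, pattern):
    """Try each parameter setting, count bit errors, keep the best setting."""
    best_setting, best_errors = None, None
    for setting in settings:
        received = send_pattern(setting, pattern)
        errors = sum(bin(a ^ b).count("1") for a, b in zip(pattern, received))
        if best_errors is None or errors < best_errors:
            best_setting, best_errors = setting, errors
    return best_setting, best_errors

def fake_link(setting, pattern):
    # Hypothetical link model: the further the setting is from 3,
    # the more low-order bits of each byte are flipped.
    mask = (1 << abs(setting - 3)) - 1
    return [b ^ mask for b in pattern]

print(characterize_link(range(7), fake_link, [0xAA, 0x55, 0xFF]))
```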

Type: Application

Filed: October 25, 2011

Publication date: April 25, 2013

Applicant: International Business Machines Corporation

Inventors: Robert W. Berry, JR., Anand Haridass, Prasanna Jayaraman
Method, Apparatus, And System For Optimizing Frequency And Performance In A Multidie Microprocessor

Publication number: 20130103928

Abstract: With the progress toward multi-core processors, each core cannot readily ascertain whether the other dies are idle or active. A proposal for utilizing an interface to transmit core status among multiple cores in a multi-die microprocessor is discussed. Consequently, this facilitates thermal management by allowing performance and frequency to be set optimally based on the status of each core.

Type: Application

Filed: December 11, 2012

Publication date: April 25, 2013

Inventors: Jose P. Allarey, Varghese George, Sanjeev Jahagirdar, Oren Lamdan, Nathan Ofer, Tomer Ziv
UNIFIED, WORKLOAD-OPTIMIZED, ADAPTIVE RAS FOR HYBRID SYSTEMS

Publication number: 20130097407

Abstract: A method, system, and computer program product for maintaining reliability in a computer system. In an example embodiment, the method includes managing workloads on a first processor with a first processor architecture by an agent process executing on a second processor with a second processor architecture. The method proceeds by activating redundant computation on the second processor by the agent process. The method continues by performing a same computation from a workload of the workloads at least twice. Finally, the method includes comparing results of the same computation. In this embodiment the first processor is coupled to the second processor by a network, and the first processor architecture and second processor architecture are different architectures.
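
The "compute at least twice and compare" step reduces to a short Python sketch. It runs the copies locally rather than on a second, networked processor, and it treats any disagreement as an error; `run_redundantly` is a hypothetical name.

```python
def run_redundantly(compute, data, copies=2):
    """Run the same computation `copies` times and check that the results agree."""
    results = [compute(data) for _ in range(copies)]
    if any(r != results[0] for r in results[1:]):
        raise RuntimeError(f"redundant results disagree: {results}")
    return results[0]

print(run_redundantly(sum, [1, 2, 3]))
```

A real deployment would also decide what to do on a mismatch (retry, vote among three copies, or flag the node); this sketch only detects it.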

Type: Application

Filed: December 8, 2012

Publication date: April 18, 2013

Applicant: International Business Machines Corporation

Inventor: International Business Machines Corporation
CLUSTER COMPUTING USING SPECIAL PURPOSE MICROPROCESSORS

Publication number: 20130097406

Abstract: In some embodiments, a computer cluster system comprises a plurality of nodes and a software package comprising a user interface and a kernel for interpreting program code instructions. In certain embodiments, a cluster node module is configured to communicate with the kernel and other cluster node modules. The cluster node module can accept instructions from the user interface and can interpret at least some of the instructions such that several cluster node modules in communication with one another and with a kernel can act as a computer cluster.

Type: Application

Filed: March 16, 2012

Publication date: April 18, 2013

Inventors: Zvi Tannenbaum, Dean E. Dauger
Sequential processing in network on chip nodes by threads generating message containing payload and pointer for nanokernel to access algorithm to be executed on payload in another node

Patent number: 8423749

Abstract: A computer-implemented method, system and computer program product for controlling an algorithm that is performed on a unit of work in a subsequent software pipeline stage in a Network On a Chip (NOC) is presented. In one embodiment, the method executes a first operation in a first node of the NOC. The first node generates payload, and then loads that payload into a message. The message with the payload is transmitted to a nanokernel that controls a second node in the NOC. The nanokernel calls an algorithm that is needed by a second operation in a second node in the NOC, which uses the algorithm to execute the second operation.
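
A minimal Python sketch of a message that carries both a payload and a reference to the algorithm the next stage should run follows. A string key into a lookup table stands in for the pointer, and `Message`, `ALGORITHMS`, `first_node`, and `nanokernel` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Message:
    payload: object
    algorithm_id: str  # stands in for the pointer carried in the message

ALGORITHMS = {
    "square_all": lambda xs: [x * x for x in xs],
    "total": sum,
}

def first_node(raw):
    payload = [x + 1 for x in raw]          # first pipeline operation
    return Message(payload, "square_all")   # tell the next stage what to run

def nanokernel(msg):
    algorithm = ALGORITHMS[msg.algorithm_id]  # resolve the "pointer"
    return algorithm(msg.payload)             # second node runs it on the payload

print(nanokernel(first_node([1, 2, 3])))
```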

Type: Grant

Filed: October 22, 2008

Date of Patent: April 16, 2013

Assignee: International Business Machines Corporation

Inventors: Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
PARALLEL COMPUTER ARCHITECTURE FOR COMPUTATION OF PARTICLE INTERACTIONS

Publication number: 20130091341

Abstract: A computation system for computing interactions in a multiple-body simulation includes an array of processing modules arranged into one or more serially interconnected processing groups of the processing modules. Each of the processing modules includes storage for data elements and includes circuitry for performing pairwise computations between data elements each associated with a spatial location. Each of the pairwise computations makes use of a data element from the storage of the processing module and a data element passing through the serially interconnected processing modules. Each of the processing modules includes circuitry for selecting the pairs of data elements according to separations between spatial locations associated with the data elements.
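
The streaming pair selection can be sketched sequentially in Python. Each inner list of `stored` plays the role of one processing module's storage, and every element of `streamed` visits the modules in turn; the cutoff test stands in for the separation-based selection circuitry. In hardware the modules would work concurrently on different streamed elements. `pipeline_pairs` is a hypothetical name, and the sketch requires Python 3.8+ for `math.dist`.

```python
import math

def pipeline_pairs(stored, streamed, cutoff):
    """Return (module, stored_index, streamed_index) for pairs within cutoff."""
    pairs = []
    for s_idx, p in enumerate(streamed):      # each streamed element...
        for m, module in enumerate(stored):   # ...passes through modules serially
            for k, q in enumerate(module):
                if math.dist(p, q) <= cutoff: # select pairs by separation
                    pairs.append((m, k, s_idx))
    return pairs

stored = [[(0.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)]]
streamed = [(1.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
print(pipeline_pairs(stored, streamed, cutoff=1.0))
```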

Type: Application

Filed: November 19, 2012

Publication date: April 11, 2013

Applicant: D.E. Shaw Research LLC

Inventors: David E. Shaw, Martin M. Deneroff, Ron O. Dror, Richard H. Larson, John K. Salmon
Assigning different serialization identifier to operations on different data set for execution in respective processor in multi-processor system

Patent number: 8417919

Abstract: A method of dynamic parallelization in a multi-processor identifies potentially independent computational operations, such as functions and methods, with a serializer that assigns a computational operation to a serialization set and a processor based on assessment of the data that the computational operation will be accessing upon execution.
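
A minimal Python sketch of the serializer idea follows. It assumes that each operation declares the one data object it will access: operations on the same object share a serialization set and therefore run, in program order, on the same processor, while different sets may run in parallel. The round-robin mapping from set to processor and the name `assign` are hypothetical.

```python
def assign(operations, num_processors):
    """operations: list of (name, data_object_id) in program order."""
    set_of = {}                                   # data object -> serialization set id
    plan = {p: [] for p in range(num_processors)}
    for name, data_id in operations:
        ser_set = set_of.setdefault(data_id, len(set_of))  # new id per data object
        plan[ser_set % num_processors].append(name)
    return plan

ops = [("a1", "A"), ("b1", "B"), ("a2", "A"), ("c1", "C")]
print(assign(ops, 2))
```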

Type: Grant

Filed: August 18, 2009

Date of Patent: April 9, 2013

Assignee: Wisconsin Alumni Research Foundation

Inventors: Matthew Allen, Gurindar S. Sohi
Distributed Data Scalable Adaptive Map-Reduce Framework

Publication number: 20130086355

Abstract: A method, an apparatus and an article of manufacture for generating a distributed data scalable adaptive map-reduce framework for at least one multi-core cluster. The method includes partitioning a cluster into at least one computational group, determining at least one key-group leader within each computational group, performing a local combine operation at each computational group, performing a global combine operation at each of the at least one key-group leader within each computational group based on a result from the local combine operation, and performing a global map-reduce operation across the at least one key-group leader within each computational group.
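
The two-level combine can be illustrated with a simplified Python word count. This sketch partitions nodes into fixed-size groups, performs a local combine per group, and then routes each key to one of `num_leaders` hypothetical key owners chosen by CRC-32 hash. For brevity it collapses the per-group key-group leaders into one set of key owners, so it shows the structure of the idea rather than the claimed method.

```python
from collections import Counter
import zlib

def grouped_word_count(nodes, group_size, num_leaders):
    # 1. Partition the cluster's nodes into computational groups.
    groups = [nodes[i:i + group_size] for i in range(0, len(nodes), group_size)]
    # 2. Local combine: merge the map output of all nodes in each group.
    local = [sum((Counter(words) for words in group), Counter()) for group in groups]
    # 3. Global combine: leader j owns the keys whose hash maps to j.
    leaders = [Counter() for _ in range(num_leaders)]
    for partial in local:
        for key, count in partial.items():
            leaders[zlib.crc32(key.encode()) % num_leaders][key] += count
    # 4. The leaders' disjoint key sets together form the final result.
    result = Counter()
    for leader in leaders:
        result.update(leader)
    return result

nodes = [["a", "b"], ["a"], ["b", "c"], ["a", "c", "c"]]
print(grouped_word_count(nodes, group_size=2, num_leaders=2))
```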

Type: Application

Filed: September 30, 2011

Publication date: April 4, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ankur Narang, Jyothish Soman