Operation Patents (Class 712/30)
-
Publication number: 20140006750
Abstract: A three-dimensional (3-D) processor system includes a first processor chip and a second processor chip in a stacked configuration. The first processor chip includes a first processor having a first set of state registers. The second processor chip includes a second processor having a second set of state registers that corresponds to the first set of state registers. The first and second processors are connected through vertical connections between the first and second processor chips. A mode control circuit operates the processor system in one of a plurality of operating modes. In one mode of operation, the first processor is active and the second processor is inactive, and the first processor operates at a speed greater than a maximum safe speed of the first processor, and the first processor uses the second set of state registers of the second processor to checkpoint a state of the first processor.
Type: Application
Filed: September 4, 2012
Publication date: January 2, 2014
Applicant: International Business Machines Corporation
Inventors: Alper Buyuktosunoglu, Philip G. Emma, Allan M. Hartstein, Michael B. Healy, Krishnan K. Kailas
-
Patent number: 8607238
Abstract: Aspects of the present invention reduce a lock wait time in a distributed processing environment. A plurality of wait-for dependencies between a first plurality of transactions and a second plurality of transactions in a distributed processing environment is identified. The first plurality of transactions waits for the second plurality of transactions to release a plurality of locks on a plurality of shared resources. An amount of time the first plurality of transactions will wait for the second plurality of transactions in the distributed processing environment is determined based on the plurality of wait-for dependencies between the first plurality of transactions and the second plurality of transactions. Historical transaction data related to the plurality of wait-for dependencies between the first plurality of transactions and the second plurality of transactions is analyzed.
Type: Grant
Filed: July 8, 2011
Date of Patent: December 10, 2013
Assignee: International Business Machines Corporation
Inventors: Abhinay Ravinder Nagpal, Sri Ramanathan, Sandeep Ramesh Patil, Matthew Bunkley Trevathan
-
Publication number: 20130326191
Abstract: The invention relates to tightly coupled multiprocessor distributed computing systems. The proposed solution makes it possible to develop distributed applications as ordinary monolithic applications using typical compilers and builders. These applications support complex interaction logic between elements executed on different nodes while keeping development complexity limited. The invention defines requirements for a distributed application and a method for its execution, memory organization, and the manner of system node interaction.
Type: Application
Filed: October 4, 2011
Publication date: December 5, 2013
Inventor: Alexander Yakovlevich Bogdanov
-
Patent number: 8601237
Abstract: Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data.
Type: Grant
Filed: November 9, 2012
Date of Patent: December 3, 2013
Assignee: International Business Machines Corporation
Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
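The point of the predefined acknowledgement order in the abstract above can be illustrated with floating-point addition, which is not associative. The following is a minimal sketch, not the patented implementation: the function names, the use of ascending rank as the predefined order, and the sample values are all assumptions made for illustration.

```python
from itertools import permutations

def naive_reduce(contributions, arrival_order):
    """Sum contributions in whatever order they happen to arrive."""
    total = 0.0
    for rank in arrival_order:
        total += contributions[rank]
    return total

def deterministic_reduce(contributions, arrival_order):
    """Root-side sketch: dummy data may arrive in any order, but real
    contributions are only accepted in a predefined order."""
    # Phase 1: dummy contributions arrive in any order; the root only
    # records which processors are ready.
    ready = set(arrival_order)
    # Phase 2: the root acknowledges in a predefined order (ascending rank
    # is an assumption here) and reduces real data in that same order.
    total = 0.0
    for rank in sorted(ready):
        total += contributions[rank]
    return total

# Hypothetical contributions chosen so that rounding depends on the order.
values = {0: 1e16, 1: 1.0, 2: -1e16, 3: 1.0}

naive_results = {naive_reduce(values, p) for p in permutations(values)}
deterministic_results = {deterministic_reduce(values, p) for p in permutations(values)}
```

With these values the arrival-order sum yields more than one distinct result across permutations, while the fixed-order reduction yields exactly one.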
-
Publication number: 20130318325
Abstract: In one example, a composite processor (100) includes a circuit board (1200), a first processor element package (1230), and a second processor element package (1240). The circuit board has an optical link (1211) and an electrical link (1221). The first processor element package (1230) includes a substrate (1231) with an integrated circuit (240), a sub-wavelength grating optical coupler (1232), and an electrical coupler (1233) coupled to the electrical link (1221) of the circuit board (1200). The second processor element package (1240) includes a substrate (1241) with an integrated circuit (240), a sub-wavelength grating optical coupler (1242), and an electrical coupler (1243) coupled to the electrical link (1221) of the circuit board (1200).
Type: Application
Filed: January 20, 2011
Publication date: November 28, 2013
Inventors: Raymond G. Beausoleil, Marco Fiorentino, Moray McLaren, Greg Astfalk, Nathan Lorenzo Binkert, David A. Fattal
-
Patent number: 8588555
Abstract: This invention provides a computer processor architecture optimized for power-efficient computation of certain sensory recognition (e.g. vision) algorithms on a single computer chip. Illustratively, the architecture is optimized to carry out low-level routines and a special class of high-level sensory recognition routines derived from research into human brain perception processes. In an illustrative embodiment, the processor includes a plurality of processing nodes, arranged in a hierarchy of layers, and the processor resolves features from sensory information input and provides the feature information as input to a lowest hierarchy layer thereof. The hierarchy simultaneously recognizes multiple components of the features, which are transferred between the layers so as to build likely recognition candidates. Each node can further include memory constructed and arranged to refresh and retain features determined to be likely recognition candidates by a thresholding process.
Type: Grant
Filed: June 11, 2010
Date of Patent: November 19, 2013
Assignee: Cognitive Electronics, Inc.
Inventors: Andrew C. Felch, Richard H. Granger
-
Publication number: 20130305011
Abstract: In one embodiment, the present invention includes a method for receiving incoming data in a processor and performing a checksum operation on the incoming data in the processor pursuant to a user-level instruction for the checksum operation. For example, a cyclic redundancy checksum may be computed in the processor itself responsive to the user-level instruction. Other embodiments are described and claimed.
Type: Application
Filed: July 12, 2013
Publication date: November 14, 2013
Inventors: Steven R. King, Frank L. Berry, Michael E. Kounavis
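For context, the software loop that a hardware checksum instruction would stand in for looks roughly like the sketch below. It is an illustration only: the abstract does not name a polynomial, so the reflected IEEE 802.3 polynomial (0xEDB88320) is an assumption chosen because Python's standard `zlib.crc32` uses it and can serve as a cross-check. The function name `crc32_bitwise` is hypothetical.

```python
import zlib

def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time CRC-32 (reflected form, IEEE 802.3 polynomial assumed)."""
    crc = 0xFFFFFFFF                      # standard initial value
    for byte in data:
        crc ^= byte                       # fold the next byte into the low bits
        for _ in range(8):                # process one bit per iteration
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # standard final inversion

payload = b"123456789"
software_crc = crc32_bitwise(payload)
library_crc = zlib.crc32(payload)
```

The inner loop costs eight shift-and-XOR steps per byte, which is the per-byte work a single checksum instruction is meant to absorb.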
-
Publication number: 20130297909
Abstract: An embodiment of the present invention is a technique to dynamically swap processor cores. A first core has a first instruction set. The first core executes a program at a first performance level. The first core stops executing the program when a triggering event occurs. A second core has a second instruction set compatible with the first instruction set and has a second performance level different than the first performance level. The second core is in a power down state when the first core is executing the program. A circuit powers up the second core after the first core stops executing the program such that the second core continues executing the program at the second performance level.
Type: Application
Filed: July 9, 2013
Publication date: November 7, 2013
Inventors: Brian V. Belmont, Animesh Mishra, James P. Kardach
-
Patent number: 8578133
Abstract: Direct injection of data to be transferred in a hybrid computing environment that includes a host computer and a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module. Each accelerator includes a Power Processing Element (‘PPE’) and a plurality of Synergistic Processing Elements (‘SPEs’). Direct injection includes reserving, by each SPE, a slot in a shared memory region accessible by the host computer; loading, by each SPE into local memory of the SPE, a portion of data to be transferred to the host computer; executing, by each SPE in parallel, a data processing operation on the portion of the data loaded in local memory of each SPE; and writing, by each SPE, the processed data to the SPE's reserved slot in the shared memory region accessible by the host computer.
Type: Grant
Filed: October 31, 2012
Date of Patent: November 5, 2013
Assignee: International Business Machines Corporation
Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Gary R. Ricard, Brian E. Smith
-
Patent number: 8578132
Abstract: Direct injection of data to be transferred in a hybrid computing environment that includes a host computer and a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module. Each accelerator includes a Power Processing Element (‘PPE’) and a plurality of Synergistic Processing Elements (‘SPEs’). Direct injection includes reserving, by each SPE, a slot in a shared memory region accessible by the host computer; loading, by each SPE into local memory of the SPE, a portion of data to be transferred to the host computer; executing, by each SPE in parallel, a data processing operation on the portion of the data loaded in local memory of each SPE; and writing, by each SPE, the processed data to the SPE's reserved slot in the shared memory region accessible by the host computer.
Type: Grant
Filed: March 29, 2010
Date of Patent: November 5, 2013
Assignee: International Business Machines Corporation
Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Gary R. Ricard, Brian E. Smith
-
Publication number: 20130290673
Abstract: Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data.
Type: Application
Filed: November 9, 2012
Publication date: October 31, 2013
Applicant: International Business Machines Corporation
Inventor: International Business Machines Corporation
-
Patent number: 8572615
Abstract: A synchronization device includes a receiver that receives data from at least two synchronization devices establishing synchronization, and extracts synchronization information and register selection information from the received data, a transmitter that transmits data to each of the at least two synchronization devices establishing synchronization among a plurality of synchronization devices, a first receiving state register that stores the extracted synchronization information, a second receiving state register that stores the extracted synchronization information, and a controller that stores the extracted synchronization information into the first receiving state register and the second receiving state register alternately based on the register selection information, and controls the transmitter to transmit data including the register selection information to each of the at least two synchronization devices when the extracted synchronization information is completed in one of the first a
Type: Grant
Filed: December 14, 2011
Date of Patent: October 29, 2013
Assignee: Fujitsu Limited
Inventors: Tomohiro Inoue, Yuichiro Ajima, Shinya Hiramoto
-
Publication number: 20130283009
Abstract: Three-dimensional (3-D) processor devices are provided, which are constructed by connecting processors in a stacked configuration. For instance, a processor system includes a first processor chip comprising a first processor and a second processor chip comprising a second processor. The first and second processor chips are connected in a stacked configuration with the first and second processors connected through vertical connections between the first and second processor chips. The processor system further includes a mode control circuit to selectively operate the processor system in one of a plurality of operating modes. For example, in one mode of operation, the first and second processors are configured to implement a run-ahead function, wherein the first processor operates a primary thread of execution and the second processor operates a run-ahead thread of execution.
Type: Application
Filed: April 20, 2012
Publication date: October 24, 2013
Applicant: International Business Machines Corporation
Inventors: Alper Buyuktosunoglu, Philip G. Emma, Allan M. Hartstein, Michael B. Healy, Krishnan Kunjunny Kailas
-
Publication number: 20130283010
Abstract: Three-dimensional (3-D) processor devices are provided, which are constructed by connecting processors in a stacked configuration. For instance, a processor system includes a first processor chip comprising a first processor and a second processor chip comprising a second processor. The first and second processor chips are connected in a stacked configuration with the first and second processors connected through vertical connections between the first and second processor chips. The processor system further includes a mode control circuit to selectively operate the processor system in one of a plurality of operating modes. For example, in one mode of operation, the first and second processors are configured to implement a run-ahead function, wherein the first processor operates a primary thread of execution and the second processor operates a run-ahead thread of execution.
Type: Application
Filed: August 31, 2012
Publication date: October 24, 2013
Applicant: International Business Machines Corporation
Inventors: Alper Buyuktosunoglu, Philip G. Emma, Allan M. Hartstein, Michael B. Healy, Krishnan Kunjunny Kailas
-
Patent number: 8561073
Abstract: Embodiments of the invention intelligently associate processes with core processors in a multi-core processor. The core processors are asymmetrical in that the core processors support different features or provide different resources. The features or resources are published by the core processors or otherwise identified (e.g., via a query). Responsive to a request to execute an instruction associated with a thread, one of the core processors is selected based on the resource or feature supporting execution of the instruction. The thread is assigned to the selected core processor such that the selected core processor executes the instruction and subsequent instructions from the assigned thread. In some embodiments, the resource or feature is emulated until an activity limit is reached upon which the thread assignment occurs.
Type: Grant
Filed: September 19, 2008
Date of Patent: October 15, 2013
Assignee: Microsoft Corporation
Inventors: Yadhu Nandh Gopalan, John Mark Miller, Bor-Ming Hsieh
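One way to picture the selection and emulate-then-migrate behavior described in the abstract above is the toy scheduler below. It is a sketch under stated assumptions, not the patented method: the `Core` and `AsymmetricScheduler` names, the string feature labels, the default starting core, and the counting rule for the activity limit are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Core:
    core_id: int
    features: frozenset  # features this core publishes, e.g. {"int", "simd"}

class AsymmetricScheduler:
    def __init__(self, cores, emulation_limit=2):
        self.cores = {c.core_id: c for c in cores}
        self.emulation_limit = emulation_limit  # hypothetical activity limit
        self.assignment = {}      # thread id -> core id
        self.emulated_count = {}  # thread id -> emulated instructions so far

    def execute(self, thread_id, feature):
        """Return (core_id, mode) for one instruction that needs `feature`."""
        # New threads start on the lowest-numbered core (an assumption).
        core_id = self.assignment.setdefault(thread_id, min(self.cores))
        if feature in self.cores[core_id].features:
            return core_id, "native"
        count = self.emulated_count.get(thread_id, 0) + 1
        self.emulated_count[thread_id] = count
        if count <= self.emulation_limit:
            return core_id, "emulated"
        # Limit exceeded: assign the thread to a core that supports the feature.
        for candidate in self.cores.values():
            if feature in candidate.features:
                self.assignment[thread_id] = candidate.core_id
                self.emulated_count[thread_id] = 0
                return candidate.core_id, "native"
        return core_id, "emulated"  # no core supports it; keep emulating

cores = [Core(0, frozenset({"int"})), Core(1, frozenset({"int", "simd"}))]
scheduler = AsymmetricScheduler(cores, emulation_limit=2)
trace = [scheduler.execute("t1", "simd") for _ in range(4)]
```

In this run the thread is emulated twice on core 0, then reassigned to core 1, where subsequent instructions run natively.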
-
Publication number: 20130269044
Abstract: A processing system comprising: a first processor adapted to perform one or more tasks according to a predetermined schedule and generate one or more first outputs; and a second processor synchronised with the first processor; wherein the second processor is adapted to receive the one or more first outputs and generate one or more corresponding second outputs when the timing of the one or more first outputs corresponds with the predetermined schedule.
Type: Application
Filed: April 19, 2011
Publication date: October 10, 2013
Applicant: TTE Systems Limited
Inventor: Michael Pont
-
Patent number: 8549261
Abstract: Computational unit area selecting units, each of which is provided in individual multiple cores, sequentially select uncomputed computational unit areas in a computational area. Computing units, each of which is provided in the individual multiple cores, perform computation for the selected computational unit areas. In addition, the computing units write computational results in a memory device which is accessible from each of the multiple cores. A computational result transmitting unit of the core performs computational result acquisition and transmission processing in a different time period with respect to each of multiple computational result transmission areas. The computational result acquisition processing is for acquiring, from the memory device, computational results related to the computational result transmission areas.
Type: Grant
Filed: April 30, 2012
Date of Patent: October 1, 2013
Assignee: Fujitsu Limited
Inventor: Yoshie Inada
-
Patent number: 8549260
Abstract: Some embodiments comprise an apparatus for processing data, the apparatus having a second configurable processor configured to process data using second configuration data, and a configuration data re-manipulator configured to retrieve manipulated second configuration data and first data of a first processor, to re-manipulate the manipulated second configuration data depending on the first data, and to feed the re-manipulated second configuration data to the second configurable processor as the second configuration data.
Type: Grant
Filed: January 29, 2009
Date of Patent: October 1, 2013
Assignee: Infineon Technologies AG
Inventor: Steffen Marc Sonnekalb
-
Publication number: 20130254515
Abstract: A processing system includes processors and dynamically configurable communication elements (DCCs) coupled together in an interspersed arrangement. A source device may transfer a data item through an intermediate subset of the DCCs to a destination device. The source and destination devices may each correspond to different processors, DCCs, or input/output devices, or mixed combinations of these. In response to detecting a stall after the source device begins transfer of the data item to the destination device and prior to receipt of all of the data item at the destination device, a stalling device is operable to propagate stalling information through one or more of the intermediate subset towards the source device. In response to receiving the stalling information, at least one of the intermediate subset is operable to buffer all or part of the data item.
Type: Application
Filed: May 29, 2013
Publication date: September 26, 2013
Applicant: Coherent Logix, Incorporated
Inventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
-
Patent number: 8543626
Abstract: A method and apparatus for QR-factorizing a matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, comprises the steps of: iteratively factorizing each panel in the matrix until the whole matrix is factorized; wherein in each iteration, the method comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result.
Type: Grant
Filed: July 27, 2012
Date of Patent: September 24, 2013
Assignee: International Business Machines Corporation
Inventors: Hui Li, Bai Ling Wang
-
Publication number: 20130246736
Abstract: A processor in which plural cores perform respective programs includes: a first own core execution point acquiring part configured to acquire first code block information if a first core executes an execution history recording instruction described at an execution history recording point in the program, the first code block information indicating, with a single address, a series of instructions executed by the first core; a first other core execution point acquiring part configured to acquire first execution address information of an instruction, the instruction being executed by a second core, if the first core executes the execution history recording instruction; and a first execution point information recording part configured to record the first code block information and the first execution address information in a shared memory in time series such that they are associated with each other.
Type: Application
Filed: November 25, 2010
Publication date: September 19, 2013
Applicant: Toyota Jidosha Kabushiki Kaisha
Inventor: Kenji Hontani
-
Publication number: 20130246719
Abstract: A technique to increase memory bandwidth for throughput applications. In one embodiment, memory bandwidth can be increased, particularly for throughput applications, without increasing interconnect trace or pin count by pipelining pages between one or more memory storage areas on half cycles of a memory access clock.
Type: Application
Filed: March 5, 2013
Publication date: September 19, 2013
Inventor: Eric Sprangle
-
Patent number: 8539489
Abstract: Techniques for improving the performance of multitasking processors are provided. For example, a subset of M processors within a Symmetric Multi-Processing System (SMP) with N processors is dedicated for a specific task. M (M>0) of the N processors are dedicated to a task, leaving (N-M) processors to run the normal operating system (OS). The processors dedicated to the task may have their interrupt mechanism disabled to avoid interrupt handler switching overhead. Therefore, these processors run in an independent context and can communicate and cooperate with the normal OS to achieve higher network performance.
Type: Grant
Filed: May 7, 2012
Date of Patent: September 17, 2013
Assignee: Fortinet, Inc.
Inventor: Jianzu Ding
-
Patent number: 8533719
Abstract: The disclosed embodiments provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores. During operation, the system executes a first thread in a processor core that is associated with a shared cache. During this execution, the system measures one or more metrics to characterize the first thread. Then, the system uses the characterization of the first thread and a characterization for a second thread to predict a performance impact that would occur if the second thread were to simultaneously execute in a second processor core that is also associated with the cache. If the predicted performance impact indicates that executing the second thread on the second processor core will improve performance for the multi-threaded processor, the system executes the second thread on the second processor core.
Type: Grant
Filed: April 5, 2010
Date of Patent: September 10, 2013
Assignee: Oracle International Corporation
Inventors: Alexandra Fedorova, David Vengerov, Kishore Kumar Pusukuri
-
Patent number: 8527739
Abstract: Distributing a computing operation among processes and for gathering results of the computing operation from the plurality of processes. An exemplary method includes the operations of pairing a plurality of processes such that each process has a maximum of one interaction partner, selecting half of the data located at a process, dividing the selected half of the data into a plurality of data segments, transmitting a first data segment resulting from the dividing operation from the process to the interaction partner of the process, receiving a second data segment at the process from the interaction partner, concurrently with the transferring and receiving operations, performing a computing operation on a third data segment previously received from a previous interaction partner and a fourth data segment from the data segments, and iterating over the transmitting, receiving and computing operations until all the data segments have been exchanged.
Type: Grant
Filed: October 9, 2011
Date of Patent: September 3, 2013
Assignee: International Business Machines Corporation
Inventor: Bin Jia
-
Publication number: 20130219148
Abstract: An exemplary embodiment of the present disclosure illustrates a network on chip processor including multiple cores and a Kautz NoC. Each of the cores is assigned an addressing string of L base-D words, and the addressing string does not have two neighboring identical words, where L, the addressing string length, is an integer larger than 1, and D, the number of word choices, is an integer larger than 2. Each of the cores is unidirectionally linked to (D-1) other cores through the Kautz NoC, and in any two connected cores, the last (L-1) words of the addressing string of one core are the same as the first (L-1) words of the addressing string of the other core.
Type: Application
Filed: August 30, 2012
Publication date: August 22, 2013
Applicant: National Taiwan University
Inventors: Liang-Gee Chen, Chuan-Yung Tsai
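The addressing rule in the abstract above matches the standard Kautz graph construction, which can be enumerated directly. The sketch below is illustrative only: the function names are hypothetical, words are represented as the integers 0 to D-1, and nothing here reflects how the disclosed hardware routes traffic.

```python
from itertools import product

def kautz_nodes(D, L):
    """All length-L strings over D words with no two neighboring identical words."""
    return [s for s in product(range(D), repeat=L)
            if all(a != b for a, b in zip(s, s[1:]))]

def successors(node, D):
    """Cores reachable over one unidirectional link: drop the first word and
    append any word that differs from the current last word."""
    return [node[1:] + (w,) for w in range(D) if w != node[-1]]

D, L = 3, 2                      # smallest case allowed by the abstract
nodes = kautz_nodes(D, L)
links = {n: successors(n, D) for n in nodes}
```

For D = 3 and L = 2 this yields D(D-1)^(L-1) = 6 cores, each with D-1 = 2 outgoing links, and every link satisfies the overlap condition that the last L-1 words of the source equal the first L-1 words of the destination.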
-
Patent number: 8504800
Abstract: Self-similar processing by unit processing cells may together solve a problem. A unit processing cell may include a processor, a memory and a plurality of Input/Output (IO) channels coupled to the processor. The memory may include a dictionary having one or more instructions that configure the processor to perform at least one function. The plurality of IO channels may be used to communicably couple the unit processing cell with a plurality of other unit processing cells each including their own respective dictionary. The unit processing cell and the plurality of other unit processing cells may be independent of one another and may perform together without a centralized control. The processor may update the dictionary so that the unit processing cell builds a different dictionary from the plurality of other unit processing cells, thereby being self-similar to the plurality of other unit processing cells.
Type: Grant
Filed: September 21, 2010
Date of Patent: August 6, 2013
Assignee: Hilbert Technology, Inc.
Inventor: Bjorn J. Gruenwald
-
Publication number: 20130191613
Abstract: Whether each of a plurality of processor cores is in a suspend state or operation state is detected. The processor utilization of a processor core of interest in the operation state is acquired. The number of processes assigned to the processor core of interest is obtained. The stop control or startup control of a processor core is performed based on the suspend state or operation state, the processor utilization, and the number of processes.
Type: Application
Filed: December 12, 2012
Publication date: July 25, 2013
Applicant: Canon Kabushiki Kaisha
Inventor: Canon Kabushiki Kaisha
-
Publication number: 20130191612
Abstract: Systems and methods are disclosed that share coprocessor resources between two or more applications in a computing cluster using a job selector to receive jobs from a job queue; a node selector coupled to the job selector; an off line profiler with an interference prediction model; a coprocessor dynamic interference detection module; and a coprocessor interference response module.
Type: Application
Filed: October 6, 2012
Publication date: July 25, 2013
Applicant: NEC Laboratories America, Inc.
Inventors: Cheng-Hong Li, Srihari Cadambi, Srimat T. Chakradhar, Rajat Phull
-
Patent number: 8489859
Abstract: Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data.
Type: Grant
Filed: May 28, 2010
Date of Patent: July 16, 2013
Assignee: International Business Machines Corporation
Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
-
Patent number: 8484440
Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes; iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for that node, a global allreduce operation using contribution data for the cores assigned to that ring or any global allreduce results from previous global allreduce operations, yielding current global allreduce results for each core; and performing, for each node, a local allreduce operation using the global allreduce results.
Type: Grant
Filed: May 21, 2008
Date of Patent: July 9, 2013
Assignee: International Business Machines Corporation
Inventor: Ahmad Faraj
-
Publication number: 20130166879
Abstract: The invention discloses a multiprocessor system and a synchronous engine device thereof.
Type: Application
Filed: August 30, 2011
Publication date: June 27, 2013
Inventors: Ninghui Sun, Fei Chen, Zheng Cao, Kai Wang, Xuejun An
-
Patent number: 8468534
Abstract: Techniques are provided for dynamically re-ordering operation requests that have previously been submitted to a queue management unit. After the queue management unit has placed multiple requests in a queue to be executed in an order that is based on priorities that were assigned to the operations, the entity that requested the operations (the “requester”) sends one or more priority-change messages. The one or more priority-change messages include requests to perform operations that have already been queued. For at least one of the operations, the priority assigned to the operation in the subsequent request is different from the priority that was assigned to the same operation when that operation was initially queued for execution. Based on the change in priority, the operation whose priority has changed is placed at a different location in the queue, relative to the other operations in the queue that were requested by the same requester.
Type: Grant
Filed: April 5, 2010
Date of Patent: June 18, 2013
Assignee: Apple Inc.
Inventor: Brian R. Tunning
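The re-ordering behavior in the abstract above, where re-requesting an already queued operation with a new priority moves it within the queue, can be sketched with a binary heap and lazy invalidation. This is a generic illustration, not the patented queue management unit: the `ReorderableQueue` class, the convention that a lower number means higher priority, and the operation names are all assumptions.

```python
import heapq
import itertools

class ReorderableQueue:
    def __init__(self):
        self._heap = []                    # entries: [priority, sequence, op_id]
        self._live = {}                    # op_id -> its current heap entry
        self._sequence = itertools.count() # tie-breaker keeps FIFO order

    def submit(self, op_id, priority):
        """Queue an operation. Re-submitting a queued operation acts as a
        priority-change message and repositions it."""
        stale = self._live.get(op_id)
        if stale is not None:
            stale[2] = None                # mark the old entry as invalid
        entry = [priority, next(self._sequence), op_id]
        self._live[op_id] = entry
        heapq.heappush(self._heap, entry)

    def pop(self):
        """Return the queued operation with the best (lowest) priority."""
        while self._heap:
            _, _, op_id = heapq.heappop(self._heap)
            if op_id is not None:          # skip invalidated entries
                del self._live[op_id]
                return op_id
        raise IndexError("queue is empty")

queue = ReorderableQueue()
queue.submit("read-A", 5)
queue.submit("read-B", 3)
queue.submit("read-C", 4)
queue.submit("read-A", 1)                  # priority change for a queued operation
order = [queue.pop() for _ in range(3)]
```

Here `read-A` is initially last in line, and the second request for it moves it ahead of `read-B` and `read-C`. Lazy invalidation avoids searching the heap for the old entry at the cost of leaving dead entries until they are popped.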
-
Publication number: 20130151814
Abstract: A multi-core processor includes a monitored processor core whose process result is to be monitored; a monitoring processor core group including two or more monitoring processors which can perform a process for monitoring the monitored processor core; an evaluating part configured to evaluate a processing load of the monitoring processor core group; and a controlling part configured to make the monitoring processor core group perform the process for monitoring the monitored processor core in a distributed manner if the processing load of the monitoring processor core group evaluated by the evaluating part is low, and make the monitoring processor of the monitoring processor core group perform the process for monitoring the monitored processor core if the processing load of the monitoring processor core group evaluated by the evaluating part is high, the monitoring processor performing a process whose priority is relatively low.
Type: Application
Filed: December 13, 2011
Publication date: June 13, 2013
Applicant: Toyota Jidosha Kabushiki Kaisha
Inventor: Koji Ueda
-
Patent number: 8464026Abstract: A CPU may select a variable from a variable set as a dependent variable. The variable set may be part of the data structure that includes a plurality of vector values, a vector value associated with a variable set of n number of variables, and each variable of the variable set having a variable value. The number of dependent variable steps for the dependent variable may be determined. The number of the vector values in a dependent variable step is determined as being the number of independent variables. A function is mapped to a plurality of thread processors, and each thread processor is assigned for the function to be performed on each one of the independent variables for each of the dependent variable steps.Type: GrantFiled: February 17, 2010Date of Patent: June 11, 2013Assignee: International Business Machines CorporationInventors: Rajesh Ramkrishna Bordawekar, Ravishankar Rao
-
Patent number: 8458244Abstract: A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.Type: GrantFiled: August 15, 2012Date of Patent: June 4, 2013Assignee: International Business Machines CorporationInventors: Michael A. Blocksome, Daniel A. Faraj
-
Publication number: 20130138920Abstract: An apparatus for packet processing is provided. The apparatus is to be implemented in a server and includes: a preprocessor and at least two processors which are respectively connected with the preprocessor. The preprocessor is to classify packets received externally from the server, and to distribute the classified packets to the respective processors, wherein packets in a same flow are distributed to a same processor. Each of the processors is to receive and process a packet distributed by the preprocessor.Type: ApplicationFiled: August 11, 2011Publication date: May 30, 2013Applicant: Hangzhou H3C Technologies, Co., Ltd.Inventor: Changzhong Ge
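One common way to keep all packets of a flow on the same processor is to hash the flow identifier. The sketch below assumes a flow is identified by the usual 5-tuple and uses CRC32 as the hash; the abstract specifies neither, and the function name `pick_processor` is hypothetical.

```python
import zlib


def pick_processor(src_ip, dst_ip, src_port, dst_port, proto, n_processors):
    """Map a flow's 5-tuple to a processor index in [0, n_processors).

    Assumption for illustration: the flow key is the 5-tuple and the
    classifier is a CRC32 hash, so equal tuples always give equal indices.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % n_processors


# Two packets of the same flow land on the same processor.
first = pick_processor("10.0.0.1", "10.0.0.2", 40000, 80, "tcp", 4)
second = pick_processor("10.0.0.1", "10.0.0.2", 40000, 80, "tcp", 4)
print(first == second)  # True
```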
-
Patent number: 8453152Abstract: A scheduler receives at least one flexible reservation request for scheduling in a computing environment comprising consumable resources. The flexible reservation request specifies a duration and at least one required resource. The consumable resources comprise at least one machine resource and at least one floating resource. The scheduler creates a flexible job for the at least one flexible reservation request and places the flexible job in a prioritized job queue for scheduling, wherein the flexible job is prioritized relative to at least one regular job in the prioritized job queue. The scheduler adds a reservation set to a waiting state for the at least one flexible reservation request.Type: GrantFiled: February 1, 2011Date of Patent: May 28, 2013Assignee: International Business Machines CorporationInventors: Alexander Druyan, Wei Li, Kailash N. Marthi, Yun T. Xiang, Linda C. Cham
-
Patent number: 8448174Abstract: An information processing device which has a plurality of process units for performing various kinds of processes includes a detecting unit that detects processing loads of the process units; a determining unit that determines whether a total amount of the processing loads detected by the detecting unit is equal to or larger than a specific value; a designating unit that designates a process unit having a process state to be controlled, based on the processing loads of the process units detected by the detecting unit, when the determining unit determines that the total amount is equal to or larger than the specific value; a process identifying unit that identifies a process having an execution state to be controlled among processes being performed by the process unit designated by the designating unit; and a control unit that controls the execution state of the process identified by the process identifying unit.Type: GrantFiled: January 22, 2010Date of Patent: May 21, 2013Assignee: Fujitsu LimitedInventors: Ryo Miyamoto, Ryuichi Matsukura, Takashi Ohno
-
Patent number: 8443175Abstract: A microprocessor integrated circuit includes first and second processors, an internal memory accessible by the first and second processors, and a bus interface unit configured to interface to a bus external to the microprocessor for providing access to a memory external to the microprocessor. The bus interface unit, external bus, and external memory are accessible by the second processor but are inaccessible by the first processor. The first processor writes debug information to the internal memory. The first processor detects an event and provides a notification of the event to the second processor. The second processor, coupled to the bus interface unit, executes microcode in response to the event notification received from the first processor. The microcode reads the debug information from the internal memory and writes the debug information to the external memory via the bus interface unit and external bus for use in debugging the second processor.Type: GrantFiled: March 29, 2010Date of Patent: May 14, 2013Assignee: VIA Technologies, Inc.Inventors: G. Glenn Henry, Jui-Shuan Chen
-
Publication number: 20130117533Abstract: A coprocessor has: a processing unit for processing tasks in a data-processing system subject to at least one master processor; at least one storage module having memory areas, assignable in each case to the tasks, for storing data assigned to the tasks; and a buffer area for buffering instructions assigned to the tasks, the instructions including processing instructions, and upon retrieval of the processing instructions from the buffer area, the data stored in the storage module being processed on the basis of the processing instructions.Type: ApplicationFiled: April 6, 2011Publication date: May 9, 2013Inventor: Jan Hayek
-
Patent number: 8438404Abstract: The disclosure is applied to a generic microprocessor architecture with a set (e.g., one or more) of controlling elements (e.g., MPEs) and a set of groups of sub-processing elements (e.g., SPEs). Under this arrangement, MPEs and SPEs are organized in a way that a smaller number of MPEs control the behavior of a group of SPEs using program code embodied as a set of virtualized control threads. The arrangement also enables MPEs to delegate functionality to one or more groups of SPEs such that those group(s) of SPEs will act as pseudo MPEs. The pseudo MPEs will utilize pseudo virtualized control threads to control the behavior of other groups of SPEs. In a typical embodiment, the apparatus includes a MCP coupled to a power supply coupled with cores to provide a supply voltage to each core (or core group) and controlling-digital elements and multiple instances of sub-processing elements.Type: GrantFiled: September 30, 2008Date of Patent: May 7, 2013Assignee: International Business Machines CorporationInventors: Karl J. Duvalsaint, Harm P. Hofstee, Daeik Kim, Moon J. Kim
-
Publication number: 20130103927Abstract: A processor link that couples a first processor and a second processor is selected for validation and a plurality of communication parameter settings associated with the first and the second processors is identified. The first and the second processors are successively configured with each of the communication parameter settings. One or more test data pattern(s) are provided from the first processor to the second processor in accordance with the communication parameter setting. Performance measurements associated with the selected processor link and with the communication parameter setting are determined based, at least in part, on the test data pattern as received at the second processor. One of the communication parameter settings that is associated with the highest performance measurements is selected. The selected communication parameter setting is applied to the first and the second processors for subsequent communication between the first and the second processors via the processor link.Type: ApplicationFiled: October 25, 2011Publication date: April 25, 2013Applicant: International Business Machines CorporationInventors: Robert W. Berry, JR., Anand Haridass, Prasanna Jayaraman
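The selection loop described in this abstract (try each setting, measure, keep the best) can be sketched as follows. The `measure` callback stands in for configuring both processors, sending the test patterns, and scoring what the receiver saw; it and the setting names are hypothetical placeholders, not part of the publication.

```python
def select_best_setting(settings, measure):
    """Return the setting with the highest measured performance.

    `settings` is any iterable of candidate communication parameter
    settings; `measure(setting)` is an assumed callback that applies the
    setting to both ends of the link, runs the test patterns, and returns
    a numeric score where higher is better.
    """
    best_setting = None
    best_score = float("-inf")
    for setting in settings:
        score = measure(setting)   # configure, send test patterns, score
        if score > best_score:
            best_setting, best_score = setting, score
    return best_setting


# Stand-in measurements for three hypothetical settings.
fake_scores = {"low_swing": 0.71, "mid_swing": 0.93, "high_swing": 0.88}
print(select_best_setting(fake_scores, fake_scores.get))  # mid_swing
```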
-
Publication number: 20130103928Abstract: With the progress toward multi-core processors, each core cannot readily ascertain the status of the other dies with respect to an idle or active status. A proposal for utilizing an interface to transmit core status among multiple cores in a multi-die microprocessor is discussed. Consequently, this facilitates thermal management by allowing an optimal setting for setting performance and frequency based on utilizing each core status.Type: ApplicationFiled: December 11, 2012Publication date: April 25, 2013Inventors: Jose P. Allarey, Varghese George, Sanjeev Jahagirdar, Oren Lamdan, Nathan Ofer, Tomer Ziv
-
Publication number: 20130097407Abstract: A method, system, and computer program product for maintaining reliability in a computer system. In an example embodiment, the method includes managing workloads on a first processor with a first processor architecture by an agent process executing on a second processor with a second processor architecture. The method proceeds by activating redundant computation on the second processor by the agent process. The method continues by performing a same computation from a workload of the workloads at least twice. Finally, the method includes comparing results of the same computation. In this embodiment the first processor is coupled to the second processor by a network, and the first processor architecture and second processor architecture are different architectures.Type: ApplicationFiled: December 8, 2012Publication date: April 18, 2013Applicant: International Business Machines CorporationInventor: International Business Machines Corporation
-
Publication number: 20130097406Abstract: In some embodiments, a computer cluster system comprises a plurality of nodes and a software package comprising a user interface and a kernel for interpreting program code instructions. In certain embodiments, a cluster node module is configured to communicate with the kernel and other cluster node modules. The cluster node module can accept instructions from the user interface and can interpret at least some of the instructions such that several cluster node modules in communication with one another and with a kernel can act as a computer cluster.Type: ApplicationFiled: March 16, 2012Publication date: April 18, 2013Inventors: Zvi Tannenbaum, Dean E. Dauger
-
Patent number: 8423749Abstract: A computer-implemented method, system and computer program product for controlling an algorithm that is performed on a unit of work in a subsequent software pipeline stage in a Network On a Chip (NOC) is presented. In one embodiment, the method executes a first operation in a first node of the NOC. The first node generates payload, and then loads that payload into a message. The message with the payload is transmitted to a nanokernel that controls a second node in the NOC. The nanokernel calls an algorithm that is needed by a second operation in a second node in the NOC, which uses the algorithm to execute the second operation.Type: GrantFiled: October 22, 2008Date of Patent: April 16, 2013Assignee: International Business Machines CorporationInventors: Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
-
Publication number: 20130091341Abstract: A computation system for computing interactions in a multiple-body simulation includes an array of processing modules arranged into one or more serially interconnected processing groups of the processing modules. Each of the processing modules includes storage for data elements and includes circuitry for performing pairwise computations between data elements each associated with a spatial location. Each of the pairwise computations makes use of a data element from the storage of the processing module and a data element passing through the serially interconnected processing modules. Each of the processing modules includes circuitry for selecting the pairs of data elements according to separations between spatial locations associated with the data elements.Type: ApplicationFiled: November 19, 2012Publication date: April 11, 2013Applicant: D.E. Shaw Research LLCInventors: David E. Shaw, Martin M. Deneroff, Ron O. Dror, Richard H. Larson, John K. Salmon
-
Patent number: 8417919Abstract: A method of dynamic parallelization in a multi-processor identifies potentially independent computational operations, such as functions and methods, with a serializer that assigns a computational operation to a serialization set and a processor based on assessment of the data that the computational operation will be accessing upon execution.Type: GrantFiled: August 18, 2009Date of Patent: April 9, 2013Assignee: Wisconsin Alumni Research FoundationInventors: Matthew Allen, Gurindar S. Sohi
-
Publication number: 20130086355Abstract: A method, an apparatus and an article of manufacture for generating a distributed data scalable adaptive map-reduce framework for at least one multi-core cluster. The method includes partitioning a cluster into at least one computational group, determining at least one key-group leader within each computational group, performing a local combine operation at each computational group, performing a global combine operation at each of the at least one key-group leader within each computational group based on a result from the local combine operation, and performing a global map-reduce operation across the at least one key-group leader within each computational group.Type: ApplicationFiled: September 30, 2011Publication date: April 4, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Ankur Narang, Jyothish Soman
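The two-level combine in this abstract can be pictured with a toy word count. This is a loose sketch under stated assumptions, not the published framework: each computational group is modeled as a list of text records, the per-group result is taken to be held by that group's leader, and the function names `local_combine` and `global_combine` are hypothetical.

```python
from collections import Counter


def local_combine(group_records):
    """Combine one group's records into per-key counts (held by its leader)."""
    counts = Counter()
    for record in group_records:
        counts.update(record.split())   # map each record to word keys
    return counts


def global_combine(leader_results):
    """Merge the per-group results across all group leaders."""
    total = Counter()
    for partial in leader_results:
        total.update(partial)           # adds counts key by key
    return total


# Two hypothetical computational groups, each with its own records.
groups = [
    ["core node core", "node"],
    ["core cluster"],
]
leader_results = [local_combine(g) for g in groups]
totals = global_combine(leader_results)
print(totals["core"], totals["node"], totals["cluster"])  # 3 2 1
```

Combining inside each group first means only one partial result per group crosses group boundaries, which is the usual motivation for a local combine step.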