Operation Patents (Class 712/30)
  • Publication number: 20120144157
    Abstract: There is disclosed a system and method for allocation of mainframe computing resources using distributed computing. In particular, the present application is directed to a system whereby a mainframe process intended for execution on a metered processor may be identified as executable on a non-metered processor. Thereafter, the mainframe computer may initiate execution of the remote process on the remote non-metered processor. If necessary, high-speed access to data available to the metered processor is provided to the non-metered processor. The process operates directly on data available to the metered processor. Once completed, the process signals the mainframe computer that the process is complete. Both metered and non-metered processor configuration and management may be accomplished using an administrative interface.
    Type: Application
    Filed: December 6, 2010
    Publication date: June 7, 2012
    Inventors: James Reginald Crew, Pradeep Kumar Reddy Gundavarapu, Balaji Swaminathan, William Donald Pagdin, Lary Edward Klein
  • Publication number: 20120124333
    Abstract: The present invention concerns a new category of integrated circuitry and a new methodology for adaptive or reconfigurable computing. The preferred IC embodiment includes a plurality of heterogeneous computational elements coupled to an interconnection network. The plurality of heterogeneous computational elements include corresponding computational elements having fixed and differing architectures, such as fixed architectures for different functions such as memory, addition, multiplication, complex multiplication, subtraction, configuration, reconfiguration, control, input, output, and field programmability. In response to configuration information, the interconnection network is operative in real-time to configure and reconfigure the plurality of heterogeneous computational elements for a plurality of different functional modes, including linear algorithmic operations, non-linear algorithmic operations, finite state machine operations, memory operations, and bit-level manipulations.
    Type: Application
    Filed: January 19, 2012
    Publication date: May 17, 2012
    Applicant: QST Holdings LLC
    Inventors: Paul L. Master, Eugene Hogenauer, Walter James Scheuermann
  • Patent number: 8171259
    Abstract: A dynamic reconfigurable circuit includes multiple clusters, each including a group of reconfigurable processing elements. The dynamic reconfigurable circuit is capable of dynamically changing a configuration of the clusters according to a context that describes the processing performed by the processing elements and the connections between them. A first cluster among the clusters includes: a signal generating circuit that, when an instruction to change the context is received, generates a report signal indicative of the instruction to change the context; a signal adding circuit that adds the report signal generated by the signal generating circuit to output data that is to be transmitted from the first cluster to a second cluster; and a data clearing circuit that, when it receives output data to which a report signal generated by the second cluster has been added, clears the received output data.
    Type: Grant
    Filed: February 27, 2009
    Date of Patent: May 1, 2012
    Assignee: Fujitsu Semiconductor Limited
    Inventors: Takashi Hanai, Shinichi Sutou
  • Publication number: 20120102299
    Abstract: A processing system includes processors and dynamically configurable communication elements (DCCs) coupled together in an interspersed arrangement. A source device may transfer a data item through an intermediate subset of the DCCs to a destination device. The source and destination devices may each correspond to different processors, DCCs, or input/output devices, or mixed combinations of these. In response to detecting a stall after the source device begins transfer of the data item to the destination device and prior to receipt of all of the data item at the destination device, a stalling device is operable to propagate stalling information through one or more of the intermediate subset towards the source device. In response to receiving the stalling information, at least one of the intermediate subset is operable to buffer all or part of the data item.
    Type: Application
    Filed: December 30, 2011
    Publication date: April 26, 2012
    Inventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
  • Patent number: 8161268
    Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
    Type: Grant
    Filed: May 21, 2008
    Date of Patent: April 17, 2012
    Assignee: International Business Machines Corporation
    Inventor: Ahmad Faraj
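    The following is a minimal Python sketch of the two-phase structure described above. It assumes sum as the reduction operator, integer contributions, and that logical ring r contains core r of every node. The ring step is a naive pass-around rather than a bandwidth-optimal algorithm, and the function names are illustrative, not taken from the patent.

```python
from typing import List


def ring_allreduce(values: List[int]) -> List[int]:
    """Naive allreduce on one logical ring: every member ends with the sum of all values."""
    p = len(values)
    acc = list(values)
    in_flight = list(values)
    for _ in range(p - 1):
        # Each member forwards the value it last received to its right-hand neighbour.
        in_flight = [in_flight[(i - 1) % p] for i in range(p)]
        acc = [a + v for a, v in zip(acc, in_flight)]
    return acc


def two_level_allreduce(contrib: List[List[int]]) -> List[List[int]]:
    """contrib[node][core] holds each core's contribution; ring r = core r on every node."""
    nodes, cores = len(contrib), len(contrib[0])
    # Phase 1: a global allreduce on each logical ring.
    per_ring = [ring_allreduce([contrib[n][r] for n in range(nodes)]) for r in range(cores)]
    # per_ring[r][n] is now the global result held by core r of node n.
    # Phase 2: a local allreduce across the cores of each node.
    return [[sum(per_ring[r][n] for r in range(cores))] * cores for n in range(nodes)]
```

    With three nodes of two cores each, `two_level_allreduce([[1, 2], [3, 4], [5, 6]])` leaves every core holding 21.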
  • Patent number: 8161307
    Abstract: Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
    Type: Grant
    Filed: October 20, 2011
    Date of Patent: April 17, 2012
    Assignee: International Business Machines Corporation
    Inventors: Charles J. Archer, Michael A. Blocksome, Amanda E. Peters, Joseph D. Ratterman, Brian E. Smith
  • Publication number: 20120089813
    Abstract: Provided are a computing apparatus based on a reconfigurable architecture and a memory dependence correction method thereof. In one general aspect, a computing apparatus has a reconfigurable architecture. The computing apparatus may include: a reconfiguration unit having processing elements configured to reconfigure data paths between one or more of the processing elements; a compiler configured to analyze instructions to generate reconfiguration information for reconfiguring one or more of the reconfigurable data paths; a configuration memory configured to store the reconfiguration information; and a processor configured to execute the instructions through the reconfiguration unit, and to correct at least one memory dependency among the processing elements.
    Type: Application
    Filed: July 7, 2011
    Publication date: April 12, 2012
    Inventors: Tai-Song Jin, Dong-Hoon Yoo, Bernhard Egger
  • Publication number: 20120089814
    Abstract: In a multiprocessor system, a primary processor may store an executable image for a secondary processor. A communication protocol assists the transfer of an image header and data segment(s) of the executable image from the primary processor to the secondary processor. Messages between the primary processor and secondary processor indicate successful receipt of transferred data, termination of a transfer process, and acknowledgement of same.
    Type: Application
    Filed: December 5, 2011
    Publication date: April 12, 2012
    Applicant: QUALCOMM INCORPORATED
    Inventors: Nitin Gupta, Daniel H. Kim, Igor Malamant, Steve Haehnichen
  • Patent number: 8151245
    Abstract: A distributed processing system is described that employs “application-based” specialization. In particular, the distributed processing system is constructed as a collection of computing nodes in which each computing node performs a particular processing role within the operation of the overall distributed processing system. Each of the computing nodes includes an operating system, such as the Linux operating system, and includes a plug-in software module to provide a distributed memory operating system that employs the role-based computing techniques. An administration node maintains a database that defines a plurality of application roles. Each role is associated with a software application, and specifies a set of software components necessary for execution of the software application. The administration node deploys the software components to the application nodes in accordance with the application roles associates with each of the application nodes.
    Type: Grant
    Filed: December 16, 2005
    Date of Patent: April 3, 2012
    Assignee: Computer Associates Think, Inc.
    Inventors: Steven M. Oberlin, David W. McAllister
  • Publication number: 20120079236
    Abstract: A processor comprises a plurality of processor units arranged to operate concurrently and in cooperation with one another, and control logic configured to direct the operation of the processor units. At least a given one of the processor units comprises a memory, an arithmetic engine and a switch fabric. The switch fabric provides controllable connectivity between the memory, the arithmetic engine and input and output ports of the given processor unit, and has control inputs driven by corresponding outputs of the control logic. In an illustrative embodiment, the processor units may be configured to perform computations associated with a key equation solver in a Reed-Solomon (RS) decoder or other type of forward error correction (FEC) decoder.
    Type: Application
    Filed: September 29, 2011
    Publication date: March 29, 2012
    Inventors: Dusan Suvakovic, Adriaan J. de Lind van Wijngaarden, Man Fai Lau
  • Publication number: 20120079235
    Abstract: Methods and apparatus to schedule applications in heterogeneous multiprocessor computing platforms are described. In one embodiment, information regarding performance (e.g., execution performance and/or power consumption performance) of a plurality of processor cores of a processor is stored (and tracked) in counters and/or tables. Logic in the processor determines which processor core should execute an application based on the stored information. Other embodiments are also claimed and disclosed.
    Type: Application
    Filed: September 25, 2010
    Publication date: March 29, 2012
    Inventors: Ravishankar Iyer, Sadagopan Srinivasan, Li Zhao, Rameshkumar G. Illikkal
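    A hypothetical sketch of the selection step follows. It assumes the tracked counters have already been reduced to a per-core table of (IPC, power in watts). The `goal` parameter and both scoring rules are illustrative choices, not details from the filing.

```python
from typing import Callable, Dict, Tuple

# core id -> (measured IPC, measured power in watts); assumed to be filled from hardware counters
PerfTable = Dict[int, Tuple[float, float]]

# Hypothetical scoring policies: raw throughput, or throughput per watt.
SCORES: Dict[str, Callable[[float, float], float]] = {
    "performance": lambda ipc, watts: ipc,
    "efficiency": lambda ipc, watts: ipc / watts,
}


def pick_core(table: PerfTable, goal: str = "performance") -> int:
    """Return the core whose stored statistics score highest for the given goal."""
    if not table:
        raise ValueError("no cores to choose from")
    score = SCORES[goal]  # raises KeyError for an unknown goal
    return max(table, key=lambda core: score(*table[core]))
```

    For example, a big core at (2.0 IPC, 4.0 W) wins on performance, while a small core at (1.2 IPC, 1.0 W) wins on efficiency.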
  • Patent number: 8140828
    Abstract: There is disclosed a method and apparatus for handling transaction buffer overflow in a multi-processor system, as well as a transaction memory system in a multi-processor system. The method comprises the steps of: when overflow occurs in a transaction buffer of one processor, disabling peer processors from entering transactions, and waiting for any processor having a current transaction to complete its current transaction; re-executing the transaction resulting in the transaction buffer overflow without using the transaction buffer; and when the transaction execution is completed, enabling the peer processors to enter transactions.
    Type: Grant
    Filed: December 1, 2008
    Date of Patent: March 20, 2012
    Assignee: International Business Machines Corporation
    Inventors: Xiaowei Shen, Hua Yong Wang, Kun Wang
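    The sketch below is a software analogy of the overflow protocol only. The patent describes processor hardware; here Python threads stand in for processors and a condition variable stands in for the enable/disable signal. `TransactionGate` and its methods are hypothetical names, and `run_after_overflow` assumes the calling thread has already entered a transaction with `begin()`.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class TransactionGate:
    """Hypothetical software model of the overflow protocol (threads stand in for processors)."""

    def __init__(self) -> None:
        self._cv = threading.Condition()
        self._enabled = True  # may processors start new transactions?
        self._active = 0      # transactions currently in flight

    def begin(self) -> None:
        with self._cv:
            while not self._enabled:  # peers are blocked while a fallback runs
                self._cv.wait()
            self._active += 1

    def end(self) -> None:
        with self._cv:
            self._active -= 1
            self._cv.notify_all()

    def run_after_overflow(self, body: Callable[[], T]) -> T:
        """Called by a thread whose transaction overflowed its buffer."""
        with self._cv:
            self._active -= 1         # abandon the overflowing transaction
            self._cv.notify_all()
            while not self._enabled:  # another fallback is already running
                self._cv.wait()
            self._enabled = False     # disable peers from entering transactions
            while self._active > 0:   # wait for in-flight transactions to complete
                self._cv.wait()
        try:
            return body()             # re-execute without the transaction buffer
        finally:
            with self._cv:
                self._enabled = True  # let peers enter transactions again
                self._cv.notify_all()
```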
  • Publication number: 20120060007
    Abstract: A method and apparatus for controlling traffic in a multiprocessor system or multi-core system is provided. The traffic control apparatus of a multiprocessor system according to the present invention includes a request handler for processing a traffic request of a first processor, and a Quality of Service (QoS) manager for receiving a QoS guarantee start instruction for a second processor from the multiprocessor system and for transmitting, when traffic of the second processor is detected, a traffic adjustment signal to the request handler. The request handler adjusts the traffic of the first processor according to the received traffic adjustment signal. The traffic control method and apparatus of the present invention are capable of adjusting the bandwidth required by individual technologies and guaranteeing real-time performance in the multiprocessor system or multi-core system.
    Type: Application
    Filed: September 2, 2011
    Publication date: March 8, 2012
    Applicant: SAMSUNG ELECTRONICS CO. LTD.
    Inventors: Min Seung BAIK, Joong Baik KIM, Seung Wook LEE, Soon Wan KWON
  • Publication number: 20120047350
    Abstract: A processing apparatus for processing source code comprising a plurality of single line instructions to implement a desired processing function is described.
    Type: Application
    Filed: May 4, 2010
    Publication date: February 23, 2012
    Inventors: John Lancaster, Martin Whitaker
  • Publication number: 20120042150
    Abstract: A multiprocessor system includes a main memory and multiple processing cores that are configured to execute software that uses data stored in the main memory. In some embodiments, the multiprocessor system includes a data streaming unit, which is connected between the processing cores and the main memory and is configured to pre-fetch the data from the main memory for use by the multiple processing cores. In some embodiments, the multiprocessor system includes a scratch-pad processing unit, which is connected to the processing cores and is configured to execute, on behalf of the multiple processing cores, a selected part of the software that causes two or more of the processing cores to access concurrently a given item of data.
    Type: Application
    Filed: March 29, 2011
    Publication date: February 16, 2012
    Applicant: PRIMESENSE LTD.
    Inventor: Idan Saar
  • Publication number: 20120023309
    Abstract: Techniques for achieving high-availability using a single processor (CPU). In a system comprising a multi-core processor, at least two partitions may be configured with each partition being allocated one or more cores of the multiple cores. The partitions may be configured such that one partition operates in active mode while another partition operates in standby mode. In this manner, a single processor is able to provide active-standby functionality, thereby enhancing the availability of the system comprising the processor.
    Type: Application
    Filed: July 23, 2010
    Publication date: January 26, 2012
    Applicant: Brocade Communications Systems, Inc.
    Inventors: Vineet M. Abraham, Bill Ying Chin, William R. Mahoney, Aditya Saxena, Xupei Liang, Bill Jianqiang Zhou
  • Patent number: 8103856
    Abstract: In a processor having multiple clusters which operate in parallel, the number of clusters in use can be varied dynamically. At the start of each program phase, each configuration option is run for one interval to determine the optimal configuration, which is then used until the next phase change is detected. The optimum instruction interval is determined by starting with a minimum interval and doubling it until a low stability factor is reached.
    Type: Grant
    Filed: January 12, 2009
    Date of Patent: January 24, 2012
    Assignee: University of Rochester
    Inventors: Rajeev Balasubramonian, Sandhya Dwarkadas, David Albonesi
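    A sketch of the interval-doubling search follows. It assumes the "stability factor" is the fraction of consecutive intervals whose metric (for example, IPC) changes by more than a relative threshold, so lower means more stable. The `measure` callback, the 10% threshold and the 5% target are hypothetical parameters.

```python
from typing import Callable, Sequence


def phase_change_rate(metrics: Sequence[float], threshold: float) -> float:
    """Fraction of consecutive intervals whose metric moved by more than `threshold` (relative)."""
    if len(metrics) < 2:
        return 0.0
    changes = sum(
        1
        for prev, cur in zip(metrics, metrics[1:])
        if prev and abs(cur - prev) / abs(prev) > threshold
    )
    return changes / (len(metrics) - 1)


def choose_interval(
    measure: Callable[[int], Sequence[float]],  # per-interval metrics for a given interval length
    min_interval: int,
    max_interval: int,
    threshold: float = 0.10,
    target: float = 0.05,
) -> int:
    """Start at the minimum interval and double it until the program looks stable."""
    interval = min_interval
    while interval < max_interval:
        if phase_change_rate(measure(interval), threshold) <= target:
            return interval
        interval *= 2
    return max_interval  # cap reached without finding a stable interval
```

    With a synthetic IPC trace that alternates every 1000 instructions, the search rejects 1000 and settles on 2000, where the averaged metric no longer fluctuates.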
  • Publication number: 20120017062
    Abstract: Methods are disclosed for improving data processing performance in a processor using on-chip local memory in multiple processing units. According to an embodiment, a method of processing data elements in a processor using a plurality of processing units includes: launching, in each of the processing units, a first wavefront having a first type of thread followed by a second wavefront having a second type of thread, where the first wavefront reads as input a portion of the data elements from an off-chip shared memory and generates a first output; writing the first output to an on-chip local memory of the respective processing unit; and writing to the on-chip local memory a second output generated by the second wavefront, where input to the second wavefront comprises a first plurality of data elements from the first output. Corresponding system and computer program product embodiments are also disclosed.
    Type: Application
    Filed: July 19, 2011
    Publication date: January 19, 2012
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Vineet GOEL, Todd Martin, Mangesh Nijasure
  • Patent number: 8095811
    Abstract: Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
    Type: Grant
    Filed: May 29, 2008
    Date of Patent: January 10, 2012
    Assignee: International Business Machines Corporation
    Inventors: Charles J. Archer, Michael A. Blocksome, Amanda A. Peters, Joseph D. Ratterman, Brian E. Smith
  • Patent number: 8095759
    Abstract: A multiprocessor computer system comprises a plurality of processors and a plurality of nodes, each node comprising one or more processors. A local memory in each of the plurality of nodes is coupled to the processors in each node, and a hardware firewall comprising a part of one or more of the nodes is operable to prevent a write from an unauthorized processor from writing to the local memory.
    Type: Grant
    Filed: May 29, 2009
    Date of Patent: January 10, 2012
    Assignee: Cray Inc.
    Inventors: Dennis C. Abts, Steven L. Scott, Aaron F. Godfrey
  • Publication number: 20110320769
    Abstract: A computing section is provided with a plurality of computing units. It stores entries of configuration information describing configurations of the plurality of computing units, each correlated with a physical configuration number that represents that entry, and executes a computation in the configuration corresponding to a designated physical configuration number. A status management section designates the physical configuration number corresponding to the status to which the computing section must advance next, and outputs that next status as a logical status number that uniquely identifies it in an object code.
    Type: Application
    Filed: December 25, 2009
    Publication date: December 29, 2011
    Inventors: Takeshi Inuo, Kengo Nishino, Nobuki Kajihara
  • Publication number: 20110320767
    Abstract: Methods, systems, and media are provided for a dynamic batch strategy utilized in parallelization of online learning algorithms. The dynamic batch strategy provides a merge function on the basis of a threshold level difference between the original model state and an updated model state, rather than according to a constant or pre-determined batch size. The merging includes reading a batch of incoming streaming data, retrieving any missing model beliefs from partner processors, and training on the batch of incoming streaming data. The steps of reading, retrieving, and training are repeated until the measured difference in states exceeds a set threshold level. The measured differences which exceed the threshold level are merged for each of the plurality of processors according to attributes. The merged differences which exceed the threshold level are combined with the original partial model states to obtain an updated global model state.
    Type: Application
    Filed: June 24, 2010
    Publication date: December 29, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Taha Bekir Eren, Oleg Isakov, Weizhu Chen, Jeffrey Scott Dunn, Thomas Ivan Borchert, Joaquin Quinonero Candela, Thore Kurt Hartwig Graepel, Ralf Herbrich
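    Below is a simplified single-process sketch of the dynamic batch idea. It assumes a model represented as a list of floats, an L1 distance as the "difference in states", and plain summation of per-worker deltas as the merge rule. The retrieval of missing beliefs from partner processors and the per-attribute merging are omitted, and `train` is a caller-supplied, hypothetical update function.

```python
from typing import Callable, Iterator, List

Model = List[float]


def local_phase(
    global_state: Model,
    stream: Iterator[float],
    train: Callable[[Model, float], Model],
    threshold: float,
) -> Model:
    """Train on successive batches until the model drifts past `threshold`; return the delta."""
    state = list(global_state)
    for batch in stream:
        state = train(state, batch)
        drift = sum(abs(s - g) for s, g in zip(state, global_state))  # L1 difference
        if drift > threshold:
            break  # batch size is decided by drift, not fixed in advance
    return [s - g for s, g in zip(state, global_state)]


def merge(global_state: Model, deltas: List[Model]) -> Model:
    """Combine every worker's delta with the shared starting state."""
    return [g + sum(d[i] for d in deltas) for i, g in enumerate(global_state)]
```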
  • Publication number: 20110320768
    Abstract: There is provided a method of, and apparatus for, processing a computation on a computing device comprising at least one processor and a memory, the method comprising: storing, in said memory, plural copies of a set of data, each copy of said set of data having a different compression ratio and/or compression scheme; selecting a copy of said set of data; and performing, on a processor, a computation using said selected copy of said set of data. By providing such a method, different compression ratios and/or compression schemes can be selected as appropriate. For example, if high precision is required in a computation, a copy of the set of data can be chosen which has a low compression ratio at the expense of processing time and memory transfer time. In the alternative, if low precision is acceptable, then the speed benefits of a high compression ratio and/or lossy compression scheme may be utilised.
    Type: Application
    Filed: June 25, 2010
    Publication date: December 29, 2011
    Applicant: MAXELER TECHNOLOGIES, LTD.
    Inventors: Oliver Pell, Stephen Girdlestone
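    A sketch of keeping several copies at different precisions and choosing one per computation is shown below. It uses IEEE half, single and double precision (the `struct` format codes `e`, `f`, `d`) as stand-ins for different compression ratios; a real system could use other lossy or lossless schemes. The selection rule, which picks the cheapest copy whose unit roundoff is within the caller's relative tolerance, is an illustrative assumption.

```python
import struct
from typing import Dict, List, Sequence

# Hypothetical catalogue: struct format code -> (bytes per value, unit roundoff).
FORMATS = {"e": (2, 2.0 ** -11), "f": (4, 2.0 ** -24), "d": (8, 2.0 ** -53)}


def make_copies(values: Sequence[float]) -> Dict[str, bytes]:
    """Store the same data once per precision. Half precision overflows above 65504."""
    return {fmt: struct.pack(f"<{len(values)}{fmt}", *values) for fmt in FORMATS}


def select_copy(tolerance: float) -> str:
    """Cheapest copy (fewest bytes to transfer) whose roundoff fits the tolerance."""
    ok = [fmt for fmt, (_, eps) in FORMATS.items() if eps <= tolerance]
    if not ok:
        raise ValueError("no stored copy is precise enough")
    return min(ok, key=lambda fmt: FORMATS[fmt][0])


def load(copies: Dict[str, bytes], fmt: str) -> List[float]:
    """Decode one stored copy back into Python floats for the computation."""
    count = len(copies[fmt]) // FORMATS[fmt][0]
    return list(struct.unpack(f"<{count}{fmt}", copies[fmt]))
```

    A low-precision computation could then run as `sum(load(copies, select_copy(1e-3)))`, reading a third of the bytes of the double-precision copy.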
  • Patent number: 8086828
    Abstract: Heterogeneous processors can cooperate for distributed processing tasks in a multiprocessor computing system. Each processor is operable in a “compatible” mode, in which all processors within a family accept the same baseline command set and produce identical results upon executing any command in the baseline command set. The processors also have a “native” mode of operation in which the command set and/or results may differ in at least some respects from the baseline command set and results. Heterogeneous processors with a compatible mode defined by reference to the same baseline can be used cooperatively for distributed processing by configuring each processor to operate in the compatible mode.
    Type: Grant
    Filed: March 25, 2009
    Date of Patent: December 27, 2011
    Assignee: NVIDIA Corporation
    Inventors: Henry Packard Moreton, Abraham B. de Waal
  • Publication number: 20110314257
    Abstract: A wireless communication system hosts a plurality of processes in accordance with a communication protocol. The system includes application specific instruction set processors (ASISPs) that provide computational support for the processes. Each ASISP is capable of executing a subset of the functions of a communication protocol. A scheduler is used to schedule the ASISPs in a time-sliced algorithm so that each ASISP supports several processes. In this architecture, the ASISP actively performs computations for one of the supported processes (active process) at any given time. The state information of each process supported by a particular ASISP is stored in a memory bank that is uniquely associated with the ASISP. When a scheduler instructs an ASISP to change which process is the active process, the state information for the inactivated process is stored in the memory bank and the state information for the newly activated process is retrieved from the memory bank.
    Type: Application
    Filed: July 29, 2011
    Publication date: December 22, 2011
    Inventors: Song CHEN, Paul L. CHOU, Christopher C. WOODTHORPE, Venugopal BALASUBRAMONIAN, Keith RIEKEN
  • Publication number: 20110314256
    Abstract: Described herein are techniques for enabling a programmer to express a call for a data parallel call-site function in a way that is accessible and usable to the typical programmer. With some of the described techniques, an executable program is generated based upon expressions of those data parallel tasks. During execution of the executable program, data is exchanged between non-data parallel (non-DP) capable hardware and DP capable hardware for the invocation of data parallel functions.
    Type: Application
    Filed: June 18, 2010
    Publication date: December 22, 2011
    Applicant: Microsoft Corporation
    Inventors: Charles David Callahan, II, Paul F. Ringseth, Yosseff Levanoni, Weirong Zhu, Lingli Zhang
  • Publication number: 20110314255
    Abstract: A processor and method for broadcasting data among a plurality of processing cores is disclosed. The processor includes a plurality of processing cores connected by point-to-point connections. A first of the processing cores includes a router that includes at least an allocation unit and an output port. The allocation unit is configured to determine that respective input buffers on at least two others of the processing cores are available to receive given data. The output port is usable by the router to send the given data across one of the point-to-point connections. The router is configured to send the given data contingent on determining that the respective input buffers are available. Furthermore, the processor is configured to deliver the data to the at least two other processing cores in response to the first processing core sending the data once across the point-to-point connection.
    Type: Application
    Filed: June 17, 2010
    Publication date: December 22, 2011
    Inventors: Tushar Krishna, Bradford M. Beckmann, Steven K. Reinhardt
  • Patent number: 8081181
    Abstract: The architecture implements an A-buffer in hardware by extending hardware to efficiently store a variable amount of data for each pixel. In operation, a prepass is performed to generate the counts of the fragments per pixel in a count buffer, followed by a prefix sum pass on the generated count buffer to calculate locations in a fragment buffer in which to store all the fragments linearly. An index is generated for a given pixel in the prefix sum pass and stored in a location buffer. Access to the pixel fragments is then accomplished using the index. Linear storage of the data allows for a fast rendering pass that stores all the fragments to a memory buffer without needing to look at the contents of the fragments. This is then followed by a resolve pass on the fragment buffer to generate the final image.
    Type: Grant
    Filed: June 20, 2007
    Date of Patent: December 20, 2011
    Assignee: Microsoft Corporation
    Inventor: Craig Peeper
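    The count, prefix-sum, rendering and resolve passes can be sketched on the CPU as follows. This is a software model, not the hardware described. It assumes each fragment is a (depth, colour) pair and that the resolve pass keeps the nearest fragment; real resolve passes typically blend transparent fragments instead.

```python
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple

Fragment = Tuple[float, str]  # (depth, colour): a hypothetical fragment payload


def build_fragment_buffer(frags: Sequence[Tuple[int, Fragment]], num_pixels: int):
    # Prepass: count fragments per pixel (the count buffer).
    counts = [0] * num_pixels
    for pixel, _ in frags:
        counts[pixel] += 1
    # Prefix-sum pass: an exclusive scan gives each pixel's start index (the location buffer).
    offsets = list(accumulate(counts, initial=0))[:-1]
    # Rendering pass: write every fragment linearly without inspecting its contents.
    buffer: List[Optional[Fragment]] = [None] * len(frags)
    cursor = list(offsets)
    for pixel, frag in frags:
        buffer[cursor[pixel]] = frag
        cursor[pixel] += 1
    return counts, offsets, buffer


def resolve(counts, offsets, buffer, background: str = "black") -> List[str]:
    # Resolve pass: this sketch keeps the nearest (smallest-depth) fragment per pixel.
    image = []
    for count, start in zip(counts, offsets):
        span = buffer[start:start + count]
        image.append(min(span)[1] if span else background)
    return image
```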
  • Patent number: 8074054
    Abstract: A processing system includes a group of processing units (“PUs”) arranged in a daisy chain configuration or a sequence capable of parallel processing. The processing system, in one embodiment, includes PUs, a demultiplexer (“demux”), and a multiplexer (“mux”). The PUs are connected or linked in a sequence or a daisy chain configuration wherein a first PU is located at the beginning of the sequence and a last PU is located at the end of the sequence. Each PU is configured to read an input data packet from a packet stream during a designated reading time frame. Outside its designated reading time frame, a PU allows the packet stream to pass through. The demux forwards a packet stream to the first PU. The mux receives a packet stream from the last PU.
    Type: Grant
    Filed: December 12, 2007
    Date of Patent: December 6, 2011
    Assignee: Tellabs San Jose, Inc.
    Inventors: Venkata Rangavajjhala, Naveen K. Jain
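    The following is a behavioural sketch of the daisy chain. It assumes that the arrival slot t of each packet is the designated reading time frame of PU `t % num_pus`, a round-robin assignment that the abstract does not specify. `process` is a hypothetical per-PU operation.

```python
from typing import Any, Callable, List, Sequence


def daisy_chain(
    packets: Sequence[Any],
    num_pus: int,
    process: Callable[[int, Any], Any],
) -> List[Any]:
    outputs = []
    for t, packet in enumerate(packets):  # the demux feeds the stream into the first PU
        value = packet
        for pu in range(num_pus):
            if t % num_pus == pu:         # this slot is the PU's designated reading time frame
                value = process(pu, value)
            # outside its frame, the PU simply passes the packet through unchanged
        outputs.append(value)             # the mux collects the stream from the last PU
    return outputs
```

    Because each PU reads only its own slots, consecutive packets are handled by different PUs, which is where the parallelism comes from.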
  • Publication number: 20110296139
    Abstract: Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data.
    Type: Application
    Filed: May 28, 2010
    Publication date: December 1, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
  • Publication number: 20110296137
    Abstract: A parallel computer includes compute nodes having computer processors and a CAU (Collectives Acceleration Unit) that couples processors to one another for data communications. In embodiments of the present invention, deterministic reduction operations include: organizing processors of the parallel computer and a CAU into a branched tree topology, where the CAU is a root of the branched tree topology and the processors are children of the root CAU; establishing a receive buffer that includes receive elements associated with processors and configured to store the associated processor's contribution data; receiving, in any order from the processors, each processor's contribution data; tracking receipt of each processor's contribution data; and reducing the contribution data in a predefined order, only after receipt of contribution data from all processors in the branched tree topology.
    Type: Application
    Filed: May 28, 2010
    Publication date: December 1, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
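    A sketch of the receive-buffer approach is shown below, assuming floating-point addition as the reduction. Floating-point addition is not associative, so reducing contributions in arrival order could give different answers from run to run. Buffering every contribution and reducing in child-index order gives the same result for any arrival order. `DeterministicReducer` is a hypothetical name.

```python
import operator
from functools import reduce
from typing import Callable, List, Optional


class DeterministicReducer:
    def __init__(self, num_children: int,
                 op: Callable[[float, float], float] = operator.add) -> None:
        self.slots: List[Optional[float]] = [None] * num_children  # receive buffer
        self.received = [False] * num_children                      # receipt tracking
        self.op = op

    def receive(self, child: int, data: float) -> Optional[float]:
        """Accept one child's contribution in any order; reduce once all have arrived."""
        self.slots[child] = data
        self.received[child] = True
        if not all(self.received):
            return None
        return reduce(self.op, self.slots)  # always combined in the predefined child order
```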
  • Patent number: 8065503
    Abstract: Methods, systems and computer programs for distributing a computing operation among a plurality of processes and for gathering results of the computing operation from the plurality of processes are described.
    Type: Grant
    Filed: December 15, 2006
    Date of Patent: November 22, 2011
    Assignee: International Business Machines Corporation
    Inventor: Bin Jia
  • Publication number: 20110283089
    Abstract: A method and system of modularized design for a microprocessor are disclosed. Embodiments disclose modularization techniques whereby the overall design of the execution unit of the processor is split into different functional modules. The modules are configured to function independently of each other. The microprocessor comprises different components such as cache logic (201), a clock generation unit (202), a dispatcher (203), a special asynchronous interface (204), an interrupt unit (205), a register file (206) and a multiplexer unit (207). Temporary storage of data in the register files is eliminated, and thus data fetch latency is eliminated. The asynchronous transfer triggered execution architecture increases speed of execution.
    Type: Application
    Filed: August 10, 2009
    Publication date: November 17, 2011
    Inventor: Harshal Ingale
  • Publication number: 20110283086
    Abstract: A circuit arrangement, program product and method stream level of detail components between hardware threads in a multithreaded circuit arrangement to perform physics collision detection. Typically, a master hardware thread, e.g., a component loader hardware thread, is used to retrieve level of detail data for an object from a memory and stream the data to one or more slave hardware threads, e.g., collision detection hardware threads, to perform the actual collision detection. Because the slave hardware threads receive the level of detail data from the master thread, typically the slave hardware threads are not required to load the data from the memory, thereby reducing memory bandwidth requirements and accelerating performance.
    Type: Application
    Filed: May 12, 2010
    Publication date: November 17, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Eric Oliver Mejdrich, Paul Emery Schardt, Robert Allen Shearer
  • Publication number: 20110283087
    Abstract: A first processing unit is implemented by executing a first application program by using an internal computer in an environment where a first operating system is operating. The first processing unit performs a first process or an external service call in accordance with instruction information describing a process to be executed. A second processing unit is implemented by executing a second application program by using the internal computer or an additional computer connected to the internal computer in an environment where a second operating system is operating. The second processing unit performs a second process when instructed by an external service call to execute the second process. When the instruction information includes information specifying the second process as the process to be executed, a transfer unit updates the information included in the instruction information, and transfers the updated instruction information to the first processing unit.
    Type: Application
    Filed: October 29, 2010
    Publication date: November 17, 2011
    Applicant: FUJI XEROX CO., LTD.
    Inventors: Tsuyoshi WATANABE, Yoshiaki TEZUKA, Kunihiko KOBAYASHI, Tomomichi ADEGAWA
  • Publication number: 20110283059
    Abstract: Various embodiments are disclosed for accelerating computations using field programmable gate arrays (FPGA). Various tree traversal techniques, architectures, and hardware implementations are disclosed. Various disclosed embodiments comprise hybrid architectures comprising a central processing unit (CPU), a graphics processor unit (GPU), a field programmable gate array (FPGA), and variations or combinations thereof, to implement raytracing techniques. Additional disclosed embodiments comprise depth-breadth search tree tracing techniques, blocking tree branch traversal techniques to avoid data explosion, compact data structure representations for ray and node representations, and multiplexed processing of multiple rays in a programming element (PE) to make use of pipeline bubbles.
    Type: Application
    Filed: May 10, 2011
    Publication date: November 17, 2011
    Applicant: Progeniq Pte Ltd
    Inventors: Sundar Govindarajan, Vinod Ranganathan Iyer, Darran Nathan
  • Publication number: 20110283088
    Abstract: A data processing apparatus includes a connecting unit that distributes the plurality of processing modules over the stages, and connects the plurality of processing modules such that a plurality of partial data are processed in parallel. The data processing apparatus detects, with respect to at least a part of the stages, a ratio of an amount of data for which processing in the subsequent stage has been executed, as a passage rate, acquires a processing time for a data amount to be processed in each stage, for which the passage rate was detected, based on the passage rate, and determines the number of processing modules distributed to each stage based on the data amount.
    Type: Application
    Filed: May 6, 2011
    Publication date: November 17, 2011
    Applicant: CANON KABUSHIKI KAISHA
    Inventors: Ryoko Natori, Shinji Shiraga
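The following is a minimal sketch of the stage-allocation step under stated assumptions. The function name `allocate_modules`, the per-unit cost model, and the largest-remainder rounding are illustrative choices that are not taken from the application. The relative data amount reaching each stage is taken as the product of the upstream passage rates. The load of a stage is that amount multiplied by its per-unit processing time. Modules are then shared out in proportion to load, with at least one module per stage.

```python
from typing import List


def allocate_modules(passage_rates: List[float], unit_costs: List[float],
                     total_modules: int) -> List[int]:
    """Hypothetical allocation of processing modules to pipeline stages.

    passage_rates[i]: fraction of stage i's data that goes on to stage i + 1.
    unit_costs[i]:    processing time per unit of data in stage i (assumed > 0).
    """
    n = len(unit_costs)
    if len(passage_rates) < n - 1 or total_modules < n:
        raise ValueError("need n-1 passage rates and at least one module per stage")
    # Relative amount of data reaching each stage.
    amounts = [1.0]
    for rate in passage_rates[: n - 1]:
        amounts.append(amounts[-1] * rate)
    loads = [a * c for a, c in zip(amounts, unit_costs)]
    # One module per stage, then split the spare modules in proportion to load;
    # the largest-remainder method keeps the total exactly equal to total_modules.
    spare = total_modules - n
    shares = [spare * load / sum(loads) for load in loads]
    counts = [1 + int(s) for s in shares]
    leftover = total_modules - sum(counts)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

With passage rates `[0.5, 0.5]`, equal unit costs, and 10 modules, the loads are in the ratio 4:2:1 and the function returns `[5, 3, 2]`.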
  • Publication number: 20110264889
    Abstract: Systems, methods, and an article of manufacture for the reduction in process load experienced by a primary processor when executing an application by dynamically reassigning portions of the application to one or more secondary processors are shown and described. A second processing unit is queried for one or more characteristics. One or more performance characteristics of the second processor are measured. A portion of the application can be reassigned to the second processing unit based on the queried characteristics and performance measurements.
    Type: Application
    Filed: April 21, 2010
    Publication date: October 27, 2011
    Applicant: MIRICS SEMICONDUCTOR LIMITED
    Inventor: Christopher Stolarik
  • Publication number: 20110258245
    Abstract: A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
    Type: Application
    Filed: April 14, 2010
    Publication date: October 20, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael A. Blocksome, Daniel A. Faraj
  • Publication number: 20110252219
    Abstract: According to an aspect of the present invention, there is provided an information processing apparatus including: a first processor; a second processor that has an information processing capability and a power consumption higher than those of the first processor; a temperature monitoring module configured to acquire an operating temperature of the second processor; a throttle number determination module configured to determine whether the throttling control is performed a given number of times or more within a given time interval; and a processor switching control module configured to perform, when the operating temperature of the second processor is equal to or higher than a given temperature: stopping an operation of the second processor; causing the first processor to perform an information process; and prohibiting the operation of the second processor.
    Type: Application
    Filed: June 20, 2011
    Publication date: October 13, 2011
    Inventor: Hajime Sonobe
  • Publication number: 20110246995
    Abstract: The disclosed embodiments provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores. During operation, the system executes a first thread in a processor core that is associated with a shared cache. During this execution, the system measures one or more metrics to characterize the first thread. Then, the system uses the characterization of the first thread and a characterization for a second thread to predict a performance impact that would occur if the second thread were to simultaneously execute in a second processor core that is also associated with the cache. If the predicted performance impact indicates that executing the second thread on the second processor core will improve performance for the multi-threaded processor, the system executes the second thread on the second processor core.
    Type: Application
    Filed: April 5, 2010
    Publication date: October 6, 2011
    Applicant: ORACLE INTERNATIONAL CORPORATION
    Inventors: Alexandra Fedorova, David Vengerov, Kishore Kumar Pusukuri
  • Publication number: 20110246748
    Abstract: Illustrated is a system and method that includes a processor and service processor co-located on a common socket, the service processor to aggregate data from a distributed network of additional service processors and processors both of which are co-located on an additional common socket. The system and method also includes a first sensor to record the data from the processor. The system and method also includes a second sensor to record the data from a software stack. The system and method further includes a registry to store the data.
    Type: Application
    Filed: April 6, 2010
    Publication date: October 6, 2011
    Inventors: Vanish Talwar, Jeffrey R. Hilland, Vidhya Kannan, Sandeep KS, Prashanth V
  • Publication number: 20110238951
    Abstract: An image forming apparatus includes: plural processing units which execute plural processing functions that are different from each other; an execution-in-progress information acquiring unit which acquires execution-in-progress function information that is information about a first processing unit which is executing processing, of the plural processing units; a discrimination unit which discriminates a second processing unit that cannot execute processing when the first processing unit indicated by the execution-in-progress function information acquired by the execution-in-progress information acquiring unit is executing processing, from among the plural processing units; and an executability information generating unit which generates inexecutable function information that is information about the second processing unit, based on a result of determination by the discrimination unit.
    Type: Application
    Filed: March 23, 2011
    Publication date: September 29, 2011
    Applicants: KABUSHIKI KAISHA TOSHIBA, TOSHIBA TEC KABUSHIKI KAISHA
    Inventor: Kanako Asari
  • Publication number: 20110238949
    Abstract: Distributed administration of a lock for an operational group of compute nodes in a hierarchical tree structured network including assigning the root node of the operational group to send acknowledgments for lock requests, the root lock administration module comprising a module of automated computing machinery; receiving a lock request assigned to a particular node from a child node; determining whether another request from another child is directly ahead in an acknowledgement queue; if a request from another child is directly ahead in the acknowledgement queue, putting the lock request for the particular node in the acknowledgement queue until the lock request directly ahead in the acknowledgement queue is satisfied and when the lock request ahead in the queue is satisfied, sending the particular node for whom the lock request is assigned a message acknowledging the particular node has the lock; and if a request from another child is not directly ahead in a queue, sending to the particular node for whom the
    Type: Application
    Filed: March 29, 2010
    Publication date: September 29, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
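The queueing rule at the root can be illustrated with a small, hypothetical Python model. `RootLockAdministrator` is an assumed name. Message passing between nodes is reduced to appending to a list, and the forwarding of requests from child nodes up the tree is left out. The node at the head of the acknowledgement queue holds the lock. A new request is acknowledged immediately only when nothing is ahead of it; otherwise it waits until the request ahead is released.

```python
from collections import deque
from typing import Deque, List, Optional


class RootLockAdministrator:
    """Hypothetical lock administration module on the root node."""

    def __init__(self) -> None:
        self.queue: Deque[int] = deque()  # acknowledgement queue; head holds the lock
        self.acks_sent: List[int] = []    # stands in for acknowledgement messages

    def request(self, node: int) -> None:
        self.queue.append(node)
        if len(self.queue) == 1:          # no request directly ahead: grant now
            self.acks_sent.append(node)

    def release(self, node: int) -> None:
        if self.holder() != node:
            raise ValueError(f"node {node} does not hold the lock")
        self.queue.popleft()              # the request ahead is now satisfied
        if self.queue:                    # acknowledge the next waiting node
            self.acks_sent.append(self.queue[0])

    def holder(self) -> Optional[int]:
        return self.queue[0] if self.queue else None
```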
  • Publication number: 20110238950
    Abstract: Performing a scattery operation on a hierarchical tree network optimized for collective operations including receiving, by the scattery module installed on the node, from a nearest neighbor parent above the node a chunk of data having at least a portion of data for the node; maintaining, by the scattery module installed on the node, the portion of the data for the node; determining, by the scattery module installed on the node, whether any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child; and sending, by the scattery module installed on the node, those portions of data to the nearest neighbor child if any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child.
    Type: Application
    Filed: March 29, 2010
    Publication date: September 29, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
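The forwarding rule of the scattery operation can be sketched as a recursive function over a tree given as a child-list dictionary. This is an illustrative assumption and not the application's code. Recursion stands in for sending a chunk to a nearest-neighbor child, and the `delivered` dictionary collects the portion that each node maintains.

```python
from typing import Dict, List, Set

Tree = Dict[int, List[int]]  # node -> its nearest-neighbor children


def subtree(tree: Tree, root: int) -> Set[int]:
    """All nodes at or below `root`."""
    nodes: Set[int] = set()
    stack = [root]
    while stack:
        n = stack.pop()
        nodes.add(n)
        stack.extend(tree.get(n, []))
    return nodes


def scatter(tree: Tree, node: int, chunk: Dict[int, bytes],
            delivered: Dict[int, bytes]) -> None:
    """Keep this node's portion and forward the rest toward each child's subtree."""
    if node in chunk:
        delivered[node] = chunk[node]        # maintain the portion for this node
    for child in tree.get(node, []):
        below = subtree(tree, child)
        part = {dest: data for dest, data in chunk.items() if dest in below}
        if part:                             # send only when something is destined below
            scatter(tree, child, part, delivered)
```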
  • Patent number: 8019978
    Abstract: A unit status reporting protocol may also be used for context switching, debugging, and removing deadlock conditions in a processing unit. A processing unit is in one of five states: empty, active, stalled, quiescent, and halted. The state that a processing unit is in is reported to a front end monitoring unit to enable the front end monitoring unit to determine when a context switch may be performed or when a deadlock condition exists. The front end monitoring unit can issue a halt command to perform a context switch or take action to remove a deadlock condition and allow processing to resume.
    Type: Grant
    Filed: August 13, 2007
    Date of Patent: September 13, 2011
    Assignee: NVIDIA Corporation
    Inventors: Michael C. Shebanow, Robert C. Keller, Richard A. Silkebakken
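The following is a minimal sketch of how a front end might interpret the five reported states. The decision rules are assumptions made for illustration; the abstract does not specify them. A context switch is treated as safe once no unit is active or stalled. A deadlock is suspected when some unit is stalled while no unit is active.

```python
from enum import Enum, auto
from typing import Iterable


class UnitState(Enum):
    EMPTY = auto()
    ACTIVE = auto()
    STALLED = auto()
    QUIESCENT = auto()
    HALTED = auto()


IDLE_STATES = {UnitState.EMPTY, UnitState.QUIESCENT, UnitState.HALTED}


def can_context_switch(states: Iterable[UnitState]) -> bool:
    # Assumed rule: safe once no unit is active or stalled.
    return all(s in IDLE_STATES for s in states)


def deadlock_suspected(states: Iterable[UnitState]) -> bool:
    # Assumed rule: some unit is stalled and no unit is making progress.
    snapshot = list(states)
    return UnitState.STALLED in snapshot and UnitState.ACTIVE not in snapshot
```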
  • Publication number: 20110219211
    Abstract: A CPU core unlocking device applied to a computer system is provided. The core unlocking device includes a CPU having a plurality of signal terminals and a core unlocking executing unit having a plurality of GPIO ports connected with the corresponding signal terminals of the CPU. The GPIO ports of the core unlocking executing unit generate and transmit a combination of core unlocking signals to the signal terminals of the CPU to unlock the CPU core.
    Type: Application
    Filed: March 3, 2011
    Publication date: September 8, 2011
    Applicant: ASUSTeK COMPUTER INC.
    Inventors: Pei-Hua Sun, Pai-Ching Huang, Yi-Min Huang, Meng-Hsiung Lee, Nan-Kun Lo
  • Publication number: 20110213934
    Abstract: A data processing apparatus and method are provided for switching performance of a workload between two processing circuits. The data processing apparatus has first processing circuitry which is architecturally compatible with second processing circuitry, but with the first processing circuitry being micro-architecturally different from the second processing circuitry. At any point in time, a workload consisting of at least one application and at least one operating system for running that application is performed by one of the first processing circuitry and the second processing circuitry. A switch controller is responsive to a transfer stimulus to perform a handover operation to transfer performance of the workload from source processing circuitry to destination processing circuitry, with the source processing circuitry being one of the first and second processing circuitry and the destination processing circuitry being the other of the first and second processing circuitry.
    Type: Application
    Filed: March 1, 2010
    Publication date: September 1, 2011
    Applicant: ARM Limited
    Inventors: Peter Richard Greenhalgh, Richard Roy Grisenthwaite
  • Publication number: 20110213950
    Abstract: A technique for reducing the power consumption required to execute processing operations. A processing complex, such as a CPU or a GPU, includes a first set of cores comprising one or more fast cores and a second set of cores comprising one or more slow cores. A processing mode of the processing complex can switch between a first mode of operation and a second mode of operation based on one or more of the workload characteristics, performance characteristics of the first and second sets of cores, power characteristics of the first and second sets of cores, and operating conditions of the processing complex. A controller causes the processing operations to be executed by either the first set of cores or the second set of cores to achieve the lowest total power consumption.
    Type: Application
    Filed: May 25, 2010
    Publication date: September 1, 2011
    Inventors: John George Mathieson, Phil Carmack, Brian Smith
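The mode decision can be sketched with a simple energy model. This is a hypothetical illustration: `CoreSet`, the throughput and power figures, and the deadline check are assumptions. "Lowest total power consumption" is read here as the lowest energy over the run, meaning power multiplied by runtime. Among the core sets that finish the work within the deadline, the sketch picks the one that uses the least energy.

```python
from dataclasses import dataclass


@dataclass
class CoreSet:
    name: str
    throughput: float    # work units per second (assumed figure)
    active_power: float  # watts while executing (assumed figure)


def choose_core_set(work: float, deadline: float, fast: CoreSet, slow: CoreSet) -> CoreSet:
    """Pick the set that meets the deadline with the least energy (hypothetical model)."""
    feasible = [c for c in (fast, slow) if work / c.throughput <= deadline]
    if not feasible:
        return fast  # neither meets the deadline: fall back to the faster set
    # Energy = power x runtime; the lower-energy feasible set wins.
    return min(feasible, key=lambda c: c.active_power * work / c.throughput)
```

For example, take a fast set at 10 work units/s drawing 5 W and a slow set at 2 units/s drawing 0.5 W. For 10 units of work, the fast set uses 5 J and the slow set uses 2.5 J, so the slow set is chosen when a 10 s deadline allows it.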
  • Publication number: 20110213935
    Abstract: A data processing apparatus and method are provided for switching performance of a workload between two processing circuits. The data processing apparatus has first processing circuitry which is architecturally compatible with second processing circuitry, but with the first processing circuitry being micro-architecturally different from the second processing circuitry. At any point in time, a workload consisting of at least one application and at least one operating system for running that application is performed by one of the first processing circuitry and the second processing circuitry. A switch controller is responsive to a transfer stimulus to perform a handover operation to transfer performance of the workload from source processing circuitry to destination processing circuitry, with the source processing circuitry being one of the first and second processing circuitry and the destination processing circuitry being the other of the first and second processing circuitry.
    Type: Application
    Filed: March 1, 2010
    Publication date: September 1, 2011
    Applicant: ARM Limited
    Inventors: Peter Richard Greenhalgh, Richard Roy Grisenthwaite