Processing Control For Data Transfer Patents (Class 712/225)
  • Patent number: 9588767
    Abstract: An aspect includes receiving a write request that includes a memory address and write data. Stored data is read from a memory location at the memory address. Based on determining that the memory location was not previously modified, the stored data is compared to the write data. Based on the stored data matching the write data, the write request is completed without writing the write data to the memory and a corresponding silent store bit, in a silent store bitmap is set. Based on the stored data not matching the write data, the write data is written to the memory location, the silent store bit is reset and a corresponding modified bit is set. At least one of an application and an operating system is provided access to the silent store bitmap.
    Type: Grant
    Filed: June 25, 2015
    Date of Patent: March 7, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Pradip Bose, Chen-Yong Cher, Ravi Nair
  • Patent number: 9589314
    Abstract: Systems, methods, and apparatus for performing queries in a graphics processing system are disclosed. These systems, methods, and apparatus may be configured to read a running counter at the start of the query to determine a start value, wherein the running counter counts discrete graphical entities, read the running counter at the end of the query to determine an end value, and subtract the start value from the end value to determine a result.
    Type: Grant
    Filed: August 29, 2013
    Date of Patent: March 7, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Avinash Seetharamaiah, Hitendra Mohan Gangani, Nigel Terence Poole
  • Patent number: 9588768
    Abstract: An aspect includes receiving a write request that includes a memory address and write data. Stored data is read from a memory location at the memory address. Based on determining that the memory location was not previously modified, the stored data is compared to the write data. Based on the stored data matching the write data, the write request is completed without writing the write data to the memory and a corresponding silent store bit, in a silent store bitmap is set. Based on the stored data not matching the write data, the write data is written to the memory location, the silent store bit is reset and a corresponding modified bit is set. At least one of an application and an operating system is provided access to the silent store bitmap.
    Type: Grant
    Filed: November 23, 2015
    Date of Patent: March 7, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Pradip Bose, Chen-Yong Cher, Ravi Nair
  • Patent number: 9552207
    Abstract: Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core consuming less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory.
    Type: Grant
    Filed: February 2, 2016
    Date of Patent: January 24, 2017
    Assignee: Intel Corporation
    Inventors: Gadi Haber, Konstantin Kostya Levit-Gurevich, Esfir Natanzon, Boris Ginzburg, Aya Elhanan, Moshe Maury Bach, Igor Breger
  • Patent number: 9530176
    Abstract: An image processing apparatus and image processing method thereof are disclosed. The image processing apparatus includes a first image processor which includes a memory, performs a first signal processing on image data, and stores the first signal processed image data in the memory; a second image processor which directly accesses the memory and receives the stored image data, and performs a second signal processing on the received image data; and an image outputter which outputs the image data on which the second signal processing has been performed. Accordingly, only actual image data area is received, reducing data transmission volume, reducing signal transmission lines, securing CPU resources, and improving timing errors. In addition, it is possible to remove an image transmitter such as an additional low voltage differential signaling (LVDS) block, making the image processing apparatus thinner and smaller.
    Type: Grant
    Filed: February 15, 2013
    Date of Patent: December 27, 2016
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ji-won Kim, Young-hun Choi
  • Patent number: 9471310
    Abstract: A method, computer program product, and system are provided for multi-input bitwise logical operations. The method includes the steps of receiving a multi-input bitwise logical operation instruction that specifies two or more input operands and a function operand, where a first input operand of the two or more input operands comprises a number of bits, each bit having a corresponding bit in each of the additional input operands in the two or more input operands. The function operand is written to a lookup table. Then, the lookup table is accessed for each set of corresponding input operand bits in the two or more input operands to generate an output for the multi-input bitwise logical operation instruction.
    Type: Grant
    Filed: November 26, 2012
    Date of Patent: October 18, 2016
    Assignee: NVIDIA Corporation
    Inventor: Alexey Yuryevich Panteleev
  • Patent number: 9448798
    Abstract: An aspect includes receiving a write request that includes a memory address and write data. Stored data is read from a memory location at the memory address. Based on determining that the memory location was not previously modified, the stored data is compared to the write data. Based on the stored data matching the write data, the write request is completed without writing the write data to the memory and a corresponding silent store bit, in a silent store bitmap is set. Based on the stored data not matching the write data, the write data is written to the memory location, the silent store bit is reset and a corresponding modified bit is set. At least one of an application and an operating system is provided access to the silent store bitmap.
    Type: Grant
    Filed: March 31, 2016
    Date of Patent: September 20, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Pradip Bose, Chen-Yong Cher, Ravi Nair
  • Patent number: 9424032
    Abstract: Disclosed is a list vector processing apparatus (LVPA) or the like which can process the indirect reference at a high speed. The LVPA includes: a gather processing unit processing a first gather instruction to store a value of a storage area accessed by only a self information processing apparatus (SelfIPA) in a plurality of information processing apparatuses according to a list vector storing an address representing a storage area read from a storage apparatus into a register, and a process of generating reference access information indicating whether being a storage area accessed by both of the SelfIPA and another information processing apparatus; a communication unit for related information; an access information operating unit to calculate an area accessed by the information processing apparatus; and a scatter processing unit processing a first scatter instruction to store a value stored in the register into the storage area accessed by only the SelfIPA.
    Type: Grant
    Filed: February 24, 2014
    Date of Patent: August 23, 2016
    Assignee: NEC Corporation
    Inventor: Satoru Tagaya
  • Patent number: 9424046
    Abstract: Systems and methods for load canceling in a processor that is connected to an external interconnect fabric are disclosed. As a part of a method for load canceling in a processor that is connected to an external bus, and responsive to a flush request and a corresponding cancellation of pending speculative loads from a load queue, a type of one or more of the pending speculative loads that are positioned in the instruction pipeline external to the processor, is converted from load to prefetch. Data corresponding to one or more of the pending speculative loads that are positioned in the instruction pipeline external to the processor is accessed and returned to cache as prefetch data. The prefetch data is retired in a cache location of the processor.
    Type: Grant
    Filed: October 11, 2012
    Date of Patent: August 23, 2016
    Assignee: SOFT MACHINES INC.
    Inventors: Karthikeyan Avudaiyappan, Mohammad Abdallah
  • Patent number: 9384046
    Abstract: An information processing apparatus includes a computer configured to set respectively a storage location for each value of a common variable among threads of a thread group having write requests to write the values of the common variable of the threads in a given process, from a specific storage location defined in the write requests, to the storage locations respectively set for the threads; store, for each thread of the thread group, a value of the common variable to the storage location set for the thread; and read out in order of execution of the threads of the thread group defined in the given process and when all the threads in the thread group have ended, each value of the common variable stored at the first storing, and in the order of execution, overwrite a value in the specific storage location with each read value of the common variable.
    Type: Grant
    Filed: April 4, 2013
    Date of Patent: July 5, 2016
    Assignee: FUJITSU LIMITED
    Inventors: Koji Kurihara, Koichiro Yamashita, Hiromasa Yamauchi, Takahisa Suzuki
  • Patent number: 9383982
    Abstract: Data-parallel computation programs may be improved by, for example, determining the functional properties user defined functions (UDFs), eliminating unnecessary data-shuffling stages, and/or changing data-partition properties to cause desired data properties to appear after one or more user defined functions are applied.
    Type: Grant
    Filed: September 12, 2012
    Date of Patent: July 5, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jiaxing Zhang, Hucheng Zhou, Zhenyu Guo, Haoxiang Lin, Lidong Zhou
  • Patent number: 9378018
    Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path with of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
    Type: Grant
    Filed: August 17, 2015
    Date of Patent: June 28, 2016
    Assignee: MicroUnity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Patent number: 9378019
    Abstract: A microprocessor instruction translator translates a conditional load instruction into at least two microinstructions. An out-of-order execution pipeline executes the microinstructions. To execute a first microinstruction, an execution unit receives source operands from the source registers of a register file and responsively generates a first result using the source operands. To execute a second the microinstruction, an execution unit receives a previous value of the destination register and the first result and responsively reads data from a memory location specified by the first result and provides a second result that is the data if a condition is satisfied and that is the previous destination register value if not. The previous value of the destination register comprises a result produced by execution of a microinstruction that is the most recent in-order previous writer of the destination register with respect to the second microinstruction.
    Type: Grant
    Filed: April 6, 2012
    Date of Patent: June 28, 2016
    Assignee: VIA TECHNOLOGIES, INC.
    Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker, Gerard M. Col, Colin Eddy
  • Patent number: 9361113
    Abstract: A method for reducing a pipeline stall in a multi-pipelined processor includes finding a store instruction having a same target address as a load instruction and having a store value of the store instruction not yet written according to the store instruction, when the store instruction is being concurrently processed in a different pipeline than the load instruction and the store instruction occurs before the load instruction in a program order. The method also includes associating a target rename register of the load instruction as well as the load instruction with the store instruction, responsive to the finding step. The method further includes writing the store value of the store instruction to the target rename register of the load instruction and finishing the load instruction without reissuing the load instruction, responsive to writing the store value of the store instruction according to the store instruction to finish the store instruction.
    Type: Grant
    Filed: April 24, 2013
    Date of Patent: June 7, 2016
    Assignee: GLOBALFOUNDRIES INC.
    Inventor: Takeshi Ogasawara
  • Patent number: 9311247
    Abstract: A method for detecting patterns of memory accesses in a computing system with out-of-order program execution is provided. The method comprises identifying a first memory operation instruction that is part of a memory stream that would benefit from memory prefetches, marking with program order a plurality of other memory operation instructions prior to execution that are part of the same memory stream as the first memory operation instruction while the plurality of other memory operation instructions are in program order, and, subsequent to out of program order execution of at least two of the plurality of marked memory operation instructions but before execution of all of the plurality of marked memory operation instructions, determining an expected offset value between memory addresses to be accessed by consecutively marked memory operation instructions using the marked memory operation instructions that have executed.
    Type: Grant
    Filed: March 19, 2013
    Date of Patent: April 12, 2016
    Assignee: MARVELL INTERNATIONAL LTD.
    Inventor: Kim Schuttenberg
  • Patent number: 9292294
    Abstract: Method and apparatus to efficiently detect violations of data dependency relationships. A memory address associated with a computer instruction may be obtained. A current state of the memory address may be identified. The current state may include whether the memory address is associated with a read or a store instruction, and whether the memory address is associated with a set or a check. A previously accumulated state associated with the memory address may be retrieved from a data structure. The previously accumulated state may include whether the memory address was previously associated with a read or a store instruction, and whether the memory address was previously associated with a set or a check. If a transition from the previously accumulated state to the current state is invalid, a failure condition may be signaled.
    Type: Grant
    Filed: September 27, 2012
    Date of Patent: March 22, 2016
    Assignee: Intel Corporation
    Inventors: Muawya M. Al-Otoom, Paul Caprioli, Ryan Carlson, Ho-Seop Kim, Omar Shaikh
  • Patent number: 9250906
    Abstract: Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core consuming less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory.
    Type: Grant
    Filed: August 30, 2015
    Date of Patent: February 2, 2016
    Assignee: Intel Corporation
    Inventors: Gadi Haber, Konstantin Kostya Levit-Gurevich, Esfir Natanzon, Boris Ginzburg, Aya Elhanan, Moshe Maury Bach, Igor Breger
  • Patent number: 9251073
    Abstract: A multi core processor implements a cash coherency protocol in which probe messages are address-ordered on a probe channel while responses are un-ordered on a response channel. When a first core generates a read of an address that misses in the first core's cache, a line fill is initiated. If a second core is writing the same address, the second core generates an update on the addressed ordered probe channel. The second core's update may arrive before or after the first core's line fill returns. If the update arrived before the fill returned, a mask is maintained to indicate which portions of the line were modified by the update so that the late arriving line fill only modifies portions of the line that were unaffected by the earlier-arriving update.
    Type: Grant
    Filed: December 31, 2012
    Date of Patent: February 2, 2016
    Assignee: Intel Corporation
    Inventors: Simon C. Steely, William C. Hasenplaugh
  • Patent number: 9244687
    Abstract: Receive packed data operation mask comparison instruction indicating first packed data operation mask having first packed data operation mask bits and second packed data operation mask having second packed data operation mask bits. Each packed data operation mask bit of first mask corresponds to a packed data operation mask bit of second mask in corresponding position. Modify first flag to first value if bitwise AND of each packed data operation mask bit of first mask with each corresponding packed data operation mask bit of second mask is zero. Otherwise modify first flag to second value. Modify second flag to third value if bitwise AND of each packed data operation mask bit of first mask with bitwise NOT of each corresponding packed data operation mask bit of second mask is zero. Otherwise modify second flag to fourth value.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: January 26, 2016
    Assignee: Intel Corporation
    Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal San Adrian, Elmoustapha Ould-Ahmed-Vall, Mark Charney
  • Patent number: 9229713
    Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
    Type: Grant
    Filed: August 22, 2012
    Date of Patent: January 5, 2016
    Assignee: MicroUnity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Patent number: 9223578
    Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
    Type: Grant
    Filed: September 21, 2010
    Date of Patent: December 29, 2015
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow
  • Patent number: 9218185
    Abstract: Embodiments relate to multithreading capability information retrieval. An aspect is a computer system includes a configuration with one or more cores configurable between a single thread (ST) mode and a multithreading (MT) mode. The ST mode addresses a primary thread and the MT mode addresses the primary thread and one or more secondary threads on shared resources of each core. The computer system also includes a multithreading facility configured to control utilization of the configuration to perform a method that includes executing, by the core, a retrieve multithreading capability information instruction. The execution includes obtaining thread identification information that identifies multithreading capability of the configuration, and storing the obtained thread identification information.
    Type: Grant
    Filed: March 27, 2014
    Date of Patent: December 22, 2015
    Assignee: International Business Machines Corporation
    Inventors: Jonathan D. Bradbury, Fadi Y. Busaba, Mark S. Farrell, Charles W. Gainey, Jr., Dan F. Greiner, Lisa Cranton Heller, Jeffrey P. Kubala, Damian L. Osisek, Donald W. Schmidt, Timothy J. Slegel
  • Patent number: 9189432
    Abstract: A data processing apparatus comprises processing circuitry and a plurality of storage units. When the processing circuitry executes a data access instruction, then a storage controller identifies based on a target storage address of the data access instruction, which of the storage units includes the target storage location identified by the target storage address. Prediction circuitry is provided to predict a predicted storage unit predicted to include the target storage location, so that retrieval of the data value from the predicted storage unit can be initiated before the storage controller has identified the target storage unit. The prediction circuitry makes the prediction based on the type of the data access instruction executed by the processing circuitry.
    Type: Grant
    Filed: November 15, 2010
    Date of Patent: November 17, 2015
    Assignee: ARM Limited
    Inventors: Melanie Emanuelle Lucie Teyssier, Florent Begon, Jocelyn Francois Orion Jaubert, Nicolas Jean Phillippe Huot
  • Patent number: 9170819
    Abstract: A data processing apparatus comprises first and second processing circuitry. A conditional instruction executed by the second processing circuitry may have an outcome which is dependent on one of a plurality of sets of condition information maintained by the first processing circuitry. A first forwarding path can forward the sets of condition information from the first processing circuitry to a predetermined pipeline stage of a processing pipeline of the second processing circuitry. A request path can transmit a request signal from the second processing circuitry to the first processing circuitry, the request signal indicating a requested set of condition information which was not yet valid when a conditional instruction was at the predetermined pipeline stage. A second forwarding path may forward the requested set of condition information to a subsequent pipeline stage when the information becomes valid.
    Type: Grant
    Filed: January 9, 2013
    Date of Patent: October 27, 2015
    Assignee: ARM Limited
    Inventors: Nicolas Chaussade, Luca Scalabrino, Frederic Jean Denis Arsanto, Cedric Denis Robert Airaud
  • Patent number: 9164761
    Abstract: A pipelined processor including one or more units having storage locations not directly accessible by software instructions. The processor includes a load-store unit (LSU) in direct communication with the one or more units for accessing the storage locations in response to special instructions. The processor also includes a requesting unit for receiving a special instruction from a requestor and a mechanism for performing a method. The method includes broadcasting storage location information from the special instruction to one or more of the units to determine a corresponding unit having the storage location specified by the special instruction. Execution of the special instruction is initiated at the corresponding unit. If the unit executing the special instruction is not the LSU, the data is sent to the LSU. The data is received from the LSU as a result of the execution of the special instruction. The data is provided to the requester.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: October 20, 2015
    Assignee: International Business Machines Corporation
    Inventors: Aaron Tsai, Bruce C. Giamei, Chung-Lung Kevin Shum, Scott B. Swaney
  • Patent number: 9122474
    Abstract: A technique for minimizing overhead caused by copying or moving a value from one cluster to another cluster is provided. A number of operations, for example, a mov operation for moving or copying a value from one cluster to another cluster and a normal operation may be executed concurrently. Accordingly, access to a register file outside of the cluster may be reduced and the performance of code may be improved.
    Type: Grant
    Filed: July 11, 2012
    Date of Patent: September 1, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Min-Wook Ahn, Tai-Song Jin, Hee-Jin Ahn
  • Patent number: 9092236
    Abstract: A method and system of prefetching and fetching processor instructions is designed for reduced code fraction, for scaled packed instructions before runtime, and for adaptive, concurrent instruction prefetch and fetch at runtime. The invention is designed for reducing energy consumption of the instruction cache memories by accurately accessing the instructions that will be executed and by terminating instruction prefetch after prefetching instructions from the possible paths. The invention is also designed for improving performance by reducing branch instructions and by prefetching and fetching instructions adaptively. In particular, compiled native instructions are converted to mixed packed nonnative and non-packed native instructions for generating more streamlined code and storing the native instructions of the packed instructions in dedicated, separate regions of distinct addresses in the concurrent accessible instruction cache and main memories.
    Type: Grant
    Filed: June 4, 2012
    Date of Patent: July 28, 2015
    Inventor: Yong-Kyu Jung
  • Patent number: 9075721
    Abstract: A program for causing an information processing apparatus to execute a process of a virtual calculator, the process including judging, when a switching of a virtual address space being a processing target of a virtual calculation apparatus occurs, whether or not a there exits physical calculation apparatus in which cache information of a physical address space corresponding to a virtual address space of a switching destination is accumulated; selecting the physical calculation apparatus when there exists a physical calculation apparatus in which the cache information of the physical address space is accumulated, and selecting the physical calculation apparatus in which cache information itself is not accumulated when there exists no physical calculation apparatus in which the cache information is accumulated; and assigning the selected physical calculation apparatus to the virtual calculation apparatus in which the switching of the virtual address space being a processing target has occurred.
    Type: Grant
    Filed: February 25, 2013
    Date of Patent: July 7, 2015
    Assignee: FUJITSU LIMITED
    Inventor: Hidetaka Tamura
  • Patent number: 9043583
    Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicate that first portion of bits in a subsequent portion of the destination register.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: May 26, 2015
    Assignee: Intel Corporation
    Inventor: Patrice Roussel
  • Patent number: 9043576
    Abstract: System and method for conversion of virtual machine files without requiring copying of the virtual machine payload (data) from one location to another location. By eliminating this step, applicant's invention significantly enhances the efficiency of the conversion process. In one embodiment, a file system or storage system provides indirections to locations of data elements stored on a persistent storage media. A source virtual machine file includes hypervisor metadata (HM) data elements in one hypervisor file format, and virtual machine payload (VMP) data elements.
    Type: Grant
    Filed: August 21, 2013
    Date of Patent: May 26, 2015
    Assignee: SimpliVity Corporation
    Inventors: Jesse St. Laurent, James E. King, III
  • Patent number: 9037813
    Abstract: A data accessing method, and a storage system and a controller using the same are provided. The data accessing method is suitable for a flash memory storage system having a data perturbation module. The data accessing method includes receiving a read command from a host and obtaining a logical block to be read and a page to be read from the read command. The data accessing method also includes determining whether a physical block in a data area corresponding to the logical block to be read is a new block and transmitting a predetermined data to the host when the physical block corresponding to the logical block to be read is a new block. Thereby, the host is prevented from reading garbled code from the flash memory storage system having the data perturbation module.
    Type: Grant
    Filed: November 5, 2013
    Date of Patent: May 19, 2015
    Assignee: PHISON ELECTRONICS CORP.
    Inventors: Chien-Hua Chu, Chih-Kang Yeh
  • Patent number: 9037838
    Abstract: A multiprocessor system includes a first microprocessor and a second microprocessor. A first signaling pathway is configured to send message transmission coordination signals from the first microprocessor to the second microprocessor. The first signaling pathway may be coupled to at least two flag registers associated with the second microprocessor. A second signaling pathway is configured to send message transmission coordination signals from the second microprocessor to the first microprocessor. The second signaling pathway may be coupled to at least two flag registers associated with the first microprocessor. The first signaling pathway is independent of the second signaling pathway.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: May 19, 2015
    Assignee: EMC Corporation
    Inventor: Paul A. Shubel
  • Patent number: 9037836
    Abstract: An array of a plurality of processing elements (PEs) are in a data packet-switched network interconnecting the PEs and memory to enable any of the PEs to access the memory. The network connects the PEs and their local memories to a common controller. The common controller may include a shared load/store (SLS) unit and an array control unit. A shared read may be addressed to an external device via the common controller. The SLS unit can continue activity as if a normal shared read operation has taken place, except that the transactions that have been sent externally may take more cycles to complete than the local shared reads. Hence, a number of transaction-enabled flags may not have been deactivated even though there is no more bus activity. The SLS unit can use this state to indicate to the array control unit that a thread switch may now take place.
    Type: Grant
    Filed: January 11, 2011
    Date of Patent: May 19, 2015
    Assignee: Rambus Inc.
    Inventor: Ray McConnell
  • Publication number: 20150134938
    Abstract: An image processing device includes an operation unit and is able to receive a plurality of operation instructions in parallel from the operation unit and a portable information processing terminal. The image processing device includes: an instruction processing unit that executes processing according to the received operation instructions.
    Type: Application
    Filed: November 10, 2014
    Publication date: May 14, 2015
    Inventors: Tamotsu HOSONO, Takahiro Ide
  • Publication number: 20150134937
    Abstract: Vector single instruction multiple data (SIMD) shift and rotate instructions are provided specifying: a destination vector register comprising fields to store vector elements, a first vector register, a vector element size, and a second vector register. Vector data fields of a first element size are duplicated. Duplicate vector data fields are stored as corresponding data fields of twice the first element size. Control logic receives an element size for performing a SIMD shift or rotation operation. Through selectors corresponding to a vector element, portions are selected from the duplicated data fields, the selectors corresponding to any particular vector element select all portions similarly from the duplicated data fields for that particular vector element responsive to the first element size, but selectors corresponding to any particular vector element select at least two portions from the duplicated data fields differently for that particular vector element responsive to a second element size.
    Type: Application
    Filed: December 30, 2011
    Publication date: May 14, 2015
    Inventors: Asaf Rubinstein, Tom Aviram
  • Patent number: 9032174
    Abstract: A processor determines whether a first program is under execution when a second program is executed, and changes a setting of a memory management unit based on access prohibition information so that a fault occurs when the second program makes an access to a memory when the first program is under execution. Then, the processor determines whether an access from the second program to a memory area used by the first program is permitted based on memory restriction information when the fault occurs while the first program and the second program are under execution, and changes the setting of the memory management unit so that the fault does not occur when the access to the memory area is permitted.
    Type: Grant
    Filed: February 11, 2013
    Date of Patent: May 12, 2015
    Assignee: Fujitsu Limited
    Inventor: Naoki Nishiguchi
  • Publication number: 20150127927
    Abstract: Embodiments of the disclosure provide efficient hardware dispatching of concurrent functions in multicore processors, and related processor systems, methods, and computer-readable media. In one embodiment, a first instruction indicating an operation requesting a concurrent transfer of program control is detected in a first hardware thread of a multicore processor. A request for the concurrent transfer of program control is enqueued in a hardware first-in-first-out (FIFO) queue. A second instruction indicating an operation dispatching the request for the concurrent transfer of program control in the hardware FIFO queue is detected in a second hardware thread of the multicore processor. The request for the concurrent transfer of program control is dequeued from the hardware FIFO queue, and the concurrent transfer of program control is executed in the second hardware thread.
    Type: Application
    Filed: March 25, 2014
    Publication date: May 7, 2015
    Applicant: QUALCOMM INCORPORATED
    Inventors: Michael William Paddon, Erik Asmussen de Castro Lopo, Matthew Christian Duggan, Kento Tarui, Craig Matthew Brown
  • Publication number: 20150121045
    Abstract: A read operation is initiated to obtain a wide input operand. Based on the initiating, a determination is made as to whether the wide input operand is available in a wide register or in two narrow registers. Based on determining the wide input operand is not available in the wide register, merging at least a portion of contents of the two narrow registers to obtain merged contents, writing the merged contents into the wide register, and continuing the read operation to obtain the wide input operand. Based on determining the wide input operand is available in the wide register, obtaining the wide input operand from the wide register.
    Type: Application
    Filed: October 31, 2013
    Publication date: April 30, 2015
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind
  • Publication number: 20150121046
    Abstract: The present invention provides a method and apparatus for supporting embodiments of an out-of-order load to load queue structure. One embodiment of the apparatus includes a load queue for storing memory operations adapted to be executed out-of-order with respect to other memory operations. The apparatus also includes a load order queue for cacheable operations that ordered for a particular address.
    Type: Application
    Filed: October 24, 2014
    Publication date: April 30, 2015
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Thomas Kunjan, Scott T. Bingham, Marius Evers, James D. Williams
  • Patent number: 9021233
    Abstract: A vector data access unit includes data access ordering circuitry, for issuing data access requests indicated by elements of earlier and a later vector instructions, one being a write instruction. An element indicating the next data access for each of the instructions is determined. The next data accesses for the earlier and the later instructions may be reordered. The next data access of the earlier instruction is selected if the position of the earlier instruction's next data element is less than or equal to the position of the later instruction's next data element minus a predetermined value. The next data access of the later instruction may be selected if the position of the earlier instruction's next data element is higher than the position of the later instruction's next data element minus a predetermined value. Thus data accesses from earlier and later instructions are partially interleaved.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: April 28, 2015
    Assignee: ARM Limited
    Inventor: Alastair David Reid
  • Patent number: 9021237
    Abstract: A method and circuit arrangement utilize a low latency variable transfer network between the register files of multiple processing cores in a multi-core processor chip to support fine grained parallelism of virtual threads across multiple hardware threads. The communication of a variable over the variable transfer network may be initiated by a move from a local register in a register file of a source processing core to a variable register that is allocated to a destination hardware thread in a destination processing core, so that the destination hardware thread can then move the variable from the variable register to a local register in the destination processing core.
    Type: Grant
    Filed: December 20, 2011
    Date of Patent: April 28, 2015
    Assignee: International Business Machines Corporation
    Inventors: Miguel Comparan, Russell D. Hoover, Robert A. Shearer, Alfred T. Watson, III
  • Publication number: 20150113254
    Abstract: A subsystem is configured to support a distributed instruction set architecture with primary and secondary execution pipelines. The primary execution pipeline supports the execution of a subset of instructions in the distributed instruction set architecture that are issued frequently. The secondary execution pipeline supports the execution of another subset of instructions in the distributed instruction set architecture that are issued less frequently. Both execution pipelines also support the execution of FFMA instructions as well a common subset of instructions in the distributed instruction set architecture. When dispatching a requested instruction, an instruction scheduling unit is configured to select between the two execution pipelines based on various criteria. Those criteria may include power efficiency with which the instruction can be executed and availability of execution units to support execution of the instruction.
    Type: Application
    Filed: October 23, 2013
    Publication date: April 23, 2015
    Applicant: NVIDIA CORPORATION
    Inventors: David Conrad TANNENBAUM, Srinivasan (Vasu) IYER, Stuart F. OBERMAN, Ming Y. SIU, Michael Alan FETTERMAN, John Matthew BURGESS, Shirish GADRE
  • Patent number: 9015453
    Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.
    Type: Grant
    Filed: December 29, 2012
    Date of Patent: April 21, 2015
    Assignee: Intel Corporation
    Inventors: Alexander D. Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
  • Patent number: 9015451
    Abstract: A processor and a memory management method are provided. The processor includes a processor core, a cache which transceives data to/from the processor core via a single port, and stores the data accessed by the processor core, and a Scratch Pad Memory (SPM) which transceives the data to/from the processor core via at least one of a plurality of multi ports.
    Type: Grant
    Filed: March 14, 2008
    Date of Patent: April 21, 2015
    Assignees: Samsung Electronics Co., Ltd., Seoul National University R&DB Foundation
    Inventors: Il Hyun Park, Soojung Ryu, Dong-Hoon Yoo, Dong Kwan Suh, Jeongwook Kim, Choon Ki Jang
  • Publication number: 20150106599
    Abstract: Optimizations are provided for frame management operations, including a clear operation and/or a set storage key operation, requested by pageable guests. The operations are performed, absent host intervention, on frames not resident in host memory. The operations may be specified in an instruction issued by the pageable guests.
    Type: Application
    Filed: December 19, 2014
    Publication date: April 16, 2015
    Inventors: Charles W. Gainey, JR., Dan F. Greiner, Lisa C. Heller, Damian L. Osisek, Gustav E. Sittmann, III
  • Publication number: 20150106598
    Abstract: A computer processor is provided with a plurality of functional units that performs operations specified by the at least one instruction over the multiple machine cycles, wherein the operations produce result operands. The processor also includes circuitry that generates result tags dynamically according to the number of operations that produce result operands in a given machine cycle. A bypass network is configured to provide data paths for transfer of operand data between the plurality of functional units according to the result tags.
    Type: Application
    Filed: October 15, 2014
    Publication date: April 16, 2015
    Applicant: Mill Computing, Inc.
    Inventor: Arthur David Kahlich
  • Publication number: 20150106597
    Abstract: A computer processor and corresponding method of operation employs execution logic that includes at least one functional unit and operand storage that stores data that is produced and consumed by the at least one functional unit. The at least one functional unit is configured to execute a deferred operation whose execution produces result data. The execution logic further includes a retire station that is configured to store and retire the result data of the deferred operation in order to store such result data in the operand storage, wherein the retire of such result data occurs at a machine cycle following issue of the deferred operation as controlled by statically-assigned parameter data included in the encoding of the deferred operation.
    Type: Application
    Filed: October 15, 2014
    Publication date: April 16, 2015
    Applicant: Mill Computing, Inc.
    Inventors: Roger Rawson Godard, Arthur David Kahlich, Nachum Kanovsky
  • Patent number: 9009450
    Abstract: A data processing system 2 includes a processor core 4 and a memory 6. The processor core 4 includes processing circuitry 12, 14, 16, 18, 26 controlled by control signals generated by decoder circuitry 24 which decodes program instructions. The program instructions include mixed operand size instructions (either load/store instructions or arithmetic instructions) which have a first input operand of a first operand size and a second input operand of a second input operand size where the second operand size is smaller than the first operand size. The processing performed first converts the second operand so as to have the first operand size. The processing then generates a third operand using as inputs the first operand of the first operand size and the second operand now converted to have the first operand size.
    Type: Grant
    Filed: January 19, 2012
    Date of Patent: April 14, 2015
    Assignee: ARM Limited
    Inventors: Nigel John Stephens, David James Seal
  • Patent number: 9009452
    Abstract: A computing system processes memory transactions for parallel processing of multiple threads of execution with millicode assists. The computing system transactional memory support provides a Transaction Table in memory and a method of fast detection of potential conflicts between multiple transactions. Special instructions may mark the boundaries of a transaction and identify memory locations applicable to a transaction. A ‘private to transaction’ (PTRAN) tag, directly addressable as part of the main data storage memory location, enables a quick detection of potential conflicts with other transactions that are concurrently executing on another thread of said computing system. The tag indicates whether (or not) a data entry in memory is part of a speculative memory state of an uncommitted transaction that is currently active in the system.
    Type: Grant
    Filed: October 30, 2007
    Date of Patent: April 14, 2015
    Assignee: International Business Machines Corporation
    Inventor: Thomas J. Heller, Jr.
  • Patent number: 9009444
    Abstract: A method, computer program product, and computing system for receiving a reservation for a LUN from Host A, wherein the LUN is defined within a data array. A lock for the LUN is defined as Host A. A write request is received for the LUN from Host B. The lock for the LUN is defined as Transitioning A to B. The write request is delayed for a defined period of time.
    Type: Grant
    Filed: September 29, 2012
    Date of Patent: April 14, 2015
    Assignee: EMC Corporation
    Inventors: Philip Derbeko, Arieh Don, Anat Eyal, Kevin F. Martin, Richard A. Trabing