Processing Control For Data Transfer Patents (Class 712/225)
-
Patent number: 9588767
Abstract: An aspect includes receiving a write request that includes a memory address and write data. Stored data is read from a memory location at the memory address. Based on determining that the memory location was not previously modified, the stored data is compared to the write data. Based on the stored data matching the write data, the write request is completed without writing the write data to the memory, and a corresponding silent store bit in a silent store bitmap is set. Based on the stored data not matching the write data, the write data is written to the memory location, the silent store bit is reset and a corresponding modified bit is set. At least one of an application and an operating system is provided access to the silent store bitmap.
Type: Grant
Filed: June 25, 2015
Date of Patent: March 7, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Pradip Bose, Chen-Yong Cher, Ravi Nair
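The write path in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `SilentStoreMemory`, the per-address Boolean lists, and the return value are hypothetical. The abstract does not say what happens when the location was already modified, so the sketch assumes the write simply proceeds.

```python
class SilentStoreMemory:
    """Toy memory that tracks silent stores (hypothetical names throughout)."""

    def __init__(self, size):
        self.data = [0] * size
        self.silent = [False] * size    # silent store bitmap
        self.modified = [False] * size  # modified bits

    def write(self, addr, value):
        """Handle a write request; return True if memory was actually written."""
        stored = self.data[addr]  # read the stored data first
        if not self.modified[addr] and stored == value:
            # Silent store: complete the request without writing.
            self.silent[addr] = True
            return False
        # Assumption: a previously modified location is written unconditionally.
        self.data[addr] = value
        self.silent[addr] = False
        self.modified[addr] = True
        return True


mem = SilentStoreMemory(4)
skipped = not mem.write(0, 0)  # same value as stored, location unmodified
written = mem.write(0, 5)      # different value, so memory is updated
```

Software holding a reference to `mem.silent` plays the role of the application or operating system that is given access to the bitmap.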
-
Patent number: 9589314
Abstract: Systems, methods, and apparatus for performing queries in a graphics processing system are disclosed. These systems, methods, and apparatus may be configured to read a running counter at the start of the query to determine a start value, wherein the running counter counts discrete graphical entities, read the running counter at the end of the query to determine an end value, and subtract the start value from the end value to determine a result.
Type: Grant
Filed: August 29, 2013
Date of Patent: March 7, 2017
Assignee: QUALCOMM Incorporated
Inventors: Avinash Seetharamaiah, Hitendra Mohan Gangani, Nigel Terence Poole
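As a rough illustration of the start/end subtraction described in this abstract, the sketch below models the running counter as a plain integer. The names `RunningCounter` and `Query` are hypothetical and do not correspond to any real graphics API.

```python
class RunningCounter:
    """Free-running count of discrete graphical entities (hypothetical model)."""

    def __init__(self):
        self.value = 0

    def count(self, entities=1):
        self.value += entities

    def read(self):
        return self.value


class Query:
    """Reads the counter at the start and end instead of resetting it."""

    def __init__(self, counter):
        self.counter = counter
        self.start = None

    def begin(self):
        self.start = self.counter.read()

    def end(self):
        # Result = end value minus start value.
        return self.counter.read() - self.start


counter = RunningCounter()
counter.count(3)           # entities counted before the query starts
query = Query(counter)
query.begin()
counter.count(5)           # entities counted while the query is active
result = query.end()
```

Because the counter is never reset, several queries can overlap and each still sees only the entities counted during its own interval.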
-
Patent number: 9588768
Abstract: An aspect includes receiving a write request that includes a memory address and write data. Stored data is read from a memory location at the memory address. Based on determining that the memory location was not previously modified, the stored data is compared to the write data. Based on the stored data matching the write data, the write request is completed without writing the write data to the memory, and a corresponding silent store bit in a silent store bitmap is set. Based on the stored data not matching the write data, the write data is written to the memory location, the silent store bit is reset and a corresponding modified bit is set. At least one of an application and an operating system is provided access to the silent store bitmap.
Type: Grant
Filed: November 23, 2015
Date of Patent: March 7, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Pradip Bose, Chen-Yong Cher, Ravi Nair
-
Patent number: 9552207
Abstract: Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core consuming less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory.
Type: Grant
Filed: February 2, 2016
Date of Patent: January 24, 2017
Assignee: Intel Corporation
Inventors: Gadi Haber, Konstantin Kostya Levit-Gurevich, Esfir Natanzon, Boris Ginzburg, Aya Elhanan, Moshe Maury Bach, Igor Breger
-
Patent number: 9530176
Abstract: An image processing apparatus and image processing method thereof are disclosed. The image processing apparatus includes a first image processor which includes a memory, performs a first signal processing on image data, and stores the first signal processed image data in the memory; a second image processor which directly accesses the memory and receives the stored image data, and performs a second signal processing on the received image data; and an image outputter which outputs the image data on which the second signal processing has been performed. Accordingly, only actual image data area is received, reducing data transmission volume, reducing signal transmission lines, securing CPU resources, and improving timing errors. In addition, it is possible to remove an image transmitter such as an additional low voltage differential signaling (LVDS) block, making the image processing apparatus thinner and smaller.
Type: Grant
Filed: February 15, 2013
Date of Patent: December 27, 2016
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Ji-won Kim, Young-hun Choi
-
Patent number: 9471310
Abstract: A method, computer program product, and system are provided for multi-input bitwise logical operations. The method includes the steps of receiving a multi-input bitwise logical operation instruction that specifies two or more input operands and a function operand, where a first input operand of the two or more input operands comprises a number of bits, each bit having a corresponding bit in each of the additional input operands in the two or more input operands. The function operand is written to a lookup table. Then, the lookup table is accessed for each set of corresponding input operand bits in the two or more input operands to generate an output for the multi-input bitwise logical operation instruction.
Type: Grant
Filed: November 26, 2012
Date of Patent: October 18, 2016
Assignee: NVIDIA Corporation
Inventor: Alexey Yuryevich Panteleev
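The lookup-table idea in this abstract can be sketched in software as follows. This is an illustrative model only: the function name `lut_logic`, the `width` parameter, and the choice to treat the first operand's bit as the most significant bit of the table index are assumptions, not details taken from the patent.

```python
def lut_logic(inputs, func, width=32):
    """Apply an arbitrary bitwise function to N operands via a truth table.

    `func` is a 2**N-bit truth table (the "function operand"). For each bit
    position, the corresponding bits of the operands form an index into it.
    """
    result = 0
    for bit in range(width):
        index = 0
        for operand in inputs:
            # Assumed ordering: first operand supplies the highest index bit.
            index = (index << 1) | ((operand >> bit) & 1)
        result |= ((func >> index) & 1) << bit
    return result


a, b, c = 0b11110000, 0b11001100, 0b10101010
xor3 = lut_logic([a, b, c], 0x96, width=8)  # table 0x96 encodes a ^ b ^ c
and3 = lut_logic([a, b, c], 0x80, width=8)  # table 0x80 encodes a & b & c
```

With this index ordering, the table for any three-input function can be derived by applying that function to the constants 0xF0, 0xCC, and 0xAA.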
-
Patent number: 9448798
Abstract: An aspect includes receiving a write request that includes a memory address and write data. Stored data is read from a memory location at the memory address. Based on determining that the memory location was not previously modified, the stored data is compared to the write data. Based on the stored data matching the write data, the write request is completed without writing the write data to the memory, and a corresponding silent store bit in a silent store bitmap is set. Based on the stored data not matching the write data, the write data is written to the memory location, the silent store bit is reset and a corresponding modified bit is set. At least one of an application and an operating system is provided access to the silent store bitmap.
Type: Grant
Filed: March 31, 2016
Date of Patent: September 20, 2016
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Pradip Bose, Chen-Yong Cher, Ravi Nair
-
Patent number: 9424032
Abstract: Disclosed is a list vector processing apparatus (LVPA) or the like which can process the indirect reference at a high speed. The LVPA includes: a gather processing unit processing a first gather instruction to store a value of a storage area accessed by only a self information processing apparatus (SelfIPA) in a plurality of information processing apparatuses according to a list vector storing an address representing a storage area read from a storage apparatus into a register, and a process of generating reference access information indicating whether being a storage area accessed by both of the SelfIPA and another information processing apparatus; a communication unit for related information; an access information operating unit to calculate an area accessed by the information processing apparatus; and a scatter processing unit processing a first scatter instruction to store a value stored in the register into the storage area accessed by only the SelfIPA.
Type: Grant
Filed: February 24, 2014
Date of Patent: August 23, 2016
Assignee: NEC Corporation
Inventor: Satoru Tagaya
-
Patent number: 9424046
Abstract: Systems and methods for load canceling in a processor that is connected to an external interconnect fabric are disclosed. As a part of a method for load canceling in a processor that is connected to an external bus, and responsive to a flush request and a corresponding cancellation of pending speculative loads from a load queue, a type of one or more of the pending speculative loads that are positioned in the instruction pipeline external to the processor, is converted from load to prefetch. Data corresponding to one or more of the pending speculative loads that are positioned in the instruction pipeline external to the processor is accessed and returned to cache as prefetch data. The prefetch data is retired in a cache location of the processor.
Type: Grant
Filed: October 11, 2012
Date of Patent: August 23, 2016
Assignee: SOFT MACHINES INC.
Inventors: Karthikeyan Avudaiyappan, Mohammad Abdallah
-
Patent number: 9384046
Abstract: An information processing apparatus includes a computer configured to set respectively a storage location for each value of a common variable among threads of a thread group having write requests to write the values of the common variable of the threads in a given process, from a specific storage location defined in the write requests, to the storage locations respectively set for the threads; store, for each thread of the thread group, a value of the common variable to the storage location set for the thread; and read out in order of execution of the threads of the thread group defined in the given process and when all the threads in the thread group have ended, each value of the common variable stored at the first storing, and in the order of execution, overwrite a value in the specific storage location with each read value of the common variable.
Type: Grant
Filed: April 4, 2013
Date of Patent: July 5, 2016
Assignee: FUJITSU LIMITED
Inventors: Koji Kurihara, Koichiro Yamashita, Hiromasa Yamauchi, Takahisa Suzuki
-
Patent number: 9383982
Abstract: Data-parallel computation programs may be improved by, for example, determining the functional properties of user defined functions (UDFs), eliminating unnecessary data-shuffling stages, and/or changing data-partition properties to cause desired data properties to appear after one or more user defined functions are applied.
Type: Grant
Filed: September 12, 2012
Date of Patent: July 5, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jiaxing Zhang, Hucheng Zhou, Zhenyu Guo, Haoxiang Lin, Lidong Zhou
-
Patent number: 9378018
Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
Type: Grant
Filed: August 17, 2015
Date of Patent: June 28, 2016
Assignee: MicroUnity Systems Engineering, Inc.
Inventors: Craig Hansen, John Moussouris, Alexia Massalin
-
Patent number: 9378019
Abstract: A microprocessor instruction translator translates a conditional load instruction into at least two microinstructions. An out-of-order execution pipeline executes the microinstructions. To execute a first microinstruction, an execution unit receives source operands from the source registers of a register file and responsively generates a first result using the source operands. To execute a second microinstruction, an execution unit receives a previous value of the destination register and the first result and responsively reads data from a memory location specified by the first result and provides a second result that is the data if a condition is satisfied and that is the previous destination register value if not. The previous value of the destination register comprises a result produced by execution of a microinstruction that is the most recent in-order previous writer of the destination register with respect to the second microinstruction.
Type: Grant
Filed: April 6, 2012
Date of Patent: June 28, 2016
Assignee: VIA TECHNOLOGIES, INC.
Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker, Gerard M. Col, Colin Eddy
-
Patent number: 9361113
Abstract: A method for reducing a pipeline stall in a multi-pipelined processor includes finding a store instruction having a same target address as a load instruction and having a store value of the store instruction not yet written according to the store instruction, when the store instruction is being concurrently processed in a different pipeline than the load instruction and the store instruction occurs before the load instruction in a program order. The method also includes associating a target rename register of the load instruction as well as the load instruction with the store instruction, responsive to the finding step. The method further includes writing the store value of the store instruction to the target rename register of the load instruction and finishing the load instruction without reissuing the load instruction, responsive to writing the store value of the store instruction according to the store instruction to finish the store instruction.
Type: Grant
Filed: April 24, 2013
Date of Patent: June 7, 2016
Assignee: GLOBALFOUNDRIES INC.
Inventor: Takeshi Ogasawara
-
Patent number: 9311247
Abstract: A method for detecting patterns of memory accesses in a computing system with out-of-order program execution is provided. The method comprises identifying a first memory operation instruction that is part of a memory stream that would benefit from memory prefetches, marking with program order a plurality of other memory operation instructions prior to execution that are part of the same memory stream as the first memory operation instruction while the plurality of other memory operation instructions are in program order, and, subsequent to out of program order execution of at least two of the plurality of marked memory operation instructions but before execution of all of the plurality of marked memory operation instructions, determining an expected offset value between memory addresses to be accessed by consecutively marked memory operation instructions using the marked memory operation instructions that have executed.
Type: Grant
Filed: March 19, 2013
Date of Patent: April 12, 2016
Assignee: MARVELL INTERNATIONAL LTD.
Inventor: Kim Schuttenberg
-
Patent number: 9292294
Abstract: Method and apparatus to efficiently detect violations of data dependency relationships. A memory address associated with a computer instruction may be obtained. A current state of the memory address may be identified. The current state may include whether the memory address is associated with a read or a store instruction, and whether the memory address is associated with a set or a check. A previously accumulated state associated with the memory address may be retrieved from a data structure. The previously accumulated state may include whether the memory address was previously associated with a read or a store instruction, and whether the memory address was previously associated with a set or a check. If a transition from the previously accumulated state to the current state is invalid, a failure condition may be signaled.
Type: Grant
Filed: September 27, 2012
Date of Patent: March 22, 2016
Assignee: Intel Corporation
Inventors: Muawya M. Al-Otoom, Paul Caprioli, Ryan Carlson, Ho-Seop Kim, Omar Shaikh
-
Patent number: 9250906
Abstract: Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core consuming less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory.
Type: Grant
Filed: August 30, 2015
Date of Patent: February 2, 2016
Assignee: Intel Corporation
Inventors: Gadi Haber, Konstantin Kostya Levit-Gurevich, Esfir Natanzon, Boris Ginzburg, Aya Elhanan, Moshe Maury Bach, Igor Breger
-
Patent number: 9251073
Abstract: A multi-core processor implements a cache coherency protocol in which probe messages are address-ordered on a probe channel while responses are un-ordered on a response channel. When a first core generates a read of an address that misses in the first core's cache, a line fill is initiated. If a second core is writing the same address, the second core generates an update on the address-ordered probe channel. The second core's update may arrive before or after the first core's line fill returns. If the update arrived before the fill returned, a mask is maintained to indicate which portions of the line were modified by the update so that the late arriving line fill only modifies portions of the line that were unaffected by the earlier-arriving update.
Type: Grant
Filed: December 31, 2012
Date of Patent: February 2, 2016
Assignee: Intel Corporation
Inventors: Simon C. Steely, William C. Hasenplaugh
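The masking step at the end of this abstract can be pictured with a small byte-granular model. The function names `apply_update` and `apply_late_fill`, the 8-byte line, and the per-byte mask are hypothetical simplifications; the patent does not specify the granularity here.

```python
def apply_update(line, mask, offset, data):
    """An update from another core arrives before the line fill returns."""
    for i, byte in enumerate(data):
        line[offset + i] = byte
        mask[offset + i] = True  # remember which bytes the update touched


def apply_late_fill(line, mask, fill):
    """The late fill only writes bytes the earlier update left untouched."""
    for i, byte in enumerate(fill):
        if not mask[i]:
            line[i] = byte


line = bytearray(8)       # cache line awaiting its fill
mask = [False] * 8        # no bytes modified yet
apply_update(line, mask, 2, b"\xAA\xBB")
apply_late_fill(line, mask, bytes(range(1, 9)))
```

After both steps the line holds the fill data everywhere except bytes 2 and 3, which keep the newer values from the update.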
-
Patent number: 9244687
Abstract: Receive packed data operation mask comparison instruction indicating first packed data operation mask having first packed data operation mask bits and second packed data operation mask having second packed data operation mask bits. Each packed data operation mask bit of first mask corresponds to a packed data operation mask bit of second mask in corresponding position. Modify first flag to first value if bitwise AND of each packed data operation mask bit of first mask with each corresponding packed data operation mask bit of second mask is zero. Otherwise modify first flag to second value. Modify second flag to third value if bitwise AND of each packed data operation mask bit of first mask with bitwise NOT of each corresponding packed data operation mask bit of second mask is zero. Otherwise modify second flag to fourth value.
Type: Grant
Filed: December 29, 2011
Date of Patent: January 26, 2016
Assignee: Intel Corporation
Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal San Adrian, Elmoustapha Ould-Ahmed-Vall, Mark Charney
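The two flag computations in this abstract reduce to two bitwise tests. The sketch below follows the abstract's wording directly; the function name `mask_test`, the 16-bit default width, and the use of 1 and 0 for the first/third and second/fourth values are assumptions made for illustration.

```python
def mask_test(mask1, mask2, width=16):
    """Return (first_flag, second_flag) for two operation masks.

    first_flag  is 1 when (mask1 AND mask2) is all zeros, else 0.
    second_flag is 1 when (mask1 AND NOT mask2) is all zeros, else 0.
    """
    full = (1 << width) - 1
    mask1 &= full
    mask2 &= full
    first_flag = 1 if (mask1 & mask2) == 0 else 0
    second_flag = 1 if (mask1 & ~mask2 & full) == 0 else 0
    return first_flag, second_flag


disjoint = mask_test(0b0011, 0b1100)  # no common bits set
subset = mask_test(0b0011, 0b0111)    # every bit of mask1 is also in mask2
```

Read this way, the first flag reports that the masks share no set bits, and the second reports that the first mask is a subset of the second.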
-
Patent number: 9229713
Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
Type: Grant
Filed: August 22, 2012
Date of Patent: January 5, 2016
Assignee: MicroUnity Systems Engineering, Inc.
Inventors: Craig Hansen, John Moussouris, Alexia Massalin
-
Patent number: 9223578
Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
Type: Grant
Filed: September 21, 2010
Date of Patent: December 29, 2015
Assignee: NVIDIA Corporation
Inventors: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow
-
Patent number: 9218185
Abstract: Embodiments relate to multithreading capability information retrieval. An aspect is a computer system that includes a configuration with one or more cores configurable between a single thread (ST) mode and a multithreading (MT) mode. The ST mode addresses a primary thread and the MT mode addresses the primary thread and one or more secondary threads on shared resources of each core. The computer system also includes a multithreading facility configured to control utilization of the configuration to perform a method that includes executing, by the core, a retrieve multithreading capability information instruction. The execution includes obtaining thread identification information that identifies multithreading capability of the configuration, and storing the obtained thread identification information.
Type: Grant
Filed: March 27, 2014
Date of Patent: December 22, 2015
Assignee: International Business Machines Corporation
Inventors: Jonathan D. Bradbury, Fadi Y. Busaba, Mark S. Farrell, Charles W. Gainey, Jr., Dan F. Greiner, Lisa Cranton Heller, Jeffrey P. Kubala, Damian L. Osisek, Donald W. Schmidt, Timothy J. Slegel
-
Patent number: 9189432
Abstract: A data processing apparatus comprises processing circuitry and a plurality of storage units. When the processing circuitry executes a data access instruction, then a storage controller identifies based on a target storage address of the data access instruction, which of the storage units includes the target storage location identified by the target storage address. Prediction circuitry is provided to predict a predicted storage unit predicted to include the target storage location, so that retrieval of the data value from the predicted storage unit can be initiated before the storage controller has identified the target storage unit. The prediction circuitry makes the prediction based on the type of the data access instruction executed by the processing circuitry.
Type: Grant
Filed: November 15, 2010
Date of Patent: November 17, 2015
Assignee: ARM Limited
Inventors: Melanie Emanuelle Lucie Teyssier, Florent Begon, Jocelyn Francois Orion Jaubert, Nicolas Jean Phillippe Huot
-
Patent number: 9170819
Abstract: A data processing apparatus comprises first and second processing circuitry. A conditional instruction executed by the second processing circuitry may have an outcome which is dependent on one of a plurality of sets of condition information maintained by the first processing circuitry. A first forwarding path can forward the sets of condition information from the first processing circuitry to a predetermined pipeline stage of a processing pipeline of the second processing circuitry. A request path can transmit a request signal from the second processing circuitry to the first processing circuitry, the request signal indicating a requested set of condition information which was not yet valid when a conditional instruction was at the predetermined pipeline stage. A second forwarding path may forward the requested set of condition information to a subsequent pipeline stage when the information becomes valid.
Type: Grant
Filed: January 9, 2013
Date of Patent: October 27, 2015
Assignee: ARM Limited
Inventors: Nicolas Chaussade, Luca Scalabrino, Frederic Jean Denis Arsanto, Cedric Denis Robert Airaud
-
Patent number: 9164761
Abstract: A pipelined processor including one or more units having storage locations not directly accessible by software instructions. The processor includes a load-store unit (LSU) in direct communication with the one or more units for accessing the storage locations in response to special instructions. The processor also includes a requesting unit for receiving a special instruction from a requester and a mechanism for performing a method. The method includes broadcasting storage location information from the special instruction to one or more of the units to determine a corresponding unit having the storage location specified by the special instruction. Execution of the special instruction is initiated at the corresponding unit. If the unit executing the special instruction is not the LSU, the data is sent to the LSU. The data is received from the LSU as a result of the execution of the special instruction. The data is provided to the requester.
Type: Grant
Filed: February 19, 2008
Date of Patent: October 20, 2015
Assignee: International Business Machines Corporation
Inventors: Aaron Tsai, Bruce C. Giamei, Chung-Lung Kevin Shum, Scott B. Swaney
-
Patent number: 9122474
Abstract: A technique for minimizing overhead caused by copying or moving a value from one cluster to another cluster is provided. A number of operations, for example, a mov operation for moving or copying a value from one cluster to another cluster and a normal operation may be executed concurrently. Accordingly, access to a register file outside of the cluster may be reduced and the performance of code may be improved.
Type: Grant
Filed: July 11, 2012
Date of Patent: September 1, 2015
Assignee: Samsung Electronics Co., Ltd.
Inventors: Min-Wook Ahn, Tai-Song Jin, Hee-Jin Ahn
-
Patent number: 9092236
Abstract: A method and system of prefetching and fetching processor instructions is designed for reduced code fraction, for scaled packed instructions before runtime, and for adaptive, concurrent instruction prefetch and fetch at runtime. The invention is designed for reducing energy consumption of the instruction cache memories by accurately accessing the instructions that will be executed and by terminating instruction prefetch after prefetching instructions from the possible paths. The invention is also designed for improving performance by reducing branch instructions and by prefetching and fetching instructions adaptively. In particular, compiled native instructions are converted to mixed packed nonnative and non-packed native instructions for generating more streamlined code and storing the native instructions of the packed instructions in dedicated, separate regions of distinct addresses in the concurrent accessible instruction cache and main memories.
Type: Grant
Filed: June 4, 2012
Date of Patent: July 28, 2015
Inventor: Yong-Kyu Jung
-
Patent number: 9075721
Abstract: A program for causing an information processing apparatus to execute a process of a virtual calculator, the process including judging, when a switching of a virtual address space being a processing target of a virtual calculation apparatus occurs, whether or not there exists a physical calculation apparatus in which cache information of a physical address space corresponding to a virtual address space of a switching destination is accumulated; selecting the physical calculation apparatus when there exists a physical calculation apparatus in which the cache information of the physical address space is accumulated, and selecting the physical calculation apparatus in which cache information itself is not accumulated when there exists no physical calculation apparatus in which the cache information is accumulated; and assigning the selected physical calculation apparatus to the virtual calculation apparatus in which the switching of the virtual address space being a processing target has occurred.
Type: Grant
Filed: February 25, 2013
Date of Patent: July 7, 2015
Assignee: FUJITSU LIMITED
Inventor: Hidetaka Tamura
-
Patent number: 9043583
Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register.
Type: Grant
Filed: March 15, 2013
Date of Patent: May 26, 2015
Assignee: Intel Corporation
Inventor: Patrice Roussel
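The load-and-duplicate operation in this abstract can be modelled on plain integers. This is a sketch under assumptions: the function name `load_and_duplicate`, the 64-bit default portion size, and the choice of the low-order bits as the "first portion" are illustrative and not taken from the patent.

```python
def load_and_duplicate(source, portion_bits=64):
    """Copy the low portion of `source` into both halves of the result.

    The result models a destination register of 2 * portion_bits bits.
    """
    low = source & ((1 << portion_bits) - 1)  # first portion of the source
    return (low << portion_bits) | low        # duplicate into the next portion


wide = load_and_duplicate(0x1122334455667788_99AABBCCDDEEFF00)
narrow = load_and_duplicate(0xABCD, portion_bits=8)
```

Here `wide` carries the low 64 bits of the source in both halves of a 128-bit value, and `narrow` shows the same behaviour with 8-bit portions.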
-
Patent number: 9043576
Abstract: System and method for conversion of virtual machine files without requiring copying of the virtual machine payload (data) from one location to another location. By eliminating this step, applicant's invention significantly enhances the efficiency of the conversion process. In one embodiment, a file system or storage system provides indirections to locations of data elements stored on a persistent storage media. A source virtual machine file includes hypervisor metadata (HM) data elements in one hypervisor file format, and virtual machine payload (VMP) data elements.
Type: Grant
Filed: August 21, 2013
Date of Patent: May 26, 2015
Assignee: SimpliVity Corporation
Inventors: Jesse St. Laurent, James E. King, III
-
Patent number: 9037813
Abstract: A data accessing method, and a storage system and a controller using the same are provided. The data accessing method is suitable for a flash memory storage system having a data perturbation module. The data accessing method includes receiving a read command from a host and obtaining a logical block to be read and a page to be read from the read command. The data accessing method also includes determining whether a physical block in a data area corresponding to the logical block to be read is a new block and transmitting a predetermined data to the host when the physical block corresponding to the logical block to be read is a new block. Thereby, the host is prevented from reading garbled code from the flash memory storage system having the data perturbation module.
Type: Grant
Filed: November 5, 2013
Date of Patent: May 19, 2015
Assignee: PHISON ELECTRONICS CORP.
Inventors: Chien-Hua Chu, Chih-Kang Yeh
-
Patent number: 9037838
Abstract: A multiprocessor system includes a first microprocessor and a second microprocessor. A first signaling pathway is configured to send message transmission coordination signals from the first microprocessor to the second microprocessor. The first signaling pathway may be coupled to at least two flag registers associated with the second microprocessor. A second signaling pathway is configured to send message transmission coordination signals from the second microprocessor to the first microprocessor. The second signaling pathway may be coupled to at least two flag registers associated with the first microprocessor. The first signaling pathway is independent of the second signaling pathway.
Type: Grant
Filed: September 30, 2011
Date of Patent: May 19, 2015
Assignee: EMC Corporation
Inventor: Paul A. Shubel
-
Patent number: 9037836
Abstract: An array of a plurality of processing elements (PEs) are in a data packet-switched network interconnecting the PEs and memory to enable any of the PEs to access the memory. The network connects the PEs and their local memories to a common controller. The common controller may include a shared load/store (SLS) unit and an array control unit. A shared read may be addressed to an external device via the common controller. The SLS unit can continue activity as if a normal shared read operation has taken place, except that the transactions that have been sent externally may take more cycles to complete than the local shared reads. Hence, a number of transaction-enabled flags may not have been deactivated even though there is no more bus activity. The SLS unit can use this state to indicate to the array control unit that a thread switch may now take place.
Type: Grant
Filed: January 11, 2011
Date of Patent: May 19, 2015
Assignee: Rambus Inc.
Inventor: Ray McConnell
-
Publication number: 20150134938Abstract: An image processing device includes an operation unit and is able to receive a plurality of operation instructions in parallel from the operation unit and a portable information processing terminal. The image processing device includes: an instruction processing unit that executes processing according to the received operation instructions.Type: ApplicationFiled: November 10, 2014Publication date: May 14, 2015Inventors: Tamotsu HOSONO, Takahiro Ide
-
Publication number: 20150134937Abstract: Vector single instruction multiple data (SIMD) shift and rotate instructions are provided specifying: a destination vector register comprising fields to store vector elements, a first vector register, a vector element size, and a second vector register. Vector data fields of a first element size are duplicated. Duplicate vector data fields are stored as corresponding data fields of twice the first element size. Control logic receives an element size for performing a SIMD shift or rotation operation. Through selectors corresponding to a vector element, portions are selected from the duplicated data fields, the selectors corresponding to any particular vector element select all portions similarly from the duplicated data fields for that particular vector element responsive to the first element size, but selectors corresponding to any particular vector element select at least two portions from the duplicated data fields differently for that particular vector element responsive to a second element size.Type: ApplicationFiled: December 30, 2011Publication date: May 14, 2015Inventors: Asaf Rubinstein, Tom Aviram
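The duplication step in this abstract relies on a well-known identity: if a field is copied into a field of twice its size, any rotation of the original can be read out as a contiguous slice of the doubled field. The sketch below shows that identity for a single element in Python; the function name `rotate_left` and its parameters are assumptions for illustration and do not model the selector hardware.

```python
def rotate_left(value, amount, bits):
    """Rotate a `bits`-wide value left by `amount` using a duplicated field."""
    mask = (1 << bits) - 1
    value &= mask
    amount %= bits
    # Duplicate the element into a field of twice the element size.
    doubled = (value << bits) | value
    # Select the `bits`-wide portion that starts `amount` positions below
    # the top of the doubled field; that slice is the rotated value.
    return (doubled >> (bits - amount)) & mask
```

The same doubled field serves every rotation amount, which is presumably why the abstract has the selectors pick portions from the duplicated data rather than shifting the original element.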
-
Patent number: 9032174Abstract: A processor determines whether a first program is under execution when a second program is executed, and changes a setting of a memory management unit based on access prohibition information so that a fault occurs when the second program makes an access to a memory when the first program is under execution. Then, the processor determines whether an access from the second program to a memory area used by the first program is permitted based on memory restriction information when the fault occurs while the first program and the second program are under execution, and changes the setting of the memory management unit so that the fault does not occur when the access to the memory area is permitted.Type: GrantFiled: February 11, 2013Date of Patent: May 12, 2015Assignee: Fujitsu LimitedInventor: Naoki Nishiguchi
-
Publication number: 20150127927Abstract: Embodiments of the disclosure provide efficient hardware dispatching of concurrent functions in multicore processors, and related processor systems, methods, and computer-readable media. In one embodiment, a first instruction indicating an operation requesting a concurrent transfer of program control is detected in a first hardware thread of a multicore processor. A request for the concurrent transfer of program control is enqueued in a hardware first-in-first-out (FIFO) queue. A second instruction indicating an operation dispatching the request for the concurrent transfer of program control in the hardware FIFO queue is detected in a second hardware thread of the multicore processor. The request for the concurrent transfer of program control is dequeued from the hardware FIFO queue, and the concurrent transfer of program control is executed in the second hardware thread.Type: ApplicationFiled: March 25, 2014Publication date: May 7, 2015Applicant: QUALCOMM INCORPORATEDInventors: Michael William Paddon, Erik Asmussen de Castro Lopo, Matthew Christian Duggan, Kento Tarui, Craig Matthew Brown
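As a rough software analogy for the enqueue and dequeue steps in this abstract, the following Python sketch uses a `collections.deque` as the FIFO. The class `DispatchQueue` and its method names are hypothetical; the real mechanism is a hardware queue driven by dedicated instructions, which this sketch does not attempt to model.

```python
from collections import deque


class DispatchQueue:
    """Software stand-in for a hardware FIFO of control-transfer requests."""

    def __init__(self):
        self._fifo = deque()

    def request(self, target, *args):
        # "First instruction": a thread asks for a concurrent transfer of
        # program control to `target`; the request is only enqueued here.
        self._fifo.append((target, args))

    def dispatch(self):
        # "Second instruction": another thread dequeues the oldest request
        # and executes the transfer itself. Returns (ran, result).
        if not self._fifo:
            return (False, None)
        target, args = self._fifo.popleft()
        return (True, target(*args))
```

Requests are served strictly in arrival order, so two calls to `request` followed by two calls to `dispatch` run the targets in the order they were requested.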
-
Publication number: 20150121045Abstract: A read operation is initiated to obtain a wide input operand. Based on the initiating, a determination is made as to whether the wide input operand is available in a wide register or in two narrow registers. Based on determining the wide input operand is not available in the wide register, merging at least a portion of contents of the two narrow registers to obtain merged contents, writing the merged contents into the wide register, and continuing the read operation to obtain the wide input operand. Based on determining the wide input operand is available in the wide register, obtaining the wide input operand from the wide register.Type: ApplicationFiled: October 31, 2013Publication date: April 30, 2015Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jonathan D. Bradbury, Michael K. Gschwind
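The read flow in this abstract can be sketched as below. The 64-bit narrow width, the pairing of narrow registers `2*i` and `2*i + 1` into wide register `i`, the high/low ordering, and the `wide_valid` tracking set are all assumptions made for the example; the abstract does not specify them.

```python
MASK64 = (1 << 64) - 1


class RegisterFile:
    """Toy model: wide register i overlays narrow registers 2*i and 2*i + 1."""

    def __init__(self):
        self.narrow = {}         # narrow index -> 64-bit value
        self.wide = {}           # wide index -> 128-bit value
        self.wide_valid = set()  # wide indices whose contents are current

    def write_narrow(self, index, value):
        self.narrow[index] = value & MASK64
        # The wide view of this pair is now stale.
        self.wide_valid.discard(index // 2)

    def read_wide(self, index):
        if index not in self.wide_valid:
            # Operand not available in the wide register: merge the two
            # narrow registers, write the result back, then continue the read.
            high = self.narrow.get(2 * index, 0)
            low = self.narrow.get(2 * index + 1, 0)
            self.wide[index] = (high << 64) | low
            self.wide_valid.add(index)
        return self.wide[index]
```

After the first `read_wide`, later reads of the same operand take the direct path until one of the underlying narrow registers is written again.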
-
Publication number: 20150121046Abstract: The present invention provides a method and apparatus for supporting embodiments of an out-of-order load to load queue structure. One embodiment of the apparatus includes a load queue for storing memory operations adapted to be executed out-of-order with respect to other memory operations. The apparatus also includes a load order queue for cacheable operations that are ordered for a particular address.Type: ApplicationFiled: October 24, 2014Publication date: April 30, 2015Applicant: Advanced Micro Devices, Inc.Inventors: Thomas Kunjan, Scott T. Bingham, Marius Evers, James D. Williams
-
Patent number: 9021233Abstract: A vector data access unit includes data access ordering circuitry for issuing data access requests indicated by elements of an earlier and a later vector instruction, one being a write instruction. An element indicating the next data access for each of the instructions is determined. The next data accesses for the earlier and the later instructions may be reordered. The next data access of the earlier instruction is selected if the position of the earlier instruction's next data element is less than or equal to the position of the later instruction's next data element minus a predetermined value. The next data access of the later instruction may be selected if the position of the earlier instruction's next data element is higher than the position of the later instruction's next data element minus a predetermined value. Thus data accesses from earlier and later instructions are partially interleaved.Type: GrantFiled: September 28, 2011Date of Patent: April 28, 2015Assignee: ARM LimitedInventor: Alastair David Reid
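The selection rule in this abstract can be written out as a short Python sketch. The function name `interleave_accesses`, the element counts, and the policy of draining whichever instruction remains once the other is finished are assumptions for illustration; the sign and magnitude of the predetermined value `k` are likewise not given in the abstract.

```python
def interleave_accesses(n_earlier, n_later, k):
    """Order element accesses of two vector instructions.

    The earlier instruction's next element (position e) is selected when
    e <= l - k, where l is the later instruction's next position; otherwise
    the later instruction's next element is selected.
    """
    order = []
    e = l = 0
    while e < n_earlier or l < n_later:
        earlier_ready = e < n_earlier
        later_done = l >= n_later
        if later_done or (earlier_ready and e <= l - k):
            order.append(("earlier", e))
            e += 1
        else:
            order.append(("later", l))
            l += 1
    return order
```

With four elements per instruction and `k = -1`, the earlier instruction stays at most two elements ahead, giving a partially interleaved order in which each instruction's own elements still issue in sequence.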
-
Patent number: 9021237Abstract: A method and circuit arrangement utilize a low latency variable transfer network between the register files of multiple processing cores in a multi-core processor chip to support fine grained parallelism of virtual threads across multiple hardware threads. The communication of a variable over the variable transfer network may be initiated by a move from a local register in a register file of a source processing core to a variable register that is allocated to a destination hardware thread in a destination processing core, so that the destination hardware thread can then move the variable from the variable register to a local register in the destination processing core.Type: GrantFiled: December 20, 2011Date of Patent: April 28, 2015Assignee: International Business Machines CorporationInventors: Miguel Comparan, Russell D. Hoover, Robert A. Shearer, Alfred T. Watson, III
-
Publication number: 20150113254Abstract: A subsystem is configured to support a distributed instruction set architecture with primary and secondary execution pipelines. The primary execution pipeline supports the execution of a subset of instructions in the distributed instruction set architecture that are issued frequently. The secondary execution pipeline supports the execution of another subset of instructions in the distributed instruction set architecture that are issued less frequently. Both execution pipelines also support the execution of FFMA instructions as well as a common subset of instructions in the distributed instruction set architecture. When dispatching a requested instruction, an instruction scheduling unit is configured to select between the two execution pipelines based on various criteria. Those criteria may include power efficiency with which the instruction can be executed and availability of execution units to support execution of the instruction.Type: ApplicationFiled: October 23, 2013Publication date: April 23, 2015Applicant: NVIDIA CORPORATIONInventors: David Conrad TANNENBAUM, Srinivasan (Vasu) IYER, Stuart F. OBERMAN, Ming Y. SIU, Michael Alan FETTERMAN, John Matthew BURGESS, Shirish GADRE
-
Patent number: 9015453Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.Type: GrantFiled: December 29, 2012Date of Patent: April 21, 2015Assignee: Intel CorporationInventors: Alexander D. Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
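The destination layout described in this abstract amounts to interleaving elements from the two sources. A minimal Python sketch, assuming each source contributes two elements and using the hypothetical name `unpack_interleave`:

```python
def unpack_interleave(first_source, second_source):
    """Interleave packed elements from two source registers.

    first_source supplies the first and third elements, second_source the
    second and fourth, so the destination reads first, second, third, fourth.
    """
    destination = []
    for a, b in zip(first_source, second_source):
        destination.extend((a, b))
    return destination
```

Calling `unpack_interleave([1, 3], [2, 4])` places the elements in the order 1, 2, 3, 4, which matches the adjacency the abstract describes.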
-
Patent number: 9015451Abstract: A processor and a memory management method are provided. The processor includes a processor core, a cache which transceives data to/from the processor core via a single port, and stores the data accessed by the processor core, and a Scratch Pad Memory (SPM) which transceives the data to/from the processor core via at least one of a plurality of multi ports.Type: GrantFiled: March 14, 2008Date of Patent: April 21, 2015Assignees: Samsung Electronics Co., Ltd., Seoul National University R&DB FoundationInventors: Il Hyun Park, Soojung Ryu, Dong-Hoon Yoo, Dong Kwan Suh, Jeongwook Kim, Choon Ki Jang
-
Publication number: 20150106599Abstract: Optimizations are provided for frame management operations, including a clear operation and/or a set storage key operation, requested by pageable guests. The operations are performed, absent host intervention, on frames not resident in host memory. The operations may be specified in an instruction issued by the pageable guests.Type: ApplicationFiled: December 19, 2014Publication date: April 16, 2015Inventors: Charles W. Gainey, JR., Dan F. Greiner, Lisa C. Heller, Damian L. Osisek, Gustav E. Sittmann, III
-
Publication number: 20150106598Abstract: A computer processor is provided with a plurality of functional units that perform operations specified by at least one instruction over multiple machine cycles, wherein the operations produce result operands. The processor also includes circuitry that generates result tags dynamically according to the number of operations that produce result operands in a given machine cycle. A bypass network is configured to provide data paths for transfer of operand data between the plurality of functional units according to the result tags.Type: ApplicationFiled: October 15, 2014Publication date: April 16, 2015Applicant: Mill Computing, Inc.Inventor: Arthur David Kahlich
-
Publication number: 20150106597Abstract: A computer processor and corresponding method of operation employs execution logic that includes at least one functional unit and operand storage that stores data that is produced and consumed by the at least one functional unit. The at least one functional unit is configured to execute a deferred operation whose execution produces result data. The execution logic further includes a retire station that is configured to store and retire the result data of the deferred operation in order to store such result data in the operand storage, wherein the retire of such result data occurs at a machine cycle following issue of the deferred operation as controlled by statically-assigned parameter data included in the encoding of the deferred operation.Type: ApplicationFiled: October 15, 2014Publication date: April 16, 2015Applicant: Mill Computing, Inc.Inventors: Roger Rawson Godard, Arthur David Kahlich, Nachum Kanovsky
-
Patent number: 9009450Abstract: A data processing system 2 includes a processor core 4 and a memory 6. The processor core 4 includes processing circuitry 12, 14, 16, 18, 26 controlled by control signals generated by decoder circuitry 24 which decodes program instructions. The program instructions include mixed operand size instructions (either load/store instructions or arithmetic instructions) which have a first input operand of a first operand size and a second input operand of a second input operand size where the second operand size is smaller than the first operand size. The processing performed first converts the second operand so as to have the first operand size. The processing then generates a third operand using as inputs the first operand of the first operand size and the second operand now converted to have the first operand size.Type: GrantFiled: January 19, 2012Date of Patent: April 14, 2015Assignee: ARM LimitedInventors: Nigel John Stephens, David James Seal
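The two-step processing in this abstract (widen the smaller operand, then operate at the larger size) can be illustrated with a mixed-size add. The 64-bit and 32-bit sizes, the use of sign extension rather than zero extension, and the names `sign_extend` and `mixed_add` are assumptions for this sketch; the abstract does not fix any of them.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, from_bits, to_bits):
    """Widen a two's-complement value from from_bits to to_bits."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):   # sign bit set: value is negative
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def mixed_add(first64, second32):
    """Add a 64-bit first operand and a 32-bit second operand."""
    # Step 1: convert the second operand to the first operand size.
    widened = sign_extend(second32, 32, 64)
    # Step 2: produce the third operand from two same-size inputs.
    return (first64 + widened) & MASK64
```

Under these assumptions, a 32-bit operand of 0xFFFFFFFF is treated as -1, so adding it to 10 yields 9 rather than a large positive value.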
-
Patent number: 9009452Abstract: A computing system processes memory transactions for parallel processing of multiple threads of execution with millicode assists. The computing system transactional memory support provides a Transaction Table in memory and a method of fast detection of potential conflicts between multiple transactions. Special instructions may mark the boundaries of a transaction and identify memory locations applicable to a transaction. A ‘private to transaction’ (PTRAN) tag, directly addressable as part of the main data storage memory location, enables a quick detection of potential conflicts with other transactions that are concurrently executing on another thread of said computing system. The tag indicates whether (or not) a data entry in memory is part of a speculative memory state of an uncommitted transaction that is currently active in the system.Type: GrantFiled: October 30, 2007Date of Patent: April 14, 2015Assignee: International Business Machines CorporationInventor: Thomas J. Heller, Jr.
-
Patent number: 9009444Abstract: A method, computer program product, and computing system for receiving a reservation for a LUN from Host A, wherein the LUN is defined within a data array. A lock for the LUN is defined as Host A. A write request is received for the LUN from Host B. The lock for the LUN is defined as Transitioning A to B. The write request is delayed for a defined period of time.Type: GrantFiled: September 29, 2012Date of Patent: April 14, 2015Assignee: EMC CorporationInventors: Philip Derbeko, Arieh Don, Anat Eyal, Kevin F. Martin, Richard A. Trabing
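The lock states named in this abstract can be sketched as a small state holder. The class `LunLock`, its method names, and the behavior when the reserving host itself writes are assumptions for illustration; the abstract only covers the reservation by Host A followed by a write from Host B, and says nothing about what happens after the delay.

```python
class LunLock:
    """Tracks the lock state of one LUN through the steps in the abstract."""

    def __init__(self):
        self.owner = None
        self.state = "Unreserved"   # hypothetical initial label

    def reserve(self, host):
        # A reservation defines the lock as the reserving host.
        self.owner = host
        self.state = host

    def write_request(self, host, delay_seconds):
        if self.owner is None:
            raise ValueError("no reservation; case not covered by the abstract")
        if host == self.owner:
            # Assumed: the reserving host's own writes are not delayed.
            return ("proceed", 0.0)
        # A write from another host moves the lock into a transitioning
        # state and the write is delayed for the defined period.
        self.state = f"Transitioning {self.owner} to {host}"
        return ("delayed", delay_seconds)
```

Reserving for "Host A" and then issuing a write from "Host B" leaves the lock in the state "Transitioning Host A to Host B" and reports the write as delayed.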