Of Multiple Instructions Simultaneously Patents (Class 712/206)
  • Patent number: 11960893
    Abstract: A method, programming product, and/or system for prefetching instructions includes an instruction prefetch table that has a plurality of entries, each entry for storing a first portion of an indirect branch instruction address and a target address, wherein the indirect branch instruction has multiple target addresses and the instruction prefetch table is accessed by an index obtained by hashing a second portion of bits of the indirect branch instruction address with an information vector of the indirect branch instruction. A further embodiment includes a first prefetch table for uni-target branch instructions and a second prefetch table for multi-target branch instructions. In operation it is determined whether a branch instruction hits in one of the multiple prefetch tables; a target address for the branch instruction is read from the respective prefetch table in which the branch instruction hit; and the branch instruction is prefetched to an instruction cache.
    Type: Grant
    Filed: December 29, 2021
    Date of Patent: April 16, 2024
    Assignee: International Business Machines Corporation
    Inventors: Naga P. Gorti, Mohit Karve
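Illustrative sketch (not from the patent): a minimal Python model of the dual prefetch-table lookup described in the abstract above. The XOR hash, the 64-entry direct-mapped tables, and the ICache class are assumptions added for illustration.

```python
# Behavioral sketch (assumptions: XOR hash, direct-mapped tables, 64 entries each).
TABLE_SIZE = 64

uni_target_table = {}    # index -> (tag, target) for single-target branches
multi_target_table = {}  # index -> (tag, target) for indirect multi-target branches

def table_index(branch_addr, info_vector):
    """Hash a portion of the branch address with an information vector."""
    return ((branch_addr >> 2) ^ info_vector) % TABLE_SIZE

def tag_bits(branch_addr):
    """A first portion of the branch address, stored in the entry as a tag."""
    return branch_addr >> 8

def lookup_and_prefetch(branch_addr, info_vector, icache):
    """Return the prefetch target if the branch hits in either table."""
    idx = table_index(branch_addr, info_vector)
    for table in (multi_target_table, uni_target_table):
        entry = table.get(idx)
        if entry and entry[0] == tag_bits(branch_addr):
            tag, target = entry
            icache.prefetch(target)      # prefetch the target line into the I-cache
            return target
    return None                          # miss in both tables: no prefetch

class ICache:
    def prefetch(self, addr):
        print(f"prefetching instruction line at {addr:#x}")

# Usage: train the multi-target table for one branch, then look it up.
branch, history = 0x40_1234, 0b1011
multi_target_table[table_index(branch, history)] = (tag_bits(branch), 0x40_9000)
lookup_and_prefetch(branch, history, ICache())
```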
  • Patent number: 11960567
    Abstract: A method for performing a fundamental computational primitive in a device is provided, where the device includes a processor and a matrix multiplication accelerator (MMA). The method includes configuring a streaming engine in the device to stream data for the fundamental computational primitive from memory, configuring the MMA to format the data, and executing the fundamental computational primitive by the device.
    Type: Grant
    Filed: July 4, 2021
    Date of Patent: April 16, 2024
    Assignee: Texas Instruments Incorporated
    Inventors: Arthur John Redfern, Timothy David Anderson, Kai Chirca, Chenchi Luo, Zhenhua Yu
  • Patent number: 11860820
    Abstract: Processing data through a storage system in a data pipeline including receiving, by the storage system, a dataset from a collector on a data producer, wherein the dataset is disaggregated from metadata for the dataset by the collector; storing the dataset on the storage system; receiving, by the storage system from a data indexer, a request for data from the dataset, wherein the request for the data comprises the metadata gathered by the collector on the data producer; servicing, by the storage system, the request for the data by locating the data using the metadata gathered by the collector on the data producer and received in the request for the data; and receiving, from the data indexer, indexed data indexed using the metadata gathered by the collector on the data producer.
    Type: Grant
    Filed: April 3, 2019
    Date of Patent: January 2, 2024
    Assignee: PURE STORAGE, INC.
    Inventors: Ivan Jibaja, Curtis Pullen, Stefan Dorsett, Srinivas Chellappa, Prashant Jaikumar
  • Patent number: 11620153
    Abstract: Instruction interrupt suppression for an overflow condition. An instruction is executed, and a determination is made that an overflow condition occurred. Based on a per-instruction overflow interrupt indicator being set to a defined value, interrupt processing for the overflow condition is performed, and based on the per-instruction overflow interrupt indicator being set to another defined value, the interrupt processing for the overflow condition is bypassed.
    Type: Grant
    Filed: February 4, 2019
    Date of Patent: April 4, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Cedric Lichtenau, Jonathan D. Bradbury, Reid Copeland, Petra Leber
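Illustrative sketch (not from the patent): a minimal model of per-instruction overflow-interrupt suppression, assuming a 32-bit wrap-around add as the overflowing operation and a boolean per-instruction indicator.

```python
# Behavioral sketch: per-instruction overflow-interrupt suppression (assumed 32-bit add).
MASK32 = 0xFFFF_FFFF

class OverflowInterrupt(Exception):
    pass

def execute_add(a, b, overflow_interrupt_enabled):
    """Execute a 32-bit add; take the interrupt only if this instruction's indicator enables it."""
    full = a + b
    result = full & MASK32
    overflowed = full != result           # the overflow condition occurred
    if overflowed and overflow_interrupt_enabled:
        raise OverflowInterrupt()         # perform interrupt processing
    return result                         # otherwise bypass interrupt processing

# Same overflowing add, two different per-instruction indicator values.
print(hex(execute_add(0xFFFF_FFFF, 1, overflow_interrupt_enabled=False)))  # 0x0, no trap
try:
    execute_add(0xFFFF_FFFF, 1, overflow_interrupt_enabled=True)
except OverflowInterrupt:
    print("interrupt taken")
```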
  • Patent number: 11562219
    Abstract: An integrated circuit chip apparatus and a processing method performed by an integrated circuit chip apparatus are disclosed. The disclosed integrated circuit chip apparatus and processing method are used for executing a multiplication operation, a convolution operation, or a training operation of a neural network. The present technical solution has the advantages of a reduced computational cost and low power consumption.
    Type: Grant
    Filed: September 2, 2020
    Date of Patent: January 24, 2023
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Shaoli Liu, Xinkai Song, Bingrui Wang, Yao Zhang, Shuai Hu
  • Patent number: 11562216
    Abstract: Provided are an integrated circuit chip apparatus and a related product, the integrated circuit chip apparatus being used for executing a multiplication operation, a convolution operation or a training operation of a neural network. The present technical solution has the advantages of a small amount of calculation and low power consumption.
    Type: Grant
    Filed: December 19, 2019
    Date of Patent: January 24, 2023
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Shaoli Liu, Xinkai Song, Bingrui Wang, Yao Zhang, Shuai Hu
  • Patent number: 11403254
    Abstract: A methodology for populating multiple instruction words is provided. The methodology includes: creating a dependency graph of instruction nodes, each instruction node including at least one instruction operation; first assigning a first instruction node to a first instruction word; identifying a dependent instruction node that is directly dependent upon a result of the first instruction node; first determining whether the dependent instruction node requires any input from two or more sources that are outside of a predefined physical range of each other, the range being smaller than the full extent of the data path; and second assigning, in response to satisfaction of at least one predetermined criterion including a negative result of the first determining, the dependent instruction node to the first instruction word.
    Type: Grant
    Filed: August 14, 2019
    Date of Patent: August 2, 2022
    Assignee: TACHYUM LTD.
    Inventor: Radoslav Danilak
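Illustrative sketch (not from the patent): a minimal model of the packing decision above, assuming integer "lane" positions stand in for the physical locations of sources and RANGE stands in for the predefined physical range.

```python
# Sketch of the instruction-word packing decision (lane positions and RANGE are assumptions).
RANGE = 4   # sources farther apart than this cannot feed one instruction-word slot

def can_share_word(dependent_sources):
    """True if all sources lie within the predefined physical range of each other."""
    return max(dependent_sources) - min(dependent_sources) <= RANGE

def assign(first_node, dependent_node, words):
    """Assign first_node to a word; co-locate dependent_node when the range test passes."""
    words.append([first_node["name"]])                 # first assigning step
    if can_share_word(dependent_node["source_lanes"]): # first determining step
        words[-1].append(dependent_node["name"])       # second assigning: same word
    else:
        words.append([dependent_node["name"]])         # otherwise start a new word
    return words

words = assign({"name": "mul r3"},
               {"name": "add r5", "source_lanes": [3, 5]},  # lanes 3 and 5: within range
               [])
print(words)   # [['mul r3', 'add r5']]
```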
  • Patent number: 11360808
    Abstract: A mechanism is described for facilitating intelligent thread scheduling at autonomous machines. A method of embodiments, as described herein, includes detecting dependency information relating to a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a processor including a graphics processor. The method may further include generating a tree of thread groups based on the dependency information, where each thread group includes multiple threads, and scheduling one or more of the thread groups associated with a similar dependency to avoid dependency conflicts.
    Type: Grant
    Filed: April 9, 2017
    Date of Patent: June 14, 2022
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Rajkishore Barik, Eriko Nurvitadhi, Nicolas Galoppo Von Borries, Tsung-Han Lin, Sanjeev Jahagirdar, Vasanth Ranganathan
  • Patent number: 11307858
    Abstract: A stream of data is accessed from a memory system using a stream of addresses generated in a first mode of operating a streaming engine in response to executing a first stream instruction. A block cache preload operation is performed on a cache in the memory using a block of addresses generated in a second mode of operating the streaming engine in response to executing a second stream instruction.
    Type: Grant
    Filed: March 24, 2020
    Date of Patent: April 19, 2022
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Joseph Raymond Michael Zbiciak, Timothy David Anderson, Jonathan (Son) Hung Tran, Kai Chirca, Daniel Wu, Abhijeet Ashok Chachad, David M. Thompson
  • Patent number: 11237952
    Abstract: The present disclosure provides a mutation test manager configured to initialize multiple computing threads configuring a computing host to perform parallel computation; mutate class files within context of each computing thread; recompile mutated class files independently in each respective computing thread to generate heterogeneous mutants; and execute pending unit tests against heterogeneous mutants independently in each respective computing thread. Consequently, the mutation testing process is decoupled from computational bottlenecks which would result from linear, sequential generation, compilation, and testing of each mutation, especially in the context of JVM® programming languages configured to generate class-rich object code.
    Type: Grant
    Filed: April 7, 2021
    Date of Patent: February 1, 2022
    Assignee: State Farm Mutual Automobile Insurance Company
    Inventors: Andrew L Pearson, Nate Shepherd
  • Patent number: 11182164
    Abstract: Support for instruction fusion is provided. An indication whether an instruction is a paired instruction is received from an instruction decoder. Based on the indication, one dispatch slot or a paired dispatch slot is allocated in the instruction dispatcher queue. A mapper converts logical addresses of sources and targets of the instruction to physical addresses. Either one issue slot or a paired issue slot is allocated in an issue queue based on the indication from the instruction decoder. The instruction execution environment is loaded into the issue queue and issued to an execution unit.
    Type: Grant
    Filed: July 23, 2020
    Date of Patent: November 23, 2021
    Assignee: International Business Machines Corporation
    Inventors: Brian D. Barrick, John B. Griswell, Jr., Dung Q. Nguyen, Brian W. Thompto
  • Patent number: 11171881
    Abstract: A device configured to receive a data set and instructions for processing the data set from a network device. The device is further configured to parse the data set into a plurality of data segments to be processed, and generate a plurality of instruction segments from the received instructions. The device is further configured to assign each instruction segment to a resource unit, and to generate control information with instructions for combining processed data segments from the resource units. The device is further configured to receive processed data segments from the resource units, to generate the processed data set, and to output the processed data set to the network device.
    Type: Grant
    Filed: January 28, 2021
    Date of Patent: November 9, 2021
    Assignee: Bank of America Corporation
    Inventors: Manu J. Kurian, Sasidhar Purushothaman, Rajesh Narayanan
  • Patent number: 11150906
    Abstract: An apparatus and method for increasing performance in a processor or other instruction execution device while minimizing energy consumption. A processor includes a first execution pipeline and a second execution pipeline. The first execution pipeline includes a first decode unit and a first execution control unit coupled to the first decode unit. The first execution control unit is configured to control execution of all instructions executable by the processor. The second execution pipeline includes a second decode unit, and a second execution control unit coupled to the second decode unit. The second execution control unit is configured to control execution of a subset of the instructions executable via the first execution control unit.
    Type: Grant
    Filed: October 7, 2019
    Date of Patent: October 19, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Christian Wiencke, Shrey Bhatia
  • Patent number: 11144324
    Abstract: Systems, apparatuses, and methods for compressing multiple instruction operations together into a single retire queue entry are disclosed. A processor includes at least a scheduler, a retire queue, one or more execution units, and control logic. When the control logic detects a given instruction operation being dispatched by the scheduler to an execution unit, the control logic determines if the given instruction operation meets one or more conditions for being compressed with one or more other instruction operations into a single retire queue entry. If the one or more conditions are met, two or more instruction operations are stored together in a single retire queue entry. By compressing multiple instruction operations together into an individual retire queue entry, the retire queue is able to be used more efficiently, and the processor can speculatively execute more instructions without the retire queue exhausting its supply of available entries.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: October 12, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Matthew T. Sobel, Joshua James Lindner, Neil N. Marketkar, Kai Troester, Emil Talpes, Ashok Tirupathy Venkatachar
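Illustrative sketch (not from the patent): a minimal model of retire-queue entry compression. The compression conditions used here, adjacency in program order and a two-op-per-entry limit, are assumptions standing in for the patent's unspecified conditions.

```python
# Sketch of compressing multiple instruction operations into one retire-queue entry.
OPS_PER_ENTRY = 2

class RetireQueue:
    def __init__(self):
        self.entries = []          # each entry holds one or more ops

    def allocate(self, op):
        if self.entries:
            last = self.entries[-1]
            can_compress = (len(last) < OPS_PER_ENTRY and
                            op["seq"] == last[-1]["seq"] + 1)  # consecutive in program order
            if can_compress:
                last.append(op)    # compress: share the existing retire-queue entry
                return
        self.entries.append([op])  # otherwise consume a fresh entry

rq = RetireQueue()
for seq, name in enumerate(["add", "load", "mul", "store"]):
    rq.allocate({"seq": seq, "name": name})
print(len(rq.entries), "entries for 4 ops")   # 2 entries for 4 ops
```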
  • Patent number: 11144353
    Abstract: Techniques for use in a microprocessor core for soft watermarking in thread shared resources implemented through thread mediation. A thread is removed from a thread mediation decision involving multiple threads competing or requesting to use a shared resource at a current clock cycle based on a number of entries in the shared resource that the thread is estimated to have allocated to it at the current clock cycle. By removing the thread from the thread mediation decision, the thread is stalled from allocating additional entries in the shared resource.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: October 12, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Kai Troester
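Illustrative sketch (not from the patent): a minimal model of soft watermarking through thread mediation. The watermark value and the deterministic pick among the remaining requesters are assumptions.

```python
# Sketch of watermark-based thread mediation for a shared resource.
WATERMARK = 8   # soft cap on entries a thread may hold in the shared resource

def mediate(requesting_threads, estimated_entries):
    """Pick a thread to allocate this cycle, stalling threads over the watermark."""
    eligible = [t for t in requesting_threads
                if estimated_entries.get(t, 0) < WATERMARK]   # remove over-limit threads
    return min(eligible) if eligible else None                # simple deterministic pick

estimates = {"T0": 12, "T1": 3}          # T0 is estimated to hold too many entries
print(mediate(["T0", "T1"], estimates))  # T1 wins; T0 is stalled from allocating
```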
  • Patent number: 11080194
    Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.
    Type: Grant
    Filed: December 27, 2018
    Date of Patent: August 3, 2021
    Assignee: Intel Corporation
    Inventors: Sreenivas Subramoney, Stanislav Shwartsman, Anant Nori, Shankar Balachandran, Elad Shtiegmann, Vineeth Mekkat, Manjunath Shevgoor, Sourabh Alurkar
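Illustrative sketch (not from the patent): a minimal model of pointer-load detection and prefetch. The dictionary-based tracking structures and plain-integer addresses are assumptions; a real implementation would bound and tag these structures.

```python
# Sketch of identifying pointer loads and prefetching the data they point to.
recent_loads = {}        # loaded value -> PC of the load that produced it
pointer_load_pcs = set() # PCs identified as pointer loads

def prefetch(memory, address):
    print(f"prefetching data at {address:#x} -> {memory.get(address)}")

def observe_load(pc, address, value, memory):
    # Detection: if this load's address equals data produced by an earlier load,
    # that earlier load was a pointer load.
    if address in recent_loads:
        pointer_load_pcs.add(recent_loads[address])
    recent_loads[value] = pc

    # Prefetch side: if this PC is a known pointer load, its data is itself an
    # address, so prefetch the memory location it points to.
    if pc in pointer_load_pcs:
        prefetch(memory, value)
    return value

mem = {0x1000: 0x2000, 0x2000: 42}
observe_load(pc=0x400, address=0x1000, value=mem[0x1000], memory=mem)  # loads a pointer
observe_load(pc=0x404, address=0x2000, value=mem[0x2000], memory=mem)  # marks 0x400 as a pointer load
observe_load(pc=0x400, address=0x1000, value=mem[0x1000], memory=mem)  # now triggers a prefetch
```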
  • Patent number: 10901744
    Abstract: Aspects of the invention include buffered instruction dispatching to an issue queue. A non-limiting example includes dispatching from a dispatch unit of a processor a first group of instructions selected from a first plurality of instructions to a first issue queue partition of the processor in a first cycle. A second group of instructions selected from the first plurality of instructions is passed to an issue queue buffer of the processor in the first cycle. The second group of instructions is passed from the issue queue buffer to the first issue queue partition in a second cycle. A third group of instructions selected from a second plurality of instructions is dispatched to a second issue queue partition in the second cycle.
    Type: Grant
    Filed: November 30, 2017
    Date of Patent: January 26, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Mohit S. Karve, Joel A. Silberman, Balaram Sinharoy
  • Patent number: 10846097
    Abstract: The present disclosure includes a mispredict recovery apparatus, which may comprise an instruction execution unit, a branch predictor, and a misprediction recovery unit (MRU). The MRU may provide discrete cycle predictions after a misprediction redirect from the instruction execution unit. The MRU may include a branch confidence filter to generate prediction confidence information for predicted branches. The MRU may include a tag content-addressable memory (CAM). The tag CAM may store frequently mispredicting low-confidence branches, probe the misprediction redirect, and obtain the prediction confidence information from the branch confidence filter. The MRU may include a mispredict recovery buffer (MRB) to store an alternate path for frequently mispredicting low-confidence branches present in the tag CAM without storing the instructions themselves. Also disclosed is a method for recovering from mispredicts associated with the instruction fetch pipeline.
    Type: Grant
    Filed: February 20, 2019
    Date of Patent: November 24, 2020
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Reshma C. Jumani, Fuzhou Zou, Monika Tkaczyk, Eric C. Quinnell
  • Patent number: 10698691
    Abstract: Disclosed are a method and a processing device directed to determining global branch history for branch prediction. The method includes shifting first bits of a branch signature into a current global branch history and performing a bitwise exclusive-or (XOR) function on second bits of the branch signature and shifted bits of the current global branch history. In this way, the current global branch history is updated. The processing device implements the method using a shift logic configured to store and shift bits representing a current global branch history, a register configured to store the current global branch history, decision circuitry configured to determine whether or not a branch is taken, and XOR gates.
    Type: Grant
    Filed: August 30, 2016
    Date of Patent: June 30, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Steven R. Havlir
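Illustrative sketch (not from the patent): the shift-then-XOR history update described above. The split of the branch signature into shifted-in bits and XORed bits follows the abstract; the bit widths are assumptions.

```python
# Sketch of the global branch history update (widths are assumptions).
HISTORY_BITS = 16
SHIFT_IN_BITS = 2    # first bits of the signature are shifted into the history

def update_ghist(ghist, branch_signature):
    """Shift signature bits into the history, then XOR the remaining bits into the shifted value."""
    first_bits = branch_signature & ((1 << SHIFT_IN_BITS) - 1)
    second_bits = branch_signature >> SHIFT_IN_BITS
    shifted = ((ghist << SHIFT_IN_BITS) | first_bits) & ((1 << HISTORY_BITS) - 1)
    return shifted ^ second_bits    # bitwise XOR folds the remaining signature bits in

ghist = 0
for signature in (0b1011, 0b0110, 0b1111):
    ghist = update_ghist(ghist, signature)
print(f"{ghist:0{HISTORY_BITS}b}")
```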
  • Patent number: 10630682
    Abstract: A network protocol provides mutual authentication of network-connected devices that are parties to a communication channel in environments where the amount of memory and processing power available to the network-connected devices is constrained. When a new device is added to a network, the device contacts a registration service and provides authentication information that proves the authenticity of the device. After verifying the authenticity of the device, the registration service generates a token that can be used to by the device to authenticate with other network entities, and provides the token to the device. The registration service publishes the token using a directory service. When the device connects to another network entity, the device provides the token to the other network entity, and the other network entity authenticates the device by verifying the token using the directory service.
    Type: Grant
    Filed: November 23, 2016
    Date of Patent: April 21, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Ramkishore Bhattacharyya, Amit Mhatre, Ashutosh Thakur, Atulya S. Beheray, Rameez Loladia
  • Patent number: 10606602
    Abstract: An electronic apparatus is provided for obtaining compiling data used in an external processor including a function unit including a plurality of input ports. The electronic apparatus includes a storage configured to store a plurality of instructions, and a processor configured to schedule each of the plurality of instructions in a plurality of cycles, assign a plurality of input data corresponding to the plurality of instructions to the plurality of input ports in a corresponding cycle, and if an unassigned input port among the plurality of input ports is present in a first cycle, assign a part of input data corresponding to an instruction scheduled in a second cycle after the first cycle to the unassigned input port in the first cycle, and obtain the compiling data by assigning remaining data of the input data corresponding the instruction to one of the plurality of input ports in the second cycle.
    Type: Grant
    Filed: July 20, 2017
    Date of Patent: March 31, 2020
    Assignee: Samsung Electronics Co., Ltd
    Inventors: Yeon-bok Lee, Myung-sun Kim, Shin-gyu Kim
  • Patent number: 10514927
    Abstract: A processor includes logic to execute an instruction stream out-of-order. The instruction stream is divided into a plurality of strands, and the instructions within the strands are ordered by program order (PO). The processor further includes logic to identify an oldest undispatched instruction in the instruction stream and record its associated PO as an executed instruction pointer, identify a most recently committed store instruction in the instruction stream and record its associated PO as a store commitment pointer, determine a search pointer with PO less than the executed instruction pointer, identify a first set of store instructions in a store buffer with PO less than the search pointer and eligible for commitment, evaluate whether the first set of store instructions is larger than a number of read ports of the store buffer, and adjust the search pointer.
    Type: Grant
    Filed: March 27, 2014
    Date of Patent: December 24, 2019
    Assignee: Intel Corporation
    Inventors: Anton Lechanka, Andrey Efimov, Sergey Y. Shishlov, Andrey Kluchnikov, Kamil Garifullin, Igor Burovenko, Boris A. Babayan
  • Patent number: 10514925
    Abstract: Systems, apparatuses, and methods for managing dependencies between instruction operations when speculatively issuing load instruction operations. A processor may maintain dependency vectors for sources of instruction operations dispatched to the scheduler. The dependency vector may include a column for each cycle of the load recovery window and a row for each load execution pipeline. When a load speculatively issues, any instruction operation which is dependent on the load may have a bit set in the earliest bit position of its dependency vector to indicate the dependency. The bit may shift in the dependency vector toward the cancel bit position during each clock cycle as the load executes. If the load does not produce its data at the expected latency, an instruction operation may be canceled if there is a bit in the cancel bit position of the dependency vector row corresponding to the execution pipeline of the load.
    Type: Grant
    Filed: January 28, 2016
    Date of Patent: December 24, 2019
    Assignee: Apple Inc.
    Inventor: Sean M. Reynolds
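Illustrative sketch (not from the patent): a minimal model of the dependency vector above. The window length, the number of load pipelines, and the convention that column 0 is the cancel position are assumptions.

```python
# Sketch of a per-source dependency vector for speculatively issued loads.
WINDOW = 3          # cycles in the load recovery window
PIPELINES = 2       # load execution pipelines

def new_vector():
    # One row per load pipeline, one column per cycle; column 0 is the cancel position.
    return [[0] * WINDOW for _ in range(PIPELINES)]

def mark_dependency(vec, load_pipe):
    vec[load_pipe][WINDOW - 1] = 1      # set the earliest bit position for that pipeline

def advance_cycle(vec):
    for row in vec:                     # each cycle the bit shifts toward the cancel slot
        row.pop(0)
        row.append(0)

def must_cancel(vec, missed_pipe):
    return vec[missed_pipe][0] == 1     # bit reached the cancel position for that pipe

dep = new_vector()
mark_dependency(dep, load_pipe=1)       # op depends on a load issued on pipeline 1
for _ in range(WINDOW - 1):
    advance_cycle(dep)
print(must_cancel(dep, missed_pipe=1))  # True: the load missed, so the dependent op is canceled
```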
  • Patent number: 10445017
    Abstract: A memory system includes a memory device including a plurality of command registers; and a memory controller configured to determine whether an empty command register exists among the plurality of command registers, and transmit a new command to the memory device, when an empty command register exists, wherein, when the new command is transmitted from the memory controller, the memory device stores the transmitted new command in the empty command register.
    Type: Grant
    Filed: December 22, 2016
    Date of Patent: October 15, 2019
    Assignee: SK hynix Inc.
    Inventor: Beom Ju Shin
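Illustrative sketch (not from the patent): the controller/device handshake described above, assuming four command registers and a simple object model for the device.

```python
# Sketch of transmitting a command only when an empty command register exists.
NUM_COMMAND_REGISTERS = 4

class MemoryDevice:
    def __init__(self):
        self.command_registers = [None] * NUM_COMMAND_REGISTERS

    def empty_register(self):
        """Index of an empty command register, or None if all are occupied."""
        for i, slot in enumerate(self.command_registers):
            if slot is None:
                return i
        return None

    def store_command(self, command):
        self.command_registers[self.empty_register()] = command

class MemoryController:
    def __init__(self, device):
        self.device = device

    def transmit(self, command):
        """Send a new command only when the device has an empty register."""
        if self.device.empty_register() is None:
            return False              # hold the command until a register frees up
        self.device.store_command(command)
        return True

controller = MemoryController(MemoryDevice())
print(controller.transmit({"op": "READ", "addr": 0x100}))   # True: stored in register 0
```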
  • Patent number: 10430342
    Abstract: An apparatus includes a buffer configured to store a plurality of instructions previously fetched from a memory, wherein each instruction of the plurality of instructions may be included in a respective thread of a plurality of threads. The apparatus also includes control circuitry configured to select a given thread of the plurality of threads dependent upon a number of instructions in the buffer that are included in the given thread. The control circuitry is also configured to fetch a respective instruction corresponding to the given thread from the memory, and to store the respective instruction in the buffer.
    Type: Grant
    Filed: November 18, 2015
    Date of Patent: October 1, 2019
    Assignee: Oracle International Corporation
    Inventors: Yuan Chou, Gideon Levinsky, Manish Shah, Robert Golla, Matthew Smittle
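Illustrative sketch (not from the patent): one reading of "select a given thread dependent upon a number of instructions in the buffer" is to fetch for the thread with the fewest buffered instructions; that policy, the tie-break, and the buffer model are assumptions.

```python
# Sketch of fetch-thread selection based on per-thread buffer occupancy.
from collections import Counter

def select_thread_to_fetch(buffer_entries, active_threads):
    """Pick the active thread with the fewest instructions already in the buffer."""
    counts = Counter(entry["thread"] for entry in buffer_entries)
    return min(active_threads, key=lambda t: counts.get(t, 0))

buffer = [{"thread": 0, "insn": "add"}, {"thread": 0, "insn": "ld"},
          {"thread": 1, "insn": "mul"}]
thread = select_thread_to_fetch(buffer, active_threads=[0, 1, 2])
print(f"fetch next instruction for thread {thread}")   # thread 2 (nothing buffered yet)
```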
  • Patent number: 10140138
    Abstract: Methods for supporting wide and efficient front-end operation with guest architecture emulation are disclosed. As a part of a method for supporting wide and efficient front-end operation, upon receiving a request to fetch a first far taken branch instruction, a cache line that includes the first far taken branch instruction, a next cache line and a cache line located at the target of the first far taken branch instruction are read. Based on information that is accessed from a data table, the cache line and either the next cache line or the cache line located at the target are fetched in a single cycle.
    Type: Grant
    Filed: March 17, 2014
    Date of Patent: November 27, 2018
    Assignee: Intel Corporation
    Inventors: Mohammad Abdallah, Ankur Groen, Erika Gunadi, Mandeep Singh, Ravishankar Rao
  • Patent number: 10127558
    Abstract: Systems and methods for automating an invoice approval process are described herein. Rules are created which are evaluated against a set of attributes. A rules engine is automatically invoked upon receipt of a document in an electronic invoice presentment and payment system. The rules engine determines which rules are applicable to documents received and processed in the system, and applies those applicable rules in a pre-defined sequence.
    Type: Grant
    Filed: March 12, 2010
    Date of Patent: November 13, 2018
    Assignee: Altisource S.à r.l.
    Inventors: Russell G. Bulman, Suresh Kumar, Sanket Karjagi, Ritwik Bose, Rajesh Kumar, Biswajit Nayak, Vikram Kamath, Bhavana Sumathi
  • Patent number: 10095518
    Abstract: Instruction queue circuitry maintains an instruction queue to store fetched instructions. Instruction decode circuitry decodes instructions dispatched from the queue. The instruction decode circuitry allocates processor resource(s) for use in execution of the decoded instruction. Detection circuitry detects, for an instruction to be dispatched from a given instruction queue, a prediction indicating whether sufficient processor resources are predicted to be available for allocation to that instruction by the instruction decode circuitry. Dispatch circuitry dispatches an instruction from the queue to the instruction decode circuitry and allows deletion of the dispatched instruction from that instruction queue when the prediction indicates that sufficient processor resources are predicted to be available for allocation to that instruction by the instruction decode circuitry.
    Type: Grant
    Filed: November 16, 2015
    Date of Patent: October 9, 2018
    Assignee: ARM Limited
    Inventors: Andrew James Antony Lees, Ian Michael Caulfield, Peter Richard Greenhalgh
  • Patent number: 10025527
    Abstract: Hardware structures for check pointing a main shift register one or more times which include a circular buffer used to store the data elements most recently shifted onto the main shift register which has an extra data position for each check point and an extra data position for each restorable point in time; an update history shift register which has a data position for each check point which is used to store information indicating whether the circular buffer was updated in a particular clock cycle; a pointer that identifies a subset of the data positions of the circular buffer as active data positions; and check point generation logic that derives each check point by selecting a subset of the active data positions based on the information stored in the update history shift register.
    Type: Grant
    Filed: July 8, 2016
    Date of Patent: July 17, 2018
    Assignee: MIPS Tech, LLC
    Inventors: Philip Day, Julian Bailey
  • Patent number: 9753751
    Abstract: Processing data includes: receiving units of work that each include one or more work elements, and processing a first unit of work using a first compiled dataflow graph (160) loaded into a data processing system (100) in response to receiving the first unit of work. The processing includes: analysis to determine a characteristic of the first unit of work; identifying one or more compiled dataflow graphs from graphs stored in a data storage system (107) that include at least some that were compiled for processing a unit of work having the determined characteristic; loading one of the identified compiled dataflow graphs into the data processing system (100) as the first compiled dataflow graph (160); and generating one or more output work elements from at least one work element in the first unit of work.
    Type: Grant
    Filed: October 22, 2014
    Date of Patent: September 5, 2017
    Assignee: Ab Initio Technology LLC
    Inventors: Matthew Darcy Atterbury, H. Mark Bromley, Wayne Mesard, Arkadi Popov, Stephen Schmidt, Craig W. Stanfill, Joseph Skeffington Wholey
  • Patent number: 9733912
    Abstract: Disclosed here are methods, systems, paradigms and structures for optimizing intermediate representation (IR) of a script code for fast path execution. A fast path is typically a path that handles the most commonly occurring tasks more efficiently than the less commonly occurring ones, which are handled by slow paths. The less commonly occurring tasks may include uncommon cases, error handling, and other anomalies. The IR includes checkpoints which evaluate to two possible values resulting in either a fast path or slow path execution. The IR is optimized for fast path execution by regenerating a checkpoint as a labeled checkpoint. The code in the portion of the IR following the checkpoint is optimized assuming the checkpoint evaluates to a value resulting in fast path. The code for handling situations where the checkpoint evaluates to a value resulting in slow path is transferred to a portion of the IR identified by the label.
    Type: Grant
    Filed: January 27, 2016
    Date of Patent: August 15, 2017
    Assignee: Facebook, Inc.
    Inventors: Ali-Reza Adl-Tabatabai, Guilherme de Lima Ottoni, Michael Paleczny
  • Patent number: 9690620
    Abstract: Methods and architecture for dynamic polymorphic heterogeneous multi-core processor operation are provided. The method for dynamic heterogeneous polymorphic processing includes the steps of receiving a processing task comprising a plurality of serial threads. The method is performed in a processor including a plurality of processing cores, each of the plurality of processing cores being assigned to one of a plurality of core clusters and each of the plurality of core clusters capable of dynamically forming a coalition comprising two or more of its processing cores. The method further includes determining whether each of the plurality of serial threads requires more than one processing core, and sending a go-into-coalition-mode-now instruction to ones of the plurality of core clusters for handling ones of the plurality of serial threads that require more than one processing core.
    Type: Grant
    Filed: December 3, 2012
    Date of Patent: June 27, 2017
    Assignee: National University of Singapore
    Inventors: Tulika Mitra, Mihai Pricopi
  • Patent number: 9672037
    Abstract: A processor and method for fusing together an arithmetic instruction and a branch instruction. The processor includes an instruction fetch unit configured to fetch instructions. The processor may also include an instruction decode unit that may be configured to decode the fetched instructions into micro-operations for execution by an execution unit. The decode unit may be configured to detect an occurrence of an arithmetic instruction followed by a branch instruction in program order, wherein the branch instruction, upon execution, changes a program flow of control dependent upon a result of execution of the arithmetic instruction. In addition, the processor may further be configured to fuse together the arithmetic instruction and the branch instruction such that a single micro-operation is formed. The single micro-operation includes execution information based upon both the arithmetic instruction and the branch instruction.
    Type: Grant
    Filed: January 23, 2013
    Date of Patent: June 6, 2017
    Assignee: Apple Inc.
    Inventors: Conrado Blasco-Allue, Sandeep Gupta
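Illustrative sketch (not from the patent): a minimal model of the decode-time fusion check, pairing a flag-setting arithmetic instruction with the conditional branch that consumes its result. The opcode sets and the dict-based instruction encoding are assumptions.

```python
# Sketch of fusing an arithmetic instruction with a following branch into one micro-op.
ARITHMETIC_OPS = {"add", "sub", "cmp"}
BRANCH_OPS = {"beq", "bne", "blt"}

def decode(instructions):
    """Emit micro-ops, fusing an arithmetic op with the branch that follows it in program order."""
    uops, i = [], 0
    while i < len(instructions):
        cur = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if nxt and cur["op"] in ARITHMETIC_OPS and nxt["op"] in BRANCH_OPS:
            # Single micro-op carrying execution information from both instructions.
            uops.append({"op": f"{cur['op']}+{nxt['op']}",
                         "srcs": cur["srcs"], "target": nxt["target"]})
            i += 2
        else:
            uops.append(dict(cur))
            i += 1
    return uops

program = [{"op": "cmp", "srcs": ["r1", "r2"]},
           {"op": "beq", "target": "loop_exit"}]
print(decode(program))   # one fused micro-op instead of two
```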
  • Patent number: 9558089
    Abstract: The disclosed embodiments provide a system that facilitates testing of an insecure computing environment. During operation, the system obtains a real data set comprising a set of data strings. Next, the system determines a set of frequency distributions associated with the set of data strings. The system then generates a test data set from the real data set, wherein the test data set comprises a set of random data strings that conforms to the set of frequency distributions. Finally, the system tests the insecure computing environment using the test data set.
    Type: Grant
    Filed: November 12, 2014
    Date of Patent: January 31, 2017
    Assignee: INTUIT INC.
    Inventor: Colin R. Dillard
  • Patent number: 9454467
    Abstract: A method of mining test coverage data includes: at a device having one or more processors and memory: sequentially processing each of a plurality of coverage data files that is generated by executing the program using a respective test input of a plurality of test inputs, where the processing of each current coverage data file extracts respective execution counter data from the current coverage data file; after processing each current coverage data file, determining whether the respective execution counter data extracted from the current coverage data file includes a predetermined change relative to the respective execution counter data extracted from previously processed coverage data files; and in response to detecting the predetermined change for the current coverage data file, including the respective test input used to generate the current coverage data file in a test input collection for testing the program.
    Type: Grant
    Filed: August 12, 2014
    Date of Patent: September 27, 2016
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventor: Yunjia Wu
  • Patent number: 9424045
    Abstract: An apparatus and method include execution circuitry including a wide operand execution unit configured to allow up to N bits of operand data to be processed during execution of a single instruction. Decoder circuitry decodes and generates, for each instruction, at least one control data block identifying an operation to be performed by the execution circuitry and at least two re-combineable control data blocks for the instruction. Issue queue control circuitry then allocates a slot in the issue queue for each of the at least two data blocks and up to M bits of associated operand data, and marks those allocated slots to identify that they contain re-combineable control data blocks. The issue queue control circuitry issues the combined block to said wide operand execution unit along with the operand data contained in each of the allocated slots for said at least two control data blocks.
    Type: Grant
    Filed: January 29, 2013
    Date of Patent: August 23, 2016
    Assignee: ARM Limited
    Inventors: Cedric Denis Robert Airaud, Luca Scalabrino, Frederic Jean Denis Arsanto, Guillaume Schon, Frederic Claude Marie Piry, Albin Pierick Tonnerre
  • Patent number: 9354874
    Abstract: Producer-consumer instructions, comprising a first instruction and a second instruction in program order, are fetched requiring in-order execution; the second instruction is then modified by the processor so that the first instruction and second instruction can be completed out-of-order, the modification comprising either extending an immediate field of the second instruction using immediate field information of the first instruction or providing a source location of the first instruction as an additional source location to the source locations of the second instruction.
    Type: Grant
    Filed: October 3, 2011
    Date of Patent: May 31, 2016
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 9354892
    Abstract: Methods, media, and computing systems are provided. The method includes, the media are configured for, and the computing system includes a processor with control logic for allocating memory for storing a plurality of local register states for work items to be executed in single instruction multiple data hardware and for repacking wavefronts that include work items associated with a program instruction responsive to a conditional statement. The repacking is configured to create repacked wavefronts that include at least one of a wavefront containing work items that all pass the conditional statement and a wavefront containing work items that all fail the conditional statement.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: May 31, 2016
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Timothy G. Rogers, Bradford M. Beckmann, James M. O'Connor
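Illustrative sketch (not from the patent): a minimal model of repacking wavefronts at a conditional so that each new wavefront is either all-pass or all-fail. The SIMD width and the integer work-item model are assumptions.

```python
# Sketch of wavefront repacking around a conditional statement.
SIMD_WIDTH = 4

def repack(wavefronts, condition):
    """Regroup work items so each new wavefront contains only passing or only failing items."""
    passed = [w for wf in wavefronts for w in wf if condition(w)]
    failed = [w for wf in wavefronts for w in wf if not condition(w)]
    chunk = lambda items: [items[i:i + SIMD_WIDTH] for i in range(0, len(items), SIMD_WIDTH)]
    return chunk(passed), chunk(failed)

wavefronts = [[0, 1, 2, 3], [4, 5, 6, 7]]          # work-item ids
pass_wfs, fail_wfs = repack(wavefronts, condition=lambda w: w % 2 == 0)
print(pass_wfs, fail_wfs)   # [[0, 2, 4, 6]] [[1, 3, 5, 7]]
```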
  • Patent number: 9286072
    Abstract: Two computer machine instructions are fetched for execution, but replaced by a single optimized instruction to be executed, wherein a temporary register used by the two instructions is identified as a last-use register, where a last-use register has a value that is not to be accessed by later instructions, whereby the two computer machine instructions are replaced by a single optimized internal instruction for execution, the single optimized instruction not including the last-use register.
    Type: Grant
    Filed: October 3, 2011
    Date of Patent: March 15, 2016
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 9250916
    Abstract: Embodiments include a method for chaining data in an exposed-pipeline processing element. The method includes separating a multiple instruction word into a first sub-instruction and a second sub-instruction, receiving the first sub-instruction and the second sub-instruction in the exposed-pipeline processing element. The method also includes issuing the first sub-instruction at a first time, issuing the second sub-instruction at a second time different than the first time, the second time being offset to account for a dependency of the second sub-instruction on a first result from the first sub-instruction, the first pipeline performing the first sub-instruction at a first clock cycle and communicating the first result from performing the first sub-instruction to a chaining bus coupled to the first pipeline and a second pipeline, the communicating at a second clock cycle subsequent to the first clock cycle that corresponds to a total number of latch pipeline stages in the first pipeline.
    Type: Grant
    Filed: March 12, 2013
    Date of Patent: February 2, 2016
    Assignee: International Business Machines Corporation
    Inventors: Thomas W. Fox, Bruce M. Fleischer, Hans M. Jacobson, Ravi Nair
  • Patent number: 9118705
    Abstract: A device for detecting network traffic content is provided. The device includes a memory configured for storing one or more signatures, each of the one or more signatures associated with content desired to be detected, and defined by one or more predicates. The device also includes a processor configured to receive data associated with network traffic content, execute one or more instructions based on the one or more signatures and the data, and determine whether the network traffic content matches the content desired to be detected.
    Type: Grant
    Filed: March 12, 2013
    Date of Patent: August 25, 2015
    Assignee: Fortinet, Inc.
    Inventor: Michael Xie
  • Publication number: 20150100761
    Abstract: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) a first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) a second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of the instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and a machine-readable medium storing such an instruction are also disclosed.
    Type: Application
    Filed: December 12, 2014
    Publication date: April 9, 2015
    Applicant: INTEL CORPORATION
    Inventors: Maxim Loktyukhin, Eric W Mahurin, Bret L Toll, Martin G Dixon, Sean P Mirkes, David L Kreitzer, Elmoustapha Ould-Ahmed-Vall, Vinodh Gopal
  • Publication number: 20150100760
    Abstract: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) a first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) a second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of the instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and a machine-readable medium storing such an instruction are also disclosed.
    Type: Application
    Filed: December 12, 2014
    Publication date: April 9, 2015
    Applicant: INTEL CORPORATION
    Inventors: Maxim Loktyukhin, Eric W Mahurin, Bret L Toll, Martin G Dixon, Sean P Mirkes, David L Kreitzer, Elmoustapha Ould-Ahmed-Vall, Vinodh Gopal
  • Publication number: 20150095615
    Abstract: A method for forwarding data from the store instructions to a corresponding load instruction in an out of order processor. The method includes accessing an incoming sequence of instructions, and of said sequence of instructions, splitting store instructions into a store address instruction and a store data instruction, wherein the store address performs address calculation and fetch, and wherein the store data performs a load of register contents to a memory address. The method further includes, of said sequence of instructions, splitting load instructions into a load address instruction and a load data instruction, wherein the load address performs address calculation and fetch, and wherein the load data performs a load of memory address contents into a register, and reordering the store address and load address instructions earlier and further away from the LD/SD in the instruction sequence to enable earlier dispatch and execution of the loads and the stores.
    Type: Application
    Filed: December 11, 2014
    Publication date: April 2, 2015
    Inventors: Mohammad A. ABDALLAH, Gregory A. WOODS
  • Patent number: 8977815
    Abstract: A processing pipeline 6, 8, 10, 12 is provided with a main query stage 20 and a fetch stage 22. A buffer 24 stores program instructions which have missed within a cache memory 14. Query generation circuitry within the main query stage 20 and within a buffer query stage 26 serve to concurrently generate a main query request and a buffer query request sent to the cache memory 14. The cache memory returns a main query response and a buffer query response. Arbitration circuitry 28 controls multiplexers 30, 32 and 34 to direct the program instruction at the main query stage 20, and the program instruction stored within the buffer 24 and the buffer query stage 26 to pass either to the fetch stage 22 or to the buffer 24. The multiplexer 30 can also select a new instruction to be passed to the main query stage 20.
    Type: Grant
    Filed: November 29, 2010
    Date of Patent: March 10, 2015
    Assignee: ARM Limited
    Inventors: Frode Heggelund, Rune Holm, Andreas Due Engh-Halstvedt, Edvard Feilding
  • Patent number: 8972689
    Abstract: A storage processor identifies latency of memory drives for different numbers of concurrent storage operations. The identified latency is used to identify debt limits for the number of concurrent storage operations issued to the memory drives. The storage processor may issue additional storage operations to the memory devices when the number of storage operations is within the debt limit. Storage operations may be deferred when the number of storage operations is outside the debt limit.
    Type: Grant
    Filed: February 2, 2011
    Date of Patent: March 3, 2015
    Assignee: Violin Memory, Inc.
    Inventor: Erik de la Iglesia
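Illustrative sketch (not from the patent): a minimal model of deriving a debt limit from measured per-depth latencies and deferring operations beyond it. The latency figures and the latency budget are made-up numbers.

```python
# Sketch of debt-limited issue of concurrent storage operations.
LATENCY_BUDGET_US = 500

def debt_limit(measured_latency_by_depth):
    """Largest number of concurrent operations whose measured latency stays within budget."""
    limit = 0
    for depth, latency in sorted(measured_latency_by_depth.items()):
        if latency <= LATENCY_BUDGET_US:
            limit = depth
    return limit

def issue_or_defer(outstanding, limit, op, deferred, drive):
    if outstanding < limit:
        drive.append(op)           # within the debt limit: issue now
        return outstanding + 1
    deferred.append(op)            # outside the limit: defer the operation
    return outstanding

measured = {1: 120, 2: 180, 4: 320, 8: 900}    # microseconds at each queue depth
limit = debt_limit(measured)                   # -> 4 concurrent operations
drive, deferred, outstanding = [], [], 4
outstanding = issue_or_defer(outstanding, limit, "read A", deferred, drive)
print(limit, drive, deferred)                  # 4 [] ['read A']
```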
  • Publication number: 20150039859
    Abstract: A method for accelerating code optimization in a microprocessor. The method includes fetching an incoming macroinstruction sequence using an instruction fetch component and transferring the fetched macroinstructions to a decoding component for decoding into microinstructions. Optimization processing is performed by reordering the microinstruction sequence into an optimized microinstruction sequence comprising a plurality of dependent code groups. The optimized microinstruction sequence is output to a microprocessor pipeline for execution. A copy of the optimized microinstruction sequence is stored into a sequence cache for subsequent use upon a subsequent hit on the optimized microinstruction sequence.
    Type: Application
    Filed: November 22, 2011
    Publication date: February 5, 2015
    Inventor: Mohammad Abdallah
  • Publication number: 20150032997
    Abstract: Tracking global history vector in high performance out of order superscalar processors, in one aspect, may comprise providing a shift register storing global history vector that stores branch predictions and outcomes. A counter is maintained to determine a number of bits to shift the shift register to recover branch history. In another aspect, the global history vector may be implemented with a circular buffer structure. Youngest and oldest pointers to the circular buffer are maintained and used in recovery.
    Type: Application
    Filed: July 23, 2013
    Publication date: January 29, 2015
    Applicant: International Business Machines Corporation
    Inventors: Richard J. Eickemeyer, Tejas Karkhanis, Brian R. Konigsburg, David S. Levitan, Douglas R. G. Logan, Jose E. Moreira, Mauricio J. Serrano
  • Publication number: 20150019840
    Abstract: This invention implements a range of interesting technologies into a single block. Each DSP CPU has a streaming engine. The streaming engines include: a SE to L2 interface that can request 512 bits/cycle from L2; a loose binding between SE and L2 interface, to allow a single stream to peak at 1024 bits/cycle; one-way coherence where the SE sees all earlier writes cached in system, but not writes that occur after stream opens; full protection against single-bit data errors within its internal storage via single-bit parity with semi-automatic restart on parity error.
    Type: Application
    Filed: July 15, 2014
    Publication date: January 15, 2015
    Inventors: Timothy D. Anderson, Joseph Zbiciak, Duc Quang Bui, Abhijeet A. Chachad, Kai Chirca, Naveen Bhoria, Matthew D. Pierson, Daniel Wu
  • Publication number: 20140317383
    Abstract: Provided are an instruction compression apparatus and method for a very long instruction word (VLIW) processor, and an instruction fetching apparatus and method. The instruction compression apparatus includes: an indicator generator configured to generate an indicator code that indicates an issue width of an instruction bundle to be executed in the VLIW processor, and a number of No-Operation (NOP) instruction bundles following the instruction bundle; an instruction compressor configured to compress the instruction bundle by removing at least one of NOP instructions from the instruction bundle and the NOP instruction bundles following the instruction bundle; and an instruction converter configured to include the generated indicator code in the compressed instruction bundle.
    Type: Application
    Filed: April 22, 2014
    Publication date: October 23, 2014
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jae-Un PARK, Suk-jin KIM
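Illustrative sketch (not from the patent): a minimal model of the VLIW compression above, where each retained bundle carries an indicator encoding its issue width and the number of following NOP bundles. The 4-slot bundle format and the tuple indicator layout are assumptions.

```python
# Sketch of indicator-based VLIW instruction compression.
NOP_BUNDLE = ["nop"] * 4

def compress(bundles):
    """Attach an indicator to each real bundle and drop NOP slots and NOP bundles."""
    out, i = [], 0
    while i < len(bundles):
        bundle = [slot for slot in bundles[i] if slot != "nop"]   # strip NOP instructions
        trailing_nops = 0
        while i + 1 + trailing_nops < len(bundles) and \
              bundles[i + 1 + trailing_nops] == NOP_BUNDLE:
            trailing_nops += 1                                    # count following NOP bundles
        out.append({"indicator": (len(bundle), trailing_nops), "slots": bundle})
        i += 1 + trailing_nops
    return out

program = [["add", "mul", "nop", "nop"], NOP_BUNDLE, NOP_BUNDLE,
           ["ld", "nop", "nop", "nop"]]
print(compress(program))
# [{'indicator': (2, 2), 'slots': ['add', 'mul']}, {'indicator': (1, 0), 'slots': ['ld']}]
```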