Of Multiple Instructions Simultaneously Patents (Class 712/206)
-
Patent number: 12210900
Abstract: A mechanism is described for facilitating intelligent thread scheduling at autonomous machines. A method of embodiments, as described herein, includes detecting dependency information relating to a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a processor including a graphics processor. The method may further include generating a tree of thread groups based on the dependency information, where each thread group includes multiple threads, and scheduling one or more of the thread groups associated with a similar dependency to avoid dependency conflicts.
Type: Grant
Filed: May 17, 2022
Date of Patent: January 28, 2025
Assignee: INTEL CORPORATION
Inventors: Joydeep Ray, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Rajkishore Barik, Eriko Nurvitadhi, Nicolas Galoppo Von Borries, Tsung-Han Lin, Sanjeev Jahagirdar, Vasanth Ranganathan
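The dependency-driven grouping in this abstract can be illustrated with a minimal Python sketch. It rests on two assumptions:
- the dependency information is available as a mapping from each thread to the threads it waits on;
- the claimed "tree of thread groups" can be simplified into dependency levels.

Each group holds threads whose dependencies are all satisfied by earlier groups, so the threads in one group can be scheduled together without dependency conflicts. The function name `thread_groups` and the input format are hypothetical and are not taken from the patent.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def thread_groups(deps: dict[str, set[str]]) -> list[list[str]]:
    """Split threads into dependency levels (a simplified stand-in for a tree of thread groups).

    deps maps each thread to the set of threads it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    groups = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # threads whose dependencies are all done
        groups.append(ready)
        sorter.done(*ready)  # mark the whole group as scheduled
    return groups


deps = {"t2": {"t0"}, "t3": {"t0", "t1"}, "t4": {"t2", "t3"}}
for level, group in enumerate(thread_groups(deps)):
    print(level, group)
```

A hardware scheduler would also weigh resource availability, which this sketch ignores.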
-
Patent number: 12204908
Abstract: A branch predictor predicts a first outcome of a first branch in a first block of instructions. Fetch logic fetches instructions for speculative execution along a first path indicated by the first outcome. Information representing a remainder of the first block is stored in response to the first predicted outcome being taken. In response to the first branch instruction not being taken, the branch predictor is restarted based on the remainder block. In some cases, entries corresponding to second blocks along speculative paths from the first block are accessed using an address of the first block as an index into a branch prediction structure. Outcomes of branch instructions in the second blocks are concurrently predicted using a corresponding set of instances of branch conditional logic, and the predicted outcomes are used in combination with the remainder block to restart the branch predictor in response to mispredictions.
Type: Grant
Filed: June 4, 2018
Date of Patent: January 21, 2025
Assignee: Advanced Micro Devices, Inc.
Inventors: Marius Evers, Douglas Williams, Ashok T. Venkatachar, Sudherssen Kalaiselvan
-
Patent number: 11960567
Abstract: A method for performing a fundamental computational primitive in a device is provided, where the device includes a processor and a matrix multiplication accelerator (MMA). The method includes configuring a streaming engine in the device to stream data for the fundamental computational primitive from memory, configuring the MMA to format the data, and executing the fundamental computational primitive by the device.
Type: Grant
Filed: July 4, 2021
Date of Patent: April 16, 2024
Assignee: Texas Instruments Incorporated
Inventors: Arthur John Redfern, Timothy David Anderson, Kai Chirca, Chenchi Luo, Zhenhua Yu
-
Patent number: 11960893
Abstract: A method, programming product, and/or system for prefetching instructions includes an instruction prefetch table that has a plurality of entries, each entry for storing a first portion of an indirect branch instruction address and a target address, wherein the indirect branch instruction has multiple target addresses and the instruction prefetch table is accessed by an index obtained by hashing a second portion of bits of the indirect branch instruction address with an information vector of the indirect branch instruction. A further embodiment includes a first prefetch table for uni-target branch instructions and a second prefetch table for multi-target branch instructions. In operation, it is determined whether a branch instruction hits in one of the multiple prefetch tables; a target address for the branch instruction is read from the respective prefetch table in which the branch instruction hit; and the branch instruction is prefetched to an instruction cache.
Type: Grant
Filed: December 29, 2021
Date of Patent: April 16, 2024
Assignee: International Business Machines Corporation
Inventors: Naga P. Gorti, Mohit Karve
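The hashed indexing described here can be sketched as a small direct-mapped table. Several details are assumptions, not taken from the patent:
- bits above an assumed 4-byte instruction offset form the index portion;
- the remaining upper bits serve as the stored tag;
- the "information vector" is treated as an opaque integer, such as recent path history.

The class name `IndirectPrefetchTable` is hypothetical. The point the sketch shows is that hashing the info vector into the index lets one multi-target branch hold a separate entry per target.

```python
from typing import Optional


class IndirectPrefetchTable:
    """Toy direct-mapped prefetch table for multi-target (indirect) branches.

    Hypothetical address split: bits above the 4-byte instruction offset
    supply the index ("second portion"); bits above the index are the tag
    ("first portion").
    """

    def __init__(self, index_bits: int = 6):
        self.index_bits = index_bits
        self.entries: list[Optional[tuple[int, int]]] = [None] * (1 << index_bits)

    def _index(self, branch_addr: int, info_vector: int) -> int:
        mask = (1 << self.index_bits) - 1
        # Hash (XOR) the index bits with the information vector so that the
        # same branch can occupy several entries, one per observed target.
        return ((branch_addr >> 2) ^ info_vector) & mask

    def _tag(self, branch_addr: int) -> int:
        return branch_addr >> (2 + self.index_bits)

    def insert(self, branch_addr: int, info_vector: int, target: int) -> None:
        self.entries[self._index(branch_addr, info_vector)] = (self._tag(branch_addr), target)

    def lookup(self, branch_addr: int, info_vector: int) -> Optional[int]:
        entry = self.entries[self._index(branch_addr, info_vector)]
        if entry is not None and entry[0] == self._tag(branch_addr):
            return entry[1]  # target address to prefetch into the instruction cache
        return None


table = IndirectPrefetchTable()
table.insert(0x4000, 0b01, 0x8000)  # same branch, two different contexts
table.insert(0x4000, 0b10, 0x9000)
print(hex(table.lookup(0x4000, 0b01)), hex(table.lookup(0x4000, 0b10)))
```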
-
Patent number: 11860820
Abstract: Processing data through a storage system in a data pipeline including receiving, by the storage system, a dataset from a collector on a data producer, wherein the dataset is disaggregated from metadata for the dataset by the collector; storing the dataset on the storage system; receiving, by the storage system from a data indexer, a request for data from the dataset, wherein the request for the data comprises the metadata gathered by the collector on the data producer; servicing, by the storage system, the request for the data by locating the data using the metadata gathered by the collector on the data producer and received in the request for the data; and receiving, from the data indexer, indexed data indexed using the metadata gathered by the collector on the data producer.
Type: Grant
Filed: April 3, 2019
Date of Patent: January 2, 2024
Assignee: PURE STORAGE, INC.
Inventors: Ivan Jibaja, Curtis Pullen, Stefan Dorsett, Srinivas Chellappa, Prashant Jaikumar
-
Patent number: 11620153
Abstract: Instruction interrupt suppression for an overflow condition. An instruction is executed, and a determination is made that an overflow condition occurred. Based on a per-instruction overflow interrupt indicator being set to a defined value, interrupt processing for the overflow condition is performed, and based on the per-instruction overflow interrupt indicator being set to another defined value, the interrupt processing for the overflow condition is bypassed.
Type: Grant
Filed: February 4, 2019
Date of Patent: April 4, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Cedric Lichtenau, Jonathan D. Bradbury, Reid Copeland, Petra Leber
-
Patent number: 11562216
Abstract: Provided are an integrated circuit chip apparatus and a related product, the integrated circuit chip apparatus being used for executing a multiplication operation, a convolution operation or a training operation of a neural network. The present technical solution has the advantages of a small amount of calculation and low power consumption.
Type: Grant
Filed: December 19, 2019
Date of Patent: January 24, 2023
Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
Inventors: Shaoli Liu, Xinkai Song, Bingrui Wang, Yao Zhang, Shuai Hu
-
Patent number: 11562219
Abstract: An integrated circuit chip apparatus and a processing method performed by an integrated circuit chip apparatus are disclosed. The disclosed integrated circuit chip apparatus and processing method are used for executing a multiplication operation, a convolution operation, or a training operation of a neural network. The present technical solution has the advantages of a reduced computational cost and low power consumption.
Type: Grant
Filed: September 2, 2020
Date of Patent: January 24, 2023
Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
Inventors: Shaoli Liu, Xinkai Song, Bingrui Wang, Yao Zhang, Shuai Hu
-
Patent number: 11403254
Abstract: A methodology for populating multiple instruction words is provided. The methodology includes: creating a dependency graph of instruction nodes, each instruction node including at least one instruction operation; first assigning a first instruction node to a first instruction word; identifying a dependent instruction node that is directly dependent upon a result of the first instruction node; first determining whether the dependent instruction node requires any input from two or more sources that are outside of a predefined physical range of each other, the range being smaller than the full extent of the data path; and second assigning, in response to satisfaction of at least one predetermined criterion including a negative result of the first determining, the dependent instruction node to the first instruction word.
Type: Grant
Filed: August 14, 2019
Date of Patent: August 2, 2022
Assignee: TACHYUM LTD.
Inventor: Radoslav Danilak
-
Patent number: 11360808
Abstract: A mechanism is described for facilitating intelligent thread scheduling at autonomous machines. A method of embodiments, as described herein, includes detecting dependency information relating to a plurality of threads corresponding to a plurality of workloads associated with tasks relating to a processor including a graphics processor. The method may further include generating a tree of thread groups based on the dependency information, where each thread group includes multiple threads, and scheduling one or more of the thread groups associated with a similar dependency to avoid dependency conflicts.
Type: Grant
Filed: April 9, 2017
Date of Patent: June 14, 2022
Assignee: Intel Corporation
Inventors: Joydeep Ray, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Rajkishore Barik, Eriko Nurvitadhi, Nicolas Galoppo Von Borries, Tsung-Han Lin, Sanjeev Jahagirdar, Vasanth Ranganathan
-
Patent number: 11307858
Abstract: A stream of data is accessed from a memory system using a stream of addresses generated in a first mode of operating a streaming engine in response to executing a first stream instruction. A block cache preload operation is performed on a cache in the memory using a block of addresses generated in a second mode of operating the streaming engine in response to executing a second stream instruction.
Type: Grant
Filed: March 24, 2020
Date of Patent: April 19, 2022
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventors: Joseph Raymond Michael Zbiciak, Timothy David Anderson, Jonathan (Son) Hung Tran, Kai Chirca, Daniel Wu, Abhijeet Ashok Chachad, David M. Thompson
-
Patent number: 11237952
Abstract: The present disclosure provides a mutation test manager configured to initialize multiple computing threads configuring a computing host to perform parallel computation; mutate class files within context of each computing thread; recompile mutated class files independently in each respective computing thread to generate heterogeneous mutants; and execute pending unit tests against heterogeneous mutants independently in each respective computing thread. Consequently, the mutation testing process is decoupled from computational bottlenecks which would result from linear, sequential generation, compilation, and testing of each mutation, especially in the context of JVM® programming languages configured to generate class-rich object code.
Type: Grant
Filed: April 7, 2021
Date of Patent: February 1, 2022
Assignee: State Farm Mutual Automobile Insurance Company
Inventors: Andrew L Pearson, Nate Shepherd
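The per-thread mutate, compile, and test loop in this abstract can be sketched in Python. This is a stand-in for the JVM class-file workflow the patent targets. The following are all hypothetical:
- the function under test (`SOURCE`);
- the text-substitution mutation operators (`MUTATIONS`);
- the `unit_tests` helper.

A mutant is "killed" when at least one test fails against it.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical unit under test and mutation operators (simple text substitutions).
SOURCE = "def add(a, b):\n    return a + b\n"
MUTATIONS = [("+", "-"), ("+", "*"), ("a + b", "b + a")]


def unit_tests(add) -> bool:
    """Return True if every test passes for the given implementation."""
    return add(2, 3) == 5 and add(-1, 1) == 0


def run_mutant(mutation: tuple[str, str]) -> tuple[tuple[str, str], str]:
    old, new = mutation
    namespace: dict = {}
    # Mutate and compile independently inside the worker thread.
    code = compile(SOURCE.replace(old, new, 1), "<mutant>", "exec")
    exec(code, namespace)
    status = "survived" if unit_tests(namespace["add"]) else "killed"
    return mutation, status


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_mutant, MUTATIONS))  # map preserves input order

for (old, new), status in results:
    print(f"{old!r} -> {new!r}: {status}")
```

In CPython the global interpreter lock limits CPU-bound threads. Swapping in `ProcessPoolExecutor` would give true parallelism, but the pool must then be created under an `if __name__ == "__main__":` guard. The third mutant survives because it is equivalent to the original, a known limitation of mutation testing.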
-
Patent number: 11182164
Abstract: Support for instruction fusion is provided. An indication whether an instruction is a paired instruction is received from an instruction decoder. Based on the indication, one dispatch slot or a paired dispatch slot is allocated in the instruction dispatcher queue. A mapper converts logical addresses of sources and targets of the instruction to physical addresses. Either one issue slot or a paired issue slot is allocated in an issue queue based on the indication from the instruction decoder. The instruction execution environment is loaded into the issue queue and issued to an execution unit.
Type: Grant
Filed: July 23, 2020
Date of Patent: November 23, 2021
Assignee: International Business Machines Corporation
Inventors: Brian D. Barrick, John B. Griswell, Jr., Dung Q. Nguyen, Brian W. Thompto
-
Patent number: 11171881
Abstract: A device configured to receive a data set and instructions for processing the data set from a network device. The device is further configured to parse the data set into a plurality of data segments to be processed, and generate a plurality of instruction segments from the received instructions. The device is further configured to assign each instruction segment to a resource unit, and to generate control information with instructions for combining processed data segments from the resource units. The device is further configured to receive processed data segments from the resource units, to generate the processed data set, and to output the processed data set to the network device.
Type: Grant
Filed: January 28, 2021
Date of Patent: November 9, 2021
Assignee: Bank of America Corporation
Inventors: Manu J. Kurian, Sasidhar Purushothaman, Rajesh Narayanan
-
Patent number: 11150906
Abstract: An apparatus, system, and method for increasing performance in a processor or other instruction execution device while minimizing energy consumption. A processor includes a first execution pipeline and a second execution pipeline. The first execution pipeline includes a first decode unit and a first execution control unit coupled to the first decode unit. The first execution control unit is configured to control execution of all instructions executable by the processor. The second execution pipeline includes a second decode unit and a second execution control unit coupled to the second decode unit. The second execution control unit is configured to control execution of a subset of the instructions executable via the first execution control unit.
Type: Grant
Filed: October 7, 2019
Date of Patent: October 19, 2021
Assignee: Texas Instruments Incorporated
Inventors: Christian Wiencke, Shrey Bhatia
-
Patent number: 11144353
Abstract: Techniques for use in a microprocessor core for soft watermarking in thread shared resources implemented through thread mediation. A thread is removed from a thread mediation decision involving multiple threads competing or requesting to use a shared resource at a current clock cycle based on a number of entries in the shared resource that the thread is estimated to have allocated to it at the current clock cycle. By removing the thread from the thread mediation decision, the thread is stalled from allocating additional entries in the shared resource.
Type: Grant
Filed: September 27, 2019
Date of Patent: October 12, 2021
Assignee: Advanced Micro Devices, Inc.
Inventor: Kai Troester
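The watermark-based mediation in this abstract can be sketched as a single arbitration step. Two details are assumptions, not taken from the patent:
- a fixed watermark compared against each thread's estimated occupancy;
- round-robin arbitration among the threads that remain eligible.

The function name `mediate` is hypothetical. A thread at or above the watermark is simply left out of this cycle's decision, which stalls its further allocation without permanently reserving entries for other threads.

```python
from typing import Optional


def mediate(requesting: list[int], estimated_entries: dict[int, int],
            watermark: int, last_winner: int) -> Optional[int]:
    """Pick which thread may allocate into the shared resource this cycle."""
    # Remove threads whose estimated occupancy has reached the watermark;
    # they are stalled this cycle but may compete again later.
    eligible = sorted(t for t in requesting if estimated_entries[t] < watermark)
    if not eligible:
        return None
    # Round-robin among the remaining threads, starting after last_winner.
    for t in eligible:
        if t > last_winner:
            return t
    return eligible[0]


print(mediate([0, 1, 2], {0: 10, 1: 3, 2: 5}, watermark=8, last_winner=1))
```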
-
Patent number: 11144324
Abstract: Systems, apparatuses, and methods for compressing multiple instruction operations together into a single retire queue entry are disclosed. A processor includes at least a scheduler, a retire queue, one or more execution units, and control logic. When the control logic detects a given instruction operation being dispatched by the scheduler to an execution unit, the control logic determines if the given instruction operation meets one or more conditions for being compressed with one or more other instruction operations into a single retire queue entry. If the one or more conditions are met, two or more instruction operations are stored together in a single retire queue entry. By compressing multiple instruction operations together into an individual retire queue entry, the retire queue is able to be used more efficiently, and the processor can speculatively execute more instructions without the retire queue exhausting its supply of available entries.
Type: Grant
Filed: September 27, 2019
Date of Patent: October 12, 2021
Assignee: Advanced Micro Devices, Inc.
Inventors: Matthew T. Sobel, Joshua James Lindner, Neil N. Marketkar, Kai Troester, Emil Talpes, Ashok Tirupathy Venkatachar
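The retire queue compression in this abstract can be sketched as a packing loop over operations in program order. The abstract does not list the actual compression conditions, so several details are hypothetical:
- the `compressible` flag stands in for those conditions;
- the cap of two operations per entry is an assumed value;
- the names `Op` and `allocate_retire_entries` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Op:
    op_id: int
    compressible: bool  # hypothetical stand-in for "meets the compression conditions"


def allocate_retire_entries(ops: list[Op], max_ops_per_entry: int = 2) -> list[list[Op]]:
    """Pack dispatched ops into retire queue entries, preserving program order."""
    entries: list[list[Op]] = []
    for op in ops:
        last = entries[-1] if entries else None
        if (last is not None and len(last) < max_ops_per_entry
                and op.compressible and all(o.compressible for o in last)):
            last.append(op)  # share the youngest existing entry
        else:
            entries.append([op])  # open a new entry
    return entries


ops = [Op(1, True), Op(2, True), Op(3, False), Op(4, True), Op(5, True)]
packed = allocate_retire_entries(ops)
print([[o.op_id for o in entry] for entry in packed])
```

Here five operations occupy three entries instead of five, which is the efficiency gain the abstract describes.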
-
Patent number: 11080194
Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.
Type: Grant
Filed: December 27, 2018
Date of Patent: August 3, 2021
Assignee: Intel Corporation
Inventors: Sreenivas Subramoney, Stanislav Shwartsman, Anant Nori, Shankar Balachandran, Elad Shtiegmann, Vineeth Mekkat, Manjunath Shevgoor, Sourabh Alurkar
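The pointer-load detection and follow-on prefetch in this abstract can be sketched as a toy model. It makes two assumptions:
- loads are identified by their program counter (PC), so a later dynamic instance of the first load counts as the "third load instruction";
- memory is represented as a Python dict.

The class and method names are hypothetical. A hardware implementation would use small bounded tables rather than unbounded dicts and sets.

```python
class PointerLoadPrefetcher:
    """Toy model that learns which static loads (by PC) fetch pointers."""

    def __init__(self, memory: dict[int, int]):
        self.memory = memory                      # address -> stored value
        self.value_producer: dict[int, int] = {}  # loaded value -> PC of the load that read it
        self.pointer_loads: set[int] = set()      # PCs identified as pointer loads
        self.prefetched: list[int] = []           # addresses sent to the prefetcher

    def execute_load(self, pc: int, addr: int) -> int:
        # If this load's address equals data that an earlier load returned,
        # that earlier load was fetching a pointer.
        producer = self.value_producer.get(addr)
        if producer is not None:
            self.pointer_loads.add(producer)
        value = self.memory.get(addr, 0)
        self.value_producer[value] = pc
        return value

    def prefetch(self, pc: int, addr: int) -> None:
        self.prefetched.append(addr)
        if pc in self.pointer_loads:
            # Follow the pointer: the prefetched data names the next address.
            target = self.memory.get(addr)
            if target is not None:
                self.prefetched.append(target)


mem = {0x100: 0x200, 0x200: 42, 0x300: 0x400}
p = PointerLoadPrefetcher(mem)
p.execute_load(pc=0x10, addr=0x100)  # loads the pointer value 0x200
p.execute_load(pc=0x14, addr=0x200)  # dereferences it, so PC 0x10 is a pointer load
p.prefetch(pc=0x10, addr=0x300)      # prefetches 0x300, then follows it to 0x400
print([hex(a) for a in p.prefetched])
```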
-
Patent number: 10901744
Abstract: Aspects of the invention include buffered instruction dispatching to an issue queue. A non-limiting example includes dispatching from a dispatch unit of a processor a first group of instructions selected from a first plurality of instructions to a first issue queue partition of the processor in a first cycle. A second group of instructions selected from the first plurality of instructions is passed to an issue queue buffer of the processor in the first cycle. The second group of instructions is passed from the issue queue buffer to the first issue queue partition in a second cycle. A third group of instructions selected from a second plurality of instructions is dispatched to a second issue queue partition in the second cycle.
Type: Grant
Filed: November 30, 2017
Date of Patent: January 26, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Mohit S. Karve, Joel A. Silberman, Balaram Sinharoy
-
Patent number: 10846097
Abstract: The present disclosure includes a mispredict recovery apparatus, which may comprise an instruction execution unit, a branch predictor, and a misprediction recovery unit (MRU). The MRU may provide discrete cycle predictions after a misprediction redirect from the instruction execution unit. The MRU may include a branch confidence filter to generate prediction confidence information for predicted branches. The MRU may include a tag content-addressable memory (CAM). The tag CAM may store frequently mispredicting low-confidence branches, probe the misprediction redirect, and obtain the prediction confidence information from the branch confidence filter. The MRU may include a mispredict recovery buffer (MRB) to store an alternate path for frequently mispredicting low-confidence branches present in the tag CAM without storing the instructions themselves. Also disclosed is a method for recovering from mispredicts associated with the instruction fetch pipeline.
Type: Grant
Filed: February 20, 2019
Date of Patent: November 24, 2020
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Reshma C. Jumani, Fuzhou Zou, Monika Tkaczyk, Eric C. Quinnell
-
Patent number: 10698691
Abstract: Disclosed are a method and a processing device directed to determining global branch history for branch prediction. The method includes shifting first bits of a branch signature into a current global branch history and performing a bitwise exclusive-or (XOR) function on second bits of the branch signature and shifted bits of the current global branch history. In this way, the current global branch history is updated. The processing device implements the method using a shift logic configured to store and shift bits representing a current global branch history, a register configured to store the current global branch history, decision circuitry configured to determine whether or not a branch is taken, and XOR gates.
Type: Grant
Filed: August 30, 2016
Date of Patent: June 30, 2020
Assignee: Advanced Micro Devices, Inc.
Inventor: Steven R. Havlir
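The shift-and-XOR history update in this abstract can be sketched as a pure function on integers. The patent's actual choices are not given, so the following are hypothetical:
- the low `shift_bits` of the signature are treated as the "first bits" that are shifted in;
- the remaining upper bits are treated as the "second bits" that are XORed in;
- the default widths;
- the function name `update_global_history`.

```python
def update_global_history(ghr: int, signature: int,
                          history_bits: int = 16, shift_bits: int = 2) -> int:
    """Fold a branch signature into a global branch history value."""
    mask = (1 << history_bits) - 1
    first = signature & ((1 << shift_bits) - 1)  # bits shifted into the history
    second = signature >> shift_bits             # bits XORed into the shifted history
    shifted = ((ghr << shift_bits) | first) & mask
    return shifted ^ (second & mask)


ghr = 0
for sig in (0b1011, 0b0110, 0b1101):
    ghr = update_global_history(ghr, sig, history_bits=8)
    print(f"{ghr:08b}")
```

Combining a shift with an XOR lets more signature bits influence the history without lengthening the register.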
-
Patent number: 10630682
Abstract: A network protocol provides mutual authentication of network-connected devices that are parties to a communication channel in environments where the amount of memory and processing power available to the network-connected devices is constrained. When a new device is added to a network, the device contacts a registration service and provides authentication information that proves the authenticity of the device. After verifying the authenticity of the device, the registration service generates a token that can be used by the device to authenticate with other network entities, and provides the token to the device. The registration service publishes the token using a directory service. When the device connects to another network entity, the device provides the token to the other network entity, and the other network entity authenticates the device by verifying the token using the directory service.
Type: Grant
Filed: November 23, 2016
Date of Patent: April 21, 2020
Assignee: Amazon Technologies, Inc.
Inventors: Ramkishore Bhattacharyya, Amit Mhatre, Ashutosh Thakur, Atulya S. Beheray, Rameez Loladia
-
Patent number: 10606602
Abstract: An electronic apparatus is provided for obtaining compiling data used in an external processor including a function unit including a plurality of input ports. The electronic apparatus includes a storage configured to store a plurality of instructions, and a processor configured to schedule each of the plurality of instructions in a plurality of cycles, assign a plurality of input data corresponding to the plurality of instructions to the plurality of input ports in a corresponding cycle, and, if an unassigned input port among the plurality of input ports is present in a first cycle, assign a part of input data corresponding to an instruction scheduled in a second cycle after the first cycle to the unassigned input port in the first cycle, and obtain the compiling data by assigning remaining data of the input data corresponding to the instruction to one of the plurality of input ports in the second cycle.
Type: Grant
Filed: July 20, 2017
Date of Patent: March 31, 2020
Assignee: Samsung Electronics Co., Ltd
Inventors: Yeon-bok Lee, Myung-sun Kim, Shin-gyu Kim
-
Patent number: 10514927
Abstract: A processor includes logic to execute an instruction stream out of order. The instruction stream is divided into a plurality of strands, and its instructions, including those within the strands, are ordered by program order (PO). The processor further includes logic to identify an oldest undispatched instruction in the instruction stream and record its associated PO as an executed instruction pointer; identify a most recently committed store instruction in the instruction stream and record its associated PO as a store commitment pointer; determine a search pointer with PO less than the executed instruction pointer; identify a first set of store instructions in a store buffer with PO less than the search pointer and eligible for commitment; evaluate whether the first set of store instructions is larger than a number of read ports of the store buffer; and adjust the search pointer.
Type: Grant
Filed: March 27, 2014
Date of Patent: December 24, 2019
Assignee: Intel Corporation
Inventors: Anton Lechanka, Andrey Efimov, Sergey Y. Shishlov, Andrey Kluchnikov, Kamil Garifullin, Igor Burovenko, Boris A. Babayan
-
Patent number: 10514925
Abstract: Systems, apparatuses, and methods for managing dependencies between instruction operations when speculatively issuing load instruction operations. A processor may maintain dependency vectors for sources of instruction operations dispatched to the scheduler. The dependency vector may include a column for each cycle of the load recovery window and a row for each load execution pipeline. When a load speculatively issues, any instruction operation which is dependent on the load may have a bit set in the earliest bit position of its dependency vector to indicate the dependency. The bit may shift in the dependency vector toward the cancel bit position during each clock cycle as the load executes. If the load does not produce its data at the expected latency, an instruction operation may be canceled if there is a bit in the cancel bit position of the dependency vector row corresponding to the execution pipeline of the load.
Type: Grant
Filed: January 28, 2016
Date of Patent: December 24, 2019
Assignee: Apple Inc.
Inventor: Sean M. Reynolds
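The dependency vector in this abstract can be sketched as one bitmask per load pipeline. Each bit position corresponds to a cycle of the load recovery window: bit 0 is the earliest position, and the top bit is the cancel position. The following are hypothetical:
- the class and method names;
- the window length;
- the number of pipelines.

Bits that shift past the end of the window are simply dropped.

```python
class DependencyVector:
    """Toy dependency vector for one source of a scheduled instruction operation.

    rows[p] is a bitmask for load pipeline p; bit 0 is the earliest position
    and bit (window - 1) is the cancel position.
    """

    def __init__(self, num_load_pipes: int, window: int):
        self.window = window
        self.rows = [0] * num_load_pipes

    def on_load_issue(self, pipe: int) -> None:
        # The op depends on a load that just issued speculatively on `pipe`.
        self.rows[pipe] |= 1

    def tick(self) -> None:
        # Each clock cycle the bits move one position toward the cancel position.
        mask = (1 << self.window) - 1
        self.rows = [(row << 1) & mask for row in self.rows]

    def must_cancel(self, pipe: int, load_data_late: bool) -> bool:
        cancel_bit = 1 << (self.window - 1)
        return load_data_late and bool(self.rows[pipe] & cancel_bit)


dv = DependencyVector(num_load_pipes=2, window=3)
dv.on_load_issue(pipe=1)
dv.tick()
dv.tick()  # the bit now sits in the cancel position
print(dv.must_cancel(pipe=1, load_data_late=True))
```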
-
Patent number: 10445017
Abstract: A memory system includes a memory device including a plurality of command registers; and a memory controller configured to determine whether an empty command register exists among the plurality of command registers, and transmit a new command to the memory device, when an empty command register exists, wherein, when the new command is transmitted from the memory controller, the memory device stores the transmitted new command in the empty command register.
Type: Grant
Filed: December 22, 2016
Date of Patent: October 15, 2019
Assignee: SK hynix Inc.
Inventor: Beom Ju Shin
-
Patent number: 10430342
Abstract: An apparatus includes a buffer configured to store a plurality of instructions previously fetched from a memory, wherein each instruction of the plurality of instructions may be included in a respective thread of a plurality of threads. The apparatus also includes control circuitry configured to select a given thread of the plurality of threads dependent upon a number of instructions in the buffer that are included in the given thread. The control circuitry is also configured to fetch a respective instruction corresponding to the given thread from the memory, and to store the respective instruction in the buffer.
Type: Grant
Filed: November 18, 2015
Date of Patent: October 1, 2019
Assignee: Oracle International Corporation
Inventors: Yuan Chou, Gideon Levinsky, Manish Shah, Robert Golla, Matthew Smittle
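The abstract says only that thread selection depends on how many buffered instructions each thread has. This sketch assumes the common policy of favoring the thread with the fewest buffered instructions, which keeps a stalled thread from monopolizing the buffer. It breaks ties by lowest thread ID and models "fetching" by appending the thread ID to the buffer. The function names are hypothetical.

```python
from collections import Counter


def select_fetch_thread(buffer: list[int], num_threads: int) -> int:
    """Choose the next thread to fetch for, given the thread IDs of buffered instructions."""
    counts = Counter(buffer)  # buffered instructions per thread (missing threads count as 0)
    # Assumed policy: fewest buffered instructions wins; ties go to the lowest ID.
    return min(range(num_threads), key=lambda t: (counts[t], t))


def fetch_cycle(buffer: list[int], num_threads: int) -> int:
    t = select_fetch_thread(buffer, num_threads)
    buffer.append(t)  # stand-in for fetching one instruction of thread t into the buffer
    return t


buf = [0, 0, 1, 2, 2, 2]
print([fetch_cycle(buf, 3) for _ in range(4)])
```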
-
Patent number: 10140138
Abstract: Methods for supporting wide and efficient front-end operation with guest architecture emulation are disclosed. As a part of a method for supporting wide and efficient front-end operation, upon receiving a request to fetch a first far taken branch instruction, a cache line that includes the first far taken branch instruction, a next cache line, and a cache line located at the target of the first far taken branch instruction are read. Based on information that is accessed from a data table, the cache line and either the next cache line or the cache line located at the target are fetched in a single cycle.
Type: Grant
Filed: March 17, 2014
Date of Patent: November 27, 2018
Assignee: Intel Corporation
Inventors: Mohammad Abdallah, Ankur Groen, Erika Gunadi, Mandeep Singh, Ravishankar Rao
-
Patent number: 10127558
Abstract: Systems and methods for automating an invoice approval process are described herein. Rules are created which are evaluated against a set of attributes. A rules engine is automatically invoked upon receipt of a document in an electronic invoice presentment and payment system. The rules engine determines which rules are applicable to documents received and processed in the system, and applies those applicable rules in a pre-defined sequence.
Type: Grant
Filed: March 12, 2010
Date of Patent: November 13, 2018
Assignee: Altisource S.à r.l.
Inventors: Russell G. Bulman, Suresh Kumar, Sanket Karjagi, Ritwik Bose, Rajesh Kumar, Biswajit Nayak, Vikram Kamath, Bhavana Sumathi
-
Patent number: 10095518
Abstract: Instruction queue circuitry maintains an instruction queue to store fetched instructions. Instruction decode circuitry decodes instructions dispatched from the queue. The instruction decode circuitry allocates processor resource(s) for use in execution of the decoded instruction. Detection circuitry detects, for an instruction to be dispatched from a given instruction queue, a prediction indicating whether sufficient processor resources are predicted to be available for allocation to that instruction by the instruction decode circuitry. Dispatch circuitry dispatches an instruction from the queue to the instruction decode circuitry and allows deletion of the dispatched instruction from that instruction queue when the prediction indicates that sufficient processor resources are predicted to be available for allocation to that instruction by the instruction decode circuitry.
Type: Grant
Filed: November 16, 2015
Date of Patent: October 9, 2018
Assignee: ARM Limited
Inventors: Andrew James Antony Lees, Ian Michael Caulfield, Peter Richard Greenhalgh
-
Patent number: 10025527
Abstract: Hardware structures for check pointing a main shift register one or more times which include a circular buffer used to store the data elements most recently shifted onto the main shift register which has an extra data position for each check point and an extra data position for each restorable point in time; an update history shift register which has a data position for each check point which is used to store information indicating whether the circular buffer was updated in a particular clock cycle; a pointer that identifies a subset of the data positions of the circular buffer as active data positions; and check point generation logic that derives each check point by selecting a subset of the active data positions based on the information stored in the update history shift register.
Type: Grant
Filed: July 8, 2016
Date of Patent: July 17, 2018
Assignee: MIPS Tech, LLC
Inventors: Philip Day, Julian Bailey
-
Patent number: 9753751
Abstract: Processing data includes: receiving units of work that each include one or more work elements, and processing a first unit of work using a first compiled dataflow graph (160) loaded into a data processing system (100) in response to receiving the first unit of work. The processing includes: analysis to determine a characteristic of the first unit of work; identifying one or more compiled dataflow graphs from graphs stored in a data storage system (107) that include at least some that were compiled for processing a unit of work having the determined characteristic; loading one of the identified compiled dataflow graphs into the data processing system (100) as the first compiled dataflow graph (160); and generating one or more output work elements from at least one work element in the first unit of work.
Type: Grant
Filed: October 22, 2014
Date of Patent: September 5, 2017
Assignee: Ab Initio Technology LLC
Inventors: Matthew Darcy Atterbury, H. Mark Bromley, Wayne Mesard, Arkadi Popov, Stephen Schmidt, Craig W. Stanfill, Joseph Skeffington Wholey
-
Patent number: 9733912
Abstract: Disclosed here are methods, systems, paradigms and structures for optimizing intermediate representation (IR) of a script code for fast path execution. A fast path is typically a path that handles most commonly occurring tasks more efficiently than less commonly occurring ones which are handled by slow paths. The less commonly occurring tasks may include uncommon cases, error handling, and other anomalies. The IR includes checkpoints which evaluate to two possible values resulting in either a fast path or slow path execution. The IR is optimized for fast path execution by regenerating a checkpoint as a labeled checkpoint. The code in the portion of the IR following the checkpoint is optimized assuming the checkpoint evaluates to a value resulting in fast path. The code for handling situations where the checkpoint evaluates to a value resulting in slow path is transferred to a portion of the IR identified by the label.
Type: Grant
Filed: January 27, 2016
Date of Patent: August 15, 2017
Assignee: Facebook, Inc.
Inventors: Ali-Reza Adl-Tabatabai, Guilherme de Lima Ottoni, Michael Paleczny
-
Patent number: 9690620
Abstract: Methods and architecture for dynamic polymorphic heterogeneous multi-core processor operation are provided. The method for dynamic heterogeneous polymorphic processing includes the steps of receiving a processing task comprising a plurality of serial threads. The method is performed in a processor including a plurality of processing cores, each of the plurality of processing cores being assigned to one of a plurality of core clusters and each of the plurality of core clusters capable of dynamically forming a coalition comprising two or more of its processing cores. The method further includes determining whether each of the plurality of serial threads requires more than one processing core, and sending a go-into-coalition-mode-now instruction to ones of the plurality of core clusters for handling ones of the plurality of serial threads that require more than one processing core.
Type: Grant
Filed: December 3, 2012
Date of Patent: June 27, 2017
Assignee: National University of Singapore
Inventors: Tulika Mitra, Mihai Pricopi
-
Patent number: 9672037Abstract: A processor and method for fusing together an arithmetic instruction and a branch instruction. The processor includes an instruction fetch unit configured to fetch instructions. The processor may also include an instruction decode unit that may be configured to decode the fetched instructions into micro-operations for execution by an execution unit. The decode unit may be configured to detect an occurrence of an arithmetic instruction followed by a branch instruction in program order, wherein the branch instruction, upon execution, changes a program flow of control dependent upon a result of execution of the arithmetic instruction. In addition, the processor may further be configured to fuse together the arithmetic instruction and the branch instruction such that a single micro-operation is formed. The single micro-operation includes execution information based upon both the arithmetic instruction and the branch instruction.Type: GrantFiled: January 23, 2013Date of Patent: June 6, 2017Assignee: Apple Inc.Inventors: Conrado Blasco-Allue, Sandeep Gupta
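A minimal sketch of decode-time fusion of an arithmetic instruction with a dependent conditional branch. The three-field instruction tuple, the opcode sets and the pseudo-register `flags` are invented for illustration; this is not Apple's decoder, only the pairing rule the abstract describes.

```python
from typing import NamedTuple, Optional

ARITHMETIC = {"add", "sub", "and", "cmp"}
COND_BRANCH = {"beq", "bne", "blt"}


class Insn(NamedTuple):
    op: str
    dst: Optional[str]  # register (or "flags") written; None for branches
    srcs: tuple         # registers read, plus the target for branches


def fuse(decoded):
    """Fuse an arithmetic op with an immediately following conditional
    branch that consumes its result into a single micro-op."""
    uops, i = [], 0
    while i < len(decoded):
        cur = decoded[i]
        nxt = decoded[i + 1] if i + 1 < len(decoded) else None
        if (nxt is not None and cur.op in ARITHMETIC and nxt.op in COND_BRANCH
                and cur.dst in nxt.srcs):
            # Keep the arithmetic operands plus the branch target, so one
            # micro-op both computes the result and resolves the branch.
            extra = tuple(s for s in nxt.srcs if s != cur.dst)
            uops.append(Insn(f"{cur.op}+{nxt.op}", cur.dst, cur.srcs + extra))
            i += 2
        else:
            uops.append(cur)
            i += 1
    return uops


prog = [Insn("sub", "flags", ("r1", "r2")), Insn("bne", None, ("flags", "loop")),
        Insn("mov", "r3", ("r1",))]
print(fuse(prog))
```

A branch that does not read the arithmetic result is left as a separate micro-op.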
-
Patent number: 9558089Abstract: The disclosed embodiments provide a system that facilitates testing of an insecure computing environment. During operation, the system obtains a real data set comprising a set of data strings. Next, the system determines a set of frequency distributions associated with the set of data strings. The system then generates a test data set from the real data set, wherein the test data set comprises a set of random data strings that conforms to the set of frequency distributions. Finally, the system tests the insecure computing environment using the test data set.Type: GrantFiled: November 12, 2014Date of Patent: January 31, 2017Assignee: INTUIT INC.Inventor: Colin R. Dillard
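One simple way to produce random strings that follow observed frequency distributions, sketched under the assumption that the relevant distributions are string length and overall character frequency (the abstract does not say which distributions are used). `frequency_profile` and `synthesize` are hypothetical names, and the sample strings are invented.

```python
import random
from collections import Counter


def frequency_profile(strings):
    """Observed length and character frequencies of the real data strings."""
    lengths = Counter(len(s) for s in strings)
    chars = Counter(c for s in strings for c in s)
    return lengths, chars


def synthesize(strings, n, seed=None):
    """Generate n random strings whose lengths and characters are drawn from
    the frequency distributions of the real strings (at least one non-empty)."""
    rng = random.Random(seed)
    lengths, chars = frequency_profile(strings)
    len_vals, len_wts = zip(*lengths.items())
    ch_vals, ch_wts = zip(*chars.items())
    fake = []
    for _ in range(n):
        k = rng.choices(len_vals, weights=len_wts)[0]
        fake.append("".join(rng.choices(ch_vals, weights=ch_wts, k=k)))
    return fake


real = ["4111-2222", "5500-1234", "3400-9999"]
print(synthesize(real, 3, seed=7))
```

Because characters are drawn independently, positional structure such as the `-` separator is not preserved; per-position distributions would be a natural refinement.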
-
Patent number: 9454467Abstract: A method of mining test coverage data includes: at a device having one or more processors and memory: sequentially processing each of a plurality of coverage data files that is generated by executing the program using a respective test input of a plurality of test inputs, where the processing of each current coverage data file extracts respective execution counter data from the current coverage data file; after processing each current coverage data file, determining whether the respective execution counter data extracted from the current coverage data file includes a predetermined change relative to the respective execution counter data extracted from previously processed coverage data files; and in response to detecting the predetermined change for the current coverage data file, including the respective test input used to generate the current coverage data file in a test input collection for testing the program.Type: GrantFiled: August 12, 2014Date of Patent: September 27, 2016Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITEDInventor: Yunjia Wu
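A minimal sketch of the selection loop, assuming each coverage data file has already been parsed into a `{block_id: execution_count}` dict and that the "predetermined change" means some block executes for the first time. The abstract leaves the exact change unspecified, and `mine_test_inputs` is a hypothetical name.

```python
def mine_test_inputs(runs):
    """runs: (test_input, {block_id: execution_count}) pairs, one per coverage
    data file, in processing order. Keeps an input only when its counters show
    a block executed that no previously processed file had executed."""
    covered_so_far = set()
    kept = []
    for test_input, counters in runs:
        covered = {block for block, count in counters.items() if count > 0}
        if covered - covered_so_far:  # assumed "predetermined change": new coverage
            kept.append(test_input)
        covered_so_far |= covered
    return kept


runs = [("a", {1: 3, 2: 0}), ("b", {1: 1}), ("c", {1: 1, 2: 5}), ("d", {2: 1})]
print(mine_test_inputs(runs))
```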
-
Patent number: 9424045Abstract: An apparatus and method includes execution circuitry including a wide operand execution unit configured to allow up to N bits of operand data to be processed during execution of a single instruction. Decoder circuitry decodes and generates, for each instruction, at least one control data block identifying an operation to be performed by the execution circuitry and at least two re-combineable control data blocks for the instruction. Issue queue control circuitry then allocates a slot in the issue queue for each of the at least two data blocks and up to M bits of associated operand data, and marks those allocated slots to identify that they contain re-combineable control data blocks. The issue queue control circuitry issues the combined block to said wide operand execution unit along with the operand data contained in each of the allocated slots for said at least two control data blocks.Type: GrantFiled: January 29, 2013Date of Patent: August 23, 2016Assignee: ARM LimitedInventors: Cedric Denis Robert Airaud, Luca Scalabrino, Frederic Jean Denis Arsanto, Guillaume Schon, Frederic Claude Marie Piry, Albin Pierick Tonnerre
-
Patent number: 9354892Abstract: Methods, media, and computing systems are provided. The method includes, the media are configured for, and the computing system includes a processor with control logic for allocating memory for storing a plurality of local register states for work items to be executed in single instruction multiple data hardware and for repacking wavefronts that include work items associated with a program instruction responsive to a conditional statement. The repacking is configured to create repacked wavefronts that include at least one of a wavefront containing work items that all pass the conditional statement and a wavefront containing work items that all fail the conditional statement.Type: GrantFiled: November 29, 2012Date of Patent: May 31, 2016Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Timothy G. Rogers, Bradford M. Beckmann, James M. O'Connor
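A sketch of the repacking step on plain Python lists, where a wavefront is a list of work-item ids and `width` is the SIMD width. It ignores the per-work-item register state the abstract mentions and simply regroups items by the outcome of the conditional; the function names are hypothetical.

```python
def chunk(items, width):
    """Split a flat list of work items into wavefronts of at most `width`."""
    return [items[i:i + width] for i in range(0, len(items), width)]


def repack(wavefronts, cond, width):
    """Regroup work items so every new wavefront is uniform with respect to
    `cond`: all of its items either pass or fail the conditional."""
    items = [item for wf in wavefronts for item in wf]
    passed = [item for item in items if cond(item)]
    failed = [item for item in items if not cond(item)]
    return chunk(passed, width), chunk(failed, width)


# Two divergent 4-wide wavefronts become two non-divergent ones.
print(repack([[0, 1, 2, 3], [4, 5, 6, 7]], lambda x: x % 2 == 0, 4))
```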
-
Patent number: 9354874Abstract: Producer-consumer instructions, comprising a first instruction and a second instruction in program order, are fetched requiring in-order execution. The processor modifies the second instruction so that the first and second instructions can be completed out of order; the modification comprises either extending an immediate field of the second instruction using immediate-field information from the first instruction or providing a source location of the first instruction as an additional source location to the source locations of the second instruction.Type: GrantFiled: October 3, 2011Date of Patent: May 31, 2016Assignee: International Business Machines CorporationInventors: Michael K. Gschwind, Valentina Salapura
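To make the immediate-field case concrete, here is a toy two-instruction model loosely in the spirit of high/low immediate pairs. The opcodes `add_hi`/`add` and the 16-bit shift are assumptions for illustration, not IBM's encoding. After rewriting, the consumer reads the producer's source register directly, so executing the two in either order leaves the same register values.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    op: str   # "add_hi": rt = ra + (imm << 16); "add": rt = ra + imm
    rt: str
    ra: str
    imm: int


def execute(insn, regs):
    shift = 16 if insn.op == "add_hi" else 0
    regs[insn.rt] = regs[insn.ra] + (insn.imm << shift)


def decouple(producer, consumer):
    """Rewrite a consumer that reads the producer's result so it reads the
    producer's source instead, folding both immediates into one extended
    immediate. The pair can then complete in either order."""
    if producer.op == "add_hi" and consumer.op == "add" and consumer.ra == producer.rt:
        return Insn("add", consumer.rt, producer.ra, (producer.imm << 16) + consumer.imm)
    return consumer


producer = Insn("add_hi", "r3", "r0", 2)
consumer = Insn("add", "r4", "r3", 5)
print(decouple(producer, consumer))
```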
-
Patent number: 9286072Abstract: Two computer machine instructions are fetched for execution, but replaced by a single optimized instruction to be executed, wherein a temporary register used by the two instructions is identified as a last-use register, where a last-use register has a value that is not to be accessed by later instructions, whereby the two computer machine instructions are replaced by a single optimized internal instruction for execution, the single optimized instruction not including the last-use register.Type: GrantFiled: October 3, 2011Date of Patent: March 15, 2016Assignee: International Business Machines CorporationInventors: Michael K. Gschwind, Valentina Salapura
-
Patent number: 9250916Abstract: Embodiments include a method for chaining data in an exposed-pipeline processing element. The method includes separating a multiple instruction word into a first sub-instruction and a second sub-instruction, receiving the first sub-instruction and the second sub-instruction in the exposed-pipeline processing element. The method also includes issuing the first sub-instruction at a first time, issuing the second sub-instruction at a second time different than the first time, the second time being offset to account for a dependency of the second sub-instruction on a first result from the first sub-instruction, the first pipeline performing the first sub-instruction at a first clock cycle and communicating the first result from performing the first sub-instruction to a chaining bus coupled to the first pipeline and a second pipeline, the communicating at a second clock cycle subsequent to the first clock cycle that corresponds to a total number of latch pipeline stages in the first pipeline.Type: GrantFiled: March 12, 2013Date of Patent: February 2, 2016Assignee: International Business Machines CorporationInventors: Thomas W. Fox, Bruce M. Fleischer, Hans M. Jacobson, Ravi Nair
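A sketch of the issue-time arithmetic, assuming each pipeline's result reaches the chaining bus a fixed number of cycles (its latch stages) after issue, and that a dependent sub-instruction issues on the cycle its operand arrives. Pipeline names and stage counts are invented, and structural hazards are ignored.

```python
def issue_cycles(sub_instructions, latch_stages):
    """sub_instructions: (name, pipeline, producer_name_or_None) in order.
    latch_stages: pipeline -> cycles from issue until its result is on the
    chaining bus. A dependent sub-instruction issues when its operand arrives."""
    issue, result_on_bus = {}, {}
    for name, pipe, producer in sub_instructions:
        cycle = 0 if producer is None else result_on_bus[producer]
        issue[name] = cycle
        result_on_bus[name] = cycle + latch_stages[pipe]
    return issue


bundle = [("mul", "p0", None), ("add", "p1", "mul"), ("st", "p0", "add")]
print(issue_cycles(bundle, {"p0": 4, "p1": 2}))
```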
-
Patent number: 9118705Abstract: A device for detecting network traffic content is provided. The device includes a memory configured for storing one or more signatures, each of the one or more signatures associated with content desired to be detected and defined by one or more predicates. The device also includes a processor configured to receive data associated with network traffic content, execute one or more instructions based on the one or more signatures and the data, and determine whether the network traffic content matches the content desired to be detected.Type: GrantFiled: March 12, 2013Date of Patent: August 25, 2015Assignee: Fortinet, Inc.Inventor: Michael Xie
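A minimal sketch of signatures defined as conjunctions of predicates over a payload. The predicate helpers and the `demo-cmd-in-http` signature are hypothetical examples, not Fortinet's signature format.

```python
from typing import Callable, Dict, List

Predicate = Callable[[bytes], bool]


def contains(pattern: bytes) -> Predicate:
    return lambda data: pattern in data


def starts_with(prefix: bytes) -> Predicate:
    return lambda data: data.startswith(prefix)


def min_length(n: int) -> Predicate:
    return lambda data: len(data) >= n


# Hypothetical signature: every predicate must hold for a match.
SIGNATURES: Dict[str, List[Predicate]] = {
    "demo-cmd-in-http": [starts_with(b"GET "), contains(b"cmd.exe"), min_length(32)],
}


def match(data: bytes, signatures=SIGNATURES) -> List[str]:
    """Names of all signatures whose predicates are all satisfied by data."""
    return [name for name, preds in signatures.items() if all(p(data) for p in preds)]


print(match(b"GET /index?run=cmd.exe&x=" + b"A" * 16))
```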
-
Publication number: 20150100761Abstract: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) a first range of bits, having a first end explicitly specified by the instruction, in which each bit is identical in value to the bit of the source operand in the corresponding position; and (2) a second range of bits that all have the same value regardless of the values of the bits of the source operand in the corresponding positions. Execution of the instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.Type: ApplicationFiled: December 12, 2014Publication date: April 9, 2015Applicant: INTEL CORPORATIONInventors: Maxim Loktyukhin, Eric W Mahurin, Bret L Toll, Martin G Dixon, Sean P Mirkes, David L Kreitzer, ELMOUSTAPHA OULD-AHMED-VALL, Vinodh Gopal
-
Publication number: 20150100760Abstract: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) a first range of bits, having a first end explicitly specified by the instruction, in which each bit is identical in value to the bit of the source operand in the corresponding position; and (2) a second range of bits that all have the same value regardless of the values of the bits of the source operand in the corresponding positions. Execution of the instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.Type: ApplicationFiled: December 12, 2014Publication date: April 9, 2015Applicant: INTEL CORPORATIONInventors: Maxim Loktyukhin, Eric W Mahurin, Bret L Toll, Martin G Dixon, Sean P Mirkes, David L Kreitzer, ELMOUSTAPHA OULD-AHMED-VALL, Vinodh Gopal
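Both Intel publications above describe the same result layout. The sketch below models it on Python integers, assuming the kept range is the low bits `[0, n)` and the high range is filled with a single constant bit. With `fill=0` this is roughly what the x86 BMI2 `BZHI` instruction computes, although `BZHI` does not fault on an out-of-range index the way this sketch raises `ValueError`. `keep_low_bits` is a hypothetical name.

```python
def keep_low_bits(src: int, n: int, width: int = 64, fill: int = 0) -> int:
    """Bits [0, n) of the result equal the same bits of src (no shifting);
    bits [n, width) are all `fill` (0 or 1) regardless of src."""
    if not 0 <= n <= width:
        raise ValueError("bit index out of range")
    low_mask = (1 << n) - 1
    high_mask = ((1 << width) - 1) & ~low_mask
    return (src & low_mask) | (high_mask if fill else 0)


print(hex(keep_low_bits(0x1234, 8, width=16, fill=1)))
```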
-
Publication number: 20150095615Abstract: A method for forwarding data from store instructions to a corresponding load instruction in an out-of-order processor. The method includes accessing an incoming sequence of instructions and, within that sequence, splitting store instructions into a store address instruction and a store data instruction, wherein the store address performs address calculation and fetch, and wherein the store data performs a load of register contents to a memory address. The method further includes, within that sequence, splitting load instructions into a load address instruction and a load data instruction, wherein the load address performs address calculation and fetch, and wherein the load data performs a load of memory address contents into a register, and reordering the store address and load address instructions earlier in the instruction sequence, further away from the LD/SD instructions, to enable earlier dispatch and execution of the loads and the stores.Type: ApplicationFiled: December 11, 2014Publication date: April 2, 2015Inventors: Mohammad A. ABDALLAH, Gregory A. WOODS
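A toy version of the split-and-hoist transformation, assuming a made-up `Op` tuple with explicit destination and source registers. Address halves (`sa`, `la`) move upward only past instructions that do not write their address register; data halves keep their original order, so memory ordering is untouched. The forwarding check between stores and loads is not modeled.

```python
from typing import NamedTuple, Optional


class Op(NamedTuple):
    kind: str           # "alu", "st", "ld"; after splitting also "sa", "sd", "la", "ldd"
    dst: Optional[str]  # register written, if any
    srcs: tuple         # registers read


def split_memory_ops(ops):
    """st(addr, data) -> sa(addr) + sd(data); ld dst <- [addr] -> la(addr) + ldd(dst)."""
    out = []
    for op in ops:
        if op.kind == "st":
            out += [Op("sa", None, op.srcs[:1]), Op("sd", None, op.srcs[1:])]
        elif op.kind == "ld":
            out += [Op("la", None, op.srcs), Op("ldd", op.dst, ())]
        else:
            out.append(op)
    return out


def hoist_address_ops(ops):
    """Move each sa/la as early as possible without passing an op that
    writes one of its address registers; everything else keeps its order."""
    out = []
    for op in ops:
        pos = len(out)
        if op.kind in ("sa", "la"):
            while pos > 0 and out[pos - 1].dst not in op.srcs:
                pos -= 1
        out.insert(pos, op)
    return out


prog = [
    Op("alu", "r1", ("r0",)),
    Op("alu", "r5", ("r6",)),
    Op("st", None, ("r1", "r2")),  # store r2 to [r1]
    Op("ld", "r3", ("r4",)),       # load r3 from [r4]
]
print([op.kind for op in hoist_address_ops(split_memory_ops(prog))])
# prints ['la', 'alu', 'sa', 'alu', 'sd', 'ldd']
```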
-
Patent number: 8977815Abstract: A processing pipeline 6, 8, 10, 12 is provided with a main query stage 20 and a fetch stage 22. A buffer 24 stores program instructions which have missed within a cache memory 14. Query generation circuitry within the main query stage 20 and within a buffer query stage 26 serve to concurrently generate a main query request and a buffer query request sent to the cache memory 14. The cache memory returns a main query response and a buffer query response. Arbitration circuitry 28 controls multiplexers 30, 32 and 34 to direct the program instruction at the main query stage 20, and the program instruction stored within the buffer 24 and the buffer query stage 26 to pass either to the fetch stage 22 or to the buffer 24. The multiplexer 30 can also select a new instruction to be passed to the main query stage 20.Type: GrantFiled: November 29, 2010Date of Patent: March 10, 2015Assignee: ARM LimitedInventors: Frode Heggelund, Rune Holm, Andreas Due Engh-Halstvedt, Edvard Feilding
-
Patent number: 8972689Abstract: A storage processor identifies latency of memory drives for different numbers of concurrent storage operations. The identified latency is used to identify debt limits for the number of concurrent storage operations issued to the memory drives. The storage processor may issue additional storage operations to the memory devices when the number of storage operations is within the debt limit. Storage operations may be deferred when the number of storage operations is outside the debt limit.Type: GrantFiled: February 2, 2011Date of Patent: March 3, 2015Assignee: Violin Memory, Inc.Inventor: Erik de la Iglesia
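A minimal sketch of a debt limit, assuming latency has been profiled per concurrency depth and that the limit is the deepest queue whose latency meets a target. `max_latency_us`, the class name and the profile numbers are hypothetical. Operations beyond the limit wait in a FIFO and are issued as earlier ones complete.

```python
from collections import deque


def debt_limit(latency_by_depth, max_latency_us):
    """Largest number of concurrent operations whose measured latency stays
    within the target (falls back to 1 if none does)."""
    within = [d for d, lat in latency_by_depth.items() if lat <= max_latency_us]
    return max(within, default=1)


class DriveQueue:
    def __init__(self, limit):
        self.limit = limit
        self.in_flight = 0
        self.deferred = deque()

    def submit(self, op):
        """Issue op if within the debt limit, else defer it. True if issued."""
        if self.in_flight < self.limit:
            self.in_flight += 1
            return True
        self.deferred.append(op)
        return False

    def complete(self):
        """Retire one operation; issue and return the oldest deferred one, if any."""
        self.in_flight -= 1
        if self.deferred:
            self.in_flight += 1
            return self.deferred.popleft()
        return None


limit = debt_limit({1: 80, 2: 95, 4: 140, 8: 310}, max_latency_us=150)  # made-up profile
queue = DriveQueue(limit)
```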
-
Publication number: 20150039859Abstract: A method for accelerating code optimization in a microprocessor. The method includes fetching an incoming macroinstruction sequence using an instruction fetch component and transferring the fetched macroinstructions to a decoding component for decoding into microinstructions. Optimization processing is performed by reordering the microinstruction sequence into an optimized microinstruction sequence comprising a plurality of dependent code groups. The optimized microinstruction sequence is output to a microprocessor pipeline for execution. A copy of the optimized microinstruction sequence is stored into a sequence cache for subsequent use upon a subsequent hit on the optimized microinstruction sequence.Type: ApplicationFiled: November 22, 2011Publication date: February 5, 2015Inventor: Mohammad Abdallah
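A sketch of the sequence-cache lookup, keyed by fetch address (an assumption; the abstract does not name the key). `decode` and `optimize` are placeholders for the decoding and dependency-grouping steps, which are not modeled here; `toy_decode` is hypothetical and `tuple` merely stands in for the optimizer.

```python
class SequenceCache:
    """Caches optimized micro-op sequences by fetch address so a later hit
    skips decoding and optimization."""

    def __init__(self, decode, optimize):
        self._decode = decode
        self._optimize = optimize
        self._lines = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, fetch_addr, macro_instrs):
        seq = self._lines.get(fetch_addr)
        if seq is None:
            self.misses += 1
            seq = self._optimize(self._decode(macro_instrs))
            self._lines[fetch_addr] = seq
        else:
            self.hits += 1
        return seq


def toy_decode(macros):
    """Hypothetical decoder: macro-op 'A+B' expands to micro-ops 'a', 'b'."""
    return [u.lower() for m in macros for u in m.split("+")]


cache = SequenceCache(toy_decode, optimize=tuple)
print(cache.lookup(0x400, ["LD+ADD", "ST"]))
print(cache.lookup(0x400, ["LD+ADD", "ST"]), cache.hits, cache.misses)
```

Invalidation (for example after self-modifying code) is left out of this sketch.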
-
Publication number: 20150032997Abstract: Tracking the global history vector in high-performance, out-of-order superscalar processors may, in one aspect, comprise providing a shift register storing a global history vector that stores branch predictions and outcomes. A counter is maintained to determine the number of bits by which to shift the shift register to recover branch history. In another aspect, the global history vector may be implemented with a circular buffer structure. Youngest and oldest pointers to the circular buffer are maintained and used in recovery.Type: ApplicationFiled: July 23, 2013Publication date: January 29, 2015Applicant: International Business Machines CorporationInventors: Richard J. Eickemeyer, Tejas Karkhanis, Brian R. Konigsburg, David S. Levitan, Douglas R. G. Logan, Jose E. Moreira, Mauricio J. Serrano
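A sketch of the circular-buffer variant, assuming one history bit per predicted branch and that a branch's position in the buffer serves as its recovery checkpoint. `youngest` and `oldest` follow the abstract's pointers; the class and method names are hypothetical, and the shift-register-plus-counter variant is not shown.

```python
class GlobalHistory:
    """Global history vector in a circular buffer. `youngest` is the next
    slot to write; `oldest` is the oldest unresolved branch, which must not
    be overwritten because recovery may still rewind to it."""

    def __init__(self, size):
        self.buf = [0] * size
        self.size = size
        self.youngest = 0
        self.oldest = 0

    def predict(self, taken):
        """Record a predicted outcome and return its position as a checkpoint."""
        if self.youngest - self.oldest >= self.size:
            raise RuntimeError("history buffer full of unresolved branches")
        pos = self.youngest
        self.buf[pos % self.size] = int(taken)
        self.youngest += 1
        return pos

    def recover(self, pos, actual):
        """Misprediction at checkpoint pos: drop younger history, fix the bit."""
        self.buf[pos % self.size] = int(actual)
        self.youngest = pos + 1

    def resolve(self, pos):
        """The branch at pos (and every older one) resolved correctly."""
        self.oldest = max(self.oldest, pos + 1)

    def recent(self, n):
        """The n most recent outcomes, youngest first."""
        return [self.buf[(self.youngest - 1 - i) % self.size] for i in range(n)]


gh = GlobalHistory(8)
first = gh.predict(True)
second = gh.predict(False)   # suppose this one was mispredicted
gh.predict(True)             # younger, wrong-path prediction
gh.recover(second, actual=True)
print(gh.recent(2))
```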