Patents by Inventor Jan Gray

Jan Gray has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Decoupled processor instruction window and operand buffer

Patent number: 11048517

Abstract: A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.

Type: Grant

Filed: June 24, 2019

Date of Patent: June 29, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
Decoupled processor instruction window and operand buffer

Publication number: 20190310852

Abstract: A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.

Type: Application

Filed: June 24, 2019

Publication date: October 10, 2019

Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
Decoding information about a group of instructions including a size of the group of instructions

Patent number: 10409599

Abstract: A method including fetching a group of instructions, where the group of instructions is configured to execute atomically by a processor, is provided. The method further includes decoding at least one of a first instruction or a second instruction, where: (1) decoding the first instruction results in a processing of information about a group of instructions, including information about a size of the group of instructions, and (2) decoding the second instruction results in a processing of at least one of: (a) a reference to a memory location having the information about the group of instructions, including information about the size of the group of instructions or (b) a processor status word having information about the group of instructions, including information about the size of the group of instructions.

Type: Grant

Filed: June 26, 2015

Date of Patent: September 10, 2019

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jan Gray, Doug Burger, Aaron Smith
Decoupled processor instruction window and operand buffer

Patent number: 10346168

Abstract: A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.

Type: Grant

Filed: June 26, 2015

Date of Patent: July 9, 2019

Assignee: Microsoft Technology Licensing, LLC

Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
Parallel decision tree processor architecture

Patent number: 10332008

Abstract: A decision tree multi-processor system includes a plurality of decision tree processors that access a common feature vector and execute one or more decision trees with respect to the common feature vector. A related method includes providing a common feature vector to a plurality of decision tree processors implemented within an on-chip decision tree scoring system, and executing, by the plurality of decision tree processors, a plurality of decision trees, by reference to the common feature vector. A related decision tree-walking system includes feature storage that stores a common feature vector and a plurality of decision tree processors that access the common feature vector from the feature storage and execute a plurality of decision trees by comparing threshold values of the decision trees to feature values within the common feature vector.

Type: Grant

Filed: March 17, 2014

Date of Patent: June 25, 2019

Assignee: Microsoft Technology Licensing, LLC

Inventors: Douglas C. Burger, James R. Larus, Andrew Putnam, Jan Gray
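
The tree-walking idea in the abstract above (several trees evaluated against one shared feature vector by comparing thresholds to feature values) can be illustrated in software. The following is only a minimal sketch, not the patented hardware: the `Node` layout, the `walk` and `score_all` names, and the "less than or equal goes left" convention are all assumptions made for illustration, and the trees are walked sequentially rather than in parallel.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    # Hypothetical node layout: a leaf carries a score; an internal node
    # carries a feature index and a threshold.
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    score: Optional[float] = None


def walk(tree: Node, features: Sequence[float]) -> float:
    """Walk one tree by comparing thresholds to the shared feature values."""
    node = tree
    while node.score is None:
        # Assumed convention: values at or below the threshold go left.
        go_left = features[node.feature] <= node.threshold
        node = node.left if go_left else node.right
    return node.score


def score_all(trees: Sequence[Node], features: Sequence[float]) -> list:
    """Evaluate every tree against the same (common) feature vector."""
    return [walk(tree, features) for tree in trees]


common_features = [0.2, 5.0]
tree_a = Node(feature=0, threshold=0.5, left=Node(score=1.0), right=Node(score=-1.0))
tree_b = Node(feature=1, threshold=3.0, left=Node(score=0.25), right=Node(score=0.75))
scores = score_all([tree_a, tree_b], common_features)
```

Each call to `walk` reads from the same `common_features` list, which mirrors the shared feature storage the abstract describes; in the described system the per-tree walks would run on separate decision tree processors.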
Explicit instruction scheduler state information for a processor

Patent number: 10175988

Abstract: A method including fetching a group of instructions, where the group of instructions is configured to execute atomically by a processor, is provided. The method further includes scheduling at least one of the group of instructions for execution by the processor before decoding the at least one of the group of instructions based at least on pre-computed ready state information associated with the at least one of the group of instructions.

Type: Grant

Filed: June 26, 2015

Date of Patent: January 8, 2019

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jan Gray, Doug Burger, Aaron Smith
Mapping instruction blocks based on block size

Patent number: 9952867

Abstract: A processor core in an instruction block-based microarchitecture utilizes instruction blocks having headers that include an index to a size table that may be expressed using one of memory, register, logic, or code stream. A control unit in the processor core determines how many instructions to fetch for a current instruction block for mapping into an instruction window based on the block size that is indicated from the size table. As instruction block sizes are often unevenly distributed for a given program, utilization of the size table enables more flexibility in matching instruction blocks to the sizes of available slots in the instruction window as compared to arrangements in which instruction blocks have a fixed size or are sized with less granularity. Such flexibility may enable denser instruction packing, which increases overall processing efficiency by reducing the number of nops (no operations, such as null functions) in a given instruction block.

Type: Grant

Filed: June 26, 2015

Date of Patent: April 24, 2018

Assignee: Microsoft Technology Licensing, LLC

Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
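
The size-table lookup described in the abstract above can be sketched in a few lines. This is an illustrative model only: the table contents, the three-bit index width, the position of the index in the low bits of the header, and the function names are all assumptions, none of which come from the patent.

```python
# Hypothetical size table: entry i is the instruction count for header index i.
SIZE_TABLE = (4, 8, 16, 24, 32, 48, 64, 128)
INDEX_BITS = 3  # assumed width of the index field in the block header


def block_size(header: int) -> int:
    """Return how many instructions to fetch for the block with this header."""
    index = header & ((1 << INDEX_BITS) - 1)  # assumed: index sits in the low bits
    return SIZE_TABLE[index]


def nops_needed(instruction_count: int, sizes) -> int:
    """Padding nops when a block is rounded up to the smallest size that fits."""
    fitting = [s for s in sizes if s >= instruction_count]
    if not fitting:
        raise ValueError("block is larger than every available size")
    return min(fitting) - instruction_count


fetch_count = block_size(0xA3)                    # low three bits are 0b011
padding_with_table = nops_needed(20, SIZE_TABLE)  # rounds 20 up to 24
padding_fixed_size = nops_needed(20, (128,))      # rounds 20 up to 128
```

Under these assumed numbers, a 20-instruction block needs 4 padding nops with the table and 108 with a single fixed size of 128, which is the kind of packing gain the abstract attributes to finer size granularity.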
Age-based management of instruction blocks in a processor instruction window

Patent number: 9946548

Abstract: A processor core in an instruction block-based microarchitecture includes a control unit that explicitly tracks instruction block state including age or priority for current blocks that have been fetched from an instruction cache. Tracked instruction blocks are maintained in an age-ordered or priority-ordered list. When an instruction block is identified by the control unit for commitment, the list is checked for a match and a matching instruction block can be refreshed without re-fetching from the instruction cache. If a match is not found, an instruction block can be committed and replaced based on either age or priority. Such instruction state tracking typically consumes little overhead and enables instruction blocks to be reused and mispredicted instructions to be skipped to increase processor core efficiency.

Type: Grant

Filed: June 26, 2015

Date of Patent: April 17, 2018

Assignee: Microsoft Technology Licensing, LLC

Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
Private memory regions and coherency optimization by controlling snoop traffic volume in multi-level cache hierarchy

Patent number: 9767027

Abstract: A system for optimizing cache coherence message traffic volume is disclosed. The system includes a plurality of caches in a multi-level memory hierarchy and a plurality of agents. Each agent is associated with a cache. The system includes one or more monitoring engines. Each agent in the plurality of agents is associated with a monitoring engine. The agents can execute a processor level software instruction causing a memory region to be private to the agent. Each of the agents is configured to execute a memory access for data on an associated cache and to send a request for data up the hierarchy on a cache miss. The monitoring engine is configured to intercept a request for data from an agent and to prevent snooping for the cache line in peer caches when the cache line is associated with a memory region represented as private to the agent.

Type: Grant

Filed: July 10, 2014

Date of Patent: September 19, 2017

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jan Gray, David Callahan, Burton Jordan Smith, Gad Sheaffer, Ali-Reza Adl-Tabatabai
Bulk allocation of instruction blocks to a processor instruction window

Patent number: 9720693

Abstract: A processor core in an instruction block-based microarchitecture includes a control unit that allocates instructions into an instruction window in bulk by fetching blocks of instructions and associated resources including control bits and operands at once. Such bulk allocation supports increased efficiency in processor core operations by enabling consistent management and policy implementation across all the instructions in the block during execution. For example, when an instruction block branches back on itself, it may be reused in a refresh process rather than being re-fetched from the instruction cache. As all of the resources for that instruction block are in one place, the instructions can remain in place and only valid bits need to be cleared. Bulk allocation also facilitates operand sharing by instructions in a block and explicit messaging among instructions.

Type: Grant

Filed: June 26, 2015

Date of Patent: August 1, 2017

Assignee: Microsoft Technology Licensing, LLC

Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
Efficient garbage collection and exception handling in a hardware accelerated transactional memory system

Patent number: 9658880

Abstract: Handling garbage collection and exceptions in hardware assisted transactions. Embodiments are practiced in a computing environment including a hardware assisted transaction system. A method includes beginning a hardware assisted transaction, raising an exception while in the hardware assisted transaction, including creating an exception object, determining that the transaction should be rolled back, and as a result of determining that the transaction should be rolled back, marshaling the exception object out of the hardware assisted transaction.

Type: Grant

Filed: March 18, 2013

Date of Patent: May 23, 2017

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jan Gray, Martin Taillefer, Yosseff Levanoni, Ali-Reza Adl-Tabatabai, Dave Detlefs, Vinod K. Grover, Michael Magruder, Gad Sheaffer
Explicit instruction scheduler state information for a processor

Publication number: 20160378496

Abstract: A method including fetching a group of instructions, where the group of instructions is configured to execute atomically by a processor, is provided. The method further includes scheduling at least one of the group of instructions for execution by the processor before decoding the at least one of the group of instructions based at least on pre-computed ready state information associated with the at least one of the group of instructions.

Type: Application

Filed: June 26, 2015

Publication date: December 29, 2016

Applicant: Microsoft Technology Licensing, LLC

Inventors: Jan Gray, Doug Burger, Aaron Smith
Decoding information about a group of instructions including a size of the group of instructions

Publication number: 20160378492

Abstract: A method including fetching a group of instructions, where the group of instructions is configured to execute atomically by a processor, is provided. The method further includes decoding at least one of a first instruction or a second instruction, where: (1) decoding the first instruction results in a processing of information about a group of instructions, including information about a size of the group of instructions, and (2) decoding the second instruction results in a processing of at least one of: (a) a reference to a memory location having the information about the group of instructions, including information about the size of the group of instructions or (b) a processor status word having information about the group of instructions, including information about the size of the group of instructions.

Type: Application

Filed: June 26, 2015

Publication date: December 29, 2016

Applicant: Microsoft Technology Licensing, LLC

Inventors: Jan Gray, Doug Burger, Aaron Smith
Mapping instruction blocks based on block size

Publication number: 20160378484

Abstract: A processor core in an instruction block-based microarchitecture utilizes instruction blocks having headers that include an index to a size table that may be expressed using one of memory, register, logic, or code stream. A control unit in the processor core determines how many instructions to fetch for a current instruction block for mapping into an instruction window based on the block size that is indicated from the size table. As instruction block sizes are often unevenly distributed for a given program, utilization of the size table enables more flexibility in matching instruction blocks to the sizes of available slots in the instruction window as compared to arrangements in which instruction blocks have a fixed size or are sized with less granularity. Such flexibility may enable denser instruction packing, which increases overall processing efficiency by reducing the number of nops (no operations, such as null functions) in a given instruction block.

Type: Application

Filed: June 26, 2015

Publication date: December 29, 2016

Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
Decoupled processor instruction window and operand buffer

Publication number: 20160378479

Abstract: A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.

Type: Application

Filed: June 26, 2015

Publication date: December 29, 2016

Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
Age-based management of instruction blocks in a processor instruction window

Publication number: 20160378502

Abstract: A processor core in an instruction block-based microarchitecture includes a control unit that explicitly tracks instruction block state including age or priority for current blocks that have been fetched from an instruction cache. Tracked instruction blocks are maintained in an age-ordered or priority-ordered list. When an instruction block is identified by the control unit for commitment, the list is checked for a match and a matching instruction block can be refreshed without re-fetching from the instruction cache. If a match is not found, an instruction block can be committed and replaced based on either age or priority. Such instruction state tracking typically consumes little overhead and enables instruction blocks to be reused and mispredicted instructions to be skipped to increase processor core efficiency.

Type: Application

Filed: June 26, 2015

Publication date: December 29, 2016

Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
Bulk allocation of instruction blocks to a processor instruction window

Publication number: 20160378493

Abstract: A processor core in an instruction block-based microarchitecture includes a control unit that allocates instructions into an instruction window in bulk by fetching blocks of instructions and associated resources including control bits and operands at once. Such bulk allocation supports increased efficiency in processor core operations by enabling consistent management and policy implementation across all the instructions in the block during execution. For example, when an instruction block branches back on itself, it may be reused in a refresh process rather than being re-fetched from the instruction cache. As all of the resources for that instruction block are in one place, the instructions can remain in place and only valid bits need to be cleared. Bulk allocation also facilitates operand sharing by instructions in a block and explicit messaging among instructions.

Type: Application

Filed: June 26, 2015

Publication date: December 29, 2016

Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
Handling operating system (OS) transitions in an unbounded transactional memory (UTM) mode

Patent number: 9477515

Abstract: In one embodiment, the present invention includes a method for receiving control in a kernel mode via a ring transition from a user thread during execution of an unbounded transactional memory (UTM) transaction, updating a state of a transaction status register (TSR) associated with the user thread and storing the TSR with a context of the user thread, and later restoring the context during a transition from the kernel mode to the user thread. In this way, the UTM transaction may continue on resumption of the user thread. Other embodiments are described and claimed.

Type: Grant

Filed: August 1, 2013

Date of Patent: October 25, 2016

Assignee: Intel Corporation

Inventors: Koichi Yamada, Landy Wang, Martin Taillefer, Arun Kishan, David Callahan, Jan Gray, Gad Sheaffer, Ali-Reza Adl-Tabatabai
Handling operating system (OS) transitions in an unbounded transactional memory (UTM) mode

Publication number: 20160216973

Abstract: In one embodiment, the present invention includes a method for receiving control in a kernel mode via a ring transition from a user thread during execution of an unbounded transactional memory (UTM) transaction, updating a state of a transaction status register (TSR) associated with the user thread and storing the TSR with a context of the user thread, and later restoring the context during a transition from the kernel mode to the user thread. In this way, the UTM transaction may continue on resumption of the user thread. Other embodiments are described and claimed.

Type: Application

Filed: August 1, 2013

Publication date: July 28, 2016

Inventors: Koichi Yamada, Gad Sheaffer, Jan Gray, Landy Wang, Martin Taillefer, Arun Kishan, Ali-Reza Adl-Tabatabai, David Callahan
Using buffered stores or monitoring to filter redundant transactional accesses and mechanisms for mapping data to buffered metadata

Patent number: 9280397

Abstract: A method and apparatus for accelerating a Software Transactional Memory (STM) system is herein described. A data object and metadata for the data object may each be associated with a filter, such as a hardware monitor or ephemerally held filter information. The filter is in a first, default state when no access, such as a read, from the data object has occurred during a pendency of a transaction. Upon encountering a first access to the metadata, such as a first read, access barrier operations, such as logging of the metadata; setting a read monitor; or updating ephemeral filter information with an ephemeral/buffered store operation, are performed. Upon a subsequent/redundant access to the metadata, such as a second read, access barrier operations are elided to accelerate the subsequent access based on the filter being set to the second state to indicate a previous access occurred.

Type: Grant

Filed: December 15, 2009

Date of Patent: March 8, 2016

Assignee: Intel Corporation

Inventors: Ali-Reza Adl-Tabatabai, Gad Sheaffer, Bratin Saha, Jan Gray, David Callahan, Burton Smith, Goetz Graefe
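
The filtering of redundant transactional reads described in the abstract above can be modeled in software. The sketch below is an assumption-laden illustration rather than the patented mechanism: a Python set stands in for the hardware monitor or ephemeral filter information, and the `Transaction` class, its `read` method, and the `heap` and `metadata` dictionaries are hypothetical names.

```python
class Transaction:
    """Toy STM read path that elides barrier work on repeated reads."""

    def __init__(self):
        self.read_log = []      # metadata recorded by full read barriers
        self.accessed = set()   # stand-in for the per-object filter state

    def read(self, obj_id, heap, metadata):
        if obj_id not in self.accessed:
            # First access: the filter is in its default state, so perform
            # the barrier work (log the metadata) and then set the filter.
            self.read_log.append((obj_id, metadata[obj_id]))
            self.accessed.add(obj_id)
        # Subsequent accesses skip the barrier because the filter is set.
        return heap[obj_id]


heap = {"x": 10, "y": 20}
metadata = {"x": 7, "y": 3}   # hypothetical version numbers

tx = Transaction()
values = [tx.read("x", heap, metadata) for _ in range(3)]
values.append(tx.read("y", heap, metadata))
```

After the four reads, `tx.read_log` holds one entry per distinct object rather than one per read, which is the saving the abstract describes for subsequent or redundant accesses.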
