Patents by Inventor Jan Gray
Jan Gray has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11048517Abstract: A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.Type: GrantFiled: June 24, 2019Date of Patent: June 29, 2021Assignee: Microsoft Technology Licensing, LLCInventors: Douglas C. Burger, Aaron Smith, Jan Gray
-
Publication number: 20190310852Abstract: A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.Type: ApplicationFiled: June 24, 2019Publication date: October 10, 2019Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
-
Patent number: 10409599Abstract: A method including fetching a group of instructions, where the group of instructions is configured to execute atomically by a processor is provided. The method further includes decoding at least one of a first instruction or a second instruction, where: (1) decoding the first instruction results in a processing of information about a group of instructions, including information about a size of the group of instructions, and (2) decoding the second instruction results in a processing of at least one of: (a) a reference to a memory location having the information about the group of instructions, including information about the size of the group of instructions or (b) a processor status word having information about the group of instructions, including information about the size of the group of instructions.Type: GrantFiled: June 26, 2015Date of Patent: September 10, 2019Assignee: Microsoft Technology Licensing, LLCInventors: Jan Gray, Doug Burger, Aaron Smith
-
Patent number: 10346168Abstract: A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.Type: GrantFiled: June 26, 2015Date of Patent: July 9, 2019Assignee: Microsoft Technology Licensing, LLCInventors: Douglas C. Burger, Aaron Smith, Jan Gray
-
Patent number: 10332008Abstract: A decision tree multi-processor system includes a plurality of decision tree processors that access a common feature vector and execute one or more decision trees with respect to the common feature vector. A related method includes providing a common feature vector to a plurality of decision tree processors implemented within an on-chip decision tree scoring system, and executing, by the plurality of decision tree processors, a plurality off decision trees, by reference to the common feature vector. A related decision tree-walking system includes feature storage that stores a common feature vector and a plurality of decision tree processors that access the common feature vector from the feature storage and execute a plurality of decision trees by comparing threshold values of the decision trees to feature values within the common feature vector.Type: GrantFiled: March 17, 2014Date of Patent: June 25, 2019Assignee: Microsoft Technology Licensing, LLCInventors: Douglas C. Burger, James R. Larus, Andrew Putnam, Jan Gray
-
Patent number: 10175988Abstract: A method including fetching a group of instructions, where the group of instructions is configured to execute atomically by a processor, is provided. The method further includes scheduling at least one of the group of instructions for execution by the processor before decoding the at least one of the group of instructions based at least on pre-computed ready state information associated with the at least one of the group of instructions.Type: GrantFiled: June 26, 2015Date of Patent: January 8, 2019Assignee: Microsoft Technology Licensing, LLCInventors: Jan Gray, Doug Burger, Aaron Smith
-
Patent number: 9952867Abstract: A processor core in an instruction block-based microarchitecture utilizes instruction blocks having headers that include an index to a size table that may be expressed using one of memory, register, logic, or code stream. A control unit in the processor core determines how many instructions to fetch for a current instruction block for mapping into an instruction window based on the block size that is indicated from the size table. As instruction block sizes are often unevenly distributed for a given program, utilization of the size table enables more flexibility in matching instruction blocks to the sizes of available slots in the instruction window as compared to arrangements in which instruction blocks have a fixed sized or are sized with less granularity. Such flexibility may enable denser instruction packing which increases overall processing efficiency by reducing the number of nops (no operations, such as null functions) in a given instruction block.Type: GrantFiled: June 26, 2015Date of Patent: April 24, 2018Assignee: MICROSOFT TECHNOLOGY LICENSING, LLCInventors: Douglas C. Burger, Aaron Smith, Jan Gray
-
Patent number: 9946548Abstract: A processor core in an instruction block-based microarchitecture includes a control unit that explicitly tracks instruction block state including age or priority for current blocks that have been fetched from an instruction cache. Tracked instruction blocks are maintained in an age-ordered or priority-ordered list. When an instruction block is identified by the control unit for commitment, the list is checked for a match and a matching instruction block can be refreshed without re-fetching from the instruction cache. If a match is not found, an instruction block can be committed and replaced based on either age or priority. Such instruction state tracking typically consumes little overhead and enables instruction blocks to be reused and mispredicted instructions to be skipped to increase processor core efficiency.Type: GrantFiled: June 26, 2015Date of Patent: April 17, 2018Assignee: MICROSOFT TECHNOLOGY LICENSING, LLCInventors: Douglas C. Burger, Aaron Smith, Jan Gray
-
Patent number: 9767027Abstract: A system for optimizing cache coherence message traffic volume is disclosed. The system includes a plurality of caches in a multi-level memory hierarchy and a plurality of agents. Each agent is associated with a cache. The system includes one or more monitoring engines. Each agent in the plurality of agents is associated with a monitoring engine. The agents can execute a processor level software instruction causing a memory region to be private to the agent. Each of the agents is configured to execute a memory access for data on an associated cache and to send a request for data up the hierarchy on a cache miss. The monitoring engine is configured to intercept request for data from an agent and to prevent snooping for the cache line in peer caches when the cache line associated with a memory region represented as private to the agent.Type: GrantFiled: July 10, 2014Date of Patent: September 19, 2017Assignee: Microsoft Technology Licensing, LLCInventors: Jan Gray, David Callahn, Burton Jordan Smith, Gad Sheaffer, Ali-Reza Adl-Tabatabai
-
Patent number: 9720693Abstract: A processor core in an instruction block-based microarchitecture includes a control unit that allocates instructions into an instruction window in bulk by fetching blocks of instructions and associated resources including control bits and operands at once. Such bulk allocation supports increased efficiency in processor core operations by enabling consistent management and policy implementation across all the instructions in the block during execution. For example, when an instruction block branches back on itself, it may be reused in a refresh process rather than being re-fetched from the instruction cache. As all of the resources for that instruction block are in one place, the instructions can remain in place and only valid bits need to be cleared. Bulk allocation also facilitates operand sharing by instructions in a block and explicit messaging among instructions.Type: GrantFiled: June 26, 2015Date of Patent: August 1, 2017Assignee: MICROSOFT TECHNOLOGY LICENSING, LLCInventors: Douglas C. Burger, Aaron Smith, Jan Gray
-
Patent number: 9658880Abstract: Handling garbage collection and exceptions in hardware assisted transactions. Embodiments are practiced in a computing environment including a hardware assisted transaction system. A method includes beginning a hardware assisted transaction, raising an exception while in the hardware assisted transaction, including creating an exception object, determining that the transaction should be rolled back, and as a result of determining that the transaction should be rolled back, marshaling the exception object out of the hardware assisted transaction.Type: GrantFiled: March 18, 2013Date of Patent: May 23, 2017Assignee: Microsoft Technology Licensing, LLCInventors: Jan Gray, Martin Taillefer, Yosseff Levanoni, Ali-Reza Adl-Tabatabai, Dave Detlefs, Vinod K. Grover, Michael Magruder, Gad Sheaffer
-
Publication number: 20160378496Abstract: A method including fetching a group of instructions, where the group of instructions is configured to execute atomically by a processor, is provided. The method further includes scheduling at least one of the group of instructions for execution by the processor before decoding the at least one of the group of instructions based at least on pre-computed ready state information associated with the at least one of the group of instructions.Type: ApplicationFiled: June 26, 2015Publication date: December 29, 2016Applicant: Microsoft Technology Licensing, LLCInventors: Jan Gray, Doug Burger, Aaron Smith
-
Publication number: 20160378492Abstract: A method including fetching a group of instructions, where the group of instructions is configured to execute atomically by a processor is provided. The method further includes decoding at least one of a first instruction or a second instruction, where: (1) decoding the first instruction results in a processing of information about a group of instructions, including information about a size of the group of instructions, and (2) decoding the second instruction results in a processing of at least one of: (a) a reference to a memory location having the information about the group of instructions, including information about the size of the group of instructions or (b) a processor status word having information about the group of instructions, including information about the size of the group of instructions.Type: ApplicationFiled: June 26, 2015Publication date: December 29, 2016Applicant: Microsoft Technology Licensing, LLCInventors: Jan Gray, Doug Burger, Aaron Smith
-
Publication number: 20160378484Abstract: A processor core in an instruction block-based microarchitecture utilizes instruction blocks having headers that include an index to a size table that may be expressed using one of memory, register, logic, or code stream. A control unit in the processor core determines how many instructions to fetch for a current instruction block for mapping into an instruction window based on the block size that is indicated from the size table. As instruction block sizes are often unevenly distributed for a given program, utilization of the size table enables more flexibility in matching instruction blocks to the sizes of available slots in the instruction window as compared to arrangements in which instruction blocks have a fixed sized or are sized with less granularity. Such flexibility may enable denser instruction packing which increases overall processing efficiency by reducing the number of nops (no operations, such as null functions) in a given instruction block.Type: ApplicationFiled: June 26, 2015Publication date: December 29, 2016Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
-
Publication number: 20160378479Abstract: A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.Type: ApplicationFiled: June 26, 2015Publication date: December 29, 2016Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
-
Publication number: 20160378502Abstract: A processor core in an instruction block-based microarchitecture includes a control unit that explicitly tracks instruction block state including age or priority for current blocks that have been fetched from an instruction cache. Tracked instruction blocks are maintained in an age-ordered or priority-ordered list. When an instruction block is identified by the control unit for commitment, the list is checked for a match and a matching instruction block can be refreshed without re-fetching from the instruction cache. If a match is not found, an instruction block can be committed and replaced based on either age or priority. Such instruction state tracking typically consumes little overhead and enables instruction blocks to be reused and mispredicted instructions to be skipped to increase processor core efficiency.Type: ApplicationFiled: June 26, 2015Publication date: December 29, 2016Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
-
Publication number: 20160378493Abstract: A processor core in an instruction block-based microarchitecture includes a control unit that allocates instructions into an instruction window in bulk by fetching blocks of instructions and associated resources including control bits and operands at once. Such bulk allocation supports increased efficiency in processor core operations by enabling consistent management and policy implementation across all the instructions in the block during execution. For example, when an instruction block branches back on itself, it may be reused in a refresh process rather than being re-fetched from the instruction cache. As all of the resources for that instruction block are in one place, the instructions can remain in place and only valid bits need to be cleared. Bulk allocation also facilitates operand sharing by instructions in a block and explicit messaging among instructions.Type: ApplicationFiled: June 26, 2015Publication date: December 29, 2016Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
-
Patent number: 9477515Abstract: In one embodiment, the present invention includes a method for receiving control in a kernel mode via a ring transition from a user thread during execution of an unbounded transactional memory (UTM) transaction, updating a state of a transaction status register (TSR) associated with the user thread and storing the TSR with a context of the user thread, and later restoring the context during a transition from the kernel mode to the user thread. In this way, the UTM transaction may continue on resumption of the user thread. Other embodiments are described and claimed.Type: GrantFiled: August 1, 2013Date of Patent: October 25, 2016Assignee: Intel CorporationInventors: Koichi Yamada, Landy Wang, Martin Taillefer, Arun Kishan, David Callahan, Jan Gray, Gad Sheaffer, Ali-Reza Adl-Tabatabai
-
Publication number: 20160216973Abstract: In one embodiment, the present invention includes a method for receiving control in a kernel mode via a ring transition from a user thread during execution of an unbounded transactional memory (UTM) transaction, updating a state of a transaction status register (TSR) associated with the user thread and storing the TSR with a context of the user thread, and later restoring the context during a transition from the kernel mode to the user thread. In this way, the UTM transaction may continue on resumption of the user thread. Other embodiments are described and claimed.Type: ApplicationFiled: August 1, 2013Publication date: July 28, 2016Inventors: Koichi Yamada, GAD SHEAFFER, JAN GRAY, LANDY WANG, MARTIN TAILLEFER, ARUN KISHAN, ALI-REZA ADL-TABATABAI, DAVID CALLAHAN
-
Patent number: 9280397Abstract: A method and apparatus for accelerating a Software Transactional Memory (STM) system is herein described. A data object and metadata for the data object may each be associated with a filter, such as a hardware monitor or ephemerally held filter information. The filter is in a first, default state when no access, such as a read, from the data object has occurred during a pendancy of a transaction. Upon encountering a first access to the metadata, such as a first read, access barrier operations, such as logging of the metadata; setting a read monitor; or updating ephemeral filter information with an ephemeral/buffered store operation, are performed. Upon a subsequent/redundant access to the metadata, such as a second read, access barrier operations are elided to accelerate the subsequent access based on the filter being set to the second state to indicate a previous access occurred.Type: GrantFiled: December 15, 2009Date of Patent: March 8, 2016Assignee: Intel CorporationInventors: Ali-Reza Adl-Tabatabai, Gad Sheaffer, Bratin Saha, Jan Gray, David Callahan, Burton Smith, Graefe Goetz