Patents by Inventor Doron Orenstein

Doron Orenstein has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Permute Operations With Flexible Zero Control

Publication number: 20150058603

Abstract: In one embodiment, the present invention includes logic to receive a permute instruction, first and second source operands, and control values, and to perform a permute operation based on an operation between at least two of the control values so that selected portions of the first and second source operands or a predetermined value can be stored into elements of a destination. Multiple permute instructions may be combined to perform efficient table lookups. Other embodiments are described and claimed.

Type: Application

Filed: November 5, 2014

Publication date: February 26, 2015

Inventors: Cristina Anderson, Mark Buxton, Doron Orenstein, Robert Valentine
Vector shuffle instructions operating on multiple lanes each having a plurality of data elements using a same set of per-lane control bits

Patent number: 8914613

Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.

Type: Grant

Filed: August 26, 2011

Date of Patent: December 16, 2014

Assignee: Intel Corporation

Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
COMPRESSED INSTRUCTION FORMAT

Publication number: 20140310505

Abstract: A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.

Type: Application

Filed: June 17, 2014

Publication date: October 16, 2014

Inventors: Robert Valentine, Doron Orenstein, Brett L. Toll
Compressed instruction format

Patent number: 8756403

Abstract: A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.

Type: Grant

Filed: March 15, 2013

Date of Patent: June 17, 2014

Assignee: Intel Corporation

Inventors: Robert Valentine, Doron Orenstein, Brett L. Toll
INSTRUCTION AND HIGHLY EFFICIENT MICRO-ARCHITECTURE TO ENABLE INSTANT CONTEXT SWITCH FOR USER-LEVEL THREADING

Publication number: 20140095847

Abstract: A processor uses multiple banks of an extended register set to store the contexts of multiple user-level threads. A current bank register provides a pointer to the bank that is currently active. A first thread saves its context (first context) in a first bank of the extended register set and a second thread saves its context (second context) in a second bank of the extended register set. When the processor receives an instruction for exchanging contexts between the first thread and the second thread, the processor changes the pointer from the first bank to the second bank, and executes the second thread using the second context stored in the second bank.

Type: Application

Filed: September 28, 2012

Publication date: April 3, 2014

Inventor: Doron Orenstein
COMPRESSED INSTRUCTION FORMAT

Publication number: 20130290682

Abstract: A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.

Type: Application

Filed: March 15, 2013

Publication date: October 31, 2013

Inventors: Robert Valentine, Doron Orenstein, Brett L. Toll
Unpacking Packed Data In Multiple Lanes

Publication number: 20130232321

Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.

Type: Application

Filed: March 15, 2013

Publication date: September 5, 2013

Inventors: Asaf Hargil, Doron Orenstein
In-Lane Vector Shuffle Instructions

Publication number: 20130212360

Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.

Type: Application

Filed: March 15, 2013

Publication date: August 15, 2013

Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
Compressed instruction format

Patent number: 8281109

Abstract: A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.

Type: Grant

Filed: December 27, 2007

Date of Patent: October 2, 2012

Assignee: Intel Corporation

Inventors: Robert Valentine, Doron Orenstein, Bret Toll
IN-LANE VECTOR SHUFFLE INSTRUCTIONS

Publication number: 20110307687

Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.

Type: Application

Filed: August 26, 2011

Publication date: December 15, 2011

Inventors: ZEEV SPERBER, Robert Valentine, Benny Eitan, Doron Orenstein
Method and apparatus for affinity-guided speculative helper threads in chip multiprocessors

Patent number: 8078831

Abstract: Apparatus, system and methods are provided for performing speculative data prefetching in a chip multiprocessor (CMP). Data is prefetched by a helper thread that runs on one core of the CMP while a main program runs concurrently on another core of the CMP. Data prefetched by the helper thread is provided to the helper core. For one embodiment, the data prefetched by the helper thread is pushed to the main core. It may or may not be provided to the helper core as well. A push of prefetched data to the main core may occur during a broadcast of the data to all cores of an affinity group. For at least one other embodiment, the data prefetched by a helper thread is provided, upon request from the main core, to the main core from the helper core's local cache.

Type: Grant

Filed: October 21, 2010

Date of Patent: December 13, 2011

Assignee: Intel Corporation

Inventors: Hong Wang, Perry H. Wang, Jeffery A. Brown, Per Hammarlund, George Z. Chrysos, Doron Orenstein, Steve Shih-wei Liao, John P. Shen
Vector shuffle instructions operating on multiple lanes each having a plurality of data elements using a common set of per-lane control bits

Patent number: 8078836

Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.

Type: Grant

Filed: December 30, 2007

Date of Patent: December 13, 2011

Assignee: Intel Corporation

Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
METHOD AND APPARATUS FOR AFFINITY-GUIDED SPECULATIVE HELPER THREADS IN CHIP MULTIPROCESSORS

Publication number: 20110035555

Abstract: Apparatus, system and methods are provided for performing speculative data prefetching in a chip multiprocessor (CMP). Data is prefetched by a helper thread that runs on one core of the CMP while a main program runs concurrently on another core of the CMP. Data prefetched by the helper thread is provided to the helper core. For one embodiment, the data prefetched by the helper thread is pushed to the main core. It may or may not be provided to the helper core as well. A push of prefetched data to the main core may occur during a broadcast of the data to all cores of an affinity group. For at least one other embodiment, the data prefetched by a helper thread is provided, upon request from the main core, to the main core from the helper core's local cache.

Type: Application

Filed: October 21, 2010

Publication date: February 10, 2011

Inventors: Hong Wang, Perry H. Wang, Jeffery A. Brown, Per Hammarlund, George Z. Chrysos, Doron Orenstein, Steve Shih-wei Liao, John P. Shen
UNPACKING PACKED DATA IN MULTIPLE LANES

Publication number: 20100332794

Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.

Type: Application

Filed: June 30, 2009

Publication date: December 30, 2010

Inventors: Asaf Hargil, Doron Orenstein
Programmable event driven yield mechanism which may activate service threads

Patent number: 7849465

Abstract: Method, apparatus, and system for a programmable event driven yield mechanism that may activate other threads. The yield mechanism may allow triggering of a service thread that may execute currently with a main thread upon occurrence of an architecturally-defined condition. The service thread may be activated, in response to the condition, with limited intervention of an operating system. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect an architecturally-defined condition. The apparatus may include an event handler to handle a yield event generated when the architecturally-defined condition has been detected. An architectural mechanism, including processor instructions and channel registers, may be utilized to allow user-level code to enable the yield event mechanism. Other embodiments are also described and claimed.

Type: Grant

Filed: May 19, 2005

Date of Patent: December 7, 2010

Assignee: Intel Corporation

Inventors: Xiang Zou, Hong Wang, Scott Dion Rodgers, Darrell D. Boggs, Bryant Bigbee, Shivanandan Kaushik, Anil Aggarwal, Ittai Anati, Doron Orenstein, Per Hammarlund, John Shen, Larry O. Smith, James B. Crossland, Chris J. Newburn
Method and apparatus for affinity-guided speculative helper threads in chip multiprocessors

Patent number: 7844801

Abstract: Apparatus, system and methods are provided for performing speculative data prefetching in a chip multiprocessor (CMP). Data is prefetched by a helper thread that runs on one core of the CMP while a main program runs concurrently on another core of the CMP. Data prefetched by the helper thread is provided to the helper core. For one embodiment, the data prefetched by the helper thread is pushed to the main core. It may or may not be provided to the helper core as well. A push of prefetched data to the main core may occur during a broadcast of the data to all cores of an affinity group. For at least one other embodiment, the data prefetched by a helper thread is provided, upon request from the main core, to the main core from the helper core's local cache.

Type: Grant

Filed: July 31, 2003

Date of Patent: November 30, 2010

Assignee: Intel Corporation

Inventors: Hong Wang, Perry H. Wang, Jeffery A. Brown, Per Hammarlund, George Z. Chrysos, Doron Orenstein, Steve Shih-wei Liao, John P. Shen
Method and apparatus for reducing clock frequency during low workload periods

Patent number: 7721129

Abstract: A clock frequency control unit for an integrated circuit (IC) includes a clock generator, a finite state machine (FSM), and a gating circuit (GC). The FSM has at least first and second states corresponding to non-low workload low workload states, respectively. In the first state, the GC provides a clock signal to functional units of the IC with the same frequency as the clock generator output. In the second state, the GC reduces the frequency of the clock signal. In one embodiment, the GC masks out selected cycles of the clock generator output to reduce the clock signal frequency. The FSM monitors the operation of the IC to transition from the first state to the second state when selected “low workload” conditions are detected (e.g., long latency cache miss). Similarly, the FSM transitions from the second state to the first state when selected “non-low workload” conditions are detected.

Type: Grant

Filed: January 12, 2006

Date of Patent: May 18, 2010

Assignee: Intel Corporation

Inventors: Itamar S. Kazachinsky, Doron Orenstein
IN-LANE VECTOR SHUFFLE INSTRUCTIONS

Publication number: 20090172358

Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.

Type: Application

Filed: December 30, 2007

Publication date: July 2, 2009

Inventors: ZEEV SPERBER, Robert Valentine, Benny Eitan, Doron Orenstein
Method and apparatus for varying energy per instruction according to the amount of available parallelism

Patent number: 7437581

Abstract: A method and apparatus for changing the configuration of a multi-core processor is disclosed. In one embodiment, a throttle module (or throttle logic) may determine the amount of parallelism present in the currently-executing program, and change the execution of the threads of that program on the various cores. If the amount of parallelism is high, then the processor may be configured to run a larger amount of threads on cores configured to consume less power. If the amount of parallelism is low, then the processor may be configured to run a smaller amount of threads on cores configured for greater scalar performance.

Type: Grant

Filed: September 28, 2004

Date of Patent: October 14, 2008

Assignee: Intel Corporation

Inventors: Edward Grochowski, John Shen, Hong Wang, Doron Orenstein, Gad S Sheaffer, Ronny Ronen, Murali M. Annavaram
Programmable event driven yield mechanism which may activate service threads

Publication number: 20060294347

Abstract: Method, apparatus, and system for a programmable event driven yield mechanism that may activate other threads. The yield mechanism may allow triggering of a service thread that may execute currently with a main thread upon occurrence of an architecturally-defined condition. The service thread may be activated, in response to the condition, with limited intervention of an operating system. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect an architecturally-defined condition. The apparatus may include an event handler to handle a yield event generated when the architecturally-defined condition has been detected. An architectural mechanism, including processor instructions and channel registers, may be utilized to allow user-level code to enable the yield event mechanism. Other embodiments are also described and claimed.

Type: Application

Filed: May 19, 2005

Publication date: December 28, 2006

Inventors: Xiang Zou, Hong Wang, Scott Rodgers, Darrell Boggs, Bryant Bigbee, Shivanandan Kaushik, Anil Aggarwal, Ittai Anati, Doron Orenstein, Per Hammarlund, John Shen, Larry Smith, James Crossland, Chris Newburn

prev 1 2 3 next