Patents by Inventor Doron Orenstein
Doron Orenstein has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20150058603
Abstract: In one embodiment, the present invention includes logic to receive a permute instruction, first and second source operands, and control values, and to perform a permute operation based on an operation between at least two of the control values so that selected portions of the first and second source operands or a predetermined value can be stored into elements of a destination. Multiple permute instructions may be combined to perform efficient table lookups. Other embodiments are described and claimed.
Type: Application
Filed: November 5, 2014
Publication date: February 26, 2015
Inventors: Cristina Anderson, Mark Buxton, Doron Orenstein, Robert Valentine
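The Python sketch below gives a rough illustration of the permute-based table lookup described in this abstract. It models a two-source permute and then chains several permutes to index a table larger than two registers. This is a hypothetical model, not the claimed hardware. The names `permute2` and `table_lookup` are invented. The "operation between control values" is assumed to be the subtraction of a base offset from each per-element index, and the predetermined value is assumed to be zero.

```python
from typing import List, Sequence

def permute2(src1: Sequence[int], src2: Sequence[int], control: Sequence[int],
             base: int, fill: int = 0) -> List[int]:
    """Hypothetical two-source permute.

    Each destination element uses the effective index control[i] - base
    (an assumed stand-in for "an operation between control values").
    Indices 0..n-1 select from src1 and n..2n-1 select from src2.
    Any other index stores the predetermined value `fill`.
    """
    n = len(src1)
    if len(src2) != n:
        raise ValueError("sources must have the same length")
    window = list(src1) + list(src2)
    out = []
    for c in control:
        idx = c - base
        out.append(window[idx] if 0 <= idx < 2 * n else fill)
    return out

def table_lookup(table: Sequence[int], indices: Sequence[int], width: int) -> List[int]:
    """Look up non-negative `indices` in a table spanning many registers.

    Each pair of register-sized chunks is covered by one permute. Elements
    outside that chunk pair come back as zero, so OR-ing the partial
    results yields the full lookup (assumes non-negative table values).
    """
    if len(table) % (2 * width) != 0:
        raise ValueError("table length must be a multiple of 2 * width")
    result = [0] * len(indices)
    for base in range(0, len(table), 2 * width):
        lo = table[base:base + width]
        hi = table[base + width:base + 2 * width]
        part = permute2(lo, hi, indices, base)
        result = [r | p for r, p in zip(result, part)]
    return result
```

For example, with a 16-entry table and a register width of 4, `table_lookup(table, [0, 5, 9, 15], 4)` issues two permutes (bases 0 and 8). The results of the two permutes are then merged.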
-
Patent number: 8914613
Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
Type: Grant
Filed: August 26, 2011
Date of Patent: December 16, 2014
Assignee: Intel Corporation
Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
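The per-lane control idea can be sketched in Python as follows. This is a hypothetical model, loosely patterned on how x86 in-lane shuffles such as VPSHUFD and VSHUFPS apply one 8-bit immediate identically to every 128-bit lane. The four-element lane width, the 2-bit selector layout, and the function names are assumptions, not details taken from the abstract.

```python
from typing import List, Sequence

LANE = 4  # elements per lane, e.g. four 32-bit elements in a 128-bit lane (assumed)

def _selector(imm8: int, j: int) -> int:
    """2-bit field j of the 8-bit control, reused identically in every lane."""
    return (imm8 >> (2 * j)) & 0b11

def shuffle_in_lane(src: Sequence[int], imm8: int) -> List[int]:
    """Single-source in-lane shuffle: elements never cross a lane boundary."""
    if len(src) % LANE != 0:
        raise ValueError("operand must be a whole number of lanes")
    dst: List[int] = []
    for base in range(0, len(src), LANE):
        lane = src[base:base + LANE]
        dst.extend(lane[_selector(imm8, j)] for j in range(LANE))
    return dst

def shuffle_in_lane2(src1: Sequence[int], src2: Sequence[int], imm8: int) -> List[int]:
    """Two-source variant: in every lane, the low half of the destination comes
    from src1 and the high half from the corresponding lane of src2."""
    if len(src1) != len(src2) or len(src1) % LANE != 0:
        raise ValueError("operands must have the same whole number of lanes")
    dst: List[int] = []
    for base in range(0, len(src1), LANE):
        a = src1[base:base + LANE]
        b = src2[base:base + LANE]
        dst.extend([a[_selector(imm8, 0)], a[_selector(imm8, 1)],
                    b[_selector(imm8, 2)], b[_selector(imm8, 3)]])
    return dst
```

With `imm8 = 0x1B`, the selectors are 3, 2, 1, 0, so `shuffle_in_lane` reverses each lane independently. An eight-element input becomes `[3, 2, 1, 0, 7, 6, 5, 4]` rather than a full reversal.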
-
Publication number: 20140310505
Abstract: A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.
Type: Application
Filed: June 17, 2014
Publication date: October 16, 2014
Inventors: Robert Valentine, Doron Orenstein, Brett L. Toll
-
Patent number: 8756403
Abstract: A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.
Type: Grant
Filed: March 15, 2013
Date of Patent: June 17, 2014
Assignee: Intel Corporation
Inventors: Robert Valentine, Doron Orenstein, Brett L. Toll
-
Publication number: 20140095847
Abstract: A processor uses multiple banks of an extended register set to store the contexts of multiple user-level threads. A current bank register provides a pointer to the bank that is currently active. A first thread saves its context (first context) in a first bank of the extended register set and a second thread saves its context (second context) in a second bank of the extended register set. When the processor receives an instruction for exchanging contexts between the first thread and the second thread, the processor changes the pointer from the first bank to the second bank, and executes the second thread using the second context stored in the second bank.
Type: Application
Filed: September 28, 2012
Publication date: April 3, 2014
Inventor: Doron Orenstein
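Below is a minimal Python model of the banked register file described above. It assumes a hypothetical `BankedRegisterFile` class with 16 registers per bank. The model captures the key point of the abstract: exchanging contexts changes only the current-bank pointer, so neither thread's registers are copied to or from memory.

```python
from typing import List

class BankedRegisterFile:
    """Toy model: the architectural registers are a view onto the active bank."""

    def __init__(self, num_banks: int, regs_per_bank: int = 16) -> None:
        # One independent bank per user-level thread context (sizes assumed).
        self.banks: List[List[int]] = [[0] * regs_per_bank for _ in range(num_banks)]
        self.current_bank = 0  # models the "current bank register" (a pointer)

    def read(self, reg: int) -> int:
        return self.banks[self.current_bank][reg]

    def write(self, reg: int, value: int) -> None:
        self.banks[self.current_bank][reg] = value

    def exchange_context(self, target_bank: int) -> None:
        """Switch threads by retargeting the pointer; no register values move."""
        if not 0 <= target_bank < len(self.banks):
            raise ValueError("no such bank")
        self.current_bank = target_bank
```

In the usage below, thread A writes register 0 in bank 0. After `exchange_context(1)`, thread B sees its own register 0. Switching back restores A's value without any save or restore step.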
-
Publication number: 20130290682
Abstract: A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.
Type: Application
Filed: March 15, 2013
Publication date: October 31, 2013
Inventors: Robert Valentine, Doron Orenstein, Brett L. Toll
-
Publication number: 20130232321
Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.
Type: Application
Filed: March 15, 2013
Publication date: September 5, 2013
Inventors: Asaf Hargil, Doron Orenstein
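The lane-dependent interleave can be modeled in a few lines of Python. The sketch assumes two lanes of four elements each and uses a hypothetical function name, `interleave_lo_hi`. Index 0 is treated as the lowest-order element.

```python
from typing import List, Sequence

def interleave_lo_hi(a: Sequence[int], b: Sequence[int], lane: int = 4) -> List[int]:
    """Lane 0: interleave the lowest-order halves of lane 0 of a and b.
    Lane 1: interleave the highest-order halves of lane 1 of a and b."""
    if len(a) != 2 * lane or len(b) != 2 * lane:
        raise ValueError("expected exactly two lanes per operand")
    half = lane // 2
    low_a, low_b = a[:half], b[:half]                                 # bottom of lane 0
    high_a, high_b = a[2 * lane - half:], b[2 * lane - half:]         # top of lane 1
    result: List[int] = []
    for x, y in zip(low_a, low_b):
        result += [x, y]
    for x, y in zip(high_a, high_b):
        result += [x, y]
    return result
```

With `a = [0..7]` and `b = [10..17]`, lane 0 of the result is `0, 10, 1, 11` and lane 1 is `6, 16, 7, 17`. The middle elements (2, 3, 4, 5 and their counterparts in `b`) are not used.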
-
Publication number: 20130212360
Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
Type: Application
Filed: March 15, 2013
Publication date: August 15, 2013
Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
-
Patent number: 8281109
Abstract: A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.
Type: Grant
Filed: December 27, 2007
Date of Patent: October 2, 2012
Assignee: Intel Corporation
Inventors: Robert Valentine, Doron Orenstein, Bret Toll
-
Publication number: 20110307687
Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
Type: Application
Filed: August 26, 2011
Publication date: December 15, 2011
Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
-
Patent number: 8078831
Abstract: Apparatus, system and methods are provided for performing speculative data prefetching in a chip multiprocessor (CMP). Data is prefetched by a helper thread that runs on one core of the CMP while a main program runs concurrently on another core of the CMP. Data prefetched by the helper thread is provided to the helper core. For one embodiment, the data prefetched by the helper thread is pushed to the main core. It may or may not be provided to the helper core as well. A push of prefetched data to the main core may occur during a broadcast of the data to all cores of an affinity group. For at least one other embodiment, the data prefetched by a helper thread is provided, upon request from the main core, to the main core from the helper core's local cache.
Type: Grant
Filed: October 21, 2010
Date of Patent: December 13, 2011
Assignee: Intel Corporation
Inventors: Hong Wang, Perry H. Wang, Jeffery A. Brown, Per Hammarlund, George Z. Chrysos, Doron Orenstein, Steve Shih-wei Liao, John P. Shen
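A sequential Python simulation can illustrate the two delivery options in this abstract. In one, the helper core pushes prefetched data to the main core. In the other, the main core requests the data from the helper core's local cache. This is a simplified, hypothetical model. Caches are unbounded sets, and the helper thread is assumed to run exactly `lookahead` accesses ahead. The two cores are stepped in lockstep rather than run concurrently.

```python
from typing import Sequence, Set, Tuple

def simulate(addresses: Sequence[int], lookahead: int, push: bool) -> Tuple[int, int]:
    """Return (fetched_from_memory, served_from_helper_cache) for the main core."""
    main_cache: Set[int] = set()
    helper_cache: Set[int] = set()
    from_memory = from_helper = 0
    for i, addr in enumerate(addresses):
        # Helper thread: prefetch the address the main thread will need later.
        ahead = i + lookahead
        if ahead < len(addresses):
            helper_cache.add(addresses[ahead])
            if push:
                main_cache.add(addresses[ahead])  # pushed into the main core's cache
        # Main thread: perform its own access.
        if addr in main_cache:
            continue  # hit: the push (or an earlier fill) already brought it in
        if addr in helper_cache:
            from_helper += 1  # requested from the helper core's local cache
        else:
            from_memory += 1  # nothing prefetched it; go to memory
        main_cache.add(addr)
    return from_memory, from_helper
```

For four distinct addresses and `lookahead=1`, pushing leaves only the first access missing to memory. Without the push, the remaining three accesses are served from the helper core's cache instead.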
-
Patent number: 8078836
Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
Type: Grant
Filed: December 30, 2007
Date of Patent: December 13, 2011
Assignee: Intel Corporation
Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
-
Publication number: 20110035555
Abstract: Apparatus, system and methods are provided for performing speculative data prefetching in a chip multiprocessor (CMP). Data is prefetched by a helper thread that runs on one core of the CMP while a main program runs concurrently on another core of the CMP. Data prefetched by the helper thread is provided to the helper core. For one embodiment, the data prefetched by the helper thread is pushed to the main core. It may or may not be provided to the helper core as well. A push of prefetched data to the main core may occur during a broadcast of the data to all cores of an affinity group. For at least one other embodiment, the data prefetched by a helper thread is provided, upon request from the main core, to the main core from the helper core's local cache.
Type: Application
Filed: October 21, 2010
Publication date: February 10, 2011
Inventors: Hong Wang, Perry H. Wang, Jeffery A. Brown, Per Hammarlund, George Z. Chrysos, Doron Orenstein, Steve Shih-wei Liao, John P. Shen
-
Publication number: 20100332794
Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.
Type: Application
Filed: June 30, 2009
Publication date: December 30, 2010
Inventors: Asaf Hargil, Doron Orenstein
-
Patent number: 7849465
Abstract: Method, apparatus, and system for a programmable event driven yield mechanism that may activate other threads. The yield mechanism may allow triggering of a service thread that may execute concurrently with a main thread upon occurrence of an architecturally-defined condition. The service thread may be activated, in response to the condition, with limited intervention of an operating system. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect an architecturally-defined condition. The apparatus may include an event handler to handle a yield event generated when the architecturally-defined condition has been detected. An architectural mechanism, including processor instructions and channel registers, may be utilized to allow user-level code to enable the yield event mechanism. Other embodiments are also described and claimed.
Type: Grant
Filed: May 19, 2005
Date of Patent: December 7, 2010
Assignee: Intel Corporation
Inventors: Xiang Zou, Hong Wang, Scott Dion Rodgers, Darrell D. Boggs, Bryant Bigbee, Shivanandan Kaushik, Anil Aggarwal, Ittai Anati, Doron Orenstein, Per Hammarlund, John Shen, Larry O. Smith, James B. Crossland, Chris J. Newburn
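The following toy Python model shows the arm-then-trigger flow. User-level code registers a handler for an architecturally-defined condition, and the monitor transfers control to that handler when the condition appears in the execution trace. Several simplifying assumptions apply. The class name `YieldMonitor` and the string condition labels are hypothetical. A synchronous call stands in for a service thread that would actually run concurrently.

```python
from typing import Callable, Dict, Sequence

class YieldMonitor:
    """Maps user-armed conditions to handlers (a stand-in for channel registers)."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[int], None]] = {}

    def arm(self, condition: str, handler: Callable[[int], None]) -> None:
        """User-level code enables the yield event; no OS call is modeled."""
        self.handlers[condition] = handler

    def run(self, trace: Sequence[str]) -> None:
        """Step through a trace of per-instruction conditions ("" means none)."""
        for pc, condition in enumerate(trace):
            handler = self.handlers.get(condition)
            if handler is not None:
                handler(pc)  # yield: invoke the service handler for this event
```

Arming `"cache_miss"` with a logging handler and running a trace with misses at positions 1 and 3 invokes the handler exactly twice. Conditions that were never armed are ignored.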
-
Patent number: 7844801
Abstract: Apparatus, system and methods are provided for performing speculative data prefetching in a chip multiprocessor (CMP). Data is prefetched by a helper thread that runs on one core of the CMP while a main program runs concurrently on another core of the CMP. Data prefetched by the helper thread is provided to the helper core. For one embodiment, the data prefetched by the helper thread is pushed to the main core. It may or may not be provided to the helper core as well. A push of prefetched data to the main core may occur during a broadcast of the data to all cores of an affinity group. For at least one other embodiment, the data prefetched by a helper thread is provided, upon request from the main core, to the main core from the helper core's local cache.
Type: Grant
Filed: July 31, 2003
Date of Patent: November 30, 2010
Assignee: Intel Corporation
Inventors: Hong Wang, Perry H. Wang, Jeffery A. Brown, Per Hammarlund, George Z. Chrysos, Doron Orenstein, Steve Shih-wei Liao, John P. Shen
-
Patent number: 7721129
Abstract: A clock frequency control unit for an integrated circuit (IC) includes a clock generator, a finite state machine (FSM), and a gating circuit (GC). The FSM has at least first and second states corresponding to non-low workload and low workload states, respectively. In the first state, the GC provides a clock signal to functional units of the IC with the same frequency as the clock generator output. In the second state, the GC reduces the frequency of the clock signal. In one embodiment, the GC masks out selected cycles of the clock generator output to reduce the clock signal frequency. The FSM monitors the operation of the IC to transition from the first state to the second state when selected “low workload” conditions are detected (e.g., long latency cache miss). Similarly, the FSM transitions from the second state to the first state when selected “non-low workload” conditions are detected.
Type: Grant
Filed: January 12, 2006
Date of Patent: May 18, 2010
Assignee: Intel Corporation
Inventors: Itamar S. Kazachinsky, Doron Orenstein
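The two-state controller can be simulated in Python. The sketch uses hypothetical event labels, `"miss_start"` and `"miss_done"`, for the low-workload and non-low-workload conditions. It also assumes a gating circuit that passes one of every `keep_every` generator cycles while in the low-workload state. Each output bit says whether a generator cycle reaches the functional units.

```python
from enum import Enum
from typing import Iterable, List

class State(Enum):
    NORMAL = 0  # "non-low workload": clock passes through at full frequency
    LOW = 1     # "low workload": gating circuit masks out selected cycles

def gated_clock(events: Iterable[str], keep_every: int = 2) -> List[int]:
    """Return 1 for each generator cycle passed to the functional units, else 0."""
    state = State.NORMAL
    low_cycles = 0  # cycles spent in LOW since the last transition into it
    out: List[int] = []
    for event in events:
        # FSM transitions on the (hypothetical) observed conditions.
        if state is State.NORMAL and event == "miss_start":
            state, low_cycles = State.LOW, 0
        elif state is State.LOW and event == "miss_done":
            state = State.NORMAL
        # Gating: pass every cycle in NORMAL, one of every keep_every in LOW.
        if state is State.NORMAL:
            out.append(1)
        else:
            out.append(1 if low_cycles % keep_every == 0 else 0)
            low_cycles += 1
    return out
```

With `keep_every=2`, the effective clock frequency halves between `"miss_start"` and `"miss_done"`. Masking cycles, rather than retuning the generator, lets the controller switch frequency on the next cycle.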
-
Publication number: 20090172358
Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
Type: Application
Filed: December 30, 2007
Publication date: July 2, 2009
Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
-
Patent number: 7437581
Abstract: A method and apparatus for changing the configuration of a multi-core processor is disclosed. In one embodiment, a throttle module (or throttle logic) may determine the amount of parallelism present in the currently-executing program, and change the execution of the threads of that program on the various cores. If the amount of parallelism is high, then the processor may be configured to run a larger number of threads on cores configured to consume less power. If the amount of parallelism is low, then the processor may be configured to run a smaller number of threads on cores configured for greater scalar performance.
Type: Grant
Filed: September 28, 2004
Date of Patent: October 14, 2008
Assignee: Intel Corporation
Inventors: Edward Grochowski, John Shen, Hong Wang, Doron Orenstein, Gad S Sheaffer, Ronny Ronen, Murali M. Annavaram
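A hypothetical throttle policy can be written in a few lines of Python. The abstract does not state how the throttle logic measures parallelism. The sketch therefore assumes the count of runnable threads as the estimate, compared against a fixed threshold. The function name, the threshold, and the core counts are illustrative only.

```python
from typing import Tuple

def choose_configuration(runnable_threads: int, scalar_cores: int,
                         low_power_cores: int, threshold: int) -> Tuple[str, int]:
    """Return (core type, number of threads to run) for the current program.

    High parallelism: spread many threads over cores configured for low power.
    Low parallelism: run fewer threads on cores configured for scalar speed.
    """
    if runnable_threads >= threshold:
        return ("low-power", min(runnable_threads, low_power_cores))
    return ("high-scalar", min(runnable_threads, scalar_cores))
```

For example, with 2 scalar cores, 8 low-power cores and a threshold of 4, a program with 16 runnable threads is spread over all 8 low-power cores. A single-threaded program instead runs on one scalar core.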
-
Publication number: 20060294347
Abstract: Method, apparatus, and system for a programmable event driven yield mechanism that may activate other threads. The yield mechanism may allow triggering of a service thread that may execute concurrently with a main thread upon occurrence of an architecturally-defined condition. The service thread may be activated, in response to the condition, with limited intervention of an operating system. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect an architecturally-defined condition. The apparatus may include an event handler to handle a yield event generated when the architecturally-defined condition has been detected. An architectural mechanism, including processor instructions and channel registers, may be utilized to allow user-level code to enable the yield event mechanism. Other embodiments are also described and claimed.
Type: Application
Filed: May 19, 2005
Publication date: December 28, 2006
Inventors: Xiang Zou, Hong Wang, Scott Rodgers, Darrell Boggs, Bryant Bigbee, Shivanandan Kaushik, Anil Aggarwal, Ittai Anati, Doron Orenstein, Per Hammarlund, John Shen, Larry Smith, James Crossland, Chris Newburn