Patents by Inventor Steven Hsu
Steven Hsu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240007087Abstract: Techniques and mechanisms for an integrated clock gate (ICG) to selectively output a clock signal, and to provide frequency division functionality. In an embodiment, an ICG circuit comprises first circuitry which is coupled to receive a first clock signal, and second circuitry which is coupled to receive a control signal. The first circuitry provides a single edge triggered flip-flop functionality, and is coupled to communicate a feedback signal which the first circuitry is further coupled to receive. Based on the control signal and the feedback signal, the second circuitry performs an exclusive OR (XOR) operation to selectively enable the first circuitry to generate a second clock signal based on the first clock signal. In another embodiment, a frequency of the second clock signal is substantially equal to one half of a frequency of the first clock signal.Type: ApplicationFiled: July 1, 2022Publication date: January 4, 2024Applicant: Intel CorporationInventors: Steven Hsu, Amit Agarwal, Simeon Realov, Mark Anders, Ram Krishnamurthy
-
Publication number: 20230376274Abstract: A fused dot-product multiply-accumulate (MAC) circuit may support variable precisions of floating-point data elements to perform computations (e.g., MAC operations) in deep learning operations. An operation mode of the circuit may be selected based on the precision of an input element. The operation mode may be a FP16 mode or a FP8 mode. In the FP8 mode, product exponents may be computed based on exponents of floating-point input elements. A maximum exponent may be selected from the one or more product exponents. A global maximum exponent may be selected from a plurality of maximum exponents. A product mantissa may be computed and aligned with another product mantissa based on a difference between the global maximum exponent and a corresponding maximum exponent. An adder tree may accumulate the aligned product mantissas and compute a partial sum mantissa. The partial sum mantissa may be normalized using the global maximum exponent.Type: ApplicationFiled: July 31, 2023Publication date: November 23, 2023Applicant: Intel CorporationInventors: Mark Anders, Arnab Raha, Amit Agarwal, Steven Hsu, Deepak Abraham Mathaikutty, Ram K. Krishnamurthy, Martin Power
-
Patent number: 11791819Abstract: A parasitic-aware single-edge triggered flip-flop reduces clock power through layout optimization, enabled through process-circuit co-optimization. The static pass-gate master-slave flip-flop utilizes novel layout optimization enabling significant power reduction. The layout removes the clock poly over notches in the diffusion area. Poly lines implement clock nodes. The poly lines are aligned between n-type and p-type active regions.Type: GrantFiled: December 26, 2019Date of Patent: October 17, 2023Assignee: Intel CorporationInventors: Steven Hsu, Amit Agarwal, Simeon Realov, Ram Krishnamurthy
-
Patent number: 11757434Abstract: A fast Mux-D scan flip-flop is provided, which bypasses a scan multiplexer to a master keeper side path, removing delay overhead of a traditional Mux-D scan topology. The design is compatible with simple scan methodology of Mux-D scan, while preserving smaller area and small number of inputs/outputs. Since scan Mux is not in the forward critical path, circuit topology has similar high performance as level-sensitive scan flip-flop and can be easily converted into bare pass-gate version. The new fast Mux-D scan flip-flop combines the advantages of the conventional LSSD and Mux-D scan flip-flop, without the disadvantages of each.Type: GrantFiled: April 1, 2022Date of Patent: September 12, 2023Assignee: Intel CorporationInventors: Amit Agarwal, Steven Hsu, Simeon Realov, Mahesh Kumashikar, Ram Krishnamurthy
-
Publication number: 20230195388Abstract: Methods and apparatus relating to register file virtualization techniques are described. In an embodiment, a register file includes a plurality of register file cells. Each of the register file cells includes a register file entry and a shadow buffer. Logic circuitry causes storage of input data to the shadow buffer, while data stored in the register file entry is accessible to perform one or more operations. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: December 17, 2021Publication date: June 22, 2023Applicant: Intel CorporationInventors: William Butera, David Webb, Mitchell Diamond, Steven Hsu, Amit Agarwal
-
Publication number: 20230014656Abstract: A memory array of a compute tile may store activations or weights of a DNN. The memory array may include databanks for storing contexts, context MUXs, and byte MUXs. A databank may store a context with flip-flop arrays, each of which includes a sequence of flip-flops. A logic gate and an ICG unit may gate flip-flops and control whether states of the flip-flops can be changed. The data gating can prevent a context not selected for the databank from inadvertently toggling and wasting power A context MUX may read a context from different flip-flop arrays in a databank based on gray-coded addresses. A byte MUX can combine bits from different bytes in a context read by the context MUX. The memory array may be implemented with bit packing to reduce distance between the context MUX and byte MUX to reduce lengths of wires connecting the context MUXs and byte MUXs.Type: ApplicationFiled: September 23, 2022Publication date: January 19, 2023Inventors: Raymond Jit-Hung Sung, Deepak Abraham Mathaikutty, Amit Agarwal, David Thomas Bernard, Steven Hsu, Martin Power, Conor Byme, Arnab Raha
-
Patent number: 11442103Abstract: An apparatus is provided which comprises: a multi-bit quad latch with an internally coupled level sensitive scan circuitry; and a combinational logic coupled to an output of the multi-bit quad latch. Another apparatus is provided which comprises: a plurality of sequential logic circuitries; and a clocking circuitry comprising inverters, wherein the clocking circuitry is shared by the plurality of sequential logic circuitries.Type: GrantFiled: April 26, 2021Date of Patent: September 13, 2022Assignee: Intel CorporationInventors: Amit Agarwal, Ram Krishnamurthy, Satish Damaraju, Steven Hsu, Simeon Realov
-
Patent number: 11398814Abstract: A new family of shared clock single-edge triggered flip-flops that reduces a number of internal clock devices from 8 to 6 devices to reduce clock power. The static pass-gate master-slave flip-flop has no performance penalty compared to the flip-flops with 8 clock devices thus enabling significant power reduction. The flip-flop intelligently maintains the same polarity between the master and slave stages which enables the sharing of the master tristate and slave state feedback clock devices without risk of charge sharing across all combinations of clock and data toggling. Because of this, the state of the flip-flop remains undisturbed, and is robust across charge sharing noise. A multi-bit time borrowing internal stitched flip-flop is also described, which enables internal stitching of scan in a high performance time-borrowing flip-flop without incurring increase in layout area.Type: GrantFiled: March 9, 2020Date of Patent: July 26, 2022Assignee: Intel CorporationInventors: Steven Hsu, Amit Agarwal, Simeon Realov, Satish Damaraju, Ram Krishnamurthy
-
Publication number: 20220224316Abstract: A fast Mux-D scan flip-flop is provided, which bypasses a scan multiplexer to a master keeper side path, removing delay overhead of a traditional Mux-D scan topology. The design is compatible with simple scan methodology of Mux-D scan, while preserving smaller area and small number of inputs/outputs. Since scan Mux is not in the forward critical path, circuit topology has similar high performance as level-sensitive scan flip-flop and can be easily converted into bare pass-gate version. The new fast Mux-D scan flip-flop combines the advantages of the conventional LSSD and Mux-D scan flip-flop, without the disadvantages of each.Type: ApplicationFiled: April 1, 2022Publication date: July 14, 2022Inventors: Amit Agarwal, Steven Hsu, Simeon Realov, Mahesh Kumashikar, Ram Krishnamurthy
-
Patent number: 11296681Abstract: A fast Mux-D scan flip-flop is provided, which bypasses a scan multiplexer to a master keeper side path, removing delay overhead of a traditional Mux-D scan topology. The design is compatible with simple scan methodology of Mux-D scan, while preserving smaller area and small number of inputs/outputs. Since scan Mux is not in the forward critical path, circuit topology has similar high performance as level-sensitive scan flip-flop and can be easily converted into bare pass-gate version. The new fast Mux-D scan flip-flop combines the advantages of the conventional LSSD and Mux-D scan flip-flop, without the disadvantages of each.Type: GrantFiled: December 23, 2019Date of Patent: April 5, 2022Assignee: Intel CorporationInventors: Amit Agarwal, Steven Hsu, Simeon Realov, Mahesh Kumashikar, Ram Krishnamurthy
-
Publication number: 20210407168Abstract: A method comprising: dividing a 3D space into a voxel grid comprising a plurality of voxels; associating a plurality of distance values with the plurality of voxels, each distance value based on a distance to a boundary of an object; selecting an approximate interpolation mode for stepping a ray through a first one or more voxels of the 3D space responsive to the first one or more voxels having distance values greater than a threshold; and detecting the ray reaching a second one or more voxels having distance values less than the first threshold; and responsively selecting a precise interpolation mode for stepping the ray through the second one or more voxels.Type: ApplicationFiled: October 14, 2020Publication date: December 30, 2021Inventors: Vivek De, Ram Krishnamurthy, Amit Agarwal, Steven Hsu, Monodeep Kar
-
Publication number: 20210407039Abstract: A method comprising: dividing a 3D space into a voxel grid comprising a plurality of voxels; associating a plurality of distance values with the plurality of voxels, each distance value based on a distance to a boundary of an object; selecting an approximate interpolation mode for stepping a ray through a first one or more voxels of the 3D space responsive to the first one or more voxels having distance values greater than a threshold; and detecting the ray reaching a second one or more voxels having distance values less than the first threshold; and responsively selecting a precise interpolation mode for stepping the ray through the second one or more voxels.Type: ApplicationFiled: June 30, 2020Publication date: December 30, 2021Inventors: Vivek De, Ram Krishnamurthy, Amit Agarwal, Steven Hsu, Monodeep Kar
-
Publication number: 20210281250Abstract: A new family of shared clock single-edge triggered flip-flops that reduces a number of internal clock devices from 8 to 6 devices to reduce clock power. The static pass-gate master-slave flip-flop has no performance penalty compared to the flip-flops with 8 clock devices thus enabling significant power reduction. The flip-flop intelligently maintains the same polarity between the master and slave stages which enables the sharing of the master tristate and slave state feedback clock devices without risk of charge sharing across all combinations of clock and data toggling. Because of this, the state of the flip-flop remains undisturbed, and is robust across charge sharing noise. A multi-bit time borrowing internal stitched flip-flop is also described, which enables internal stitching of scan in a high performance time-borrowing flip-flop without incurring increase in layout area.Type: ApplicationFiled: March 9, 2020Publication date: September 9, 2021Applicant: Intel CorporationInventors: Steven Hsu, Amit Agarwal, Simeon Realov, Satish Damaraju, Ram Krishnamurthy
-
Publication number: 20210263100Abstract: An apparatus is provided which comprises: a multi-bit quad latch with an internally coupled level sensitive scan circuitry; and a combinational logic coupled to an output of the multi-bit quad latch. Another apparatus is provided which comprises: a plurality of sequential logic circuitries; and a clocking circuitry comprising inverters, wherein the clocking circuitry is shared by the plurality of sequential logic circuitries.Type: ApplicationFiled: April 26, 2021Publication date: August 26, 2021Applicant: Intel CorporationInventors: Amit Agarwal, Ram Krishnamurthy, Satish Damaraju, Steven Hsu, Simeon Realov
-
Patent number: 11054470Abstract: A family of novel, low power, min-drive strength, double-edge triggered (DET) input data multiplexer (Mux-D) scan flip-flop (FF) is provided. The flip-flop takes the advantage of no state node in the slave to remove data inverters in a traditional DET FF to save power, without affecting the flip-flop functionality under coupling/glitch scenarios.Type: GrantFiled: December 23, 2019Date of Patent: July 6, 2021Assignee: Intel CorporationInventors: Amit Agarwal, Steven Hsu, Anupama Ambardar Thaploo, Simeon Realov, Ram Krishnamurthy
-
Publication number: 20210203323Abstract: A parasitic-aware single-edge triggered flip-flop reduces clock power through layout optimization, enabled through process-circuit co-optimization. The static pass-gate master-slave flip-flop utilizes novel layout optimization enabling significant power reduction. The layout removes the clock poly over notches in the diffusion area. Poly lines implement clock nodes. The poly lines are aligned between n-type and p-type active regions.Type: ApplicationFiled: December 26, 2019Publication date: July 1, 2021Applicant: Intel CorporationInventors: Steven Hsu, Amit Agarwal, Simeon Realov, Ram Krishnamurthy
-
Publication number: 20210194468Abstract: A family of novel, low power, min-drive strength, double-edge triggered (DET) input data multiplexer (Mux-D) scan flip-flop (FF) is provided. The flip-flop takes the advantage of no state node in the slave to remove data inverters in a traditional DET FF to save power, without affecting the flip-flop functionality under coupling/glitch scenarios.Type: ApplicationFiled: December 23, 2019Publication date: June 24, 2021Applicant: Intel CorporationInventors: Amit Agarwal, Steven Hsu, Anupama Ambardar Thaploo, Simeon Realov, Ram Krishnamurthy
-
Publication number: 20210194469Abstract: A fast Mux-D scan flip-flop is provided, which bypasses a scan multiplexer to a master keeper side path, removing delay overhead of a traditional Mux-D scan topology. The design is compatible with simple scan methodology of Mux-D scan, while preserving smaller area and small number of inputs/outputs. Since scan Mux is not in the forward critical path, circuit topology has similar high performance as level-sensitive scan flip-flop and can be easily converted into bare pass-gate version. The new fast Mux-D scan flip-flop combines the advantages of the conventional LSSD and Mux-D scan flip-flop, without the disadvantages of each.Type: ApplicationFiled: December 23, 2019Publication date: June 24, 2021Applicant: Intel CorporationInventors: Amit Agarwal, Steven Hsu, Simeon Realov, Mahesh Kumashikar, Ram Krishnamurthy
-
Patent number: 11009549Abstract: An apparatus is provided which comprises: a multi-bit quad latch with an internally coupled level sensitive scan circuitry; and a combinational logic coupled to an output of the multi-bit quad latch. Another apparatus is provided which comprises: a plurality of sequential logic circuitries; and a clocking circuitry comprising inverters, wherein the clocking circuitry is shared by the plurality of sequential logic circuitries.Type: GrantFiled: November 12, 2019Date of Patent: May 18, 2021Assignee: Intel CorporationInventors: Amit Agarwal, Ram Krishnamurthy, Satish Damaraju, Steven Hsu, Simeon Realov
-
Publication number: 20210117197Abstract: Systems, apparatuses and methods identify a plurality of registers that are associated with a system-on-chip. The plurality of registers includes a first portion dedicated to write operations and a second portion dedicated to read operations. The technology writes data to the first portion of the plurality of registers, and transfers the data from the first portion to the second portion.Type: ApplicationFiled: December 23, 2020Publication date: April 22, 2021Applicant: Intel CorporationInventors: Steven Hsu, Amit Agarwal, Debabrata Mohapatra, Arnab Raha, Moongon Jung, Gautham Chinya, Ram Krishnamurthy