Patents by Inventor John R. Nickolls

John R. Nickolls has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Processor having a finite field arithmetic unit utilizing an array of multipliers and adders

Patent number: 7343472

Abstract: A processor includes an instruction memory, arithmetic logic unit, finite field arithmetic unit, at least one digital storage device, and an instruction decoder. The instruction memory temporarily stores an instruction that includes at least one of: an operational code, destination information, and source information. The instruction decoder is operably coupled to interpret the instruction to identify the arithmetic logic unit and/or the finite field arithmetic unit to perform the operational code of the corresponding instruction. The instruction decoder then identifies at least one destination location within the digital storage device based on the destination information contained within the corresponding instruction. The instruction decoder then identifies at least one source location within the digital storage device based on the source information of the corresponding instruction.

Type: Grant

Filed: June 11, 2003

Date of Patent: March 11, 2008

Assignee: Broadcom Corporation

Inventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls

Simulating multiported memories using lower port count memories

Patent number: 7339592

Abstract: An apparatus and method for simulating a multiported memory using lower port count memories as banks. A portion of memory is allocated for storing data associated with a thread. The portion of memory allocated to a thread may be stored in a single bank or in multiple banks. A collector unit coupled to each bank gathers source operands needed to process a program instruction as the source operands output from one or more banks. The collector unit outputs the source operands to an execution unit when all of the source operands needed to process the program instruction have been gathered.

Type: Grant

Filed: July 13, 2004

Date of Patent: March 4, 2008

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Ming Y. Siu, Simon S. Moy, Samuel Liu, John R. Nickolls
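
The collector behavior described in this abstract can be illustrated with a small simulation. This is a minimal sketch, not the patented design: it assumes each bank can serve one read per cycle, and the names `collect_operands`, `banks`, and `operand_locations` are hypothetical.

```python
def collect_operands(banks, operand_locations):
    """Gather source operands from single-ported banks.

    banks: list of dicts mapping register index -> value (one dict per bank).
    operand_locations: list of (bank, register) pairs, one per source operand.
    Assumption: each bank serves at most one read per cycle.
    Returns (operands, cycles); operands are released only once all are gathered.
    """
    pending = list(enumerate(operand_locations))
    gathered = [None] * len(operand_locations)
    cycles = 0
    while pending:
        cycles += 1
        busy_banks = set()
        still_pending = []
        for index, (bank, register) in pending:
            if bank in busy_banks:
                # Bank already used this cycle: retry on the next cycle.
                still_pending.append((index, (bank, register)))
            else:
                busy_banks.add(bank)
                gathered[index] = banks[bank][register]
        pending = still_pending
    return gathered, cycles


banks = [{0: 10, 1: 11}, {0: 20}]
operands, cycles = collect_operands(banks, [(0, 0), (1, 0), (0, 1)])
print(operands, cycles)
```

In this example two operands live in bank 0, so the collector needs a second cycle before it can hand all three operands to the execution unit.
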
Galois field arithmetic unit for use within a processor

Patent number: 7313583

Abstract: A Galois field arithmetic unit includes a Galois field multiplier section and a Galois field adder section. The Galois field multiplier section includes a plurality of Galois field multiplier arrays that perform a Galois field multiplication by multiplying, in accordance with a generating polynomial, a 1st operand and a 2nd operand. The bit size of the 1st and 2nd operands corresponds to the bit size of a processor data path, where each of the Galois field multiplier arrays performs a portion of the Galois field multiplication by multiplying, in accordance with a corresponding portion of the generating polynomial, corresponding portions of the 1st and 2nd operands. The bit size of the corresponding portions of the 1st and 2nd operands corresponds to a symbol size of symbols of a coding scheme being implemented by the corresponding processor.

Type: Grant

Filed: June 12, 2003

Date of Patent: December 25, 2007

Assignee: Broadcom Corporation

Inventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls
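
As an illustration of the arithmetic this abstract refers to, the sketch below multiplies symbols in GF(2^8) under a generating polynomial and then applies that multiply to each symbol-sized portion of a wider word. It is a software sketch under stated assumptions, not the patented hardware: the 32-bit data path, 8-bit symbol size, the polynomial 0x11B, and the function names are all chosen here for illustration.

```python
def gf_mul(a, b, poly=0x11B, width=8):
    """Multiply a and b in GF(2^width), reducing by the generating polynomial."""
    result = 0
    for _ in range(width):
        if b & 1:
            result ^= a          # Galois field addition is XOR
        b >>= 1
        a <<= 1
        if a & (1 << width):     # degree overflow: subtract (XOR) the polynomial
            a ^= poly
    return result


def gf_mul_packed(x, y, poly=0x11B, width=8, lanes=4):
    """Multiply each symbol-sized portion of x by the matching portion of y."""
    mask = (1 << width) - 1
    out = 0
    for lane in range(lanes):
        shift = lane * width
        product = gf_mul((x >> shift) & mask, (y >> shift) & mask, poly, width)
        out |= product << shift
    return out


print(hex(gf_mul(0x57, 0x83)))                      # 0xc1
print(hex(gf_mul_packed(0x57575757, 0x83138313)))   # 0xc1fec1fe
```

Each lane is independent, which is why a data-path-wide operand can be split across several multiplier arrays that each handle one symbol.
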
Register based queuing for texture requests

Patent number: 7027062

Abstract: A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.

Type: Grant

Filed: February 27, 2004

Date of Patent: April 11, 2006

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, John R. Nickolls, Simon S. Moy, Brett W. Coon

Pipelined multi-access memory apparatus and method

Patent number: 6976141

Abstract: A memory management system provides the ability for multiple requesters to access blocks of memory in a pipelined manner. During a first clock, requests for one or more of the memory blocks are received by the system. A determination is made of whether one of the memory blocks is requested by one or more requests. If the same memory block is requested by two or more requests, the system performs a further determination of which of the requests will be provided to the memory block. The determined request is provided to the memory block on the first clock. During a second clock, the data of the determined request is latched to the memory block and a memory access is initiated. If the request is a write request, the data is written to the memory block. If the request is a read request, then the requested data is retrieved and, on a third clock, the data is driven onto a bus, routed to the determined requester, and available to be latched into the requester on the fourth clock.

Type: Grant

Filed: November 2, 2001

Date of Patent: December 13, 2005

Assignee: Broadcom Corporation

Inventors: Lawrence J. Madar, III, John R. Nickolls, Ethan Mirsky
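
The first-clock arbitration step in this abstract can be sketched as follows. The abstract does not say how the winning request is chosen, so this example assumes a fixed priority in list order; the function name and the tuple layout are hypothetical.

```python
def arbitrate(requests):
    """Pick one request per memory block for the current clock.

    requests: list of (requester_id, block_id) in priority order.
    Assumption: the earliest request in the list wins a contested block;
    the abstract does not specify the actual selection policy.
    Returns (granted, deferred): granted maps block_id -> requester_id,
    deferred lists the requests that must retry on a later clock.
    """
    granted = {}
    deferred = []
    for requester, block in requests:
        if block not in granted:
            granted[block] = requester
        else:
            deferred.append((requester, block))
    return granted, deferred


granted, deferred = arbitrate([(0, 2), (1, 2), (2, 5)])
print(granted, deferred)
```

Requests to different blocks proceed in the same clock, while the losing request for block 2 is held back, which is what lets several requesters share the memory in a pipelined manner.
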
Reconfigurable processing system and method

Patent number: 6959378

Abstract: A reconfigurable processing system executes instructions and configurations in parallel. Initially, a first instruction loads configurations into configuration registers. The configuration field of a subsequently fetched instruction selects a configuration register. The instruction controls and controls of the configuration in the selected configuration register are decoded and modified as specified by the instruction. The controls provide data operands to the execution units which process the operands and generate results. Scalar data, vector data, or a combination of scalar and vector data can be processed. The processing is controlled by instructions executed in parallel with configurations invoked by configuration fields within the instructions. Vectors are processed using a vector register file which stores vectors. A vector address unit identifies addresses of vector elements in the vector register file to be processed.

Type: Grant

Filed: November 2, 2001

Date of Patent: October 25, 2005

Assignee: Broadcom Corporation

Inventors: John R. Nickolls, Scott D. Johnson, Mark Williams, Ethan Mirsky, Kambdur Kirthiranjan, Amrit Raj Pant, Lawrence J. Madar, III

Defect tolerant redundancy

Patent number: 6879207

Abstract: Circuits, methods, and apparatus for using redundant circuitry on integrated circuits in order to increase manufacturing yields. One exemplary embodiment of the present invention provides a circuit configuration wherein functional circuit blocks in a group of circuit blocks are selected by multiplexers. Multiplexers at the input and output of the group of circuit blocks steer input and output signals to and from functional circuit blocks, avoiding circuit blocks found to be defective or nonfunctional. Multiple groups of these circuit blocks may be arranged in series and in parallel. Alternate multiplexer configurations may be used in order to provide a higher level of redundancy. Other embodiments use all functional circuit blocks and sort integrated circuits based on the level of functionality or performance. Other embodiments provide methods of testing integrated circuits having one or more of these circuit configurations.

Type: Grant

Filed: December 18, 2003

Date of Patent: April 12, 2005

Assignee: NVIDIA Corporation

Inventor: John R. Nickolls
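
The multiplexer steering idea in this abstract can be modeled in a few lines. This is a simplified sketch that assumes any logical lane can be steered to any physical block in the group, which real multiplexer configurations may not allow; `steer` and its parameters are hypothetical names.

```python
def steer(defective, lanes_needed):
    """Map logical lanes onto functional circuit blocks.

    defective: list of booleans, True where a physical block failed test.
    lanes_needed: number of working blocks the design requires.
    Returns the physical block index chosen for each logical lane,
    or None if too few blocks in the group are functional.
    """
    working = [index for index, bad in enumerate(defective) if not bad]
    if len(working) < lanes_needed:
        return None
    return working[:lanes_needed]


print(steer([False, True, False, False], 3))  # [0, 2, 3]
print(steer([True, True, False], 2))          # None
```

With one spare block in a group of four, a single defect still yields three usable lanes, which is the yield benefit the abstract describes.
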
Galois field arithmetic unit for use within a processor

Publication number: 20040078411

Abstract: A Galois field arithmetic unit includes a Galois field multiplier section and a Galois field adder section. The Galois field multiplier section includes a plurality of Galois field multiplier arrays that perform a Galois field multiplication by multiplying, in accordance with a generating polynomial, a 1st operand and a 2nd operand. The bit size of the 1st and 2nd operands corresponds to the bit size of a processor data path, where each of the Galois field multiplier arrays performs a portion of the Galois field multiplication by multiplying, in accordance with a corresponding portion of the generating polynomial, corresponding portions of the 1st and 2nd operands. The bit size of the corresponding portions of the 1st and 2nd operands corresponds to a symbol size of symbols of a coding scheme being implemented by the corresponding processor.

Type: Application

Filed: June 12, 2003

Publication date: April 22, 2004

Inventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls

Galois field multiplier array for use within a finite field arithmetic unit

Publication number: 20040078410

Abstract: A Galois field multiplier array includes a 1st register, a 2nd register, a 3rd register, and a plurality of multiplier cells. The 1st register stores bits of a 1st operand. The 2nd register stores bits of a 2nd operand. The 3rd register stores bits of a generating polynomial that corresponds to one of a plurality of applications (e.g., FEC, CRC, Reed Solomon, et cetera). The plurality of multiplier cells is arranged in rows and columns. Each of the multiplier cells outputs a sum and a product and each cell includes five inputs. The 1st input receives a preceding cell's multiply output, the 2nd input receives at least one bit of the 2nd operand, the 3rd input receives a preceding cell's sum output, a 4th input receives at least one bit of the generating polynomial, and the 5th input receives a feedback term from a preceding cell in a preceding row. The multiplier cells in the 1st row have the 1st input, 3rd input, and 5th input set to corresponding initialization values in accordance with the 2nd operand.

Type: Application

Filed: June 12, 2003

Publication date: April 22, 2004

Inventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls

Processor having a finite field arithmetic unit

Publication number: 20040078555

Abstract: A processor includes an instruction memory, arithmetic logic unit, finite field arithmetic unit, at least one digital storage device, and an instruction decoder. The instruction memory temporarily stores an instruction that includes at least one of: an operational code, destination information, and source information. The instruction decoder is operably coupled to interpret the instruction to identify the arithmetic logic unit and/or the finite field arithmetic unit to perform the operational code of the corresponding instruction. The instruction decoder then identifies at least one destination location within the digital storage device based on the destination information contained within the corresponding instruction. The instruction decoder then identifies at least one source location within the digital storage device based on the source information of the corresponding instruction.

Type: Application

Filed: June 11, 2003

Publication date: April 22, 2004

Inventors: Joshua Porten, Won Kim, Scott D. Johnson, John R. Nickolls

Reconfigurable processing system and method

Publication number: 20020087846

Abstract: A reconfigurable processing system executes instructions and configurations in parallel. Initially, a first instruction loads configurations into configuration registers. The configuration field of a subsequently fetched instruction selects a configuration register. The instruction controls and controls of the configuration in the selected configuration register are decoded and modified as specified by the instruction. The controls provide data operands to the execution units which process the operands and generate results. Scalar data, vector data, or a combination of scalar and vector data can be processed. The processing is controlled by instructions executed in parallel with configurations invoked by configuration fields within the instructions. Vectors are processed using a vector register file which stores vectors. A vector address unit identifies addresses of vector elements in the vector register file to be processed.

Type: Application

Filed: November 2, 2001

Publication date: July 4, 2002

Inventors: John R. Nickolls, Scott D. Johnson, Mark Williams, Ethan Mirsky, Kambdur Kirthiranjan, Amrit Raj Pant, Lawrence J. Madar

Pipelined multi-access memory apparatus and method

Publication number: 20020056032

Abstract: A memory management system provides the ability for multiple requesters to access blocks of memory in a pipelined manner. During a first clock, requests for one or more of the memory blocks are received by the system. A determination is made of whether one of the memory blocks is requested by one or more requests. If the same memory block is requested by two or more requests, the system performs a further determination of which of the requests will be provided to the memory block. The determined request is provided to the memory block on the first clock. During a second clock, the data of the determined request is latched to the memory block and a memory access is initiated. If the request is a write request, the data is written to the memory block. If the request is a read request, then the requested data is retrieved and, on a third clock, the data is driven onto a bus, routed to the determined requester, and available to be latched into the requester on the fourth clock.

Type: Application

Filed: November 2, 2001

Publication date: May 9, 2002

Inventors: Lawrence J. Madar, John R. Nickolls, Ethan Mirsky

Scalable processor to processor and processor to I/O interconnection network and method for parallel processing arrays

Patent number: 5598408

Abstract: A massively parallel computer system is disclosed having a global router network in which pipeline registers are spatially distributed to increase the messaging speed of the global router network. The global router network includes an expansion tap for processor to I/O messaging so that I/O messaging bandwidth matches interprocessor messaging bandwidth. A route-opening message packet includes protocol bits which are treated homogeneously with steering bits. The route-opening packet further includes redundant address bits for imparting a multiple-crossbars personality to router chips within the global router network. A structure and method for spatially supporting the processors of the massively parallel system and the global router network are also disclosed.

Type: Grant

Filed: January 14, 1994

Date of Patent: January 28, 1997

Assignee: MasPar Computer Corporation

Inventors: John R. Nickolls, John Zapisek, Won S. Kim, Jeffrey C. Kalb, W. Thomas Blank, Eliot Wegbreit, Kevin Van Horn

Parallel processor memory transfer system using parallel transfers between processors and staging registers and sequential transfers between staging registers and memory

Patent number: 5581777

Abstract: A massively parallel processor is provided with a plurality of clusters. Each cluster includes a plurality of processor elements ("PEs") and a cluster memory. Each PE of the cluster has associated with it an address register, a stage register, an error register, a PE enable flag, a memory flag, and a grant request flag. A cluster data bus and an error bus connect each of the stage registers and error registers of the cluster to the memory. The grant request flags of the cluster are interconnected by a polling network, which polls only one of the grant request flags at a time. In response to a signal on the polling network and the state of the associated memory flag, the grant request flag determines an I/O operation between the associated data register and the cluster memory over the cluster data bus. Both data and error bits are associated with respective processor elements. The sequential memory operations proceed in parallel with parallel processor operations.

Type: Grant

Filed: March 3, 1995

Date of Patent: December 3, 1996

Assignee: MasPar Computer Corporation

Inventors: Won S. Kim, David M. Bulfer, John R. Nickolls, W. Thomas Blank, Hannes Figel

Parallel processor system with highly flexible local control capability, including selective inversion of instruction signal and control of bit shift amount

Patent number: 5542074

Abstract: A parallel processor system which operates in a single-instruction multiple-data mode has a highly flexible local control capability for enabling the system to operate fast. The system contains an array of processing elements or PEs (12_1 to 12_N) that process respective sets of data according to instructions supplied from a global control unit (20). Each instruction is furnished simultaneously to all the PEs. One local control feature (52) entails selectively inverting certain instruction signals according to a data-dependent signal. Another local control feature (48) involves controlling the amount of a bit shift in a barrel shifter (98) according to a data-dependent signal.

Type: Grant

Filed: October 22, 1992

Date of Patent: July 30, 1996

Assignee: MasPar Computer Corporation

Inventors: Won S. Kim, John R. Nickolls

Broadcasting headers to configure physical devices interfacing a data bus with a logical assignment and to effect block data transfers between the configured logical devices

Patent number: 5488694

Abstract: To effect a block data transfer between a plurality of physical I/O devices coupled through interfaces to an I/O channel ("IOC") bus, a source logical device is established by programmably assigning to each of the physical device interfaces a logical device identifier, a leaf identifier determining when the physical device participates relative to the first data transfer in the block data transfer, a burst count specifying the number of consecutive transfers for which the physical device is responsible when its interleave period arrives, and an interleave factor identifying how often the physical device participates in the block data transfer. A destination logical device is similarly established. The source and destination logical devices are then activated to accomplish a block transfer of data between them.

Type: Grant

Filed: August 28, 1992

Date of Patent: January 30, 1996

Assignee: MasPar Computer Company

Inventors: Mark P. McKee, John Zapisek, David M. Bulfer, John M. Long, John R. Nickolls, William T. Blank

Scalable processor to processor and processor-to-I/O interconnection network and method for parallel processing arrays

Patent number: 5280474

Abstract: A massively parallel computer system is disclosed having a global router network in which pipeline registers are spatially distributed to increase the messaging speed of the global router network. The global router network includes an expansion tap for processor to I/O messaging so that I/O messaging bandwidth matches interprocessor messaging bandwidth. A route-opening message packet includes protocol bits which are treated homogeneously with steering bits. The route-opening packet further includes redundant address bits for imparting a multiple-crossbars personality to router chips within the global router network. A structure and method for spatially supporting the processors of the massively parallel system and the global router network are also disclosed.

Type: Grant

Filed: January 5, 1990

Date of Patent: January 18, 1994

Assignee: Maspar Computer Corporation

Inventors: John R. Nickolls, John Zapisek, Won S. Kim, Jeffery C. Kalb, W. Thomas Blank, Eliot Wegbreit, Kevin Van Horn

Input/output system for parallel processing arrays

Patent number: 5243699

Abstract: A massively parallel processor includes an array of processor elements (20), or PEs, and a multi-stage router interconnection network (30), which is used both for I/O communications and for PE to PE communications. The I/O system (10) for the massively parallel processor is based on a globally shared addressable I/O RAM buffer memory (50) that has address and data buses (52) to the I/O devices (80, 82) and other address and data buses (42) which are coupled to a router I/O element array (40). The router I/O element array is in turn coupled to the router ports (e.g. S2_0_X0) of the second stage (430) of the router interconnection network. The router I/O array provides the corner turn conversion between the massive array of router lines (32) and the relatively few buses (52) to the I/O devices.

Type: Grant

Filed: December 6, 1991

Date of Patent: September 7, 1993

Assignee: MasPar Computer Corporation

Inventors: John R. Nickolls, Won S. Kim, John Zapisek, William T. Blank
