Abstract: A method and system are provided for performing efficient and effective scheduling in a multi-threaded system. Dynamic control of scheduling is provided, in which priority weights can be assigned for some or all of the threads in the multi-threaded system. The priority weights are employed to control prioritization of threads and thread instructions by a scheduler. An instruction count for each thread is used in combination with the priority weights to determine the prioritization order in which instructions are fetched and assigned to execution units for processing.
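The abstract does not say how the instruction count and the priority weight are combined. As a minimal sketch, assume the scheduler fetches next from the thread with the smallest ratio of in-flight instructions to weight, so a higher weight earns a thread more fetch slots. The function name `pick_thread`, the ratio rule, and the tie-break on the lowest thread id are all illustrative assumptions, not the patented method.

```python
def pick_thread(instr_counts, weights):
    """Return the id of the thread to fetch from next.

    Assumed rule (not taken from the abstract): the thread with the lowest
    instruction_count / weight ratio wins; ties go to the lowest thread id.
    Weights must be positive.
    """
    return min(instr_counts, key=lambda t: (instr_counts[t] / weights[t], t))


# Hypothetical snapshot: in-flight instruction counts and priority weights.
counts = {0: 12, 1: 8, 2: 20}
weights = {0: 1, 1: 1, 2: 4}
print(pick_thread(counts, weights))
```

With these numbers thread 2 is selected even though it has the most instructions in flight, because its weight of 4 brings its ratio down to 5.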
Abstract: Systems, apparatuses, and methods are disclosed for transmission control protocol (TCP) segmentation offload (TSO). A hardware TSO engine is capable of handling segmentation of data packets and consequent header field mutation of hundreds of flows simultaneously. The TSO engine generates data pointers in order to “cut up” the payload data of a data packet, thereby creating multiple TCP segments. Once the data of the data packet has been fetched, the TSO engine “packs” the potentially-scattered chunks of data into TCP segments, and recalculates each TCP segment's internet protocol (IP) length, IP identification (ID), IP checksum, TCP sequence number, and TCP checksum, as well as modifies the TCP flags. The TSO engine is able to rapidly switch contexts, and share the control logic amongst all flows.
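The per-segment header updates can be illustrated in software. The sketch below cuts a payload into segments of at most `mss` bytes and derives each segment's IP length, IP ID, TCP sequence number, and flags. It assumes a fixed 40-byte IP+TCP header, an IP ID that increments by one per segment, and FIN/PSH kept only on the last segment; none of these details are stated in the abstract, and checksum recalculation is omitted for brevity. The name `tso_segment` is hypothetical.

```python
FIN, PSH = 0x01, 0x08  # TCP flag bits


def tso_segment(payload, mss, seq, ip_id, hdr_len=40, flags=0x18):
    """Split payload into TCP segments and compute per-segment header fields.

    Assumptions (illustrative only): fixed header length, IP ID incremented
    per segment, FIN and PSH cleared on every segment except the last.
    """
    segments = []
    for off in range(0, len(payload), mss):
        chunk = payload[off:off + mss]
        last = off + mss >= len(payload)
        segments.append({
            "ip_len": hdr_len + len(chunk),          # IP total length
            "ip_id": (ip_id + off // mss) & 0xFFFF,  # 16-bit wraparound
            "seq": (seq + off) & 0xFFFFFFFF,         # 32-bit wraparound
            "flags": flags if last else flags & ~(FIN | PSH),
            "data": chunk,
        })
    return segments


for seg in tso_segment(bytes(3000), mss=1460, seq=1000, ip_id=7):
    print(seg["ip_len"], seg["ip_id"], seg["seq"], hex(seg["flags"]))
```

A 3000-byte payload with an MSS of 1460 yields three segments carrying 1460, 1460, and 80 bytes.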
Abstract: Approaches for a packet format for error reporting in a content addressable memory (CAM) device are disclosed. The CAM device may comprise a CAM array that includes a plurality of rows, each row including a plurality of CAM cells coupled to a match line, and an error notification circuit capable of forming a packet that indicates whether the CAM device is experiencing an error condition. If an error condition was experienced by the CAM device, the response packet may also indicate the type(s) of error that was encountered. Advantageously, information about any error condition experienced by the CAM device may be quickly ascertained by a host device in which the CAM device is incorporated.
Abstract: A CAM device includes a CAM array that can implement column redundancy in which a defective column segment in a selected block can be functionally replaced by a selected column segment of the same block, and/or by a spare column segment of the same block.
November 21, 2011
Date of Patent: March 17, 2015
Netlogic Microsystems, Inc.
Varadarajan Srinivasan, Bindiganavale S. Nataraj, Sandeep Khanna
Abstract: A system and method are provided for reducing a latency associated with timestamps in a multi-core, multi-threaded processor. A processor capable of simultaneously processing a plurality of threads is provided. The processor includes a plurality of cores, a plurality of network interfaces for network communication, and a timer circuit for reducing a latency associated with timestamps used for synchronization of the network communication utilizing a precision time protocol.
September 11, 2013
March 12, 2015
NetLogic Microsystems, Inc.
Ahmed Shahid, Kaushik Kuila, David T. Hass
Abstract: The present invention relates to a low power serial link employing differential return-to-zero signaling. A receiver circuit consistent with some embodiments includes an input circuit for receiving differential serial data signals that form a differential return-to-zero signaling and a clock recovery circuit. The clock recovery circuit is coupled to the input circuit and includes a logic gate configured to generate a clock signal by using said differential serial data signals.
Abstract: A processor includes a plurality of processor cores, a networking output, and a packet ordering device. The packet ordering device determines an ordering for packets that are processed by the processor cores. The packets are released to the networking output in a determined order.
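One common way to release packets in a determined order is a reorder buffer keyed by sequence number. The sketch below assumes that the determined order is arrival order and that each packet was tagged with a sequence number before being handed to a core; the abstract specifies neither, and the class name `PacketOrderer` is hypothetical.

```python
class PacketOrderer:
    """Hold packets finished out of order and release them in sequence.

    Assumption (not from the abstract): the required order is the order of
    the sequence numbers 0, 1, 2, ... assigned when packets arrived.
    """

    def __init__(self):
        self.next_seq = 0  # sequence number that must be released next
        self.done = {}     # finished packets waiting for their turn

    def complete(self, seq, packet):
        """Record a finished packet; return the packets now releasable."""
        self.done[seq] = packet
        released = []
        while self.next_seq in self.done:
            released.append(self.done.pop(self.next_seq))
            self.next_seq += 1
        return released


orderer = PacketOrderer()
print(orderer.complete(1, "b"))  # packet 0 is still being processed
print(orderer.complete(0, "a"))  # packets 0 and 1 can now go out together
```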
Abstract: Embodiments of circuits and methods are described for decreasing transmitter waveform dispersion penalty (TWDP) in a transmitter. A data stream is received for transmission across a channel and a main data signal is generated from the data stream. At least two cursor signals are generated, where each of the at least two cursor signals is shifted at least a portion of a clock period from the main data signal. The at least two cursor signals are subtracted from the main data signal to generate an output data signal with improved TWDP. Other embodiments include generating a main data signal, a pre-cursor signal shifted one previous clock cycle relative to the main data signal, and a post-cursor signal shifted one subsequent clock cycle relative to the main data signal. The pre-cursor and post-cursor signals are subtracted from the main data signal to generate an output data signal.
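The subtraction of pre-cursor and post-cursor signals can be modeled in discrete time as a three-tap filter. The sketch below assumes one sample per clock cycle, that the pre-cursor tap carries the next symbol and the post-cursor tap carries the previous symbol, and that each is scaled by a small coefficient before subtraction. The coefficient values and the name `shape_output` are illustrative; the abstract gives no tap weights.

```python
def shape_output(symbols, pre=0.25, post=0.25):
    """Subtract scaled pre- and post-cursor copies from the main signal.

    Assumed mapping (not from the abstract): pre-cursor = next symbol,
    post-cursor = previous symbol, with zero outside the sequence.
    """
    n = len(symbols)
    out = []
    for i, main in enumerate(symbols):
        pre_cursor = symbols[i + 1] if i + 1 < n else 0
        post_cursor = symbols[i - 1] if i > 0 else 0
        out.append(main - pre * pre_cursor - post * post_cursor)
    return out


print(shape_output([1, 1, -1, -1, 1]))
```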
Abstract: Systems, apparatuses, and methods are disclosed for improving performance of a stride-based prefetcher on an out-of-order central processing unit (CPU). The present disclosure teaches a processor system that employs out-of-order stride prefetch units. The out-of-order stride prefetch units are utilized for issuing prefetches for out-of-order stride access patterns. In one or more embodiments, the out-of-order stride prefetch units examine the offsets between past virtual address (VA) accesses and the directions of the past VA accesses in order to generate an estimate of the underlying VA access stride of the executed program code (PC). In at least one embodiment, the out-of-order stride prefetch units use the estimate of the VA access stride in order to generate a prediction of future VA accesses. In some embodiments, after the out-of-order stride prefetch units have generated the prediction of future VA accesses, the out-of-order stride prefetch units prefetch the predicted future VA accesses.
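A rough software analogue of estimating a stride from offsets and directions is sketched below. It assumes the stride magnitude is the smallest nonzero offset between any two recent accesses and that the direction is the majority sign of consecutive address deltas; the abstract does not give the actual estimation rule, so both heuristics and the names `estimate_stride` and `predict` are assumptions.

```python
from itertools import combinations


def estimate_stride(addrs):
    """Estimate a signed stride from a window of recent virtual addresses.

    Assumed heuristics (not from the abstract): magnitude is the smallest
    nonzero pairwise offset; sign is the majority direction of the
    consecutive accesses, with ties counted as ascending.
    """
    offsets = [abs(a - b) for a, b in combinations(addrs, 2) if a != b]
    if not offsets:
        return 0
    magnitude = min(offsets)
    ups = sum(1 for prev, cur in zip(addrs, addrs[1:]) if cur > prev)
    downs = sum(1 for prev, cur in zip(addrs, addrs[1:]) if cur < prev)
    return magnitude if ups >= downs else -magnitude


def predict(addrs, count=2):
    """Predict the next `count` addresses beyond the current frontier."""
    stride = estimate_stride(addrs)
    if stride == 0:
        return []
    frontier = max(addrs) if stride > 0 else min(addrs)
    return [frontier + stride * k for k in range(1, count + 1)]


# An ascending 8-byte stride whose accesses were issued out of order.
window = [0x1000, 0x1010, 0x1008, 0x1018, 0x1020]
print([hex(a) for a in predict(window)])
```

Using the smallest pairwise offset rather than the last delta is what lets the sketch tolerate reordered accesses: the reordered window still contains neighbors that are exactly one stride apart.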
Abstract: The disclosed packet scheduler implements the deficit round robin (DRR) approximation of weighted fair queuing (WFQ), and is capable of achieving complete fairness across several hundred source flows, for example, each of which can be mapped to one of several destination ports. In addition to achieving fairness, the packet scheduler allows the user to map one or more optional strict-priority flows to each port. The packet scheduler keeps these strict-priority flows “outside” of the group of flows for which fairness is enforced. Each destination port can be optionally configured to chop its data packets into sub-packet pieces. The packet scheduler works in two mutually orthogonal dimensions: (1.) it selects destination ports based on a round-robin scheme, or using another method, such as guaranteed rate port scheduling (GRPS), and (2.) it implements optional strict-priority scheduling, and DRR scheduling.
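The DRR portion can be sketched with the textbook algorithm: each flow accumulates a quantum of byte credit per round and sends packets while the head-of-line packet fits in its deficit. The sketch uses a single shared quantum and leaves out the strict-priority flows, port selection, and sub-packet chopping described above; the function name `drr` and the dictionary-of-deques layout are illustrative assumptions.

```python
from collections import deque


def drr(flows, quantum, rounds):
    """Run deficit round robin over per-flow queues of packet sizes.

    flows maps a flow name to a deque of packet sizes in bytes and is
    consumed in place. Returns the transmitted (flow, size) pairs in order.
    """
    deficit = {name: 0 for name in flows}
    sent = []
    for _ in range(rounds):
        for name, queue in flows.items():
            if not queue:
                deficit[name] = 0  # idle flows do not bank credit
                continue
            deficit[name] += quantum
            while queue and queue[0] <= deficit[name]:
                size = queue.popleft()
                deficit[name] -= size
                sent.append((name, size))
            if not queue:
                deficit[name] = 0
    return sent


flows = {"a": deque([300, 300]), "b": deque([100] * 6)}
for name, size in drr(flows, quantum=300, rounds=2):
    print(name, size)
```

In this run flow "a" sends two 300-byte packets and flow "b" sends six 100-byte packets, so both transmit 600 bytes despite their different packet sizes.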
Abstract: A content search system for determining whether an input string matches one or more of a number of patterns embodied by a deterministic finite automaton (DFA) includes a plurality of DFA engines that simultaneously compare sequential overlapping segments of the input string. The overlap region shared by adjacent pairs of input string segments is of a predetermined size. Initially, the first DFA engine is designated as the master engine, and the remaining DFA engines are designated as slave engines whose state results are speculative. Resolution logic compares the state results of the master engine with the state results of the adjacent slave engine to selectively validate the state results of the successor engine, which upon validation becomes the new master engine.
Abstract: In one form, a video processing device (150) includes a memory (110, 130) and a plurality of staged macroblock processing engines (112, 114, 116). The memory (110, 130) is operable to store partially decoded video data decoded from a stream of encoded video data. The plurality of staged macroblock processing engines (112, 114, 116) is coupled to the memory (110, 130) and is responsive to a request to process the partially decoded video data to generate a plurality of macroblocks of decoded video data. In another form, a first macroblock of decoded video data having a first location (426) within a first row (408) of a video frame (400) is generated, and a second macroblock of decoded video data having a second location (424) within a second row (410) of the video frame (400) is generated during the generating of the first macroblock.
Abstract: A method and system are described for canceling an echo signal with an echo canceller in the analog domain. In one embodiment, a system includes an echo canceller that includes an interpolation unit, operating in a digital domain, that receives a first digital echo estimate signal from an LMS unit and generates a second digital echo estimate signal without oversampling. A digital-to-analog converter (DAC) receives the second digital echo estimate signal and generates an analog echo estimate signal without oversampling. The echo canceller prevents the DAC from adding a high frequency component to the analog echo estimate signal. A subtractor adds the analog echo signal to an incoming signal having an echo signal. The subtractor generates an analog signal with reduced echo signal in the useful frequency band of the incoming signal.
Abstract: A content addressable memory (CAM) device having any number of rows, each of the rows including a match line connected to a plurality of CAM cells, a match line detector circuit, and an incremental match line charge circuit. The detector circuit generates a feedback signal based on a detected match line voltage. The charge circuit partially pre-charges the match line to an intermediate voltage during a pre-charge phase of a compare operation, and then selectively charges the match line higher towards a supply voltage in response to the feedback signal.
November 29, 2011
Date of Patent: December 16, 2014
Netlogic Microsystems, Inc.
Sandeep Khanna, Bindiganavale S. Nataraj, Varadarajan Srinivasan
Abstract: An integrated circuit device for delivering power to a load includes a controller circuit, a cascade circuit, and a power delivery circuit. The controller circuit generates a plurality of control signals. The cascade circuit receives the control signals from the controller circuit and sequentially outputs the control signals onto a cascade bus. The power delivery circuit receives the control signals from the controller circuit and delivers an amount of current to the load, in response to one of the control signals.
September 14, 2012
Date of Patent: December 2, 2014
NetLogic Microsystems, Inc.
Sandeep Khanna, Maheshwaran Srinivasan, De Cai Li, Chetan Deshpande
Abstract: A pipelined search engine supports a tree of search keys therein that utilizes span prefix masks to assist in longest prefix match (LPM) detection when the tree is searched. Each of a plurality of the span prefix masks encodes a prefix length of a search key to which the span prefix mask is associated and a value of another search key in the tree that is a prefix match to the search key to which the span prefix mask is associated.
Abstract: A content addressable memory (CAM) device dynamically reduces the power consumed by search operations between a search key and data stored in a plurality of CAM blocks by selectively disabling a number of the CAM blocks requested for the search operation by an external network processor, based upon the contents of the search key.
Abstract: A method and apparatus are disclosed for determining whether an input string of characters matches a pattern. The pattern has the form of an activator expression, a counter expression, and a tail. The method involves monitoring one or more active states associated with the pattern, and comparing each character to the activator expression and the counter expression for each of the one or more active states. An input character match to the activator expression comprises an activator match, and a character match to the counter expression without matching the activator expression comprises a non-activator match. The number of one or more active states corresponds to the number of non-activator to activator character transitions between adjacent received matching characters.
Abstract: An integrated circuit is disclosed. The integrated circuit includes a receive port interface to receive request data at a first data rate from a first host and a transmit port interface. The transmit port interface to transmit response data words across plural serial lanes to a second host at a second data rate. The second data rate is less than a predefined line rate of symbol transfers across the plural serial lanes. The transmit port interface includes shaping logic to transmit a data word stream at the second data rate and selectively insert idle words into the data word stream such that the data words and the idle words are together transferred at the predefined line rate.
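The idle-word insertion can be illustrated with a simple credit counter. In the sketch below, every line slot adds `data_rate` units of credit; a data word is sent once the credit reaches `line_rate`, and an idle word fills the slot otherwise, so data and idle words together occupy every slot on the line. The credit scheme, the `"IDLE"` placeholder, and the name `shape` are assumptions, since the abstract does not describe how the shaping logic decides where idles go.

```python
from collections import deque


def shape(words, data_rate, line_rate, idle="IDLE"):
    """Interleave idle words so data leaves at data_rate on a line_rate link.

    Assumed mechanism (not from the abstract): an integer credit counter.
    Rates are in any common unit and must satisfy 0 < data_rate <= line_rate.
    """
    if not 0 < data_rate <= line_rate:
        raise ValueError("need 0 < data_rate <= line_rate")
    pending = deque(words)
    credit = 0
    out = []
    while pending:
        credit += data_rate
        if credit >= line_rate:
            credit -= line_rate
            out.append(pending.popleft())  # slot carries a data word
        else:
            out.append(idle)               # slot carries an idle word
    return out


print(shape(["d0", "d1", "d2"], data_rate=2, line_rate=3))
```

With a data rate of two thirds of the line rate, three data words occupy five line slots, two of which carry idle words.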
Abstract: A guaranteed rate port scheduler (GRPS) is used for serving multiple destination ports simultaneously without under-runs, even if the total bandwidth of the ports is more than the bandwidth capability of the device. Certain network protocols, such as Ethernet, do not allow “gaps” (called under-runs) to occur between bits of a packet on the wire. If a network device is transmitting packets to several such ports at the same time and the combined bandwidth of these ports is more than the device can source, under-runs begin to occur within the transmitted packets. The disclosed GRPS solves this problem by: (a) the GRPS serves only as many destination ports at a given time as can be “handled”, and (b) the GRPS fairly selects new destination ports to serve after every end-of-frame data packet transmission by effectively “de-rating” the statistical bandwidth of each destination port in proportion to the diminished capacity of the device.