Patents Assigned to Vivante Corporation
-
Patent number: 11200495
Abstract: A convolution neural network (CNN) model is trained and pruned at a pruning ratio. The model is then trained and pruned one or more times without constraining the model according to any previous pruning step. The pruning ratio may be increased at each iteration until a pruning target is reached. The model may then be trained again with pruned connections masked. The process of pruning, retraining, and adjusting the pruning ratio may also be repeated one or more times with a different pruning target.
Type: Grant
Filed: September 8, 2017
Date of Patent: December 14, 2021
Assignee: Vivante Corporation
Inventors: Xin Wang, Shang-Hung Lin
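A minimal sketch of the train/prune loop this abstract describes, not the patented implementation. It assumes magnitude-based pruning, and `train` is a hypothetical stand-in that merely perturbs the weights; the names `prune_mask`, `train`, `iterative_prune` and the parameters `start_ratio`, `step`, `target` are illustrative and do not come from the patent.

```python
import random

def prune_mask(weights, ratio):
    """Return a 0/1 mask that drops the `ratio` fraction of smallest-magnitude weights."""
    k = int(round(len(weights) * ratio))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:k]:
        mask[i] = 0
    return mask

def train(weights, rng, mask=None):
    """Hypothetical stand-in for a training pass: perturbs every weight.
    When a mask is supplied, pruned connections are held at zero."""
    updated = [w + rng.uniform(-0.05, 0.05) for w in weights]
    if mask is not None:
        updated = [w * m for w, m in zip(updated, mask)]
    return updated

def iterative_prune(weights, start_ratio, step, target, seed=0):
    rng = random.Random(seed)
    ratio = start_ratio
    while True:
        # Unconstrained training: the previous mask is NOT enforced,
        # so earlier-pruned weights may become non-zero again.
        weights = train(weights, rng)
        mask = prune_mask(weights, ratio)
        weights = [w * m for w, m in zip(weights, mask)]
        if ratio >= target:
            break
        ratio = min(target, ratio + step)  # raise the ratio toward the target
    # Final training pass with pruned connections masked.
    weights = train(weights, rng, mask=mask)
    return weights, mask
```

Because each intermediate mask is recomputed from scratch, a weight zeroed in one round can survive a later round if training moves it away from zero; only the final pass keeps the mask fixed.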
-
Patent number: 10585623
Abstract: A computer system includes a hardware buffer controller. Memory access requests to a buffer do not include an address within the buffer and threads accessing the buffer do not access or directly update any pointers to locations within the buffer. The memory access requests are addressed to the hardware buffer controller, which determines an address from its current state and issues a memory access command to that address. The hardware buffer controller updates its state in response to the memory access requests. The hardware buffer controller evaluates its state and outputs events to a thread scheduler in response to overflow or underflow conditions or near-overflow or near-underflow conditions. The thread scheduler may then block threads from issuing memory access requests to the hardware buffer controller. The buffer implemented may be a FIFO or other type of buffer.
Type: Grant
Filed: December 11, 2015
Date of Patent: March 10, 2020
Assignee: VIVANTE CORPORATION
Inventor: Mankit Lo
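The idea of address-free requests can be illustrated with a small software model. This is a sketch under assumptions: the class name `BufferController`, the `near` margin, and the event strings are hypothetical, and a real controller would be hardware that reports to a thread scheduler rather than appending to a Python list.

```python
class BufferController:
    """Toy model of a FIFO controller that owns the read/write pointers.
    Callers never pass an address; the controller derives it from its state."""

    def __init__(self, base, size, near=1):
        self.base = base      # first address of the buffer
        self.size = size      # number of slots
        self.near = near      # margin for near-overflow / near-underflow events
        self.head = 0         # next slot to read
        self.tail = 0         # next slot to write
        self.count = 0        # occupied slots
        self.events = []      # events that would go to the thread scheduler

    def write(self):
        """Return the address for the next write, or None on overflow."""
        if self.count == self.size:
            self.events.append("overflow")
            return None
        addr = self.base + self.tail
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        if self.count >= self.size - self.near:
            self.events.append("near_overflow")
        return addr

    def read(self):
        """Return the address for the next read, or None on underflow."""
        if self.count == 0:
            self.events.append("underflow")
            return None
        addr = self.base + self.head
        self.head = (self.head + 1) % self.size
        self.count -= 1
        if self.count <= self.near:
            self.events.append("near_underflow")
        return addr
```

A scheduler consuming `events` could stop issuing `write()` requests after a `near_overflow` event, which mirrors the blocking behaviour the abstract mentions.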
-
Patent number: 10242311
Abstract: A convolution engine, such as a convolution neural network, operates efficiently with respect to sparse kernels by implementing zero skipping. An input tile is loaded and accumulated sums are calculated for the input tile for non-zero coefficients by shifting the tile according to a row and column index of the coefficient in the kernel. Each coefficient is applied individually to the tile and the result written to an accumulation buffer before moving to the next non-zero coefficient. A 3D or 4D convolution may be implemented in this manner with separate regions of the accumulation buffer storing accumulated sums for different indexes along one dimension. Images are completely processed and results for each image are stored in the accumulation buffer before moving to the next image.
Type: Grant
Filed: August 8, 2017
Date of Patent: March 26, 2019
Assignee: VIVANTE CORPORATION
Inventor: Mankit Lo
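Zero skipping as described here can be sketched in a few lines for the 2D case. The function name `conv2d_zero_skip` is illustrative, the code assumes a "valid" (no padding) CNN-style convolution, and it models only the coefficient-by-coefficient accumulation, not the hardware tiling or the 3D/4D buffer layout.

```python
def conv2d_zero_skip(tile, kernel):
    """Accumulate one kernel coefficient at a time, skipping zero coefficients."""
    h, w = len(tile), len(tile[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    acc = [[0] * out_w for _ in range(out_h)]   # accumulation buffer
    for r in range(kh):
        for c in range(kw):
            coeff = kernel[r][c]
            if coeff == 0:
                continue                        # zero skipping: no work for this coefficient
            # Apply this single coefficient to the tile shifted by (r, c).
            for y in range(out_h):
                for x in range(out_w):
                    acc[y][x] += coeff * tile[y + r][x + c]
    return acc
```

The cost scales with the number of non-zero coefficients rather than with the kernel size, which is why the approach pays off for sparse (for example, pruned) kernels.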
-
Patent number: 9977619
Abstract: A computer system processes instructions including an instruction code, source type, source address, destination type, and destination address. The source and destination type may indicate a memory device in which case data is read from the memory device at the source address and written to the destination address. One or both of the source type and destination type may include a transfer descriptor flag, in which case a transfer descriptor identified by the source or destination address is executed. A transfer descriptor referenced by a source address may be executed to obtain an intermediate result that is used for performing the operation indicated by the instruction code. The transfer descriptor referenced by a destination address may be executed to determine a location at which the result of the operation will be stored.
Type: Grant
Filed: November 6, 2015
Date of Patent: May 22, 2018
Assignee: Vivante Corporation
Inventor: Mankit Lo
-
Patent number: 9928117
Abstract: A computer system includes a hardware synchronization component (HSC). Multiple concurrent threads of execution issue instructions to update the state of the HSC. Multiple threads may update the state in the same clock cycle and a thread does not need to receive control of the HSC prior to updating its state. Instructions referencing the state received during the same clock cycle are aggregated and the state is updated according to the number of the instructions. The state is evaluated with respect to a threshold condition. If it is met, then the HSC outputs an event to a processor. The processor then identifies a thread impacted by the event and takes a predetermined action based on the event (e.g. blocking, branching, unblocking of the thread).
Type: Grant
Filed: December 11, 2015
Date of Patent: March 27, 2018
Assignee: Vivante Corporation
Inventor: Mankit Lo
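The aggregation of same-cycle updates can be modelled roughly as a counter that is advanced once per clock by the number of instructions received. This is only a sketch: `SyncComponent`, `clock`, the event string, and the reset-on-threshold behaviour are assumptions made for illustration and are not stated in the abstract.

```python
class SyncComponent:
    """Toy model of a synchronization counter updated without locking."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.state = 0

    def clock(self, updates):
        """`updates` lists the thread ids that issued an update this cycle.
        All of them are aggregated into a single state change."""
        self.state += len(updates)
        if self.state >= self.threshold:
            self.state = 0            # assumption: state resets once the event fires
            return "threshold_met"    # event that would be sent to the processor
        return None
```

Used as a barrier for four threads, `clock([0, 1, 2])` returns `None` and a later `clock([3])` returns the event, at which point the processor could unblock the waiting threads.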
-
Patent number: 9875084
Abstract: A circuit is disclosed that uses a four element dot product circuit (DP4) to approximate an argument t=x/pi for an input x. The argument is then input to a trigonometric function such as SinPi() or CosPi(). The DP4 circuit calculates x times a representation of the reciprocal of pi. The bits of the reciprocal of pi that are used are selected based on the magnitude of the exponent of x. The DP4 circuit includes four multipliers, two intermediate adders, and a final adder. The outputs of the multipliers, intermediate adders, and final adder are adjusted such that the output of the final adder is a value of the argument t that will provide an accurate output when input to the trigonometric function.
Type: Grant
Filed: April 28, 2016
Date of Patent: January 23, 2018
Assignee: Vivante Corporation
Inventors: Lefan Zhong, Guosong Li, Zhenyu Wang, Rui Zhao
-
Patent number: 9703530
Abstract: Mathematical functions are computed in a single pipeline performing a polynomial approximation (e.g. a quadratic approximation, or the like) using data tables for RCP, SQRT, EXP or LOG, selected according to opcodes. SIN and COS are also computed using the pipeline according to the approximation ((-1)^IntX)*Sin(π*Min(FracX, 1.0-FracX))/Min(FracX, 1.0-FracX). A pipeline portion approximates Sin(π*FracX) using tables and interpolation and a subsequent stage multiplies this approximation by FracX. For input arguments x close to 1.0, LOG2(x-1)/(x-1) is computed using a first pipeline portion using tables and interpolation and subsequently multiplied by (x-1). A DIV operation may also be performed with input arguments scaled up to avoid underflow as needed. Inverse trigonometric functions may be calculated using a pre-processing stage and post processing stage in order to obtain multiple inverse trigonometric functions from a single pipeline.
Type: Grant
Filed: April 7, 2015
Date of Patent: July 11, 2017
Assignee: Vivante Corporation
Inventors: Lefan Zhong, Wei-Lun Kao
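The SIN range reduction in this abstract relies on the identity sin(π·x) = (-1)^IntX · sin(π·m), where m = Min(FracX, 1.0-FracX). The sketch below shows that reduction only. The function name `sin_pi` is illustrative, and `math.sin` stands in for the table-plus-interpolation stage, which is an assumption and not how the hardware evaluates the ratio.

```python
import math

def sin_pi(x):
    """Compute sin(pi * x) by reducing the argument to m in [0, 0.5]."""
    int_x = math.floor(x)
    frac_x = x - int_x                      # FracX in [0, 1)
    m = min(frac_x, 1.0 - frac_x)           # reduced argument
    sign = -1.0 if int_x % 2 else 1.0       # (-1)^IntX
    if m == 0.0:
        return 0.0                          # sin(pi * integer) is exactly zero
    # First stage: the ratio sin(pi*m)/m (stand-in for tables + interpolation).
    ratio = math.sin(math.pi * m) / m
    # Subsequent stage: multiply the ratio by the reduced argument.
    return sign * ratio * m
```

Computing the well-behaved ratio first and multiplying by the small argument afterwards keeps relative accuracy near zero, which is the usual motivation for splitting the evaluation into two stages.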
-
Patent number: 9600236
Abstract: Mathematical functions are computed in a single pipeline that performs a polynomial approximation (e.g. a quadratic approximation, or the like). One or more data tables corresponding to at least one of the RCP, SQRT, EXP or LOG functions are operable to be coupled to the single pipeline according to one or more opcodes, so that the single pipeline computes at least one of the RCP, SQRT, EXP or LOG functions according to those opcodes. SIN and COS are also computed using the pipeline according to the approximation ((-1)^IntX)*Sin(π*Min(FracX, 1.0-FracX))/Min(FracX, 1.0-FracX). A pipeline portion approximates Sin(π*FracX) using tables and interpolation and a subsequent stage multiplies this approximation by FracX. For input arguments x close to 1.0, LOG2(x-1)/(x-1) is computed using a first pipeline portion using tables and interpolation and subsequently multiplied by (x-1). A DIV operation may also be performed with input arguments scaled up to avoid underflow as needed.
Type: Grant
Filed: September 15, 2014
Date of Patent: March 21, 2017
Assignee: VIVANTE CORPORATION
Inventors: Mike M. Cai, Lefan Zhong
-
Patent number: 9460525
Abstract: Systems and methods for tile-based compression are disclosed. Image data, such as a frame, may be divided into tiles. The tiles may be sized based on a size of a line buffer. Tiles are compressed and decompressed individually. As portions of the image frame are updated, corresponding updated tiles may be compressed and stored. Likewise, as tiles are accessed they may be decompressed and streamed to a requesting device. In some embodiments, a decoder operable to decompress tiles may be interposed between a memory device and a requesting device. Data encoding one or more compressed tiles may be grouped to enable decompression at a rate of four pixels per clock cycle. Methods for compressing image data including both RGB and RGBα components are disclosed.
Type: Grant
Filed: June 17, 2013
Date of Patent: October 4, 2016
Assignee: Vivante Corporation
Inventors: Lefan Zhong, Halim Theny, Huiming Zhang
-
Patent number: 9349213
Abstract: A system for blending includes a memory device, cache, cache controller, and a graphics processing device. The graphics processing device performs blending of a plurality of source images into a single destination image. The graphics processing device performs a method including, for each tile position in the plurality of source images, requesting tiles for the tile position from each source image, blending the tiles individually with a destination tile and overwriting the destination tile in the cache with the result of the blending after each individual blending. The destination tile may be written to memory after each source tile for each tile position has been blended with the destination tile, such as in response to a cache controller determining that the destination tile is a least recently used (LRU) entry in the cache.
Type: Grant
Filed: September 9, 2013
Date of Patent: May 24, 2016
Assignee: VIVANTE CORPORATION
Inventors: Haomin Wu, Frido Garritsen
-
Patent number: 9077313
Abstract: Disclosed are new approaches to multi-dimensional filtering with a reduced number of memory reads and writes. In one embodiment, a filter includes first and second coefficients. A block of data having width and height each equal to the number of one of the first or second coefficients is read from a memory device. Arrays of values from the block are filtered using the first filter coefficients and the results filtered using the second coefficients. The final result may be optionally blended with another data value and written to a memory device. Registers store results of filtering with the first coefficients. The block of data may be read from a location including a source coordinate. The final result of filtering may be written to a destination coordinate obtained by rotating and/or mirroring the source coordinate. The orientation of arrays filtered using the first coefficients varies according to a rotation mode.
Type: Grant
Filed: October 14, 2011
Date of Patent: July 7, 2015
Assignee: VIVANTE CORPORATION
Inventors: Mike M. Cai, Huiming Zhang
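The two-pass (separable) filtering of a single block can be sketched as follows. The function name `filter_block` is illustrative, the sketch assumes the first coefficients run along rows and the second along columns, and it leaves out the optional blending, rotation, and mirroring steps.

```python
def filter_block(block, first, second):
    """Filter one block: rows with `first`, then the row results with `second`."""
    if len(block) != len(second) or any(len(row) != len(first) for row in block):
        raise ValueError("block must be len(second) rows by len(first) columns")
    # First pass: one intermediate value per row, held in "registers"
    # so that nothing is written back to memory between passes.
    registers = [sum(c * v for c, v in zip(first, row)) for row in block]
    # Second pass: combine the intermediate values into one output sample.
    return sum(c * r for c, r in zip(second, registers))
```

Each output sample costs one block read and one write, because the intermediate row results never leave the registers.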
-
Publication number: 20150070393
Abstract: A system for blending is disclosed including a memory device, cache, cache controller, and a graphics processing device. The graphics processing device performs blending of a plurality of source images into a single destination image. The graphics processing device performs a method including, for each tile position in the plurality of source images, requesting tiles for the tile position from each source image, blending the tiles individually with a destination tile and overwriting the destination tile in the cache with the result of the blending after each individual blending. The destination tile may be written to memory after each source tile for each tile position has been blended with the destination tile, such as in response to a cache controller determining that the destination tile is a least recently used (LRU) entry in the cache.
Type: Application
Filed: September 9, 2013
Publication date: March 12, 2015
Applicant: Vivante Corporation
Inventors: Haomin Wu, Frido Garritsen
-
Patent number: 8907964
Abstract: A system to process a plurality of vertices to model an object. An embodiment of the system includes a processor, a front end unit coupled to the processor, and cache configuration logic coupled to the front end unit and the processor. The processor is configured to process the plurality of vertices. The front end unit is configured to communicate vertex data to the processor. The cache configuration logic is configured to establish a cache line size of a vertex cache based on a vertex size of a drawing command.
Type: Grant
Filed: April 10, 2007
Date of Patent: December 9, 2014
Assignee: Vivante Corporation
Inventors: Keith Lee, Mike M. Cai
-
Patent number: 8553046
Abstract: An apparatus and method for detecting and handling thin lines in a raster image includes reading depth values for each pixel of an n×m block of pixels surrounding a substantially central pixel. Differences are then calculated for selected depth values of the n×m block of pixels to yield multiple difference values. These difference values may then be compared with multiple pre-computed difference values associated with thin lines pre-determined to pass through the n×m block of pixels. If the difference values of the pixel block substantially match the difference values of one of the pre-determined thin lines, the pixel block may be deemed to describe a thin line. The apparatus and method may preclude application of an anti-aliasing filter to the substantially central pixel of the pixel block in the event it describes a thin line.
Type: Grant
Filed: November 9, 2007
Date of Patent: October 8, 2013
Assignee: Vivante Corporation
Inventors: Lefan Zhong, Abdulkadir Utku Diril
-
Patent number: 8554008
Abstract: A system to reduce aliasing in a graphical image includes an edge detector configured to read image depth information from a depth buffer. The edge detector also applies edge detection procedures to detect an object edge within the image. An edge style detector is configured to identify a first edge end and a second edge end. The edge style detector also identifies an edge style associated with the detected edge based on the first edge end and the second edge end. The system also includes a restoration module configured to identify pixel data associated with the detected edge and a blending module configured to blend the pixel data associated with the detected edge.
Type: Grant
Filed: April 13, 2010
Date of Patent: October 8, 2013
Assignee: Vivante Corporation
Inventors: Lefan Zhong, Mike M. Cai
-
Patent number: 8487948
Abstract: A graphic processing system to compute a texture level of detail. An embodiment of the graphic processing system includes a memory device, a driver, and level of detail computation logic. The memory device is configured to implement a first lookup table. The first lookup table is configured to provide a first level of detail component. The driver is configured to calculate a log value of a second level of detail component. The level of detail computation logic is coupled to the memory device and the driver. The level of detail computation logic is configured to compute a level of detail for a texture mapping operation based on the first level of detail component from the lookup table and the second level of detail component from the driver. Embodiments of the graphic processing system facilitate a simple hardware implementation using operations other than multiplication, square, and square root operations.
Type: Grant
Filed: December 21, 2011
Date of Patent: July 16, 2013
Assignee: Vivante Corporation
Inventors: Mike M. Cai, Jean-Didier Allegrucci, Anthony Ya-Nai Tai
-
Publication number: 20130097212
Abstract: Disclosed are new approaches to multi-dimensional filtering with a reduced number of memory reads and writes. In one embodiment, a filter includes first and second coefficients. A block of data having width and height each equal to the number of one of the first or second coefficients is read from a memory device. Arrays of values from the block are filtered using the first filter coefficients and the results filtered using the second coefficients. The final result may be optionally blended with another data value and written to a memory device. Registers store results of filtering with the first coefficients. The block of data may be read from a location including a source coordinate. The final result of filtering may be written to a destination coordinate obtained by rotating and/or mirroring the source coordinate. The orientation of arrays filtered using the first coefficients varies according to a rotation mode.
Type: Application
Filed: October 14, 2011
Publication date: April 18, 2013
Applicant: Vivante Corporation
Inventors: Mike M. Cai, Huiming Zhang
-
Publication number: 20130091189
Abstract: Methods and apparatus are provided for computing mathematical functions, comprising a single pipeline for performing a polynomial approximation (e.g. a quadratic polynomial approximation, or the like); and one or more data tables corresponding to at least one of the RCP, SQRT, EXP or LOG functions operable to be coupled to the single pipeline according to one or more opcodes; wherein the single pipeline is operable for computing at least one of RCP, SQRT, EXP or LOG functions according to the one or more opcodes.
Type: Application
Filed: November 30, 2012
Publication date: April 11, 2013
Applicant: Vivante Corporation
Inventor: Vivante Corporation
-
Patent number: 8416241
Abstract: An apparatus and method for rasterizing a primitive in a graphics system is disclosed in one example of the invention as including scanning a first row of tiles, one tile at a time, starting from a first point and scanning in a first direction. Immediately after scanning the first row of tiles, the method includes moving from the first point to a second point in an orthogonal direction relative to the first row. Immediately after moving from the first point to the second point, the method includes scanning a second row of tiles, one tile at a time, starting from the second point and scanning in the first direction. By scanning rows in the same direction immediately prior to and after moving from one row to another, cache utilization is improved.
Type: Grant
Filed: July 21, 2011
Date of Patent: April 9, 2013
Assignee: Vivante Corporation
Inventors: Abdulkadir Utku Diril, Frido Garritsen
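The traversal order in this abstract amounts to scanning every row of tiles in the same direction, stepping orthogonally from the row's starting point between rows. A minimal sketch, assuming integer tile coordinates and a rectangular tile region; the name `scan_tiles` and its parameters are illustrative only.

```python
def scan_tiles(x0, y0, cols, rows):
    """Yield tile coordinates row by row, always scanning in the same direction."""
    for j in range(rows):
        # Each row restarts from the point directly below the previous row's start.
        for i in range(cols):
            yield (x0 + i, y0 + j)
```

For a region three tiles wide and two tiles high, `list(scan_tiles(0, 0, 3, 2))` visits the first row left to right and then the second row left to right, in contrast to a serpentine order that would reverse direction on alternate rows.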
-
Publication number: 20130002651
Abstract: A graphic processing system to compute a texture level of detail. An embodiment of the graphic processing system includes a memory device, a driver, and level of detail computation logic. The memory device is configured to implement a first lookup table. The first lookup table is configured to provide a first level of detail component. The driver is configured to calculate a log value of a second level of detail component. The level of detail computation logic is coupled to the memory device and the driver. The level of detail computation logic is configured to compute a level of detail for a texture mapping operation based on the first level of detail component from the lookup table and the second level of detail component from the driver. Embodiments of the graphic processing system facilitate a simple hardware implementation using operations other than multiplication, square, and square root operations.
Type: Application
Filed: December 21, 2011
Publication date: January 3, 2013
Applicant: Vivante Corporation
Inventors: Mike M. Cai, Jean-Didier Allegrucci, Anthony Ya-Nai Tai