Patents by Inventor Ari W. Mozes

Ari W. Mozes has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

System and method for partitioning models in a database

Patent number: 11176480

Abstract: Systems, methods, and other embodiments are disclosed for partitioning models in a database. In one embodiment, a set of training data is parsed into multiple data partitions based on partition keys, where the data partitions are identified by the partition keys and are used for training data mining models. The multiple data partitions are analyzed to generate partition metrics data. Algorithm data, identifying at least one algorithm for processing the multiple data partitions, and resources data, identifying available modeling resources for processing the multiple data partitions, are read. The partition metrics data, the algorithm data, and the resources data are processed to generate an organization data structure. The organization data structure is configured to control distribution and processing of the multiple data partitions across the available modeling resources to generate a composite model object that includes a separately trained data mining model for each partition of the multiple partitions.

Type: Grant

Filed: August 2, 2016

Date of Patent: November 16, 2021

Assignee: Oracle International Corporation

Inventors: Ari W. Mozes, Boriana L. Milenova, Marcos M. Campos, Mark A. McCracken, Gayathri P. Ayyappan
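The partitioning flow described in the abstract above can be illustrated with a minimal sketch. It is not the patented implementation: the row-count metric, the greedy largest-first worker assignment, the per-partition "model" (a simple mean), and all function and field names are assumptions made for illustration.

```python
import heapq
from collections import defaultdict
from statistics import mean


def split_by_key(rows, key_field):
    """Parse training rows into partitions identified by a partition key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key_field]].append(row)
    return dict(partitions)


def plan_partitions(partition_sizes, n_workers):
    """Assumed policy: give the largest remaining partition to the least-loaded worker."""
    heap = [(0, w) for w in range(n_workers)]  # (rows assigned so far, worker id)
    plan = {w: [] for w in range(n_workers)}
    for key, size in sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)
        plan[worker].append(key)
        heapq.heappush(heap, (load + size, worker))
    return plan


def train_composite(partitions, target_field):
    """Train one stand-in model (the mean target) per partition."""
    return {key: mean(r[target_field] for r in part) for key, part in partitions.items()}


rows = [
    {"region": "east", "sales": 10.0},
    {"region": "east", "sales": 14.0},
    {"region": "west", "sales": 3.0},
]
partitions = split_by_key(rows, "region")
metrics = {key: len(part) for key, part in partitions.items()}  # partition metrics
plan = plan_partitions(metrics, n_workers=2)
composite = train_composite(partitions, "sales")
```

Here `plan` plays the role of the organization data structure and `composite` the role of the composite model object, with one separately trained sub-model per partition key.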

System and method providing association rule aggregates

Patent number: 10885047

Abstract: Systems, methods, and other embodiments are disclosed for performing data mining. In one embodiment, transaction records are read one at a time. Each transaction record represents a transaction for at least one item and includes an item identifier and a metric value for the item. The number-of-occurrences of at least one candidate item set in the transaction records are counted to generate a total count for the candidate item set. The candidate item set includes one or more items. As the counting proceeds, at least one aggregate metric value associated with the candidate item set is accumulated by summing metric values across the number-of-occurrences for each item represented in the candidate item set. A determination is made as to whether the candidate item set is a frequent item set in the transaction records by comparing the total count to a threshold value.

Type: Grant

Filed: July 1, 2016

Date of Patent: January 5, 2021

Assignee: Oracle International Corporation

Inventors: Ari W. Mozes, Dongfang Bai
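The counting step described in the abstract above can be sketched as follows. This is an illustrative reading, not the patented method: the transaction layout (a list of `(item_id, metric_value)` pairs), the function name, and the sample data are assumptions.

```python
def count_with_aggregates(transactions, candidates, min_count):
    """Count candidate item sets and sum their items' metric values in one pass."""
    counts = {c: 0 for c in candidates}
    totals = {c: 0.0 for c in candidates}
    for transaction in transactions:  # records are read one at a time
        metrics = dict(transaction)   # item id -> metric value for this record
        for cand in candidates:
            if cand.issubset(metrics.keys()):
                counts[cand] += 1
                # Accumulate the aggregate while counting, not in a second pass.
                totals[cand] += sum(metrics[item] for item in cand)
    # A candidate is frequent when its total count reaches the threshold.
    return {c: (counts[c], totals[c]) for c in candidates if counts[c] >= min_count}


transactions = [
    [("bread", 2.0), ("milk", 1.5)],
    [("bread", 2.0), ("eggs", 3.0)],
    [("bread", 2.5), ("milk", 1.5), ("eggs", 3.0)],
]
candidates = [frozenset({"bread", "milk"}), frozenset({"milk", "eggs"})]
frequent = count_with_aggregates(transactions, candidates, min_count=2)
```

With this data only the bread-and-milk set meets the threshold, appearing in two transactions with a summed metric of 7.5.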

System and method providing association rule aggregates

Publication number: 20180004816

Abstract: Systems, methods, and other embodiments are disclosed for performing data mining. In one embodiment, transaction records are read one at a time. Each transaction record represents a transaction for at least one item and includes an item identifier and a metric value for the item. The number-of-occurrences of at least one candidate item set in the transaction records are counted to generate a total count for the candidate item set. The candidate item set includes one or more items. As the counting proceeds, at least one aggregate metric value associated with the candidate item set is accumulated by summing metric values across the number-of-occurrences for each item represented in the candidate item set. A determination is made as to whether the candidate item set is a frequent item set in the transaction records by comparing the total count to a threshold value.

Type: Application

Filed: July 1, 2016

Publication date: January 4, 2018

Inventors: Ari W. Mozes, Dongfang Bai

System and method for partitioning models in a database

Publication number: 20170308809

Abstract: Systems, methods, and other embodiments are disclosed for partitioning models in a database. In one embodiment, a set of training data is parsed into multiple data partitions based on partition keys, where the data partitions are identified by the partition keys and are used for training data mining models. The multiple data partitions are analyzed to generate partition metrics data. Algorithm data, identifying at least one algorithm for processing the multiple data partitions, and resources data, identifying available modeling resources for processing the multiple data partitions, are read. The partition metrics data, the algorithm data, and the resources data are processed to generate an organization data structure. The organization data structure is configured to control distribution and processing of the multiple data partitions across the available modeling resources to generate a composite model object that includes a separately trained data mining model for each partition of the multiple partitions.

Type: Application

Filed: August 2, 2016

Publication date: October 26, 2017

Inventors: Ari W. Mozes, Boriana L. Milenova, Marcos M. Campos, Mark A. McCracken, Gayathri P. Ayyappan

System and method for building decision trees in a database

Patent number: 9135309

Abstract: A computer-implemented method of creating a data mining model in a database management system comprises accepting a database language statement at the database management system, the database language statement indicating a dataset and a data mining model to be created from the dataset, and creating, in the database management system, the indicated data mining model using the indicated dataset, wherein creation and application of the data mining model does not require moving data to a separate data mining engine.

Type: Grant

Filed: November 18, 2011

Date of Patent: September 15, 2015

Assignee: Oracle International Corporation

Inventors: Wei Li, Shiby Thomas, Joseph Yarmus, Ari W. Mozes, Mahesh Jagannath

Binning predictors using per-predictor trees and MDL pruning

Patent number: 8280915

Abstract: Binning of predictor values used for generating a data mining model provides useful reduction in memory footprint and computation during the computationally dominant decision tree build phase, but reduces the information loss of the model and reduces the introduction of false information artifacts. A method of binning data in a database for data mining modeling in a database system, the data stored in a database table in the database system, the data mining modeling having selected at least one predictor and one target for the data, the data including a plurality of values of the predictor and a plurality of values of the target, the method comprises constructing a binary tree for the predictor that splits the values of the predictor into a plurality of portions, pruning the binary tree, and defining as bins of the predictor leaves of the tree that remain after pruning, each leaf of the tree representing a portion of the values of the predictor.

Type: Grant

Filed: February 1, 2006

Date of Patent: October 2, 2012

Assignee: Oracle International Corporation

Inventors: Mahesh Jagannath, Chitra Bhagwat, Joseph Yarmus, Ari W. Mozes

System and method for building decision trees in a database

Publication number: 20120066260

Abstract: A computer-implemented method of creating a data mining model in a database management system comprises accepting a database language statement at the database management system, the database language statement indicating a dataset and a data mining model to be created from the dataset, and creating, in the database management system, the indicated data mining model using the indicated dataset, wherein creation and application of the data mining model does not require moving data to a separate data mining engine.

Type: Application

Filed: November 18, 2011

Publication date: March 15, 2012

Applicant: ORACLE INTERNATIONAL CORPORATION

Inventors: Wei Li, Shiby Thomas, Joseph Yarmus, Ari W. Mozes, Mahesh Jagannath

System and method for building decision trees in a database

Patent number: 8065326

Abstract: Decision trees are efficiently represented in a relational database. A computer-implemented method of representing a decision tree model in relational form comprises providing a directed acyclic graph comprising a plurality of nodes and a plurality of links, each link connecting a plurality of nodes, encoding a tree structure by including in each node a parent-child relationship of the node with other nodes, encoding in each node information relating to a split represented by the node, the split information including a splitting predictor and a split value, and encoding in each node a target histogram.

Type: Grant

Filed: February 1, 2006

Date of Patent: November 22, 2011

Assignee: Oracle International Corporation

Inventors: Wei Li, Shiby Thomas, Joseph Yarmus, Ari W. Mozes, Mahesh Jagannath

Frequent itemset counting using subsets of bitmaps

Patent number: 7756853

Abstract: A method and mechanism for performing improved frequent itemset operations is provided. A set of item groups are divided into a plurality of subsets. Each item group is composed of a set of data items. Possible combinations of data items that may frequently appear together in the same item group are referred to as candidate combinations. Candidate combinations comprising a first set of data items are identified, and thereafter the occurrence of each candidate combination in any item group in each subset is counted by comparing item bitmaps, associated with items in the candidate combination, in each subset in turn. The comparison of item bitmaps is performed in volatile memory. A total frequent itemset count that describes the frequency of candidate combinations in item groups across all subsets is obtained. Thereafter, the total frequent itemset count for candidate combinations having a larger number of data items may be determined.

Type: Grant

Filed: August 27, 2004

Date of Patent: July 13, 2010

Assignee: Oracle International Corporation

Inventors: Wei Li, Ari W. Mozes, Hakan Jakobsson
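The subset-by-subset bitmap comparison described in the abstract above can be sketched as follows. This is a minimal illustration, not the patented mechanism: Python integers stand in for bitmaps, and the function names, the fixed `subset_size`, and the sample data are assumptions.

```python
def build_bitmaps(groups):
    """Map each item to a bitmap with bit i set if the item occurs in groups[i]."""
    bitmaps = {}
    for pos, group in enumerate(groups):
        for item in group:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << pos)
    return bitmaps


def count_candidates(item_groups, candidates, subset_size):
    """Count each candidate combination one subset of item groups at a time."""
    totals = {cand: 0 for cand in candidates}
    for start in range(0, len(item_groups), subset_size):
        subset = item_groups[start:start + subset_size]
        bitmaps = build_bitmaps(subset)      # only this subset's bitmaps are held
        all_rows = (1 << len(subset)) - 1
        for cand in candidates:
            combined = all_rows
            for item in cand:
                combined &= bitmaps.get(item, 0)  # AND the item bitmaps together
            totals[cand] += bin(combined).count("1")
    return totals


item_groups = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}, {"a", "b"}]
totals = count_candidates(item_groups, [("a", "b"), ("b", "c")], subset_size=2)
```

Each surviving bit in `combined` marks an item group in the current subset that contains every item of the candidate, so the per-subset bit counts add up to the total across all subsets.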

System and method for building decision tree classifiers using bitmap techniques

Patent number: 7571159

Abstract: A method, system, and computer program product for counting predictor-target pairs for a decision tree model provides the capability to generate count tables that is quicker and more efficient than previous techniques. A method of counting predictor-target pairs for a decision tree model, the decision tree model based on data stored in a database, the data comprising a plurality of rows of data, at least one predictor and at least one target, comprises generating a bitmap for each split node of data stored in a database system by intersecting a parent node bitmap and a bitmap of a predictor that satisfies a condition of the node, intersecting each split node bitmap with each predictor bitmap and with each target bitmap to form intersected bitmaps, and counting bits of each intersected bitmap to generate a count of predictor-target pairs.

Type: Grant

Filed: February 1, 2006

Date of Patent: August 4, 2009

Assignee: Oracle International Corporation

Inventors: Shiby Thomas, Wei Li, Joseph Yarmus, Mahesh Jagannath, Ari W. Mozes
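The bitmap intersections described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the patented method: Python integers stand in for bitmaps, and the column names and data are invented for the example.

```python
def value_bitmaps(column):
    """Build one bitmap per distinct value; bit i is set if row i holds that value."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def popcount(bits):
    """Count the set bits in a bitmap."""
    return bin(bits).count("1")


# Hypothetical training data: two predictors and one target, five rows.
outlook = ["sunny", "sunny", "rain", "rain", "sunny"]
windy = ["yes", "no", "yes", "no", "no"]
play = ["no", "yes", "no", "yes", "yes"]

root = (1 << len(play)) - 1  # root node bitmap covers every row
outlook_bm = value_bitmaps(outlook)
windy_bm = value_bitmaps(windy)
play_bm = value_bitmaps(play)

# Split node bitmap: parent bitmap intersected with the predictor condition.
node = root & outlook_bm["sunny"]

# Predictor-target pair counts for the rows that reach this node.
counts = {
    (w, p): popcount(node & w_bits & p_bits)
    for w, w_bits in windy_bm.items()
    for p, p_bits in play_bm.items()
}
```

For the `sunny` node (rows 0, 1, and 4), `counts` holds one entry per (windy value, play value) pair, which is the count table a tree builder would use to evaluate the next split.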

System load based adaptive prefetch

Patent number: 7359890

Abstract: A number, of the blocks of data to be prefetched into a buffer cache, is determined dynamically at run time (e.g. during execution of a query), based at least in part on the load placed on the buffer cache. An application program (such as a database) is responsive to the number (also called “prefetch size”), to determine the amount of prefetching. A sequence of instructions (also called “prefetch size daemon”) computes the prefetch size based on, for example, the number of prefetched blocks aged out before use. The prefetch size daemon dynamically revises the prefetch size based on usage of the buffer cache, thereby to form a feedback loop. Depending on the embodiment, at times of excessive use of the buffer cache, prefetching may even be turned off.

Type: Grant

Filed: May 8, 2002

Date of Patent: April 15, 2008

Assignee: Oracle International Corporation

Inventors: Chi Ku, Arvind Nithrakashyap, Ari W. Mozes
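The feedback loop described in the abstract above can be sketched as follows. The abstract does not give the policy, so everything concrete here is an assumption: the halving and doubling rule, the 10% waste limit, the maximum size, and the function name.

```python
def revise_prefetch_size(current, prefetched, aged_out_unused,
                         max_size=64, waste_limit=0.10):
    """Return a new prefetch size from the last interval's buffer cache usage.

    current          -- prefetch size (in blocks) used during the interval
    prefetched       -- blocks prefetched during the interval
    aged_out_unused  -- prefetched blocks evicted before they were read
    """
    if prefetched == 0:
        # No evidence either way; assumed behavior is to probe with one block.
        return max(1, current)
    wasted = aged_out_unused / prefetched
    if wasted > waste_limit:
        # Cache is under pressure: back off, possibly down to 0 (prefetch off).
        return current // 2
    # Prefetched blocks are being used: grow, up to the assumed ceiling.
    return min(max_size, max(1, current * 2))
```

A daemon in this style would call `revise_prefetch_size` periodically and publish the result for queries to read, so the prefetch size tracks the load on the buffer cache.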

Database index validation mechanism

Patent number: 7272589

Abstract: A method evaluates a plurality of candidate index sets for a workload of database statements in a database system by first generating baseline statistics for each statement in the workload. An index superset is formed by combining an existing or current index set and a proposed index set. A candidate index set is derived from the index superset, the candidate index being one of the plurality of candidate index sets. Statistics for a statement are generated by first creating an execution plan which represents an efficient series of steps for executing the statement given the candidate index set. The execution plan is evaluated, and statistics based on the evaluation of the execution plan are generated and recorded. The cost of the execution plan is then determined and statistics are generated. Statistics for each candidate index set are rolled up and presented to a user or an index tuning mechanism.

Type: Grant

Filed: November 1, 2000

Date of Patent: September 18, 2007

Assignee: Oracle International Corporation

Inventors: Todd P. Guay, Gregory S. Smith, Ari W. Mozes, Gaylen D. Royal

Method and system for sample size determination for database optimizers

Publication number: 20040193629

Abstract: A system and method for determining an adequate sample size for statistics collection is disclosed. A mechanism for automatically determining an adequate sample size for both statistics and histograms is provided. The sample size determination is accomplished via an iterative approach where the process starts with a small sample, and for each attribute which may need more data, the sample size is increased while restricting the information collected to only those attributes that require the larger sample.

Type: Application

Filed: April 6, 2004

Publication date: September 30, 2004

Applicant: ORACLE INTERNATIONAL CORPORATION

Inventor: Ari W. Mozes
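The iterative approach described in the abstract above can be sketched as follows. The abstract does not say how adequacy is judged, so this sketch makes several assumptions: a sample is adequate for a column once it holds `min_non_null` non-null values, the sample grows by a fixed factor, and a prefix of the rows stands in for a random sample to keep the example deterministic.

```python
def choose_sample_sizes(rows, columns, start=50, growth=4, min_non_null=20):
    """Grow the sample only for the columns that still need more data."""
    pending = set(columns)
    chosen = {}
    size = start
    while pending:
        size = min(size, len(rows))
        sample = rows[:size]  # stand-in for drawing a random sample of this size
        for col in sorted(pending):  # statistics are gathered for pending columns only
            non_null = sum(1 for row in sample if row[col] is not None)
            if non_null >= min_non_null or size == len(rows):
                chosen[col] = size
                pending.discard(col)
        size *= growth
    return chosen


# Column "a" is always populated; column "b" is populated in every tenth row.
rows = [{"a": i, "b": i if i % 10 == 0 else None} for i in range(1000)]
sizes = choose_sample_sizes(rows, ["a", "b"])
```

The dense column is settled by the first small sample, and only the sparse column is re-examined with the larger one.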

Method and system for sample size determination for database optimizers

Patent number: 6732085

Abstract: A system and method for determining an adequate sample size for statistics collection is disclosed. A mechanism for automatically determining an adequate sample size for both statistics and histograms is provided. The sample size determination is accomplished via an iterative approach where the process starts with a small sample, and for each attribute which may need more data, the sample size is increased while restricting the information collected to only those attributes that require the larger sample.

Type: Grant

Filed: May 31, 2001

Date of Patent: May 4, 2004

Assignee: Oracle International Corporation

Inventor: Ari W. Mozes

Systems and methods for queuing data

Publication number: 20040064430

Abstract: A container object data structure for storing metadata associated with multiple queues is provided for processing data elements in first-in, first-out fashion. In one embodiment, the container object is implemented in a database environment providing statement syntax for creating data objects, such as tables and views, to implement user schema. Queue metadata can comprise one or more pointers for data element access and control during one or more queue operations, such as an enqueue, dequeue, or update operation.

Type: Application

Filed: September 27, 2002

Publication date: April 1, 2004

Inventors: Jonathan D. Klein, Amit Ganesh, Chi Young Ku, Ari W. Mozes

Method and system for histogram determination in a database

Patent number: 6691099

Abstract: A method and system for determining when to collect, save, and/or utilize histograms is disclosed. A mechanism for automatically deciding when to collect histograms upon request from the user is provided. The histogram collection decision is based on the columns the user is interested in, the role these columns play in the queries as submitted to the system, and the underlying distribution for these columns, e.g., as seen in a random sample. The user specifies which columns are of interest, and the database is configured to collect column usage information that describes how each column is being used in the workload. This column usage information could be stored in memory and periodically flushed to disk. Given a set of potential columns, the distribution of those columns is viewed in combination with the usage information to determine which columns should have histograms.

Type: Grant

Filed: May 31, 2001

Date of Patent: February 10, 2004

Assignee: Oracle International Corporation

Inventor: Ari W. Mozes
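The histogram decision described in the abstract above combines column usage with the distribution seen in a sample. A minimal sketch follows. The specific skew test (most frequent value more than `skew_ratio` times the uniform expectation), the function name, and the sample data are assumptions, not details taken from the patent.

```python
from collections import Counter


def needs_histogram(sample_values, used_in_predicates, skew_ratio=2.0):
    """Suggest a histogram for a column that is both used in queries and skewed."""
    if not used_in_predicates or not sample_values:
        return False  # columns the workload never filters on gain nothing
    freq = Counter(sample_values)
    expected = len(sample_values) / len(freq)  # per-value count if uniform
    return max(freq.values()) > skew_ratio * expected


skewed = ["US"] * 90 + ["CA"] * 5 + ["MX"] * 5
uniform = ["a", "b", "c", "d"] * 25
```

Under these assumptions a skewed column that appears in query predicates qualifies, while a uniform column or an unused column does not.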