Sequential Access, e.g., String Matching, etc. (EPO) Patents (Class 707/E17.039)
  • Patent number: 7747078
    Abstract: A method, computer program product, apparatus, and system that detects a substring in an input data string by producing a fingerprint of a portion of the data string and comparing the fingerprint of the portion of the data string to at least one predefined fingerprint. The predefined fingerprint may be a fingerprint of a portion of a predefined pattern of interest. If the fingerprints match, further pattern recognition processing may be performed on the input string.
    Type: Grant
    Filed: July 6, 2006
    Date of Patent: June 29, 2010
    Assignee: Intel Corporation
    Inventors: Lukas Kencl, Gianluca Iannaccone, Ramaswamy Ramaswamy
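The fingerprint-then-verify idea in the abstract above can be illustrated with a minimal sketch. The hash function, the choice to fingerprint only the first `k` bytes of the pattern, and the names `fingerprint` and `find_matches` are all assumptions made for illustration; the abstract does not specify them. A production version would likely use a rolling hash instead of recomputing each window.

```python
def fingerprint(chunk: bytes) -> int:
    """Toy polynomial fingerprint (a hypothetical choice, not from the patent)."""
    h = 0
    for b in chunk:
        h = (h * 257 + b) % 1_000_003
    return h


def find_matches(data: bytes, pattern: bytes, k: int) -> list[int]:
    """Return offsets in `data` where `pattern` occurs.

    Only windows whose k-byte fingerprint equals the fingerprint of the
    pattern's first k bytes are passed on to the full comparison.
    """
    if k <= 0 or k > len(pattern):
        raise ValueError("k must be between 1 and len(pattern)")
    target = fingerprint(pattern[:k])
    hits = []
    for i in range(len(data) - k + 1):
        # Cheap first stage: compare fingerprints of the k-byte window.
        if fingerprint(data[i:i + k]) == target:
            # Second stage: full verification, which also rules out collisions.
            if data.startswith(pattern, i):
                hits.append(i)
    return hits
```

The full comparison runs only when the cheap fingerprint test passes, so most windows are rejected after a single integer comparison.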
  • Publication number: 20100161566
    Abstract: Techniques are disclosed for adding entities to a group of entity resolution candidates by selecting entities that have a minimum threshold of similarity to a candidate, allowing a greater number of resolutions in an entity resolution system. To resolve an incoming identity record, an initial group of candidates may be selected from known entities by identifying entities that match a candidate building attribute of the incoming identity record. Additional candidates may be selected by identifying entities with some information that is similar to one of the candidate entities.
    Type: Application
    Filed: December 18, 2008
    Publication date: June 24, 2010
    Inventors: Gregery G. Adair, Jeffrey J. Jonas
  • Publication number: 20100153420
    Abstract: A dual-stage regular expression pattern matching method and system is proposed, which is designed for integration into a data processing system, such as a computer platform, a firewall, a network intrusion detection system (NIDS), or a DNA sequence analysis system, for checking whether an input code sequence (such as a network data packet) matches specific patterns predefined by regular expressions. The proposed system and method includes a first-stage comparison procedure for comparison of the prefix string of each input code sequence and a second-stage comparison procedure for comparison of the postfix string of the same input code sequence. This feature can be used for processing code sequences having a special pattern without producing an enormous amount of state data that would cause the problem of insufficient memory during operation.
    Type: Application
    Filed: March 5, 2009
    Publication date: June 17, 2010
    Applicant: NATIONAL TAIWAN UNIVERSITY
    Inventors: Chang-Ching Yang, Sheng-De Wang
  • Publication number: 20100153418
    Abstract: A rule-based system for improving accuracy of geocoding results is provided including a communications device configured to transmit a query including a textually identified location and a geocoding accuracy module configured to receive the query from the communications device and successively remove constraints from the textually identified location until a match is located. Related methods and computer program products are also provided.
    Type: Application
    Filed: December 17, 2008
    Publication date: June 17, 2010
    Inventors: Michael Asher, Christopher Giles
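As a rough illustration of "successively remove constraints until a match is located", the sketch below drops the most specific address component on each failed lookup. The removal order, the dictionary-based `index`, and the name `relaxed_lookup` are assumptions; the abstract does not say which constraints are removed first or how matches are stored.

```python
def relaxed_lookup(components, index):
    """Look up a location, relaxing the query until something matches.

    components: address parts ordered most specific first,
                e.g. ["1 Unknown Rd", "Springfield", "IL"].
    index:      hypothetical dict mapping "part, part, ..." strings to coordinates.
    Returns (matched_key, coordinates), or None if nothing matches.
    """
    parts = list(components)
    while parts:
        key = ", ".join(parts)
        if key in index:
            return key, index[key]
        # Assumption: remove the most specific remaining constraint first.
        parts.pop(0)
    return None
```

A caller can compare the matched key with the original query to see how much precision was given up to obtain a result.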
  • Publication number: 20100153385
    Abstract: A network search function is disclosed. A network administrator enters a search term. The search function determines whether any items or network devices listed in a network control user interface match the search term. The network administrator can stipulate whether the match be either an explicit match or an implicit match. All of the matches, if any, are automatically highlighted and selected. Thereby, the network administrator can perform an operation on these matches based on the search function, without having to manually locate and then manually click to select the desired items or network devices.
    Type: Application
    Filed: September 7, 2007
    Publication date: June 17, 2010
    Inventor: Animesh Chaturvedi
  • Publication number: 20100145708
    Abstract: We disclose useful components of a method and system that allow identification of music from the song or sound using only the sound of the audio being played. A system built using the method and device components disclosed processes inputs sent from a mobile phone over a telephone or data connection, though inputs might be sent through any variety of computers, communications equipment, or consumer audio devices over any of their associated audio or data networks.
    Type: Application
    Filed: December 2, 2009
    Publication date: June 10, 2010
    Applicant: Melodis Corporation
    Inventors: Aaron Master, Timothy P. Stonehocker
  • Publication number: 20100145977
    Abstract: A method, a computer system, and a computer program product that prioritizes search requests to a database directory by assigning the search requests to one or more templates. Attributes of the search requests, such as an IP address, the portion of the database to which the search is constrained, one or more return attributes, the scope of the search, and/or search filters used, are compared with values of those attributes of the templates. The template whose values of the attributes match the values of the attributes in the search request is selected. This template has a template identifier that is associated with a transaction name of a work unit enclave. The search request is then associated with the work unit enclave and the operating system of the computer system will execute the search request in accordance with the performance goals and priority of the service class into which the work unit enclave is assigned.
    Type: Application
    Filed: December 4, 2008
    Publication date: June 10, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Richard Joseph Brodfuehrer, John Michael Walsh, Kim J. Worm, Barbara Ann Marie Maslak
  • Patent number: 7715625
    Abstract: An image processing device has a reading unit, a graphics area extraction unit, a writing area extraction unit, a character string extraction unit and an association unit. The reading unit reads a document. The graphics area extraction unit extracts a graphics area from the document read by the reading unit. The writing area extraction unit extracts a writing area from the document read by the reading unit. The character string extraction unit extracts a character string presented in the graphics area. The association unit associates information of the writing area with the graphics area based on the character string extracted by the character string extraction unit.
    Type: Grant
    Filed: March 16, 2005
    Date of Patent: May 11, 2010
    Assignee: Fuji Xerox Co., Ltd.
    Inventors: Masatoshi Tagawa, Michihiro Tamune, Shaoming Liu, Hiroshi Masuichi, Kiyoshi Tashiro, Kyosuke Ishikawa, Atsushi Itoh, Naoko Sato
  • Publication number: 20100114484
    Abstract: Provided is a general-purpose map matching system enabling high-speed map matching while maintaining a precision of an analysis based on a map matching result even when event data is transmitted from numerous vehicles. The grid road generation unit generates grids by dividing a region in which a road network exists into equal intervals in the latitude and longitude directions, based on data stored in the road network storage unit. It then combines grids that have the same set of roads passing through them. The event grid matching unit correlates event data collected from a vehicle and a grid. When the number of pieces of event data correlated with the grid is large, the event processing priority determination unit selects a part of the data. The event road matching unit correlates the selected event data and a road in the grid.
    Type: Application
    Filed: March 25, 2008
    Publication date: May 6, 2010
    Applicant: NEC CORPORATION
    Inventors: Kouji Kida, Kenichiro Fujiyama
  • Publication number: 20100094889
    Abstract: A system and method for performing non-binary comparison of biological sequences includes a new measure ?0, which is a non-binary counting measure that is used in a stand alone module called VaSSA-1. This measure obtains substantially more information about sequences and comparisons between them than is gathered by conventional bioinformatics techniques.
    Type: Application
    Filed: December 9, 2009
    Publication date: April 15, 2010
    Applicant: BIOINFORMATICA LLC
    Inventor: Jeffrey M. CLARK
  • Publication number: 20100076980
    Abstract: A client generates an index token for each of a plurality of data objects received from a server as a function of at least one of the plurality of fields of the data object. The client creates an index for the plurality of data objects based on the generated index token for each data object. The client may then utilize the index to search the plurality of data objects to identify and render a subset of the data objects.
    Type: Application
    Filed: September 5, 2008
    Publication date: March 25, 2010
    Inventor: Vladimir Dumitrean
  • Publication number: 20100076999
    Abstract: In registering a new document file in an index, the accumulated percentage of the number of registered keys A from registered keys associated with one posting data, including registered data, is computed. The posting data of a registered key associated with the number of posting data items, which is at most a threshold N, is stored in a leaf page of a balanced-plus tree constituted of the registered keys, and the posting data of a registered key associated with the number of posting data items, which is greater than the threshold N, is stored in a page of a posting-storing unit. When the accumulated number i of registered documents is a predetermined document number, the threshold N of the number of posting data items is changed to the maximum number of the posting data items that are associated with a registered key where the accumulated percentage is less than 60 percent.
    Type: Application
    Filed: September 26, 2007
    Publication date: March 25, 2010
    Applicant: Justsystems Corporation
    Inventors: Yasuhisa Okazaki, Takanori Hino, Kyoko Fujita, Mikio Moriya
  • Publication number: 20100076862
    Abstract: A system and method enables a customer to purchase time-sensitive events over a computer network. Customer requests are received over the network to view events for possible purchase. The customer may be queried to determine when the customer is available to ensure that the events are available to the customer. Events are displayed to the customer in a manner to ensure that selected events do not overlap with one another. Displayed events for purchase may be filtered based on customer preferences.
    Type: Application
    Filed: September 10, 2008
    Publication date: March 25, 2010
    Applicant: VEGAS.COM
    Inventor: Howard Lefkowitz
  • Publication number: 20100076972
    Abstract: The invention relates to cross-document entity co-reference systems in which naturally occurring entity mentions in a document corpus are analyzed and transformed into name clusters that represent global entities. In a first aspect of the invention, a name variation module analyzes naturally occurring names of entities extracted from the document corpus and provides an initial set of equivalent names that could refer to the same real world entity. In a second aspect of the invention, a disambiguation module takes the initial set of equivalent names and uses an agglomerative clustering algorithm to disambiguate the potentially co-referent named entities.
    Type: Application
    Filed: December 29, 2008
    Publication date: March 25, 2010
    Applicant: BBN Technologies Corp.
    Inventors: Alex Baron, Marjorie Ruth Freedman, Ralph M. Weischedel, Elizabeth Megan Boschee
  • Publication number: 20100076986
    Abstract: A computer system and method for identifying a matching resume for a job description. The system receives and stores the job description that includes job requirements, each including a required skill or experience-related phrase and a required term of experience. The system receives and stores resumes that include skill or experience-related phrases. When the skill or experience-related phrases include the required skill or experience-related phrase for a job requirement, the system computes a term of experience for the required skill or experience-related phrase. To compute the term of experience, the system associates a contextual use and an experience range with each skill or experience-related phrase. A resume is a match when it includes the required skill or experience-related phrase for each job requirement and the term of experience for the required skill or experience-related phrase in the resume is greater than or equal to the required term of experience.
    Type: Application
    Filed: November 27, 2009
    Publication date: March 25, 2010
    Applicant: ALGOMOD TECHNOLOGIES CORPORATION
    Inventor: Diya B. Obeid
  • Publication number: 20100057812
    Abstract: A method and system include obtaining at a first time a first image of a database having local number portability (LNP) records for each telephone number which has been ported between service providers. The first image is indicative of the LNP records in the database at the first time. At a second time a second image of the database is obtained. The second image is indicative of the records in the database at the second time. The first and second images are compared to determine migration of ported telephone numbers.
    Type: Application
    Filed: November 9, 2009
    Publication date: March 4, 2010
    Applicant: SBC KNOWLEDGE VENTURES, L.P.
    Inventors: Kevin James Moisan, Michael Liu, Wayne Robert Heinmiller, Frederick Michael Armanino
  • Publication number: 20100057737
    Abstract: Techniques for detecting non-occurrence of an event within a time period following the occurrence of another event. In one embodiment, language extensions are provided to a language that enable queries to be formulated for detecting non-occurrences using that language.
    Type: Application
    Filed: August 26, 2009
    Publication date: March 4, 2010
    Applicant: Oracle International Corporation
    Inventors: Anand Srinivasan, Rakesh Komuravelli, Shailendra Mishra
  • Publication number: 20100057733
    Abstract: A method for enabling access to enterprise information may include analyzing text including a plurality of text strings and identifying a defined pattern within the text strings as corresponding to a particular entity. The particular entity may be associated with different classes of information stored in at least two respective different storage environments. The method may further include enabling provision of a selectable option providing access to one of the different classes of information from a corresponding one of the at least two respective different storage environments in response to selection of the selectable option.
    Type: Application
    Filed: September 2, 2008
    Publication date: March 4, 2010
    Inventors: Suresh Ravinarayanan Purisai, Jerry Ibrahim, Jim Schwaller, Lakshmi Preethi Karthikeyan
  • Publication number: 20100049700
    Abstract: A method for probabilistic lossy counting includes: for each element in a current window, determining whether an entry corresponding to a current element is present in a table; in the event an entry corresponding to the current element is present in the table, incrementing a frequency counter associated with the current element; otherwise, inserting an entry into the table, wherein inserting an entry comprises: calculating a probabilistic error bound Δ based on an index i of the current window; and inserting the probabilistic error bound Δ and a frequency counter into an entry corresponding to the current element in the table; and at the end of the current window, removing all elements from the table wherein the sum of the frequency counter and probabilistic error bound Δ associated with the element is less than or equal to the index of the current window.
    Type: Application
    Filed: August 20, 2008
    Publication date: February 25, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Xenofontas Dimitropoulos, Paul T. Hurley, Andreas Kind, Marc Stoecklin
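The window-by-window procedure in the abstract above can be sketched as follows. The abstract does not give the formula for the probabilistic error bound, so this sketch takes it as a pluggable function and defaults to `i - 1`, the bound used by classic (deterministic) lossy counting; that default is a stand-in, not the patented bound. Starting a new counter at 1 and the names `lossy_count` and `error_bound` are likewise assumptions.

```python
def lossy_count(stream, window_size, error_bound=lambda i: i - 1):
    """Approximate frequency counts over a stream.

    error_bound(i) returns the error bound stored with an element first
    seen in window i. The default (i - 1) is the deterministic lossy
    counting bound, used here only as a placeholder.
    """
    table = {}   # element -> [frequency counter, error bound]
    i = 1        # index of the current window
    seen = 0
    for x in stream:
        if x in table:
            table[x][0] += 1
        else:
            table[x] = [1, error_bound(i)]
        seen += 1
        if seen % window_size == 0:
            # End of window: drop entries whose counter plus bound is <= i.
            stale = [k for k, (f, d) in table.items() if f + d <= i]
            for k in stale:
                del table[k]
            i += 1
    return {k: f for k, (f, _) in table.items()}
```

Supplying a tighter `error_bound` makes late-arriving rare elements easier to evict, which is where a probabilistic bound would reduce memory use relative to the deterministic one.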
  • Publication number: 20100049709
    Abstract: Methods, computer programs, and systems for generating a link title for a URL (Uniform Resource Locator) within a context webpage to be shown as a web result are provided. The method evaluates generation parameters for a plurality of sources for picking words from the link title. Further, the method generates candidates for the link title, and a likelihood is computed for each candidate. When computing the likelihood, the generation parameters, the context webpage and the words are considered. In addition, the method selects a candidate with the highest likelihood from all the computed likelihoods, and presents the URL with the selected candidate as the title.
    Type: Application
    Filed: August 19, 2008
    Publication date: February 25, 2010
    Applicant: Yahoo!, Inc.
    Inventors: Shanmugasundaram Ravikumar, Deepayan Chakrabarti, Kunal Punera
  • Publication number: 20100030780
    Abstract: Provided are, among other things, systems, methods and techniques for identifying related objects in a computer database. In one representative implementation: (a) a feature vector that describes an existing object is obtained; (b) comparison scores are generated between the feature vector and various sample vectors; (c) a set that includes at least one designated vector is identified from among the sample vectors by evaluating the generated comparison scores; (d) a computer database is searched for matches between label(s) for the designated vector(s) and labels for representative vectors for other objects represented in the computer database; and (e) at least one related object is identified based on the identified match(es).
    Type: Application
    Filed: May 11, 2009
    Publication date: February 4, 2010
    Inventors: Kave ESHGHI, Shyam Sundar RAJARAM, Charlie DAGLI, Ira COHEN
  • Publication number: 20100010994
    Abstract: A social mobile network enables discovery of application programs running on the mobile devices. A search for partial or full matches to a group of alphanumeric characters is performed on the data stored on the first mobile communication device on which the search is initiated. The search is also performed on data made available to the user of the first mobile communication device by other users, where each user is associated with a different one of a multitude of mobile communication devices. The sharing of the data and the search for shared data is made via a server with which the mobile communication devices are in communication. The discovery of applications whose names or descriptions are partially or fully matched to the alphanumeric characters is made despite the fact that the user was not looking for or aware of the existence of such application programs.
    Type: Application
    Filed: June 29, 2009
    Publication date: January 14, 2010
    Applicant: Servo Software, Inc.
    Inventors: Christof Wittig, John W. Stossel
  • Publication number: 20100005096
    Abstract: A document type identifying apparatus includes in advance a database storing therein keywords used as keys that identify document types in association with each document type. The document type identifying apparatus aligns word strings written on a document and generates partial keyword strings for each keyword by using the keywords stored in the database. The partial keyword strings are to be checked for matching with the word strings written on the document. Then, the document type identifying apparatus checks matching of the grouped and aligned word strings with the partial keyword strings and obtains, for each keyword, each number of matched words with the highest matching rates between the grouped word strings that are successfully matched and the partial keyword strings. Then, each number of matched words is used to calculate each evaluation value to determine the document type.
    Type: Application
    Filed: September 4, 2009
    Publication date: January 7, 2010
    Applicant: FUJITSU LIMITED
    Inventors: Akhiro Minagawa, Hiroaki Takebe, Katsuhito Fujimoto
  • Publication number: 20090327289
    Abstract: Search criteria and potential targets of searches are each represented by a classification of attributes. The search classifications and target classifications are compared to determine whether a target matches or loosely matches the search criteria. The search classifications and target classifications may be modified to increase the chance of a match or loose match. A user can request to modify a classification using a visual interface in which information about the classification is presented. The matching approach may be implemented in conjunction with conventional matching methods to provide classifications. The matching approach is capable of interacting with users of the approach to dynamically alter the classifications being searched based on any given set of search results.
    Type: Application
    Filed: September 3, 2009
    Publication date: December 31, 2009
    Inventor: Michael G. Zentner
  • Publication number: 20090319546
    Abstract: Techniques for extracting hierarchical data stored in multiple records, flattening the hierarchical data, and storing the flattened data in a data warehouse. The data source may be an online transaction processing (OLTP) system that is designed to perform transaction processing and that stores hierarchy data in the form of multiple parent-child relationship records. The hierarchy data extracted from the data source is flattened and stored in a flattened form in a target system such as a data warehouse. A database function such as the SYS_CONNECT_BY_PATH may be used as part of the flattening process.
    Type: Application
    Filed: June 18, 2008
    Publication date: December 24, 2009
    Applicant: Oracle International Corporation
    Inventor: Sadiq Shaik
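To make the flattening step concrete, here is a small sketch that turns parent-child records into one root-to-node path per node, similar in spirit to what Oracle's SYS_CONNECT_BY_PATH produces in a hierarchical query. The input shape (a dict from child to parent) and the name `flatten_hierarchy` are assumptions for illustration, not details from the application.

```python
def flatten_hierarchy(parent_of, sep="/"):
    """Flatten parent-child records into root-to-node path strings.

    parent_of: dict mapping each node to its parent (None for a root).
    Returns a dict mapping each node to a path such as "/CEO/VP/Eng".
    """
    paths = {}
    for node in parent_of:
        chain = []
        cur = node
        # Walk upward from the node to its root, guarding against cycles.
        while cur is not None:
            if cur in chain:
                raise ValueError(f"cycle detected at {cur!r}")
            chain.append(cur)
            cur = parent_of.get(cur)
        paths[node] = sep + sep.join(str(n) for n in reversed(chain))
    return paths
```

Each output row is self-contained, so a warehouse query can filter on any ancestor without joining the table to itself.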
  • Publication number: 20090319523
    Abstract: An apparatus, system, and methods are presented for best match search. In one embodiment, the apparatus includes a plurality of functional modules configured to collect user profile information and a service provider criterion, match a service provider profile to at least one of the user profile information and the service provider criterion, calculate a service provider statistic based on service provider data associated with a selected service provider, and generate service provider comparison data in response to the service provider statistic. In the described embodiments, these modules include a profile collection module, a profile match module, a provider analyzer, and a provider comparison module.
    Type: Application
    Filed: June 22, 2009
    Publication date: December 24, 2009
    Applicant: INGENIX, INC.
    Inventors: David R. Anderson, Jean Rawlings, Jerry L. Mansfield, II, Michael Mockus, Ronald D. Myers
  • Publication number: 20090276115
    Abstract: Provided is a method of receiving data from a vehicle onboard computer. The onboard computer is configured to transmit vehicle identification data in response to receipt of an identification request, which is transmitted in a basic communication protocol. The onboard computer is further configured to transmit private operational data in response to receipt of a private data request. The private data request is transmitted in a diagnostic protocol. The method includes connecting a scan tool to the onboard computer, and polling the onboard computer to identify the basic communication protocol. The identification request is then transmitted to the onboard computer. Vehicle identification data is subsequently received from the onboard computer. A protocol database having a plurality of diagnostic protocols is then accessed. Each diagnostic protocol is associated with respective vehicle identification data. The diagnostic protocol is then determined based on the received vehicle identification data.
    Type: Application
    Filed: July 13, 2009
    Publication date: November 5, 2009
    Inventor: Ieon C. Chen
  • Publication number: 20090271405
    Abstract: Disclosed is a system for, and method of, calculating parameters used to determine whether records and entity representations should be linked. The system and method use a symmetric, transitive and reflexive function to allow for linking records and entity representations whose field values differ. The system and method apply iterative techniques such that parameters from each linking iteration are used in the next linking iteration. The system and method need no human interaction in order to calibrate and utilize record matching formulas used for the linking decisions.
    Type: Application
    Filed: April 24, 2009
    Publication date: October 29, 2009
    Applicant: LexisNexis Risk & Information Analytics Group, Inc.
    Inventor: David Alan BAYLISS
  • Publication number: 20090271404
    Abstract: Disclosed is a system for, and method of, calculating parameters used to determine whether records and entity representations should be linked. The system and method take into consideration interdependent fields, e.g., fields whose constituent field values may be positively or negatively correlated. The system and method apply iterative techniques such that parameters from each linking iteration are used in the next linking iteration. The system and method need no human interaction in order to calibrate and utilize record matching formulas used for the linking decisions.
    Type: Application
    Filed: April 24, 2009
    Publication date: October 29, 2009
    Applicant: LexisNexis Risk & Information Analytics Group, Inc.
    Inventor: David Alan Bayliss
  • Publication number: 20090254553
    Abstract: Matching digital media available in a multi-node system. An example embodiment receives media from media providers. Metadata may also be included with digital media files or stored separately in a database. An example matching system generates or receives a list of candidate nodes, such as network domains, to search for potential copies of digital media. The list may be defined and/or prioritized based on countries of interest, business sectors of interest, or other business rules. An example system crawls the domains to identify media files that appear on websites that are potential matches of the media files provided by the media providers. The system may download the media files, and evaluate them relative to the provided media files. The system identifies matches and identifies owners or operators of domains that had matching media files. The system generates case records for subsequent licensing or other action regarding the matched media files.
    Type: Application
    Filed: February 2, 2009
    Publication date: October 8, 2009
    Applicant: Corbis Corporation
    Inventors: David N. Weiskopf, Glen Rolfe
  • Publication number: 20090248687
    Abstract: A computer implemented method for analyzing a listing object to define a match to a candidate object among many possible candidate objects is disclosed. The method includes an operation to receive a listing object as an input. The method also includes an operation to generate a set of candidate objects based on characteristics of the listing object. The candidate objects are used to generate a listing-candidate pair defined by pairing the listing object with one of the candidate objects. The method may also include operations to process the listing-candidate pair such as an operation to normalize the listing object into a canonical form. Another operation can generate a matching feature vector for the listing-candidate pair, where the matching feature vector includes a matching score based on a common feature between the candidate object and the canonical form of the listing object. In another operation, the method analyzes the matching feature vector with a judging committee module to render a match judgment.
    Type: Application
    Filed: March 31, 2008
    Publication date: October 1, 2009
    Applicant: YAHOO! INC.
    Inventors: Qi Su, Wendell Baker
  • Publication number: 20090234826
    Abstract: The data conditioning framework solution of the present invention addresses data quality issues by standardizing, verifying, matching, consolidating and merging data records using powerful inexact matching logic and search reduction technologies. The data conditioning framework uses these technologies to more efficiently condition data to improve the quality of data and/or resolve data quality issues such as incomplete, inaccurate and duplicate data records. For example, the data conditioning framework is used to “cleanse” incorrect, incomplete and duplicate data from a data source, such as an information system.
    Type: Application
    Filed: March 17, 2006
    Publication date: September 17, 2009
    Applicant: ActivePrime, Inc.
    Inventor: Clint Bidlack
  • Publication number: 20090234853
    Abstract: A system and method are provided for augmenting information on business directory databases. Using the business name contained in a business directory database and Web data mining technology, the website of a business is found and validated, prior to enriching the database entries.
    Type: Application
    Filed: March 12, 2008
    Publication date: September 17, 2009
    Inventors: Narendra Gupta, Mazin Gilbert
  • Publication number: 20090198692
    Abstract: Analogies among entities may be detected by obtaining associative counts among the entities and computing similarity measures among given entities and other entities, using the associative counts. First and second entities are then identified as being analogies if the first entity has a strongest similarity measure with respect to the second entity and the second entity also has a strongest similarity measure with respect to the first entity. The similarity measures may be calculated using a normalized entropy inverted among a given entity and other entities.
    Type: Application
    Filed: December 22, 2008
    Publication date: August 6, 2009
    Inventors: Manuel Aparicio, IV, Yen-min Huang, David R. Cabana
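The mutual "strongest similarity" test in the abstract above can be sketched as a reciprocal best-match check. The similarity scores here are plain made-up numbers in a nested dict; computing them from associative counts with an inverted normalized entropy, as the abstract mentions, is not shown. The name `mutual_best_matches` is hypothetical.

```python
def mutual_best_matches(sim):
    """Return pairs (a, b) where each is the other's strongest match.

    sim: dict of dicts, sim[a][b] = similarity of a with respect to b.
    """
    # Strongest match for each entity that has at least one score.
    best = {a: max(row, key=row.get) for a, row in sim.items() if row}
    # Keep a pair only when the preference is reciprocated.
    pairs = {tuple(sorted((a, b))) for a, b in best.items() if best.get(b) == a}
    return sorted(pairs)
```

Requiring reciprocity filters out one-sided matches, such as an entity whose best match is a very common entity that is itself most similar to something else.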
  • Publication number: 20090187570
    Abstract: An apparatus for controlling subscriptions comprising: a detector operable to detect a subscription associated with a wildcard topic string; and an analyzer, responsive to the detection of the subscription associated with a wildcard topic string and a topic string of a topic node matching the wildcard topic string, for analyzing a first attribute of the topic node; and means for determining whether a subscriber associated with the subscription should receive a message associated with the topic string of the topic node.
    Type: Application
    Filed: January 21, 2009
    Publication date: July 23, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David Postlethwaite, Jonathan Lee Rumsey, Ian Charles Edwards, Peter Siddall
  • Publication number: 20090187569
    Abstract: An online system and a method for a web-based people picture directory provides collaborative identification of people pictured in the directory. The method includes creating profile templates for each person on earth and storing these profile templates in a central database. Next, populating the profile templates with publicly available basic information and publishing the public profile information in the web-based people picture directory. Next, retrieving a first person's own profile template, uploading the first person's personal pictures and identifying and tagging images of other persons depicted in the first person's personal pictures. Next, cross-correlating and matching the identified images of the other persons depicted in the first person's uploaded personal pictures with profile templates corresponding to the other persons and when there is a match uploading the first person's pictures depicting the identified persons' images into the corresponding profile templates of the depicted persons.
    Type: Application
    Filed: January 14, 2009
    Publication date: July 23, 2009
    Applicant: HUMANBOOK, INC.
    Inventors: Dan Lubarski, Sergey Porfiriev, Sergey Prazdnichkov
  • Publication number: 20090171956
    Abstract: The present invention provides a method for incorporating features from heterogeneous auxiliary datasets into input text data for use in classification. A plurality of heterogeneous auxiliary datasets, such as labeled datasets and unlabeled datasets, are accessed after receiving input text data. A plurality of features are extracted from each of the plurality of heterogeneous auxiliary datasets. The plurality of features are combined with the input text data to generate a set of features which may potentially be used to classify the input text data. Classification features are then extracted from the set of features and used to classify the input text data. In one embodiment, the classification features are extracted by calculating a mutual information value associated with each feature in the set of features and identifying features having a mutual information value exceeding a threshold value.
    Type: Application
    Filed: October 10, 2008
    Publication date: July 2, 2009
    Inventors: Rakesh Gupta, Lev Ratinov
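The mutual-information thresholding mentioned in the last sentence of this abstract can be illustrated with a small sketch. It assumes discrete feature values and class labels and estimates probabilities from raw counts; the function names and the toy data are hypothetical and not taken from the application.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Estimate I(feature; label) in bits from paired observations."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = feature_counts[x] / n
        p_y = label_counts[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def select_features(feature_table, labels, threshold):
    """Keep the features whose mutual information exceeds the threshold."""
    return [name for name, values in feature_table.items()
            if mutual_information(values, labels) > threshold]


# Toy data: one feature tracks the label exactly, the other is independent of it.
labels = ["sports", "sports", "politics", "politics"]
feature_table = {
    "mentions_goal": [1, 1, 0, 0],
    "long_text": [1, 0, 1, 0],
}
selected = select_features(feature_table, labels, threshold=0.5)
```

In this toy case only `mentions_goal` survives, since it carries one full bit of information about the label while `long_text` carries none.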
  • Publication number: 20090171955
    Abstract: A computer-based method for character string matching of a candidate character string with a plurality of character string records stored in a database is described. The method includes a) identifying a set of reference character strings in the database, the reference character strings identified utilizing an optimization search for a set of dissimilar character strings, b) generating an n-gram representation for one of the reference character strings in the set of reference character strings, c) generating an n-gram representation for the candidate character string, d) determining a similarity between the n-gram representations, e) repeating steps b) and d) for the remaining reference character strings in the set of identified reference character strings, and f) indexing the candidate character string within the database based on the determined similarities between the n-gram representation of the candidate character string and the reference character strings in the identified set.
    Type: Application
    Filed: December 31, 2007
    Publication date: July 2, 2009
    Inventors: Christopher J. Merz, Thomas McGeehan
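Steps b) through f) of this abstract can be sketched as below. The abstract does not name the n-gram size or the similarity measure, so character trigrams and Jaccard similarity are assumptions made here for illustration, and the reference strings are invented examples rather than the output of the optimization search in step a).

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of `text`, padded with spaces
    so that short strings still produce n-grams."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed choice of measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_profile(candidate, references, n=3):
    """Similarity of the candidate to each reference string, in order.
    A vector like this could serve as an index key for the candidate."""
    cand = ngrams(candidate, n)
    return [jaccard(cand, ngrams(ref, n)) for ref in references]


references = ["international business machines", "yahoo inc", "toshiba"]
profile = similarity_profile("Yahoo! Inc.", references)
best = max(range(len(references)), key=profile.__getitem__)
```

Here `references[best]` identifies the reference string closest to the candidate; strings with similar profiles end up near one another in the index.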
  • Publication number: 20090171930
    Abstract: Various embodiments provide a Web browser that employs a relevancy algorithm to make an educated guess as to the likelihood of a user's intended destination when the user begins to enter text into a browser's address bar. In one or more embodiments, the relevancy algorithm employs various parameters and assigns weights to the parameters to arrive at a collection of suggestions to provide to the user. By using various rules, associated weightings, and the relevancy algorithm, relevant suggestions can be provided to a user to facilitate their navigation activities.
    Type: Application
    Filed: December 27, 2007
    Publication date: July 2, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Christopher M. Vaughan, Oliver Wallace, Carlos Yeung, Amit Gupta, Christophe Marle
  • Publication number: 20090157678
    Abstract: A content based load balancing system receives a request for data provided by a resource. The content based load balancing system searches a content history cache for a content history cache entry corresponding to the requested data. The content based history cache then selects a resource node to service the request based on the content history cache entry corresponding to the data.
    Type: Application
    Filed: December 18, 2007
    Publication date: June 18, 2009
    Inventor: Mladen Turk
  • Publication number: 20090157681
    Abstract: The present invention relates to a data processing method and system for checking an interactive communication sequence (ICS) relating to a plurality of users in a communication record by using a variable time window, and checking an interactive communication sequence pattern (ICSP) that is a frequently generated interactive communication sequence from among the checked interactive communication sequences.
    Type: Application
    Filed: December 17, 2008
    Publication date: June 18, 2009
    Applicants: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE, KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Man Ho Park, Song In Choi, Jee Hwan Ahn, Byung Wan Lim, Ji Hwan Song, Myoung Ho Kim
  • Publication number: 20090132532
    Abstract: A procedure generation apparatus has, in a storage unit thereof, a database in which a name of input information and a name of output information are stored in association with a name of a work. The procedure generation apparatus retrieves one or more candidate work names associated with an input information name from the database, displays the retrieved one or more work names, receives a selection of a work name from among the displayed one or more work names, retrieves one or more candidate output information names associated with the selected work name from the database, displays the retrieved one or more output information names, receives a selection of an output information name from among the displayed output information names, retrieves from the database one or more candidate input information names each having a name similar to the selected output information name, and displays the retrieved input information names.
    Type: Application
    Filed: November 19, 2008
    Publication date: May 21, 2009
    Inventors: Ichiro Harashima, Koji Shiroyama
  • Publication number: 20090112860
    Abstract: A method and apparatus are provided to support autonomic computing for system configuration. Common base events (CBEs) are generated and, based upon system configuration, are employed to monitor system resources and to resolve system configuration conflicts prior to an error. A symptom database stores a set of rules for the configuration information. The configuration CBEs for the system configuration are compared with the symptom rules, and any discrepancies between the two are communicated to a user prior to an occurrence of an error in the system. Accordingly, an autonomic computer system is provided to support system configuration data.
    Type: Application
    Filed: October 29, 2007
    Publication date: April 30, 2009
    Inventors: Hironori Yuasa, Toshimichi Arima, Tomoko Murayama
  • Publication number: 20090112858
    Abstract: A system and method are provided for query processing, comprising: creating an index of a database and ordering a set of index candidates from the index into a list based on a set of heuristic rules. A query defining a query path is then reduced into a list of single path expressions. Each index candidate is matched against the list of single path expressions according to the ordering of the index candidates. The matched candidate nodes are also verified to ensure that they satisfy the query path.
    Type: Application
    Filed: October 25, 2007
    Publication date: April 30, 2009
    Applicant: International Business Machines Corporation
    Inventors: Guogen Zhang, Ruiping Li, Mengchu Cai
  • Publication number: 20090106244
    Abstract: Exemplary embodiments of the present invention relate to enhanced faceted search support for OLAP queries over unstructured text as well as structured dimensions by the dynamic and automatic discovery of dimensions that are determined to be most “interesting” to a user based upon the data. Within the exemplary embodiments “interestingness” is defined as how surprising a summary along some dimensions is from a user's expectation. Further, multi-attribute facets are determined and a user is optionally permitted to specify the distribution of values that she expects, and/or the distance metric by which actual and expected distributions are to be compared.
    Type: Application
    Filed: August 29, 2008
    Publication date: April 23, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Debabrata Dash, Guy M. Lohman, Nimrod Megiddo, Jun Rao
  • Publication number: 20090100051
    Abstract: Methods and apparatus are described for presenting sponsored search results. A user is enabled to initiate a search from a context. The sponsored search results and organic search results are presented in a search results page in response to the search, an order of the sponsored search results and placement of subsets of the sponsored search results relative to the organic search results in the search results page having been determined with reference to contextual information relating to the context.
    Type: Application
    Filed: October 10, 2007
    Publication date: April 16, 2009
    Applicant: YAHOO! INC.
    Inventors: Rushi Bhatt, Jignashu Parikh, Rajesh Girish Parekh, Pavel Berkhin
  • Publication number: 20090094238
    Abstract: A technique for facilitating identification of a matching search term in one or more images includes selecting at least a portion of an image and creating search enriched metadata for a document that includes the image. The search enriched metadata includes a text portion that provides one or more search terms that are associated with the selected portion of the image and a location portion that provides a location of the selected portion of the image.
    Type: Application
    Filed: October 5, 2007
    Publication date: April 9, 2009
    Inventors: Dwip N. Banerjee, Ranadip Das, Sandeep R. Patil, Venkat Venkatsubra
  • Publication number: 20090089285
    Abstract: Systems and methods for identifying spam hosts are disclosed in which hosts are known to the system and initially classified as spam or non-spam by a baseline classifier. The accuracy of the initial host classifications is then improved by propagating them using a random walk algorithm. The random walk used may be modified in order to obtain a weighted or skewed characterization of the host. The hosts may then be reclassified based on the characterization obtained from the random walk to obtain a final spam/non-spam classification. The final classification may then be used in many different ways, including to filter search results based on host classifications so that spam hosts are either not displayed or are displayed last in a results set.
    Type: Application
    Filed: September 28, 2007
    Publication date: April 2, 2009
    Applicant: Yahoo! Inc.
    Inventors: Debora Donato, Aristides Gionis, Vanessa Murdock, Fabrizio Silvestri
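One way to picture the propagation step is a random walk with restart over the host link graph, seeded by the baseline classifier's output. The abstract does not specify the walk, so the restart scheme, the damping value, and the host names below are all assumptions for illustration.

```python
def propagate_scores(links, baseline, damping=0.85, iterations=50):
    """Random walk with restart over a host graph.

    links:    dict mapping each host to the list of hosts it links to.
    baseline: dict mapping hosts to an initial spam score (assumed to come
              from the baseline classifier); the walk restarts at these hosts.
    """
    hosts = sorted(set(links)
                   | {h for outs in links.values() for h in outs}
                   | set(baseline))
    total = sum(baseline.values()) or 1.0
    restart = {h: baseline.get(h, 0.0) / total for h in hosts}
    score = dict(restart)
    for _ in range(iterations):
        nxt = {h: (1 - damping) * restart[h] for h in hosts}
        for h in hosts:
            outs = links.get(h, [])
            if outs:
                share = damping * score[h] / len(outs)
                for target in outs:
                    nxt[target] += share
            else:
                # Host without out-links: send its mass back to the restart set.
                for t in hosts:
                    nxt[t] += damping * score[h] * restart[t]
        score = nxt
    return score


# Hypothetical link graph: two separate pairs of hosts.
links = {
    "spam.example": ["friend.example"],
    "friend.example": ["spam.example"],
    "clean.example": ["news.example"],
    "news.example": ["clean.example"],
}
scores = propagate_scores(links, baseline={"spam.example": 1.0})
```

Hosts reachable from the seed accumulate score while unconnected hosts stay at zero; a final classification could then threshold these scores.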
  • Publication number: 20090083267
    Abstract: The present disclosure is directed to a method and system for compressing data. In accordance with a particular embodiment of the present disclosure, at least one data string is received. The at least one data string includes characters. A token string corresponding to the at least one data string is generated. At least one repeated substring in the at least one data string is identified. A refer-back token associated with the at least one repeated substring is generated. The refer-back token indicates a position of the at least one repeated substring and a length of the at least one repeated substring.
    Type: Application
    Filed: February 12, 2008
    Publication date: March 26, 2009
    Applicant: Computer Associates Think, Inc.
    Inventor: Carl Eric Johnson
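The refer-back idea in this abstract resembles dictionary-style compression, where a repeated substring is replaced by a token giving the position and length of an earlier occurrence. The sketch below is a greedy illustration under that reading; the token layout, the `min_len` parameter, and the function names are hypothetical, not the disclosed encoding.

```python
def compress(data, min_len=3):
    """Emit ("lit", char) tokens, or ("ref", position, length) tokens that
    point back at an earlier, non-overlapping occurrence of a substring."""
    tokens = []
    i = 0
    while i < len(data):
        best_pos, best_len = -1, 0
        for start in range(i):
            length = 0
            while (i + length < len(data)
                   and start + length < i
                   and data[start + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = start, length
        if best_len >= min_len:
            tokens.append(("ref", best_pos, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def decompress(tokens):
    """Rebuild the original string from literal and refer-back tokens."""
    out = []
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, pos, length = token
            out.extend(out[pos:pos + length])
    return "".join(out)


tokens = compress("abcabcabc")
restored = decompress(tokens)
```

The brute-force search keeps the sketch short; a practical compressor would typically use a hash table or suffix structure to find repeated substrings.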
  • Publication number: 20090077074
    Abstract: To construct an ontology for target data by re-using an existing ontology, from an aspect of the structure of the class hierarchy according to an object-oriented method and an aspect of the levels of relevance with other properties, the properties that correspond to the data items in the data serving as an ontology construction target and the extraction classes of the properties are determined as property extraction destination candidates for the ontology to be constructed. As a result, it is possible to re-use even a fine difference in the meanings among the properties in the classes. Consequently, it is possible to provide support for constructing an effective ontology, while reducing the load on the user.
    Type: Application
    Filed: March 19, 2008
    Publication date: March 19, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Akira Hosokawa