Sequential Access, E.g., String Matching, Etc. (epo) Patents (Class 707/E17.039)
  • Publication number: 20110066638
    Abstract: Using a tree configuration wherein node groups of four or more nodes composed of combinations of branch nodes, leaf nodes or empty nodes are linked into a tree form, a bit string search by a search key string is enabled by repeatedly linking to one of the nodes of a node group to which a primary node belongs in response to the bit values of keys of the search key string at the discrimination bit position included in the branch node.
    Type: Application
    Filed: November 17, 2010
    Publication date: March 17, 2011
    Applicant: S. Grants Co., Ltd.
    Inventors: Toshio Shinjo, Koutaro Shinjo, Mitsuhiro Kokubun
  • Publication number: 20110066631
    Abstract: A character string pattern matching method for detecting the presence of at least one of N (N is a natural number equal to or greater than 2) patterns in specific text shifts a detection location across text by a maximum shift length using single-byte character-based layered SHIFT tables, thereby increasing a pattern matching speed as compared with the prior art pattern matching algorithms.
    Type: Application
    Filed: September 22, 2008
    Publication date: March 17, 2011
    Applicant: SEOUL NATIONAL UNIVERSITY INDUSTRY FOUNDATION
    Inventors: Yoon Ho Choi, Seung Woo Seo
  • Publication number: 20110066637
    Abstract: Systems and methods for scanning signatures in a string field. In one implementation, the invention provides a method for signature scanning. The method includes processing one or more signatures into one or more formats that include one or more fingerprints and one or more follow-on search data structures for each fixed-size signature or signature substring such that the number of fingerprints for each fixed-size signature or signature substring is equal to a step size for a signature scanning operation and the particular fixed-size signature or signature substring is identifiable at any location within any string fields to be scanned, receiving a particular string field, identifying any signatures included in the particular string field including scanning for the fingerprints for each scan step size and searching for the follow-on search data structures at the locations where one or more fingerprints are found, and outputting any identified signatures.
    Type: Application
    Filed: October 12, 2010
    Publication date: March 17, 2011
    Inventor: QIANG WANG
  • Publication number: 20110029504
    Abstract: A facility for exposing an index of private documents is described. In a private network, the facility (1) identifies electronic versions of documents that are available inside the private network, including a distinguished document; (2) constructs an index covering the identified electronic versions of documents; and (3) exports the constructed index from the private network to an index publication server. At the index publication server, the facility (1) receives the exported index; (2) receives a query via a public network; and (3) uses an index, based upon the received index, to generate a query result for the received query that contains the distinguished document.
    Type: Application
    Filed: October 5, 2010
    Publication date: February 3, 2011
    Inventors: Martin T. King, Dale L. Grover, Clifford A. Kushier, James Q. Stafford-Fraser
  • Publication number: 20110029505
    Abstract: An exemplary embodiment of the present invention provides a method of processing Web activity data. The method includes obtaining a database of clickstream data comprising a user identifier corresponding with a user ID and a uniform resource locator (URL) corresponding with a Web page visited from the user ID. The method also includes generating a plurality of features based on the URL. Further, the method includes generating a data structure comprising the user ID and the feature. The method also includes generating segment information from the data structure based on the similarity of a URL visitation pattern across different user IDs, wherein each segment in the segment information comprises one or more user IDs and one or more features.
    Type: Application
    Filed: July 31, 2009
    Publication date: February 3, 2011
    Inventors: Martin B. Scholz, Shyam Sundar Rajaram, Rajan Lukose
  • Publication number: 20110029516
    Abstract: A web site usage pattern insight platform may be provided. User behaviors associated with web page requests, including search queries, may be captured and analyzed to provide usage pattern insights. The pattern insights may be aggregated across a plurality of users and may be used to provide recommendations for improving a system that hosts the web pages.
    Type: Application
    Filed: July 30, 2009
    Publication date: February 3, 2011
    Applicant: Microsoft Corporation
    Inventors: Qing Chang, Keiichiro Suzuki, Harini Sridharan, Prashant Kamani, Aleksandr Lyamtsev, Mingyang Zhao, Aditee Kumthekar, Ashutosh Galande, Charles Ainslie, Staya Priya Hotani, Reshma Mehta, Tho Van Nguyen, Yuan Gao, Li Yang, Jin Wu, Shuang Yang, Smridh Thapar
  • Publication number: 20110022617
    Abstract: An NFA circuit adapted to regular expressions and used for multibyte processing enables independent check of in what position the inputted character string matches. A 1-byte NFA converting unit (21) stores one or more regular expressions inputted by an input device (1) in a regular expression storage unit (31), sequentially reads out the regular expressions, and converts them into 1-byte processed NFAs with no ε transition. A multibyte NFA converting unit (22) converts the generated 1-byte processed NFAs into NFAs such that it can be judged in what position the inputted character string to be processed in multibyte matches a pattern on the basis of the operating mode and the number of processing bytes inputted by the input device (1) and processed and stores the NFAs in an NFA storage unit (32). An HDL converting unit (23) generates a hardware description language (HDL) of the NFA circuit from the state transition information relating to the NFAs inputted from a multibyte NFA converting unit (22).
    Type: Application
    Filed: March 19, 2009
    Publication date: January 27, 2011
    Inventor: Norio Yamagaki
  • Publication number: 20110022623
    Abstract: A system and method for enabling information providers using a computer network such as the Internet to influence a position for a search listing within a search result list generated by an Internet search engine. The system and method of the present invention provides a database having accounts for the network information providers. Each account contains at least one search listing having at least three components: a description, a search term comprising one or more keywords, and a bid amount. The network information provider may add, delete, or modify a search listing after logging into his or her account via an authentication process. The network information provider influences the position for a search listing through a continuous online competitive bidding process. The bidding process occurs when the network information provider enters a new bid amount, which is preferably a money amount, for a search listing.
    Type: Application
    Filed: July 23, 2010
    Publication date: January 27, 2011
    Applicant: Yahoo! Inc.
    Inventors: Darren J. Davis, Matthew Derer, Johann Garcia, Larry Greco, Tod E. Kurt, Thomas Kwong, Jonathan C. Lee, Ka Luk Lee, Preston Pfarner, Steve Skovran
  • Publication number: 20110002664
    Abstract: In a system having a program viewing apparatus and a plurality of recording apparatuses connected to a network, a program viewing apparatus 31 allowing easy selection of a recording apparatus includes: a tuner 101; a program information storage unit 121 for storing program list information output from tuner 101; an IP communication unit 102 for transmission/reception to/from the recording apparatus; and a processing unit 110 for performing a process of selecting a recording apparatus. When a user instructs recording of a program he/she is watching, processing unit 110 transmits the program information read from program information storage unit 121 together with a search request, to recording apparatuses including a recording apparatus 32. If a prescribed similarity relation is found between the received program information and recording history, the recording apparatus 32 notifies the program viewing apparatus 31 that it is eligible for recording the program.
    Type: Application
    Filed: February 20, 2009
    Publication date: January 6, 2011
    Inventors: Hideki Nishimura, Hiroyuki Nakaoka
  • Publication number: 20110004605
    Abstract: A document editing device can edit a document using a markup language, and includes: an operation means; a display means that displays an editing screen for editing the document; a control means that searches a character string of a document displayed on the document editing screen, the character string being a character string to which a character decoration type identical to a search-target character decoration type specified by an operation of the operation means is set.
    Type: Application
    Filed: December 24, 2008
    Publication date: January 6, 2011
    Applicant: Kyocera Corporation
    Inventor: Takenori Tomino
  • Publication number: 20100325104
    Abstract: A location search device, in which when one of buttons in a character input portion is repeatedly pressed, a plurality of characters assigned in advance to the one button is displayed in an input character display portion in a predetermined cyclic sequence. When one of the buttons is pressed, and then, another of the buttons is pressed, the character displayed in the display portion immediately prior to pressing of the other button is set as an input set character. When the character is input through the character input portion following the input set character, a character string is input. A plurality of compound character strings are created by combining the input set character string with the plurality of characters that is pressed next, and search object character strings that partially match the respective compound character strings are acquired from a character string storage portion, as input candidate character strings.
    Type: Application
    Filed: June 2, 2010
    Publication date: December 23, 2010
    Applicant: AISIN AW CO., LTD.
    Inventor: Hiroshi KAWAUCHI
  • Publication number: 20100318521
    Abstract: Embodiments of methods and/or systems for tagging trees are disclosed.
    Type: Application
    Filed: July 2, 2010
    Publication date: December 16, 2010
    Applicant: Robert T. and Virginia T. Jenkins as Trustees of the Jenkins Family Trust Dated 2/8/2002
    Inventor: Jack J. LeTourneau
  • Publication number: 20100306209
    Abstract: A pattern matching method is disclosed. The method includes following steps. A character is searched in a skip table of a pattern such that a flag value and a skip value are returned. The sliding window is shifted according to the skip value when the flag value indicates the character is not a pattern end. The character plus at least one byte preceding the character is hashed when the flag value indicates the character is the pattern end such that a character hashing value is returned. A pattern end portion is hashed, wherein the size of the pattern end portion is equal to the size of the character plus the size of the byte such that a pattern hashing value is returned. The character hashing value is compared with the pattern hashing value. An exact matching process is performed when the character hashing value is equal to the pattern hashing value.
    Type: Application
    Filed: August 13, 2010
    Publication date: December 2, 2010
    Inventors: Tien-Fu Chen, Chieh-Jen Cheng
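
The steps in the abstract above can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes a single pattern, a Horspool-style skip table, and `zlib.crc32` as a stand-in hash. The names `build_skip_table`, `find_all`, and the `tail` parameter (the character plus the preceding bytes that get hashed) are hypothetical.

```python
import zlib


def build_skip_table(pattern: bytes) -> dict:
    """Map each byte to (is_pattern_end, skip), Horspool-style (an assumption)."""
    m = len(pattern)
    table = {}
    for i, b in enumerate(pattern[:-1]):
        table[b] = (False, m - 1 - i)      # distance from byte to pattern end
    last = pattern[-1]
    # The final byte is flagged as a pattern end; its skip is the usual
    # shift derived from earlier occurrences (or the full length m).
    table[last] = (True, table.get(last, (False, m))[1])
    return table


def find_all(text: bytes, pattern: bytes, tail: int = 2) -> list:
    """Return start offsets of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    tail = min(tail, m)
    table = build_skip_table(pattern)
    pattern_hash = zlib.crc32(pattern[-tail:])   # hash of the pattern end portion
    hits = []
    pos = m - 1                                  # text index under the window's last byte
    while pos < len(text):
        is_end, skip = table.get(text[pos], (False, m))
        if is_end:
            # Hash the character plus the bytes preceding it, then compare.
            if zlib.crc32(text[pos - tail + 1:pos + 1]) == pattern_hash:
                start = pos - m + 1
                if text[start:pos + 1] == pattern:   # exact matching step
                    hits.append(start)
        pos += skip                              # shift the sliding window
    return hits
```

For example, `find_all(b"abracadabra", b"abra")` reports the offsets 0 and 7. The hash comparison acts only as a cheap filter, so the exact comparison still decides every match.
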
  • Publication number: 20100306248
    Abstract: A method and system for expanding a document set as a search data source in the field of business related search. The present invention provides a method of expanding a seed document in a seed document set. The method includes identifying one or more entity words of the seed document; identifying, based on the entity word, one or more related topic words in the seed document where the entity word is located; forming an entity word-topic word pair from each identified topic word and the entity word on the basis of which each topic word is identified; and obtaining one or more expanded documents through the web by taking the entity word and topic word in each entity word-topic word pair as key words at the same time. A system for executing the above method is also provided.
    Type: Application
    Filed: May 25, 2010
    Publication date: December 2, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sheng Hua Bao, Jie Cui, Hui Su, Zhong Su, Li Zhang
  • Publication number: 20100293158
    Abstract: There is provided an information processing apparatus including a search processing section for causing a transmission/reception section to execute processing of transmitting a search request including a search condition to each of one or more information management devices, causing the transmission/reception section to execute processing of receiving, as a response to the search request and from each of the one or more information management devices via a network, content information corresponding to the search condition from among pieces of content information and management subject identification information for identifying the information management device which manages the content information, and correlating the management subject identification information with content identification information and content-related information that are included in the content information received by the transmission/reception section and causing a storage section to store the correlated management subject identificatio
    Type: Application
    Filed: April 27, 2010
    Publication date: November 18, 2010
    Applicant: Sony Corporation
    Inventors: Nobuyoshi Tomita, Yasuaki Honda, Yasuo Endo
  • Publication number: 20100281025
    Abstract: A method of generating recommendations for content items comprises providing a domain ontology where concepts are characterized by a term vector with terms and associated weights. Associated term sets, each of which comprises a set of terms that characterize a content item, are further provided. A concept set is generated for each associated term set by determining the concepts of the domain ontology that match the terms of the associated term set. In addition, a user profile for a user is provided where the user profile comprises at least some of the concepts of the ontology coupled with preference weights. Recommendations for content items are generated based on the plurality of associated concept sets and the user profile. The invention may allow improved and/or facilitated generation of recommendations from text based characterizing data.
    Type: Application
    Filed: May 4, 2009
    Publication date: November 4, 2010
    Applicant: MOTOROLA, INC.
    Inventors: Dorothea Tsatsou, Paul C. Davis, Symeon Papadopoulos, Fotis Menemenis, Ben M. Bratu, George Kalfas, Ioannis Kompatsiaris
  • Publication number: 20100281050
    Abstract: A range-conversion method and system includes receiving data records. Each data record includes one or more data fields and a field value associated with each data field. One or more data fields are identified as a range-based data field. A plurality of text-based range descriptors are defined, such that each text-based range descriptor is associated with a range of field values for one of the range-based data fields.
    Type: Application
    Filed: July 15, 2010
    Publication date: November 4, 2010
    Inventor: Roy Schoenberg
  • Publication number: 20100268721
    Abstract: An information terminal device having a higher degree of convenience than conventional ones is provided. The information terminal device according to the present invention is provided with a display 5, a key input device 6, a memory 7 and a control circuit 2.
    Type: Application
    Filed: December 10, 2008
    Publication date: October 21, 2010
    Applicant: KYOCERA CORPORATION
    Inventor: Takashi Kitano
  • Publication number: 20100268700
    Abstract: Systems and methods for search and search optimization using a pattern in a location identifier are disclosed. In one aspect, embodiments of the present disclosure include a method, which may be implemented on a system, of search and search optimization. The method includes detecting a set of location identifiers that have a pattern that matches a specified pattern and identifying a set of search results as having content related to the semantic type. The specified pattern can be stored in a computer-readable storage medium and corresponds to a semantic type. The set of search results can include objects associated with the set of location identifiers having the specified pattern.
    Type: Application
    Filed: April 14, 2010
    Publication date: October 21, 2010
    Applicant: Evri, Inc.
    Inventors: James M. Wissner, Nova T. Spivack
  • Publication number: 20100268696
    Abstract: The present invention is directed to a system, method and server to assist account issuers in managing risk, fraud and unauthorized use. A system, method and server for use in pushing advanced warning alerts to issuers based on consumer data element level triggering events and fraud and unauthorized use reports is disclosed. The ability to push the alerts to issuers with a permissible purpose for receiving the information in the alerts provides a real-time, online and cost effective way of providing issuers with valuable risk management tools.
    Type: Application
    Filed: July 20, 2009
    Publication date: October 21, 2010
    Inventors: Brad Nightengale, Sharon Rowberry
  • Publication number: 20100262617
    Abstract: A method of database update processing for updating efficiently database index keys, when new database index keys are supplied to replace index keys already in the database, generates a delta data between the new and old data comprising insert and delete keys by delete processing from a coupled node tree holding the index keys in the old data using index keys of new data as delete keys, and generates new data by delete and insert processing from and into a coupled node tree holding index keys in old data as index keys using the delete keys and insert keys of the delta data.
    Type: Application
    Filed: June 18, 2010
    Publication date: October 14, 2010
    Applicant: S. Grants Co., Ltd.
    Inventors: Toshio Shinjo, Mitsuhiro Kokubun
  • Publication number: 20100262621
    Abstract: Methods, systems and program products are disclosed for determining a matching level of a text lookup segment with a plurality of source texts in a translation memory in terms of context. In particular, embodiments of the present invention determine any exact matches for the lookup segment in the plurality of source texts, and determine, in the case that at least one exact match is determined, that a respective exact match is an in-context exact (ICE) match for the lookup segment in the case that a context of the lookup segment matches that of the respective exact match. The degree of context matching required can be predetermined, and results prioritized. The invention also includes methods, systems and program products for storing a translation pair of source text and target text in a translation memory including context, and the translation memory so formed. The invention ensures that content is translated the same as previously translated content and reduces translator intervention.
    Type: Application
    Filed: October 27, 2009
    Publication date: October 14, 2010
    Inventors: Russ Ross, Kevin Gillespie, Oliver Christ, Daniel Brockmann
  • Publication number: 20100257159
    Abstract: An information search apparatus is provided. The information search apparatus includes: a character string input unit configured to obtain a character string from a client; a character string information search unit configured to obtain information that includes the character string from an index DB; a similarity calculation unit configured to calculate degree of similarity between the character string and searched information; and an output unit configured to output the searched information in descending order of the degree of similarity.
    Type: Application
    Filed: September 10, 2008
    Publication date: October 7, 2010
    Applicant: Nippon Telegraph and Telephone Corporation
    Inventors: Yukio Uematsu, Kengo Fujioka, Syunsuke Konagai, Ryoji Kataoka
  • Publication number: 20100250599
    Abstract: An approach is provided for integrating place metadata provided by a community of metadata builders, including receiving registration data that indicates one or more values for a corresponding one or more attributes that describe a place. A place is associated with a geographic location. Providing an indication of match between the registration data and metadata for a predetermined place is also initiated. In some embodiments, a new entry for a set of predetermined places is generated based on validating the registration data and a negligible degree of match. In some embodiments, a unique identifier for the place is included in the indication of match for either a new place represented by the registration data or a matching predetermined place.
    Type: Application
    Filed: June 4, 2009
    Publication date: September 30, 2010
    Applicant: Nokia Corporation
    Inventors: Andreas SCHMIDT, Alexander Grosse
  • Publication number: 20100250596
    Abstract: Methods and apparatus are provided for discovering minimal conditional functional dependencies (CFDs). CFDs extend functional dependencies by supporting patterns of semantically related constants, and can be used as rules for cleaning relational data. A disclosed CFDMiner algorithm, based on techniques for mining closed itemsets, discovers constant minimal CFDs. A disclosed CTANE algorithm discovers general minimal CFDs based on the levelwise approach. A disclosed FastCFD algorithm discovers general minimal CFDs based on a depth-first search strategy, and an optimization technique via closed-itemset mining to reduce search space.
    Type: Application
    Filed: March 26, 2009
    Publication date: September 30, 2010
    Inventors: Wenfei Fan, Ming Xiong
  • Publication number: 20100250598
    Abstract: Methods and systems are described that involve recognizing complex entities from text documents with the help of structured data and Natural Language Processing (NLP) techniques. In one embodiment, the method includes receiving a document as input from a set of documents, wherein the document contains text or unstructured data. The method also includes identifying a plurality of text segments from the document via a set of tagging techniques. Further, the method includes matching the identified plurality of text segments against attributes of a set of predefined entities. Lastly, a best matching predefined entity is selected for each text segment from the plurality of text segments. In one embodiment, the system includes a set of documents, each document containing text or unstructured data. The system also includes a database storage unit that stores a set of predefined entities, wherein each entity contains a set of attributes.
    Type: Application
    Filed: March 30, 2009
    Publication date: September 30, 2010
    Inventors: Falk Brauer, Wojciech Barczynski, Hong-Hai Do, Alexander Loser, Marcus Schramm
  • Publication number: 20100250562
    Abstract: A method of analyzing words in an arbitrary text document comprises identifying a candidate name of an inhabited area in an arbitrary text, searching and isolating strings to the left and the right of the candidate name, comparing these strings to a map database comprising addresses containing the candidate name, and thereby determining a complete address from the strings matching the map database and the candidate name. A method for searching for a service or product on the World Wide Web comprises providing a global database of web pages indexed by words and locations. The global database is searched using keywords describing the service or product and using a search location. The search process returns a list of web pages matching the keywords and the search location.
    Type: Application
    Filed: March 24, 2009
    Publication date: September 30, 2010
    Inventor: Ivica Siladic
  • Publication number: 20100242109
    Abstract: In embodiments of the present invention improved capabilities are described for reducing computer file access time associated with on-access scanning through predictive preemptive scanning, where the prediction may be enabled through the development and use of a file access performance cost mapping of a computing facility's file system. In a first step, file access information describing a pattern of each of a plurality of computer files that have been accessed in a computer file system may be collected. In a second step, the file access information may be processed to generate a file access performance cost statistic for each of the plurality of computer files, where the file access performance cost statistic may be a measure of the time aggregate effect on the computing facility's system performance associated with the access of the file. In a third step, the file access performance cost statistic may be maintained for each of the plurality of files accessed by the computing facility.
    Type: Application
    Filed: March 17, 2009
    Publication date: September 23, 2010
    Inventor: Graham J. Lee
  • Publication number: 20100232690
    Abstract: An image processing apparatus includes a character recognition unit configured to perform character recognition on a plurality of character images in a document image to acquire a character code corresponding to each character image, and a generation unit configured to generate an electronic document, wherein the electronic document includes the document image, a plurality of character codes acquired by the character recognition unit, a plurality of glyphs, and data which indicates the glyphs to be used to render each of the character codes, wherein each of the plurality of glyphs is shared and used by different character codes based on the data when rendering characters that correspond to the plurality of character codes acquired by the recognition unit.
    Type: Application
    Filed: June 23, 2008
    Publication date: September 16, 2010
    Applicant: CANON KABUSHIKI KAISHA
    Inventors: Tomotoshi Kanatsu, Makoto Enomoto, Taeko Yamazaki
  • Publication number: 20100235374
    Abstract: Systems and methods for reducing file sizes for files delivered over a network are disclosed. A method comprises receiving a first file comprising sequences of data; creating a hash table having entries corresponding to overlapping sequences of data; receiving a second file comprising sequences of data; comparing each of the sequences of data in the second file to the sequences of data in the hash table to determine sequences of data present in both the first and second files; and creating a third file comprising sequences of data from the second file and representations of locations and lengths of said sequences of data present in both the first and second files.
    Type: Application
    Filed: May 28, 2010
    Publication date: September 16, 2010
    Inventors: Henk Bots, Srikanth Devarajan, Saravana Annamalaisami
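
As a rough illustration of the method in the abstract above, the following Python sketch indexes overlapping fixed-size sequences of a first file in a hash table and encodes a second file as literals plus (location, length) references. The sequence size `k`, the operation tuples, and the function names are assumptions made for this sketch; the abstract does not specify an encoding.

```python
def build_index(base: bytes, k: int = 4) -> dict:
    """Hash table from every overlapping k-byte sequence to its first offset."""
    index = {}
    for i in range(len(base) - k + 1):
        index.setdefault(base[i:i + k], i)
    return index


def encode_delta(base: bytes, target: bytes, k: int = 4) -> list:
    """Encode target as ("lit", bytes) and ("copy", offset, length) operations."""
    index = build_index(base, k)
    ops, literal, i = [], bytearray(), 0
    while i < len(target):
        j = index.get(target[i:i + k])       # short tail slices never match
        if j is None:
            literal.append(target[i])        # byte not shared with base
            i += 1
            continue
        length = k
        # Extend the match greedily beyond the k bytes found in the table.
        while (i + length < len(target) and j + length < len(base)
               and target[i + length] == base[j + length]):
            length += 1
        if literal:
            ops.append(("lit", bytes(literal)))
            literal.clear()
        ops.append(("copy", j, length))      # location and length in base
        i += length
    if literal:
        ops.append(("lit", bytes(literal)))
    return ops


def decode_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target from base and the operation list."""
    out = bytearray()
    for op in ops:
        if op[0] == "lit":
            out += op[1]
        else:
            _, offset, length = op
            out += base[offset:offset + length]
    return bytes(out)
```

A receiver that already holds the first file only needs the operation list, which is the "third file" in this sketch, to reconstruct the second file with `decode_delta`.
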
  • Publication number: 20100235392
    Abstract: A system and method for an entropy-based near-match analysis identifies target files that are almost, but not, identical to a reference file. A computing processor computes entropies of the reference and target files, and determines the likeness of the target files to the reference file based on the computed entropies. The computing processor determines a near match between the target file and the reference file if the likeness of the two files is within a user-defined tolerance level. According to one embodiment of the invention, the information entropy is a weighted value that takes into account the size of the file.
    Type: Application
    Filed: March 11, 2010
    Publication date: September 16, 2010
    Inventors: Shawn McCreight, Dominik Weber
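
The idea in the abstract above can be sketched in Python as follows. This is an assumption-laden illustration, not the claimed method: it uses byte-level Shannon entropy, weights it by multiplying with the file size in bytes, and measures likeness as the relative difference of the two weighted values. The names `shannon_entropy`, `weighted_entropy`, `is_near_match`, and the default `tolerance` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def weighted_entropy(data: bytes) -> float:
    """Entropy weighted by file size (the weighting scheme is an assumption)."""
    return shannon_entropy(data) * len(data)


def is_near_match(reference: bytes, target: bytes, tolerance: float = 0.05) -> bool:
    """True if the weighted entropies differ by at most `tolerance` (relative)."""
    ref, tgt = weighted_entropy(reference), weighted_entropy(target)
    largest = max(ref, tgt)
    if largest == 0:
        # Degenerate case (empty or single-valued data): fall back to equality.
        return reference == target
    return abs(ref - tgt) / largest <= tolerance
```

Because unrelated files can share similar entropy values, a check like this is best read as a coarse first-pass screen that narrows the candidates for a more detailed comparison.
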
  • Publication number: 20100223280
    Abstract: A contextual similarity measurement system computes a similarity model for a reference document using a prediction by partial match method. The system further computes a similarity measure between a compared document and the reference document using the similarity model for the reference document.
    Type: Application
    Filed: February 27, 2009
    Publication date: September 2, 2010
    Inventor: JAMES PAUL SCHNEIDER
  • Publication number: 20100223292
    Abstract: A method resolves ambiguous spotted entity names in a data corpus by determining an activation level value for each of a plurality of nodes corresponding to a single ambiguous entity name. The activation levels for each of the nodes may be modified by inputting outside domain knowledge corresponding to the nodes to increase the activation value of the nodes, spotting entity names corresponding to the nodes to increase the activation value of the nodes, searching the data corpus to spot newly posted entity names to increase the activation value of the nodes, and searching the data corpus to reduce or deactivate the activation value of the nodes by eliminating false positives. The ambiguous entity name is assigned to the node determined to have the highest activation level and is then outputted to a user.
    Type: Application
    Filed: February 27, 2009
    Publication date: September 2, 2010
    Applicant: International Business Machines Corporation
    Inventors: Varun Bhagwan, Tyrone W.A. Grandison, Daniel F. Gruhl, Jan H. Pieper
  • Publication number: 20100223305
    Abstract: Techniques for managing memory usage in a processing system are provided. This may be achieved by receiving a data stream including multiple tuples and determining a query plan that was generated for a continuous query applied to the multiple tuples in the data stream. The query plan may include one or more operators. Before scheduling an operator in the query plan, it is determined when an eviction is to be performed based on a level of free memory of the processing system. An eviction candidate is determined and a page associated with the eviction candidate is evicted from the memory to a persistent storage.
    Type: Application
    Filed: March 2, 2009
    Publication date: September 2, 2010
    Applicant: Oracle International Corporation
    Inventors: Hoyong Park, Namit Jain, Anand Srinivasan, Shailendra Mishra
  • Patent number: 7788235
    Abstract: An extrusion detection system prevents the release of sensitive data from an enterprise. The system includes an administration module for broadcasting taint instructions, each of which includes a definition of sensitive data. The system also includes a plurality of extrusion detection nodes. Each node marks sensitive data as tainted responsive to the taint instructions, and marks data that depends on sensitive data as tainted. When the potential release of tainted data is detected, an action is executed responsive to the taint instructions.
    Type: Grant
    Filed: September 29, 2006
    Date of Patent: August 31, 2010
    Assignee: Symantec Corporation
    Inventor: Matthew Yeo
  • Publication number: 20100217596
    Abstract: In one aspect, a method for processing media includes accepting a query. One or more language patterns are identified that are similar to the query. A putative instance of the query is located in the media. The putative instance is associated with a corresponding location in the media. The media in a vicinity of the putative instance is compared to the identified language patterns and data characterizing the putative instance of the query is provided according to the comparing of the media to the language patterns, for example, as a score for the putative instance that is determined according to the comparing of the media to the language patterns.
    Type: Application
    Filed: February 24, 2009
    Publication date: August 26, 2010
    Applicant: Nexidia Inc.
    Inventors: Robert W. Morris, Jon A. Arrowood, Mark A. Clements, Kenneth King Griggs, Peter S. Cardillo, Marsal Gavalda
  • Publication number: 20100217770
    Abstract: A method for automatically sensing a set of elements in a computer system, wherein each element in the set has an associated character body from a plurality of character bodies, and each character body comprises character strings which characterize a respective element, the performance of the method involving a search for at least one prescribed character string within the character bodies and use of the at least one character string to ascertain at least one property for at least one element, and association of this at least one ascertained property with at least one category, and this involving a user of the method being provided with a taxonomy which is inherent in the set of elements.
    Type: Application
    Filed: February 17, 2010
    Publication date: August 26, 2010
    Inventor: Peter Ernst
  • Publication number: 20100211591
    Abstract: An exemplary string processing method for specific byte string processing with word-related instructions includes: loading a plurality of first predetermined strings; comparing a specific string with the loaded first predetermined strings simultaneously, thereby generating a plurality of comparison results corresponding to the specific string; and generating a string processing result according to the comparison results. A string processing apparatus uses the string processing method.
    Type: Application
    Filed: February 16, 2009
    Publication date: August 19, 2010
    Inventors: Chuan-Hua Chang, Chi-Chang Lai, Hong-Men Su
  • Publication number: 20100211960
    Abstract: Among other disclosed subject matter, a computer-implemented method for characterizing user information includes receiving a plurality of identifiers associated with respective users. The method includes identifying, using the plurality of identifiers, any information portions in an information collection relating to at least one of the users, the information collection reflecting network activities by the users. The method includes generating a record that includes the plurality of identifiers associated with the corresponding information portions. The method includes identifying at least one of the information portions as corresponding to a category established for user classification. The method includes identifying a subset of the plurality of identifiers as associated with the category; and.
    Type: Application
    Filed: February 17, 2009
    Publication date: August 19, 2010
    Applicant: GOOGLE INC.
    Inventors: Sarah Sirajuddin, Xuefu Wang, Angshuman Guha, Oren E. Zamir, Aitan Weinberg
  • Publication number: 20100205123
    Abstract: The present invention relates to systems and methods for identifying and removing unwanted or harmful electronic text (e.g., spam). In particular, the present invention provides systems and methods utilizing inexact string matching methods and machine learning and non-learning methods for identifying and removing unwanted or harmful electronic text.
    Type: Application
    Filed: August 8, 2007
    Publication date: August 12, 2010
    Applicant: TRUSTEES OF TUFTS COLLEGE
    Inventors: D. Sculley, Gabriel Wachman, Carla E. Brokley
  • Publication number: 20100205173
    Abstract: In one implementation, a method is provided for increasing relevance of database search results. The method includes receiving a subject query string and determining a trained edit distance between the subject query string and a candidate string using trained cost factors derived from a training set of labeled query transformations. A trained cost factor includes a conditional probability for mutations in labeled non-relevant query transformations and a conditional probability for mutations in labeled relevant query transformations. The candidate string is evaluated for selection based on the trained edit distance. In some implementations, the cost factors may take into account the context of a mutation. As such, in some implementations multi-dimensional matrices are utilized which include the trained cost factors.
    Type: Application
    Filed: April 22, 2010
    Publication date: August 12, 2010
    Applicant: YAHOO! INC.
    Inventor: John M. Carnahan
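The trained edit distance in the abstract above can be read as a Levenshtein-style dynamic program whose per-operation costs come from a lookup table instead of being fixed at 1. The following is a minimal sketch under that reading, not the patented method: the function name `weighted_edit_distance`, the `costs` table keyed by `(source_char, target_char)` with `""` standing for an insertion or deletion, and the example cost values are all assumptions made for illustration. How the costs would be derived from labeled query transformations is not reproduced here.

```python
def weighted_edit_distance(source, target, costs, default_cost=1.0):
    """Edit distance where each mutation's cost is looked up in `costs`.

    `costs` maps (from_char, to_char) to a float; "" stands for "no character",
    so (c, "") is a deletion of c and ("", c) is an insertion of c.
    Mutations missing from the table fall back to `default_cost`.
    """
    def cost(a, b):
        if a == b:
            return 0.0
        return costs.get((a, b), default_cost)

    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]

    # First column: delete every character of the source prefix.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + cost(source[i - 1], "")
    # First row: insert every character of the target prefix.
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + cost("", target[j - 1])

    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + cost(source[i - 1], ""),                    # deletion
                dist[i][j - 1] + cost("", target[j - 1]),                    # insertion
                dist[i - 1][j - 1] + cost(source[i - 1], target[j - 1]),     # substitution
            )
    return dist[-1][-1]
```

With an empty table the function reduces to the ordinary Levenshtein distance; lowering the cost of a specific mutation, for example `{("k", "s"): 0.2}`, makes candidates that differ only by that mutation rank closer to the query.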
  • Publication number: 20100198850
    Abstract: Disclosed herein is an improved architecture for regular expression pattern matching. Improvements to pattern matching deterministic finite automatons (DFAs) that are described by the inventors include a pipelining strategy that pushes state-dependent feedback to a final pipeline stage to thereby enhance parallelism and throughput, augmented state transitions that track whether a transition is indicative of a pattern match occurring thereby reducing the number of necessary states for the DFA, augmented state transitions that track whether a transition is indicative of a restart to the matching process, compression of the DFA's transition table, alphabet encoding for input symbols to equivalence class identifiers, the use of an indirection table to allow for optimized transition table memory, and enhanced scalability to facilitate the ability of the improved DFA to process multiple input symbols per cycle.
    Type: Application
    Filed: February 10, 2010
    Publication date: August 5, 2010
    Applicant: Exegy Incorporated
    Inventors: Ron K. Cytron, David Edward Taylor, Benjamin Curry Brodie
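Two of the ideas named in the abstract above, alphabet encoding to equivalence class identifiers and transitions that carry a match flag, can be illustrated in software with a very small table-driven DFA. This is a sketch only, assuming a hypothetical pattern (one or more digits followed by a lowercase letter) and hypothetical names (`build_class_map`, `TRANSITIONS`, `find_match_ends`); it does not model the pipelining, table compression, or multi-symbol processing the abstract also mentions.

```python
# Equivalence classes: every input byte is reduced to one of three class ids,
# so each transition table row has 3 columns instead of 256.
DIGIT, LETTER, OTHER = 0, 1, 2


def build_class_map():
    """Return a 256-entry list mapping each byte value to its class id."""
    class_map = [OTHER] * 256
    for b in range(ord("0"), ord("9") + 1):
        class_map[b] = DIGIT
    for b in range(ord("a"), ord("z") + 1):
        class_map[b] = LETTER
    return class_map


# TRANSITIONS[state][class_id] = (next_state, is_match).
# The match flag lives on the transition, so no separate accepting state is needed.
TRANSITIONS = [
    [(1, False), (0, False), (0, False)],  # state 0: no digit pending
    [(1, False), (0, True), (0, False)],   # state 1: at least one digit seen
]


def find_match_ends(data):
    """Return the positions at which digit+letter matches end in `data` (bytes)."""
    class_map = build_class_map()
    state = 0
    ends = []
    for pos, byte in enumerate(data):
        state, is_match = TRANSITIONS[state][class_map[byte]]
        if is_match:
            ends.append(pos)
    return ends
```

For the input `b"x12a 7 9z"` the flagged transitions fire on the `a` at index 3 and the `z` at index 8.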
  • Publication number: 20100191753
    Abstract: Described is a technology in which sequential data, such as application program command sequences, are processed into patterns, such as for use in analyzing program usage. In one aspect, sequential data may be first transformed via state machines that remove repeated data, group similar data into sub-sequences, and/or remove noisy data. The transformed data is then segmented into units. A pattern extraction mechanism extracts patterns from the units into a pattern set, by calculating a stability score (e.g., a mutual information score) between succeeding units, selecting the pair of units having the most stability (e.g., the highest score), and adding corresponding information for that pair into the pattern set. Pattern extraction is iteratively repeated until a stopping criterion is met, e.g., the pattern set reaches a defined size, or when the stability score is smaller than a pre-set threshold.
    Type: Application
    Filed: January 26, 2009
    Publication date: July 29, 2010
    Applicant: Microsoft Corporation
    Inventors: Jie Su, Min Chu, Wenli Zhu, Jian Wang
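The iterative extraction loop in the abstract above (score succeeding units, take the most stable pair, add it to the pattern set, repeat until a stopping criterion is met) can be sketched as follows. The sketch assumes pointwise mutual information as the stability score and assumes that a selected pair is merged into a single unit before the next round; the function names, the `"+"` join convention, and the default thresholds are hypothetical and not taken from the source.

```python
import math
from collections import Counter


def pmi_scores(units):
    """Pointwise mutual information for every adjacent pair of units."""
    total = len(units)
    unit_counts = Counter(units)
    pair_counts = Counter(zip(units, units[1:]))
    n_pairs = total - 1
    scores = {}
    for (a, b), n in pair_counts.items():
        p_ab = n / n_pairs
        p_a = unit_counts[a] / total
        p_b = unit_counts[b] / total
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


def merge_pair(units, pair):
    """Replace each non-overlapping occurrence of `pair` with one joined unit."""
    merged, i = [], 0
    while i < len(units):
        if i + 1 < len(units) and (units[i], units[i + 1]) == pair:
            merged.append(units[i] + "+" + units[i + 1])
            i += 2
        else:
            merged.append(units[i])
            i += 1
    return merged


def extract_patterns(units, max_patterns=3, min_score=0.0):
    """Repeatedly pick the highest-scoring adjacent pair until a stop criterion."""
    patterns = []
    units = list(units)
    while len(patterns) < max_patterns and len(units) > 1:
        scores = pmi_scores(units)
        pair, score = max(scores.items(), key=lambda item: item[1])
        if score < min_score:  # stability below the pre-set threshold
            break
        patterns.append(pair)
        units = merge_pair(units, pair)
    return patterns, units
```

Both stopping criteria from the abstract appear here: `max_patterns` caps the size of the pattern set, and `min_score` ends extraction once the best remaining pair is no longer stable enough.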
  • Publication number: 20100185637
    Abstract: Methods for matching candidates with a target utilizing extract, transform and load (ETL) metadata utilizing a computer, the candidates originating from a number of secondary data sources are presented including: causing the computer to receive the target from a target data source; causing the computer to fetch the candidates from the number of secondary data sources; causing the computer to process match rules, the match rules configured for determining whether the candidates match with the target, where the ETL metadata provides data for the processing; if the number of match rules determines a potential candidate match, causing the computer to score the potential candidate match utilizing a weighting method, the weighting method corresponding with a degree of importance of the match, where the potential candidate match corresponds with one of the candidates; and causing the computer to display the potential candidate match.
    Type: Application
    Filed: January 14, 2009
    Publication date: July 22, 2010
    Applicant: International Business Machines Corporation
    Inventors: Richard K. Morris, Neville T. Myatt
  • Publication number: 20100185691
    Abstract: Disclosed are methods and apparatus for performing named entity recognition. A set of candidates and corresponding contexts are obtained, each of the set of candidates being a potential seed example of an entity. The contexts of at least a portion of the set of candidates are compared with contexts of a set of seed examples of the entity such that a subset of the set of candidates are added to the set of the seed examples. A set of rules are created from the set of seed examples obtained in the comparing step. A final set of seed examples of the entity is generated by executing the set of rules against the set of candidates.
    Type: Application
    Filed: January 20, 2009
    Publication date: July 22, 2010
    Applicant: Yahoo! Inc.
    Inventors: Utku Irmak, Reiner Kraft
  • Patent number: 7761453
    Abstract: A method and system for indexing and searching a database of iris images having a system to expedite a process of matching a subject to millions (more or less) of templates within a database is disclosed. The system may progressively match an iris image template to an iris template in a database by progressing from a top layer of the database to a lower layer of the database. Such matching or retrieval may use a subject code as a query or probe and then find a similarity measure for the features of codes or templates in the database. A multi-stage hierarchal clustering process may be used to compress codes and/or templates.
    Type: Grant
    Filed: March 2, 2007
    Date of Patent: July 20, 2010
    Assignee: Honeywell International Inc.
    Inventor: Rida M. Hamza
  • Publication number: 20100174724
    Abstract: A method of social networking is disclosed in which users may indicate areas of interest to them and/or search for other users with the same or similar interests. Users may specify one or more areas of interest to them and enter those interests in a database, by means of a list of “tags” or keywords, called a taglist. The values of the tags may be weighted. A user wishing to find other users with similar interests may input a search taglist which is compared with the other taglists stored in the database, and a list of the users with the closest matching taglists is returned. The method may also be used to characterize documents, projects, media files or other data objects, so that such items may be searched in similar fashion.
    Type: Application
    Filed: January 1, 2010
    Publication date: July 8, 2010
    Inventors: David Robert Wallace, Marilynn Klamkin, Kali Donovan
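The taglist comparison in the abstract above can be sketched as a similarity ranking over weighted tag dictionaries. The abstract does not say which similarity measure is used, so cosine similarity is an assumption here, as are the names `cosine_similarity` and `closest_users` and the sample data in the usage note.

```python
import math


def cosine_similarity(tags_a, tags_b):
    """Cosine similarity between two taglists given as {tag: weight} dicts."""
    shared = set(tags_a) & set(tags_b)
    dot = sum(tags_a[tag] * tags_b[tag] for tag in shared)
    norm_a = math.sqrt(sum(w * w for w in tags_a.values()))
    norm_b = math.sqrt(sum(w * w for w in tags_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty or all-zero taglist matches nothing
    return dot / (norm_a * norm_b)


def closest_users(query, taglists, top_n=3):
    """Return up to `top_n` (user, score) pairs, best match first.

    `taglists` maps a user name to that user's {tag: weight} dict.
    Ties are broken alphabetically by user name.
    """
    scored = [(cosine_similarity(query, tags), user) for user, tags in taglists.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(user, score) for score, user in scored[:top_n]]
```

For example, with stored taglists `{"ana": {"jazz": 1.0, "hiking": 0.5}, "ben": {"chess": 1.0}, "cy": {"jazz": 0.2, "chess": 0.8}}` and the search taglist `{"jazz": 1.0}`, the ranking is ana, then cy, then ben. The same function applies unchanged if the keys identify documents or media files rather than users.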
  • Patent number: 7747078
    Abstract: A method, computer program product, apparatus, and system that detects a substring in an input data string by producing a fingerprint of a portion of the data string and comparing the fingerprint of the portion of the data string to at least one predefined fingerprint. The predefined fingerprint may be a fingerprint of a portion of a predefined pattern of interest. If the fingerprints match, further pattern recognition processing may be performed on the input string.
    Type: Grant
    Filed: July 6, 2006
    Date of Patent: June 29, 2010
    Assignee: Intel Corporation
    Inventors: Lukas Kencl, Gianluca Iannaccone, Ramaswamy Ramaswamy
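The two-step idea in the abstract above, a cheap fingerprint comparison followed by fuller pattern processing only on a fingerprint match, can be sketched in software with a rolling hash. The abstract does not name a fingerprint function; the Rabin-Karp-style polynomial hash, the constants `BASE` and `MOD`, the 4-byte `window`, and the function names below are all assumptions chosen for illustration.

```python
BASE = 257
MOD = (1 << 61) - 1  # large prime modulus to keep false fingerprint matches rare


def fingerprint(data):
    """Polynomial hash of a bytes object."""
    value = 0
    for byte in data:
        value = (value * BASE + byte) % MOD
    return value


def scan(data, patterns, window=4):
    """Return (offset, pattern) for every pattern occurrence in `data`.

    Only the first `window` bytes of each pattern are fingerprinted; a full
    comparison runs only at offsets where a fingerprint matches.
    """
    table = {}
    for pattern in patterns:
        if len(pattern) < window:
            raise ValueError("pattern shorter than fingerprint window")
        table.setdefault(fingerprint(pattern[:window]), []).append(pattern)

    hits = []
    if len(data) < window:
        return hits

    high = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    current = fingerprint(data[:window])
    for start in range(len(data) - window + 1):
        if start > 0:
            outgoing = data[start - 1]
            incoming = data[start + window - 1]
            # Slide the window by one byte in constant time.
            current = ((current - outgoing * high) * BASE + incoming) % MOD
        for pattern in table.get(current, ()):
            # Further processing: verify the whole pattern at this offset.
            if data.startswith(pattern, start):
                hits.append((start, pattern))
    return hits
```

Because the fingerprint is updated in constant time per input byte, the full comparison runs only at the few offsets where a fingerprint matches.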
  • Publication number: 20100161566
    Abstract: Techniques are disclosed for adding entities to a group of entity resolution candidates by selecting entities that have a minimum threshold of similarity to a candidate, allowing a greater number of resolutions in an entity resolution system. To resolve an incoming identity record, an initial group of candidates may be selected from known entities by identifying entities that match a candidate building attribute of the incoming identity record. Additional candidates may be selected by identifying entities with some information that is similar to one of the candidate entities.
    Type: Application
    Filed: December 18, 2008
    Publication date: June 24, 2010
    Inventors: Gregery G. Adair, Jeffrey J. Jonas
  • Publication number: 20100153420
    Abstract: A dual-stage regular expression pattern matching method and system is proposed, which is designed for integration into a data processing system, such as a computer platform, a firewall, a network intrusion detection system (NIDS), or a DNA sequence analysis system, for checking whether an input code sequence (such as a network data packet) is matched to specific patterns predefined by regular expressions. The proposed system and method includes a first-stage comparison procedure for comparison of the prefix string of each input code sequence and a second-stage comparison procedure for comparison of the postfix string of the same input code sequence. This feature can be used for processing code sequences having a special pattern without producing an enormous amount of state data that would cause the problem of insufficient memory during operation.
    Type: Application
    Filed: March 5, 2009
    Publication date: June 17, 2010
    Applicant: NATIONAL TAIWAN UNIVERSITY
    Inventors: Chang-Ching Yang, Sheng-De Wang