Disambiguating USPTO inventor names with semantic fingerprinting and DBSCAN clustering


Publisher: Emerald Publishing
Copyright: © Emerald Publishing Limited
ISSN: 0264-0473
DOI: 10.1108/el-12-2018-0232

Abstract

Purpose: The aim of this study is to present a novel approach to inventor name disambiguation based on semantic fingerprinting and a clustering algorithm called density-based spatial clustering of applications with noise (DBSCAN). In this approach, inventor records are converted into 128-bit semantic fingerprints. Inventor disambiguation is a method used to discover a unique set of underlying inventors and map a set of patents to their corresponding inventors. Resolving the ambiguities between inventors is necessary to improve the quality of the patent database and to ensure accurate entity-level analysis. Most existing methods are based on machine learning and, while they often show good performance, this comes at the cost of time, computational power and storage space.

Design/methodology/approach: The metadata and textual data in inventor records are converted into 128-bit semantic fingerprints. Rather than using a string comparison or cosine similarity to calculate the distance between pairs of fingerprint records, a binary number comparison function is used as the distance in DBSCAN. DBSCAN then clusters the inventor records based on this distance to disambiguate inventor names.

Findings: Experiments conducted on the PatentsView campaign database of the United States Patent and Trademark Office show that this method disambiguates inventor names with recall greater than 99 per cent, in less time and with a substantially smaller storage requirement.

Research limitations/implications: A better semantic fingerprint algorithm and a better distance function may improve precision. Setting different clustering parameters for each block, or using other clustering algorithms, will be considered to improve the accuracy of the disambiguation results further.

Originality/value: Compared with existing methods, the proposed method does not rely on feature selection or complex feature comparison computation. Most importantly, running time and storage requirements are drastically reduced.
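The pipeline described in the abstract (fingerprint each record, compare fingerprints bit-wise, cluster with DBSCAN) can be pictured with the following minimal Python sketch. It is not the authors' implementation. The abstract names neither the fingerprint algorithm nor the exact comparison function, so this sketch assumes a SimHash-style fingerprint built from MD5 token hashes and a Hamming (differing-bit count) distance. The function names, the sample records and the `eps` and `min_pts` values are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import deque

BITS = 128  # fingerprint width stated in the abstract


def fingerprint(tokens):
    """SimHash-style 128-bit fingerprint (an assumed stand-in, not the paper's algorithm)."""
    weights = [0] * BITS
    for tok in tokens:
        # MD5 yields exactly 128 bits per token; each bit votes +1 or -1.
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest(), "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when its votes are positive overall.
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def hamming(a, b):
    """Number of differing bits between two integer fingerprints (assumed distance)."""
    return bin(a ^ b).count("1")


def dbscan(points, eps, min_pts, dist):
    """Plain DBSCAN over a list of points; returns one label per point, -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force region query; includes the point itself.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point, so keep expanding
    return labels


if __name__ == "__main__":
    # Hypothetical inventor records, already tokenised (name, location, patent keywords).
    records = [
        ["john", "smith", "boston", "ma", "battery", "electrode"],
        ["john", "a", "smith", "boston", "ma", "battery", "anode"],
        ["john", "smith", "austin", "tx", "compiler", "parser"],
    ]
    fps = [fingerprint(r) for r in records]
    # eps and min_pts are illustrative values, not parameters from the paper.
    print(dbscan(fps, eps=40, min_pts=1, dist=hamming))
```

Because each record is reduced to a single 128-bit integer, a pairwise comparison in this sketch is one XOR plus a bit count, which is consistent with the reduced running time and storage the abstract reports.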

Journal

The Electronic Library, Emerald Publishing

Published: Jun 3, 2019

Keywords: Cluster analysis; Patent analysis; Inventor name disambiguation; Semantic fingerprinting
