Journal of Biomedical Informatics 82 (2018) 63–69
Contents lists available at ScienceDirect
Journal homepage: www.elsevier.com/locate/yjbin

Identifying and characterizing highly similar notes in big clinical note datasets

Rodney A. Gabriel (a, b), Tsung-Ting Kuo (a), Julian McAuley (c), Chun-Nan Hsu (a)

a UCSD Health Department of Biomedical Informatics, University of California, San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA
b Department of Anesthesiology, University of California, San Diego, 200 West Arbor Dr, San Diego, CA 92103, USA
c Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA

ARTICLE INFO

Keywords: Electronic medical record; De-deduplication; Natural language processing

ABSTRACT

Background: Big clinical note datasets found in electronic health records (EHR) present substantial opportunities to train accurate statistical models that identify patterns in patient diagnosis and outcomes. However, near-to-exact duplication in note texts is a common issue in many clinical note datasets. We aimed to use a scalable algorithm to de-duplicate notes and further characterize the sources of duplication.

Methods: We use an approximation algorithm to minimize pairwise comparisons consisting of three phases: (1) Minhashing with Locality Sensitive Hashing; (2) a
Journal of Biomedical Informatics – Elsevier
Published: Jun 1, 2018
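
The first phase named in the abstract, Minhashing with Locality Sensitive Hashing, can be illustrated with a minimal sketch. This is not the authors' implementation: the word-level shingle size `k`, the number of hash functions `num_hashes`, the number of bands `bands`, and all function names are assumptions chosen for illustration, and the abstract excerpt does not state the settings the paper used. The idea is that each note is reduced to a short signature, and only notes whose signatures agree on at least one band are kept as candidate pairs, which avoids comparing every pair of notes.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=5):
    """Return the set of word k-grams in a note (k is an assumed setting)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, shingle):
    """Deterministic 64-bit hash of a shingle, varied by an integer seed."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_hashes=100):
    """One minimum per seeded hash function; similar sets share many minima."""
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(num_hashes)]


def lsh_candidate_pairs(signatures, bands=20):
    """Group notes whose signatures match on any band; return candidate pairs.

    `signatures` maps a note id to its MinHash signature.
    """
    if not signatures:
        return set()
    rows = len(next(iter(signatures.values()))) // bands
    candidates = set()
    for b in range(bands):
        buckets = defaultdict(list)
        for note_id, sig in signatures.items():
            buckets[tuple(sig[b * rows:(b + 1) * rows])].append(note_id)
        for ids in buckets.values():
            candidates.update(combinations(sorted(ids), 2))
    return candidates


def jaccard(a, b):
    """Exact Jaccard similarity, used to verify a candidate pair."""
    return len(a & b) / len(a | b)
```

In use, one would compute `minhash_signature(shingles(text))` for every note, pass the resulting dictionary to `lsh_candidate_pairs`, and then check only the returned pairs with `jaccard` against a chosen similarity threshold. More bands with fewer rows each make the filter more permissive; fewer bands with more rows make it stricter.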