Purpose
This paper aims to manage the word gap in information retrieval (IR), especially for long documents belonging to specific domains. With the continuous growth of the text data that modern IR systems have to manage, solutions are needed to efficiently find the best set of documents for a given request. The words used to describe a query can differ from those used in related documents. Even when their meanings are close, nonoverlapping words are challenging for IR systems. This word gap becomes significant for long documents from specific domains.

Design/methodology/approach
To generate new words for a document, a deep learning (DL) masked language model is used to infer related words. The DL models used are pretrained on massive text data and carry common or domain-specific knowledge to propose a better document representation.

Findings
The authors evaluate the approach on specific IR domains with long documents to show the genericity of the proposed model and achieve encouraging results.

Originality/value
To the best of the authors’ knowledge, this paper introduces an original unsupervised and modular IR system based on recent DL methods.
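
The idea of expanding a document with related words before lexical matching can be illustrated with a minimal sketch. It is not the authors' implementation: the `TOY_RELATED` dictionary and the `propose_related` function are hypothetical stand-ins for a pretrained masked language model, the two documents and the query are made-up examples, and the BM25 variant (with `k1=1.2`, `b=0.75`) is one common formulation chosen here as an assumption.

```python
import math
from collections import Counter

# Hypothetical stand-in for a masked language model: maps a word to
# related words. A real system would query a pretrained model instead.
TOY_RELATED = {
    "physician": ["doctor", "clinician"],
    "medication": ["drug", "medicine"],
}


def propose_related(word):
    """Return candidate expansion words for `word` (toy lookup, not a real MLM)."""
    return TOY_RELATED.get(word, [])


def expand_document(tokens, propose=propose_related):
    """Append proposed words that the document does not already contain."""
    seen = set(tokens)
    expanded = list(tokens)
    for tok in tokens:
        for cand in propose(tok):
            if cand not in seen:
                seen.add(cand)
                expanded.append(cand)
    return expanded


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up example: the query shares no words with either document.
docs = [
    "the physician prescribed a new medication".split(),
    "the court reviewed the contract dispute".split(),
]
query = "doctor drug".split()

before = bm25_scores(query, docs)
after = bm25_scores(query, [expand_document(d) for d in docs])
```

In this toy setting the query matches no document before expansion, while after expansion the first document gains a positive score because the added words overlap with the query. Swapping `propose_related` for a function backed by a domain-specific masked language model would be the natural extension point.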
International Journal of Web Information Systems – Emerald Publishing
Published: Nov 28, 2023
Keywords: Unsupervised document expansion; BERT; Information retrieval; BM25