{"id":553,"date":"2019-11-09T09:03:23","date_gmt":"2019-11-09T09:03:23","guid":{"rendered":"http:\/\/guires.uk\/newsroom\/?p=553"},"modified":"2019-11-18T06:18:10","modified_gmt":"2019-11-18T06:18:10","slug":"knowledge-extraction-from-scientific-research-articles-using-semantic-extraction-explore-a-use-case-and-perspective","status":"publish","type":"post","link":"https:\/\/guires.uk\/newsroom\/use-cases\/knowledge-extraction-from-scientific-research-articles-using-semantic-extraction-explore-a-use-case-and-perspective\/","title":{"rendered":"Knowledge Extraction from Scientific Research Articles using Semantic Extraction: Explore a Use Case and Perspective"},"content":{"rendered":"\n<h3 class=\"h3color\">The Challenge<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Semantic extraction is a natural language processing technique that identifies and extracts entities (for example, people, locations and companies in news articles), <strong>facts<\/strong>, <strong>attributes<\/strong>, <strong>concepts<\/strong> and <strong>events<\/strong> to populate metadata fields. The purpose is to make unstructured enterprise content, such as text documents, emails, images, reports and other business-critical material, available for analysis. Modern scientific and scholarly content services look for extracted information such as keywords, clauses, sentences, scientific claims (the core finding of an article) or whole paragraphs, taken from PDF, Word or HTML files. Several methods have been applied to extract such information, including rule-based linguistic approaches, statistical approaches, machine learning approaches (supervised, unsupervised and semi-supervised) and domain-specific approaches. 
The chosen method should extract as much relevant and meaningful information as possible.<\/p>\n\n\n\n<h3 class=\"h3color\">Opportunity<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Semantic extraction can be used to pull disease codes and names out of recent research papers that have already been retrieved. There are two ways of doing this: rule-based matching and machine learning.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Rule-based matching compares a list of known disease names against the text of each research paper and looks for names that match. Because the rules compare names character by character, a single mismatched character (a spelling variant or a typo, for example) is enough for a disease name to be missed, so this is the weaker option. With machine learning, we instead train a model on text in which disease names have already been labelled as diseases, so that the model learns to recognise them. Once it is trained, we test the model by giving it only our research papers and checking whether it extracts the disease names correctly.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The International Classification of Diseases (ICD) is a medical classification list of codes for diagnoses and procedures. ICD codes have been widely adopted by physicians and other healthcare providers for reimbursement and for the storage and retrieval of diagnostic information. Assigning ICD codes to a patient is time-consuming and error-prone. Clinical coders need to extract key information from Electronic Medical Records (EMRs) and assign the correct codes based on category, anatomic site, laterality and severity, and the volume of information and the complex code hierarchy make this considerably harder. In the first stage, an ICD code is used as input to extract the corresponding diseases. 
For instance, diseases of the respiratory system are coded J00-J99, a range that is divided into the following blocks:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J00-J06\">J00-J06<\/a>&nbsp;&nbsp;Acute upper respiratory infections<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J09-J18\">J09-J18<\/a>&nbsp;&nbsp;Influenza and pneumonia<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J20-J22\">J20-J22<\/a>&nbsp;&nbsp;Other acute lower respiratory infections<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J30-J39\">J30-J39<\/a>&nbsp;&nbsp;Other diseases of the upper respiratory tract<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J40-J47\">J40-J47<\/a>&nbsp;&nbsp;Chronic lower respiratory diseases<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J60-J70\">J60-J70<\/a>&nbsp;&nbsp;Lung diseases due to external agents<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J80-J84\">J80-J84<\/a>&nbsp;&nbsp;Other respiratory diseases principally affecting the interstitium<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J85-J86\">J85-J86<\/a>&nbsp;&nbsp;Suppurative and necrotic conditions of the lower respiratory tract<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J90-J94\">J90-J94<\/a>&nbsp;&nbsp;Other diseases of the pleura<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J95-J95\">J95-J95<\/a>&nbsp;&nbsp;Intraoperative and postprocedural complications and disorders of the respiratory system, not elsewhere classified<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J96-J99\">J96-J99<\/a>&nbsp;&nbsp;Other diseases of the respiratory system<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">For example, code J60 corresponds to coal worker\u2019s
pneumoconiosis. Next, the relevant studies can be coded by study design, then by target population (such as coal miners) and by risk factors. The next step is to search for recent research papers using Python and to build a model. A bidirectional LSTM (BiLSTM) deep learning model was built in the following steps: creating word and tag dictionaries, splitting the data into training and test sets, extracting features, training the BiLSTM model and predicting on the test set.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Research papers were gathered from the PubMed website, which contains more than 30 million citations for biomedical literature from MEDLINE, life science journals and online books. The information retrieved for each paper includes the title, abstract, publication date, author information, copyright, journal keywords, methods and results, DOI, and a link to the paper if it is open access.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Paper Extraction Process<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Code and Output<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We connect to PubMed to retrieve the results of our search, sorted with the most recent papers first. The search output is returned as an XML file.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"560\" height=\"300\" src=\"https:\/\/guires.uk\/newsroom\/wp-content\/uploads\/2019\/11\/sd.jpg\" alt=\"\" class=\"wp-image-554\" srcset=\"https:\/\/guires.uk\/newsroom\/wp-content\/uploads\/2019\/11\/sd.jpg 560w, https:\/\/guires.uk\/newsroom\/wp-content\/uploads\/2019\/11\/sd-300x161.jpg 300w\" sizes=\"auto, (max-width: 560px) 100vw, 560px\" \/><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">From this output, we printed a snippet of the title and abstract of a recent research paper. 
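<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The retrieval step can be sketched with the NCBI E-utilities service, which exposes PubMed search (<code>esearch<\/code>) and record download (<code>efetch<\/code>) over HTTP. This is a minimal sketch under our own assumptions, not the code behind the screenshots. The function names, the <code>retmax<\/code> value and the use of <code>sort=pub_date<\/code> to order results by publication date are illustrative choices. The sketch builds the two request URLs and uses only the Python standard library to pull the PMID, title and abstract of each record out of the returned XML.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>
```python
# Minimal sketch. Assumptions: the function names, retmax=20 and sort=pub_date
# are illustrative choices, not the original project code.
# Builds NCBI E-utilities URLs for a PubMed search and parses the XML replies.
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/'


def esearch_url(term, retmax=20):
    # esearch returns the PubMed IDs (PMIDs) that match the query;
    # sort=pub_date asks for the most recently published records first.
    params = {'db': 'pubmed', 'term': term, 'retmax': retmax, 'sort': 'pub_date'}
    return EUTILS + 'esearch.fcgi?' + urlencode(params)


def efetch_url(pmids):
    # efetch returns the full PubMed records for a list of PMIDs as XML.
    params = {'db': 'pubmed', 'id': ','.join(pmids), 'retmode': 'xml'}
    return EUTILS + 'efetch.fcgi?' + urlencode(params)


def fetch(url):
    # Performs the HTTP request and returns raw bytes, leaving ElementTree
    # to read the XML encoding declaration. NCBI asks clients without an
    # API key to send no more than three requests per second.
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def parse_pmids(esearch_xml):
    # In the esearch reply, each matching PMID sits in an IdList/Id element.
    root = ET.fromstring(esearch_xml)
    return [el.text for el in root.findall('IdList/Id')]


def element_text(el):
    # Titles and abstracts may contain inline markup such as italics,
    # so join all nested text instead of reading el.text alone.
    return ''.join(el.itertext()) if el is not None else ''


def parse_articles(efetch_xml):
    # Each PubmedArticle keeps its PMID under MedlineCitation and its
    # title and abstract under MedlineCitation/Article.
    root = ET.fromstring(efetch_xml)
    records = []
    for art in root.iter('PubmedArticle'):
        citation = art.find('MedlineCitation')
        if citation is None:
            continue
        # Structured abstracts are split across several AbstractText elements.
        parts = citation.findall('Article/Abstract/AbstractText')
        records.append({
            'pmid': citation.findtext('PMID', default=''),
            'title': element_text(citation.find('Article/ArticleTitle')),
            'abstract': ' '.join(element_text(p) for p in parts),
        })
    return records
```
<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">For example, <code>parse_pmids(fetch(esearch_url('coal workers pneumoconiosis')))<\/code> would return the IDs of the most recent matching records. Passing those IDs to <code>efetch_url<\/code>, <code>fetch<\/code> and <code>parse_articles<\/code> would then give title and abstract snippets like the output described above. Keeping URL building and XML parsing separate from the network call makes both easy to test offline.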
<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"999\" height=\"480\" src=\"https:\/\/guires.uk\/newsroom\/wp-content\/uploads\/2019\/11\/gs.jpg\" alt=\"\" class=\"wp-image-555\" srcset=\"https:\/\/guires.uk\/newsroom\/wp-content\/uploads\/2019\/11\/gs.jpg 999w, https:\/\/guires.uk\/newsroom\/wp-content\/uploads\/2019\/11\/gs-300x144.jpg 300w, https:\/\/guires.uk\/newsroom\/wp-content\/uploads\/2019\/11\/gs-768x369.jpg 768w\" sizes=\"auto, (max-width: 999px) 100vw, 999px\" \/><\/figure><\/div>\n\n\n\n<h3 class=\"h3color\">Why Guires<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The mission of Guires Data Analytics is to democratize AI for the healthcare industry, with a focus on modern scientific and scholarly content services such as Chemical Abstracts Service for chemistry-related articles, Web of Knowledge, CiteSeer.IST and DBLP. Our team of data science experts uses the power of AI to solve business and social challenges. We specialize in the automatic extraction of semantic information from a wide range of digital resources, using techniques such as metadata extraction, document summarization and keyword extraction. To extract one or more claims from a scientific article, we apply text mining methods such as text clustering, association rule extraction, the k-means algorithm, information visualization and word clouds, as well as machine learning approaches such as least-squares support vector machines.<\/p>\n\n\n\n<h3 class=\"h3color\">Guires offers innovative solutions:<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>The Guires text mining and machine learning approach helps you build and deploy text mining to improve business processes.<\/li><li>Guires deploys semantic data modeling as a layer in your knowledge-centric architecture by virtually integrating your enterprise data.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Get semantic extraction working for you. 
Contact a Guires expert.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Challenge Semantic extraction is a natural language processing technique that identifies and extracts entities (for example, people, locations and companies in news articles), facts, attributes, concepts and events to populate metadata fields. The purpose is to make unstructured enterprise content, such as text documents, emails, images, reports and other business-critical material, available for analysis. Modern scientific and scholarly content services [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":557,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[],"class_list":["post-553","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-use-cases"],"_links":{"self":[{"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/posts\/553","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/comments?post=553"}],"version-history":[{"count":2,"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/posts\/553\/revisions"}],"predecessor-version":[{"id":892,"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/posts\/553\/revisions\/892"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/media\/557"}],"wp:attachment":[{"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/media?parent=553"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/categories?post=553"}
,{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/tags?post=553"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}