{"id":589,"date":"2019-11-09T09:40:54","date_gmt":"2019-11-09T09:40:54","guid":{"rendered":"http:\/\/guires.uk\/newsroom\/?p=589"},"modified":"2019-11-11T04:50:12","modified_gmt":"2019-11-11T04:50:12","slug":"knowledge-extraction-from-scientific-research-articles-using-semantic-extraction-explore-a-use-case-and-perspective-2","status":"publish","type":"post","link":"https:\/\/guires.uk\/newsroom\/use-cases\/knowledge-extraction-from-scientific-research-articles-using-semantic-extraction-explore-a-use-case-and-perspective-2\/","title":{"rendered":"Knowledge Extraction from Scientific Research Articles using Semantic Extraction: Explore a Use Case and Perspective"},"content":{"rendered":"\n<h3 class=\"h3color\">The Challenges<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Semantic extraction is a natural language processing technique that identifies and extracts entities (for example, people, locations, and companies in news articles), <strong>facts<\/strong>, <strong>attributes<\/strong>, <strong>concepts<\/strong>, and <strong>events<\/strong> to populate metadata fields. Its purpose is to enable the analysis of unstructured enterprise content, such as text documents, emails, images, reports, and other business-critical content. Modern scientific and scholarly content services look for extracted information such as keywords, clauses, sentences, scientific claims (the core findings of an article), or paragraphs drawn from PDF, Word, or HTML files. Several methods have been applied to extract such information: rule-based linguistic approaches, statistical approaches, machine learning approaches (supervised, unsupervised, and semi-supervised), and domain-specific approaches. 
The chosen method must be able to extract a large amount of relevant, meaningful information.<\/p>\n\n\n\n<h3 class=\"h3color\">Opportunity<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Here, semantic extraction is used to pull disease codes and names from recent research papers that have already been retrieved. There are two ways of doing this: rule-based matching and machine learning.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Rule-based matching compares candidate disease names against the text of the research paper and picks out the ones that match. A name is missed if even a single character differs, so this is not the better option for our solution. With machine learning, we train a model on data in which disease names are already labelled as diseases. After training, we test the model by giving it only a research paper and checking whether it extracts the disease names correctly.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The International Classification of Diseases (ICD) is a medical classification list of codes for diagnoses and procedures. ICD codes have been widely adopted by physicians and other health care providers for reimbursement, storage, and retrieval of diagnostic information. The process of assigning ICD codes to a patient visit is time-consuming and error-prone. Clinical coders need to extract key information from Electronic Medical Records (EMRs) and assign correct codes based on category, anatomic site, laterality, and severity. The amount of information and the complex hierarchy greatly increase the difficulty. In the first stage, an ICD code is supplied as input and the corresponding diseases are extracted. 
For instance, diseases of the respiratory system are coded J00-J99:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J00-J06\">J00-J06<\/a>&nbsp;&nbsp;Acute upper respiratory infections<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J09-J18\">J09-J18<\/a>&nbsp;&nbsp;Influenza and pneumonia<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J20-J22\">J20-J22<\/a>&nbsp;&nbsp;Other acute lower respiratory infections<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J30-J39\">J30-J39<\/a>&nbsp;&nbsp;Other diseases of the upper respiratory tract<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J40-J47\">J40-J47<\/a>&nbsp;&nbsp;Chronic lower respiratory diseases<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J60-J70\">J60-J70<\/a>&nbsp;&nbsp;Lung diseases due to external agents<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J80-J84\">J80-J84<\/a>&nbsp;&nbsp;Other respiratory diseases principally affecting the interstitium<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J85-J86\">J85-J86<\/a>&nbsp;&nbsp;Suppurative and necrotic conditions of the lower respiratory tract<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J90-J94\">J90-J94<\/a>&nbsp;&nbsp;Other diseases of the pleura<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J95-J95\">J95-J95<\/a>&nbsp;&nbsp;Intraoperative and postprocedural complications and disorders of the respiratory system, not elsewhere classified<\/li><li><a href=\"https:\/\/www.icd10data.com\/ICD10CM\/Codes\/J00-J99\/J96-J99\">J96-J99<\/a>&nbsp;&nbsp;Other diseases of the respiratory system<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">For example, the code J60 corresponds to coal worker\u2019s 
pneumoconiosis. Subsequently, we can code papers by study design, then by target population (such as coal miners), and then by risk factors. The next step is to search for recent research papers with Python and build a model. A deep learning model based on a bidirectional LSTM was built in the following steps: creating the word and tag dictionaries, splitting the data into training and test sets, extracting features, training the bidirectional LSTM model, and predicting on the test set.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Research papers are scraped from the PubMed website, which contains more than 30 million citations for biomedical literature from MEDLINE, life science journals, and online books. For each paper, the following fields are collected: title, abstract, date of journal publication, author information, copyright, keywords from the journal, methods and results, DOI, and a link to the paper if it is open to all users.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Paper Extraction Process<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Code and Output:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We need to connect to the PubMed site to retrieve the results of our search, which we sort by most recent. The output of this search is an XML file.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"531\" height=\"284\" src=\"https:\/\/guires.uk\/newsroom\/wp-content\/uploads\/2019\/11\/image-25.png\" alt=\"\" class=\"wp-image-590\" srcset=\"https:\/\/guires.uk\/newsroom\/wp-content\/uploads\/2019\/11\/image-25.png 531w, https:\/\/guires.uk\/newsroom\/wp-content\/uploads\/2019\/11\/image-25-300x160.png 300w\" sizes=\"auto, (max-width: 531px) 100vw, 531px\" \/><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">From this output, we print a snippet of the title and abstract of a recent research paper. 
<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"567\" height=\"273\" src=\"https:\/\/guires.uk\/newsroom\/wp-content\/uploads\/2019\/11\/image-26.png\" alt=\"\" class=\"wp-image-591\" srcset=\"https:\/\/guires.uk\/newsroom\/wp-content\/uploads\/2019\/11\/image-26.png 567w, https:\/\/guires.uk\/newsroom\/wp-content\/uploads\/2019\/11\/image-26-300x144.png 300w\" sizes=\"auto, (max-width: 567px) 100vw, 567px\" \/><\/figure><\/div>\n\n\n\n<h3 class=\"h3color\">Why Guires<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The mission of Guires Data Analytics is to democratize AI for healthcare industries, particularly those that rely on modern scientific and scholarly content services such as Chemical Abstracts Service for chemistry-related articles, Web of Knowledge, CiteSeer.IST, and DBLP. Our team of data science experts uses the power of AI to solve business and social challenges. We automatically extract semantic information from a wide range of digital resources using techniques such as metadata extraction, document summarization, and keyword extraction. We apply a wide range of techniques and approaches to extract one or more claims from a scientific article, using text mining models such as text clustering, association rule extraction, the K-means algorithm, information visualization, and word clouds, followed by ML approaches such as least-squares support vector machines.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">How can you make the most of semantic extraction? Let us help you get started.<\/p>\n\n\n\n<h3 class=\"h3color\">Guires offers innovative solutions:<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>The Guires text mining and machine learning approach helps you build and deploy text mining to improve business processes.<\/li><li>Guires deploys semantic data modeling as a layer in your knowledge-centric architecture by integrating your enterprise data virtually. 
<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Get semantic extraction working for you. Contact a Guires expert.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Challenges Semantic extraction is a natural language processing technique that identifies and extracts entities (for example, people, locations, and companies in news articles), facts, attributes, concepts, and events to populate metadata fields. Its purpose is to enable the analysis of unstructured enterprise content, such as text documents, emails, images, reports, and other business-critical content. Modern scientific and scholarly content services [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":715,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[],"class_list":["post-589","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-use-cases"],"_links":{"self":[{"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/posts\/589","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/comments?post=589"}],"version-history":[{"count":2,"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/posts\/589\/revisions"}],"predecessor-version":[{"id":716,"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/posts\/589\/revisions\/716"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/media\/715"}],"wp:attachment":[{"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/media?parent=589"}],"wp:term":[{"taxonomy":"category",
"embeddable":true,"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/categories?post=589"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/guires.uk\/newsroom\/wp-json\/wp\/v2\/tags?post=589"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}