Patient-Specific Mortality Prediction in the ICU by an RF Classifier Based on FR: Essay Fountain


Mortality prediction for intensive care unit (ICU) patients facilitates hospital benchmarking and has the potential to provide caregivers with useful summaries of patient health at the bedside. The development of novel models for mortality prediction is a popular task in machine learning, with researchers typically seeking to maximize performance measures. We present modified binary classification methods designed to address the class imbalance that is common in clinical datasets. Our methods exploit the class imbalance to achieve a unique transformation of the features such that the transformed features are well separated. We derive new combinations that further improve the classification accuracy of our methods.

We demonstrate the efficacy of our methods on the MIMIC dataset used in the PhysioNet/Computing in Cardiology Challenge 2012. An advantage of our methods is that they are based on semi- or fully optimized versions of traditional learning algorithms, which still perform better than advanced nonlinear learning algorithms such as multi-kernel SVMs and deep neural networks.

Introduction

Cardiovascular diseases (CVDs) are the leading cause of death worldwide. In low-income countries, people face non-existent or poor care, which is a major reason for deaths caused by cardiovascular diseases in these countries. The electrocardiogram (ECG) is used to evaluate the state of a patient's heart, and the quality of the measurement is a fundamental requirement for the record to be usable. Timely diagnosis of heart disease increases the chances of recovery. The lack of specialists in many countries increases the need for an easy-to-use, efficient measuring device that can send measured data to a specialist. We have developed a scoring system to inform the user about the quality of the measured ECG. This method can be used to quickly detect useless records and decrease the number of poor-quality records sent to the specialist. Our approach reduces the user experience required to assess an electrocardiogram. We have focused on inapplicable signals because we suppose that re-measuring costs less than sending a useless record to a specialist. The scoring-system algorithm is divided into three steps:

  1. Separate signal into bins.
  2. Application of four rules to bins.
  3. Computation of mortality score.
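The three steps above can be sketched in Python as follows. This is a minimal illustration under loud assumptions: the source does not state its four rules, so the rules below (`not_flat`, `not_saturated`, `no_abrupt_jumps`, `plausible_amplitude`), their thresholds in millivolts, and the choice to score a record as the fraction of bins that pass all rules are hypothetical.

```python
from typing import Callable, List, Sequence

# A rule takes one bin of samples and returns True if the bin looks usable.
Rule = Callable[[Sequence[float]], bool]


def split_into_bins(signal: Sequence[float], bin_size: int) -> List[Sequence[float]]:
    """Step 1: split the signal into consecutive, non-overlapping bins (the last may be shorter)."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return [signal[i:i + bin_size] for i in range(0, len(signal), bin_size)]


# Step 2: four HYPOTHETICAL rules with made-up thresholds (amplitudes in mV).
def not_flat(b: Sequence[float]) -> bool:
    return max(b) - min(b) > 0.05          # a flat line suggests a detached electrode


def not_saturated(b: Sequence[float]) -> bool:
    return all(abs(x) < 5.0 for x in b)    # assumed amplifier range of +/-5 mV


def no_abrupt_jumps(b: Sequence[float]) -> bool:
    return all(abs(b[i + 1] - b[i]) < 2.0 for i in range(len(b) - 1))


def plausible_amplitude(b: Sequence[float]) -> bool:
    return max(b) - min(b) < 8.0           # peak-to-peak range too large to be an ECG


RULES: List[Rule] = [not_flat, not_saturated, no_abrupt_jumps, plausible_amplitude]


def record_score(signal: Sequence[float], bin_size: int, rules: List[Rule] = RULES) -> float:
    """Step 3: score = fraction of bins that pass every rule (assumed aggregation)."""
    bins = split_into_bins(signal, bin_size)
    if not bins:
        return 0.0
    passed = sum(all(rule(b) for rule in rules) for b in bins)
    return passed / len(bins)


# Usage: one flat bin followed by one bin of alternating 0/1 mV samples.
signal = [0.0] * 10 + [0.0, 1.0] * 5
print(record_score(signal, bin_size=10))  # prints 0.5
```

A record whose score falls below some cut-off would then be flagged for re-measurement; that cut-off is also not given in the source.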

Background

Intensive care units (ICUs) provide support to the most severely ill patients in a hospital, offering radical lifesaving treatments. Patients are monitored closely within the ICU to assist in the early detection and correction of deterioration before it becomes fatal, an approach that has been demonstrated to improve outcomes. Quantifying patient health and predicting future outcomes is an important area of critical care research. One of the most immediately relevant outcomes in the ICU is patient mortality, which has led many studies toward the development of mortality prediction models. Typically, researchers seek to improve on previously published measures of performance such as sensitivity and specificity, but other goals may include improved model interpretability and novel feature extraction. Recent advances in both machine learning and hospital networking have facilitated better prediction models that use more granular data. Interpreting studies that report advances in mortality prediction performance, however, is often a challenge, because the high degree of heterogeneity among studies prevents like-for-like comparison. For example, approaches may differ in exclusion criteria, data cleaning, creation of training and test sets, and so on, making it unclear where performance improvements have been gained. In many areas of machine learning, datasets such as ImageNet have facilitated benchmarking and comparison between studies. Key to these datasets is that they are publicly available to researchers, allowing code and data to be shared together to create reproducible studies. Barriers to data sharing in healthcare have limited the accessibility of highly granular clinical data and largely prevented publication of reproducible studies, but with freely available datasets such as the Medical Information Mart for Intensive Care (MIMIC-III), end-to-end reproducible studies are attainable.
The use of mortality prediction models to evaluate ICUs as a whole has found great success, both for identifying useful policies and for comparing patient populations. In order to focus contributions to the state of the art in mortality prediction, however, it should be clear where performance is being gained and where further gains might be achieved.

In this study, we review publications that have reported performance of mortality prediction models based on the Medical Information Mart for Intensive Care (MIMIC) database and attempt to reproduce their studies. We then compare the performance reported in the studies against gradient boosting and logistic regression models using features extracted from MIMIC. The goal of this exercise is twofold: the primary hypothesis is that textual description of patient selection criteria is insufficient to reproduce studies; the secondary hypothesis is that data extraction using domain knowledge remains an often overlooked but useful tool to improve model performance.

Data Mining Methods

Association Rule: Association rule (AR) mining is another important branch of data mining (DM) techniques. Instead of seeking a satisfactory classification result directly, it aims to discover relationships between different attributes. If ICU treatment is regarded as a manufacturing process, the outcome becomes more predictable when each step of the process is carefully characterized.
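To make the support and confidence of an association rule concrete, here is a small brute-force sketch that searches for rules of the form {antecedent} -> {target}. The discretized items (`HR_high`, `Lactate_high`, `died`, `survived`) and the thresholds are hypothetical; a practical miner such as Apriori would prune candidate itemsets instead of enumerating them all.

```python
from itertools import combinations
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions (patients) that contain every item in itemset."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions: List[Set[str]], antecedent: Iterable[str], consequent: Iterable[str]) -> float:
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    s_a = support(transactions, antecedent)
    if s_a == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / s_a


def rules_for_target(transactions, target, min_support, min_confidence, max_len=2):
    """Enumerate antecedents of up to max_len items and keep rules antecedent -> target."""
    items = sorted(set().union(*transactions) - {target})
    found = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            sup = support(transactions, antecedent + (target,))
            conf = confidence(transactions, antecedent, (target,))
            if sup >= min_support and conf >= min_confidence:
                found.append((antecedent, sup, conf))
    return found


# Hypothetical discretized patient records.
patients = [
    {"HR_high", "Lactate_high", "died"},
    {"HR_high", "Lactate_high", "died"},
    {"HR_high", "survived"},
    {"Lactate_high", "survived"},
]
print(rules_for_target(patients, "died", min_support=0.5, min_confidence=0.9))
# prints [(('HR_high', 'Lactate_high'), 0.5, 1.0)]
```

Here "high heart rate" alone predicts death with confidence 0.5 / 0.75, below the threshold, while the pair of attributes reaches confidence 1.0, which is the kind of attribute relationship AR mining is meant to surface.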

Decision Tree: A decision tree (DT) is a typical supervised learning approach in which decisions are made in multiple stages. The tree starts from the condition or pattern that is usually the most informative and, following the branches selected by each condition, constructs subtrees iteratively until the class of an object is determined at a leaf node. Each split of a tree node increases the probability of a particular class.
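The choice of the "most informative" condition at a node can be illustrated with information gain (entropy reduction), one common split criterion. The source does not name the criterion it used, and the lactate values below are made up.

```python
import math
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values: Sequence[float], labels: Sequence[Hashable]) -> Tuple[Optional[float], float]:
    """Return (t, gain) for the test `value <= t` with the highest information gain."""
    n = len(labels)
    base = entropy(labels)
    best: Tuple[Optional[float], float] = (None, 0.0)
    for t in sorted(set(values))[:-1]:   # the largest value would leave the right side empty
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best


# Usage: hypothetical lactate values (mmol/L) and outcomes (1 = died).
lactate = [1.1, 1.8, 2.0, 4.5, 5.2, 6.0]
died = [0, 0, 0, 1, 1, 1]
print(best_threshold(lactate, died))  # prints (2.0, 1.0)
```

A full tree would apply `best_threshold` to every feature at a node, keep the best split, and recurse on the two subsets until the leaves are pure or a depth limit is reached.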

Fuzzy Rule: A fuzzy rule (FR) is a conditional statement of the form IF x is A THEN y is B, where x and y are linguistic variables, and A and B are linguistic values determined by fuzzy sets on the universes of discourse X and Y, respectively. Fuzzy logic is used in clinical decision support systems because it is a powerful approach to approximate reasoning.
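A small fuzzy rule base can be sketched as follows. The heart-rate fuzzy sets and their breakpoints are hypothetical, and each consequent B is simplified to a crisp value, with rule outputs combined by a weighted average (zero-order Takagi-Sugeno inference). A Mamdani system would instead keep B as a fuzzy set and defuzzify the aggregated output.

```python
from typing import Callable, List, Optional, Tuple

Membership = Callable[[float], float]


def triangular(a: float, b: float, c: float) -> Membership:
    """Triangular fuzzy set with feet at a and c and full membership at the peak b."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


# Hypothetical fuzzy sets on the universe of discourse "heart rate" (beats/min).
HR_NORMAL = triangular(40.0, 75.0, 110.0)
HR_HIGH = triangular(90.0, 150.0, 210.0)

# Rules "IF HR is A THEN risk is B", with B simplified to a crisp risk value.
RULES: List[Tuple[Membership, float]] = [
    (HR_NORMAL, 0.1),  # IF HR is NORMAL THEN risk is LOW
    (HR_HIGH, 0.9),    # IF HR is HIGH THEN risk is HIGH
]


def infer_risk(hr: float) -> Optional[float]:
    """Weighted average of rule outputs, weighted by each rule's firing strength."""
    strengths = [(mu(hr), out) for mu, out in RULES]
    total = sum(s for s, _ in strengths)
    if total == 0.0:
        return None  # no rule fires: the input lies outside every fuzzy set
    return sum(s * out for s, out in strengths) / total


for hr in (75.0, 100.0, 150.0):
    print(hr, infer_risk(hr))
```

At 100 beats/min both rules fire partially, so the inferred risk falls between the two consequents instead of jumping at a hard threshold, which is the approximate-reasoning property mentioned above.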

AdaBoost: AdaBoost, short for adaptive boosting, performs classification by generating a group of weak classifiers and determining the result through weighted voting. During the construction of each weak classifier, sample weights are adjusted after every iteration, and the increased weights of misclassified samples focus further learning on those samples.
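The weight-update loop described above can be sketched with decision stumps as weak classifiers. This is textbook discrete AdaBoost under assumed conventions (labels in {-1, +1}, single-feature threshold stumps), not necessarily the exact variant used in this work.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[float, int, float, int]  # (alpha, feature index, threshold, polarity)


def stump_predict(row: Sequence[float], j: int, t: float, pol: int) -> int:
    """Weak classifier: predict `pol` if feature j exceeds t, otherwise -pol."""
    return pol if row[j] > t else -pol


def fit_stump(X, y, w):
    """Return (weighted error, j, t, pol) of the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w) if stump_predict(row, j, t, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best


def adaboost(X: List[List[float]], y: List[int], n_rounds: int = 10) -> List[Stump]:
    """Discrete AdaBoost; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n                              # start with uniform sample weights
    ensemble: List[Stump] = []
    for _ in range(n_rounds):
        err, j, t, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)     # keep log() finite for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)    # vote weight of this weak classifier
        ensemble.append((alpha, j, t, pol))
        # Misclassified samples (y * h = -1) gain weight; correct ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, t, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Stump], row: Sequence[float]) -> int:
    """Weighted vote of all weak classifiers."""
    score = sum(alpha * stump_predict(row, j, t, pol) for alpha, j, t, pol in ensemble)
    return 1 if score >= 0 else -1


X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]  # one hypothetical feature
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y, n_rounds=5)
print([predict(model, row) for row in X])         # prints [-1, -1, -1, 1, 1, 1]
```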

Random Forest: RF is similar to AdaBoost except for two differences. First, it is an ensemble of DTs; second, each bootstrap sample has the same size as the original set of samples. However, the gain in accuracy comes with some loss of model interpretability.
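For comparison, here is a compact random forest sketch: each tree is grown on a bootstrap sample with as many draws as there are samples, each node considers a random subset of features, and the trees vote by majority. The Gini split criterion, the depth limit, and the default of square root of the number of features for `n_feats` are common choices assumed here, not taken from the source.

```python
import random
from collections import Counter
from typing import Any, Optional, Sequence


def gini(labels: Sequence[int]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build_tree(X, y, max_depth: int, n_feats: int, rng: random.Random) -> Any:
    """Leaves are class labels; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    best = None
    for j in rng.sample(range(len(X[0])), n_feats):   # random feature subset at this node
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            impurity = (len(left) * gini([y[i] for i in left])
                        + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or impurity < best[0]:
                best = (impurity, j, t, left, right)
    if best is None:                                   # no valid split: make a leaf
        return majority
    _, j, t, left, right = best

    def subtree(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth - 1, n_feats, rng)

    return (j, t, subtree(left), subtree(right))


def predict_tree(node: Any, row: Sequence[float]) -> int:
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node


def random_forest(X, y, n_trees=25, max_depth=3, n_feats: Optional[int] = None, seed=0):
    rng = random.Random(seed)
    n = len(X)
    k = n_feats or max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # bootstrap: n draws with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth, k, rng))
    return forest


def predict_forest(forest, row: Sequence[float]) -> int:
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


X = [[float(i), float((i * 7) % 5)] for i in range(20)]  # feature 0 informative, feature 1 noise
y = [0 if i < 10 else 1 for i in range(20)]
forest = random_forest(X, y, n_trees=25, n_feats=2, seed=42)
print(predict_forest(forest, [0.0, 0.0]), predict_forest(forest, [19.0, 3.0]))
```

In the usage example `n_feats=2` disables feature subsampling on this two-feature toy data, so the result does not depend on which feature a node happens to draw; with many features the default subset size is what decorrelates the trees.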

Methods

ICU Database

The data consist of records from 12,000 ICU patients who stayed in the ICU for at least 48 hours. The records were divided into three sets, A, B, and C, each consisting of 4,000 records. Set A was used to develop the predictor, whereas sets B and C were used for validation. Up to 41 variables were recorded once, more than once, or not at all during the first 48 hours after admission to the ICU. These variables were divided into three groups: general descriptors, outcome-related descriptors, and time series. The general descriptors were mainly age (AGE), gender (GEN), height (HEI), ICU type (ICU), and weight (WEI). These descriptors were collected when the patient was admitted to the ICU, and they appear at the beginning of each record. The outcome-related descriptors were the SAPS score, the SOFA score, length of stay in hospital (LOS), number of days between admission and death (SUR), and in-hospital death. The average age, uncorrected height, and uncorrected initial weight were 64.5 years, 169.5 cm, and 81.2 kg, respectively; 43% of patients were female and 56.1% male. The largest number of patients was admitted to the medical ICU (35.8%), followed by the surgical (28.4%), cardiac surgery recovery (21.1%), and coronary (21.1%) ICUs. These descriptors were available only for training set A.

Scoring Criteria

Due to its unambiguous definition and use in previous similar studies, we used in-hospital death as the outcome variable to be predicted in the challenge. We defined the scoring criteria as follows. Algorithms were required to classify each case as a survivor (at least until discharge from the hospital) or as a non-survivor. The final event score earned by each algorithm depended on the counts of true positives (TP), false negatives (FN), and false positives (FP) when tested on set C. We defined sensitivity and positive predictivity as usual:

$$\mathrm{Se} = \frac{TP}{TP + FN}, \qquad +P = \frac{TP}{TP + FP}$$

The score was defined as the smaller of these two measures:

$$\text{Score} = \min(\mathrm{Se}, +P)$$

This criterion was chosen as a reasonable tradeoff between accuracy of discrimination and prognostic value.
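The scoring rule can be checked on a toy example. Encoding 1 = non-survivor (in-hospital death) as the positive class is an assumption consistent with the definitions above; the labels are made up.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int]:
    """Count TP, FN, FP with 1 = non-survivor as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, fp


def challenge_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Score = min(Se, +P) with Se = TP/(TP+FN) and +P = TP/(TP+FP); 0 when undefined."""
    tp, fn, fp = confusion_counts(y_true, y_pred)
    se = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return min(se, ppv)


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
print(challenge_score(y_true, y_pred))  # Se = 3/4, +P = 3/5, prints 0.6
```

Because the score takes the minimum, a classifier cannot raise it by predicting "non-survivor" for everyone: that maximizes Se but drives +P down to the death rate.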

The data were first converted from time-stamped measurements into features usable in a supervised classification setting. The overall development process involved preprocessing, classification, feature extraction and selection, training, and validation.
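A minimal sketch of turning time-stamped measurements into per-variable summary features follows. It assumes each record is a CSV text with a `Time,Parameter,Value` header and times written as `HH:MM` since admission, assumed here to match the 2012 challenge files; the particular summaries (min, max, mean, last, count) are illustrative choices, not the feature set used in this work.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def parse_record(text: str) -> Dict[str, List[Tuple[int, float]]]:
    """Group 'Time,Parameter,Value' rows into {parameter: [(minutes since admission, value), ...]}."""
    series: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        hours, minutes = row["Time"].split(":")
        series[row["Parameter"]].append((int(hours) * 60 + int(minutes), float(row["Value"])))
    return series


def extract_features(series, window_minutes: int = 48 * 60) -> Dict[str, float]:
    """Summarize each variable inside the window; unobserved variables get no features."""
    feats: Dict[str, float] = {}
    for name, obs in series.items():
        values = [v for t, v in sorted(obs) if t <= window_minutes]
        if not values:
            continue
        feats[f"{name}_min"] = min(values)
        feats[f"{name}_max"] = max(values)
        feats[f"{name}_mean"] = mean(values)
        feats[f"{name}_last"] = values[-1]   # most recent value in the window
        feats[f"{name}_count"] = len(values)
    return feats


record = """Time,Parameter,Value
00:00,Age,54
00:07,HR,73
01:07,HR,77
02:07,HR,69
"""
print(extract_features(parse_record(record)))
```

Variables that were never measured simply produce no keys here, so a later step still has to decide how to fill them, which is where the preprocessing below comes in.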

Preprocessing

This preprocessing method focused on removing outliers by using thresholds and domain knowledge. When the method detected an outlier, its value was set to missing and later replaced by the mean of the observed values. Domain-knowledge preprocessing involved first correcting human transcription errors (such as recording temperature in degrees Fahrenheit rather than Celsius) and then removing physiologically implausible values by applying upper and lower bounds. For features where limits were not obvious (e.g., those with heavy-tailed distributions, such as urine output), no thresholding was applied.
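The two preprocessing stages can be sketched as below. The bounds in `BOUNDS` and the rule "a temperature above 50 is in Fahrenheit" are hypothetical placeholders, not the thresholds used in this work. Variables without bounds, such as urine output, pass through unthresholded, and each list is assumed to hold one variable's values across patients before mean imputation.

```python
from statistics import mean
from typing import List, Optional

# HYPOTHETICAL plausibility bounds (lower, upper); not the thresholds used in this work.
BOUNDS = {"Temp": (25.0, 45.0), "HR": (1.0, 300.0)}


def correct_transcription(name: str, value: float) -> float:
    """Assumed rule: a 'Temp' above 50 was recorded in Fahrenheit, so convert it to Celsius."""
    if name == "Temp" and value > 50.0:
        return (value - 32.0) * 5.0 / 9.0
    return value


def remove_outliers(name: str, values: List[Optional[float]]) -> List[Optional[float]]:
    """Correct transcription errors first, then set out-of-bounds values to missing (None)."""
    lo, hi = BOUNDS.get(name, (float("-inf"), float("inf")))  # no bounds: no thresholding
    cleaned: List[Optional[float]] = []
    for v in values:
        if v is None:
            cleaned.append(None)
            continue
        v = correct_transcription(name, v)
        cleaned.append(v if lo <= v <= hi else None)
    return cleaned


def mean_impute(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace missing entries with the mean of the observed entries (None if nothing observed)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if observed else None
    return [fill if v is None else v for v in values]


temps = mean_impute(remove_outliers("Temp", [98.6, 37.0, 10.0, None]))
print(temps)  # 98.6 F becomes about 37.0 C; the outlier 10.0 and the gap are imputed
```

Correcting units before applying the bounds matters: in the other order, the Fahrenheit reading would be discarded as an outlier instead of being recovered.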

Classification

Method-1: Based on the limitations of existing studies, we observed that data properties such as uncertain sampling and high dimensionality can play an important role in prediction if they are further exploited. Therefore, we customized the classification part of our method according to the following requirements:

  • Preserve the inherent knowledge in the data while preprocessing, with minimal subjective intervention.
  • Produce multiple views of the patient prediction independently and design ensemble strategies carefully.
  • Promote as much interpretability of the prediction results as possible while maintaining accuracy for both classes.

Method-2: We modified our ASEL-1 algorithm by replacing its manual, inherent-selection-based classification part with a partial AdaBoost and random forest classifier. ASEL-2 achieves a classification accuracy of 90%. When the remaining 10 continuous-valued features are added incrementally, changing their ranks as the classifier is retrained, their mean values do not improve on the accuracy obtained with the first 10 chosen features. Moreover, adding any of these features individually to the chosen 10 features does not improve the accuracy either. We do not observe any increase in accuracy over that achieved by our three chosen 10-feature sets.

These are the 10-feature sets with the highest-magnitude coefficients for models using length-1, length-2, and length-3 patterns, respectively. Variables are listed in decreasing order of magnitude. Their coefficient magnitudes are dynamic values.
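The incremental procedure described above resembles greedy forward selection, sketched below. `evaluate` is a hypothetical callback that would return, for example, the validation accuracy of a classifier trained on the given features. The toy scorer in the usage example is made up and only mimics the situation in which extra features stop improving accuracy.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(candidates: Sequence[str],
                   evaluate: Callable[[List[str]], float],
                   min_gain: float = 1e-9) -> Tuple[List[str], float]:
    """Greedily add the feature that most improves `evaluate`; stop when no addition helps."""
    selected: List[str] = []
    best_score = evaluate(selected)
    remaining = list(candidates)
    while remaining:
        score, feature = max((evaluate(selected + [f]), f) for f in remaining)
        if score - best_score < min_gain:  # no remaining candidate improves the score enough
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy scorer: a baseline accuracy of 0.70 plus made-up gains; HR and Temp add nothing.
GAINS = {"GCS": 0.10, "Lactate": 0.05, "Age": 0.03}


def toy_accuracy(features: List[str]) -> float:
    return 0.70 + sum(GAINS.get(f, 0.0) for f in features)


print(forward_select(["Age", "GCS", "HR", "Lactate", "Temp"], toy_accuracy))
```

Each round retrains (here: re-scores) with every remaining candidate, so the ranks of the unchosen features can change from round to round, as in the procedure described above.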

Results

The ability of the proposed methods (ASEL-1, ASEL-2) to predict the mortality of ICU patients from information collected in the first 48 hours was evaluated by comparing their results with those of other researchers' methods on the same dataset.

Discussion and future work

The data sets were created from a diverse population with a wide variety of life-threatening conditions, frequent missing and occasionally incorrectly recorded observations, idiosyncrasies of care administration, and highly unbalanced class sizes, all of which make mortality prediction difficult. Moreover, certain physiological measurements, such as systolic blood pressure, can be more reflective of medical interventions than of the patient's genuine state.

We took a simple learning approach that combines AR with FR and achieved reasonable performance on set A; we then modified it by adding a partial AdaBoost and random forest classifier, which improved on the performance of our previous method. Further work is needed, however, to achieve a useful patient-specific predictive model. Currently, our algorithms give better results than the challenge winner and some other mortality prediction algorithms used in hospitals (without optimization).

Assessing patient stability through the fuzzy rules routine would benefit from further exploration. Work is needed to identify whether this approach has any value for predicting patient outcomes. In its current state, the fuzzy rules routine is of limited benefit and is likely to mischaracterize stability in records with limited time resolution, where there are too few data points to identify maxima and minima. Given that our collaboration is new, the progress made here is a good starting point and provides a basis for developing more successful predictive algorithms. Our focus is on applying and optimizing our methods on larger datasets and on collaborating with university hospitals to bring our research into clinical use.
