Introduction to the Logistic Regression Model; Multiple Logistic Regression; Interpretation of the Fitted Logistic Regression Model; Model-Building Strategies and Methods for Logistic Regression; Assessing the Fit of the Model; Application of Logistic Regression with Different Sampling Models; Logistic Regression for Matched Case-Control Studies; Special Topics; References; Index.

... This is similar to the second layer in stacking approaches, where the meta-classifier is trained to ideally combine the base-model predictions into the final predictions [10]. We studied different classifier types as the meta-classifier and found that logistic regression [31] outperformed the other classifiers in most cases. This is because the relationship between the probabilistic outputs in the new training set is almost linear. ...
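Below is a minimal sketch of the meta-level step described above. It assumes the base classifiers' out-of-fold probabilities have already been collected into a new training set, with one column per base model. The helper names `fit_meta_logistic` and `predict_meta` are hypothetical. Plain batch gradient descent on the log loss stands in for whatever solver the authors used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_logistic(meta_X, y, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-classifier by batch gradient descent.

    meta_X: rows of base-classifier probabilities (one column per base model),
            ideally out-of-fold predictions so the meta-level is not overfit.
    y:      0/1 labels for the same rows.
    Returns (bias, weights).
    """
    n, d = len(meta_X), len(meta_X[0])
    bias, weights = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for row, target in zip(meta_X, y):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            err = p - target  # gradient of the log loss w.r.t. the linear score
            grad_b += err
            for j, x in enumerate(row):
                grad_w[j] += err * x
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


def predict_meta(bias, weights, row):
    """Final stacked probability for one row of base-model outputs."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
```

The meta-features are already probabilities on a common [0, 1] scale, so a single linear layer on top of them is often sufficient. This is in line with the near-linear relationship noted above.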

... Therefore, we train an ensemble of linear and non-linear models as base classifiers for stacking inside the clusters: Support Vector Machine (SVM), Random Forest (RF), Artificial Neural Network (ANN), eXtreme Gradient Boosting (XGBoost), logistic regression, and K-Nearest Neighbors (KNN). For detailed descriptions of these algorithms, we refer to [26], [27], and [31]-[34], respectively. During training, we also tune the hyperparameters of these classifiers. ...

... Logistic regression is a statistical model that, in its basic form, uses a logistic function to model a binary dependent variable, with a set of (influential) factors as the independent variables [21]. From a mathematical point of view, a binary model has a dependent variable with two possible values, e.g., forward/backward, which can be labeled "0" and "1"; the model assigns each label a probability in [0, 1]. ...
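Here is a small sketch of the basic binary model just described. It assumes a single linear predictor `z = b0 + b·x`, and the function names are illustrative. The logistic function maps `z` to a probability, and thresholding that probability yields the "0"/"1" label. The 0.5 threshold is an assumption.

```python
import math


def logistic(z):
    """Map a real-valued linear predictor z to a probability between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def predict_label(intercept, coefs, x, threshold=0.5):
    """Return (P(label = 1 | x), predicted label 0 or 1)."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    p = logistic(z)
    return p, int(p >= threshold)


# Example: z = -1.0 + 2.0 * 0.5 = 0, so the probability is exactly 0.5.
print(predict_label(-1.0, [2.0], [0.5]))  # (0.5, 1)
```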

Background Ubiquitylation is an important post-translational modification of proteins that not only plays a central role in cellular coding but is also closely associated with the development of a variety of diseases. The specific selection of substrates by E3 ligases is key to ubiquitylation. As various high-throughput analytical techniques continue to be applied to the study of ubiquitylation, large amounts of ubiquitylation site data and records of E3-substrate interactions continue to be generated. Biomedical literature is an important vehicle for information on E3-substrate interactions in ubiquitylation and related new discoveries. It is also an important channel through which researchers obtain up-to-date data. The continuing explosion of ubiquitylation-related literature poses a great challenge to researchers acquiring and analyzing this information. Therefore, automatic annotation of E3-substrate interaction sentences in the available literature is urgently needed. Results In this research, we proposed a model based on semantic representation and attention-based deep learning to automatically annotate E3-substrate interaction sentences in biomedical literature. Focusing on sentences that contain an E3 protein, we applied several natural language processing methods and a Long Short-Term Memory (LSTM)-based deep learning classifier to train the model. Experimental results demonstrated the effectiveness of the proposed model, and the attention-based deep learning method outperformed other statistical machine learning methods. To construct the model, we also created a manual corpus of E3-substrate interaction sentences in which the E3 proteins and substrate proteins are labeled. The corpus and model proposed in this research can be very useful and valuable resources for advancing ubiquitylation-related research.
Conclusion Having the entire manual corpus of E3-substrate interaction sentences readily available in electronic form will greatly facilitate subsequent text mining and machine learning analyses. Automatic annotation of ubiquitylation sentences stating E3 ligase-substrate interactions benefits significantly from semantic representation and deep learning. The model enables rapid access to information and can assist in further screening of key ubiquitylation ligase substrates for in-depth studies.
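The abstract does not describe the architecture in detail. The sketch below is a hedged illustration of what an attention mechanism on top of an LSTM commonly does: it pools per-token hidden states into one sentence vector using dot-product scores and a softmax. The scoring vector `w` and the function name are hypothetical, and the LSTM itself is not shown.

```python
import math


def attention_pool(hidden_states, w):
    """Dot-product attention pooling over one sentence.

    hidden_states: list of per-token vectors (e.g. LSTM outputs), length >= 1.
    w:             scoring vector, learned during training (given here).
    Returns (attention weights, pooled sentence vector).
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in hidden_states]
    m = max(scores)  # subtract the max so the softmax cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [sum(a * h[k] for a, h in zip(weights, hidden_states)) for k in range(dim)]
    return weights, pooled


# Two tokens; a scoring vector that favors the first hidden dimension
# puts almost all attention on the first token.
weights, pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], w=[10.0, 0.0])
print([round(a, 4) for a in weights])  # [1.0, 0.0] after rounding
```

The pooled vector would then feed a binary classifier that decides whether the sentence states an E3-substrate interaction.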

... Logistic regression is more suitable when the dependent variable is dichotomous or binary (Cucchiara, 1992). Traditional OLS regression uses the values of the predictor variables, together with model-estimated weights, to estimate the value of the outcome variable. The logistic regression model instead estimates the likelihood of a particular outcome for each alternative, coded 0 or 1 (Çokluk, 2010) ... where i and t denote the firm and year indexes, respectively; CSRA denotes corporate social responsibility assurance; BSIZE represents board size; BIND represents board independence; CHIND represents the chairman's independence; BMET represents the frequency of board meetings; BFEM represents the percentage of female members on the board; EXFEM denotes the percentage of female directors at the executive/top management level; CEOINT denotes the CEOs' global working experience; BFIN is the percentage of board members with financial expertise; TNUR represents the average tenure of board members in years; SIZE is the number of employees at year-end; LEV represents the level of debt as a proxy for financial risk; ROA represents the firm's profitability; SUBS is the number of subsidiaries of a firm, as a proxy for its complexity; ESG represents the CSR performance score; SECT denotes the industry-classification effect; and ε_it is the error term of the model. ...
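The model equation itself is missing from this excerpt. Below is a hedged reconstruction of a standard logit specification consistent with the variable list. It is written in latent-variable form, which is one way to reconcile a logit model with the error term ε_it mentioned above. The coefficient numbering, the linear entry of every variable, and the treatment of SECT as an additive industry effect are assumptions, not the authors' published equation.

```latex
\begin{aligned}
\mathrm{CSRA}^{*}_{it} &= \beta_0 + \beta_1\,\mathrm{BSIZE}_{it} + \beta_2\,\mathrm{BIND}_{it} + \beta_3\,\mathrm{CHIND}_{it} + \beta_4\,\mathrm{BMET}_{it} + \beta_5\,\mathrm{BFEM}_{it} + \beta_6\,\mathrm{EXFEM}_{it} + \beta_7\,\mathrm{CEOINT}_{it} \\
&\quad + \beta_8\,\mathrm{BFIN}_{it} + \beta_9\,\mathrm{TNUR}_{it} + \beta_{10}\,\mathrm{SIZE}_{it} + \beta_{11}\,\mathrm{LEV}_{it} + \beta_{12}\,\mathrm{ROA}_{it} + \beta_{13}\,\mathrm{SUBS}_{it} + \beta_{14}\,\mathrm{ESG}_{it} + \mathrm{SECT} + \varepsilon_{it}, \\
\mathrm{CSRA}_{it} &= \mathbf{1}\{\mathrm{CSRA}^{*}_{it} > 0\}, \qquad \varepsilon_{it} \sim \mathrm{Logistic}(0, 1), \\
\Pr(\mathrm{CSRA}_{it} = 1) &= \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \beta_1\,\mathrm{BSIZE}_{it} + \dots + \beta_{14}\,\mathrm{ESG}_{it} + \mathrm{SECT}\right)\right)}.
\end{aligned}
```

The last line follows from the symmetry of the logistic distribution: Pr(ε_it > -η) = Pr(ε_it < η) for any linear index η.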

Purpose – This study investigates the relationship between the attributes of corporate boards in UK companies and their tendency to assure their corporate social responsibility (CSR) reports. Design/Methodology/Approach – From an agency theory perspective, we examine the impact of board attributes on the assurance of CSR reports for FTSE 350 companies during 2016-2019. We used annual integrated reports, companies' websites, and the Thomson Reuters Eikon database for data collection and logistic regression for data analysis. Findings – The results confirm that some board attributes significantly influence a company's decision to assure its CSR reports. Board size, board tenure, the presence of female board members and female executive directors, and CEOs' global working experience contribute positively to CSR assurance (CSRA) decisions, whereas the chairman's independence contributes negatively. Board independence, board meetings, and board financial expertise show no effect on the CSRA decision. Research limitations/implications – We focus on selected attributes of board members and do not consider board diversity in its broader meaning. Moreover, the effect of board committees and their attributes on CSRA was not addressed. We also did not consider how the scope and quality level of the assurance service, or the differences between assurance providers, affect companies' decisions to undertake CSRA or to choose between assurance providers. Practical implications – Our study provides insights into the increasing demand for voluntary assurance to boost the credibility of CSR reports and the role of the board of directors (BOD) in taking this initiative. The findings highlight the importance of board diversity (e.g., gender) in improving transparency and sustainability reporting, which can help policymakers and regulators shape future governance policies.
Additionally, the findings point to a drawback in the UK Corporate Governance (CG) Code regarding the chairman's independence, which requires corrective action from the Financial Reporting Council (FRC). The findings also raise concern over the small share of audit firms in the assurance service market despite the growing demand for these services in the UK, which may call for more attention to these services from audit firms. Social implications – Companies are under increasing pressure, especially after the COVID-19 pandemic, to discharge their accountability to stakeholders and to act in a socially responsible manner in their business activities. CSR reporting is one of the main tools that companies use to communicate their social activities. Understanding the determinants of voluntary CSRA helps to increase the credibility of CSR reports and the favorable response to social pressure. Originality/value – We add empirical evidence to the limited literature on CSRA about the role of the BOD in undertaking companies' social responsibility, improving CSR reporting, and reducing information asymmetry. The study also highlights the significance of maintaining a balanced BOD in terms of gender, experience, and tenure in minimizing the risk of perpetuating non-transparent integrated reporting. Keywords: Corporate Social Responsibility Assurance, Board Attributes, Board Diversity, UK, FTSE 350. Paper type: Research paper.

... In logistic regression, the dependent variable is treated as binary. Logistic regression is widely used and is also applied, for example, within neural networks (Cucchiara 2012; Karlaftis and Vlahogianni 2011). The two examples presented, linear and logistic regression models, are ... to the statistical or ...

For Germany alone, products and services based on artificial intelligence (AI) are expected to generate revenues of 488 billion euros in 2025, which would amount to 13 percent of gross domestic product. In important application sectors, the explainability of decisions made by AI is a prerequisite for several things: user acceptance, approval and certification procedures, and compliance with the transparency obligations required by the GDPR (DSGVO). At least in the European context, the explainability of AI products is therefore one of the important factors for market success. This study was prepared on behalf of the Federal Ministry for Economic Affairs and Energy by the accompanying research program of the innovation competition „Künstliche Intelligenz als Treiber für volkswirtschaftlich relevante Ökosysteme" ("Artificial Intelligence as a Driver for Economically Relevant Ecosystems"; KI-Innovationswettbewerb). It is based on the results of an online survey and in-depth interviews with AI experts from industry and academia. The study summarizes the current state of the art and the use of explainable AI (Explainable Artificial Intelligence, XAI), illustrated with practical use cases.

... Logistic regression is the most commonly used statistical model for binary classification problems [7]. L1-regularized logistic regression has received extensive attention in machine learning [12], medicine [1,25], and other fields because L1 regularization helps the model avoid overfitting. ...
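The excerpt does not show how the L1-regularized model is fitted. As an illustration only, the sketch below uses proximal gradient descent (ISTA). This is a standard solver for this objective but not necessarily the one used in the cited work. Each iteration takes a gradient step on the mean log loss and then applies soft-thresholding, which is what drives some coefficients exactly to zero. The function names, the fixed step size of 0.5 (which assumes features of modest scale), and the iteration count are assumptions.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def l1_logistic_ista(X, y, lam, step=0.5, iters=3000):
    """Minimize mean log loss + lam * ||w||_1 by proximal gradient (ISTA).

    X: list of feature rows, y: 0/1 labels. The intercept b is not penalized.
    Returns (w, b).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * row[j] / n
        b -= step * grad_b
        # Gradient step on the smooth loss, then the L1 proximal (shrinkage) step.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b
```

Raising `lam` shrinks more coefficients to exactly zero. Once `lam` exceeds the largest absolute gradient of the loss at zero, every coefficient stays at zero.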

  • Dongxia Wang
  • Yongmei Lei
  • Jinyang Xie
  • Guozheng Wang

The distributed alternating direction method of multipliers (ADMM) is an effective algorithm for solving large-scale optimization problems. However, its high communication cost limits its scalability. To reduce the communication cost of distributed ADMM, this paper proposes an asynchronous lazy ADMM algorithm based on a hierarchical sparse allreduce communication mode (HSAC-ALADMM). First, a lazy parameter-aggregation strategy filters the parameters transmitted by distributed ADMM, which reduces each node's payload per iteration. Second, a hierarchical sparse allreduce communication mode is tailored to sparse data to aggregate the filtered parameters efficiently. Finally, a Calculator-Communicator-Manager framework is designed to implement the proposed algorithm, effectively combining the asynchronous communication protocol with the allreduce communication mode. It uses multithreading to separate computation from communication, improving the efficiency of both. Experimental results for the L1-regularized logistic regression problem on public datasets show that HSAC-ALADMM is faster than existing asynchronous ADMM algorithms. Compared with existing sparse allreduce algorithms, the proposed hierarchical sparse allreduce algorithm makes better use of the characteristics of sparse data to reduce system time in multi-core clusters.
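The abstract describes the communication scheme only at a high level. The sketch below is a hypothetical, single-process illustration of two of its ideas. The first is a lazy filter that transmits only coordinates whose change since the last transmission exceeds a threshold. Changes are sent as deltas so that a receiver's running sum stays consistent. The second is a two-level sparse reduction that sums within each multi-core node before summing across nodes. This is not the HSAC-ALADMM implementation. The threshold rule, the delta encoding, and all names are assumptions, and the ADMM updates themselves are omitted.

```python
from collections import defaultdict


def lazy_delta(new_params, last_sent, threshold):
    """Hypothetical lazy filter for one worker.

    Sends only coordinates whose value moved by more than `threshold` since it
    was last transmitted, encoded as a sparse {index: change} dict, and records
    the newly transmitted values in `last_sent` (updated in place).
    """
    delta = {}
    for i, value in enumerate(new_params):
        change = value - last_sent[i]
        if abs(change) > threshold:
            delta[i] = change
            last_sent[i] = value
    return delta


def sparse_sum(updates):
    """Sum sparse vectors stored as {index: value} dicts."""
    total = defaultdict(float)
    for upd in updates:
        for i, v in upd.items():
            total[i] += v
    return dict(total)


def hierarchical_sparse_allreduce(deltas_by_node):
    """Two-level reduction: sum within each multi-core node first, then across
    nodes, so only one sparse message per node crosses the network. In a real
    allreduce every worker would receive the result; here it is returned."""
    node_sums = [sparse_sum(worker_deltas) for worker_deltas in deltas_by_node]
    return sparse_sum(node_sums)


def apply_reduced(aggregate, reduced):
    """Add the reduced deltas into a dense running aggregate (in place)."""
    for i, v in reduced.items():
        aggregate[i] += v
    return aggregate
```

Untransmitted coordinates contribute a zero delta. As a result, the running aggregate always equals the sum of the values each worker last sent, which is what makes skipping small changes tolerable up to the chosen threshold.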

... The keywords of each sentence are extracted with TextRank [39] and TF-IDF [40]. Next, commonly used classification models are applied to these features to classify the texts into three main categories (positive, negative, and neutral): k-Nearest Neighbor (KNN) [41], [42], logistic regression (LR) [43], support vector machine (SVM) [44], and gradient boosting decision tree (GBDT) [45]. ...
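Below is a minimal sketch of the TF-IDF half of the keyword step. It assumes pre-tokenized sentences, term frequency normalized by sentence length, and an unsmoothed idf of log(N / df). Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed idf by default, so their scores would differ. TextRank is not shown, and the function name is hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=2):
    """Return the top_k terms per document ranked by TF-IDF.

    docs: list of token lists. tf = count / document length,
    idf = log(N / df), where df counts documents containing the term.
    Ties are broken alphabetically so the output is deterministic.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        counts = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


docs = [["great", "phone", "great", "battery"], ["bad", "phone", "screen"], ["phone", "battery", "ok"]]
print(tfidf_keywords(docs))
```

A term that appears in every sentence ("phone" here) gets an idf of zero and never ranks as a keyword. The resulting keyword features would then be passed to the classifiers listed above.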

Purpose Previous research has found hemispheric asymmetries in the utilization of proprioceptive information. It is undetermined, however, whether proprioceptive asymmetry changes when external stimulation, such as vibration, is presented. The present study investigated the immediate effects of vibration stimulation (VS) on bilateral ankle proprioception. Materials and methods Forty-six recreational male basketball players were included. Proprioception was assessed in standing using the active movement extent discrimination apparatus (AMEDA), and vibration was applied with a vibrating foam roller to the peroneal or gastrocnemius muscles. Results Participants were divided into high-score and low-score groups according to the median baseline proprioceptive performance. VS, whether applied to the peroneal or gastrocnemius muscles, significantly improved left non-dominant ankle proprioception in the low proprioceptive performer group (p = 0.019). In contrast, it significantly impaired right dominant ankle proprioception in the high proprioceptive performer group (p = 0.011). Conclusions External stimuli had opposite effects on proprioception in the better- and worse-performing groups. This suggests that external stimulus signals are processed differently between the two hemispheres and between groups (high-score vs. low-score), which may be related to hemispheric asymmetry and stochastic resonance. More specific interventions therefore need to be explored in the future.

  • Shunxin Guo
  • Hong Zhao

Hierarchical classification is a research hotspot in machine learning because data with hierarchical class structures are widespread. Existing hierarchical classification methods based on granular computing can effectively reduce computational complexity by considering the granularity of classes. However, their predictive accuracy is affected by inter-level error propagation within the hierarchy. In this paper, we propose a hierarchical classification method with multi-path selection based on coarse- and fine-grained class relationships, which mitigates the inter-level error propagation problem. First, we use a top-down recursive method with logistic regression classifiers to calculate the probabilities of the hierarchical classes. Second, the probability of the current class is calculated by combining the parent-class and current-class probabilities. We then select multiple candidate fine-grained classes at the current level according to their sibling relationships. Compared with existing methods, the proposed method reduces the likelihood of misclassifications propagating from upper levels. Finally, the multi-path prediction result is passed to a classical classifier for the final prediction. Evaluation on six benchmark datasets demonstrates that our method provides better classification performance than existing state-of-the-art hierarchical methods.
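One plausible reading of the second and third steps is sketched below under several assumptions:

- The per-node logistic regression outputs are given as conditional probabilities.
- A class's path probability is the product of its parent's path probability and its own conditional probability.
- The top-k candidates are kept at each level rather than only the best one.

The product rule and top-k selection are illustrative choices, not necessarily the authors' combination or sibling-based selection. The final classical classifier is omitted, and all names are hypothetical.

```python
def multi_path_predict(tree, cond_prob, top_k=2):
    """Top-down multi-path selection over a class hierarchy.

    tree:      {parent: [children]}, with the top level under key "root".
    cond_prob: {class: P(class | parent, x)}, e.g. from per-node logistic
               regression classifiers (assumed already computed here).
    Returns the surviving leaf candidates sorted by path probability.
    """
    frontier = [("root", 1.0)]
    leaves = []
    while frontier:
        candidates = []
        for node, p in frontier:
            children = tree.get(node, [])
            if not children:
                leaves.append((node, p))
                continue
            for child in children:
                # Path probability = parent path probability * P(child | parent).
                candidates.append((child, p * cond_prob[child]))
        candidates.sort(key=lambda item: -item[1])
        frontier = candidates[:top_k]  # keep several paths, not just the best one
    return sorted(leaves, key=lambda item: -item[1])


tree = {"root": ["animal", "vehicle"], "animal": ["cat", "dog"], "vehicle": ["car", "bus"]}
cond_prob = {"animal": 0.55, "vehicle": 0.45, "cat": 0.5, "dog": 0.5, "car": 0.9, "bus": 0.1}
print(multi_path_predict(tree, cond_prob, top_k=1))  # greedy: only "cat" survives
print(multi_path_predict(tree, cond_prob, top_k=2))  # "car" ranks first
```

In this toy hierarchy, a greedy single path commits to the slightly more probable coarse class "animal". Keeping two paths recovers "car", whose path probability (0.405) is higher. This is the kind of upper-level error the method aims to avoid.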

  • Liang Yu
  • Dandan Zhou
  • Lin Gao
  • Yunhong Zha

Predicting the response of each individual patient to a drug is a key challenge in personalized medicine. Our study predicted drug response by fusing multiomics data with low-dimensional feature vector representations on a multilayer network model. We named this new method DREMO (Drug Response prEdiction based on MultiOmics data fusion). DREMO fuses similarities between cell lines and similarities between drugs, thereby improving the ability to predict the response of cancer cell lines to therapeutic agents. First, a multilayer similarity network of cell lines and drugs was constructed from gene expression profiles, somatic mutations, copy number variation (CNV), drug chemical structures, and drug targets. Next, low-dimensional feature vector representations were used to fuse the biological information in the multilayer network. Then, a machine learning model was applied to predict new drug-cell line associations. Finally, the results were validated using the well-established GDSC/CCLE databases, the literature, and the functional pathway database. A comparison with other methods showed that DREMO significantly improves predictive capability.

Although least-squares regression (LSR) has achieved great success in regression tasks, its discriminative ability is limited because the margins between classes are not explicitly preserved. To mitigate this issue, dragging techniques have been introduced to remodel the regression targets of LSR. Such variants have achieved some performance improvements, but their generalization ability on real data is still unsatisfactory because they do not exploit the structure-related information typically contained in the data. To overcome this shortcoming, in this article we construct a multioutput regression model that exploits intraclass correlations and input-output relationships via a structure matrix. We also discriminatively enlarge the regression margins by embedding a metric that is guided automatically by the training data. To better handle such structured data with ordinal labels, we encode the model output as cumulative attributes, yielding the proposed model, termed structure-exploiting discriminative ordinal multioutput regression (SEDOMOR). To further enhance its discriminative ability, we extend SEDOMOR to nonlinear counterparts with kernel functions and deep architectures. We also derive the corresponding optimization algorithms for solving these models and prove their convergence. Finally, extensive experiments demonstrate the effectiveness and superiority of the proposed methods.
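The cumulative-attribute idea can be sketched as follows. The sketch assumes ordinal labels 1..K encoded as K-1 binary indicators [k > 1, ..., k > K-1]; this is one common convention, and the paper's exact encoding may differ. A multioutput regressor would be trained on these targets, and its real-valued outputs are decoded by counting how many exceed 0.5. The function names and the threshold are hypothetical.

```python
def encode_cumulative(label, num_levels):
    """Encode ordinal label k in 1..K as K-1 indicators [k > 1, ..., k > K-1]."""
    return [1.0 if label > j else 0.0 for j in range(1, num_levels)]


def decode_cumulative(outputs, threshold=0.5):
    """Map real-valued regression outputs back to an ordinal label:
    1 + the number of outputs above the threshold."""
    return 1 + sum(1 for s in outputs if s > threshold)


print(encode_cumulative(3, 5))                  # [1.0, 1.0, 0.0, 0.0]
print(decode_cumulative([0.9, 0.7, 0.4, 0.1]))  # 3
```

With this encoding, the squared distance between the targets for labels k and k' is |k - k'|. A regressor trained on them is therefore penalized less for near misses than for distant ones, which is the usual motivation for cumulative encodings of ordinal labels.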

  • Paul D Sampson
  • Peter Guttorp

Estimation of the covariance structure of spatial processes is a fundamental prerequisite for problems of spatial interpolation and the design of monitoring networks. We introduce a nonparametric approach to global estimation of the spatial covariance structure of a random function Z(x, t) observed repeatedly at times ti (i = 1, …, T) at a finite number of sampling stations xi (i = 1, 2, …, N) in the plane. Our analyses assume temporal stationarity but do not assume spatial stationarity (or isotropy). We analyze the spatial dispersions var(Z(xi, t) − Z(xj, t)) as a natural metric for the spatial covariance structure and model these as a general smooth function of the geographic coordinates of station pairs (xi, xj). The model is constructed in two steps. First, using nonmetric multidimensional scaling (MDS) we compute a two-dimensional representation of the sampling stations for which a monotone function of interpoint distances δij approximates the spatial dispersions. MDS transforms the problem into one for which the covariance structure, expressed in terms of spatial dispersions, is stationary and isotropic. Second, we compute thin-plate splines to provide smooth mappings of the geographic representation of the sampling stations into their MDS representation. The composition of this mapping f and a monotone function g derived from MDS yields a nonparametric estimator of var(Z(xa, t) − Z(xb, t)) for any two geographic locations xa and xb (monitored or not) of the form g(|f(xa) − f(xb)|). By restricting the monotone function g to a class of conditionally nonpositive definite variogram functions, we ensure that the resulting nonparametric model corresponds to a nonnegative definite covariance model. We use biorthogonal grids, introduced by Bookstein in the field of morphometrics, to depict the thin-plate spline mappings that embody the nature of the anisotropy and nonstationarity in the sample covariance matrix. 
An analysis of mesoscale variability in solar radiation monitored in southwestern British Columbia demonstrates this methodology.
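The first ingredient of this approach is the sample spatial dispersion between every pair of stations, which can be computed directly from the repeated observations. The sketch below assumes the data are arranged as T rows (times) by N columns (stations) and uses the ordinary sample variance over time. The MDS and thin-plate spline steps are not shown, and the function name is hypothetical.

```python
import statistics


def spatial_dispersions(Z):
    """Sample dispersion matrix D[i][j] = var_t(Z(x_i, t) - Z(x_j, t)).

    Z: list of T >= 2 time points, each a list of N station values.
    Uses statistics.variance (divisor T - 1) over time for every station pair.
    """
    n_stations = len(Z[0])
    D = [[0.0] * n_stations for _ in range(n_stations)]
    for i in range(n_stations):
        for j in range(i + 1, n_stations):
            diffs = [row[i] - row[j] for row in Z]
            D[i][j] = D[j][i] = statistics.variance(diffs)
    return D
```

The resulting symmetric matrix is what nonmetric MDS would take as input. Its entries are not assumed to depend only on geographic distance, which is the point of the nonstationary approach described above.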

  • George Seber

Introduction; Multivariate Normal Distribution; Wishart Distribution; Hotelling's T² Distribution; Multivariate Beta Distributions; Rao's Distribution; Multivariate Skewness and Kurtosis.

  • Thomas R. Fears
  • Charles C. Brown

There are a number of possible designs for case-control studies. The simplest uses two separate simple random samples, but an actual study may use more complex sampling procedures. Typically, stratification is used to control for the effects of one or more risk factors in which we are interested. It has been shown (Anderson, 1972, Biometrika 59, 19-35; Prentice and Pyke, 1979, Biometrika 66, 403-411) that the unconditional logistic regression estimators apply under stratified sampling, so long as the logistic model includes a term for each stratum. We consider the case-control problem with stratified samples and assume a logistic model that does not include terms for strata, i.e., for fixed covariates the (prospective) probability of disease does not depend on stratum. We assume knowledge of the proportion sampled in each stratum as well as the total number in the stratum. We use this knowledge to obtain the maximum likelihood estimators for all parameters in the logistic model including those for variables completely associated with strata. The approach may also be applied to obtain estimators under probability sampling.
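For context, consider the simplest design: one simple random sample of cases and one of controls. The well-known result for this design is that a logistic model fitted to case-control data has consistent slope estimates, while its intercept is shifted by log(π1/π0), where π1 and π0 are the sampling fractions of cases and controls. The sketch below applies that correction. It does not implement the stratified maximum likelihood approach described above, and the function name is hypothetical.

```python
import math


def corrected_intercept(case_control_intercept, frac_cases, frac_controls):
    """Recover the population (prospective) intercept from a logistic model
    fitted to a simple case-control sample.

    frac_cases / frac_controls: proportions of all population cases / controls
    that were sampled. Slopes need no correction; the fitted intercept equals
    the population intercept plus log(frac_cases / frac_controls).
    """
    return case_control_intercept - math.log(frac_cases / frac_controls)


# Half of all cases but only 5% of controls sampled: the fitted intercept
# overstates the population intercept by log(10).
print(corrected_intercept(0.5, 0.5, 0.05))
```

When cases and controls are sampled at the same rate, the log ratio is zero and no correction is needed.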