Metabolic profiles to predict long-term cancer and mortality: the use of latent class analysis



Metabolites are genetically and environmentally determined. Consequently, they can be used to characterize environmental exposures and reveal biochemical mechanisms that link exposure to disease. To explore disease susceptibility and improve population risk stratification, we aimed to identify metabolic profiles linked to carcinogenesis and mortality and their intrinsic associations by characterizing subgroups of individuals based on serum biomarker measurements. We included 13,615 participants from the Swedish Apolipoprotein MOrtality RISk Study who had measurements for 19 biomarkers representative of central metabolic pathways. Latent Class Analysis (LCA) was applied to characterise individuals based on their biomarker values (according to medical cut-offs), which were then examined as predictors of cancer and death using multivariable Cox proportional hazards models.


LCA identified four metabolic profiles within the population: (1) normal values for all markers (63% of population); (2) abnormal values for lipids (22%); (3) abnormal values for liver functioning (9%); (4) abnormal values for iron and inflammation metabolism (6%). All metabolic profiles (classes 2–4) increased risk of cancer and mortality, compared to class 1 (e.g. HR for overall death was 1.26 (95% CI: 1.16–1.37), 1.67 (95% CI: 1.47–1.90), and 1.21 (95% CI: 1.05–1.41) for class 2, 3, and 4, respectively).


We present an innovative approach to risk-stratify a well-defined population for cancer and mortality based on LCA-defined metabolic subgroups. Our results indicate that standard-of-care baseline serum markers, when assembled into meaningful metabolic profiles, could help assess long-term risk of disease and provide insight into disease susceptibility and etiology.


Cancer is a multi-pathway disease, assembled as a heterogeneous and hierarchically organized system, and is still one of the major causes of death worldwide, with an increasing burden given the aging population [1,2,3]. Cancer data have grown exponentially in the last decade, with new advanced technologies resulting in highly diverse, mixed data types and huge volumes of information (e.g. a PubMed search for the terms ‘cancer’ AND ‘data’ retrieved 542,045 publications in August 2017). Due to the nature of this emerging “Big Cancer Data” and the demand for highly sensitive and highly specific biomarkers, there is a need for large sample sizes and advanced mathematical and statistical models [4, 5] capable of extracting relevant clinical and biological information [6, 7]. These more systematic approaches, which replace single-biomarker analyses with multiple-marker profiling, may provide new avenues for biomarker development in cancer diagnosis and management [8, 9]. Recent studies have adopted these integrative approaches, assessing multiple serum markers simultaneously for cancer diagnosis [10,11,12,13]. Furthermore, the concept of the exposome has been introduced into the field of cancer epidemiology [14]. It refers to every non-genetic exposure to which an individual is subjected from conception to death [14, 15]. Specifically, metabolites, part of the internal exposome, are both genetically and environmentally determined and can consequently be used to characterize environmental exposures and reveal biochemical mechanisms that link exposure to disease [15,16,17,18]. Hence, the internal distribution of metabolites and their interactions might help unravel cancer susceptibility in a population.

With the overall goal of identifying statistical methods to stratify individuals based on their underlying risk of developing cancer and their mortality risk, we used a data-driven approach based on standard serum markers available from routine health check-ups to study susceptibility to cancer and death in a well-defined cohort of 13,615 participants from the AMORIS study (Apolipoprotein MOrtality RISk) [19, 20]. More specifically, the study set out to explore population heterogeneity and cancer susceptibility by investigating serum metabolic profiles using latent class analysis (LCA). This data reduction method clusters covariates based on models of data distribution probabilities. It allows for evaluation of clusters of biomarkers linked to carcinogenesis and their intrinsic associations, which ultimately helps us assess their possible role in predicting long-term cancer and mortality.


Characteristics of the study population

A total of 1,956 individuals (14.37%) developed cancer after at least 3 years of follow-up, including 655 breast and genito-urinary cancers, 330 digestive cancers, 133 respiratory cancers and 129 lymphatic and hematopoietic cancers. The mean follow-up time for cancer was 16.6 years, and the median follow-up time in the cohort was 17.22 years (minimum 3.01 years, maximum 24.77 years). A total of 3,158 participants (23.20%) died during a mean follow-up of 17.3 years, including 706 cancer-specific deaths. Study population characteristics by cancer status are shown in Table 1.

Table 1 Characteristics of the study population by cancer status defined at the end of the follow up period. All the serum markers are dichotomized using standard clinical cut-offs

Latent class analysis characterizes the study population into four metabolic profiles

LCA was executed using the dichotomized values of the biomarkers to facilitate the biological interpretation of the results. The Chi-squared criterion for model selection indicated that the best-fitting model comprised four latent classes, while AIC and BIC stabilized at four classes (Fig. 1a, b) [21]. Models with 12 or more classes did not converge to a local maximum. The class allocation of the observations (individuals), the class-conditional probability of each biomarker and the latent mixing proportions were obtained with the poLCA package in the R statistical language.
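As a rough illustration of how information criteria of this kind are computed for a latent class model with dichotomous indicators, the sketch below applies the standard AIC and BIC formulas. The log-likelihood values and the number of indicators (`N_ITEMS`) are hypothetical placeholders, not values from the study, and the code is not the poLCA implementation.

```python
import math

def lca_free_parameters(n_classes, n_binary_items):
    """Free parameters of a latent class model with dichotomous indicators:
    (K - 1) mixing proportions plus K * J class-conditional probabilities."""
    return (n_classes - 1) + n_classes * n_binary_items

def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical maximised log-likelihoods for models with 1 to 5 classes.
hypothetical_loglik = {1: -92000.0, 2: -90500.0, 3: -89900.0,
                       4: -89600.0, 5: -89580.0}
N_OBS = 13615   # cohort size reported in the text
N_ITEMS = 15    # assumed number of dichotomized indicators (illustrative)

criteria = {}
for k, loglik in hypothetical_loglik.items():
    p = lca_free_parameters(k, N_ITEMS)
    criteria[k] = (aic(loglik, p), bic(loglik, p, N_OBS))

# Number of classes with the smallest BIC under these made-up inputs.
best_by_bic = min(criteria, key=lambda k: criteria[k][1])
print(best_by_bic)
```

With real data, the log-likelihoods would come from the fitted models, and the point at which the criteria stop improving would be read off in the same way.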

Fig. 1

a Line graph depicting the goodness-of-fit indicators AIC and BIC. The model that best fits the dataset comprises four latent classes, as determined by the minimum value reached by the AIC and BIC criteria before the values stabilize. Models with 12 or more classes did not converge to a local maximum. b Line graph depicting the Chi-square (χ²) goodness-of-fit indicator. The model that best fits the dataset comprises four latent classes, as determined by the minimum value reached by Chi-square. Models with 12 or more classes did not converge to a local maximum

Table 2 and Fig. 2 outline the LCA-derived classes with the estimated class population proportions, the class conditional probabilities of belonging to each latent class for each of the biomarkers and the biological interpretation of the LCA-derived classes. The four mutually exclusive classes characterize the population in metabolic profiles based on class conditional probabilities: (1) those with probabilities for all abnormal values of the markers under 0.3; therefore, considered the normal class (63% of population); (2) those with abnormal values for lipid markers (22%); (3) those with abnormal values for liver function markers (9%); (4) those with abnormal values for iron and inflammation metabolism (6%).
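To make the link between class-conditional probabilities and class allocation concrete, the following sketch applies Bayes' rule to one individual's dichotomized markers and assigns the class with the highest posterior probability. The mixing proportions are those reported above; the three indicators and their conditional probabilities are hypothetical values chosen only for illustration.

```python
def posterior_class_probabilities(y, class_proportions, item_probs):
    """Posterior P(class k | y) for one individual with 0/1 indicators y,
    assuming local independence of indicators within each class."""
    joint = []
    for pi_k, probs_k in zip(class_proportions, item_probs):
        likelihood = pi_k
        for y_j, p_kj in zip(y, probs_k):
            likelihood *= p_kj if y_j == 1 else (1.0 - p_kj)
        joint.append(likelihood)
    total = sum(joint)
    return [value / total for value in joint]

# Class proportions reported in the text (classes 1 to 4).
proportions = [0.63, 0.22, 0.09, 0.06]

# Hypothetical P(abnormal | class) for three illustrative indicators:
# [lipid marker, liver marker, iron/inflammation marker].
item_probs = [
    [0.10, 0.05, 0.05],  # class 1: all probabilities below 0.3
    [0.85, 0.10, 0.05],  # class 2: abnormal lipids
    [0.30, 0.80, 0.10],  # class 3: abnormal liver function
    [0.20, 0.10, 0.75],  # class 4: abnormal iron and inflammation
]

# An individual with only the liver marker abnormal.
posterior = posterior_class_probabilities([0, 1, 0], proportions, item_probs)
assigned_class = posterior.index(max(posterior)) + 1
print(assigned_class, [round(p, 3) for p in posterior])
```

Modal assignment of this kind yields mutually exclusive classes even though each individual has a non-zero posterior probability for every class.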

Table 2 Predicted class memberships of the clinically abnormal biomarkers cut-off values for the 4 latent classes model. Estimated class population shares for the four different LCA classes
Fig. 2

Class membership probabilities for clinically abnormal values of the serum markers in the four LCA-derived metabolic classes. The four biomarker profiles are represented in the graph

A validation of the characterization of the population performed with the Latent class methodology is outlined in Additional file 1: Table S3. The baseline clinical characteristics of the individuals by LCA-derived metabolic classes (Additional file 1: Table S3) replicate the results displayed in Table 2 for the class conditional probabilities.

LCA derived metabolic profiles as cancer and mortality predictors

We then investigated the prediction capabilities of the four LCA-derived metabolic profiles to estimate overall cancer risk, specific cancer types risk, cancer mortality and overall mortality, assigning the reference level to the healthy metabolic profile Class 1 (Tables 3 and 4).

Table 3 Hazard ratios and 95% confidence intervals for the association of LCA-derived metabolic classes with overall cancer risk and cancer-specific risk
Table 4 Hazard ratios and 95% confidence intervals for the association of LCA-derived metabolic classes with all-cause death and cancer death

All metabolic profiles increased risk of cancer and mortality compared to Class 1. For instance, individuals in Class 3 (abnormal liver function profile) had a higher risk of overall cancer (HR: 1.28 (95% CI: 1.10–1.50)), but also worse cancer-specific survival and overall survival compared to those in Class 1 (Tables 3 and 4). Class 2 (abnormal lipid profile) and Class 4 (abnormal iron and inflammatory markers) were positively associated with overall death, while Class 2 was also associated with cancer-specific death. The results were consistent for both time-scales (Tables 3 and 4).

When assessing the risk of specific cancer types, several patterns occurred (Tables 3 and 4). Individuals in Class 2 (abnormal lipid markers) presented a higher risk of lymphatic and hematopoietic tissue cancer (HR: 1.72 (95% CI: 1.15–2.56)). There was a greater risk of digestive cancers in individuals in Class 3 (abnormal values of liver enzymes) (HR: 2.12 (95% CI: 1.54–2.91)), while individuals in Class 4 (abnormal iron markers and inflammation) were exposed to a higher risk of buccal and oral system cancers in comparison with the individuals in Class 1 (HR: 3.94 (95% CI 1.38–11.30)) (Table 3).

Moreover, the risk of connective tissue and endocrine gland cancers was higher in individuals grouped in the liver metabolic profile (HR: 2.65 (95% CI: 1.00–7.02)) and in participants belonging to the iron and inflammation profile (HR: 3.00 (95% CI: 1.11–8.11)). Similar associations were observed when using the age scale for the multivariable Cox proportional hazards regression model (Tables 3 and 4).


We demonstrated that standard of care baseline serum markers when assembled into meaningful metabolic profiles can help stratify the population for cancer risk, cancer mortality and overall mortality. More specifically, we observed that abnormal values for markers of the lipid metabolism, liver function and inflammatory and iron metabolism distinguish participants into metabolic profiles, which are predictive of long term cancer risk and/or mortality.

Metabolic profiles

Among the biological pathways addressed in our LCA, abnormalities in lipid metabolism were the most common. Hyperlipidemia was present in about a quarter of the study population, making it the largest abnormal metabolic profile. The weight of the lipid profile in the analysis was consistent with the global prevalence of hypercholesterolemia among adults (37% for males and 40% for females) reported in the 2008 Global Health Observatory estimates by the World Health Organization (WHO) and with the results for the Swedish population in the WHO MONICA project [22]. Dyslipidemias are associated with a higher risk of CVD and other chronic diseases such as cancer, as also observed in our study [23]. Liver dysfunction, iron deficiency and altered inflammatory marker profiles also distinguished important subgroups in our study population. About 9% of our population had abnormal values for markers of liver function (GGT, AST and ALT), which is similar to the results of a population-based survey in the United States that estimated that abnormal alanine aminotransferase (ALT) was present in 9% of respondents in the absence of viral hepatitis C or excessive alcohol consumption [24]. Moreover, these enzymes are known to be linked to cancer because of their role in preserving intracellular homeostasis against oxidative stress [25,26,27], which is concordant with the results of these analyses. The iron and inflammatory marker profile comprised 6% of individuals in the study and was predominantly driven by low levels of serum iron and TIBC, as well as high levels of CRP and leukocytes. This could potentially point towards anemia of inflammation, a condition of chronic inflammation with low iron values that occurs because iron deficiency provides the body with resistance to infection, which demonstrates the tight connection between the inflammatory response and iron homeostasis [28]. This condition has been reported in more than 30% of cancer patients at the time of diagnosis.

Metabolic profiles as a risk factor for long term cancer and mortality

The three classes of abnormal metabolic profiles described above were all associated with an increased risk of cancer and worse survival compared to the healthy class. The findings therefore confirm the key importance of these metabolic pathways in the maintenance of intracellular homeostasis and show how their imbalance can be related to the etiology of cancer and mortality [2]. The LCA adopted in this study thus illustrates how a biomarker-wide approach can help assess markers of the blood exposome in the context of carcinogenesis and mortality [29] (Fig. 3).

Fig. 3

Statistical pipeline describing the methodology followed in the study. We explored the blood exposome using metabolic markers of the population to assess how population heterogeneity is associated with cancer risk and mortality

More specifically, individuals presenting abnormal liver function markers had worse outcomes in terms of overall cancer risk and cancer death, and showed a positive association with diagnosis of digestive, connective tissue and endocrine cancers. Moreover, the participants with this profile had a higher probability of overall death. These results are consistent with previously published data. A positive association between elevated GGT and overall cancer risk, with no interaction with ALT, was previously found in the AMORIS cohort [30], and it was also reported in other large cohort studies [31, 32]. These studies also found strong associations between elevated levels of GGT and digestive and respiratory cancer incidence. Elevated GGT has been associated with mortality from all causes, liver disease, cancer and diabetes, while ALT only showed associations with liver disease death in a large US cohort [33]. However, a study based on an elderly population found that GGT was associated with increased cardiovascular disease mortality, and ALP and AST with increased cancer-related mortality [34]. Moreover, a meta-analysis evaluating the associations between liver enzymes and all-cause mortality found positive independent associations of baseline levels of GGT and ALP with all-cause mortality [35]. In the present study, the liver biomarker profile was positively associated with all the outcomes studied, suggesting a key role of this pathway in the development of cancer, probably related to its active role in maintaining intracellular redox regulation. Further investigations are necessary to establish the potential of the altered liver enzyme profile as a tool for cancer risk stratification.

Individuals allocated to the lipid profile showed positive associations with cancer mortality and overall mortality, and a higher risk of lymphatic and hematopoietic cancers. The link between hyperlipidemia and mortality has been studied broadly, with established links to cancer and all-cause mortality [36,37,38]. The association between lipids and lymphatic and hematopoietic cancers is more controversial, as other studies found an inverse association between these cancers and high levels of serum cholesterol [39, 40]. However, a systematic literature review from 2016 found no association [41].

Participants clustered in the unbalanced iron and inflammation profile had an increased risk of endocrine, buccal and oral cancers and a higher risk of all-cause death. Altered inflammation and iron metabolism are key metabolic ‘hallmarks of cancer’ [2, 42, 43]. Our observation of an association with an increased risk of buccal and oral cancer corroborates previous findings in AMORIS [42].

Population heterogeneity and risk stratification: the need for data reduction techniques

The modulating effect of population heterogeneity on the association between potential risk factors and disease is a new avenue for understanding the variability of risk in the population [44]. For instance, in a targeted metabolomics exercise, Shah et al. performed a principal component analysis and time-to-event analysis to identify metabolic profiles that predict risk of CVD [13]. Another study used Monte Carlo cross-validation and Lasso logistic regression to evaluate serum biomarkers as an alternative to fecal immunochemical testing to improve detection of colorectal cancer [11]. In 2010, the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort reported that a specific prediagnostic plasma phospholipid fatty acid profile could predict the risk of gastric cancer [45]. As rationalized in the HELIX project, these multiple profiling approaches aim to identify groups of individuals in the population that share a similar exposome that might account for differences in the specific risk under study [46]. Together with these studies, our systematic data integration approach based on LCA demonstrates the potential of investigating population heterogeneity using metabolic profiles as risk factors for predicting long-term cancer risk and mortality. However, to establish the predictive capability of these LCA metabolic profiles and implement their use in a clinical setting, further validation studies that allow sensitivity and specificity to be measured will be needed, such as a nested case-control study in AMORIS that could determine how well the metabolic profiles estimate cancer risk and mortality.

Strengths and limitations

The present study was conducted in a large and well-defined population, applying a multi-faceted approach covering the main biological pathways to assess biomarker profiles that could indicate cancer risk, cancer survival and mortality. The major strength of these analyses lies in the innovative avenue to study population heterogeneity and susceptibility to disease and mortality in a large cohort of participants with multiple measurements, all measured on fresh blood samples on the same day at the same clinical laboratory. We included all the markers available in the cohort for a large population (n > 13,000); however, not every marker of the central metabolic pathways was available in the database (e.g. complete blood count). Lifestyle factors established as cancer risk factors, such as tobacco smoking, low physical activity, poor diet, alcohol intake, obesity and hypertension, were only partially available in AMORIS, which limited their use in the study. To mitigate the lack of some of these external factors, such as BMI, the analyses were adjusted for the Charlson Comorbidity Index, which includes comorbidities such as obesity and hypertension. The lack of other lifestyle factors, such as alcohol consumption, was mitigated by using information on serum biomarkers such as gamma-glutamyl transferase and other liver enzymes. All participants were selected by analyzing blood samples from health check-ups in non-hospitalized individuals from the greater Stockholm area, ensuring good internal validity in the study. Future studies will benefit from a longitudinal approach with repeated serum marker measurements that will capture the population's phenotypic variations in relation to disease over long periods of time and will help to improve our understanding of the biomarkers’ impact on carcinogenesis and mortality.


Our findings support the recently expressed need for a shift from the classical epidemiological approach of assessing one exposure to a systemic approach with multiple exposures. The LCA adapted in this study illustrates how a biomarker-wide approach can help assess population susceptibility to disease and provide insight into disease etiology in the context of carcinogenesis and mortality (Fig. 3). Given the environmental and genetic modulation of metabolic molecules, metabolic profiling based on standard of care serum markers could become a useful non-invasive predictive signature for risk stratification and an important area of research for mechanisms and clinical relevance.


Study design and study population

The AMORIS study, a large prospective cohort study, has been described in detail elsewhere [19, 47, 48]. Briefly, the AMORIS database is based on linkages with the Central Automation Laboratory (CALAB) database, which analyzed fresh blood samples from subjects from the greater Stockholm area. All individuals were either healthy individuals referred for clinical laboratory testing as part of a general health check-up or outpatients between 1985 and 1996. The AMORIS cohort has been linked to several Swedish national registries such as the National Cancer Register, the Patient Register, the Cause of Death Register, the consecutive Swedish Censuses during 1970–1990, and the National Register of Emigration, using the Swedish 10-digit personal identity number. These linkages provide detailed information on demographics, lifestyle, socio-economic status, vital status, cancer diagnosis, comorbidities and emigration. The AMORIS study conformed to the Declaration of Helsinki and was approved by the ethics board of the Karolinska Institute.

From the AMORIS cohort, we included all individuals aged 20 years or older with measurements for the following serum biomarkers (n = 13,615), which were all measured on the same day, using fully automated methods with automatic calibration performed on fresh blood samples, at the same laboratory (CALAB) of high quality according to international blinded testing [49] (Additional file 1: Table S1 and Table S2): total cholesterol (TC) (mmol/L), triglycerides (TG) (mmol/L), apolipoprotein A-1 (ApoA-I) (g/L), apolipoprotein B (ApoB) (g/L), high density lipoprotein (HDL) (mmol/L), low density lipoprotein (LDL) (mmol/L), glucose (mmol/L), fructosamine (FAMN) (mmol/L), gamma-glutamyl transferase (GGT) (IU/L), alanine aminotransferase (ALT) (IU/L), aspartate aminotransferase (AST) (IU/L), albumin (g/L), leukocytes (WBC) (10^9 cells/L), C-reactive protein (CRP) (mg/L), iron (FE) (μmol/L), total iron binding capacity (TIBC) (mg/dL), creatinine (μmol/L), phosphate (mmol/L) and calcium (mmol/L). All methods have previously been described [48].

These biomarkers were selected to reflect common metabolic pathways: lipid (TC, TG, ApoA-I, ApoB, HDL and LDL) and glucose metabolism (Glucose, FAMN), liver function (GGT, ALT and AST), inflammation (Albumin, WBC and CRP), iron metabolism (FE and TIBC), kidney function (Creatinine) and phosphate (Phosphate and Calcium). The blood metabolites included in the analysis were all the standard serum markers available from routine health check-ups. Most of the markers included have previously been studied individually in AMORIS; however, no systemic integrative approach examining the interactions between metabolic markers and susceptibility to cancer has been conducted to date [30, 42, 50,51,52,53,54,55,56,57,58,59]. All participants were free from cancer at the time of study entry, and none were diagnosed with cancer within the first three years of follow-up, to avoid reverse causation.

The main exposure variables for the analyses were the above-mentioned metabolic biomarkers, for which the values were categorized using standardized clinical cut-offs based on recognized medical criteria to facilitate interpretation of the results (Additional file 1: Table S2). The main outcomes were first cancer diagnosis (as registered in the National Cancer Register, coded using ICD-9 for 1987–1992, ICD-O/2 for 1993–2004 and ICD-O/3 from 2005 onwards) and mortality. As secondary outcomes, we explored those cancer types for which there were more than 30 events during follow-up. Likewise, cancer mortality was explored. Follow-up time was assessed specifically for each of the outcomes studied. For cancer diagnosis, follow-up time was defined as the time from blood draw until the date of first cancer diagnosis, death, emigration or study closing date (31 December 2012), whichever occurred first. The follow-up time for death was defined as the time from blood draw until the date of death, emigration or study closing date (31 December 2012), whichever occurred first.
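The follow-up definitions above can be sketched as a small helper that returns the time at risk and an event indicator. The function name, argument names and example dates are hypothetical, and converting days to years by dividing by 365.25 is an assumption made for illustration.

```python
from datetime import date

STUDY_END = date(2012, 12, 31)  # study closing date stated in the text

def follow_up(blood_draw, event_date=None, death_date=None,
              emigration_date=None, study_end=STUDY_END):
    """Return (years of follow-up, event indicator).

    For the cancer outcome, pass the first cancer diagnosis as event_date
    and the date of death as death_date (a censoring event).
    For the mortality outcome, pass the date of death as event_date.
    """
    candidates = [d for d in (event_date, death_date, emigration_date, study_end)
                  if d is not None]
    end = min(candidates)                       # whichever occurred first
    had_event = event_date is not None and event_date == end
    years = (end - blood_draw).days / 365.25    # assumed day-to-year conversion
    return years, had_event

# Hypothetical participants (dates invented for illustration).
years_a, event_a = follow_up(date(1990, 1, 1), event_date=date(2000, 1, 1))
years_b, event_b = follow_up(date(1990, 1, 1), event_date=date(2005, 1, 1),
                             emigration_date=date(1995, 1, 1))
years_c, event_c = follow_up(date(1990, 6, 1))
print(round(years_a, 2), event_a, round(years_b, 2), event_b, event_c)
```

Participant b is censored at emigration, so the later diagnosis does not count as an event; participant c is censored at the study closing date.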

Information on the following potential confounders was also incorporated: age, sex and comorbidities. The latter was quantified using the Charlson Comorbidity Index (CCI) calculated based on data from the National Patient Register. The CCI comprises 19 disease categories, all assigned a weight. The sum of an individual’s weights was used to create the CCI ranging from no comorbidity to severe comorbidity (0, 1, 2, and ≥ 3) [60].

Data analysis

First, we calculated Pearson correlation coefficients to measure the strength of association between the biomarkers included in the analysis. Pearson’s correlation analyses showed strong correlation between the different biomarkers in the lipid metabolism (TC, LDL and ApoB (r > 0.7); HDL and ApoA-I (r > 0.8)). We replaced the individual lipid biomarkers with the established ApoB/ApoA-I ratio and log (TG/HDL) ratio [20, 49, 61, 62] to avoid collinearity and to comply with the principle of local independence required by latent class analysis [63]. Most of the markers were normally distributed, except for the liver biomarkers.
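A minimal sketch of this preprocessing step is given below: a Pearson correlation between two lipid markers and the two derived ratios. The measurements are invented for illustration, and the base-10 logarithm for log (TG/HDL) is an assumption, since the text does not state the base.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def lipid_ratios(apob, apoa1, tg, hdl):
    """Return (ApoB/ApoA-I ratio, log10(TG/HDL)); log base is an assumption."""
    return apob / apoa1, math.log10(tg / hdl)

# Hypothetical measurements in mmol/L for five individuals.
tc = [4.8, 5.6, 6.3, 7.1, 5.2]
ldl = [2.9, 3.6, 4.2, 4.9, 3.1]

r_tc_ldl = pearson_r(tc, ldl)
ratio_apo, log_tg_hdl = lipid_ratios(apob=1.0, apoa1=1.25, tg=2.0, hdl=1.0)
print(round(r_tc_ldl, 3), ratio_apo, round(log_tg_hdl, 3))
```

Replacing strongly correlated markers with a ratio reduces the number of indicators that carry the same information, which is what the local independence assumption of LCA requires.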

Latent Class Analysis (LCA) [63, 64] is a model-based clustering method that reduces the dimension of the data by clustering covariates into latent classes, using a probabilistic model that describes the data distribution, and it assesses the probability that individuals belong to certain latent classes. LCA avoids the use of a linear combination or a random distance definition to reduce the number of covariates [65] and has recently been employed in health sciences [21, 66]. More specifically, we applied LCA to characterize different classes of individuals based on their metabolic profiles [67] and to evaluate intrinsic associations between the biomarkers, using the poLCA package [68] in the R statistical programming language. We first determined the optimal number of LCA-derived classes by fitting step-wise models with different numbers of classes, starting with the null model and adding one extra class in each model until reaching the total number of biomarkers in the data, for as long as the model kept converging to a local maximum of the likelihood. The criteria used for model selection (Akaike information criterion (AIC), Bayesian information criterion (BIC) and Chi-squared distribution) were evaluated to identify the best-fitting model and to define the optimal number of LCA-derived metabolic classes that characterized our dataset. To identify which sets of biomarkers predominantly explained each latent class, how the classes were distributed across the study population and which individuals were allocated to each class, we assessed the conditional probabilities, mixing proportions and class memberships of the best-fitting latent class model.
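For readers unfamiliar with how a latent class model is estimated, the sketch below fits one to dichotomous data with a plain expectation-maximization (EM) loop. It is a simplified illustration on simulated data, not the poLCA routine used in the study; the function names, starting values and simulation settings are all assumptions.

```python
import math
import random

def fit_lca(data, n_classes, n_iter=100, seed=0):
    """Fit a latent class model to rows of 0/1 indicators with EM.
    Returns (mixing proportions, class-conditional probabilities, log-likelihood trace)."""
    rng = random.Random(seed)
    n, n_items = len(data), len(data[0])
    pi = [1.0 / n_classes] * n_classes
    # Random starting values for P(indicator j = 1 | class k).
    theta = [[rng.uniform(0.2, 0.8) for _ in range(n_items)]
             for _ in range(n_classes)]
    loglik_trace = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities for every individual.
        posteriors = []
        loglik = 0.0
        for row in data:
            joint = []
            for k in range(n_classes):
                lik = pi[k]
                for j, y in enumerate(row):
                    lik *= theta[k][j] if y else 1.0 - theta[k][j]
                joint.append(lik)
            total = sum(joint)
            loglik += math.log(total)
            posteriors.append([v / total for v in joint])
        loglik_trace.append(loglik)
        # M-step: update mixing proportions and conditional probabilities.
        for k in range(n_classes):
            weight = sum(p[k] for p in posteriors)
            pi[k] = weight / n
            for j in range(n_items):
                num = sum(p[k] * row[j] for p, row in zip(posteriors, data))
                # Keep estimates strictly inside (0, 1) for numerical safety.
                theta[k][j] = min(max(num / weight, 1e-6), 1.0 - 1e-6)
    return pi, theta, loglik_trace

def simulate(n, seed=1):
    """Simulate two hypothetical classes with four 0/1 indicators."""
    rng = random.Random(seed)
    true_theta = [[0.1, 0.1, 0.1, 0.1], [0.8, 0.8, 0.2, 0.2]]
    rows = []
    for _ in range(n):
        k = 0 if rng.random() < 0.7 else 1
        rows.append([1 if rng.random() < p else 0 for p in true_theta[k]])
    return rows

data = simulate(500)
pi_hat, theta_hat, trace = fit_lca(data, n_classes=2)
print([round(p, 2) for p in pi_hat])
```

Because EM only reaches a local maximum, fits of this kind are usually repeated from several random starting values and the solution with the highest log-likelihood is kept.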

Once each subject was assigned to an LCA-derived metabolic class, we conducted multivariable Cox proportional hazards regression to examine whether the LCA-derived metabolic classes were associated with long-term risk of overall cancer as well as specific cancer types. In addition, we evaluated how the classes were associated with all-cause death and cancer-specific death. All models were adjusted for age, sex, and CCI. We performed a sensitivity analysis using age as the time-scale, as age is potentially a strong confounder. Moreover, Schoenfeld residuals were tested to check the proportional hazards assumption of the multivariable Cox regression analysis.

Data management and statistical analyses were performed using Statistical Analysis Systems (SAS) release 4.3 (SAS Institute, Cary, NC) and R version 3.0.2 (R Foundation for Statistical Computing, Vienna, Austria).

Availability of data and materials

The authors confirm that, for ethical and legal reasons, there are restrictions on general public access to the data underlying the findings of this study. The database comprises not only the AMORIS cohort but is a merged database. This includes AMORIS plus information from the Swedish National Patient Registry, the National Cause of Death Registry, SWEDEHEART, the Work Lipids, Fibrinogen study, the Cohort of Swedish Men Study, the Swedish Mammography Cohort, the cohort of 60-year-old subjects in Stockholm, the Sollentuna Primary Prevention study and the National Prescribed Drug Register.

The merged database from these sources contains sensitive information and is therefore anonymized and located on a secure server with restricted access at the Institute of Environmental Medicine, Karolinska Institutet in Stockholm.

Professor Maria Feychting and Sofia Carlsson are both members of the Steering Committee of the AMORIS cohort and are based at the Unit of Epidemiology, Institute of Environmental Medicine, which hosts the database. They would both be able to respond to external requests for data access, provided that the interested party can obtain approval from the data owners, including the National Board of Health and Welfare in Sweden and Statistics Sweden, as well as from the owners of the research registers at Karolinska Institutet, Stockholm, Sweden.

To ensure persistent and long-term database storage and availability, the AMORIS cohort database is stored at the Institute of Environmental Medicine, and the storage follows the principles kept at Karolinska Institutet. Subject to permission and the restrictions described above, the database can be accessed remotely through a secure LAN solution.



Abbreviations

AIC: Akaike information criterion
ALT: Alanine aminotransferase
AMORIS: Apolipoprotein MOrtality RISk Study
ApoA-I: Apolipoprotein A-1
ApoB: Apolipoprotein B
AST: Aspartate aminotransferase
BIC: Bayesian information criterion
CALAB: Central Automation Laboratory
CCI: Charlson Comorbidity Index
CRP: C-reactive protein
GGT: Gamma-glutamyl transferase
HDL: High density lipoprotein
ICD-9: International Classification of Diseases 9th Revision
ICD-O/2: International Classification of Diseases for Oncology 2nd Revision
ICD-O/3: International Classification of Diseases for Oncology 3rd Revision
LCA: Latent class analysis
LDL: Low density lipoprotein
SAS: Statistical Analysis Systems
TC: Total cholesterol
TIBC: Total iron binding capacity
WHO: World Health Organization


  1. Global Burden of Disease Cancer C, Fitzmaurice C, Dicker D, Pain A, Hamavid H, Moradi-Lakeh M, et al. The global burden of cancer 2013. JAMA Oncol. 2015;1(4):505–27 PubMed PMID: 26181261. Pubmed Central PMCID: 4500822.

  2. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;100(1):57–70 PubMed PMID: 10647931.

  3. Global Burden of Disease Cancer C, Fitzmaurice C, Allen C, Barber RM, Barregard L, Bhutta ZA, et al. Global, regional, and National Cancer Incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life-years for 32 cancer groups, 1990 to 2015: a systematic analysis for the global burden of disease study. JAMA Oncol. 2017;3(4):524–48 PubMed PMID: 27918777.

  4. Blair RH, Trichler DL, Gaille DP. Mathematical and statistical modeling in cancer systems biology. Front Physiol. 2012;3:227 PubMed PMID: 22754537. Pubmed Central PMCID: 3385354.


  5. Dupont WD, Blume JD, Smith JR. Building and validating complex models of breast cancer risk. JAMA Oncol. 2016;2(10):1271–72.

  6. Poste G. Bring on the biomarkers. Nature. 2011;469(7329):156–7 PubMed PMID: 21228852. Epub 2011/01/14. eng.


  7. Zhang Y. News & views: bring on the biomarkers—It's time for the “big science” approach. Clin Chem. 2011;57(6):928–9.


  8. Brooks JD. Translational genomics: the challenge of developing cancer biomarkers. Genome Res. 2012;22(2):183–7 PubMed PMID: 22301132. Pubmed Central PMCID: 3266026.


  9. Beltran H, Rubin MA. New strategies in prostate cancer: translating genomics into the clinic. Clin Cancer Res. 2013;19(3):517–23 PubMed PMID: 23248095. Epub 2012/12/19. eng.


  10. Bünger S, Haug U, Kelly M, Posorski N, Klempt-Giessing K, Cartwright A, et al. A novel multiplex-protein array for serum diagnostics of colon cancer: a case–control study. BMC Cancer. 2012;12:393 PubMed PMID: PMC3502594.


  11. Wild N, Andres H, Rollinger W, Krause F, Dilba P, Tacke M, et al. A combination of serum markers for the early detection of colorectal cancer. Clin Cancer Res. 2010;16(24):6111–21 PubMed PMID: 20798228. Epub 2010/08/28. eng.


  12. Noto D, Cefalu AB, Barbagallo CM, Ganci A, Cavera G, Fayer F, et al. Baseline metabolic disturbances and the twenty-five years risk of incident cancer in a Mediterranean population. Nutr Metab Cardiovasc Dis. 2016;12 PubMed PMID: 27511705. Epub 2016/08/12. Eng.

  13. Shah SH, Sun JL, Stevens RD, Bain JR, Muehlbauer MJ, Pieper KS, et al. Baseline metabolomic profiles predict cardiovascular events in patients at risk for coronary artery disease. Am Heart J. 2012;163(5):844–50 e1. PubMed PMID: 22607863. Epub 2012/05/23. Eng.


  14. Wild CP. The exposome: from concept to utility. Int J Epidemiol. 2012;41(1):24–32 PubMed PMID: 22296988. Epub 2012/02/03. Eng.


  15. Wild CP, Scalbert A, Herceg Z. Measuring the exposome: a powerful basis for evaluating environmental exposures and cancer risk. Environ Mol Mutagen. 2013;54(7):480–99 PubMed PMID: 23681765. Epub 2013/05/18. eng.


  16. Nicholson G, Rantalainen M, Maher AD, Li JV, Malmodin D, Ahmadi KR, et al. Human metabolic profiles are stably controlled by genetic and environmental variation. Mol Syst Biol. 2011;7:525 PubMed PMID: 21878913. Pubmed Central PMCID: PMC3202796. Epub 2011/09/01. Eng.


  17. Cui Y, Balshaw DM, Kwok RK, Thompson CL, Collman GW, Birnbaum LS. The Exposome: embracing the complexity for discovery in environmental health. Environ Health Perspect. 2016;124(8):A137–40 PubMed PMID: 27479988. Pubmed Central PMCID: PMC4977033. Epub 2016/08/02. eng.


  18. Czene K, Lichtenstein P, Hemminki K. Environmental and heritable causes of cancer among 9.6 million individuals in the Swedish family-cancer database. Int J Cancer. 2002;99(2):260–6 PubMed PMID: 11979442. Epub 2002/04/30. Eng.


  19. Walldius G, Jungner I, Kolar W, Holme I, Steiner E. High cholesterol and triglyceride values in Swedish males and females: increased risk of fatal myocardial infarction. First report from the AMORIS (apolipoprotein related MOrtality RISk) study. Blood Press Suppl. 1992;4:35–42 PubMed PMID: 1345333. Epub 1992/01/01. eng.


  20. Walldius G, Jungner I, Holme I, Aastveit AH, Kolar W, Steiner E. High apolipoprotein B, low apolipoprotein A-I, and improvement in the prediction of fatal myocardial infarction (AMORIS study): a prospective study. Lancet. 2001;358(9298):2026–33 PubMed PMID: 11755609. Epub 2002/01/05. eng.


  21. Lacey RJ, Strauss VY, Rathod T, et al. Clustering of pain and its associations with health in people aged 50 years and older: cross-sectional results from the North Staffordshire Osteoarthritis Project. BMJ Open. 2015;5:e008389. doi:

  22. Tolonen H, Keil U, Ferrario M, Evans A, Project WM. Prevalence, awareness and treatment of hypercholesterolaemia in 32 populations: results from the WHO MONICA Project. Int J Epidemiol. 2005;34(1):181–92 PubMed PMID: 15333620.


  23. Reiner Ž, Catapano AL, De Backer G, Graham I, Taskinen M-R, Wiklund O, et al. ESC/EAS guidelines for the management of dyslipidaemias. The task force for the management of dyslipidaemias of the European Society of Cardiology (ESC) and the European atherosclerosis society (EAS). Eur Heart J. 2016;37(39):2999–3058.

  24. Ioannou GN, Boyko EJ, Lee SP. The prevalence and predictors of elevated serum aminotransferase activity in the United States in 1999-2002. Am J Gastroenterol. 2006;101(1):76–82 PubMed PMID: 16405537. Epub 2006/01/13. eng.


  25. Mason JE, Starke RD, Van Kirk JE. Gamma-glutamyl transferase: a novel cardiovascular risk biomarker. Prev Cardiol. 2010;13(1):36–41 PubMed PMID: 20021625. Epub 2009/12/22. eng.


  26. Teppala S, Shankar A, Li J, Wong TY, Ducatman A. Association between serum gamma-glutamyltransferase and chronic kidney disease among US adults. Kidney Blood Press Res. 2010;33(1):1–6 PubMed PMID: 20090360. Epub 2010/01/22. eng.


  27. Lim JS, Yang JH, Chun BY, Kam S, Jacobs DR Jr, Lee DH. Is serum gamma-glutamyltransferase inversely associated with serum antioxidants as a marker of oxidative stress? Free Radic Biol Med. 2004;37(7):1018–23 PubMed PMID: 15336318. Epub 2004/09/01. eng.


  28. Wessling-Resnick M. Iron homeostasis and the inflammatory response. Annu Rev Nutr. 2010;30:105–22 PubMed PMID: PMC3108097.


  29. Rappaport SM, Barupal DK, Wishart D, Vineis P, Scalbert A. The blood exposome and its role in discovering causes of disease. Environ Health Perspect. 2014;122(8):769–74 PubMed PMID: 24659601. Pubmed Central PMCID: 4123034.


  30. Van Hemelrijck M, Jassem W, Walldius G, Fentiman IS, Hammar N, Lambe M, et al. Gamma-glutamyltransferase and risk of cancer in a cohort of 545,460 persons - the Swedish AMORIS study. Eur J Cancer. 2011;47(13):2033–41 PubMed PMID: 21486691. Epub 2011/04/14. eng.


  31. Strasak AM, Rapp K, Brant LJ, Hilbe W, Gregory M, Oberaigner W, et al. Association of gamma-glutamyltransferase and risk of cancer incidence in men: a prospective study. Cancer Res. 2008;68(10):3970–7 PubMed PMID: 18483283. Epub 2008/05/17. eng.


  32. Strasak AM, Pfeiffer RM, Klenk J, Hilbe W, Oberaigner W, Gregory M, et al. Prospective study of the association of gamma-glutamyltransferase with cancer incidence in women. Int J Cancer. 2008;123(8):1902–6 PubMed PMID: 18688855. Epub 2008/08/09. eng.


  33. Ruhl CE, Everhart JE. Elevated serum alanine aminotransferase and gamma-glutamyltransferase and mortality in the United States population. Gastroenterology. 2009;136(2):477–85 e11. PubMed PMID: 19100265. Epub 2008/12/23. eng.


  34. Koehler EM, Sanna D, Hansen BE, van Rooij FJ, Heeringa J, Hofman A, et al. Serum liver enzymes are associated with all-cause mortality in an elderly population. Liver Int. 2014;34(2):296–304 PubMed PMID: 24219360. Epub 2013/11/14. eng.


  35. Kunutsor SK, Apekey TA, Seddoh D, Walley J. Liver enzymes and risk of all-cause mortality in general populations: a systematic review and meta-analysis. Int J Epidemiol. 2014;43(1):187–201.


  36. Rose G, Shipley MJ. Plasma lipids and mortality: a source of error. Lancet. 1980;1(8167):523–6 PubMed PMID: 6102243. Epub 1980/03/08. eng.


  37. Schupf N, Costa R, Luchsinger J, Tang MX, Lee JH, Mayeux R. Relationship between plasma lipids and all-cause mortality in nondemented elderly. J Am Geriatr Soc. 2005;53(2):219–26 PubMed PMID: 15673344. Epub 2005/01/28. eng.


  38. Akerblom JL, Costa R, Luchsinger JA, Manly JJ, Tang M-X, Lee JH, et al. Relation of plasma lipids to all-cause mortality in Caucasian, African-American and Hispanic elders. Age Ageing. 2008;37(2):207–13 PubMed PMID: PMC2715146.


  39. Neaton JD, Blackburn H, Jacobs D, et al. Serum cholesterol level and mortality findings for men screened in the multiple risk factor intervention trial. Arch Intern Med. 1992;152(7):1490–500.


  40. Kagan A, McGee DL, Yano K, Rhoads GG, Nomura A. Serum cholesterol and mortality in a Japanese-American population: the Honolulu heart program. Am J Epidemiol. 1981;114(1):11–20 PubMed PMID: 7246518. Epub 1981/07/01. eng.


  41. Radišauskas R, Kuzmickienė I, Milinavičienė E, Everatt R. Hypertension, serum lipids and cancer risk: a review of epidemiological evidence. Medicina (Mex). 2016;52(2):89–98.


  42. Gaur A, Collins H, Wulaningsih W, Holmberg L, Garmo H, Hammar N, et al. Iron metabolism and risk of cancer in the Swedish AMORIS study. Cancer Causes Control. 2013;24(7):1393–402 PubMed PMID: 23649231. Pubmed Central PMCID: PMC3675271. Epub 2013/05/08. eng.


  43. Beguin Y, Aapro M, Ludwig H, Mizzen L, Osterborg A. Epidemiological and nonclinical studies investigating effects of iron in carcinogenesis--a critical review. Crit Rev Oncol Hematol. 2014;89(1):1–15 PubMed PMID: 24275533.


  44. Wang M, Spiegelman D, Kuchiba A, Lochhead P, Kim S, Chan AT, et al. Statistical methods for studying disease subtype heterogeneity. Stat Med. 2016;35(5):782–800 PubMed PMID: 26619806. Pubmed Central PMCID: 4728021. Epub 2015/12/02. eng.


  45. Chajès V, Jenab M, Romieu I, Ferrari P, Dahm CC, Overvad K, et al. Plasma phospholipid fatty acid concentrations and risk of gastric adenocarcinomas in the European prospective investigation into cancer and nutrition (EPIC-EURGAST). Am J Clin Nutr. 2011;94(5):1304–13.


  46. Vrijheid M, Slama R, Robinson O, Chatzi L, Coen M, van den Hazel P, et al. The human early-life exposome (HELIX): project rationale and design. Environ Health Perspect. 2014;122(6):535–44 PubMed PMID: 24610234. Pubmed Central PMCID: PMC4048258. Epub 2014/03/13. eng.


  47. Van Hemelrijck M, Harari D, Garmo H, Hammar N, Walldius G, Lambe M, et al. Biomarker-based score to predict mortality in persons aged 50 years and older: a new approach in the Swedish AMORIS study. Int J Mol Epidemiol Genet. 2012;3(1):66–76 PubMed PMID: 22493753. Pubmed Central PMCID: 3316450. Epub 2012/04/12. eng.


  48. Walldius G, Malmstrom H, Jungner I, de Faire U, Lambe M, Van Hemelrijck M, et al. The AMORIS cohort. Int J Epidemiol. 2017;02 PubMed PMID: 28158674. Epub 2017/02/06. eng.

  49. Jungner I, Marcovina SM, Walldius G, Holme I, Kolar W, Steiner E. Apolipoprotein B and A-I values in 147576 Swedish males and females, standardized according to the World Health Organization-International Federation of Clinical Chemistry First International Reference Materials. Clin Chem. 1998;44(8 Pt 1):1641–9 PubMed PMID: 9702950. Epub 1998/08/14. eng.


  50. Van Hemelrijck M, Walldius G, Jungner I, Hammar N, Garmo H, Binda E, et al. Low levels of apolipoprotein A-I and HDL are associated with risk of prostate cancer in the Swedish AMORIS study. Cancer Causes Contr. 2011;22(7):1011–9 PubMed PMID: 21562751. Epub 2011/05/13. eng.


  51. Van Hemelrijck M, Garmo H, Holmberg L, Walldius G, Jungner I, Hammar N, et al. Prostate cancer risk in the Swedish AMORIS study: the interplay among triglycerides, total cholesterol, and glucose. Cancer. 2011;117(10):2086–95 PubMed PMID: 21523720. Epub 2011/04/28. eng.


  52. Van Hemelrijck M, Holmberg L, Garmo H, Hammar N, Walldius G, Binda E, et al. Association between levels of C-reactive protein and leukocytes and cancer: three repeated measurements in the Swedish AMORIS study. Cancer Epidemiol Biomark Prev. 2011;20(3):428–37 PubMed PMID: 21297038. Pubmed Central PMCID: PMC3078551. Epub 2011/02/08. eng.


  53. Melvin JC, Seth D, Holmberg L, Garmo H, Hammar N, Jungner I, et al. Lipid profiles and risk of breast and ovarian cancer in the Swedish AMORIS study. Cancer Epidemiol Biomark Prev. 2012;21(8):1381–4 PubMed PMID: 22593241. Epub 2012/05/18. eng.


  54. Van Hemelrijck M, Garmo H, Hammar N, Jungner I, Walldius G, Lambe M, et al. The interplay between lipid profiles, glucose, BMI and risk of kidney cancer in the Swedish AMORIS study. Int J Cancer. 2012;130(9):2118–28 PubMed PMID: 21630265. Epub 2011/06/02. eng.


  55. Wulaningsih W, Garmo H, Holmberg L, Hammar N, Jungner I, Walldius G, et al. Serum lipids and the risk of gastrointestinal malignancies in the Swedish AMORIS study. J Cancer Epidemiol. 2012;2012:792034 PubMed PMID: 22969802. Pubmed Central PMCID: 3437288.


  56. Van Hemelrijck M, Hermans R, Michaelsson K, Melvin J, Garmo H, Hammar N, et al. Serum calcium and incident and fatal prostate cancer in the Swedish AMORIS study. Cancer Causes Contr. 2012;23(8):1349–58 PubMed PMID: 22710746. Epub 2012/06/20. eng.


  57. Wulaningsih W, Michaelsson K, Garmo H, Hammar N, Jungner I, Walldius G, et al. Inorganic phosphate and the risk of cancer in the Swedish AMORIS study. BMC Cancer. 2013;13:257 PubMed PMID: 23706176. Pubmed Central PMCID: PMC3664604. Epub 2013/05/28. eng.


  58. Wulaningsih W, Michaelsson K, Garmo H, Hammar N, Jungner I, Walldius G, et al. Serum calcium and risk of gastrointestinal cancer in the Swedish AMORIS study. BMC Public Health. 2013;13(1):663 PubMed PMID: 23866097. Pubmed Central PMCID: 3729677. Epub 2013/07/20. Eng.


  59. Wulaningsih W, Holmberg L, Garmo H, Zethelius B, Wigertz A, Carroll P, et al. Serum glucose and fructosamine in relation to risk of cancer. PLoS One. 2013;8(1):e54944 PubMed PMID: 23372798. Pubmed Central PMCID: PMC3556075. Epub 2013/02/02. eng.


  60. Quan H, Li B, Couris CM, Fushimi K, Graham P, Hider P, et al. Updating and validating the Charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. Am J Epidemiol. 2011;173(6):676–82 PubMed PMID: 21330339.


  61. Kuyl JM, Mendelsohn D. Observed relationship between ratios HDL-cholesterol/total cholesterol and apolipoprotein A1/apolipoprotein B. Clin Biochem. 1992;25(5):313–6 PubMed PMID: 1490290. Epub 1992/10/01. eng.


  62. Dobiásová M, Frohlich J. The plasma parameter log (TG/HDL-C) as an atherogenic index: correlation with lipoprotein particle size and esterification rate in apoB-lipoprotein-depleted plasma (FERHDL). Clin Biochem. 2001;34(7):583–8.


  63. Magidson J, Vermunt JK. “Latent class models”. The Sage handbook of quantitative methodology for the social sciences. 2004:175–98.

  64. Wood PK, Hagenaars JA, McCutcheon AL. Applied latent class analysis, Kluwer, Dordrecht, 2002, pp. 476. J Classif. 2008;25(1):143–5 English.


  65. Chadeau-Hyam M, Campanella G, Jombart T, Bottolo L, Portengen L, Vineis P, et al. Deciphering the complex: methodological overview of statistical models to derive OMICS-based biomarkers. Environ Mol Mutagen. 2013;54(7):542–57.


  66. Kongsted A, Nielsen AM. Latent class analysis in health research. J Phys. 2016;12 PubMed PMID: 27914733. Epub 2016/12/05. eng.

  67. Haughton D, Legrand P, Woolford S. Review of three latent class cluster analysis packages: latent gold, poLCA, and MCLUST. Am Stat. 2009;63(1).

  68. Lewis JB, Linzer DA. poLCA: An R Package for Polytomous Variable Latent Class Analysis. J Stat Softw. 2011;42.



The authors are grateful to all sample and data donors who participated in the AMORIS study.


This work was supported by King’s College London (salaries for AS, HG, AG, MVH), Karolinska Institutet (IJ, GW, ML, LH), a Cancer Research UK grant (C45074/A26553) (PI: MVH) and the Gunner and Ingmar Jungner Foundation for Laboratory Medicine, which provides donations to fund the AMORIS database.

Author information




AS designed the study, analysed the data and wrote the primary manuscript. AG was responsible for designing and conceptualising the study, reviewing the manuscript and supervising the project. MVH conceptualised and designed the study, oversaw the study and was a major contributor in writing the manuscript. SG contributed to revising and reviewing the manuscript and was responsible for submission. HG provided analysis and statistical input and reviewed the manuscript. ML, LH, IJ, GW and NH contributed to data acquisition, quality control of data, study design and data interpretation, and reviewed the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sundeep Ghuman.

Ethics declarations

Ethics approval and consent to participate

The study was approved by the ethics board of Karolinska Institutet, which waived the need for consent, and conformed to the Declaration of Helsinki.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1:

Table S1. Fully automated laboratory methods with automatic calibration were performed at one accredited laboratory (CALAB) to measure the serum biomarkers examined in the study. Table S2. Panel of serum markers with standard medical cut-off information. Table S3. Characteristics of the study population by LCA-derived metabolic classes. (DOCX 28 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Santaolalla, A., Garmo, H., Grigoriadis, A. et al. Metabolic profiles to predict long-term cancer and mortality: the use of latent class analysis. BMC Mol and Cell Biol 20, 28 (2019).
