Development of the nuclear translocation assay
We have developed a high-throughput assay that systematically identifies a protein's potential for nuclear translocation from the level of luciferase reporter activity (Figure 1). The system is composed of three constructs. The first construct, ACT, encodes a transactivation domain (TA) fused to an interaction domain A and to the coding sequence (CDS) that we test for its ability to translocate to the nucleus. The second construct, BIND, encodes a GAL4 DNA-binding domain fused to an interaction domain B. The fusion proteins encoded by the ACT and BIND constructs can interact with each other via the selected interacting domains A and B. The third construct, a pG5luc vector containing five GAL4 DNA-binding sites upstream of a minimal TATA box that drives expression of the luciferase (luc+) gene, acts as the reporter for the interaction between the ACT and BIND fusion proteins. The GAL4 DNA-binding domain sequence used in the BIND construct contains an NLS that is sufficient for GAL4 nuclear localization [30–32]. Therefore, the fusion proteins generated by the BIND construct are constitutively able to enter the nucleus. We designed the system so that translocation of the fusion protein encoded by the ACT construct depends on the presence of an NLS in the target CDS: the interacting domain A and the transactivation domain TA were engineered so that they can activate expression of the luciferase reporter gene but do not themselves possess any localization signals. Therefore, the domain A::TA::CDS fusion protein is able to enter the nucleus only if the target CDS contains one or several NLSs. Once in the nucleus, it interacts with the BIND fusion protein via the interacting-partner pair and reconstitutes an active GAL4 transcription factor that induces expression of the luciferase reporter gene (Figure 1A). Conversely, the luciferase reporter gene is not induced if the CDS lacks motifs encoding NLSs (Figure 1B).
Optimization of interacting partners in ACT and BIND constructs
A key feature of the system is the interaction of the ACT and BIND fusion proteins in the nucleus via domains A and B. This interacting pair, A and B, must satisfy the following criteria: 1) their interaction is well-characterized, 2) both domains are as small as possible so as not to be a limiting factor for the generation of fusion protein constructs containing large investigated CDSs, 3) the interaction is easily detected by the luciferase reporter expression, yet its affinity is weak enough that the ACT fusion proteins are seldom transported into the nucleus by associating with the BIND protein, 4) domain A does not possess any NLSs, and 5) domain B does not possess transactivation activity.
To satisfy criteria 1 and 2, we selected TIP-1 and rhotekin as domains A and B; their reported interaction is mediated by two small regions, the PDZ domain of TIP-1 and the C-terminal sequence of rhotekin [33]. Further, the affinity between the PDZ domain and its binding peptide has been reported to be relatively weak (KD around 10⁻⁷ M) [34]. We independently confirmed this interaction with the mammalian two-hybrid system from which the method reported herein is derived [35]. After confirming that GFP-TIP-1 expressed in mammalian cells does not localize to the nucleus (data not shown), we decided to further tailor rhotekin. Using the mammalian two-hybrid system, we tested a series of GAL4 DNA-binding domain::rhotekin fusions carrying progressive N-terminal deletions of rhotekin (Rhot443aa, Rhot257aa, Rhot111aa, and Rhot20aa); each was co-transfected with a VP16 transactivation domain::TIP-1 fusion and the luciferase reporter plasmid into CHO-K1 cells. GAL4-Rhot20aa (retaining only the last 20 amino acids) was the optimal choice because it maximized the signal resulting from the interaction with TIP-1 while minimizing the background signal (luciferase detected in the absence of an interacting partner; data not shown).
Selection of the transactivation protein
We selected a transactivation domain (TA) to fuse to the TIP-1 PDZ domain that would 1) result in a small fusion protein and not interfere with the translocation potential of the added CDS, 2) possess strong transactivation activity that induces expression of the luciferase reporter, and 3) not induce translocation to the nucleus except when fused with a tested CDS possessing an NLS. We turned to our previous protein-protein interaction work, in which we had systematically screened for protein self-activity, that is, the ability of a protein fused to the GAL4 DNA-binding domain to interact with the transcriptional machinery and induce expression of the reporter gene in the mammalian two-hybrid system [29]. TNNC2 (troponin C type 2) emerged as the optimal choice because it fulfilled all of our requirements (data not shown).
BIND construct and high-throughput ACT construct preparation
Each ACT construct bearing a CDS of interest was created by a two-step PCR. The CDS of each target gene was amplified with specific forward and reverse primers (Figure 2A) that add two common sequences, Tag1 and Tag2, at the 5' and 3' termini, respectively (red and green boxes in the first PCR products in Figure 2B). We also generated two common stocks of PCR-amplified flanking fragments: the first containing CMV-TIP-1-TNNC2 and the second containing an SV40 polyadenylation site (Figure 2B). Both flanking fragments were purified prior to use. Next, the first-step PCR products were directly subjected to an overlapping PCR in which the two common tag-derived sequences served as margins to connect the CMV-TIP-1-TNNC2, target gene, and SV40 fragments (Figure 2B). The two-step PCR was performed without any intermediate purification of the first-step products, which further increases throughput when preparing large collections. The length of the PCR products was confirmed by 1% agarose gel electrophoresis (see Additional file 1). Using this approach, we successfully amplified ACT constructs of up to 4.0 kb.
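The joining logic of the overlapping PCR can be illustrated with a minimal string-level sketch. It assumes that the upstream flanking fragment ends with Tag1 and the downstream flanking fragment begins with Tag2, so that each junction shares one tag; the function names and all sequences below are hypothetical placeholders, not the actual tags or fragments used in the assay.

```python
def join_on_tag(upstream: str, downstream: str, tag: str) -> str:
    """Join two fragments that share `tag` at their junction, keeping one copy of the tag."""
    if not upstream.endswith(tag):
        raise ValueError("upstream fragment does not end with the shared tag")
    if not downstream.startswith(tag):
        raise ValueError("downstream fragment does not start with the shared tag")
    return upstream + downstream[len(tag):]


def assemble_act(flank_5p: str, cds_product: str, flank_3p: str,
                 tag1: str, tag2: str) -> str:
    """Connect the 5' flank, the tagged CDS product, and the 3' flank via Tag1 and Tag2."""
    left = join_on_tag(flank_5p, cds_product, tag1)
    return join_on_tag(left, flank_3p, tag2)


# Placeholder sequences (illustrative only, not the real Tag1/Tag2 or fragments).
TAG1 = "ACGTAC"
TAG2 = "TTGGCC"
flank_5p = "GGGAAA" + TAG1            # stands in for CMV-TIP-1-TNNC2
cds_product = TAG1 + "ATGCCC" + TAG2  # stands in for the first-step PCR product
flank_3p = TAG2 + "AATAAA"            # stands in for the SV40 fragment

act = assemble_act(flank_5p, cds_product, flank_3p, TAG1, TAG2)
print(act, len(act))
```

Because only the two tags are construct-independent, the same pair of flanking fragments can be reused for every target CDS, which is what makes the preparation amenable to high throughput.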
To generate BIND constructs, we employed a similar strategy: the DNA fragments for CMV-Gal4 and SV40 were amplified from the pBIND vector, purified, and used in an overlapping PCR to connect the CMV-Gal4, Rhot20aa, and SV40 fragments (Figure 2C).
Selection of cells and conditions for the assay
To test whether a CDS of interest can translocate to the nucleus, we rely on detecting the interaction between TIP-1 (fused to the queried CDS) and rhotekin, both of which need only be expressed transiently. Thus, the assay only requires the transfection of PCR products, a process that is easily automated and systematized. As a proof of concept, we tested the system using MT1M, a metallothionein protein annotated to predominantly localize in the nucleus, and SNX3, a member of the sorting nexin family involved in cytoplasmic trafficking of proteins. The ACT, BIND, and luciferase reporter constructs were transfected into the CHO-K1 cell line using lipofection. As expected, the MT1M-containing ACT construct induced high reporter activity, whereas induction of the luciferase reporter gene was marginal for the ACT construct containing the SNX3 CDS (Figure 3A).
Next, we explored whether the cell line in which we performed the assay influenced the results. The ACT constructs for MT1M and SNX3, together with the BIND and luciferase reporter constructs, were transfected into equal numbers of CHO-K1 and HeLa cells. MT1M showed higher luciferase activity than SNX3 in both cell lines, although CHO-K1 cells showed higher luciferase counts than HeLa cells (Figure 3B). Thus, the use of a non-human mammalian cell line (CHO-K1) did not seem to impair the in vivo assay, and we decided to use CHO-K1 cells for further analysis.
Large proteins generally translocate to the nucleus more slowly than smaller ones. We therefore evaluated the adequacy of incubating for 20 hours post-transfection before lysis of cells in the luciferase reporter assay (see Additional file 2). We selected three coding sequences representative of a wide range of protein sizes: CRIP1 (77 aa), NANOG (305 aa), and ARNT2 (717 aa), and estimated their translocation after incubation for 20, 30, and 40 hours. We did not observe any significant differences in the read-out intensities or ratios for any of the three sampled coding sequences, suggesting that 20 hours of incubation is sufficient for obtaining a robust luciferase reporter gene activation even for large coding sequences.
Next, we investigated whether the presence of a strong nuclear export signal (NES) affected the assay read-out (see Additional file 3). We made artificial constructs in which we fused the NES of the protein kinase inhibitor α (PKIA) to the carboxy terminus of two coding sequences that are able to translocate to the nucleus according to our luciferase reporter assay: NANOG and ELK1 (Figure 4 and Additional file 4). We then measured the nuclear translocation of each of these two nuclear proteins and compared it with that of its PKIA NES fusion counterpart. The addition of the strong PKIA NES did not affect the nuclear translocation of NANOG. In contrast, the addition of the PKIA NES to the carboxy terminus of ELK1 resulted in a drastic decrease in the luciferase ratio compared to that obtained with the native ELK1 ACT construct. Analysis of the sub-cellular localization of the GFP fusion versions of those constructs corroborated the results of our luciferase-based reporter assay. Together, these results show that our assay, like GFP-fusion based assays, may be affected by the balance between the nuclear localization signal and the nuclear export signal of any given sequence.
Small-scale validation of the assay
To test the ability of the assay to detect the translocation of proteins into the nucleus, we analyzed two sets of genes whose sub-cellular localization is reported in HPRD [36]. The first set was composed of 12 genes annotated as nuclear proteins (ALX4, IRF3, NANOG, MSX1, ELK1, NEUROD6, TLX2, DLX6, PAPOLG, ARNT2, ANKRD2, and HNRPA1) and the second set was composed of 10 genes annotated as cytoplasmic proteins (ASMT, FAH, FARSLA, ODF2L, PRKAR1A, NRGN, CRIP1, CDKN2B, CLIC5, and LGALS4). For each gene in those two sets, we performed the nuclear translocation assay in triplicate and conducted sub-cellular localization experiments by generating GFP-fused proteins. The gene-specific primers used to generate the 22 GFP constructs for the sub-cellular localization experiments were similar to those used to fuse the first PCR products of our luciferase reporter system. We then compared the results obtained from our luciferase reporter assay with our GFP sub-cellular localization experiments, HPRD annotations, and sequence-based in silico predictions of sub-cellular localization (PSORT II [37]) (Figure 4 and Additional file 4).
Based on empirical results, we considered an average ratio of 5-fold or greater between the luciferase signal obtained with the co-transfected BIND construct and the signal obtained without it to indicate confident nuclear translocation potential. When GFP fusions were transiently expressed in CHO-K1 cells, 8 of the 22 genes were observed exclusively in the cytoplasm, 5 exclusively in the nucleus, and 9 diffusely in both the cytoplasm and the nucleus. Our mammalian two-hybrid derived assay was designed to detect the nuclear translocation potential of a CDS; therefore, we considered a GFP sub-cellular localization result showing diffuse localization of the fusion protein in both the cytoplasm and the nucleus to represent a true positive. Although its luciferase ratio was 5.30 (± 1.08), the LGALS4-GFP fusion localized exclusively in the cytoplasm. Therefore, compared to the GFP sub-cellular localization assay, the false-positive rate was 7% (1 of the 13 luciferase-positive genes). Conversely, while DLX6 and TLX2 appeared to be located in the nucleus when fused to GFP, the luciferase ratios of those two genes were only 3.77 (± 0.42) and 4.4 (± 0.53), respectively. We therefore conclude that, compared with the GFP sub-cellular localization assay, our system performed with a false-negative rate of 22% (2 of the 9 luciferase-negative genes).
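The thresholding and error-rate bookkeeping described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the average ratio is taken here as the mean of paired per-replicate ratios (the text does not specify how the average was formed), the cutoff is treated as inclusive, and the gene labels `true_pos_*` and `true_neg_*` are placeholders standing in for the concordant genes. Rates are computed with the denominators used in the text, that is, false positives over luciferase-positive calls and false negatives over luciferase-negative calls.

```python
from statistics import mean

THRESHOLD = 5.0  # fold cutoff stated in the text


def mean_fold_ratio(with_bind, without_bind):
    """Mean of per-replicate ratios (signal with BIND / signal without BIND).

    Assumption: replicates are paired one-to-one.
    """
    if len(with_bind) != len(without_bind):
        raise ValueError("replicates must be paired")
    return mean([w / wo for w, wo in zip(with_bind, without_bind)])


def translocates(ratio, threshold=THRESHOLD):
    """Call a CDS positive when its ratio reaches the cutoff (inclusive by assumption)."""
    return ratio >= threshold


def error_rates(assay_calls, reference_calls):
    """Return (false-positive rate, false-negative rate) relative to a reference.

    Denominators follow the text: assay-positive and assay-negative counts.
    """
    positives = [g for g, call in assay_calls.items() if call]
    negatives = [g for g, call in assay_calls.items() if not call]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative call")
    false_pos = sum(1 for g in positives if not reference_calls[g])
    false_neg = sum(1 for g in negatives if reference_calls[g])
    return false_pos / len(positives), false_neg / len(negatives)


# Reconstruct the counts reported in the text (placeholder names for concordant genes).
assay = {f"true_pos_{i}": True for i in range(12)}
assay.update({f"true_neg_{i}": False for i in range(7)})
assay.update({"LGALS4": translocates(5.30),
              "DLX6": translocates(3.77),
              "TLX2": translocates(4.4)})

gfp = {gene: gene.startswith("true_pos") for gene in assay}
gfp.update({"LGALS4": False, "DLX6": True, "TLX2": True})

fp_rate, fn_rate = error_rates(assay, gfp)
print(f"false-positive rate: {fp_rate:.3f}, false-negative rate: {fn_rate:.3f}")
```

With these counts the two rates evaluate to 1/13 and 2/9, matching the fractions quoted in the text.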
We also used the program PSORT II to predict the sub-cellular localization of those 22 genes and compared the most probable localization reported by the program to the results of our assay. Again, although our assay indicated that DLX6 and TLX2 are unable to translocate to the nucleus, PSORT II predicts them to be nuclear proteins, yielding a 22% (2/9) false-negative rate when compared with computational predictions. Four proteins with luciferase ratios ranging from 5.2 (± 0.79) to 7.9 (± 2.27) were predicted by PSORT II to be cytoplasmic proteins, which results in a false-positive rate of 30% (4/13).
Sub-cellular localization annotations reported in HPRD agreed poorly with our assay. Under the 5-fold luciferase signal threshold that we used to define proteins as able or unable to translocate to the nucleus, comparison of the reporter-based system with the HPRD annotations showed a 46% (6/13) false-positive rate and a 55% (5/9) false-negative rate. It is important to note that this high false-positive rate was in large part due to proteins for which our assay gave results very close to the 5-fold threshold; 4 of the 6 false-positive results arose from luciferase ratios in the 5.13 (± 1.54) to 5.33 (± 0.8) range. Thus, under a stricter cutoff for considering a protein able to translocate into the nucleus, comparison of our assay to HPRD annotations would result in a more reasonable 14% false-positive rate. Additionally, our observation of CRIP1 nuclear localization in the GFP-fusion and luciferase reporter-based assays, as well as in the PSORT II prediction, contrasts with the lack of nuclear annotation for CRIP1 in HPRD. Similarly, the relatively high false-negative rate is counter-balanced by the observations that 1) both ANKRD2 and IRF3 were also consistently predicted by our luciferase assay, our GFP fusion assays, and PSORT II as not localized in the nucleus and 2) TLX2 and DLX6 were also mis-characterized by our assay when compared to our own GFP-fusion assay.
Finally, for each of the 22 GFP fusions, we conducted a quantitative analysis of the GFP signal located over the nucleus versus that distributed in the cytoplasm. For 5 to 7 single-cell images per construct, the DAPI and GFP signals were used to locate, respectively, the nucleus boundary and the extent of the cytoplasmic compartment. The average intensity of GFP within the nucleus boundary was then computed and compared to that of the cytoplasm. A good correlation between those GFP signal intensity ratios and luciferase activities was observed, providing yet another line of evidence that the luciferase activity measured in our assay accurately reflects the nuclear translocation potential of a particular coding sequence (Figure 4 and Additional file 5).
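The intensity comparison can be sketched for a single cell as a ratio of mean GFP intensities inside and outside the nucleus. The sketch assumes that binary masks for the nucleus (from DAPI) and for the whole cell (from GFP) have already been obtained; how the boundaries were drawn is not described here, so segmentation is left out. The function name and the toy image are hypothetical.

```python
from statistics import mean


def nuclear_to_cytoplasmic_ratio(gfp, nucleus_mask, cell_mask):
    """Mean GFP intensity inside the nucleus divided by mean intensity in the cytoplasm.

    gfp, nucleus_mask, cell_mask: equally sized 2D lists (rows of pixels).
    Cytoplasm is taken as pixels inside the cell mask but outside the nucleus mask.
    """
    nuclear, cytoplasmic = [], []
    for gfp_row, nuc_row, cell_row in zip(gfp, nucleus_mask, cell_mask):
        for intensity, in_nucleus, in_cell in zip(gfp_row, nuc_row, cell_row):
            if in_nucleus:
                nuclear.append(intensity)
            elif in_cell:
                cytoplasmic.append(intensity)
    if not nuclear or not cytoplasmic:
        raise ValueError("both compartments must contain at least one pixel")
    return mean(nuclear) / mean(cytoplasmic)


# Toy 3x4 single-cell image: two bright nuclear pixels on a dim cytoplasm.
gfp_image = [
    [10, 10, 10, 10],
    [10, 40, 40, 10],
    [10, 10, 10, 10],
]
nucleus = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
cell = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]

ratio = nuclear_to_cytoplasmic_ratio(gfp_image, nucleus, cell)
print(ratio)
```

Per-construct values would then be averaged over the 5 to 7 cells imaged before being compared with the luciferase ratios.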
To test the capacity of our method to detect the translocation potential of proteins that are located in the cytoplasm at steady state but are known to shuttle between the nucleus and the cytoplasm, we selected three known cases and assayed their nuclear translocation: GTSE-1 [38], dishevelled/DVL2 [39], and survivin/BIRC5 [40] (see Additional file 6). The assay detected the nuclear translocation potential of GTSE-1 and dishevelled, with average luciferase ratios of 9.98 and 9.88, respectively. On the other hand, the average luciferase ratio obtained for BIRC5 was only 2.24. A possible explanation for the failure to detect the translocation potential of survivin/BIRC5 is the loss of its anti-apoptotic property upon nuclear localization [41].