---
title: "Target2NP: Compound–Target Interactions"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Target2NP: Compound–Target Interactions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

```{r setup}
library(unitcm)
```

The **Target2NP** module provides access to a large-scale compound–target
interaction database covering multiple experimental sources (BindingDB,
HERB2, NPASS, BATMAN, etc.) as well as computational predictions from
DrugCLIP (deep learning) and SEA (ChEMBL similarity). This vignette
walks through the main workflows.

## Experimental interactions

### Search and filter

```{r search-basic}
# Free-text search across all fields
hits <- search_target2np(search = "quercetin")
hits

# Exact-match by gene symbol
tp53 <- search_target2np(
  search = "TP53",
  search_field = "gene_symbol",
  search_mode = "exact"
)
tp53
attr(tp53, "total")
```

```{r search-filters}
# Combine filters
results <- search_target2np(
  search = "curcumin",
  search_field = "compound_name",
  source_db = "BindingDB",
  target_organism = "Homo sapiens",
  activity_type = "IC50"
)
results
```

### Available filters and statistics

```{r filter-options}
# What filter values exist?
opts <- fetch_target2np_filters()
opts$source_db
opts$target_organism
opts$activity_type

# Global database statistics
stats <- fetch_target2np_stats()
stats$total_records
stats$source_db_distribution
```

### Retrieve a single record

```{r detail}
detail <- get_target2np(1)
detail$compound_name
detail$gene_symbol
detail$activity_value
detail$activity_units
detail$pmid
```

### Batch query

Look up interactions for multiple genes in one call (up to 50
identifiers):

```{r batch}
batch <- batch_target2np(c("TP53", "BRCA1", "EGFR", "VEGFA"))
batch
attr(batch, "queries_matched")
attr(batch, "queries_not_found")

# UniProt-based batch
batch_up <- batch_target2np(
  c("P04637", "P38398"),
  id_type = "uniprot_id"
)
```

## Computational predictions

### DrugCLIP deep-learning predictions

```{r drugclip}
# High-confidence predictions for quercetin
dc_high <- search_target2np_drugclip(
  search = "quercetin",
  search_field = "compound_name",
  confidence = "high"
)
dc_high

# Score-based filtering
dc <- search_target2np_drugclip(
  search = "EGFR",
  search_field = "gene_symbol",
  min_score = 0.7
)
dc
```

### SEA similarity-based predictions

```{r sea}
# High-confidence SEA predictions
sea_high <- search_target2np_sea(
  search = "quercetin",
  search_field = "compound_name",
  confidence = "high"
)
sea_high

# Filter by adjusted p-value
sea <- search_target2np_sea(
  search = "TP53",
  search_field = "gene_symbol",
  max_pvalue = 0.01
)
sea
```

## Cross-source analysis

### Multi-source summary

The `target2np_multi_source_summary()` function queries experimental
records, DrugCLIP, and SEA for the same term and returns an integrated
overview: source counts, overlap statistics, confidence distributions,
and cross-validated compound–target pairs.

```{r multi-source}
summary <- target2np_multi_source_summary(
  search = "TP53",
  search_field = "gene_symbol"
)

# How many results per source?
summary$source_counts

# Target overlap across data sources
summary$target_overlap

# Confidence-level distribution for each source
summary$confidence_distribution

# Compound-target pairs found in >= 2 sources
summary$cross_validated

# Natural-language interpretation
cat(summary$suggestion_text)
```

### Aggregated view

The aggregated view groups experimental interaction records by
compound–target pair (InChIKey + UniProt ID) and returns pairs supported
by multiple source databases. This is useful for identifying
well-evidenced interactions.

```{r aggregated}
# Pairs seen in >= 3 databases
agg <- aggregated_target2np(
  search = "quercetin",
  min_sources = 3
)
agg

# Include DrugCLIP/SEA prediction counts as additional sources
agg_pred <- aggregated_target2np(
  search = "quercetin",
  min_sources = 2,
  include_predictions = TRUE
)
agg_pred
```

## Practical example: multi-evidence target prioritisation

A common workflow is to start with a compound of interest, query all
three data sources, and use cross-validation to prioritise targets.

```{r workflow}
# 1. Check experimental evidence
exp <- search_target2np(
  search = "quercetin",
  search_field = "compound_name",
  search_mode = "fuzzy",
  all_pages = TRUE
)
nrow(exp)

# 2. Get multi-source summary in one call
ms <- target2np_multi_source_summary(
  search = "quercetin",
  search_field = "compound_name",
  search_mode = "fuzzy"
)
ms$source_counts
ms$cross_validated

# 3. Batch-check the top cross-validated targets
top_genes <- unique(vapply(
  ms$cross_validated, `[[`, character(1), "gene_symbol"
))
if (length(top_genes) > 0) {
  batch_detail <- batch_target2np(top_genes)
  batch_detail
}
```