orthomap: Step 1 - get taxonomic information
This notebook demonstrates how to get taxonomic information for your query species with orthomap.
Given a species name or taxonomic ID, the lineage of the query species is extracted with the help of the ete3 Python toolkit and the NCBI taxonomy (Huerta-Cepas et al., 2016). This information is needed alongside the taxonomic classifications of all species used in the OrthoFinder comparison.
Note: If you need to download or update the NCBI taxonomy database via the ete3 Python package, please use the orthomap command-line function ncbitax or run the following code:
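A minimal sketch of such an update, written directly against ete3 rather than orthomap. It assumes the `NCBITaxa` class and its `update_taxonomy_database()` method from ete3; the wrapper name `update_ncbi_taxonomy` is hypothetical and not part of orthomap.

```python
def update_ncbi_taxonomy():
    """Download or refresh the local NCBI taxonomy database used by ete3."""
    # Imported lazily so that defining this helper does not require ete3.
    from ete3 import NCBITaxa

    # Instantiating NCBITaxa downloads the database if none exists locally.
    ncbi = NCBITaxa()
    # Fetch the latest taxonomy dump from NCBI and rebuild the local database.
    ncbi.update_taxonomy_database()
    return ncbi
```

Calling `update_ncbi_taxonomy()` needs network access and can take several minutes, so it is usually run once before the first lineage query.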
Notebook file
The notebook file can be obtained here:
https://raw.githubusercontent.com/kullrich/orthomap/main/docs/notebooks/query_lineage.ipynb
Import libraries
[1]:
import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns
import matplotlib.pyplot as plt
from statannot import add_stat_annotation
# increase dpi
%matplotlib inline
#plt.rcParams['figure.dpi'] = 300
#plt.rcParams['savefig.dpi'] = 300
plt.rcParams['figure.figsize'] = [6, 4.5]
#plt.rcParams['figure.figsize'] = [4.4, 3.3]
Import orthomap python package submodules
[2]:
# import submodules
from orthomap import qlin, gtf2t2g, of2orthomap, orthomap2tei, datasets
Get query species taxonomic lineage information
The orthomap submodule qlin retrieves the taxonomic information of a query species with the qlin.get_qlin() function:
[3]:
# get query species taxonomic lineage information
query_lineage = qlin.get_qlin(q='Caenorhabditis elegans')
query name: Caenorhabditis elegans
query taxID: 6239
query kingdom: Eukaryota
query lineage names:
['root(1)', 'cellular organisms(131567)', 'Eukaryota(2759)', 'Opisthokonta(33154)', 'Metazoa(33208)', 'Eumetazoa(6072)', 'Bilateria(33213)', 'Protostomia(33317)', 'Ecdysozoa(1206794)', 'Nematoda(6231)', 'Chromadorea(119089)', 'Rhabditida(6236)', 'Rhabditina(2301116)', 'Rhabditomorpha(2301119)', 'Rhabditoidea(55879)', 'Rhabditidae(6243)', 'Peloderinae(55885)', 'Caenorhabditis(6237)', 'Caenorhabditis elegans(6239)']
query lineage:
[1, 131567, 2759, 33154, 33208, 6072, 33213, 33317, 1206794, 6231, 119089, 6236, 2301116, 2301119, 55879, 6243, 55885, 6237, 6239]
The query_lineage variable now contains the following information in a list:
- query name: query_lineage[0]
- query taxID: query_lineage[1]
- query lineage: query_lineage[2]
- query lineage dictionary: query_lineage[3]
- query lineage zip: query_lineage[4]
- query lineage names: query_lineage[5]
- reverse query lineage: query_lineage[6]
- query kingdom: query_lineage[7]
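Positional indices are easy to mix up, so it can help to give the eight entries names. The sketch below wraps such a list in a `namedtuple`. The field names are hypothetical (orthomap itself returns a plain list), and the example data is a shortened stand-in rather than real `qlin.get_qlin()` output.

```python
from collections import namedtuple

# Hypothetical field names; they follow the order listed above.
QueryLineage = namedtuple(
    "QueryLineage",
    ["name", "taxid", "lineage", "lineage_dict", "lineage_zip",
     "lineage_names", "reverse_lineage", "kingdom"],
)

# Stand-in for the list returned by qlin.get_qlin(), cut down to three ranks.
lineage = [1, 131567, 2759]
names = {1: "root", 131567: "cellular organisms", 2759: "Eukaryota"}
example = [
    "Caenorhabditis elegans", 6239, lineage, names,
    [(taxid, names[taxid]) for taxid in lineage],
    None,  # the lineage-names table is left out of this stand-in
    lineage[::-1], "Eukaryota",
]

ql = QueryLineage(*example)
print(ql.taxid, ql.kingdom)
```

With a real result, `QueryLineage(*query_lineage)` would give the same named access while index-based access such as `ql[1]` keeps working.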
[4]:
#query name
query_lineage[0]
[4]:
'Caenorhabditis elegans'
[5]:
#query taxID
query_lineage[1]
[5]:
6239
[6]:
#query lineage
query_lineage[2]
[6]:
[1,
131567,
2759,
33154,
33208,
6072,
33213,
33317,
1206794,
6231,
119089,
6236,
2301116,
2301119,
55879,
6243,
55885,
6237,
6239]
[7]:
#query lineage dictionary
query_lineage[3]
[7]:
{1: 'root',
2759: 'Eukaryota',
6072: 'Eumetazoa',
6231: 'Nematoda',
6236: 'Rhabditida',
6237: 'Caenorhabditis',
6239: 'Caenorhabditis elegans',
6243: 'Rhabditidae',
33154: 'Opisthokonta',
33208: 'Metazoa',
33213: 'Bilateria',
33317: 'Protostomia',
55879: 'Rhabditoidea',
55885: 'Peloderinae',
119089: 'Chromadorea',
131567: 'cellular organisms',
1206794: 'Ecdysozoa',
2301116: 'Rhabditina',
2301119: 'Rhabditomorpha'}
[8]:
#query lineage zip
query_lineage[4]
[8]:
[(1, 'root'),
(131567, 'cellular organisms'),
(2759, 'Eukaryota'),
(33154, 'Opisthokonta'),
(33208, 'Metazoa'),
(6072, 'Eumetazoa'),
(33213, 'Bilateria'),
(33317, 'Protostomia'),
(1206794, 'Ecdysozoa'),
(6231, 'Nematoda'),
(119089, 'Chromadorea'),
(6236, 'Rhabditida'),
(2301116, 'Rhabditina'),
(2301119, 'Rhabditomorpha'),
(55879, 'Rhabditoidea'),
(6243, 'Rhabditidae'),
(55885, 'Peloderinae'),
(6237, 'Caenorhabditis'),
(6239, 'Caenorhabditis elegans')]
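The lineage list, the dictionary and the zip carry the same taxIDs. The dictionary is displayed sorted by taxID, so the root-to-species order has to come from the list. As a rough illustration, using an excerpt of the values above, the zip can be rebuilt from the other two, and the dictionary can be inverted for name lookups (the variable `name_to_taxid` is introduced here only for illustration):

```python
# Excerpt of the C. elegans values shown above (first five ranks only).
lineage = [1, 131567, 2759, 33154, 33208]
lineage_dict = {
    1: "root",
    2759: "Eukaryota",
    33154: "Opisthokonta",
    33208: "Metazoa",
    131567: "cellular organisms",
}

# Pair each taxID with its name, walking the list so lineage order is kept.
lineage_zip = [(taxid, lineage_dict[taxid]) for taxid in lineage]
print(lineage_zip)

# The reverse lookup (name to taxID) is a simple dictionary inversion.
name_to_taxid = {name: taxid for taxid, name in lineage_dict.items()}
print(name_to_taxid["Metazoa"])
```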
[9]:
#query lineage names
query_lineage[5]
[9]:
| | PSnum | PStaxID | PSname |
|---|---|---|---|
| 0 | 0 | 1 | root |
| 1 | 1 | 131567 | cellular organisms |
| 2 | 2 | 2759 | Eukaryota |
| 3 | 3 | 33154 | Opisthokonta |
| 4 | 4 | 33208 | Metazoa |
| 5 | 5 | 6072 | Eumetazoa |
| 6 | 6 | 33213 | Bilateria |
| 7 | 7 | 33317 | Protostomia |
| 8 | 8 | 1206794 | Ecdysozoa |
| 9 | 9 | 6231 | Nematoda |
| 10 | 10 | 119089 | Chromadorea |
| 11 | 11 | 6236 | Rhabditida |
| 12 | 12 | 2301116 | Rhabditina |
| 13 | 13 | 2301119 | Rhabditomorpha |
| 14 | 14 | 55879 | Rhabditoidea |
| 15 | 15 | 6243 | Rhabditidae |
| 16 | 16 | 55885 | Peloderinae |
| 17 | 17 | 6237 | Caenorhabditis |
| 18 | 18 | 6239 | Caenorhabditis elegans |
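In this table, PSnum matches the position of each taxID in the lineage, counted from the root (0) to the query species. A small sketch of that relationship in plain Python, using an excerpt of the lineage above; the dictionary name `taxid_to_psnum` is an assumption for illustration only:

```python
# Excerpt of the lineage shown above, ordered from root towards the species.
lineage = [1, 131567, 2759, 33154, 33208, 6072, 33213, 33317, 1206794, 6231]

# PSnum is the position of each taxID in the lineage (root = 0).
taxid_to_psnum = {taxid: psnum for psnum, taxid in enumerate(lineage)}

print(taxid_to_psnum[6231])   # Nematoda
print(taxid_to_psnum[33208])  # Metazoa
```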
[10]:
#reverse query lineage
query_lineage[6]
[10]:
[6239,
6237,
55885,
6243,
55879,
2301119,
2301116,
6236,
119089,
6231,
1206794,
33317,
33213,
6072,
33208,
33154,
2759,
131567,
1]
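The reverse lineage runs from the species back to the root, which is convenient when looking for the youngest node shared with another species. The sketch below illustrates the idea and is not orthomap's implementation: the function name `youngest_common` is hypothetical, and the second lineage is an abbreviated human lineage assumed here for demonstration.

```python
# C. elegans lineage from the output above (root first).
query_lineage = [1, 131567, 2759, 33154, 33208, 6072, 33213, 33317, 1206794,
                 6231, 119089, 6236, 2301116, 2301119, 55879, 6243, 55885,
                 6237, 6239]
# Abbreviated human lineage, assumed here for illustration; several
# intermediate ranks are left out.
other_lineage = [1, 131567, 2759, 33154, 33208, 6072, 33213, 33511, 7711,
                 7742, 40674, 9606]


def youngest_common(query, other):
    """Return the most recent taxID of `query` that also occurs in `other`."""
    other_ids = set(other)
    # Walk from the species towards the root; the first hit is the youngest.
    for taxid in reversed(query):
        if taxid in other_ids:
            return taxid
    return None


node = youngest_common(query_lineage, other_lineage)
print(node, query_lineage.index(node))
```

Here the shared node is Bilateria (33213), which sits at position 6 of the query lineage.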
[11]:
#query kingdom
query_lineage[7]
[11]:
'Eukaryota'
Get query species lineage as a tree object
[12]:
lineage_tree = qlin.get_lineage_topo(qt='6239')
print(lineage_tree)
                                                      /- /-18/6239/Caenorhabditis elegans
                                                   /-|
                                                /-|   \-17/6237/Caenorhabditis
                                               |  |
                                             /-|   \-16/55885/Peloderinae
                                            |  |
                                          /-|   \-15/6243/Rhabditidae
                                         |  |
                                       /-|   \-14/55879/Rhabditoidea
                                      |  |
                                    /-|   \-13/2301119/Rhabditomorpha
                                   |  |
                                 /-|   \-12/2301116/Rhabditina
                                |  |
                              /-|   \-11/6236/Rhabditida
                             |  |
                           /-|   \-10/119089/Chromadorea
                          |  |
                        /-|   \-9/6231/Nematoda
                       |  |
                     /-|   \-8/1206794/Ecdysozoa
                    |  |
                  /-|   \-7/33317/Protostomia
                 |  |
               /-|   \-6/33213/Bilateria
              |  |
            /-|   \-5/6072/Eumetazoa
           |  |
         /-|   \-4/33208/Metazoa
        |  |
      /-|   \-3/33154/Opisthokonta
     |  |
   /-|   \-2/2759/Eukaryota
  |  |
--|   \-1/131567/cellular organisms
  |
   \-0/1/root
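Each leaf label in the tree above combines the phylostratum number, the taxID and the name, separated by slashes. A brief sketch of building and parsing such labels; `make_label` and `parse_label` are hypothetical helpers, not orthomap functions:

```python
def make_label(psnum, taxid, name):
    """Join the three fields with '/' as in the tree output above."""
    return f"{psnum}/{taxid}/{name}"


def parse_label(label):
    """Split a leaf label back into (PSnum, taxID, name)."""
    # Split only on the first two slashes so the name is kept intact.
    psnum, taxid, name = label.split("/", 2)
    return int(psnum), int(taxid), name


label = make_label(18, 6239, "Caenorhabditis elegans")
print(label)
print(parse_label("9/6231/Nematoda"))
```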
To continue, please have a look at the documentation of Step 2 - gene age class assignment for further insights.