Human pancreas – Open Problems in Single Cell Analysis

Info

openproblems_v1/pancreas
Luecken et al. (2021)
1.26 GiB
02-02-2024
16382 cells × 18771 genes

Description

Human pancreatic islet scRNA-seq data from 6 datasets across technologies (CEL-seq, CEL-seq2, Smart-seq2, inDrop, Fluidigm C1, and SMARTER-seq).

Preview

dataset is an AnnData object with n_obs × n_vars = 16382 × 18771 with slots:

obs: size_factors, cell_type, batch
var: feature_name, hvg, hvg_score
obsp: knn_connectivities, knn_distances
obsm: X_pca
varm: pca_loadings
layers: counts, normalized
uns: dataset_description, dataset_id, dataset_name, dataset_organism, dataset_reference, dataset_summary, dataset_url, knn, normalization_id, pca_variance
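
The slot listing above can be turned into a quick structural check. The following is a minimal sketch, not part of the Open Problems tooling: `EXPECTED_SLOTS` and `missing_slots` are hypothetical names introduced here, and the check only assumes that each slot supports the `in` operator, as the mapping-like slots of an AnnData object do. A plain stand-in object is used so the example runs without any data file.

```python
from types import SimpleNamespace

# Slot names copied from the preview above.
EXPECTED_SLOTS = {
    "obs": ["size_factors", "cell_type", "batch"],
    "var": ["feature_name", "hvg", "hvg_score"],
    "obsp": ["knn_connectivities", "knn_distances"],
    "obsm": ["X_pca"],
    "varm": ["pca_loadings"],
    "layers": ["counts", "normalized"],
    "uns": [
        "dataset_description", "dataset_id", "dataset_name",
        "dataset_organism", "dataset_reference", "dataset_summary",
        "dataset_url", "knn", "normalization_id", "pca_variance",
    ],
}


def missing_slots(dataset, expected=EXPECTED_SLOTS):
    """Return 'slot.key' strings for every expected key that is absent."""
    missing = []
    for slot, keys in expected.items():
        container = getattr(dataset, slot, None)
        for key in keys:
            if container is None or key not in container:
                missing.append(f"{slot}.{key}")
    return missing


# Stand-in object for demonstration only; a real AnnData would be passed instead.
demo = SimpleNamespace(
    **{slot: dict.fromkeys(keys) for slot, keys in EXPECTED_SLOTS.items()}
)
del demo.layers["normalized"]  # simulate a file that lacks one layer

print(missing_slots(demo))  # ['layers.normalized']
```

With the `anndata` package installed, a downloaded copy of the dataset could be read with `anndata.read_h5ad` and passed to `missing_slots` in place of `demo`; the file location is not given on this page.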

Reference

Name	Description	Type	Data type	Size
obs
`batch`	A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc.	`vector`	`category`	16382
`cell_type`	Classification of the cell type based on its characteristics and function within the tissue or organism.	`vector`	`category`	16382
`size_factors`	The size factors created by the normalisation method, if any.	`vector`	`float32`	16382
var
`feature_name`	A human-readable name for the feature, usually a gene symbol.	`vector`	`object`	18771
`hvg`	Whether the feature is considered a ‘highly variable gene’.	`vector`	`bool`	18771
`hvg_score`	A score used to rank the features by how highly variable they are.	`vector`	`float64`	18771
obsp
`knn_connectivities`	K nearest neighbors connectivities matrix.	`sparsematrix`	`float32`	16382 × 16382
`knn_distances`	K nearest neighbors distance matrix.	`sparsematrix`	`float64`	16382 × 16382
obsm
`X_pca`	The resulting PCA embedding.	`densematrix`	`float32`	16382 × 50
varm
`pca_loadings`	The PCA loadings matrix.	`densematrix`	`float64`	18771 × 50
layers
`counts`	Raw counts	`sparsematrix`	`float32`	16382 × 18771
`normalized`	Normalised expression values	`sparsematrix`	`float32`	16382 × 18771
uns
`dataset_description`	Long description of the dataset.	`atomic`	`str`	1
`dataset_id`	A unique identifier for the dataset. This is different from the `obs.dataset_id` field, which is the identifier for the dataset from which the cell data is derived.	`atomic`	`str`	1
`dataset_name`	A human-readable name for the dataset.	`atomic`	`str`	1
`dataset_organism`	The organism of the sample in the dataset.	`atomic`	`str`	1
`dataset_reference`	Bibtex reference of the paper in which the dataset was published.	`atomic`	`str`	1
`dataset_summary`	Short description of the dataset.	`atomic`	`str`	1
`dataset_url`	Link to the original source of the dataset.	`atomic`	`str`	1
`knn`	Supplementary K nearest neighbors data.	`dict`		3
`normalization_id`	Identifier of the normalization method that was applied.	`atomic`	`str`	1
`pca_variance`	The PCA variance objects.	`dict`		2

Slot crossref data

`dataset.layers['counts']`

In R: dataset$layers[["counts"]]

Type: sparsematrix, data type: float32, shape: 16382 × 18771

Raw counts

`dataset.layers['normalized']`

In R: dataset$layers[["normalized"]]

Type: sparsematrix, data type: float32, shape: 16382 × 18771

Normalised expression values

`dataset.obs['size_factors']`

In R: dataset$obs[["size_factors"]]

Type: vector, data type: float32, shape: 16382

The size factors created by the normalisation method, if any.

`dataset.obs['cell_type']`

In R: dataset$obs[["cell_type"]]

Type: vector, data type: category, shape: 16382

Classification of the cell type based on its characteristics and function within the tissue or organism.

`dataset.obs['batch']`

In R: dataset$obs[["batch"]]

Type: vector, data type: category, shape: 16382

A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc.
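
Because this dataset combines several source datasets, it is often useful to see how cell types are distributed across batches. The sketch below is one way to tabulate that with the standard library only. The function name `cells_per_batch_and_type` and the labels in the toy input are hypothetical and not taken from the dataset.

```python
from collections import Counter


def cells_per_batch_and_type(batch, cell_type):
    """Count cells for every (batch, cell_type) pair."""
    return Counter(zip(batch, cell_type))


# Hypothetical labels for illustration; in practice these would be
# dataset.obs['batch'] and dataset.obs['cell_type'].
batch = ["batch_a", "batch_a", "batch_b", "batch_b", "batch_b"]
cell_type = ["type_1", "type_2", "type_1", "type_1", "type_2"]

counts = cells_per_batch_and_type(batch, cell_type)
print(counts[("batch_b", "type_1")])  # 2
```

Any two equally long iterables work as input, so the two `obs` columns can be passed directly. A pair that never occurs simply has a count of 0.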

`dataset.obsm['X_pca']`

In R: dataset$obsm[["X_pca"]]

Type: densematrix, data type: float32, shape: 16382 × 50

The resulting PCA embedding.

`dataset.obsp['knn_connectivities']`

In R: dataset$obsp[["knn_connectivities"]]

Type: sparsematrix, data type: float32, shape: 16382 × 16382

K nearest neighbors connectivities matrix.

`dataset.obsp['knn_distances']`

In R: dataset$obsp[["knn_distances"]]

Type: sparsematrix, data type: float64, shape: 16382 × 16382

K nearest neighbors distance matrix.
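
To illustrate how a sparse distance matrix like this one can be read, the sketch below lists the stored neighbors of a single cell, nearest first. It assumes the matrix is in compressed sparse row (CSR) form, that the stored entries of a row are that cell's neighbors, and that the values are distances; none of this is confirmed by this page. `neighbors_of` is a hypothetical helper, and the toy values are made up.

```python
def neighbors_of(indptr, indices, data, cell):
    """Return (neighbor_index, distance) pairs for one CSR row, nearest first."""
    start, end = indptr[cell], indptr[cell + 1]
    pairs = zip(indices[start:end], data[start:end])
    return sorted(pairs, key=lambda pair: pair[1])


# Toy 3-cell matrix in CSR form (hypothetical values):
# row 0 stores cells 2 and 1, row 1 stores cell 0, row 2 stores cell 0.
indptr = [0, 2, 3, 4]
indices = [2, 1, 0, 0]
data = [0.9, 0.4, 0.4, 0.9]

print(neighbors_of(indptr, indices, data, 0))  # [(1, 0.4), (2, 0.9)]
```

If the matrix is loaded as a SciPy CSR matrix `m`, its `m.indptr`, `m.indices` and `m.data` arrays can be passed as the first three arguments.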

`dataset.uns['dataset_description']`

In R: dataset$uns[["dataset_description"]]

Type: atomic, data type: str, shape: 1

Long description of the dataset.

`dataset.uns['dataset_id']`

In R: dataset$uns[["dataset_id"]]

Type: atomic, data type: str, shape: 1

A unique identifier for the dataset. This is different from the obs.dataset_id field, which is the identifier for the dataset from which the cell data is derived.

`dataset.uns['dataset_name']`

In R: dataset$uns[["dataset_name"]]

Type: atomic, data type: str, shape: 1

A human-readable name for the dataset.

`dataset.uns['dataset_organism']`

In R: dataset$uns[["dataset_organism"]]

Type: atomic, data type: str, shape: 1

The organism of the sample in the dataset.

`dataset.uns['dataset_reference']`

In R: dataset$uns[["dataset_reference"]]

Type: atomic, data type: str, shape: 1

Bibtex reference of the paper in which the dataset was published.

`dataset.uns['dataset_summary']`

In R: dataset$uns[["dataset_summary"]]

Type: atomic, data type: str, shape: 1

Short description of the dataset.

`dataset.uns['dataset_url']`

In R: dataset$uns[["dataset_url"]]

Type: atomic, data type: str, shape: 1

Link to the original source of the dataset.

`dataset.uns['knn']`

In R: dataset$uns[["knn"]]

Type: dict, shape: 3

Supplementary K nearest neighbors data.

`dataset.uns['normalization_id']`

In R: dataset$uns[["normalization_id"]]

Type: atomic, data type: str, shape: 1

Identifier of the normalization method that was applied.

`dataset.uns['pca_variance']`

In R: dataset$uns[["pca_variance"]]

Type: dict, shape: 2

The PCA variance objects.

`dataset.var['feature_name']`

In R: dataset$var[["feature_name"]]

Type: vector, data type: object, shape: 18771

A human-readable name for the feature, usually a gene symbol.

`dataset.var['hvg']`

In R: dataset$var[["hvg"]]

Type: vector, data type: bool, shape: 18771

Whether the feature is considered a ‘highly variable gene’.

`dataset.var['hvg_score']`

In R: dataset$var[["hvg_score"]]

Type: vector, data type: float64, shape: 18771

A score used to rank the features by how highly variable they are.
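
The three `var` columns can be combined to list the flagged features in rank order. The sketch below assumes that a larger `hvg_score` means a more variable feature, which this page does not state. `highly_variable` is a hypothetical helper, and the feature names and scores are invented for illustration.

```python
def highly_variable(feature_name, hvg, hvg_score):
    """Return (name, score) for flagged features, highest score first."""
    flagged = [
        (name, score)
        for name, flag, score in zip(feature_name, hvg, hvg_score)
        if flag
    ]
    # Assumption: a larger score means a more variable feature.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical values; in practice these would be the
# feature_name, hvg and hvg_score columns of dataset.var.
names = ["g1", "g2", "g3", "g4"]
flags = [True, False, True, True]
scores = [0.2, 0.9, 1.5, 0.7]

print(highly_variable(names, flags, scores))
# [('g3', 1.5), ('g4', 0.7), ('g1', 0.2)]
```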

`dataset.varm['pca_loadings']`

In R: dataset$varm[["pca_loadings"]]

Type: densematrix, data type: float64, shape: 18771 × 50

The PCA loadings matrix.

References

Luecken, Malte D., M. Büttner, K. Chaichoompu, A. Danese, M. Interlandi, M. F. Mueller, D. C. Strobl, et al. 2021. “Benchmarking Atlas-Level Data Integration in Single-Cell Genomics.” Nature Methods 19 (1): 41–50. https://doi.org/10.1038/s41592-021-01336-8.