Core utilities that add pandas-esque features to Arrow arrays and tables.

Arrow forbids subclassing, so these classes exist only for logical grouping. Their methods are called as functions, with the array or table passed as the first argument.
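A rough illustration of this calling convention, using a plain Python list in place of an Arrow array. `ListColumn` and its two functions are hypothetical stand-ins written for this sketch, not part of graphique.

```python
class ListColumn:
    """Hypothetical namespace of functions; it is never instantiated."""

    def count(self, value) -> int:
        """Return number of occurrences of value."""
        return sum(1 for item in self if item == value)

    def equal(self, value) -> list:
        """Return boolean mask which matches scalar value."""
        return [item == value for item in self]


data = ["a", "b", "a"]
# Each "method" is looked up on the class and called as a plain function,
# with the data passed explicitly as the first argument.
occurrences = ListColumn.count(data, "a")
mask = ListColumn.equal(data, "a")
print(occurrences, mask)
```

The same pattern would apply to the classes below: the array or table is passed in, never wrapped.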

Column

class graphique.core.Column[source]

Bases: pyarrow.lib.ChunkedArray

Chunked array interface as a namespace of functions.

absolute() → pyarrow.lib.ChunkedArray[source]

Return absolute values.

call(func: Callable, *args) → pyarrow.lib.ChunkedArray[source]

Call compute function on array with support for dictionaries.
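One plausible reading of "support for dictionaries" is dictionary-encoded arrays, where an element-wise function only needs to run over the distinct values. The sketch below models such an array as a pair of Python lists; the names `call_on_dictionary` and `decode` are assumptions for illustration and say nothing about how graphique implements `call`.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a dictionary-encoded array: (indices, dictionary).
DictEncoded = Tuple[List[int], list]


def call_on_dictionary(encoded: DictEncoded, func: Callable[[list], list]) -> DictEncoded:
    """Apply an element-wise func to the distinct values, reusing the indices."""
    indices, dictionary = encoded
    return indices, func(dictionary)


def decode(encoded: DictEncoded) -> list:
    """Expand the encoded pair back into a flat list of values."""
    indices, dictionary = encoded
    return [dictionary[i] for i in indices]


encoded = ([0, 1, 0, 2], ["x", "y", "z"])
result = call_on_dictionary(encoded, lambda values: [v.upper() for v in values])
decoded = decode(result)
print(decoded)
```

This shortcut is only valid for functions that map each value independently of its position.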

count(value) → int[source]

Return number of occurrences of value.

equal(value) → pyarrow.lib.ChunkedArray[source]

Return boolean mask array which matches scalar value.

find(*values) → Iterator[slice][source]

Generate slices of matching rows from a sorted array.
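Because the array is sorted, the rows matching a value form one contiguous run that binary search can locate. A minimal sketch with the standard `bisect` module on a Python list; `find_slices` is a hypothetical name, and yielding an empty slice for a missing value is an assumption of this sketch.

```python
from bisect import bisect_left, bisect_right
from typing import Iterator, Sequence


def find_slices(sorted_values: Sequence, *values) -> Iterator[slice]:
    """Yield one slice per value, covering its run of equal elements."""
    for value in values:
        start = bisect_left(sorted_values, value)   # first position >= value
        stop = bisect_right(sorted_values, value)   # first position > value
        yield slice(start, stop)


data = [1, 1, 2, 4, 4, 4]
slices = list(find_slices(data, 1, 3, 4))
matches = [data[s] for s in slices]
print(slices, matches)
```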

is_in(values) → pyarrow.lib.ChunkedArray[source]

Return boolean mask array which matches any value.

mask(func='and', **query) → pyarrow.lib.ChunkedArray[source]

Return boolean mask array which matches query predicates.
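A sketch of how several predicates might be combined into one mask. The predicate names accepted in `**query` are not listed here, so the keys in `PREDICATES` below are assumptions chosen for illustration, as is the `build_mask` helper itself.

```python
import operator
from functools import reduce
from typing import Sequence

# Hypothetical predicate names mapped to element-wise comparisons.
PREDICATES = {
    "equal": operator.eq,
    "not_equal": operator.ne,
    "less": operator.lt,
    "greater_equal": operator.ge,
}
COMBINERS = {"and": operator.and_, "or": operator.or_}


def build_mask(values: Sequence, func: str = "and", **query) -> list:
    """Evaluate each predicate element-wise, then combine the masks with func."""
    if not query:
        raise ValueError("at least one predicate is required")
    masks = [
        [PREDICATES[name](item, operand) for item in values]
        for name, operand in query.items()
    ]
    combine = COMBINERS[func]
    # zip(*masks) regroups the per-predicate masks row by row.
    return [reduce(combine, row) for row in zip(*masks)]


data = [1, 5, 10, 15]
both = build_mask(data, greater_equal=5, less=15)
either = build_mask(data, "or", equal=1, greater_equal=15)
print(both, either)
```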

max()[source]

Return max of the values.

maximum(value) → pyarrow.lib.ChunkedArray[source]

Return element-wise maximum of values.

mean() → Optional[float][source]

Return mean of the values.

min()[source]

Return min of the values.

minimum(value) → pyarrow.lib.ChunkedArray[source]

Return element-wise minimum of values.

mode()[source]

Return mode of the values.

not_equal(value) → pyarrow.lib.ChunkedArray[source]

Return boolean mask array which doesn’t match scalar value.

quantile(*q) → list[source]

Return q-th quantiles for values.

range(lower=None, upper=None, include_lower=True, include_upper=False) → slice[source]

Return slice within range from a sorted array, by default a half-open interval.
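The half-open default and the two `include_*` flags can be sketched with binary search on a sorted Python list. `range_slice` is a hypothetical helper that mirrors the signature above; it is not graphique's implementation.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def range_slice(sorted_values: Sequence, lower=None, upper=None,
                include_lower=True, include_upper=False) -> slice:
    """Return the slice of sorted_values that falls within the bounds."""
    start, stop = 0, len(sorted_values)
    if lower is not None:
        # An inclusive lower bound starts at the first element >= lower.
        search = bisect_left if include_lower else bisect_right
        start = search(sorted_values, lower)
    if upper is not None:
        # An inclusive upper bound stops after the last element <= upper.
        search = bisect_right if include_upper else bisect_left
        stop = search(sorted_values, upper)
    return slice(start, stop)


data = [1, 2, 2, 3, 5, 8]
half_open = data[range_slice(data, 2, 5)]
closed = data[range_slice(data, 2, 5, include_upper=True)]
above = data[range_slice(data, 2, include_lower=False)]
print(half_open, closed, above)
```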

sort(reverse=False, length: int = None) → pyarrow.lib.Array[source]

Return sorted values, optimized for fixed length.
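The optimization used is not described here. One common approach when only the first `length` values are needed is partial selection instead of a full sort, sketched below with the standard `heapq` module. `sort_values` is a hypothetical helper operating on a Python list.

```python
import heapq
from typing import Optional, Sequence


def sort_values(values: Sequence, reverse: bool = False,
                length: Optional[int] = None) -> list:
    """Return sorted values, selecting only `length` of them when given."""
    if length is None:
        return sorted(values, reverse=reverse)
    # Partial selection avoids sorting the whole input.
    select = heapq.nlargest if reverse else heapq.nsmallest
    return select(length, values)


data = [7, 2, 9, 4, 1]
smallest_two = sort_values(data, length=2)
largest_two = sort_values(data, reverse=True, length=2)
everything = sort_values(data)
print(smallest_two, largest_two, everything)
```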

stddev() → Optional[float][source]

Return standard deviation of the values.

sum(exp: int = 1)[source]

Return sum of the values, with optional exponentiation.

variance() → Optional[float][source]

Return variance of the values.

Table

class graphique.core.Table[source]

Bases: pyarrow.lib.Table

Table interface as a namespace of functions.

apply(name: str, alias: str = '', **partials) → pyarrow.lib.Table[source]

Return view of table with functions applied across columns.

group(name: str, reverse=False, predicate=int, sort=False) → Iterator[pyarrow.lib.Table][source]

Generate tables grouped by column, with filtering and slicing on table length.

index() → list[source]

Return index column names from pandas metadata.
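When pyarrow converts a pandas DataFrame, it stores a JSON document under the `b"pandas"` key of the schema metadata, including an `index_columns` entry. A minimal sketch of reading it from a plain metadata dict follows. `index_names` is a hypothetical helper, and skipping non-string entries (such as range index descriptors) is an assumption of this sketch.

```python
import json
from typing import Dict, List


def index_names(metadata: Dict[bytes, bytes]) -> List[str]:
    """Return index column names recorded in pandas metadata, if any."""
    raw = metadata.get(b"pandas")
    if raw is None:
        return []
    columns = json.loads(raw).get("index_columns", [])
    # Entries may also be dicts describing a range index; keep stored names only.
    return [name for name in columns if isinstance(name, str)]


metadata = {b"pandas": json.dumps({"index_columns": ["zipcode"], "columns": []}).encode()}
names = index_names(metadata)
print(names)
```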

is_in(name: str, *values) → pyarrow.lib.Table[source]

Return rows which match one of the values.

Assumes the table is already sorted by the named column, i.e., indexed.

mask(name: str, **query) → pyarrow.lib.Array[source]

Return mask array which matches query.

not_equal(name: str, value) → pyarrow.lib.Table[source]

Return rows which don’t match the value.

Assumes the table is already sorted by the named column, i.e., indexed.

range(name: str, lower=None, upper=None, **includes) → pyarrow.lib.Table[source]

Return rows within range, by default a half-open interval.

Assumes the table is already sorted by the named column, i.e., indexed.

sort(*names, reverse=False, length: int = None) → pyarrow.lib.Table[source]

Return table sorted by columns.

types() → dict[source]

Return mapping of column types.

unique(name: str, reverse=False, count='') → pyarrow.lib.Table[source]

Return table with first or last occurrences from grouping by column.

Optionally include counts in an additional column. Faster than group() when only scalars are needed.
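A sketch of the semantics described above, using a list of dicts in place of a table. `unique_rows` is a hypothetical helper. That `reverse` selects the last occurrence and that a non-empty `count` names the extra column are both assumptions read from the signature.

```python
from typing import Dict, List


def unique_rows(rows: List[dict], name: str, reverse: bool = False,
                count: str = "") -> List[dict]:
    """Keep one row per distinct value of `name`, optionally with counts."""
    kept: Dict[object, dict] = {}
    counts: Dict[object, int] = {}
    for row in rows:
        key = row[name]
        counts[key] = counts.get(key, 0) + 1
        # First occurrence wins by default; with reverse, later rows overwrite.
        if reverse or key not in kept:
            kept[key] = row
    result = []
    for key, row in kept.items():
        row = dict(row)  # copy so the input rows are not modified
        if count:
            row[count] = counts[key]
        result.append(row)
    return result


rows = [
    {"state": "CA", "city": "LA"},
    {"state": "NY", "city": "NYC"},
    {"state": "CA", "city": "SF"},
]
first = unique_rows(rows, "state", count="n")
last = unique_rows(rows, "state", reverse=True)
print(first, last)
```

Only one row per group is kept, so no per-group tables need to be built, which is consistent with the note that this is faster than group() when only scalars are needed.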