Core utilities that add pandas-esque features to Arrow arrays and tables.

Arrow forbids subclassing, so the classes exist only for logical grouping. Their methods are called as functions, with the Arrow object passed as the first argument.
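The calling convention can be sketched in plain Python. Here `Namespace` and the body of its `count` method are hypothetical stand-ins rather than graphique's code, and a tuple plays the role of an object whose type cannot be subclassed.

```python
class Namespace:
    """Hypothetical stand-in for a class used only to group functions."""

    def count(self, value) -> int:
        # `self` is whatever object the caller passes; it need not be an instance.
        return sum(1 for item in self if item == value)


array = (1, 2, 2, 3)  # stand-in for an object whose type cannot be subclassed
occurrences = Namespace.count(array, 2)  # called as a function through the class
print(occurrences)
```

The method is looked up on the class, so no instance of the namespace class is ever created.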

Column

class graphique.core.Column

Bases: pyarrow.lib.ChunkedArray

Chunked array interface as a namespace of functions.

absolute() → pyarrow.lib.ChunkedArray

Return absolute values.

arggroupby() → dict

Return groups of index arrays.
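A minimal sketch of what "groups of index arrays" means, assuming the result maps each distinct value to the positions where it occurs. The function name `group_indices` is hypothetical and a plain list stands in for the Arrow array.

```python
from collections import defaultdict


def group_indices(values) -> dict:
    """Map each distinct value to the list of positions where it occurs."""
    groups = defaultdict(list)
    for index, value in enumerate(values):
        groups[value].append(index)
    return dict(groups)


groups = group_indices(["a", "b", "a", "c", "b"])
print(groups)
```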

call(func: Callable, *args) → pyarrow.lib.ChunkedArray

Call compute function on array with support for dictionaries.

count(value) → int

Return number of occurrences of value.

equal(value) → pyarrow.lib.ChunkedArray

Return boolean mask array which matches scalar value.

find(*values) → Iterator[slice]

Generate slices of matching rows from a sorted array.
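On a sorted sequence, the rows matching a value form one contiguous run, which binary search can locate. The sketch below assumes that approach and uses the standard library's `bisect` on a plain list. The name `find_slices` is hypothetical, and nothing here is confirmed to be how graphique implements `find`.

```python
from bisect import bisect_left, bisect_right
from typing import Iterator


def find_slices(sorted_values, *values) -> Iterator[slice]:
    """Yield one slice per requested value, covering its run in sorted_values."""
    for value in values:
        start = bisect_left(sorted_values, value)   # first position >= value
        stop = bisect_right(sorted_values, value)   # first position > value
        yield slice(start, stop)


data = [1, 2, 2, 2, 5, 7, 7]
slices = list(find_slices(data, 2, 7, 4))
matches = [data[s] for s in slices]
print(slices, matches)
```

A value that is absent yields an empty slice at the position where it would be inserted.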

is_in(values, invert=False) → pyarrow.lib.ChunkedArray

Return boolean mask array which matches any value.

mask(func='and', **query) → pyarrow.lib.ChunkedArray

Return boolean mask array which matches query predicates.

max()

Return max of the values.

maximum(value) → pyarrow.lib.ChunkedArray

Return element-wise maximum of values.

min()

Return min of the values.

minimum(value) → pyarrow.lib.ChunkedArray

Return element-wise minimum of values.

not_equal(value) → pyarrow.lib.ChunkedArray

Return boolean mask array which doesn’t match scalar value.

range(lower=None, upper=None, include_lower=True, include_upper=False) → slice

Return slice within range from a sorted array, by default a half-open interval.
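The effect of `include_lower` and `include_upper` can be illustrated with binary search on a sorted list. This is a sketch under the assumption that the bounds map onto left and right bisection. `sorted_range` is a hypothetical name, and the parameter names and defaults are copied from the signature above.

```python
from bisect import bisect_left, bisect_right


def sorted_range(values, lower=None, upper=None,
                 include_lower=True, include_upper=False) -> slice:
    """Return the slice of a sorted sequence that falls between the bounds."""
    start, stop = 0, len(values)
    if lower is not None:
        # An inclusive lower bound starts at the first element equal to it.
        start = (bisect_left if include_lower else bisect_right)(values, lower)
    if upper is not None:
        # An inclusive upper bound stops after the last element equal to it.
        stop = (bisect_right if include_upper else bisect_left)(values, upper)
    return slice(start, stop)


data = [1, 2, 2, 3, 5, 5, 8]
half_open = data[sorted_range(data, 2, 5)]                  # 2 <= x < 5
closed = data[sorted_range(data, 2, 5, include_upper=True)]  # 2 <= x <= 5
print(half_open, closed)
```

Omitting a bound leaves that side of the range open, so `sorted_range(data)` selects every row.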

sort(reverse=False, length: int = None) → pyarrow.lib.Array

Return sorted values, optimized for fixed length.

sum(exp: int = 1)

Return sum of the values, with optional exponentiation.
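As a worked illustration, assuming "optional exponentiation" means each value is raised to `exp` before summing, `exp=2` would give the sum of squares. `power_sum` is a hypothetical name operating on a plain list.

```python
def power_sum(values, exp: int = 1):
    """Sum the values after raising each one to the power exp."""
    return sum(value ** exp for value in values)


plain = power_sum([1, 2, 3])        # 1 + 2 + 3
squares = power_sum([1, 2, 3], 2)   # 1 + 4 + 9
print(plain, squares)
```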

Table

class graphique.core.Table

Bases: pyarrow.lib.Table

Table interface as a namespace of functions.

apply(name: str, alias: str = '', **partials) → pyarrow.lib.Table

Return view of table with functions applied across columns.

group(name: str, reverse=False, predicate=int, sort=False) → Iterator[pyarrow.lib.Table]

Generate tables grouped by column, with filtering and slicing on table length.

index() → list

Return index column names from pandas metadata.

is_in(name: str, *values) → pyarrow.lib.Table

Return rows which match one of the values.

Assumes the table is sorted by the column name, i.e., indexed.

mask(name: str, **query) → Iterator[pyarrow.lib.Array]

Return mask array which matches query.

not_equal(name: str, value) → pyarrow.lib.Table

Return rows which don’t match the value.

Assumes the table is sorted by the column name, i.e., indexed.

num_chunks() → Optional[int]

Return number of chunks if consistent across columns, else None.
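The consistency check can be sketched by collecting the distinct chunk counts and returning one only when all columns agree. In this sketch a table is modeled as a plain dict from column name to a list of chunks. That model and the name `consistent_num_chunks` are assumptions made for illustration.

```python
from typing import Optional


def consistent_num_chunks(columns: dict) -> Optional[int]:
    """Return the shared chunk count, or None if the columns disagree."""
    counts = {len(chunks) for chunks in columns.values()}
    return counts.pop() if len(counts) == 1 else None


aligned = {"x": [[1, 2], [3]], "y": [["a", "b"], ["c"]]}
ragged = {"x": [[1, 2], [3]], "y": [["a", "b", "c"]]}
print(consistent_num_chunks(aligned), consistent_num_chunks(ragged))
```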

range(name: str, lower=None, upper=None, **includes) → pyarrow.lib.Table

Return rows within range, by default a half-open interval.

Assumes the table is sorted by the column name, i.e., indexed.

sort(*names, reverse=False, length: int = None) → pyarrow.lib.Table

Return table sorted by columns.

take_chunks(indices: pyarrow.lib.ChunkedArray) → pyarrow.lib.Table

Return table with selected rows from a non-offset chunked array.

ChunkedArray.take concatenates the chunks and as such is not performant for grouping. Assumes the shape of the columns is the same.
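A rough sketch of selecting rows chunk by chunk rather than concatenating first. It assumes "non-offset" means each chunk of indices is relative to the start of the matching data chunk, and that the data and the indices have the same number of chunks. The name `take_per_chunk` is hypothetical, and nested lists stand in for chunked arrays.

```python
def take_per_chunk(column_chunks, index_chunks):
    """Select rows within each chunk using indices local to that chunk."""
    return [
        [chunk[i] for i in indices]
        for chunk, indices in zip(column_chunks, index_chunks)
    ]


column = [[10, 20, 30], [40, 50]]  # two chunks of one column
indices = [[2, 0], [1]]            # positions relative to each chunk
taken = take_per_chunk(column, indices)
print(taken)
```

Each chunk is processed on its own, so no combined array is built before selection.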

types() → dict

Return mapping of column types.

unique(name: str, reverse=False) → pyarrow.lib.Table

Return table with first or last occurrences from grouping by column.
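Keeping the first or last occurrence per group can be sketched with a dict that records only the first row seen for each key, scanning backwards when `reverse` is set. Rows are modeled as plain dicts here. The name `unique_rows` and the ordering of the result are assumptions, not documented behavior.

```python
def unique_rows(rows, name: str, reverse: bool = False) -> list:
    """Keep one row per distinct value of rows[i][name]."""
    seen = {}
    # Scanning backwards makes the "first seen" row the last occurrence.
    for row in (reversed(rows) if reverse else rows):
        seen.setdefault(row[name], row)
    return list(seen.values())


rows = [{"k": "a", "v": 1}, {"k": "b", "v": 2}, {"k": "a", "v": 3}]
first = unique_rows(rows, "k")
last = unique_rows(rows, "k", reverse=True)
print(first, last)
```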