Core utilities that add pandas-esque features to arrow arrays and tables.

Arrow forbids subclassing, so the classes serve only as logical groupings. Their methods are called as functions, with the arrow array or table passed as the first argument.

Column

class graphique.core.Column

Bases: pyarrow.lib.ChunkedArray

Chunked array interface as a namespace of functions.

call(func: Callable, *args) → pyarrow.lib.ChunkedArray: Call compute function on array with support for dictionaries.

equal(value) → pyarrow.lib.ChunkedArray: Return boolean mask array which matches scalar value.

find(*values) → Iterator[slice]: Generate slices of matching rows from a sorted array.
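
Because the array is sorted, each value occupies one contiguous run that binary search can locate. The following is a minimal sketch of that idea on a plain Python list using the standard `bisect` module; `find_slices` is a hypothetical name, and the library may search the arrow data differently.

```python
from bisect import bisect_left, bisect_right


def find_slices(sorted_values, *values):
    """Yield one slice per value, covering its run in the sorted input."""
    for value in values:
        start = bisect_left(sorted_values, value)
        stop = bisect_right(sorted_values, value)
        yield slice(start, stop)


data = [1, 1, 2, 3, 3, 3]
slices = list(find_slices(data, 1, 3))
print([data[s] for s in slices])
```

A value that is absent produces an empty slice rather than an error.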

is_in(values, invert=False) → pyarrow.lib.ChunkedArray: Return boolean mask array which matches any value.

mask(func='and', **query) → pyarrow.lib.ChunkedArray: Return boolean mask array which matches query predicates.

maximum(value) → pyarrow.lib.ChunkedArray: Return element-wise maximum of values.

minimum(value) → pyarrow.lib.ChunkedArray: Return element-wise minimum of values.

not_equal(value) → pyarrow.lib.ChunkedArray: Return boolean mask array which doesn’t match scalar value.

range(lower=None, upper=None, include_lower=True, include_upper=False) → slice: Return slice within range from a sorted array, by default a half-open interval.
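
The inclusion flags decide which side of a run of equal values each bound lands on. A minimal sketch on a sorted Python list with `bisect` follows; `range_slice` is a hypothetical stand-in, not the library's implementation.

```python
from bisect import bisect_left, bisect_right


def range_slice(values, lower=None, upper=None,
                include_lower=True, include_upper=False):
    """Return the slice of a sorted sequence that lies within the bounds."""
    start, stop = 0, len(values)
    if lower is not None:
        # Including the lower bound means starting before its first match.
        start = (bisect_left if include_lower else bisect_right)(values, lower)
    if upper is not None:
        # Including the upper bound means stopping after its last match.
        stop = (bisect_right if include_upper else bisect_left)(values, upper)
    return slice(start, stop)


data = [1, 2, 2, 3, 4]
print(data[range_slice(data, 2, 4)])
print(data[range_slice(data, 2, 4, include_upper=True)])
```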

sort(reverse=False, length: int = None) → pyarrow.lib.Array: Return sorted values, optimized for fixed length.

sum(exp: int = 1): Return sum of the values, with optional exponentiation.

Table

class graphique.core.Table

Bases: pyarrow.lib.Table

Table interface as a namespace of functions.

apply(name: str, alias: str = '', **partials) → pyarrow.lib.Table: Return view of table with functions applied across columns.

group(name: str, reverse=False, predicate=int, sort=False) → Iterator[pyarrow.lib.Table]: Generate tables grouped by column, with filtering and slicing on table length.

is_in(name: str, *values) → pyarrow.lib.Table

Return rows which match one of the values.

Assumes the table is sorted by the column name, i.e., indexed.

mask(name: str, **query) → Iterator[pyarrow.lib.Array]: Return mask array which matches query.

not_equal(name: str, value) → pyarrow.lib.Table

Return rows which don’t match the value.

Assumes the table is sorted by the column name, i.e., indexed.

num_chunks() → Optional[int]: Return number of chunks if consistent across columns, else None.

range(name: str, lower=None, upper=None, **includes) → pyarrow.lib.Table

Return rows within range, by default a half-open interval.

Assumes the table is sorted by the column name, i.e., indexed.

sort(*names, reverse=False, length: int = None) → pyarrow.lib.Table: Return table sorted by columns.

take_chunks(indices: pyarrow.lib.ChunkedArray) → pyarrow.lib.Table

Return table with selected rows from a non-offset chunked array.

ChunkedArray.take concatenates the chunks and as such is not performant for grouping. Assumes the shape of the columns is the same.

unique(name: str, reverse=False) → pyarrow.lib.Table: Return table with first or last occurrences from grouping by column.