
GraphQL service for Arrow tables and Parquet data sets. The GraphQL schema is derived automatically from the Arrow schema of the data.

Usage

% env PARQUET_PATH=... uvicorn graphique.service:app

Open http://localhost:8000/graphql to try out the API in GraphiQL. A sample data set is available as a test fixture at ./tests/fixtures/zipcodes.parquet.
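GraphiQL is the interactive route, but scripts can send the same queries as a standard GraphQL-over-HTTP POST request. Below is a minimal sketch that uses only the Python standard library. It assumes the default host and port from the command above, and it assumes the root query type exposes the Table fields, so that `{ length }` is a valid query. `build_request` is a hypothetical helper, not part of graphique.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumption: the default uvicorn host and port from the Usage section.
GRAPHQL_URL = "http://localhost:8000/graphql"


def build_request(query: str, variables: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a GraphQL-over-HTTP POST request."""
    payload = {"query": query, "variables": variables or {}}
    return urllib.request.Request(
        GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumption: the root query type exposes Table fields such as `length`.
request = build_request("{ length }")

# Against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```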

Configuration

Graphique uses Starlette’s config, so settings are read from environment variables or a .env file. The config variables are passed as input to ParquetDataset.

  • COLUMNS = None
  • DEBUG = False
  • DICTIONARIES = None
  • INDEX = None
  • MMAP = True
  • PARQUET_PATH
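The sketch below gives a rough idea of how these settings resolve. It mimics Starlette's Config using only the standard library. Environment variables override values from a .env file, and booleans accept true/false/1/0. Two things here are assumptions: treating COLUMNS, DICTIONARIES, and INDEX as comma-separated column names, and the helper functions themselves. `parse_dotenv`, `as_bool`, `as_list`, and `load_settings` are hypothetical and are not graphique's settings module.

```python
from typing import Dict, List, Mapping, Optional

# Hypothetical sketch of Starlette-style settings resolution; not graphique's code.


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip().strip("\"'")
    return values


def as_bool(value: str) -> bool:
    """Accept the same spellings as Starlette's bool cast: true/false/1/0."""
    mapping = {"true": True, "1": True, "false": False, "0": False}
    try:
        return mapping[value.lower()]
    except KeyError:
        raise ValueError(f"invalid boolean: {value!r}") from None


def as_list(value: Optional[str]) -> Optional[List[str]]:
    """Assumption: list settings are comma-separated column names."""
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


def load_settings(environ: Mapping[str, str], file_values: Mapping[str, str]) -> dict:
    """Resolve settings; environment variables take precedence over .env values."""

    def get(key: str) -> Optional[str]:
        return environ.get(key, file_values.get(key))

    debug, mmap = get("DEBUG"), get("MMAP")
    return {
        "PARQUET_PATH": get("PARQUET_PATH"),  # no default in the list above
        "COLUMNS": as_list(get("COLUMNS")),
        "DICTIONARIES": as_list(get("DICTIONARIES")),
        "INDEX": as_list(get("INDEX")),
        "DEBUG": as_bool(debug) if debug is not None else False,
        "MMAP": as_bool(mmap) if mmap is not None else True,
    }


# Illustrative values; the INDEX column names are hypothetical.
settings = load_settings(
    {"INDEX": "state,county"},
    parse_dotenv("PARQUET_PATH=tests/fixtures/zipcodes.parquet\nMMAP=false\n"),
)
```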

Queries

A Table is the primary interface. It has fields for filtering, sorting, and grouping.

"""a column-oriented table"""
type Table {
  """number of rows"""
  length: Long!

  """fields for each column"""
  columns: Columns!

  """Return scalar values at index."""
  row(index: Long! = 0): Row!

  """Return table slice."""
  slice(offset: Long! = 0, length: Long): Table!

  """
  Return tables grouped by columns, with stable ordering.
  `length` is the maximum number of tables to return.
  `count` filters and sorts tables based on the number of rows within each table.
  """
  group(by: [String!]!, reverse: Boolean! = false, length: Long, count: LongReduce): [Table!]!

  """
  Return table of first or last occurrences grouped by columns, with stable ordering.
  """
  unique(by: [String!]!, reverse: Boolean! = false): Table!

  """Return table slice sorted by specified columns."""
  sort(by: [String!]!, reverse: Boolean! = false, length: Long): Table!

  """Return table with minimum values per column."""
  min(by: [String!]!): Table!

  """Return table with maximum values per column."""
  max(by: [String!]!): Table!

  """
  Return table with rows which match all (by default) queries.
  `invert` optionally excludes matching rows.
  `reduce` is the binary operator to combine filters; within a column all predicates must match.
  """
  filter(query: Filters!, invert: Boolean! = false, reduce: Operator! = AND): Table!
}
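For intuition, the semantics of the main Table fields can be approximated in plain Python over a list of row dicts. This is only a sketch: graphique operates on Arrow columns, and the sample rows are made up. `count` is modeled here as a simple predicate on group size, whereas the real LongReduce input also sorts. `filter_rows` is a hypothetical stand-in for `filter`, with `combine=all` playing the role of AND and `combine=any` the role of OR.

```python
import heapq
from typing import Any, Callable, Dict, Iterable, List, Optional, Sequence

Row = Dict[str, Any]


def _key(row: Row, by: Sequence[str]) -> tuple:
    return tuple(row[name] for name in by)


def group(rows: List[Row], by: Sequence[str], length: Optional[int] = None,
          count: Optional[Callable[[int], bool]] = None) -> List[List[Row]]:
    """Split rows into tables per key, ordered by first appearance (stable)."""
    groups: Dict[tuple, List[Row]] = {}
    for row in rows:
        groups.setdefault(_key(row, by), []).append(row)  # dicts keep insertion order
    tables = list(groups.values())
    if count is not None:  # simplified: only filters on group size
        tables = [table for table in tables if count(len(table))]
    return tables[:length]  # length=None keeps every table


def unique(rows: List[Row], by: Sequence[str], reverse: bool = False) -> List[Row]:
    """Keep the first (or, with reverse, the last) row per key, in original row order."""
    chosen: Dict[tuple, int] = {}
    indices = range(len(rows) - 1, -1, -1) if reverse else range(len(rows))
    for index in indices:
        chosen.setdefault(_key(rows[index], by), index)
    return [rows[index] for index in sorted(chosen.values())]


def sort(rows: List[Row], by: Sequence[str], reverse: bool = False,
         length: Optional[int] = None) -> List[Row]:
    """Sort rows by the `by` columns; with `length`, select only the leading rows."""

    def key(row: Row) -> tuple:
        return _key(row, by)

    if length is None:
        return sorted(rows, key=key, reverse=reverse)
    # Partial sort: heapq selects `length` rows without fully sorting the rest.
    select = heapq.nlargest if reverse else heapq.nsmallest
    return select(length, rows, key=key)


def filter_rows(rows: List[Row], query: Dict[str, List[Callable[[Any], bool]]],
                invert: bool = False,
                combine: Callable[[Iterable[bool]], bool] = all) -> List[Row]:
    """Keep rows matching the query; within a column, all predicates must match."""

    def matches(row: Row) -> bool:
        per_column = (all(test(row[name]) for test in tests) for name, tests in query.items())
        return combine(per_column)  # all ~ AND, any ~ OR

    return [row for row in rows if matches(row) != invert]


# Made-up sample rows for illustration.
rows = [
    {"state": "NY", "city": "Albany", "pop": 98},
    {"state": "CA", "city": "Fresno", "pop": 542},
    {"state": "NY", "city": "Buffalo", "pop": 278},
]
```

The `heapq` selection matches `sorted(...)[:length]`, including the order of ties, so a bounded sort stays stable while avoiding a full sort.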

Performance

Graphique relies on native pyarrow routines wherever possible. Otherwise, it falls back to NumPy, using zero-copy views. Graphique also has custom optimizations for grouping, dictionary-encoded arrays, and chunked arrays.
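Dictionary encoding, as in pyarrow's `DictionaryArray`, stores each distinct value once, plus a small integer index per row. Once data is encoded, grouping can bucket rows by integer code instead of hashing full values. The standard-library sketch below illustrates that idea. It is not graphique's implementation, and both function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dictionary_encode(values: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct value once, plus one integer index per row."""
    dictionary: List[str] = []
    positions: Dict[str, int] = {}
    indices: List[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def group_encoded(dictionary: List[str], indices: List[int]) -> Dict[str, List[int]]:
    """Bucket row positions by integer code; each value is decoded once per group."""
    buckets: List[List[int]] = [[] for _ in dictionary]
    for row, code in enumerate(indices):
        buckets[code].append(row)  # list indexing instead of hashing the value
    return {dictionary[code]: rows for code, rows in enumerate(buckets)}


dictionary, indices = dictionary_encode(["NY", "CA", "NY", "TX", "CA"])
groups = group_encoded(dictionary, indices)
```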

Specifying an INDEX of columns indicates that the table is sorted by those columns, and enables a binary search interface.

  """
  Return table with matching values for compound `index`.
  Queries must be a prefix of the `index`.
  Only one non-equal query is allowed, and applied last.
  """
  search(...): Table!
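Because the table is sorted by the INDEX columns, each equality query narrows a contiguous row range by binary search on the next index column. A final range query narrows that range once more. The sketch below shows the idea with `bisect` over a sorted list of index tuples. `search`, its `lower`/`upper` parameters (lower inclusive, upper exclusive), and the `_Column` view are all hypothetical and do not reflect graphique's actual signature.

```python
from bisect import bisect_left, bisect_right
from typing import Any, Optional, Sequence, Tuple


class _Column:
    """Zero-copy view of one position across sorted index tuples, for bisect."""

    def __init__(self, keys: Sequence[Tuple[Any, ...]], depth: int) -> None:
        self.keys, self.depth = keys, depth

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, index: int) -> Any:
        return self.keys[index][self.depth]


def search(keys: Sequence[Tuple[Any, ...]], equals: Sequence[Any] = (),
           lower: Optional[Any] = None, upper: Optional[Any] = None) -> range:
    """Return the row range matching a prefix of equality queries on a sorted index,
    then at most one range query [lower, upper) on the next index column."""
    start, stop = 0, len(keys)
    # Equality queries must form a prefix: query i applies to index column i.
    for depth, value in enumerate(equals):
        column = _Column(keys, depth)
        start, stop = (bisect_left(column, value, start, stop),
                       bisect_right(column, value, start, stop))
    # The single non-equal query is applied last, on the column after the prefix.
    if lower is not None or upper is not None:
        column = _Column(keys, len(equals))
        if lower is not None:
            start = bisect_left(column, lower, start, stop)
        if upper is not None:
            stop = bisect_left(column, upper, start, stop)
    return range(start, stop)


# Hypothetical (state, city) index, already sorted.
keys = [("CA", "Fresno"), ("NY", "Albany"), ("NY", "Buffalo"), ("NY", "Yonkers"), ("TX", "Austin")]
```

Each step costs O(log n) comparisons, and every query only shrinks the current `[start, stop)` range. This is why the queries have to follow the order of the index.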

Installation

% pip install graphique

Dependencies

  • pyarrow >=2
  • strawberry-graphql >=0.30
  • pytz (optional timestamp support)

Tests

100% branch coverage.

% pytest [--cov]

Changes

dev

  • ListColumn and StructColumn types
  • Groups type with aggregate field
  • group and unique optimized
  • pyarrow >= 2 required