# Misata: Synthetic Data Engine for AI Agents

Generate production-quality relational synthetic datasets from a single sentence. Misata is the only MCP server that gives AI agents **deterministic, outcome-conformant synthetic data with a verifiable foreign-key integrity proof**, with no spreadsheet, no hand-coding, and no post-processing.

---

## What it does

Describe the dataset you need in plain English. Misata's closed-form math engine generates every row to hit your declared targets exactly (revenue curves, fraud rates, churn percentages, seasonal peaks) while guaranteeing 100% referential integrity across every relationship in the schema. The same schema and seed always produce the same rows.

No LLM touches the actual data. An LLM is only used to parse your description into a schema; generation itself is pure deterministic math. That means you can use Misata with **no API key** if you already have a schema, and you never have to worry about hallucinated values, biased distributions, or non-reproducible outputs.

---

## Tools

### `generate_from_story`: one sentence to a full dataset

Pass a plain-English description and get back a complete relational dataset with every table populated, every foreign key satisfied, and every declared outcome hit.

```
"500 customers, 2,000 orders across 5 product categories, revenue peaking in Q4, 3% return rate, realistic name/address distribution"
```

Returns a preview of every table (first 20 rows), total row counts, and generation metadata. Requires a Groq, OpenAI, or Anthropic API key for the NL→schema step.

---

### `generate_from_schema`: deterministic generation, no API key needed

Pass a Misata schema dict (tables, column types, relationships, optional outcome curves) and receive generated data plus a per-relationship FK integrity proof. Fully reproducible: same schema + same seed = identical output every time.

Use this when you want full control over the schema, when you're running in an air-gapped environment, or when you need to generate new batches of the same dataset on demand without an LLM in the loop.

---

### `design_schema`: NL description → schema dict

Converts a plain-English description into a Misata schema dict without generating any data. Inspect it, modify it, version-control it, then pass it to `generate_from_schema` whenever you need a fresh batch. Requires an LLM API key.

---

## Why synthetic data in an MCP server?

AI agents that reason about data pipelines, write migrations, build dashboards, or test analytics logic need **real-shaped data to work with**, not hand-rolled fixtures with three rows and no relationships. Misata gives agents:

- **Realistic distributions**: joint name/gender/culture identities, Zipf-law categoricals, geographic distances, rating-conformant text, semantic timestamp profiles
- **Declared outcomes**: tell the engine "15% of orders are returned" and every batch hits exactly 15%, not approximately
- **Referential integrity**: every foreign key is satisfied across every table, verified and reported in the integrity proof
- **Reproducibility**: pin a seed and your agent can regenerate the exact same dataset in any environment, at any time

---

## Supported LLM providers (BYOK)

Configure whichever key you already have. Any one of the three unlocks `generate_from_story` and `design_schema`. `generate_from_schema` always works with no key.

| Provider  | Config field      | Where to get one             |
|-----------|-------------------|------------------------------|
| Groq      | `groqApiKey`      | console.groq.com (free tier) |
| OpenAI    | `openaiApiKey`    | platform.openai.com/api-keys |
| Anthropic | `anthropicApiKey` | console.anthropic.com        |

---

## Common agent workflows

**Schema-first workflow (key needed only for step 1)**

1. Agent calls `design_schema` with a description → receives schema dict
2. Agent reviews / modifies the schema
3. Agent calls `generate_from_schema` → receives data + integrity proof
4. Agent uses the data for testing, seeding, or analysis

**One-shot workflow**

1. Agent calls `generate_from_story` with a single description → complete dataset in one call

**Batch regeneration**

1. Schema is stored in the agent's memory or a file
2. Agent calls `generate_from_schema` with a new seed whenever a fresh batch is needed
3. Zero LLM cost per batch

---

## Limits

- Preview: first 20 rows per table returned inline; full data available via Misata Studio
- Row cap: 5,000 rows per table, 25,000 total per call (contact us for higher limits)
- Generation time: typically 1–8 seconds depending on schema complexity and row count

---

## Open source

Misata's generation engine is MIT-licensed and available on [GitHub](https://github.com/rasinmuhammed/misata) and [PyPI](https://pypi.org/project/misata/). The engine has no runtime dependency on any LLM or external API, so you can run it locally, in CI, or embedded in your own agent.

```bash
pip install misata
```

[Studio](https://misata.studio) · [Docs](https://misata.studio/docs) · [GitHub](https://github.com/rasinmuhammed/misata)
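
An agent can check the row caps listed under Limits before it makes a call. The sketch below is only an illustration: the `requested_rows` mapping (table name to row count) and the `check_row_limits` helper are hypothetical and are not part of Misata's schema format or API. Only the two cap values come from this README.

```python
MAX_ROWS_PER_TABLE = 5_000   # per-table cap stated under Limits
MAX_ROWS_PER_CALL = 25_000   # total-per-call cap stated under Limits


def check_row_limits(requested_rows):
    """Return a list of limit violations (empty when the request fits).

    `requested_rows` maps table name -> requested row count. This mapping
    is a hypothetical helper format, not Misata's schema dict.
    """
    problems = []
    for table, count in requested_rows.items():
        if count > MAX_ROWS_PER_TABLE:
            problems.append(
                f"{table}: {count} rows exceeds the per-table cap of {MAX_ROWS_PER_TABLE}"
            )
    total = sum(requested_rows.values())
    if total > MAX_ROWS_PER_CALL:
        problems.append(
            f"total: {total} rows exceeds the per-call cap of {MAX_ROWS_PER_CALL}"
        )
    return problems
```

For the example request above (500 customers and 2,000 orders), `check_row_limits({"customers": 500, "orders": 2000})` returns an empty list. A request for six tables of 5,000 rows each passes the per-table check but fails the 25,000-row total.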

## How to connect

MCP endpoint: https://server.smithery.ai/misata/misata/mcp

```bash
curl -X POST https://server.smithery.ai/misata/misata/mcp \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}'
```
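
The same request can be built from Python. The curl call sends empty `params`, while the MCP specification defines `protocolVersion`, `capabilities`, and `clientInfo` fields for `initialize`, so this sketch fills them in. It rests on assumptions: the client name and version are placeholders, the protocol version string is one published MCP revision that may not be the one this server negotiates, and the README does not say whether the hosted endpoint needs extra authentication.

```python
import json
import urllib.request

MCP_URL = "https://server.smithery.ai/misata/misata/mcp"  # endpoint from this README


def build_initialize_request(request_id=1, protocol_version="2025-03-26"):
    """Build a JSON-RPC 2.0 `initialize` message for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": protocol_version,  # assumed revision
            "capabilities": {},                   # no optional client features declared
            "clientInfo": {"name": "example-client", "version": "0.1.0"},  # placeholders
        },
    }


def build_http_request(payload, url=MCP_URL):
    """Wrap the payload in a POST request with the same headers as the curl call."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
```

Passing `build_http_request(build_initialize_request())` to `urllib.request.urlopen` sends the request. Given the `Accept` header, the reply may arrive as plain JSON or as an event stream, so a client should be ready to handle both.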