DataStori is hosted on Amazon Web Services (AWS) servers in the US East Region.
DataStori fetches data from a source application by connecting to its APIs, or by reading CSV files exported to an email inbox or an SFTP folder. DataStori can also connect directly to an application database.
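For illustration, a minimal sketch of the API ingestion path, assuming a hypothetical paginated REST endpoint (the URL, key and paging scheme are placeholders, not DataStori connector code):

```python
import requests

# Hypothetical source API and credentials, for illustration only.
BASE_URL = "https://api.example.com/v1/records"
API_KEY = "your-api-key"

def fetch_all(url: str, api_key: str) -> list[dict]:
    """Page through a REST endpoint, assuming it returns a JSON array per page."""
    rows, page = [], 1
    while True:
        resp = requests.get(
            url,
            headers={"Authorization": f"Bearer {api_key}"},
            params={"page": page},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:  # an empty page signals the end of the data
            break
        rows.extend(batch)
        page += 1
    return rows

records = fetch_all(BASE_URL, API_KEY)
```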
No, customers do not need to buy any IT infrastructure upfront. To run data pipelines on DataStori, a cloud account with AWS, Azure or GCP is needed. This account spins up servers on demand to run data pipelines and shuts them down after execution. It also provisions other resources, such as storage and security services, as defined by the customer; these are licensed directly from the cloud services provider.
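For illustration, a minimal sketch of the spin-up-and-shut-down pattern on AWS using boto3 (the AMI ID and instance type are placeholders; DataStori manages this lifecycle automatically):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a worker on demand (placeholder AMI and instance type).
launched = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="t3.large",
    MinCount=1,
    MaxCount=1,
)
instance_id = launched["Instances"][0]["InstanceId"]

# ... the data pipeline runs on the instance here ...

# Terminate the worker once the pipeline execution finishes.
ec2.terminate_instances(InstanceIds=[instance_id])
```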
No, DataStori is designed to scale infrastructure on demand. It has created and run data pipelines on tables as large as 100 GB and 30 million rows.
Users can set up and run as many data pipelines as they want. The number of data pipelines is constrained only by the quotas and limits of the user's cloud services provider.
DataStori charges based on the number of connected application instances. This fee has two components - an upfront charge for onboarding, and a monthly charge for maintenance and support. DataStori does not charge for the size of data ingested or the number of pipelines created and executed - these are part of the infrastructure costs that customers pay directly to their cloud services provider.
By default, DataStori creates a lakehouse in the customer's cloud and follows the Medallion architecture for data management. Files are written in the Delta format and pushed to the data warehouse of the customer's choice. Users can consume data directly from the lakehouse or from the warehouse. In addition to Delta, DataStori can store data in the lakehouse in Iceberg, Parquet or CSV formats.
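For illustration, a minimal sketch of a Delta write and read using the open-source deltalake package (the table path and sample data are placeholders; in production the path would point to the customer's cloud storage, and DataStori performs this step as part of the pipeline):

```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Placeholder table path; in the lakehouse this would be cloud storage.
table_path = "customer-lakehouse/silver/orders"

df = pd.DataFrame({"order_id": [1, 2], "amount": [99.5, 12.0]})

# Append the batch to the Delta table; each write creates a new version.
write_deltalake(table_path, df, mode="append")

# Consumers can read the same table directly from the lakehouse.
orders = DeltaTable(table_path).to_pandas()
```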
With DataStori, customer data never leaves their cloud. In addition, DataStori can create data pipelines from an API specification for any cloud application that a customer uses. This makes it easy for customers to access thousands of cloud applications that are not served by other connectors.
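For illustration, a minimal sketch of how ingestible endpoints can be discovered from an OpenAPI specification (the spec file is a placeholder; DataStori's spec-to-pipeline generation is a product feature, not user code):

```python
import json

# Placeholder OpenAPI document for a hypothetical source application.
with open("openapi.json") as f:
    spec = json.load(f)

# Collect every GET endpoint as a candidate ingestion source.
get_endpoints = [
    path for path, methods in spec.get("paths", {}).items() if "get" in methods
]
print(get_endpoints)
```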
Data transformations in DataStori are limited to deduplicating and flattening data and encrypting selected columns. Business transformations, analytics and reporting are downstream and not in scope.
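For illustration, a minimal sketch of the deduplication and flattening steps using pandas (the sample data and primary key are placeholders; column encryption is illustrated in the credentials answer below):

```python
import pandas as pd

# Nested rows as they might arrive from a source API (illustrative data).
raw = [
    {"id": 1, "customer": {"name": "Acme", "region": "US"}},
    {"id": 1, "customer": {"name": "Acme", "region": "US"}},  # duplicate
    {"id": 2, "customer": {"name": "Globex", "region": "EU"}},
]

# Flatten nested objects into columns (customer.name, customer.region).
df = pd.json_normalize(raw)

# Dedupe on the primary key, keeping the first occurrence.
df = df.drop_duplicates(subset=["id"])
```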
Yes, users need to grant limited permissions to DataStori to spin up servers in their cloud. Please contact us or refer to the documentation on our website for more information.
Yes, all credentials are secure with DataStori. We have implemented security safeguards including data encryption, virtual network isolation, Multi-Factor Authentication, and detailed alerting and logging. For more details, see the DataStori Information Security Guide on our website.
No, user credentials are encrypted before being stored in the application database. They are not human-readable when stored.
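As an illustration of this pattern, a minimal sketch using symmetric encryption with the cryptography package (the key handling and cipher choice here are assumptions, not DataStori's exact internal scheme):

```python
from cryptography.fernet import Fernet

# Illustrative only: in practice the key lives in a secrets manager, not in code.
key = Fernet.generate_key()
cipher = Fernet(key)

token = cipher.encrypt(b"client_secret_abc123")  # this ciphertext is what gets stored
print(token)                                     # not human-readable

plaintext = cipher.decrypt(token)                # recovered only at pipeline run time
```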
At all times, customer data resides and moves within their environment - source applications, SharePoint, email, SFTP folders, and destination storage. Data storage and movement are governed by the customer's information security policies, as are access and authentication. With DataStori, customer data is as secure as they want it to be.
Yes, based on user input, DataStori can either encrypt the columns or drop them from the final dataset.
No, DataStori cannot view business data. While DataStori orchestrates data pipelines from its cloud, the data movement from source application to storage destination is entirely within the customer's cloud. DataStori can only create and access pipeline setup and execution metadata.
In DataStori, schema evolution and tracking are automated. In addition, data and schema changes can be rolled back to a defined restore point if required.
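As an illustration of the underlying capability, Delta tables keep a version history that supports time travel; a minimal sketch with the open-source deltalake package (the table path is a placeholder, and DataStori's restore-point mechanism is a product feature built on top of this):

```python
from deltalake import DeltaTable

# Placeholder table path; a new version is recorded on every write.
table_path = "customer-lakehouse/silver/orders"

dt = DeltaTable(table_path)
print(dt.version())  # current version number
print(dt.schema())   # current (evolved) schema

# Time travel: read the table as it was at an earlier version.
snapshot = DeltaTable(table_path, version=0).to_pandas()
```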
DataStori runs the following checks on all ingested data for every pipeline execution:
- Data freshness test, to check when the data was last refreshed
- Primary key not-null test, to ensure that the primary key contains no nulls
- Primary key uniqueness test, to ensure that there are no duplicates in the primary key
In addition, DataStori has automated retries, logging and alerts built in to make data pipelines more robust.
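For illustration, a minimal sketch of these three checks with pandas (the freshness threshold, column names and failure handling are assumptions):

```python
import pandas as pd

def run_quality_checks(df: pd.DataFrame, pk: str, ts_col: str,
                       max_age_hours: int = 24) -> None:
    """Illustrative versions of the three checks; thresholds are assumptions."""
    # Freshness: the newest record must be recent enough.
    age = pd.Timestamp.now(tz="UTC") - pd.to_datetime(df[ts_col], utc=True).max()
    assert age <= pd.Timedelta(hours=max_age_hours), "Data freshness check failed"

    # Primary key not null.
    assert df[pk].notna().all(), "Primary key contains nulls"

    # Primary key uniqueness.
    assert df[pk].is_unique, "Primary key contains duplicates"

# Example usage with freshly generated placeholder data.
df = pd.DataFrame({
    "id": [1, 2],
    "updated_at": [pd.Timestamp.now(tz="UTC")] * 2,
})
run_quality_checks(df, pk="id", ts_col="updated_at")
```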
By default, all data pipelines in DataStori have a concurrency of 1, i.e. only one instance of a pipeline can run at a time and the rest are queued. In addition, final data is written in the Delta format, which is ACID-compliant.
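As an illustration of the concurrency-of-1 pattern, a minimal sketch that serializes runs with the third-party filelock package (DataStori's scheduler enforces this internally; the lock file and stub function are hypothetical):

```python
from filelock import FileLock, Timeout

def run_pipeline() -> None:
    print("pipeline running")  # stand-in for the real pipeline body

# One lock file per pipeline serializes executions (concurrency = 1).
lock = FileLock("orders_pipeline.lock")

try:
    # Non-blocking attempt: if another instance holds the lock, queue instead.
    with lock.acquire(timeout=0):
        run_pipeline()
except Timeout:
    print("another instance is running; this run is queued")
```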
DataStori supports the following API authentication mechanisms:
1. API Key
2. Basic Authentication
3. OAuth2 - Client Credentials and Authorization Code grant flows
In addition, DataStori can be extended to support custom authentication flows that a source application may require.
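For illustration, here is how each mechanism typically looks with the Python requests library (all URLs and credentials are placeholders):

```python
import requests

# 1. API key, typically sent as a header.
requests.get("https://api.example.com/data",
             headers={"X-API-Key": "my-key"}, timeout=30)

# 2. Basic Authentication.
requests.get("https://api.example.com/data",
             auth=("username", "password"), timeout=30)

# 3. OAuth2 Client Credentials: exchange client id/secret for a token.
token = requests.post(
    "https://auth.example.com/oauth/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "my-client-id",
        "client_secret": "my-client-secret",
    },
    timeout=30,
).json()["access_token"]

requests.get("https://api.example.com/data",
             headers={"Authorization": f"Bearer {token}"}, timeout=30)
```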