DataStori FAQs

How DataStori works | Data security and privacy | Advanced features

How DataStori works

Where is DataStori hosted?

DataStori is hosted on Amazon Web Services (AWS) servers in the US East Region.

How does DataStori execute data pipelines?

DataStori manages data pipelines from its own cloud (AWS US East), while the pipelines themselves are executed in the customer's cloud. Customer data therefore remains in the customer's cloud: it is never exposed to DataStori, nor does it leave the customer's IT environment at any stage of the process.

How does DataStori extract data from source applications?

DataStori fetches data from a source application by connecting to its APIs, or by reading CSV files exported to an email or SFTP folder. DataStori can also connect to an application database.
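API-based extraction typically walks an endpoint page by page. The sketch below is illustrative only: the cursor-style pagination, the `records` and `next_cursor` fields, and the injected `fetch_page` callable are assumptions, not DataStori's actual connector interface.

```python
# Illustrative sketch of cursor-paginated API extraction. The page shape
# ({"records": [...], "next_cursor": ...}) is hypothetical; real source
# applications each define their own pagination scheme.
from typing import Callable, Dict, Iterator, Optional

def extract_records(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[dict]:
    """Yield records page by page until the API stops returning a cursor."""
    cursor = None
    while True:
        page = fetch_page(cursor)          # e.g. GET /records?cursor=...
        yield from page["records"]
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Injecting `fetch_page` keeps the pagination loop independent of any one application's HTTP client or authentication details.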

Do customers need to buy servers, storage or any other software to run DataStori?

No, customers do not need to buy any IT infrastructure upfront. To run data pipelines on DataStori, a cloud account with AWS, Azure or GCP is needed. This account spins up servers on-demand to run data pipelines and shuts them down after execution. It also provisions other resources like storage and security as defined by the customer, which can be licensed directly from the cloud services provider.
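The spin-up/shut-down lifecycle can be pictured as a simple guarded scope. In this sketch, `start_server` and `stop_server` are placeholders for cloud provider calls (for example EC2 RunInstances/TerminateInstances); this is not a real DataStori API.

```python
# Conceptual sketch of the on-demand lifecycle: provision, run, tear down.
# start_server/stop_server are placeholder callables standing in for the
# customer's cloud provider APIs.
from contextlib import contextmanager

@contextmanager
def ephemeral_server(start_server, stop_server):
    server_id = start_server()       # provision compute in the customer's cloud
    try:
        yield server_id              # run the data pipeline on this server
    finally:
        stop_server(server_id)       # always shut down, even if the run fails
```

The `finally` clause captures the key cost property: compute is released after every execution, successful or not.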

Is there any size limit on the data DataStori can handle?

No, DataStori is designed to scale infrastructure on-demand. It has created and run data pipelines on tables of up to 100 GB and 30 million rows.

How many data pipelines can a user run?

Users can set up and run as many data pipelines as they want. The number of data pipelines is only constrained by the parameters defined by the user's cloud services provider.

How does DataStori charge and what are the costs of running data pipelines?

DataStori charges based on the number of application instances that are connected. This fee has two components - an upfront onboarding charge and a monthly maintenance and support fee. DataStori does not charge based on the size of data ingested or the number of pipelines created and executed; these are part of the infrastructure costs that customers pay directly to their cloud services provider.

Where does DataStori write the final data?

By default, DataStori creates a lakehouse in the customer's cloud and follows the Medallion architecture for data management. Files are written in the Delta format and pushed to the data warehouse of the customer's choice. Users can consume data directly from the lakehouse or from the warehouse. In addition to Delta, DataStori can store data in Iceberg, Parquet or CSV formats in the lakehouse.
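The Medallion idea is a progression from raw (bronze) through cleaned (silver) to aggregated (gold) data. The toy functions below illustrate that layering only; real lakehouse tables are Delta files, and the column names (`id`, `customer`, `amount`) are invented for the example.

```python
# Minimal illustration of Medallion layering (bronze -> silver -> gold).
# Plain lists of dicts stand in for Delta tables; all field names are
# hypothetical.
def to_silver(bronze_rows):
    """Clean: drop rows missing an id, dedupe by id (keep last seen)."""
    by_id = {}
    for row in bronze_rows:
        if row.get("id") is not None:
            by_id[row["id"]] = row
    return list(by_id.values())

def to_gold(silver_rows):
    """Aggregate: total amount per customer, ready for reporting."""
    totals = {}
    for row in silver_rows:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return totals
```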

How is DataStori different from other ELT providers in the market?

With DataStori, customer data never leaves their cloud. In addition, DataStori can create data pipelines from an API specification, for any cloud application that a customer uses. This makes it easy for customers to access thousands of cloud applications that are not served by other connectors.

Does DataStori perform any data transformations?

Data transformations in DataStori are limited to deduping and flattening data and encrypting selected columns. Business transformations, analytics and reporting are downstream and not in scope.
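Two of those in-scope transformations can be sketched in a few lines. Note the assumptions: SHA-256 hashing stands in for the column-encryption step (DataStori's actual encryption scheme is not described here), and the underscore-joined key naming for flattening is an illustrative convention.

```python
# Sketch of flattening nested records and protecting selected columns.
# One-way SHA-256 hashing is used as a stand-in for column encryption.
import hashlib

def flatten(record, parent="", sep="_"):
    """Turn {"a": {"b": 1}} into {"a_b": 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

def protect(records, columns):
    """Replace values in sensitive columns with a one-way hash."""
    out = []
    for rec in records:
        rec = dict(rec)
        for col in columns:
            if col in rec:
                rec[col] = hashlib.sha256(str(rec[col]).encode()).hexdigest()
        out.append(rec)
    return out
```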

Data security and privacy

Do users need to grant DataStori access to their cloud?

Yes, users need to grant limited permissions to DataStori to spin up servers in their cloud. Please contact us or refer to the documentation on our website for more information.

Are API and application credentials secure in DataStori?

Yes, all credentials are secure with DataStori. We have implemented security safeguards in DataStori including data encryption, virtual network isolation, multi-factor authentication, and detailed alerting and logging. For more details, see the DataStori Information Security Guide on our website.

Can the DataStori admin see user credentials or data?

No, user credentials are encrypted and stored in the application database. They are not human readable when stored.

How does DataStori ensure data security and confidentiality? Does it comply with the customer's security policies?

At all times, customer data resides and moves within the customer's environment - source applications, SharePoint, email, SFTP folders, and destination storage. Data storage and movement are governed by the customer's information security policies, as are access and authentication. With DataStori, customer data is as secure as they want it to be.

Can DataStori encrypt data before it is stored in the warehouse?

Yes, based on user input, DataStori can encrypt selected columns or drop them from the final dataset.

Can DataStori view business data?

No, DataStori cannot view business data. While DataStori orchestrates data pipelines from its cloud, the data movement from source application to storage destination is entirely within the customer's cloud. DataStori can only create and access pipeline setup and execution metadata.

Advanced features

How does DataStori handle schema changes?

In DataStori, schema evolution and tracking are automated. In addition, data and schema changes can be rolled back to a defined restore point if required.
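Automated schema tracking starts with detecting drift between the last known schema and an incoming batch. The sketch below compares two name-to-type mappings; how DataStori represents schemas internally is not public, so this is an assumption-level illustration.

```python
# Illustrative schema-drift check: compare an incoming batch's columns
# against the last known schema (both as name -> type-string dicts).
def schema_diff(known, incoming):
    added = {c: t for c, t in incoming.items() if c not in known}
    removed = {c: t for c, t in known.items() if c not in incoming}
    changed = {c: (known[c], incoming[c])
               for c in known.keys() & incoming.keys()
               if known[c] != incoming[c]}
    return {"added": added, "removed": removed, "changed": changed}
```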

How does DataStori ensure data quality?

DataStori runs the following checks on all ingested data for every pipeline execution:
- Data freshness test, to check when the data was last refreshed
- Primary key not null test
- Primary key uniqueness test, to ensure that there are no duplicates in the primary key.
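The three checks above can be sketched over a list of row dicts. The 24-hour freshness threshold and the column names are illustrative defaults, not DataStori's configuration.

```python
# Sketch of the three data quality checks: freshness, not-null primary
# key, and primary key uniqueness. Thresholds and names are illustrative.
from datetime import datetime, timedelta, timezone

def freshness_test(last_refreshed, max_age=timedelta(hours=24)):
    """Pass if the data was refreshed within the allowed window."""
    return datetime.now(timezone.utc) - last_refreshed <= max_age

def not_null_test(rows, key):
    """Pass if no row is missing the primary key."""
    return all(row.get(key) is not None for row in rows)

def uniqueness_test(rows, key):
    """Pass if there are no duplicate primary keys."""
    keys = [row[key] for row in rows]
    return len(keys) == len(set(keys))
```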

In addition, DataStori has automated retries, logging and alerts built in to make data pipelines more robust.

How is concurrency handled in DataStori?

By default, all data pipelines in DataStori have a concurrency of 1, i.e. at any given time only one instance of a pipeline can run while the rest are queued. In addition, the final data is written in the Delta format, which provides ACID guarantees.
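Concurrency-of-1 is equivalent to a per-pipeline mutual exclusion lock: a second run of the same pipeline waits until the first finishes. The sketch below shows the pattern in-process; DataStori's actual queueing is an orchestration-level mechanism, not this code.

```python
# Per-pipeline mutual exclusion: at most one run of a given pipeline at a
# time; concurrent runs of the same pipeline block (queue) on its lock.
import threading

pipeline_locks = {}
registry_lock = threading.Lock()

def run_exclusive(pipeline_id, run):
    with registry_lock:
        lock = pipeline_locks.setdefault(pipeline_id, threading.Lock())
    with lock:                 # a second instance queues here until released
        return run()
```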

Which API authentication protocols are handled by DataStori?

DataStori supports the following API authentication mechanisms:
1. API Key
2. Basic Authentication
3. OAuth2 - Client Credentials and Authorization Code grant flows

In addition, DataStori can be extended to support custom authentication flows that a source application may require.
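For reference, the three schemes differ mainly in the request header they produce. The header names below follow common conventions (`X-API-Key` in particular varies widely by application) and are assumptions, not DataStori's connector code; the OAuth2 token exchange itself is out of scope of this sketch.

```python
# Building request headers for the three supported auth schemes.
# Header names are conventional examples; real applications vary.
import base64

def api_key_header(key):
    return {"X-API-Key": key}

def basic_auth_header(user, password):
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

def bearer_header(access_token):
    # access_token is obtained beforehand via an OAuth2 client-credentials
    # or authorization-code exchange with the provider's token endpoint
    return {"Authorization": f"Bearer {access_token}"}
```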

Still have questions? Check the product documentation or write to contact@datastori.io