Concepts

Dataset

The Dataset entity in the data platform represents a structured collection of data, usually in tabular form. It is a crucial component of data management and processing within the platform, serving as the primary unit for storing, manipulating, and retrieving data.

Properties

  • ID: A unique identifier for the dataset.

  • Schema: Describes the structure of the dataset, including columns, data types, and lengths.

  • Transactions: Each execution creates a transaction in the dataset. Transactions can either overwrite or append to the dataset.

  • Write Behavior: Determines how new data is added to the dataset (e.g., append or overwrite); see the sketch after this list.

  • Source: The origin of the data, such as transforms, other datasets, databases, files, or external APIs.

  • Format: The storage format of the dataset. Currently, datasets are stored in Delta Lake format.
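
The Write Behavior and Transactions properties map naturally onto Delta Lake write modes. The following is a minimal PySpark sketch of how an append versus an overwrite transaction could look when a dataset is stored as a Delta table; the storage path, session configuration, and sample data are illustrative assumptions, not platform APIs.

  from pyspark.sql import SparkSession

  # Assumed storage location; the platform would resolve this from the dataset's ID.
  DATASET_PATH = "/data/datasets/customers"

  # Assumes the delta-spark package is available on the Spark session.
  spark = (
      SparkSession.builder
      .appName("dataset-write-behavior")
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
              "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()
  )

  new_rows = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])

  # Write behavior "append": the execution adds a transaction containing only the new rows.
  new_rows.write.format("delta").mode("append").save(DATASET_PATH)

  # Write behavior "overwrite": the transaction replaces the dataset's current contents.
  # new_rows.write.format("delta").mode("overwrite").save(DATASET_PATH)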


Usage

  • Data Integration: Datasets are created and populated by data-extraction processes and transform executions. They can originate from sources such as relational databases, flat files, or APIs.

  • Data Transformation: Datasets serve as inputs and outputs of data transformation processes, where they undergo operations such as filtering, aggregation, or splitting (see the sketch after this list).

  • Data Analysis: Once transformed, datasets are ready for analysis and can be consumed by data analytics tools or used for further data processing.
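
As noted under Data Transformation above, a transform reads one or more input datasets, applies operations such as filtering and aggregation, and produces an output dataset. Below is a hypothetical PySpark transform body; the function signature, input dataset, and column names are assumptions for illustration only.

  from pyspark.sql import DataFrame
  from pyspark.sql import functions as F

  def transform(orders: DataFrame) -> DataFrame:
      # Hypothetical transform: drop cancelled orders (filtering) and
      # compute total revenue per customer (aggregation).
      return (
          orders
          .filter(F.col("status") != "cancelled")
          .groupBy("customer_id")
          .agg(F.sum("amount").alias("total_revenue"))
      )

The resulting DataFrame would then be written back as a new transaction on the output dataset, using the write behavior configured on that dataset.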


Versioning

Each significant change to a dataset, whether to its schema or its data, should be version controlled to track the dataset's evolution and maintain a history of changes.
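
Because datasets are stored in Delta Lake format, every transaction already produces a new table version, which can serve as the basis for this history. The sketch below uses the open-source delta-spark API to inspect a dataset's transaction history and read an earlier version; the path and version number are assumptions, and the session is assumed to be configured for Delta Lake as in the earlier sketch.

  from delta.tables import DeltaTable
  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("dataset-versioning").getOrCreate()

  DATASET_PATH = "/data/datasets/customers"  # assumed storage location

  # One row per transaction: version number, timestamp, and operation performed.
  DeltaTable.forPath(spark, DATASET_PATH) \
      .history() \
      .select("version", "timestamp", "operation") \
      .show(truncate=False)

  # Time travel: read the dataset as it was at an earlier version.
  snapshot_v3 = spark.read.format("delta").option("versionAsOf", 3).load(DATASET_PATH)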
