Further performance improvements for build_asset_reconciliation_sensor.
Dagster now allows you to backfill asset selections that include mapped partition definitions, such as a daily asset which rolls up into a weekly asset, as long as the root assets in your selection share a partition definition.
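As an illustration of the mapped relationship described above, the following plain-Python sketch shows how one weekly partition can correspond to seven daily partitions. It is not Dagster's implementation: the helper name `daily_keys_for_week` and the ISO date key format are assumptions made for this example.

```python
from datetime import date, timedelta
from typing import List


def daily_keys_for_week(week_start: str) -> List[str]:
    """Return the seven daily partition keys covered by the week starting at week_start.

    Hypothetical helper: keys are assumed to be ISO dates (YYYY-MM-DD).
    """
    start = date.fromisoformat(week_start)
    return [(start + timedelta(days=i)).isoformat() for i in range(7)]


keys = daily_keys_for_week("2023-01-01")
print(keys[0], keys[-1])
```

Under this sketch, a backfill of one weekly partition implies that all seven upstream daily partitions are in scope.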
Dagit now includes information about the cause of an asset’s staleness.
Improved the error message for non-matching cron schedules in TimeWindowPartitionMappings with offsets. (Thanks Sean Han!)
[dagster-aws] The EcsRunLauncher now allows you to configure the runtimePlatform field for the task definitions of the runs that it launches, allowing it to launch runs using Windows Docker images.
[dagster-azure] Added support for DefaultAzureCredential in adls2_resource. (Thanks Martin Picard!)
[dagster-databricks] Added op factories to create ops for running existing Databricks jobs (create_databricks_run_now_op), as well as submitting one-off Databricks jobs (create_databricks_submit_run_op). See the new Databricks guide for more details.
[dagster-duckdb-polars] Added a dagster-duckdb-polars library that includes a DuckDBPolarsTypeHandler for use with build_duckdb_io_manager, which allows loading / storing Polars DataFrames from/to DuckDB. (Thanks Pezhman Zarabadi-Poor!)
[dagster-gcp-pyspark] New PySpark TypeHandler for the BigQuery I/O manager. Store and load your PySpark DataFrames in BigQuery using bigquery_pyspark_io_manager.
[dagster-snowflake][dagster-duckdb] The Snowflake and DuckDB IO managers can now load multiple partitions in a single step - e.g. when a non-partitioned asset depends on a partitioned asset or a single partition of an asset depends on multiple partitions of an upstream asset. Loading occurs using a single SQL query and returns a single DataFrame.
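The idea of fetching several partitions with one query can be sketched with the standard-library `sqlite3` module. This is only an illustration of the pattern; the actual SQL issued by the Snowflake and DuckDB I/O managers is not shown in these notes, and the table name, column name, and `load_partitions` helper are hypothetical.

```python
import sqlite3


def load_partitions(conn, table, partition_col, partition_keys):
    """Load the rows of several partitions with a single parameterized query.

    `table` and `partition_col` are treated as trusted identifiers in this
    sketch; only the partition keys are bound as query parameters.
    """
    placeholders = ", ".join("?" for _ in partition_keys)
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {partition_col} IN ({placeholders}) "
        f"ORDER BY {partition_col}"
    )
    return conn.execute(query, list(partition_keys)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2023-01-01", 1), ("2023-01-02", 2), ("2023-01-03", 3)],
)

rows = load_partitions(conn, "events", "day", ["2023-01-01", "2023-01-02"])
print(rows)
```

The downstream step receives one combined result rather than one result per partition.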
[dagster-k8s] The Helm chart now supports the full Kubernetes env var spec for user code deployments.
Previously, if an AssetSelection which matched no assets was passed into define_asset_job, the resulting job would target all assets in the repository. This has been fixed.
Fixed a bug that caused the UI to show an error if you tried to preview a future schedule tick for a schedule built using build_schedule_from_partitioned_job.
When a non-partitioned non-asset job has an input that comes from a partitioned SourceAsset, we now load all partitions of that asset.
Updated the fs_io_manager to store multipartitioned materializations in directory levels by dimension. This resolves a bug on Windows where multipartitioned materializations could not be stored with the fs_io_manager.
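A rough sketch of what "directory levels by dimension" could look like is shown below. It assumes that a multipartition key is written as its dimension keys joined by `|`, a character that Windows does not permit in file names; the `partition_path` helper and the storage layout are hypothetical and are not taken from the fs_io_manager source.

```python
from pathlib import PureWindowsPath


def partition_path(base: str, multipartition_key: str, delimiter: str = "|") -> PureWindowsPath:
    """Place each dimension's key in its own directory level.

    Assumption: the key is the dimension keys joined by `delimiter`.
    """
    return PureWindowsPath(base).joinpath(*multipartition_key.split(delimiter))


path = partition_path("storage/my_asset", "2023-01-01|us")
print(path.parts)
```

Because each dimension key becomes a separate path component, the delimiter never appears in a file or directory name.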
Schedules and sensors previously timed out when attempting to yield many multipartitioned run requests. This has been fixed.
Fixed a bug where context.partition_key would raise an error when executing on a partition range within a single run via Dagit.
Fixed a bug that caused the default IO manager to incorrectly raise type errors in some situations with partitioned inputs.
[ui] Fixed a bug where partition health would fail to display for certain time window partitions definitions with positive offsets.
[ui] Always show the “Reload all” button on the code locations list page, to avoid an issue where the button was not available when adding a second location.
[ui] Fixed a bug where users running multiple replicas of dagit would see repeated Definitions reloaded messages on fresh page loads.
[ui] The asset graph now shows only the last path component of linked assets for better readability.
[ui] The op metadata panel no longer capitalizes metadata keys.
[ui] The asset partitions page, asset sidebar and materialization dialog are significantly smoother when viewing assets with a large number of partitions (100k+).
[dagster-gcp-pandas] The Pandas TypeHandler for BigQuery now respects user provided location information.
[dagster-snowflake] ProgrammingError was imported from the wrong library. This has been fixed. (Thanks @herbert-allium!)
The new @graph_asset and @graph_multi_asset decorators make it more ergonomic to define graph-backed assets.
Dagster will auto-infer dependency relationships between single-dimensionally partitioned assets and multipartitioned assets, when the single-dimensional partitions definition is a dimension of the MultiPartitionsDefinition.
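The inferred relationship can be pictured with a small plain-Python sketch: a key of the single-dimension asset corresponds to every multipartition key that shares that key in the matching dimension. This is an illustration only, not Dagster's partition-mapping code; the `matching_multi_keys` helper and the dictionary representation of a multipartition key are assumptions.

```python
from itertools import product


def matching_multi_keys(dimensions, dimension_name, key):
    """Return all multi-dimensional keys whose `dimension_name` value equals `key`.

    `dimensions` maps each dimension name to its list of partition keys.
    """
    names = sorted(dimensions)
    matches = []
    for combo in product(*(dimensions[name] for name in names)):
        keys_by_dimension = dict(zip(names, combo))
        if keys_by_dimension[dimension_name] == key:
            matches.append(keys_by_dimension)
    return matches


dims = {"date": ["2023-01-01", "2023-01-02"], "region": ["eu", "us"]}
matches = matching_multi_keys(dims, "date", "2023-01-01")
print(matches)
```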
A new Test sensor / Test schedule button allows you to perform a dry run of your sensor or schedule. See the sensor docs and the schedule docs for details on this functionality.
[dagit] Added (back) tag autocompletion in the runs filter, now with improved query performance.
[dagit] The Dagster libraries and their versions that were used when loading definitions can now be viewed in the actions menu for each code location.
New bigquery_pandas_io_manager can store and load Pandas dataframes in BigQuery.
[dagster-snowflake, dagster-duckdb] SnowflakeIOManagers and DuckDBIOManagers can now default to loading inputs as a specified type if a type annotation does not exist for the input.
[dagster-dbt] Added the ability to use the “state:” selector.
[dagster-k8s] The Helm chart now supports the full Kubernetes env var spec for Dagit and the Daemon.
The FreshnessPolicy object now supports a cron_schedule_timezone argument.
AssetsDefinition.from_graph now supports a freshness_policies_by_output_name parameter.
The @asset_sensor will now display an informative SkipReason when no new materializations have been created since the last sensor tick.
AssetsDefinition now has a to_source_asset method, which returns a representation of this asset as a SourceAsset.
You can now designate assets as inputs to ops within a graph or graph-based job. E.g.
from dagster import asset, job, op

@asset
def emails_to_send():
    ...

@op
def send_emails(emails) -> None:
    ...

@job
def send_emails_job():
    send_emails(emails_to_send.to_source_asset())
Added a --dagit-host/-h argument to the dagster dev command to allow customization of the host where Dagit runs.
[dagster-snowflake, dagster-duckdb] Database I/O managers (Snowflake, DuckDB) now support static partitions, multi-partitions, and dynamic partitions.
Previously, if a description was provided for an op that backed a multi-asset, the op’s description would override the descriptions in Dagit for the individual assets. This has been fixed.
Sometimes, when applying an input_manager_key to an asset’s input, incorrect resource config could be used when loading that input. This has been fixed.
Previously, the backfill page errored when partitions definitions changed for assets that had been backfilled. This has been fixed.
When displaying materialized partitions for multipartitioned assets, Dagit would error if a dimension had zero partitions. This has been fixed.
[dagster-k8s] Fixed an issue where setting runK8sConfig in the Dagster Helm chart would not pass configuration through to pods launched using the k8s_job_executor.
[dagster-k8s] Previously, using the execute_k8s_job op downstream of a dynamic output would result in k8s jobs with duplicate names being created. This has been fixed.
[dagster-snowflake] Previously, if the schema for storing outputs didn’t exist, the Snowflake I/O manager would fail. Now it creates the schema.
Removed the experimental, undocumented asset_key, asset_partitions, and asset_partitions_defs arguments on Out.
@multi_asset no longer accepts Out values in the dictionary passed to its outs argument. This was experimental and deprecated. Instead, use AssetOut.
The experimental, undocumented top_level_resources argument to the repository decorator has been renamed to _top_level_resources to emphasize that it should not be set manually.
load_asset_values now accepts resource configuration (thanks @Nintorac!)
Previously, when using the UPathIOManager, paths with the "." character in them would be incorrectly truncated, which could result in multiple distinct objects being written to the same path. This has been fixed. (Thanks @spenczar!)
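One common way this kind of truncation arises is replacing a path's suffix instead of appending to it. The sketch below uses only `pathlib` to show the collision and one way to avoid it; these notes do not state that this is the exact cause in UPathIOManager, and both helper names are hypothetical.

```python
from pathlib import PurePosixPath


def replace_suffix(path: str, extension: str) -> PurePosixPath:
    """Replace everything after the last '.' in the final component (lossy)."""
    return PurePosixPath(path).with_suffix(extension)


def append_suffix(path: str, extension: str) -> PurePosixPath:
    """Append the extension to the full final component (keeps embedded dots)."""
    p = PurePosixPath(path)
    return p.with_name(p.name + extension)


a, b = "assets/model.v1", "assets/model.v2"

# Replacing the suffix maps both inputs to the same path.
print(replace_suffix(a, ".pkl"), replace_suffix(b, ".pkl"))

# Appending keeps the two objects distinct.
print(append_suffix(a, ".pkl"), append_suffix(b, ".pkl"))
```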
Assets with time-window PartitionsDefinitions (e.g. HourlyPartitionsDefinition, DailyPartitionsDefinition) may now have a FreshnessPolicy.
[dagster-dbt] When using load_assets_from_dbt_project or load_assets_from_dbt_manifest with dbt-core>=1.4, AssetMaterialization events will be emitted as the dbt command executes, rather than waiting for dbt to complete before emitting events.
[dagster-aws] When run monitoring detects that a run unexpectedly crashed or failed to start, an error message in the run’s event log will include log messages from the ECS task for that run to help diagnose the cause of the failure.
[dagster-airflow] Added make_ephemeral_airflow_db_resource, which returns a ResourceDefinition for a local-only Airflow database for use in migrated Airflow DAGs.
Made some performance improvements for job run queries which can be applied by running dagster instance migrate.
[dagit] System tags (code + logical versions) are now shown in the asset sidebar and on the asset details page.
[dagit] Source assets that have never been observed are presented more clearly on the asset graph.
[dagit] The number of materialized and missing partitions are shown on the asset graph and in the asset catalog for partitioned assets.
[dagit] Databricks-backed assets are now shown on the asset graph with a small “Databricks” logo.
Fixed a bug where materializations of part of the asset graph did not construct required resource keys correctly.
Fixed an issue where observable_source_asset incorrectly required its function to have a context argument.
Fixed an issue with serialization of freshness policies, which affected cacheable assets that included these policies, such as those from dagster-airbyte.
[dagster-dbt] Previously, the dagster-dbt integration was incompatible with dbt-core>=1.4. This has been fixed.
[dagster-dbt] load_assets_from_dbt_cloud_job will now avoid unnecessarily generating docs when compiling a manifest for the job. Compile runs will no longer be kicked off for jobs not managed by this integration.
Previously for multipartitioned assets, context.asset_partition_key returned a string instead of a MultiPartitionKey. This has been fixed.
[dagster-k8s] Fixed an issue where pods launched by the k8s_job_executor would sometimes unexpectedly fail due to transient 401 errors in certain Kubernetes clusters.
Fixed a bug with nth-weekday-of-the-month handling in cron schedules.
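For context, an "nth weekday of the month" rule selects, for example, the second Monday of each month (some cron implementations write this with a `weekday#n` extension, which is assumed here rather than documented in these notes). The following standard-library sketch computes such a date. It is not Dagster's schedule logic, and `nth_weekday_of_month` is a hypothetical helper that uses Python's weekday numbering (0 = Monday), which differs from cron's.

```python
import calendar
from datetime import date
from typing import Optional


def nth_weekday_of_month(year: int, month: int, weekday: int, n: int) -> Optional[date]:
    """Return the n-th occurrence (1-based) of `weekday` in the given month.

    `weekday` follows Python's convention: 0 = Monday ... 6 = Sunday.
    Returns None when the month has no n-th occurrence.
    """
    first_weekday, days_in_month = calendar.monthrange(year, month)
    # Days from the 1st of the month to the first occurrence of `weekday`.
    offset = (weekday - first_weekday) % 7
    day = 1 + offset + 7 * (n - 1)
    return date(year, month, day) if day <= days_in_month else None


print(nth_weekday_of_month(2023, 2, 0, 2))
```

Months without a fifth occurrence are the edge case worth checking in any such calculation, which is why the helper returns `None` instead of rolling into the next month.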
[dagster-airflow] load_assets_from_airflow_dag no longer creates Airflow db resource definitions. You will need to provide them on Definitions directly.
The dagster-airflow library has been moved to 1.x.x to denote the stability of its APIs going forward.
[dagster-airflow] make_schedules_and_jobs_from_airflow_dag_bag has been added to allow for more fine-grained composition of your transformed Airflow DAGs into Dagster.
[dagster-airflow] Airflow DAG task retries and retry_delay configuration are now converted to op RetryPolicies by all make_dagster_* APIs.
[dagster-airflow] The use_airflow_template_context, mock_xcom and use_ephemeral_airflow_db params have been dropped. By default, all make_dagster_* APIs now use a run-scoped Airflow db, similar to how use_ephemeral_airflow_db worked.
[dagster-airflow] make_airflow_dag has been removed.
[dagster-airflow] make_airflow_dag_for_operator has been removed.
[dagster-airflow] make_airflow_dag_containerized has been removed.
[dagster-airflow] airflow_operator_to_op has been removed.
[dagster-airflow] make_dagster_repo_from_airflow_dags_path has been removed.
[dagster-airflow] make_dagster_repo_from_airflow_dag_bag has been removed.
[dagster-airflow] make_dagster_repo_from_airflow_example_dags has been removed.
[dagster-airflow] The naming convention for ops generated from airflow tasks has been changed to ${dag_id}__${task_id} from airflow_${task_id}_${unique_int}.
[dagster-airflow] The naming convention for jobs generated from airflow dags has been changed to ${dag_id} from airflow_${dag_id}.
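When updating references to generated names (for example in run config or job selections), the two naming changes above can be summarized as simple string templates. The helper functions below are hypothetical and only restate the patterns given in these notes; any additional normalization that dagster-airflow may apply to ids is not covered.

```python
def old_op_name(task_id: str, unique_int: int) -> str:
    """Previous convention: airflow_${task_id}_${unique_int}."""
    return f"airflow_{task_id}_{unique_int}"


def new_op_name(dag_id: str, task_id: str) -> str:
    """New convention: ${dag_id}__${task_id} (two underscores)."""
    return f"{dag_id}__{task_id}"


def old_job_name(dag_id: str) -> str:
    """Previous convention: airflow_${dag_id}."""
    return f"airflow_{dag_id}"


def new_job_name(dag_id: str) -> str:
    """New convention: the DAG id itself."""
    return dag_id


print(old_op_name("extract", 1), "->", new_op_name("daily_etl", "extract"))
print(old_job_name("daily_etl"), "->", new_job_name("daily_etl"))
```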