- Description:
D4RL is an open-source benchmark for offline reinforcement learning. It provides standardized environments and datasets for training and benchmarking algorithms.
The datasets follow the RLDS format to represent steps and episodes.
- Additional Documentation: Explore on Papers With Code
- Config description: See more details about the task and its versions in https://github.com/rail-berkeley/d4rl/wiki/Tasks#gym
- Source code: tfds.d4rl.d4rl_mujoco_walker2d.D4rlMujocoWalker2d
- Versions:
  - 1.0.0: Initial release.
  - 1.1.0: Added is_last.
  - 1.2.0 (default): Updated to take into account the next observation.
- Supervised keys (See as_supervised doc): None
- Figure (tfds.show_examples): Not supported.
- Citation:
@misc{fu2020d4rl,
title={D4RL: Datasets for Deep Data-Driven Reinforcement Learning},
author={Justin Fu and Aviral Kumar and Ofir Nachum and George Tucker and Sergey Levine},
year={2020},
eprint={2004.07219},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
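As a rough sketch of how the RLDS layout described above might be consumed: each dataset element is an episode, and its `steps` field is a nested sequence of step records. The helper names `episode_return` and `load_episodes` are hypothetical, and the choice of the `v0-expert` config is only for illustration. The load call is left in a comment because it downloads the data on first use.

```python
from typing import Iterable, Mapping


def episode_return(steps: Iterable[Mapping[str, float]]) -> float:
    """Sum the undiscounted rewards over the steps of one episode."""
    return float(sum(float(step["reward"]) for step in steps))


def load_episodes(config: str = "v0-expert"):
    """Return the 'train' split of one config as a tf.data.Dataset of episodes."""
    # Imported lazily so the helper above works without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.load(f"d4rl_mujoco_walker2d/{config}", split="train")


# Usage sketch (assumes eager execution; not run here because it downloads data):
# for episode in load_episodes():
#     print(episode_return(episode["steps"]))
```

Keeping the reward summation separate from the loading code makes it easy to check the logic on a few hand-written steps before touching the real dataset.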
d4rl_mujoco_walker2d/v0-expert (default config)
- Download size: 78.41 MiB
- Dataset size: 98.64 MiB
- Auto-cached (documentation): Yes
- Splits:

| Split | Examples |
|---|---|
| 'train' | 1,628 |

- Feature structure:
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float32),
'reward': float32,
}),
})
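Offline RL algorithms usually train on (state, action, reward, next state, done) tuples rather than raw step sequences. The sketch below pairs consecutive steps of one episode into such tuples. It assumes the usual RLDS convention that the next observation of step t is the observation stored at step t+1, and that `is_terminal` marks a terminal state. The function name `to_transitions` is hypothetical.

```python
import numpy as np


def to_transitions(steps):
    """Pair consecutive RLDS steps into (s, a, r, s_next, done) tuples.

    Assumes the next observation of step t is the observation stored in
    step t+1, so the final step only contributes its observation.
    """
    steps = list(steps)
    transitions = []
    for current, following in zip(steps[:-1], steps[1:]):
        transitions.append((
            np.asarray(current["observation"]),
            np.asarray(current["action"]),
            float(current["reward"]),
            np.asarray(following["observation"]),
            # The transition ends the episode if it lands in a terminal state.
            bool(following["is_terminal"]),
        ))
    return transitions
```

An episode with N steps yields N-1 transitions under this convention; an episode that was merely truncated (last step has `is_last` set but not `is_terminal`) produces no transition with `done=True`.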
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float32 | |
| steps/discount | Tensor | | float32 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float32 | |
| steps/reward | Tensor | | float32 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v0-medium
- Download size: 80.83 MiB
- Dataset size: 99.72 MiB
- Auto-cached (documentation): Yes
- Splits:

| Split | Examples |
|---|---|
| 'train' | 5,315 |

- Feature structure:
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float32 | |
| steps/discount | Tensor | | float32 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float32 | |
| steps/reward | Tensor | | float32 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v0-medium-expert
- Download size: 159.24 MiB
- Dataset size: 198.36 MiB
- Auto-cached (documentation): Only when shuffle_files=False (train)
- Splits:

| Split | Examples |
|---|---|
| 'train' | 6,943 |

- Feature structure:
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float32 | |
| steps/discount | Tensor | | float32 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float32 | |
| steps/reward | Tensor | | float32 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v0-mixed
- Download size: 8.42 MiB
- Dataset size: 10.06 MiB
- Auto-cached (documentation): Yes
- Splits:

| Split | Examples |
|---|---|
| 'train' | 501 |

- Feature structure:
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float32 | |
| steps/discount | Tensor | | float32 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float32 | |
| steps/reward | Tensor | | float32 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v0-random
- Download size: 78.41 MiB
- Dataset size: 112.04 MiB
- Auto-cached (documentation): Yes
- Splits:

| Split | Examples |
|---|---|
| 'train' | 50,988 |

- Feature structure:
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float32),
'discount': float32,
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float32 | |
| steps/discount | Tensor | | float32 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float32 | |
| steps/reward | Tensor | | float32 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v1-expert
- Download size: 143.06 MiB
- Dataset size: 452.72 MiB
- Auto-cached (documentation): No
- Splits:

| Split | Examples |
|---|---|
| 'train' | 1,003 |

- Feature structure:
FeaturesDict({
'algorithm': string,
'iteration': int32,
'policy': FeaturesDict({
'fc0': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 17), dtype=float32),
}),
'fc1': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 256), dtype=float32),
}),
'last_fc': FeaturesDict({
'bias': Tensor(shape=(6,), dtype=float32),
'weight': Tensor(shape=(6, 256), dtype=float32),
}),
'last_fc_log_std': FeaturesDict({
'bias': Tensor(shape=(6,), dtype=float32),
'weight': Tensor(shape=(6, 256), dtype=float32),
}),
'nonlinearity': string,
'output_distribution': string,
}),
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float32,
'qpos': Tensor(shape=(9,), dtype=float32),
'qvel': Tensor(shape=(9,), dtype=float32),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float32),
'reward': float32,
}),
})
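The `policy` feature above stores the weights of a small fully connected network: 17 inputs, two hidden layers of 256 units, and 6 outputs. The sketch below shows one way such weights could be applied to an observation. It assumes ReLU hidden activations and a tanh-squashed mean action; the actual choices are recorded in the `nonlinearity` and `output_distribution` strings and should be checked before relying on this. The function name `policy_mean_action` is hypothetical.

```python
import numpy as np


def policy_mean_action(policy, observation):
    """Forward pass through the stored MLP, returning a squashed mean action.

    Assumes ReLU hidden activations and a tanh-squashed output; verify against
    policy['nonlinearity'] and policy['output_distribution'].
    """
    x = np.asarray(observation, dtype=np.float32)
    for layer in ("fc0", "fc1"):
        # Weights are stored as (out_features, in_features), so y = W @ x + b.
        x = np.maximum(policy[layer]["weight"] @ x + policy[layer]["bias"], 0.0)
    mean = policy["last_fc"]["weight"] @ x + policy["last_fc"]["bias"]
    return np.tanh(mean)
```

The `last_fc_log_std` head is ignored here because only the mean action is computed; sampling from the policy would also need that head.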
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| algorithm | Tensor | | string | |
| iteration | Tensor | | int32 | |
| policy | FeaturesDict | | | |
| policy/fc0 | FeaturesDict | | | |
| policy/fc0/bias | Tensor | (256,) | float32 | |
| policy/fc0/weight | Tensor | (256, 17) | float32 | |
| policy/fc1 | FeaturesDict | | | |
| policy/fc1/bias | Tensor | (256,) | float32 | |
| policy/fc1/weight | Tensor | (256, 256) | float32 | |
| policy/last_fc | FeaturesDict | | | |
| policy/last_fc/bias | Tensor | (6,) | float32 | |
| policy/last_fc/weight | Tensor | (6, 256) | float32 | |
| policy/last_fc_log_std | FeaturesDict | | | |
| policy/last_fc_log_std/bias | Tensor | (6,) | float32 | |
| policy/last_fc_log_std/weight | Tensor | (6, 256) | float32 | |
| policy/nonlinearity | Tensor | | string | |
| policy/output_distribution | Tensor | | string | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float32 | |
| steps/discount | Tensor | | float32 | |
| steps/infos | FeaturesDict | | | |
| steps/infos/action_log_probs | Tensor | | float32 | |
| steps/infos/qpos | Tensor | (9,) | float32 | |
| steps/infos/qvel | Tensor | (9,) | float32 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float32 | |
| steps/reward | Tensor | | float32 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v1-medium
- Download size: 144.23 MiB
- Dataset size: 510.08 MiB
- Auto-cached (documentation): No
- Splits:

| Split | Examples |
|---|---|
| 'train' | 1,207 |

- Feature structure:
FeaturesDict({
'algorithm': string,
'iteration': int32,
'policy': FeaturesDict({
'fc0': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 17), dtype=float32),
}),
'fc1': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 256), dtype=float32),
}),
'last_fc': FeaturesDict({
'bias': Tensor(shape=(6,), dtype=float32),
'weight': Tensor(shape=(6, 256), dtype=float32),
}),
'last_fc_log_std': FeaturesDict({
'bias': Tensor(shape=(6,), dtype=float32),
'weight': Tensor(shape=(6, 256), dtype=float32),
}),
'nonlinearity': string,
'output_distribution': string,
}),
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float32,
'qpos': Tensor(shape=(9,), dtype=float32),
'qvel': Tensor(shape=(9,), dtype=float32),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| algorithm | Tensor | | string | |
| iteration | Tensor | | int32 | |
| policy | FeaturesDict | | | |
| policy/fc0 | FeaturesDict | | | |
| policy/fc0/bias | Tensor | (256,) | float32 | |
| policy/fc0/weight | Tensor | (256, 17) | float32 | |
| policy/fc1 | FeaturesDict | | | |
| policy/fc1/bias | Tensor | (256,) | float32 | |
| policy/fc1/weight | Tensor | (256, 256) | float32 | |
| policy/last_fc | FeaturesDict | | | |
| policy/last_fc/bias | Tensor | (6,) | float32 | |
| policy/last_fc/weight | Tensor | (6, 256) | float32 | |
| policy/last_fc_log_std | FeaturesDict | | | |
| policy/last_fc_log_std/bias | Tensor | (6,) | float32 | |
| policy/last_fc_log_std/weight | Tensor | (6, 256) | float32 | |
| policy/nonlinearity | Tensor | | string | |
| policy/output_distribution | Tensor | | string | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float32 | |
| steps/discount | Tensor | | float32 | |
| steps/infos | FeaturesDict | | | |
| steps/infos/action_log_probs | Tensor | | float32 | |
| steps/infos/qpos | Tensor | (9,) | float32 | |
| steps/infos/qvel | Tensor | (9,) | float32 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float32 | |
| steps/reward | Tensor | | float32 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v1-medium-expert
- Download size: 286.69 MiB
- Dataset size: 342.46 MiB
- Auto-cached (documentation): No
- Splits:

| Split | Examples |
|---|---|
| 'train' | 2,209 |

- Feature structure:
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float32,
'qpos': Tensor(shape=(9,), dtype=float32),
'qvel': Tensor(shape=(9,), dtype=float32),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float32 | |
| steps/discount | Tensor | | float32 | |
| steps/infos | FeaturesDict | | | |
| steps/infos/action_log_probs | Tensor | | float32 | |
| steps/infos/qpos | Tensor | (9,) | float32 | |
| steps/infos/qvel | Tensor | (9,) | float32 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float32 | |
| steps/reward | Tensor | | float32 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v1-medium-replay
- Download size: 84.37 MiB
- Dataset size: 52.10 MiB
- Auto-cached (documentation): Yes
- Splits:

| Split | Examples |
|---|---|
| 'train' | 1,093 |

- Feature structure:
FeaturesDict({
'algorithm': string,
'iteration': int32,
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float64),
'discount': float64,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(9,), dtype=float64),
'qvel': Tensor(shape=(9,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float64),
'reward': float64,
}),
})
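Unlike most other configs, the feature structure above stores its floating-point fields as float64. When mixing this config with float32 configs, it may help to cast everything to a common dtype first. The sketch below does this recursively on a nested step dictionary; the function name `cast_floats` is hypothetical, and choosing float32 as the target is an assumption.

```python
import numpy as np


def cast_floats(tree, dtype=np.float32):
    """Recursively cast floating-point arrays in a nested dict to dtype."""
    if isinstance(tree, dict):
        return {key: cast_floats(value, dtype) for key, value in tree.items()}
    array = np.asarray(tree)
    if np.issubdtype(array.dtype, np.floating):
        return array.astype(dtype)
    # Booleans, integers, and strings are returned unchanged.
    return array
```

Casting from float64 to float32 loses precision, so this is only appropriate when the downstream model already works in float32.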
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| algorithm | Tensor | | string | |
| iteration | Tensor | | int32 | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float64 | |
| steps/discount | Tensor | | float64 | |
| steps/infos | FeaturesDict | | | |
| steps/infos/action_log_probs | Tensor | | float64 | |
| steps/infos/qpos | Tensor | (9,) | float64 | |
| steps/infos/qvel | Tensor | (9,) | float64 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float64 | |
| steps/reward | Tensor | | float64 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v1-full-replay
- Download size: 278.95 MiB
- Dataset size: 171.66 MiB
- Auto-cached (documentation): Only when shuffle_files=False (train)
- Splits:

| Split | Examples |
|---|---|
| 'train' | 1,888 |

- Feature structure:
FeaturesDict({
'algorithm': string,
'iteration': int32,
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float64),
'discount': float64,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(9,), dtype=float64),
'qvel': Tensor(shape=(9,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float64),
'reward': float64,
}),
})
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| algorithm | Tensor | | string | |
| iteration | Tensor | | int32 | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float64 | |
| steps/discount | Tensor | | float64 | |
| steps/infos | FeaturesDict | | | |
| steps/infos/action_log_probs | Tensor | | float64 | |
| steps/infos/qpos | Tensor | (9,) | float64 | |
| steps/infos/qvel | Tensor | (9,) | float64 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float64 | |
| steps/reward | Tensor | | float64 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v1-random
- Download size: 132.36 MiB
- Dataset size: 192.06 MiB
- Auto-cached (documentation): Only when shuffle_files=False (train)
- Splits:

| Split | Examples |
|---|---|
| 'train' | 48,790 |

- Feature structure:
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float32,
'qpos': Tensor(shape=(9,), dtype=float32),
'qvel': Tensor(shape=(9,), dtype=float32),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float32 | |
| steps/discount | Tensor | | float32 | |
| steps/infos | FeaturesDict | | | |
| steps/infos/action_log_probs | Tensor | | float32 | |
| steps/infos/qpos | Tensor | (9,) | float32 | |
| steps/infos/qvel | Tensor | (9,) | float32 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float32 | |
| steps/reward | Tensor | | float32 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v2-expert
- Download size: 219.89 MiB
- Dataset size: 452.16 MiB
- Auto-cached (documentation): No
- Splits:

| Split | Examples |
|---|---|
| 'train' | 1,001 |

- Feature structure:
FeaturesDict({
'algorithm': string,
'iteration': int32,
'policy': FeaturesDict({
'fc0': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 17), dtype=float32),
}),
'fc1': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 256), dtype=float32),
}),
'last_fc': FeaturesDict({
'bias': Tensor(shape=(6,), dtype=float32),
'weight': Tensor(shape=(6, 256), dtype=float32),
}),
'last_fc_log_std': FeaturesDict({
'bias': Tensor(shape=(6,), dtype=float32),
'weight': Tensor(shape=(6, 256), dtype=float32),
}),
'nonlinearity': string,
'output_distribution': string,
}),
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(9,), dtype=float64),
'qvel': Tensor(shape=(9,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| algorithm | Tensor | | string | |
| iteration | Tensor | | int32 | |
| policy | FeaturesDict | | | |
| policy/fc0 | FeaturesDict | | | |
| policy/fc0/bias | Tensor | (256,) | float32 | |
| policy/fc0/weight | Tensor | (256, 17) | float32 | |
| policy/fc1 | FeaturesDict | | | |
| policy/fc1/bias | Tensor | (256,) | float32 | |
| policy/fc1/weight | Tensor | (256, 256) | float32 | |
| policy/last_fc | FeaturesDict | | | |
| policy/last_fc/bias | Tensor | (6,) | float32 | |
| policy/last_fc/weight | Tensor | (6, 256) | float32 | |
| policy/last_fc_log_std | FeaturesDict | | | |
| policy/last_fc_log_std/bias | Tensor | (6,) | float32 | |
| policy/last_fc_log_std/weight | Tensor | (6, 256) | float32 | |
| policy/nonlinearity | Tensor | | string | |
| policy/output_distribution | Tensor | | string | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float32 | |
| steps/discount | Tensor | | float32 | |
| steps/infos | FeaturesDict | | | |
| steps/infos/action_log_probs | Tensor | | float64 | |
| steps/infos/qpos | Tensor | (9,) | float64 | |
| steps/infos/qvel | Tensor | (9,) | float64 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float32 | |
| steps/reward | Tensor | | float32 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v2-full-replay
- Download size: 271.91 MiB
- Dataset size: 171.66 MiB
- Auto-cached (documentation): Only when shuffle_files=False (train)
- Splits:

| Split | Examples |
|---|---|
| 'train' | 1,888 |

- Feature structure:
FeaturesDict({
'algorithm': string,
'iteration': int32,
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(9,), dtype=float64),
'qvel': Tensor(shape=(9,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| algorithm | Tensor | | string | |
| iteration | Tensor | | int32 | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float32 | |
| steps/discount | Tensor | | float32 | |
| steps/infos | FeaturesDict | | | |
| steps/infos/action_log_probs | Tensor | | float64 | |
| steps/infos/qpos | Tensor | (9,) | float64 | |
| steps/infos/qvel | Tensor | (9,) | float64 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float32 | |
| steps/reward | Tensor | | float32 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v2-medium
- Download size: 221.50 MiB
- Dataset size: 505.58 MiB
- Auto-cached (documentation): No
- Splits:

| Split | Examples |
|---|---|
| 'train' | 1,191 |

- Feature structure:
FeaturesDict({
'algorithm': string,
'iteration': int32,
'policy': FeaturesDict({
'fc0': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 17), dtype=float32),
}),
'fc1': FeaturesDict({
'bias': Tensor(shape=(256,), dtype=float32),
'weight': Tensor(shape=(256, 256), dtype=float32),
}),
'last_fc': FeaturesDict({
'bias': Tensor(shape=(6,), dtype=float32),
'weight': Tensor(shape=(6, 256), dtype=float32),
}),
'last_fc_log_std': FeaturesDict({
'bias': Tensor(shape=(6,), dtype=float32),
'weight': Tensor(shape=(6, 256), dtype=float32),
}),
'nonlinearity': string,
'output_distribution': string,
}),
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(9,), dtype=float64),
'qvel': Tensor(shape=(9,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| algorithm | Tensor | | string | |
| iteration | Tensor | | int32 | |
| policy | FeaturesDict | | | |
| policy/fc0 | FeaturesDict | | | |
| policy/fc0/bias | Tensor | (256,) | float32 | |
| policy/fc0/weight | Tensor | (256, 17) | float32 | |
| policy/fc1 | FeaturesDict | | | |
| policy/fc1/bias | Tensor | (256,) | float32 | |
| policy/fc1/weight | Tensor | (256, 256) | float32 | |
| policy/last_fc | FeaturesDict | | | |
| policy/last_fc/bias | Tensor | (6,) | float32 | |
| policy/last_fc/weight | Tensor | (6, 256) | float32 | |
| policy/last_fc_log_std | FeaturesDict | | | |
| policy/last_fc_log_std/bias | Tensor | (6,) | float32 | |
| policy/last_fc_log_std/weight | Tensor | (6, 256) | float32 | |
| policy/nonlinearity | Tensor | | string | |
| policy/output_distribution | Tensor | | string | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float32 | |
| steps/discount | Tensor | | float32 | |
| steps/infos | FeaturesDict | | | |
| steps/infos/action_log_probs | Tensor | | float64 | |
| steps/infos/qpos | Tensor | (9,) | float64 | |
| steps/infos/qvel | Tensor | (9,) | float64 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float32 | |
| steps/reward | Tensor | | float32 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v2-medium-expert
- Download size: 440.79 MiB
- Dataset size: 342.45 MiB
- Auto-cached (documentation): No
- Splits:

| Split | Examples |
|---|---|
| 'train' | 2,191 |

- Feature structure:
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(9,), dtype=float64),
'qvel': Tensor(shape=(9,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float32 | |
| steps/discount | Tensor | | float32 | |
| steps/infos | FeaturesDict | | | |
| steps/infos/action_log_probs | Tensor | | float64 | |
| steps/infos/qpos | Tensor | (9,) | float64 | |
| steps/infos/qvel | Tensor | (9,) | float64 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float32 | |
| steps/reward | Tensor | | float32 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v2-medium-replay
- Download size: 82.32 MiB
- Dataset size: 52.10 MiB
- Auto-cached (documentation): Yes
- Splits:

| Split | Examples |
|---|---|
| 'train' | 1,093 |

- Feature structure:
FeaturesDict({
'algorithm': string,
'iteration': int32,
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(9,), dtype=float64),
'qvel': Tensor(shape=(9,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| algorithm | Tensor | | string | |
| iteration | Tensor | | int32 | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float32 | |
| steps/discount | Tensor | | float32 | |
| steps/infos | FeaturesDict | | | |
| steps/infos/action_log_probs | Tensor | | float64 | |
| steps/infos/qpos | Tensor | (9,) | float64 | |
| steps/infos/qvel | Tensor | (9,) | float64 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float32 | |
| steps/reward | Tensor | | float32 | |

- Examples (tfds.as_dataframe):

d4rl_mujoco_walker2d/v2-random
- Download size: 206.10 MiB
- Dataset size: 192.11 MiB
- Auto-cached (documentation): Only when shuffle_files=False (train)
- Splits:

| Split | Examples |
|---|---|
| 'train' | 48,908 |

- Feature structure:
FeaturesDict({
'steps': Dataset({
'action': Tensor(shape=(6,), dtype=float32),
'discount': float32,
'infos': FeaturesDict({
'action_log_probs': float64,
'qpos': Tensor(shape=(9,), dtype=float64),
'qvel': Tensor(shape=(9,), dtype=float64),
}),
'is_first': bool,
'is_last': bool,
'is_terminal': bool,
'observation': Tensor(shape=(17,), dtype=float32),
'reward': float32,
}),
})
- Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| steps | Dataset | | | |
| steps/action | Tensor | (6,) | float32 | |
| steps/discount | Tensor | | float32 | |
| steps/infos | FeaturesDict | | | |
| steps/infos/action_log_probs | Tensor | | float64 | |
| steps/infos/qpos | Tensor | (9,) | float64 | |
| steps/infos/qvel | Tensor | (9,) | float64 | |
| steps/is_first | Tensor | | bool | |
| steps/is_last | Tensor | | bool | |
| steps/is_terminal | Tensor | | bool | |
| steps/observation | Tensor | (17,) | float32 | |
| steps/reward | Tensor | | float32 | |

- Examples (tfds.as_dataframe):