Function Differences with tf.data.experimental.CsvDataset
tf.data.experimental.CsvDataset
class tf.data.experimental.CsvDataset(
filenames,
record_defaults,
compression_type=None,
buffer_size=None,
header=False,
field_delim=',',
use_quote_delim=True,
na_value='',
select_cols=None,
exclude_cols=None
)
mindspore.dataset.CSVDataset
class mindspore.dataset.CSVDataset(
dataset_files,
field_delim=',',
column_defaults=None,
column_names=None,
num_samples=None,
num_parallel_workers=None,
shuffle=Shuffle.GLOBAL,
num_shards=None,
shard_id=None,
cache=None
)
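For more information, see mindspore.dataset.CSVDataset.
Usage
TensorFlow: Creates a dataset from a list of CSV files. It can read compressed files (compression_type), set the read buffer size (buffer_size), and skip the header line of each file (header).
MindSpore: Creates a dataset from a list of CSV files. It can limit the number of samples to read (num_samples).
Code Example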
# The following implements CSVDataset with MindSpore.
import mindspore.dataset as ds
dataset_files = ['/tmp/example0.csv',
                 '/tmp/example1.csv']
dataset = ds.CSVDataset(dataset_files)
# The following implements CsvDataset with TensorFlow.
import tensorflow as tf
filenames = ['/tmp/example0.csv',
             '/tmp/example1.csv']
dataset = tf.data.experimental.CsvDataset(filenames,
                                          [tf.float32,
                                           tf.constant([0.0], dtype=tf.float32),
                                           tf.int32])
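Both snippets above assume that example0.csv and example1.csv already exist. The following is a minimal sketch, using only the Python standard library, of one way to create such files for experimentation. The helper name write_example_csvs and the sample values are hypothetical and not part of either API; the three columns (float, float, int) are chosen only to match the record_defaults passed to the TensorFlow call.

```python
import csv
import os
import tempfile

# Hypothetical sample rows, one list per file. Each row is
# (float, float, int), matching the three record_defaults
# used in the TensorFlow snippet.
ROWS = [
    [(1.0, 2.5, 3), (4.0, 5.5, 6)],
    [(7.0, 8.5, 9), (10.0, 11.5, 12)],
]


def write_example_csvs(directory):
    """Write example0.csv and example1.csv into directory and return their paths."""
    paths = []
    for index, rows in enumerate(ROWS):
        path = os.path.join(directory, "example%d.csv" % index)
        # newline="" lets the csv module control line endings itself.
        with open(path, "w", newline="") as f:
            writer = csv.writer(f, lineterminator="\n")
            writer.writerows(rows)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # On most Linux systems the temporary directory is /tmp,
    # which matches the paths used in the snippets above.
    for p in write_example_csvs(tempfile.gettempdir()):
        print(p)
```

The files are written without a header line, so they can be read with the default header=False in TensorFlow and without column_names in MindSpore.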