This functionality allows you to create a dataset from data stored in multiple, smaller datasets.
I’d like to thank both Thomas Capelle (https://github.com/tcapelle) and Xander Dunn (https://github.com/xanderdunn) for their contributions that made this code possible.
It lets you use multiple numpy arrays instead of a single one, which can be very useful in many practical settings. It has been tested with 10k+ datasets and works well.
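For example, here is a minimal training-side sketch. The dataset construction is the same one used in the inference example further below; `TimeSplitter`, `TSMetaDatasets`, `TSDataLoaders.from_dsets`, `ts_learner`, and `InceptionTime` are standard tsai/fastai names assumed here rather than taken from the original text:

```python
from tsai.all import *  # tsai exports np, torch, alphabet, and the TS* classes

# Build several small datasets, each backed by its own arrays
vocab = alphabet[:10]
dsets = []
for i in range(3):
    size = np.random.randint(50, 150)
    X = torch.rand(size, 5, 50)
    y = vocab[torch.randint(0, 10, (size,))]
    dsets.append(TSDatasets(X, y, tfms=[None, TSClassification(vocab=vocab)]))

# Wrap them in a single metadataset, split it, and train as usual
metadataset = TSMetaDataset(dsets)
splits = TimeSplitter(show_plot=False)(metadataset)
metadatasets = TSMetaDatasets(metadataset, splits=splits)
dls = TSDataLoaders.from_dsets(metadatasets.train, metadatasets.valid)
learn = ts_learner(dls, InceptionTime, metrics=accuracy)
learn.fit_one_cycle(1)
learn.export("test.pkl")  # reused in the inference example below
```

Each `TSDatasets` keeps its own arrays, so nothing needs to be concatenated into a single large array up front.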
For inference, you should create the new metadataset using the same method you used when you trained the model. Then use fastai’s learn.get_preds method to generate predictions:
```python
from tsai.all import *  # exports np, torch, alphabet, and the TS* classes used below

vocab = alphabet[:10]
dsets = []
for i in range(3):
    size = np.random.randint(50, 150)          # each dataset has a different length
    X = torch.rand(size, 5, 50)                # (samples, variables, timesteps)
    y = vocab[torch.randint(0, 10, (size,))]   # labels drawn from the vocab
    tfms = [None, TSClassification(vocab=vocab)]
    dset = TSDatasets(X, y, tfms=tfms)
    dsets.append(dset)

metadataset = TSMetaDataset(dsets)
dl = TSDataLoader(metadataset)
learn = load_learner("test.pkl")
learn.get_preds(dl=dl)
```
There is also an easy way to map any particular sample in a batch to its original dataset and index:
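As an illustration, here is a short sketch building on the inference example above. It assumes tsai’s implementation, in which the metadataset records a `mapping_idxs` array holding one (dataset index, sample index) pair per sample of the last batch it served:

```python
xb, yb = dl.one_batch()  # draw a batch through the metadataset

# Each row of mapping_idxs is (dataset_idx, sample_idx) for one sample in the batch.
# For example, the 3rd sample in this batch maps back to:
dataset_idx, sample_idx = metadataset.mapping_idxs[2]
print(f"dataset {dataset_idx}, sample {sample_idx}")
```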