Need help splitting data set for Lamb Gut - SRR10963010 and SRR14289618

I have two large lamb gut datasets and could not run taxonomy profiling or genome assembly on them, so I tried to split the datasets into chunks and then re-run the workflows. However, Galaxy is not allowing me to run “split fasta” on the entire dataset; it’s only inputting step 10. May I please have help on how to bypass this?

Histories:
SRR14289618 - Galaxy
SRR10963010 - Galaxy

Hi Disomi,

####Below is the March 4, 2025 updated response to your question

For splitting a fastq (or fastq.gz) dataset into smaller datasets:

METHOD 1 (recommended, and faster than Method 2): Use the seqtk_sample tool (see example below for getting, e.g., a 10% sample of your dataset).

  • In parameters, enter the fraction of reads to keep (e.g. 0.1 for a 10% subset)
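
If it helps to see what a fractional subsample does to the file, here is a minimal Python sketch of the idea. It is not the seqtk implementation, just an illustration: it assumes standard 4-line FASTQ records, and the function names (`open_maybe_gzip`, `subsample_fastq`), the default seed, and the file names are all hypothetical.

```python
import gzip
import random
from itertools import islice


def open_maybe_gzip(path, mode="rt"):
    """Open a plain or gzip-compressed text file, chosen by the .gz extension."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def subsample_fastq(in_path, out_path, fraction, seed=100):
    """Keep each 4-line FASTQ record with probability `fraction`.

    Returns the number of records written. Assumes every record is
    exactly 4 lines (no wrapped sequence or quality lines).
    """
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    kept = 0
    with open_maybe_gzip(in_path) as src, open_maybe_gzip(out_path, "wt") as dst:
        while True:
            record = list(islice(src, 4))  # header, sequence, '+', quality
            if not record:
                break
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept


# Example call (hypothetical file names):
# subsample_fastq("reads.fastq.gz", "reads_10pct.fastq.gz", 0.1)
```

Because the sketch draws one random number per record, running it with the same seed on two paired-end files that list their reads in the same order keeps the same read pairs in both outputs.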

METHOD 2: Use the Split file tool (see example below for splitting your dataset into 10 parts/chunks).

  • In parameters, you need to actively select ‘FASTQ’
  • In parameters, you need to actively set the number of chunks to split the data into (e.g. 10 in the example below)
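
For reference, splitting a FASTQ file into a fixed number of chunks can be sketched in Python as follows. This is only an illustration of the idea, not how the Galaxy tool is implemented: it assumes an uncompressed file with 4-line records and hands records out round-robin, and the function name `split_fastq` and the output naming scheme are hypothetical.

```python
from itertools import islice


def split_fastq(in_path, out_prefix, n_chunks):
    """Distribute 4-line FASTQ records round-robin across `n_chunks` files.

    Returns the list of output paths. Chunk sizes differ by at most
    one record, and no record is ever cut in half.
    """
    out_paths = [f"{out_prefix}_{i + 1:03d}.fastq" for i in range(n_chunks)]
    outputs = [open(path, "w") for path in out_paths]
    try:
        with open(in_path) as src:
            index = 0
            while True:
                record = list(islice(src, 4))  # header, sequence, '+', quality
                if not record:
                    break
                outputs[index % n_chunks].writelines(record)
                index += 1
    finally:
        for handle in outputs:
            handle.close()
    return out_paths


# Example call (hypothetical file names):
# split_fastq("reads.fastq", "reads_chunk", 10)
```

Round-robin assignment needs only one pass over the input and does not require knowing the total read count in advance. The trade-off is that each chunk holds every n-th read rather than a contiguous block.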

####Below is the February 26, 2025 response, which covers splitting a .fasta file into smaller chunks

  1. In your history I don’t see any ‘Split Fasta’ operations. What happened when you tried running ‘Split Fasta’? What error did you get?

Split Fasta needs some storage space on Galaxy to hold all the split data (you can delete most of it afterwards, leaving only one subset for use). Since your ‘Flye’ operation failed because the system ran out of memory, the problem with Split Fasta may also have been memory-related. But without the relevant history and error message, it is difficult to troubleshoot.

  2. If you have not done so already, go ahead and “Purge All Deleted Content” (see image below) to clear space in Galaxy; this should help.

  3. Let us know if this helps or if you need further troubleshooting help.