Need help splitting data set for Lamb Gut - SRR10963010 and SRR14289618

Disomi · February 25, 2025, 7:52pm

I have two large datasets for the lamb gut and could not run taxonomy profiling or genome assembly on them, so I attempted to split the datasets into chunks and then re-run the workflows. However, Galaxy is not allowing me to run “split fasta” on the entire dataset; it’s only inputting step 10. May I please have help on how to bypass this?

Histories:
SRR14289618 - https://usegalaxy.org/u/disomi/h/lamb-gut-srr14289618
SRR10963010 - Galaxy

ValerieG · February 26, 2025, 5:37pm

Hi Disomi,

#### Below is the March 4, 2025 updated response to your question

For splitting a fastq (or fastq.gz) dataset into smaller datasets:

METHOD 1 (recommended, and faster than Method 2): Use the seqtk_sample tool (see example below for getting, e.g., a 10% sample of your dataset).

In the parameters, enter the subset size as a fraction: for a 10% subset, enter 0.1
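
To make it concrete what a 0.1 fraction means, here is a minimal Python sketch of fraction-based subsampling outside Galaxy. It is not the seqtk implementation, only an illustration under stated assumptions: the input is a standard 4-lines-per-record FASTQ, and the function names (`open_text`, `subsample_fastq`) and the default seed are hypothetical choices made for this example.

```python
import gzip
import random
from itertools import islice


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed text file, chosen by file extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def subsample_fastq(in_path, out_path, fraction=0.1, seed=11):
    """Keep each 4-line FASTQ record with probability `fraction`.

    Hypothetical helper for illustration; not part of Galaxy or seqtk.
    Returns (records_kept, records_seen).
    """
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    kept = total = 0
    with open_text(in_path) as src, open_text(out_path, "wt") as dst:
        while True:
            record = list(islice(src, 4))  # header, sequence, '+', qualities
            if not record:
                break
            if len(record) != 4:
                raise ValueError("truncated FASTQ record at end of file")
            total += 1
            if rng.random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept, total
```

Because each record is kept independently, the result is approximately (not exactly) 10% of the reads. In this sketch, running it with the same seed on two paired-end files that have the same number of records selects the same record positions in both, so the pairs stay in sync.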

METHOD 2: Use the Split file tool (see example below for splitting your dataset into 10 parts/chunks).

In the parameters, you need to actively select ‘FASTQ’
In the parameters, you need to actively indicate the number of chunks to split the data into (e.g. 10 in the example below)
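
For readers who want to see what splitting a FASTQ into a fixed number of chunks involves, the following is a rough Python sketch. It is not how the Galaxy Split file tool is implemented; it assumes 4-line FASTQ records, hands records to the chunks in round-robin order (one of several possible allocation schemes), and uses hypothetical names (`open_text`, `split_fastq`, `chunk_000.fastq`, ...).

```python
import gzip
from itertools import islice
from pathlib import Path


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed text file, chosen by file extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def split_fastq(in_path, out_dir, n_chunks=10):
    """Distribute 4-line FASTQ records over `n_chunks` files, round-robin.

    Hypothetical helper for illustration; not the Galaxy tool's code.
    Returns the list of chunk paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"chunk_{i:03d}.fastq" for i in range(n_chunks)]
    handles = [open(p, "w") for p in paths]
    try:
        with open_text(in_path) as src:
            index = 0
            while True:
                record = list(islice(src, 4))  # one complete read
                if not record:
                    break
                if len(record) != 4:
                    raise ValueError("truncated FASTQ record at end of file")
                # Record i goes to chunk i mod n_chunks.
                handles[index % n_chunks].writelines(record)
                index += 1
    finally:
        for handle in handles:
            handle.close()
    return paths
```

The input is streamed one record at a time, so memory use stays small regardless of dataset size, but the chunks together take as much disk space as the uncompressed original.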

### Below is the February 26, 2025 response, which is for splitting a .fasta file into smaller chunks

In your history I don’t see “split fasta” operations - what happened when you tried running ‘Split Fasta’? What error did you get?

Split Fasta will require a bit of space on Galaxy to store all the split data (which you can delete afterwards, leaving only one subset for use). And, based on your failed ‘Flye’ operation (which failed because the system ran out of memory), the problem with Split Fasta may also have been memory-related. But without the relevant history and error message it is difficult to troubleshoot.
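
As a point of comparison for the space and memory discussion above, this is a small Python sketch of splitting a FASTA file while streaming it line by line. It is not the Galaxy Split Fasta tool; the function name `split_fasta`, the `records_per_file` parameter, and the `part_0000.fasta` naming are all assumptions made for this example.

```python
from pathlib import Path


def split_fasta(in_path, out_dir, records_per_file=1000):
    """Write consecutive FASTA records into files of at most `records_per_file`.

    Hypothetical helper for illustration; not the Galaxy tool's code.
    Returns the list of part paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    out = None
    records_in_current = 0
    try:
        with open(in_path) as src:
            for line in src:
                if line.startswith(">"):  # a header line starts a new record
                    if out is None or records_in_current == records_per_file:
                        if out is not None:
                            out.close()
                        path = out_dir / f"part_{len(paths):04d}.fasta"
                        paths.append(path)
                        out = open(path, "w")
                        records_in_current = 0
                    records_in_current += 1
                if out is None:
                    continue  # ignore any text before the first header
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths
```

Only one line is held in memory at a time, so a split written this way is limited by disk space rather than by memory.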

If you have not already done so, go ahead and “Purge All Deleted Content” (see image below) to free up space in Galaxy; this should help.

[Image: Screenshot 2025-02-26 at 12.32.53 PM]

Let us know if this helps or if you need further troubleshooting help.
