When I try to run my dataset (SRR; it is a very large dataset) through trim QC, it always stops at the fastp job. An error appears saying there isn't enough memory for the job to run, so I realized I have to split the dataset. My ultimate goal is to get it into a Krona pie chart to see the types of bacteria present in the dataset. (I am following Chapter 5 Taxonomy Profiling | BioDIGS miniCURE.)
Here is my history: Galaxy
Hi. Your dataset might just be too large to handle with the given memory. As a workaround, try splitting your dataset into smaller chunks:
- Use the Split Fasta tool on Galaxy to split the data into a smaller number of chunks (the auto-suggested value is 10 chunks); see the sketch after these steps for roughly what this does.
- Once the Split Fasta operation is complete, use the Extract Dataset tool to pull out one of the chunks. Extracting the first dataset is the simplest option, since all chunks are in theory 'equivalent' in the sense of being a random assortment of sequences from the dataset.
- Go ahead and try running fastp on this smaller chunk.
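If you are curious what the split step is doing conceptually, here is a minimal Python sketch of the same idea: split a FASTA file into N chunks and then work with just one of them. This is only an illustration under my own assumptions; the file names (reads.fasta, chunk_*.fasta) and the round-robin strategy are placeholders, not necessarily how Galaxy's Split Fasta tool is implemented.

```python
from itertools import cycle

N_CHUNKS = 10                   # the value Galaxy auto-suggests
INPUT = "reads.fasta"           # hypothetical file name for your dataset

# Open one output file per chunk.
outputs = [open(f"chunk_{i + 1}.fasta", "w") for i in range(N_CHUNKS)]

def records(path):
    """Yield one FASTA record (header line plus its sequence lines) at a time."""
    with open(path) as fh:
        record = []
        for line in fh:
            if line.startswith(">") and record:
                yield "".join(record)
                record = []
            record.append(line)
        if record:
            yield "".join(record)

# Deal records out round-robin, so each chunk ends up a similar-sized,
# roughly random assortment of sequences from the full dataset.
for out, rec in zip(cycle(outputs), records(INPUT)):
    out.write(rec)

for out in outputs:
    out.close()

# You would then run fastp (and the rest of the taxonomy workflow)
# on a single chunk, e.g. chunk_1.fasta.
```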
I think I figured it out! Here is my history: Galaxy
The data I imported: SRR29980925
If you could please double-check the workflow, I would really appreciate it. Thank you for all your advice!
Good job!
I see you have successfully split your file and run the taxonomy workflow. Of note, since your original dataset was so large, the resulting 10% chunk was itself still very large (18.7 GB!). That is most likely why all the steps took so long. Next time I recommend splitting a dataset this large into more than 10 chunks, say 30 or even 50. This will speed up your workflow and should still produce similar taxonomy classification results.
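To put rough numbers on it (assuming your 18.7 GB chunk really was one tenth of the original, so the full dataset is roughly 187 GB), here is the back-of-the-envelope arithmetic behind that suggestion:

```python
# Estimated per-chunk size for different chunk counts, assuming the full
# dataset is about 10 x 18.7 GB = 187 GB (an estimate, not a measured value).
full_size_gb = 18.7 * 10
for n_chunks in (10, 30, 50):
    print(f"{n_chunks} chunks -> ~{full_size_gb / n_chunks:.1f} GB per chunk")
# 10 chunks -> ~18.7 GB per chunk
# 30 chunks -> ~6.2 GB per chunk
# 50 chunks -> ~3.7 GB per chunk
```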
Another suggestion is to remove duplicate Galaxy entries from your history, since they take up a lot of your precious storage space. I think you have duplicate input SRR files there, as well as duplicate Split Fasta entries.