Htseq-count/0.11 doesn't allow multi-threading (-n NPROCESSES): update to 0.12?

maria_myologie · November 2, 2020, 8:21

After launching a .nf process with htseq-count (HTSeq) that failed, I realised that the version of htseq on the cluster is older than the one I used locally, where I successfully obtained output files for the same BAM file.

Two flags are NOT accepted in the 0.11 version but are available in the latest one:

--counts_output
--nprocesses

Moreover, even when I run the command without these flags, there is a problem decoding the BAM or GTF file used, although both files are fine: I used them locally and got no error.
The error I got:

 Error occured when processing GFF file (line 1 of file /shared/projects/[...]/DM1cl5_1_Aligned.sortedByCoord.out.bam):
 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
  [Exception type: UnicodeDecodeError, raised in codecs.py:322]
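
One possible reading of this message, offered as an assumption rather than a confirmed diagnosis: the .bam path is reported as the GFF file, and byte 0x8b at position 1 matches the second byte of the gzip magic number (BAM files are BGZF-compressed, a gzip variant). That is what a text reader would hit if it opened a BAM file where it expected the annotation. The sketch below only reproduces the decoding error on those leading bytes; the helper names are hypothetical and HTSeq itself is not called.

```python
# First two bytes of any gzip/BGZF stream (BAM files are BGZF-compressed).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzip_compressed(header: bytes) -> bool:
    """Return True if the leading bytes start with the gzip magic number."""
    return header[:2] == GZIP_MAGIC


def utf8_error_for(header: bytes):
    """Try to decode bytes as UTF-8 text; return the error message, or None."""
    try:
        header.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Typical first four bytes of a BGZF block, used here as sample input.
    header = GZIP_MAGIC + b"\x08\x04"
    print(looks_gzip_compressed(header))
    print(utf8_error_for(header))
```

If this reading applies, the thing to check would be which positional argument htseq-count treats as the annotation file in the installed version.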

A reproducible script can be found in
/shared/home/mkondili/MBNL_DCT/run_htseq.sh

The documentation about this error is quite old. The error also appears with other tools/packages, but no solution is provided to overcome it.
So I suggest, if possible, providing the latest version of the package on the cluster.
If you have any other suggestions, please let me know.
Thanks in advance!

Francois · November 2, 2020, 9:50

Hello Maria,

The installation process of Htseq 0.12.4 has started here:

It should be available shortly.

Mag · November 4, 2020, 10:52

Just a clarification that might be useful (it took me a while to figure it out...): the --nprocesses option only works to process several BAM files in parallel. As far as I know, htseq-count on a single file is not parallelized. See the discussion in the htseq/htseq GitHub issue #7, "-n parallel CPUs do not speed up".

The -n option can process up to n BAM files in parallel to speed things up, but cannot use multiple cores on the same file.
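
To illustrate, here is a minimal sketch of building a single htseq-count call that covers several BAM files, so that -n has something to parallelize over. The function name, the sample file names and the cap on the process count are hypothetical choices for illustration; format and ordering options are left out and should be added as your data requires.

```python
import shlex


def build_htseq_command(bam_files, gtf_file, nprocesses):
    """Build one htseq-count (>= 0.12) call covering several BAM files.

    -n parallelizes across input files only, so asking for more processes
    than there are BAM files gains nothing; the count is capped accordingly.
    """
    n = max(1, min(nprocesses, len(bam_files)))
    # Positional order: alignment files first, annotation file last.
    return ["htseq-count", "-n", str(n), *bam_files, gtf_file]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_htseq_command(
        ["sample1.bam", "sample2.bam", "sample3.bam"], "annotation.gtf", 8
    )
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run, or the printed line pasted into a job script. With a single BAM file per call, -n brings no speed-up, so in that case it may be simpler to launch one job per sample.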

maria_myologie · November 4, 2020, 4:59

Thanks! I'll keep that in mind and adjust my script.