xerces-c can not deal with high GBK file

Kirby Zhou kirbyzhou at sohu-rd.com
Mon Jul 26 10:43:39 UTC 2010


:-)

And if you decide to use ICU instead of IconvGNU in xerces,
there seems to be another bug:

ICUTranscoder::transcodeFrom 

    UErrorCode  err = U_ZERO_ERROR;
    ucnv_toUnicode
    (
        fConverter
        , &startTarget
        , startTarget + maxChars
        , (const char**)&startSrc
        , (const char*)endSrc
        , (fFixed ? 0 : (int32_t*)fSrcOffsets)
        , false
        , &err
    );

It seems a mutex is needed to protect fConverter.
ICULCPTranscoder::calcRequiredSize calls 'XMLMutexLock
lockConverter(&fMutex);' to do that.
I do not know why the xerces authors do not do the same thing here.
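
To illustrate the kind of locking meant here, below is a minimal sketch. It
assumes the converter really is shared between threads. FakeConverter and
LockedTranscoder are hypothetical names made up for this example; they are
not ICU or Xerces-C classes, and std::mutex stands in for XMLMutexLock.

```cpp
#include <mutex>
#include <string>

// Hypothetical stand-in for a stateful converter such as the one behind
// fConverter. This is NOT the ICU API. It keeps a scratch buffer between
// calls, so unsynchronized concurrent calls would corrupt it.
class FakeConverter {
public:
    std::u16string convert(const std::string& src) {
        scratch_.clear();                          // shared internal state
        for (unsigned char c : src)
            scratch_.push_back(static_cast<char16_t>(c));
        ++calls_;
        return scratch_;
    }
    int calls() const { return calls_; }
private:
    std::u16string scratch_;
    int calls_ = 0;
};

// Sketch of a transcoder that serializes every use of its converter.
class LockedTranscoder {
public:
    std::u16string transcodeFrom(const std::string& src) {
        std::lock_guard<std::mutex> lock(mutex_);  // unlocked on scope exit
        return converter_.convert(src);
    }
    int calls() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return converter_.calls();
    }
private:
    mutable std::mutex mutex_;
    FakeConverter converter_;
};
```

With this shape, several threads can call transcodeFrom() on one
LockedTranscoder, and each call sees the converter in a consistent state.
Whether xerces needs this depends on whether one transcoder instance is
actually used from more than one thread.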

 

Regards,
   Kirby Zhou    
   from   SOHU-RD   +86-10-6272-8261


-----Original Message-----
From: epel-devel-list-bounces at redhat.com
[mailto:epel-devel-list-bounces at redhat.com] On Behalf Of Stephen John
Smoogen
Sent: Saturday, July 24, 2010 1:21 AM
To: EPEL development disccusion
Subject: Re: xerces-c can not deal with high GBK file

Thanks for the bug report. Will see what we can do with it.

On Fri, Jul 23, 2010 at 01:41, Kirby Zhou <kirbyzhou at sohu-rd.com> wrote:
> xerces-c-3.0.1/2.7.0 can not deal with high GBK file
>
> There is a bug inside util/Transcoders/IconvGNU/IconvGNUTransService.cpp.
>
>     for (size_t cnt = 0; cnt < maxChars && srcLen; cnt++) {
>         size_t    rc = iconvFrom(startSrc, &srcLen, &orgTarget, uChSize());
>         if (rc == (size_t)-1) {
>             if (errno != E2BIG || prevSrcLen == srcLen) {
>                 ThrowXMLwithMemMgr(TranscodingException,
>                     XMLExcepts::Trans_BadSrcSeq, getMemoryManager());
>             }
>         }
>         charSizes[cnt] = prevSrcLen - srcLen;
>         prevSrcLen = srcLen;
>         bytesEaten += charSizes[cnt];
>         startSrc = endSrc - srcLen;
>         toReturn++;
>     }
>
> If a huge file is passed to xerces, the text is handed to
> IconvGNUTranscoder in pieces, so a piece can end with an incomplete
> multibyte sequence. iconv reports that case with errno EINVAL, but the
> code above has no separate check for EINVAL, so it is treated as a bad
> source sequence.
>
>
>
> Regards,
>   Kirby Zhou
>   from   SOHU-RD   +86-10-6272-8261
>
>
>
> _______________________________________________
> epel-devel-list mailing list
> epel-devel-list at redhat.com
> https://www.redhat.com/mailman/listinfo/epel-devel-list
>
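
For the EINVAL case described in the quoted report, here is a minimal sketch
of one way to handle a truncated multibyte sequence at the end of a chunk:
keep the leftover bytes and retry them together with the next chunk. This is
not the Xerces-C code or a proposed patch. ChunkDecoder is a hypothetical
name, and the sketch uses UTF-8 to UCS-4LE instead of GBK only because
those converters are more widely available; the errno handling is the same
for any multibyte encoding.

```cpp
#include <iconv.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of Xerces-C. Decodes UTF-8 input that
// arrives in arbitrary chunks into UTF-32 code points.
class ChunkDecoder {
public:
    ChunkDecoder() : cd_(iconv_open("UCS-4LE", "UTF-8")) {
        if (cd_ == (iconv_t)-1)
            throw std::runtime_error("iconv_open failed");
    }
    ~ChunkDecoder() { iconv_close(cd_); }
    ChunkDecoder(const ChunkDecoder&) = delete;
    ChunkDecoder& operator=(const ChunkDecoder&) = delete;

    // Converts as much of (leftover + chunk) as possible and appends the
    // resulting code points to out.
    void feed(const std::string& chunk, std::u32string& out) {
        pending_ += chunk;
        char* src = &pending_[0];
        size_t srcLeft = pending_.size();
        while (srcLeft > 0) {
            char buf[256];
            char* dst = buf;
            size_t dstLeft = sizeof buf;
            size_t rc = iconv(cd_, &src, &srcLeft, &dst, &dstLeft);
            int err = (rc == (size_t)-1) ? errno : 0;
            appendCodePoints(buf, sizeof buf - dstLeft, out);
            if (err == 0 || err == E2BIG)
                continue;   // done, or output buffer full: go around again
            if (err == EINVAL)
                break;      // incomplete sequence at end: wait for more input
            throw std::runtime_error("invalid byte sequence in input");
        }
        // Drop the bytes iconv consumed; keep the incomplete tail, if any.
        pending_.erase(0, pending_.size() - srcLeft);
    }

    size_t pendingBytes() const { return pending_.size(); }

private:
    // Assembles little-endian 4-byte units into code points.
    static void appendCodePoints(const char* p, size_t n, std::u32string& out) {
        assert(n % 4 == 0);
        for (size_t i = 0; i + 4 <= n; i += 4) {
            const unsigned char* b =
                reinterpret_cast<const unsigned char*>(p + i);
            out.push_back(char32_t(b[0]) | char32_t(b[1]) << 8 |
                          char32_t(b[2]) << 16 | char32_t(b[3]) << 24);
        }
    }

    iconv_t cd_;
    std::string pending_;   // bytes of an unfinished multibyte sequence
};
```

The point of the sketch is the split between the error codes: E2BIG means
"call again with more output space", EINVAL means "call again with more
input", and only other errors such as EILSEQ indicate bad source data. A
caller should still report an error if pendingBytes() is non-zero once the
whole input has been fed.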



-- 
Stephen J Smoogen.
“The core skill of innovators is error recovery, not failure avoidance.”
Randy Nelson, President of Pixar University.
"We have a strategic plan. It's called doing things."
— Herb Kelleher, founder Southwest Airlines
