Wednesday, March 26, 2008




I need to find the equivalent of a SUBSTRB function in Informatica.
I have data in legacy databases (Oracle 9i) where some columns are
4000 bytes long and I'd like to copy as much of the text as
possible to an AL32UTF8 database. Because the source database has
text in Chinese, Japanese and Korean(CJK), it may require more bytes to
store in the UTF target than in the source.

Thus I might have started with 4000 bytes but it might take 6000
bytes (more or less the worst case but I suppose it could be up to
8000 bytes) to store the data in UTF.

Right now my only choice is to pull 1333 characters (presumably
2666 bytes) as we feel that is unlikely to take more than 4000 bytes
in UTF. However, some of the text is probably ASCII so we are
throwing away characters that we could store if we knew how many
bytes they would require in UTF8.

What I'd like to do is write out the whole 4000 bytes from the
source system to a UTF8 file and then load that file using an
Informatica mapping. There wouldn't be a problem reading the 4000
characters from the flat file but I need to be able to use just the
first 4000 bytes of the text as we know we can store that much in
our target field. That would give us the maximum amount of text from
the source system (without using CLOB columns). I might be able to
get just 1333 characters in 4000 bytes or I might be able to get
4000 characters if they were all ASCII.

No comments: