Tuesday 6 June 2017

Node.js binary buffer to UTF-8

Now, I know this isn't directly possible, since some special characters use 2 bytes in UTF-8, so a character's position in the decoded string stops matching its byte position in the buffer.
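A quick way to see the mismatch: a JavaScript string's `length` counts UTF-16 code units, while `Buffer.byteLength()` counts encoded bytes. UTF-8 uses 1 to 4 bytes per character, and accented letters such as 'ú' take 2. The sample word is only an illustration.

```javascript
// Sample word taken from the question's data; 'ú' is the only non-ASCII character.
const word = 'República';

console.log(word.length);                     // 9 (characters)
console.log(Buffer.byteLength(word, 'utf8')); // 10 (bytes)

// The two bytes that 'ú' becomes, matching "c3 ba" in the buffer dump.
console.log(Buffer.from('ú', 'utf8'));        // <Buffer c3 ba>
```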

But I have a binary buffer with UTF-8 data. The data is all packed together, and I know that part A starts at position 40, part B starts at 70, and so on.

I've seen a lot of posts suggesting encoding to base64 or hex, but that changes the positions. The only way I've been able to get the correct data in the correct position is by converting to binary and then writing special cases: if the character is 'Ã' followed by 'º', replace them with 'ú'.
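The 'binary' approach keeps positions intact because Node's `'latin1'` encoding (of which `'binary'` is an alias) maps each byte to exactly one character, so string indices equal byte offsets. Instead of hand-written replacement cases, a latin1 string can be turned back into its original bytes and re-decoded as UTF-8. A minimal sketch, using the first ten bytes of the buffer dump in this question as sample data:

```javascript
// Sample data: the first ten bytes of bigBuf from the question ("República").
const buf = Buffer.from([0x52, 0x65, 0x70, 0xc3, 0xba, 0x62, 0x6c, 0x69, 0x63, 0x61]);

// latin1 decodes one byte -> one character, so positions match byte offsets,
// but the two UTF-8 bytes of 'ú' show up as 'Ã' and 'º'.
const binary = buf.toString('latin1');
console.log(binary);        // RepÃºblica
console.log(binary.length); // 10, same as buf.length

// Re-encode with latin1 to recover the exact bytes, then decode them as UTF-8.
const fixed = Buffer.from(binary, 'latin1').toString('utf8');
console.log(fixed);         // República
```

This only yields clean text when the piece being re-decoded starts and ends on character boundaries, for example one whole field.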

Some of my code:

var bigBuf = Buffer.allocUnsafe(256*6);

This is where I store the data.

console.log(bigBuf)
<Buffer 52 65 70 c3 ba 62 6c 69 63 61 20 50 6f 72 74 75 67 75 65 73 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 50 52 54 00 00 00 00 00 00 00 ... >
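For experimenting, here is a sketch that rebuilds a buffer with the same layout. The start offsets 0, 40 and 120 are assumptions read from the dump above and the `charAt()` checks in this question. It uses `Buffer.alloc()`, which zero-fills, because `Buffer.allocUnsafe()` returns uninitialised memory and the padding bytes would not be guaranteed to be `00`.

```javascript
// Hypothetical field layout inferred from the question; the real format may differ.
const FIELDS = [
  { start: 0, text: 'República Portuguesa' },
  { start: 40, text: 'PRT' },
  { start: 120, text: 'Cartão' },
];

// Zero-filled, so unused bytes between fields are 0x00 padding.
const bigBuf = Buffer.alloc(256 * 6);

for (const { start, text } of FIELDS) {
  // write() encodes as UTF-8 by default and returns the number of bytes written.
  const written = bigBuf.write(text, start);
  console.log(`${text}: ${written} bytes at offset ${start}`);
}
```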


console.log(bigBuf.toString())
República PortuguesaPRTCartão...

console.log(bigBuf.toString().charAt(40))
R

Here it should be P (from PRT), but it isn't, because of the 'ú' before it.
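This is what happens at offset 40: decoding the whole buffer turns the two bytes of 'ú' into one character, so everything after it moves one index to the left. Decoding only the field's byte range with `buf.toString(encoding, start, end)` avoids the shift. A minimal sketch, assuming a smaller buffer with the same layout as the dump above:

```javascript
// Small stand-in for bigBuf (hypothetical size), same offsets as the dump.
const bigBuf = Buffer.alloc(64);
bigBuf.write('República Portuguesa', 0);
bigBuf.write('PRT', 40);

// Decoding everything: 21 bytes of text become 20 characters, so byte 40
// ends up at string index 39.
const whole = bigBuf.toString('utf8');
console.log(whole.charAt(40)); // R (the P is at index 39)

// Decoding just bytes 40..42 keeps the byte offsets meaningful.
console.log(bigBuf.toString('utf8', 40, 43)); // PRT
```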

console.log(bigBuf.toString().charAt(120))
a

Here it should be C (from Cartão), and it isn't because of the 'ú' before it. The next field will be off by even more because of the 'ã' in "Cartão".

And this happens every time there's a special character. By the end of the buffer, the data is many positions away from where it should be.
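The drift can also be measured directly. A hypothetical `charIndexOf()` helper decodes everything before a byte offset and counts the resulting characters, so the difference shows how many extra bytes earlier multi-byte characters used. It assumes the offset lands on a character boundary, and the buffer mirrors the question's dump with 'Cartão' assumed to start at byte 120.

```javascript
// Hypothetical helper: string index that a given byte offset maps to
// when the whole buffer is decoded as UTF-8.
function charIndexOf(buf, byteOffset) {
  return buf.toString('utf8', 0, byteOffset).length;
}

const bigBuf = Buffer.alloc(256 * 6);
bigBuf.write('República Portuguesa', 0);
bigBuf.write('PRT', 40);
bigBuf.write('Cartão', 120);

// 127 is the first byte after 'Cartão' (7 bytes starting at 120).
for (const offset of [40, 120, 127]) {
  const index = charIndexOf(bigBuf, offset);
  console.log(`byte ${offset} -> string index ${index} (drift ${offset - index})`);
}
// byte 40 -> string index 39 (drift 1)
// byte 120 -> string index 119 (drift 1)
// byte 127 -> string index 125 (drift 2)
```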

Is there a proper way of doing this, or am I out of luck since I need the information to be in those exact positions? Sorry if this doesn't make any sense and I'm doing it in a completely idiotic way.



via jm8FE
