Jay Taylor's notes

Protocol buffers python - unicode decode error

Original source (stackoverflow.com)
Tags: python unicode stackoverflow.com
Clipped on: 2016-06-22

I need to receive a Protocol Buffers message on my Python Tornado server and extract the fields from the binary message.

postContent = self.request.body
message = prototemp.ReqMessage()
message.ParseFromString(postContent)

It works perfectly with a test tool. When I run it in a sandbox environment and simulate 1000 requests from my client, it works in some cases, but most requests throw an exception:

  File "server1.py", line 21, in post
    message.ParseFromString(postContent)
  File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/message.py", line 179, in ParseFromString
    self.MergeFromString(serialized)
  File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/python_message.py", line 755, in MergeFromString
    if self._InternalParse(serialized, 0, length) != length:
  File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/python_message.py", line 782, in InternalParse
    pos = field_decoder(buffer, new_pos, end, self, field_dict)
  File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/decoder.py", line 544, in DecodeField
    if value._InternalParse(buffer, pos, new_pos) != new_pos:
  File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/python_message.py", line 782, in InternalParse
    pos = field_decoder(buffer, new_pos, end, self, field_dict)
  File "/usr/lib/python2.6/site-packages/protobuf-2.4.1-py2.6.egg/google/protobuf/internal/decoder.py", line 410, in DecodeField
    field_dict[key] = local_unicode(buffer[pos:new_pos], 'utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0xce in position 1: invalid continuation byte

In some other cases it gives these errors:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xbf in position 3: invalid start byte

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe7 in position 3: unexpected end of data

What could be the reason?
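
For reference, the three messages above name three distinct ways a byte string can fail UTF-8 validation. The sketch below reproduces each one with hand-picked sample bytes. The samples and the helper name `utf8_failure` are illustrative and are not taken from the failing requests. It assumes Python 3, where `bytes.decode` raises the same error class.

```python
def utf8_failure(data):
    """Return (byte position, reason) for the first UTF-8 error, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


# 0xce opens a two-byte sequence, but the next byte (0x41) is plain ASCII.
print(utf8_failure(b"a\xceAb"))    # (1, 'invalid continuation byte')

# 0xbf is a continuation byte, so it can never begin a character.
print(utf8_failure(b"abc\xbf"))    # (3, 'invalid start byte')

# 0xe7 opens a three-byte sequence, but the data stops right after it.
print(utf8_failure(b"abc\xe7"))    # (3, 'unexpected end of data')

# Well-formed UTF-8 decodes without error.
print(utf8_failure("h\u00e9llo".encode("utf-8")))  # None
```

All three suggest that the bytes reaching the string field are not the UTF-8 text the sender intended. That narrows the search but does not by itself identify the cause.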

asked Jun 9 '12 at 1:54
Aditya Singh
Have you tried using a try/except clause to print the string that's generating the exception, or used pdb to inspect the variables at that point? The error is telling you the problem: there's a byte at the indicated position in the string that can't be decoded as UTF-8. If you can figure out what it is and whether you have to deal with it in general, you'll be able to handle it. – Jeff Tratner Jun 9 '12 at 5:13
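
A minimal sketch of the try/except idea from this comment, assuming Python 3. The helper names `describe_payload` and `parse_with_diagnostics` are hypothetical, and `parse` stands in for whatever callable does the parsing (for example a message's `ParseFromString`). It logs the type, length and leading bytes of the body before re-raising.

```python
import logging

log = logging.getLogger("protobuf_debug")


def describe_payload(body, limit=32):
    """Summarise a request body: its type, its length and its first bytes."""
    if isinstance(body, str):
        # Text instead of bytes is itself a clue: something decoded the body.
        head = repr(body[:limit])
    else:
        head = bytes(body[:limit]).hex()
    return "type=%s len=%d head=%s" % (type(body).__name__, len(body), head)


def parse_with_diagnostics(parse, body):
    """Call parse(body); on a decode error, log the payload and re-raise."""
    try:
        return parse(body)
    except UnicodeDecodeError as exc:
        log.error("parse failed (%s): %s", exc, describe_payload(body))
        raise


print(describe_payload(b"a\xceAb"))  # type=bytes len=4 head=61ce4162
```

Comparing the logged bytes with what the client actually sent shows whether the body was altered in transit or was malformed from the start.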
My first guess is that the test client was using UTF-16, as those bytes don't seem to match UTF-8 or any meaningful Western charset – Alastair McCormack Nov 19 '12 at 23:44
Sounds like it is receiving a message for which it lacks a protocol definition. Are one or more emitters using a different spec? – MarkHu Mar 18 '13 at 23:16
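
To illustrate how a schema mismatch of this kind could surface as a UnicodeDecodeError, here is a hand-rolled sketch of how one length-delimited field is read off the wire, assuming Python 3. It is not the protobuf library's code, and `read_varint` and `decode_string_field` are hypothetical names. The point is only that a receiver that believes a field is a `string` will UTF-8-decode whatever bytes the sender put there.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_string_field(buf, pos=0):
    """Read one length-delimited field and decode its payload as UTF-8."""
    key, pos = read_varint(buf, pos)
    field_number, wire_type = key >> 3, key & 0x07
    if wire_type != 2:
        raise ValueError("not a length-delimited field")
    length, pos = read_varint(buf, pos)
    end = pos + length
    if end > len(buf):
        raise ValueError("truncated message")
    # This decode is the step that fails if the sender wrote raw binary here.
    return field_number, buf[pos:end].decode("utf-8"), end


# Field 1 holding valid UTF-8 text: key 0x0a, length 3, payload "abc".
print(decode_string_field(b"\x0a\x03abc"))  # (1, 'abc', 5)

# The same field holding arbitrary binary, as a sender using `bytes` might emit.
try:
    decode_string_field(b"\x0a\x04a\xceAb")
except UnicodeDecodeError as exc:
    print(exc.start, exc.reason)  # 1 invalid continuation byte
```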

I had exactly the same problem with RabbitMQ and Protocol Buffers. The problem is that Protocol Buffers assumes the input is of type str, whereas RabbitMQ seems to decode the message as unicode in some cases (if the byte array contains bytes greater than 127). The same may happen with Tornado. So far, it seems the problem can be solved with the following piece of code:

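body = self.request.body
# Python 2: re-encode only if the framework handed us text instead of raw bytes.
if isinstance(body, unicode):
    data = bytearray(body, "utf-8")
    body = bytes(data)  # in Python 2, bytes is an alias for str
# "whatever" stands for your generated message class.
message = whatever.FromString(body)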

This code turns the unicode string into a Python bytes object, which Protocol Buffers messages can parse without complaint. I don't know whether there is a better way to do this, but at least this seems to work.
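
The snippet above is Python 2, where `unicode` is the text type. A minimal sketch of the same idea for Python 3 follows, where `str` is text and `bytes` is binary. The helper name `ensure_bytes` and the default encoding are assumptions for illustration. Re-encoding only recovers the original payload if the text was produced by decoding with the same codec, so treat this as a workaround and not as a guarantee.

```python
def ensure_bytes(body, encoding="utf-8"):
    """Return body as an immutable bytes object.

    Text is encoded with `encoding`; bytes-like input is converted with bytes().
    """
    if isinstance(body, str):
        return body.encode(encoding)
    return bytes(body)


# Hypothetical usage with a generated message class called ReqMessage:
#     message = ReqMessage.FromString(ensure_bytes(self.request.body))
print(ensure_bytes("h\u00e9"))                 # text is encoded to bytes
print(ensure_bytes(bytearray(b"\x0a\x01a")))   # bytes-like input is kept as-is
```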

answered Jun 10 '13 at 20:17
+1 for you. Thanks! You saved me a lot of aggravation. Can you tell me what bytes(data) does? When I am in interactive Python and I do a help(bytes), I don't see any information on this function. Thanks! – Bitdiot Aug 6 '14 at 20:38
It's not a function. It's a Python type. – the_drow Jul 15 '15 at 8:41
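
A short sketch of what `bytes(data)` does in the answer's snippet. Calling the type builds an immutable byte string from the mutable `bytearray`. In Python 2.6 and 2.7 `bytes` is simply another name for `str`, which is why `help(bytes)` there shows the `str` documentation. The sample text is illustrative.

```python
text = u"h\u00e9llo"

data = bytearray(text, "utf-8")  # mutable sequence of the UTF-8 encoded bytes
body = bytes(data)               # immutable copy, built by calling the type

print(isinstance(bytes, type))            # True: bytes is a type, not a function
print(body == text.encode("utf-8"))       # True: same bytes as encoding directly
print(type(data).__name__, type(body).__name__)
```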

asked 4 years ago, viewed 3629 times, active 3 years ago

site design / logo © 2016 Stack Exchange Inc; user contributions licensed under cc by-sa 3.0 with attribution required