Jay Taylor's notes
How can I remove the ANSI escape sequences from a string in Python?
This is my string:
'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
I was using code to retrieve the output from an SSH command, and I want my string to contain only 'examplefile.zip'.
What can I use to remove the extra escape sequences?
Delete them with a regular expression:
import re
ansi_escape = re.compile(r'\x1B\[[0-?]*[ -/]*[@-~]')
ansi_escape.sub('', sometext)
Demo:
>>> import re
>>> ansi_escape = re.compile(r'\x1B\[[0-?]*[ -/]*[@-~]')
>>> sometext = 'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
>>> ansi_escape.sub('', sometext)
'ls\r\nexamplefile.zip\r\n'
(I've tidied up the escape sequence expression to follow the Wikipedia overview of ANSI escape codes, focusing on the CSI sequences and ignoring the C1 codes, as they are never used in today's UTF-8 world.)
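Because the pattern follows the generic CSI layout rather than listing specific codes, it is not limited to colour codes ending in m. A small sketch, using made-up sample strings, checks a few other shapes: no parameters, several parameters, a private-mode ? prefix, and an erase-line sequence.

```python
import re

# ESC [ , then parameter bytes 0x30-0x3F ('0'-'9', ':', ';', '<', '=', '>', '?'),
# then intermediate bytes 0x20-0x2F (space to '/'), then one final byte 0x40-0x7E
ansi_escape = re.compile(r'\x1B\[[0-?]*[ -/]*[@-~]')

samples = {
    '\x1b[mplain': 'plain',                      # no parameters at all
    '\x1b[38;5;196mred': 'red',                  # 256-colour SGR, several parameters
    '\x1b[?25lhidden cursor': 'hidden cursor',   # private-mode '?' prefix, final 'l'
    '\x1b[2Kcleared': 'cleared',                 # erase line, final byte 'K'
}
for text, expected in samples.items():
    assert ansi_escape.sub('', text) == expected
```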
The result of ansi_escape.sub('', sometext) should be assigned to your final variable; sub() returns a new string and does not modify sometext in place. – crafter Feb 4 at 14:14
The accepted answer to this question only considers color and font effects. There are a lot of sequences that do not end in 'm', such as cursor positioning, erasing, and scroll regions.
The complete regexp for Control Sequences (aka ANSI Escape Sequences) is
/(\x9B|\x1B\[)[0-?]*[ -\/]*[@-~]/
Refer to ECMA-48 Section 5.4 and ANSI escape code
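A Python version of this pattern might look like the sketch below. The /.../ slashes are dropped, since Python's re module does not use delimiters, and the group is made non-capturing, which does not change what is matched. The sample strings are illustrative only.

```python
import re

# Either the 8-bit CSI character (\x9B) or the 7-bit form ESC [, followed by
# parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F and a final byte 0x40-0x7E
csi = re.compile(r'(?:\x9B|\x1B\[)[0-?]*[ -/]*[@-~]')

# Erase display (J), cursor up (A) and erase line (K) do not end in 'm'
progress = '\x1b[2Jprogress\x1b[1A\x1b[2K 50%\x1b[0m'
print(repr(csi.sub('', progress)))            # 'progress 50%'

# The same SGR colour codes written with the single-character CSI
print(repr(csi.sub('', '\x9b31mred\x9b0m')))  # 'red'
```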
OSC is an "ANSI escape sequence", is frequently used, and would begin with a different pattern. Your answer is incomplete. – Thomas Dickey Aug 4 '16 at 7:57
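A sketch that also strips OSC sequences (for example ESC ] 0 ; ... BEL, which terminals use to set the window title). It assumes each OSC is terminated either by BEL (\x07) or by ST written as ESC \, and that the payload itself contains neither BEL nor ESC; the sample strings are made up.

```python
import re

# OSC: ESC ] payload, terminated by BEL (\x07) or ST (ESC \)
OSC = r'\x1B\][^\x07\x1B]*(?:\x07|\x1B\\)'
# CSI: the same pattern as in the answers above
CSI = r'(?:\x9B|\x1B\[)[0-?]*[ -/]*[@-~]'
ansi_escape = re.compile(OSC + '|' + CSI)

# A title-setting OSC (BEL-terminated) followed by a coloured word
print(repr(ansi_escape.sub('', '\x1b]0;user@host: ~\x07\x1b[01;32mok\x1b[0m')))  # 'ok'
# The same kind of OSC terminated by ST instead of BEL
print(repr(ansi_escape.sub('', '\x1b]2;title\x1b\\done')))  # 'done'
```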
This doesn't work for color codes produced by bluetoothctl, for example \x1b[0;94m. Making the expression case-insensitive or replacing 1B with 1b in the pattern made no difference. I'm using Python and the line re.compile(r'/(\x9b|\x1b\[)[0-?]*[ -\/]*[@-~]/', re.I), then I'm doing pattern.sub("", my_string), which doesn't accomplish anything. Am I doing something wrong? – Hubro Dec 30 '16 at 8:11
(I was too slow to edit my previous comment.) I assume your pattern is using features not available in Python's re module? – Hubro Dec 30 '16 at 8:18
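The likely culprit is the pair of surrounding slashes: in the answer they are JavaScript/Perl-style delimiters, but inside a Python pattern they are literal characters that the input must contain, so nothing matches. A minimal sketch comparing both forms on the colour code from the comment (the word "Device" is a made-up sample):

```python
import re

sample = '\x1b[0;94mDevice\x1b[0m'  # bluetoothctl-style colour code around a word

# With the slashes, the pattern requires a literal '/' before ESC, so nothing matches
with_slashes = re.compile(r'/(\x9b|\x1b\[)[0-?]*[ -\/]*[@-~]/', re.I)
print(repr(with_slashes.sub('', sample)))     # unchanged

# Without them, Python's re handles the pattern fine
without_slashes = re.compile(r'(\x9b|\x1b\[)[0-?]*[ -/]*[@-~]')
print(repr(without_slashes.sub('', sample)))  # 'Device'
```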
Function
Based on Martijn Pieters♦'s answer with Jeff's regexp.
import re

def escape_ansi(line):
    # Remove CSI sequences, introduced either by the 8-bit CSI (\x9B) or by ESC [
    ansi_escape = re.compile(r'(\x9B|\x1B\[)[0-?]*[ -/]*[@-~]')
    return ansi_escape.sub('', line)
Test
def test_remove_ansi_escape_sequence(self):
    line = '\t\u001b[0;35mBlabla\u001b[0m \u001b[0;36m172.18.0.2\u001b[0m'
    escaped_line = escape_ansi(line)
    self.assertEqual(escaped_line, '\tBlabla 172.18.0.2')
Testing
If you want to run it yourself, use Python 3 (better Unicode support). Here is how the test file should look:
import unittest
import re

def escape_ansi(line):
    …

class TestStringMethods(unittest.TestCase):
    def test_remove_ansi_escape_sequence(self):
        …

if __name__ == '__main__':
    unittest.main()
Why have you left the / escaped in the second-to-last character set [ -\/]? – Andrew Gelnar Aug 10 '16 at 12:04
The suggested regex didn't do the trick for me, so I created one of my own. The following is a Python regex that I created based on the spec on the ascii-table.com page:
import re

ansi_regex = r'\x1b(' \
r'(\[\??\d+[hl])|' \
r'([=<>a-kzNM78])|' \
r'([\(\)][a-b0-2])|' \
r'(\[\d{0,2}[ma-dgkjqi])|' \
r'(\[\d+;\d+[hfy]?)|' \
r'(\[;?[hf])|' \
r'(#[3-68])|' \
r'([01356]n)|' \
r'(O[mlnp-z]?)|' \
r'(/Z)|' \
r'(\d+)|' \
r'(\[\?\d;\d0c)|' \
r'(\d;\dR))'
ansi_escape = re.compile(ansi_regex, flags=re.IGNORECASE)
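A quick check of the regex above on two illustrative strings suggests it fully removes sequences with at most one numeric parameter, but leaves a stray m after multi-parameter colour codes such as \x1b[01;31m (the kind used in the question). The alternative (\[\d+;\d+[hfy]?) matches first and stops before the final m.

```python
import re

# The regex from the answer above, unchanged (parenthesised instead of using backslashes)
ansi_regex = (
    r'\x1b('
    r'(\[\??\d+[hl])|'
    r'([=<>a-kzNM78])|'
    r'([\(\)][a-b0-2])|'
    r'(\[\d{0,2}[ma-dgkjqi])|'
    r'(\[\d+;\d+[hfy]?)|'
    r'(\[;?[hf])|'
    r'(#[3-68])|'
    r'([01356]n)|'
    r'(O[mlnp-z]?)|'
    r'(/Z)|'
    r'(\d+)|'
    r'(\[\?\d;\d0c)|'
    r'(\d;\dR))'
)
ansi_escape = re.compile(ansi_regex, flags=re.IGNORECASE)

# Erase display and a single-parameter SGR code are removed completely
print(repr(ansi_escape.sub('', '\x1b[2Jclear\x1b[00m')))  # 'clear'

# Two parameters: '\x1b[01;31' is matched, the trailing 'm' is left behind
print(repr(ansi_escape.sub('', '\x1b[01;31mred')))  # 'mred'
```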
I tested my regex on the following snippet (basically a copy-paste from the ascii-table.com page):
\x1b[20h Set
\x1b[?1h Set
\x1b[?3h Set
\x1b[?4h Set
\x1b[?5h Set
\x1b[?6h Set
\x1b[?7h Set
\x1b[?8h Set
\x1b[?9h Set
\x1b[20l Set
\x1b[?1l Set
\x1b[?2l Set
\x1b[?3l Set
\x1b[?4l Set
\x1b[?5l Set
\x1b[?6l Set
\x1b[?7l Reset
\x1b[?8l Reset
\x1b[?9l Reset
\x1b= Set
\x1b> Set
\x1b(A Set
\x1b)A Set
\x1b(B Set
\x1b)B Set
\x1b(0 Set
\x1b)0 Set
\x1b(1 Set
\x1b)1 Set
\x1b(2 Set
\x1b)2 Set
\x1bN Set
\x1bO Set
\x1b[m Turn
\x1b[0m Turn
\x1b[1m Turn
\x1b[2m Turn
\x1b[4m Turn
\x1b[5m Turn
\x1b[7m Turn
\x1b[8m Turn
\x1b[1;2 Set
\x1b[1A Move
\x1b[2B Move
\x1b[3C Move
\x1b[4D Move
\x1b[H Move
\x1b[;H Move
\x1b[4;3H Move
\x1b[f Move
\x1b[;f Move
\x1b[1;2 Move
\x1bD Move/scroll
\x1bM Move/scroll
\x1bE Move
\x1b7 Save
\x1b8 Restore
\x1bH Set
\x1b[g Clear
\x1b[0g Clear
\x1b[3g Clear
\x1b#3 Double-height
\x1b#4 Double-height
\x1b#5 Single
\x1b#6 Double
\x1b[K Clear
\x1b[0K Clear
\x1b[1K Clear
\x1b[2K Clear
\x1b[J Clear
\x1b[0J Clear
\x1b[1J Clear
\x1b[2J Clear
\x1b5n Device
\x1b0n Response:
\x1b3n Response:
\x1b6n Get
\x1b[c Identify
\x1b[0c Identify
\x1b[?1;20c Response:
\x1bc Reset
\x1b#8 Screen
\x1b[2;1y Confidence
\x1b[2;2y Confidence
\x1b[2;9y Repeat
\x1b[2;10y Repeat
\x1b[0q Turn
\x1b[1q Turn
\x1b[2q Turn
\x1b[3q Turn
\x1b[4q Turn
\x1b< Enter/exit
\x1b= Enter
\x1b> Exit
\x1bF Use
\x1bG Use
\x1bA Move
\x1bB Move
\x1bC Move
\x1bD Move
\x1bH Move
\x1b12 Move
\x1bI
\x1bK
\x1bJ
\x1bZ
\x1b/Z
\x1bOP
\x1bOQ
\x1bOR
\x1bOS
\x1bA
\x1bB
\x1bC
\x1bD
\x1bOp
\x1bOq
\x1bOr
\x1bOs
\x1bOt
\x1bOu
\x1bOv
\x1bOw
\x1bOx
\x1bOy
\x1bOm
\x1bOl
\x1bOn
\x1bOM
\x1b[i
\x1b[1i
\x1b[4i
\x1b[5i
Hopefully this will help others :)
If you also want to remove the \r\n bit, you can pass the string through this function (written by sarnold):
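def stripEscape(string):
    """ Removes all C0 control characters (0x01-0x1F) from the input string """
    # Characters 0x01-0x1F include \t, \n, \r and ESC (\x1b)
    delete = "".join(chr(i) for i in range(1, 0x20))
    # Python 3: str.translate takes a mapping; the third argument of
    # str.maketrans lists the characters to delete
    return string.translate(str.maketrans("", "", delete))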
Careful, though: this will lump together the text in front of and behind the escape sequences. So, using Martijn's filtered string 'ls\r\nexamplefile.zip\r\n', you will get lsexamplefile.zip. Note the ls in front of the desired filename.
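Run Martijn's regular expression first and stripEscape second: if stripEscape runs first, it deletes the ESC (\x1b) characters the regular expression relies on, leaving fragments such as [01;31m behind. Neither order avoids concatenating the unwanted bit, though; to keep ls separate from the filename, split the regex output into lines (for example with splitlines()) instead of deleting the line breaks.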
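The orderings can be compared directly on the question's string. This is a sketch assuming Python 3, with stripEscape's deletion expressed as a str.maketrans table (the name CONTROL_CHARS is made up for this example):

```python
import re

ansi_escape = re.compile(r'\x1B\[[0-?]*[ -/]*[@-~]')  # Martijn's CSI pattern
# Deletion table for the C0 control characters 0x01-0x1F, as in stripEscape
CONTROL_CHARS = str.maketrans('', '', ''.join(chr(i) for i in range(1, 0x20)))

raw = 'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'

# Control characters first: ESC is deleted, so the regex finds nothing to remove
print(repr(ansi_escape.sub('', raw.translate(CONTROL_CHARS))))
# 'ls[00m[01;31mexamplefile.zip[00m[01;31m'

# Regex first: the sequences go, but 'ls' and the filename are joined
print(repr(ansi_escape.sub('', raw).translate(CONTROL_CHARS)))
# 'lsexamplefile.zip'

# Regex, then split on the line breaks instead of deleting them
print(ansi_escape.sub('', raw).splitlines())
# ['ls', 'examplefile.zip']
```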