
Asterisk 13.5+ extra-escapes all channel variables, including $RECOG_RESULT #47

@sfgeorge


On Asterisk 13.5+ combined with LumenVox ASR, we're noticing that UniMRCP-based speech recognition fails with the following error:

ERROR Adhearsion::Translator::Asterisk: <Nokogiri::XML::SyntaxError> The value following "version" in the XML declaration must be a quoted string.

This happens because Asterisk 13.5+ now backslash-escapes several characters, including the single quote ('), the double quote (") and the question mark (?), in all VarSet (channel variable set) events. So ALL channel variables, including the $RECOG_RESULT variable that conveys NLSML results from speech recognition, are now subject to a different encoding than before.


On top of that, although Adhearsion enables the UniMRCP uer option (URI-encoded results), the single quote ' is one of the characters that is not typically URI-encoded. The single quotes in a LumenVox response therefore arrive unencoded, and Asterisk 13.5+'s new escaping replaces each ' with \':

...
Variable: RECOG_RESULT
Value: %3C%3Fxml%20version%3D\'1.0\'%20encoding%3D\'ISO-8859-1\'%20%3F%3E%3Cresult%3E%3Cinterpretation%20grammar%3D%22builtin%3Agrammar%2Fnumber%22%20confidence%3D%220.96%22%3E%3Cinput%20mode%3D%22speech%22%3Eseven%3C%2Finput%3E%3Cinstance%3E7%3C%2Finstance%3E%3C%2Finterpretation%3E%3C%2Fresult%3E

Decoded:
<?xml version=\'1.0\' encoding=\'ISO-8859-1\' ?><result><interpretation grammar="builtin:grammar/number" confidence="0.96"><input mode="speech">seven</input><instance>7</instance></interpretation></result>
... ❌malformed with \'
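To see why the XML parser complains, the event value can be run through an ordinary URI decoder. The Ruby sketch below uses only the standard library; treating URI.decode_www_form_component as a stand-in for the decoding Adhearsion applies to uer results is an assumption, and the constant name is illustrative. It shows that the backslashes survive URI decoding, because they were never percent-encoded in the first place:

```ruby
require 'uri'

# RECOG_RESULT value copied from the Asterisk 13.5+ VarSet event above.
# In these double-quoted Ruby literals, "\\'" is a literal backslash followed
# by an apostrophe, i.e. exactly the two bytes Asterisk put on the wire.
ESCAPED_VALUE =
  "%3C%3Fxml%20version%3D\\'1.0\\'%20encoding%3D\\'ISO-8859-1\\'%20%3F%3E" \
  "%3Cresult%3E%3Cinterpretation%20grammar%3D%22builtin%3Agrammar%2Fnumber%22" \
  "%20confidence%3D%220.96%22%3E%3Cinput%20mode%3D%22speech%22%3Eseven" \
  "%3C%2Finput%3E%3Cinstance%3E7%3C%2Finstance%3E%3C%2Finterpretation%3E" \
  "%3C%2Fresult%3E"

# Percent-decoding turns %3C into <, %22 into ", and so on, but leaves the
# backslashes Asterisk inserted untouched, so the XML declaration stays broken.
decoded = URI.decode_www_form_component(ESCAPED_VALUE)
puts decoded
```

The printed string matches the "Decoded" line above, including the stray \' sequences that make the XML declaration invalid.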


In contrast, here's how that variable would be received prior to Asterisk 13.5:

...
Variable: RECOG_RESULT
Value: %3C%3Fxml%20version%3D'1.0'%20encoding%3D'ISO-8859-1'%20%3F%3E%3Cresult%3E%3Cinterpretation%20grammar%3D%22builtin%3Agrammar%2Fnumber%22%20confidence%3D%220.92%22%3E%3Cinput%20mode%3D%22speech%22%3Eseven%3C%2Finput%3E%3Cinstance%3E7%3C%2Finstance%3E%3C%2Finterpretation%3E%3C%2Fresult%3E

Decoded:
<?xml version='1.0' encoding='ISO-8859-1' ?><result><interpretation grammar="builtin:grammar/number" confidence="0.92"><input mode="speech">seven</input><instance>7</instance></interpretation></result>
... ✅valid NLSML


Backslash-escaping of the following characters was introduced by ASTERISK-24934, "[patch] Asterisk manager output does not escape control characters":

Character in C  Hex   Name                                 New 2-character AMI representation in Asterisk >= 13.5
\a              0x07  Alert (beep, bell)                   \a (0x5C 0x61)
\b              0x08  Backspace                            \b (0x5C 0x62)
\f              0x0C  Form feed (page break)               \f (0x5C 0x66)
\n              0x0A  Newline (line feed)                  \n (0x5C 0x6E)
\r              0x0D  Carriage return                      \r (0x5C 0x72)
\t              0x09  Horizontal tab                       \t (0x5C 0x74)
\v              0x0B  Vertical tab                         \v (0x5C 0x76)
\               0x5C  Backslash                            \\ (0x5C 0x5C)
'               0x27  Apostrophe or single quotation mark  \' (0x5C 0x27)
"               0x22  Double quotation mark                \" (0x5C 0x22)
?               0x3F  Question mark                        \? (0x5C 0x3F)
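Reversing this table is mechanical. The Ruby sketch below is one way to do it, not existing Adhearsion or RubyAMI code: the helper name ami_unescape is hypothetical, and it assumes the table is the complete list of escapes. It scans left to right in a single pass, so an escaped backslash followed by n is not mistaken for an escaped newline. It also removes the AMI escaping before URI-decoding, since a %5C in the payload would otherwise decode to a backslash that the unescaper would then wrongly consume.

```ruby
require 'uri'

# Character after the backslash => the byte it stands for (reverse of the table).
AMI_UNESCAPES = {
  'a' => "\a", 'b' => "\b", 'f' => "\f", 'n' => "\n", 'r' => "\r",
  't' => "\t", 'v' => "\v", '\\' => '\\', "'" => "'", '"' => '"', '?' => '?'
}.freeze

# Hypothetical helper: undo Asterisk >= 13.5 AMI backslash-escaping.
# gsub matches backslash pairs left to right without overlap, so an escaped
# backslash followed by "n" yields a backslash and "n", not a newline.
# Escapes not listed in the table are left exactly as received.
def ami_unescape(value)
  value.gsub(/\\(.)/m) { |match| AMI_UNESCAPES.fetch(match[1], match) }
end

# RECOG_RESULT as received from Asterisk 13.5+ ("\\'" is a backslash + apostrophe).
raw_value =
  "%3C%3Fxml%20version%3D\\'1.0\\'%20encoding%3D\\'ISO-8859-1\\'%20%3F%3E" \
  "%3Cresult%3E%3Cinterpretation%20grammar%3D%22builtin%3Agrammar%2Fnumber%22" \
  "%20confidence%3D%220.96%22%3E%3Cinput%20mode%3D%22speech%22%3Eseven" \
  "%3C%2Finput%3E%3Cinstance%3E7%3C%2Finstance%3E%3C%2Finterpretation%3E" \
  "%3C%2Fresult%3E"

# Order matters: strip the AMI escaping first, then URI-decode the payload.
nlsml = URI.decode_www_form_component(ami_unescape(raw_value))
puts nlsml
```

Applied to the Asterisk 13.5+ RECOG_RESULT value, this yields well-formed NLSML in the same shape as the pre-13.5 capture.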

Some Strategies for Resolution

  1. We could always attempt to unescape backslash sequences, in all versions of Asterisk.
     Cons: This would change existing behavior and could corrupt data on Asterisk < 13.5, where a backslash in a variable value is literal rather than an escape.

  2. We could enable automatic unescaping when RubyAMI::Stream#version is >= 2.8.0, since the issue was introduced as AMI_VERSION moved from 2.7.0 to 2.8.0.
     Pro: A zero-configuration, "it just works" solution.
     Cons:
       • A complex, stateful solution.
       • Introduces the concept of separate modes of Asterisk compatibility.

  3. We could decide whether or not to unescape based on a configuration option of some sort.
     Pros:
       • Straightforward to implement & test.
       • We can choose whether the option defaults to ON or OFF.
     Cons:
       • Introduces the concept of separate modes of Asterisk compatibility.
       • NOT zero-configuration: if you hit this error, you may have to search the web for it and learn that you need to flip this option ON to resolve it.
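The decision behind options 2 and 3 can be sketched as a single predicate: a version check supplies the default (option 2), and an explicit setting, if present, wins (option 3). Everything in this Ruby sketch is hypothetical: the method name, the override parameter, and the choice to treat an unknown version as pre-13.5. It takes the AMI version as a plain string rather than calling into RubyAMI, and compares with Gem::Version so that, for example, 2.10.0 correctly sorts after 2.8.0.

```ruby
require 'rubygems' # provides Gem::Version; already loaded in a standard Ruby

# First AMI version reported to carry the new escaping (2.7.0 -> 2.8.0).
FIRST_ESCAPING_AMI_VERSION = Gem::Version.new('2.8.0')

# Hypothetical policy helper; not an existing Adhearsion or RubyAMI method.
#   ami_version: AMI version string reported by Asterisk, or nil if unknown
#   override:    explicit true/false from configuration; nil means auto-detect
def unescape_ami_values?(ami_version, override: nil)
  return override unless override.nil?   # option 3: configuration wins
  return false if ami_version.nil?       # assumption: unknown => pre-13.5 behavior

  Gem::Version.new(ami_version) >= FIRST_ESCAPING_AMI_VERSION # option 2
end
```

A caller would unescape VarSet values only when this returns true; with override left as nil, Asterisk < 13.5 keeps today's behavior.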

My leaning is towards option 2 (I originally leaned towards option 3), but I'm very interested in other points of view on the matter. 👀

Cc: @gfaza @lpradovera @bklang
