With the Scout SDK you have two options when it comes to audio advice: using pre-recorded audio files (.mp3 files) or relying on the device’s text-to-speech (TTS) engine to read out the instructions.
In this article, I’ll explain where these advice instructions come from, how they are composed and how you can modify them to better fit your requirements, whether you are adding support for a new language or adding a new language pack to an existing language (e.g. a Terminator or Homer Simpson language pack for English).
Advice configuration file(s)
In the SKMaps.zip file (Android) or the SKMaps.bundle file (iOS), inside /AdvisorConfig/Languages, you’ll find the language settings folders, one for each language/language pack combination (e.g. en and en_us).
Inside each language folder you will find a number of configuration files, covering both .mp3 advice instructions and text-to-speech advice instructions:
- numbers.adv – defines how the numbers will be split, so that they are correctly pronounced (audio files only)
- general.csv – defines the rules for creating audio/text instructions for audio files support
- general_TTS.adv – defines the rules for creating audio/text instructions for text-to-speech support
- general_config.adv – defines the rules for playing “combined” advice instructions & imperial-to-metric / metric-to-imperial conversion rules for audio files support
- general_config_TTS.adv – defines the rules for playing “combined” advice & imperial-to-metric / metric-to-imperial conversion rules for text-to-speech support
If you want to add support for a new language (or a new language pack), add a new directory in the Languages folder and copy the English config files (the content of the “en” folder) into it. Feel free to start from any other language folder: if a particular language is grammatically closer to your target language than English, start from that one.
Note: the easiest way is to create configs for the text-to-speech engine and then have the phone’s TTS engine speak out the instructions. If you’d like to provide support for audio files, you’ll have to record your own sound files (.mp3 files), which involves some audio processing work.
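The copy step described above can be scripted. The following is only a minimal sketch: the helper name create_language_pack and the target folder name used in the usage note are hypothetical, and the script simply duplicates an existing language folder so that you can then edit the copied config files.

```python
import shutil
from pathlib import Path


def create_language_pack(languages_dir, source_code, target_code):
    """Copy an existing language folder as the starting point for a new one.

    languages_dir is the path to the AdvisorConfig/Languages folder.
    source_code is an existing folder such as "en"; target_code is the
    (hypothetical) name of the new language or language pack folder.
    """
    source = Path(languages_dir) / source_code
    target = Path(languages_dir) / target_code
    if not source.is_dir():
        raise FileNotFoundError(f"no such language folder: {source}")
    if target.exists():
        raise FileExistsError(f"language folder already exists: {target}")
    # Copies every config file (numbers.adv, general.csv, ...) as-is.
    shutil.copytree(source, target)
    return target
```

For example, create_language_pack("AdvisorConfig/Languages", "en", "en_robot") would create an en_robot folder holding a copy of the English configs. How the SDK is told to use the new folder is not covered by this sketch.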
- The general_config.adv file (general_config_TTS.adv for text-to-speech) describes when “combined advice” are played. These are advice instructions played when moving from a certain type of street to another (e.g. highway to primary road) in quick succession, and they sound something like “instruction then combine_instruction”. The purpose is to chain 2 instructions when the distance between them falls below certain thresholds, so that you know in advance what to do after your next maneuver (e.g. “turn right and then turn left”).
- The thresholds are in meters. I.e.
The above statement would play the chained instruction when moving from a highway to another highway, within the city limits, when the instructions are less than 200 meters apart.
- The thresholds are defined twice, once within city limits, once outside city limits (2 different average car speeds are considered for evaluating the distance)
- Our internal categorisation into H, M, C and S (highway, major, connecting and small streets) is based on the OpenStreetMap schema:
    case eSegTypeMotorway:
    case eSegTypeTrunk:
        return StreetFcHighway;
    case eSegTypePrimary:
    case eSegTypeSecondary:
        return StreetFcMajor;
    case eSegTypeMotorway_link:
    case eSegTypeTertiary:
    case eSegTypePrimary_link:
    case eSegTypeSecondary_link:
    case eSegTypeTertiary_link:
    case eSegTypeTrunk_link:
        return StreetFcConnecting;
    case eSegTypePedestrian:
    case eSegTypeService:
    case eSegTypeBridleway:
    case eSegTypeFerryPed:
    case eSegTypeUnclassified:
    case eSegTypeResidential:
    case eSegTypeFerry:
    case eSegTypeLiving_street:
    case eSegTypeSteps:
    case eSegTypeCycleway:
    case eSegTypeUnpavedTrack:
    case eSegTypePermissive:
    case eSegTypeDestination:
        return StreetFcSmall;
- Also in this file, the rules for converting between metric and imperial units are defined
- Unless required, you should not change this file
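To make the threshold logic more concrete, here is a rough sketch of how such a decision could work. It is not the SDK’s implementation: the function and table names are hypothetical, the mapping from OSM highway values to H/M/C/S is an assumption that mirrors only part of the categorisation above, and the only threshold taken from this article is the 200 meters for highway-to-highway within city limits. Every other number is a placeholder.

```python
# OSM "highway" values mapped to street classes (a subset, assumed to
# correspond to the eSegType... names used in the SDK's categorisation).
STREET_CLASS = {
    "motorway": "H", "trunk": "H",
    "primary": "M", "secondary": "M",
    "motorway_link": "C", "tertiary": "C", "primary_link": "C",
    "residential": "S", "service": "S", "living_street": "S",
}

# (from_class, to_class) -> (in-city threshold, out-of-city threshold), meters.
# Only ("H", "H") in city = 200 comes from the article; the rest is made up.
COMBINE_THRESHOLDS_M = {
    ("H", "H"): (200, 400),
    ("H", "M"): (150, 300),
}


def should_combine(from_highway, to_highway, in_city, distance_m):
    """Return True if two consecutive instructions should be chained."""
    pair = (STREET_CLASS.get(from_highway), STREET_CLASS.get(to_highway))
    thresholds = COMBINE_THRESHOLDS_M.get(pair)
    if thresholds is None:
        return False  # no rule defined for this combination of street types
    threshold = thresholds[0] if in_city else thresholds[1]
    return distance_m < threshold
```

With these placeholder values, should_combine("motorway", "trunk", True, 150) chains the two instructions, while the same call with a distance of 250 meters does not.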
- In general.csv, a number of advice instructions are defined for certain navigation instructions (T junctions, roundabouts, etc.): 3 advice instructions (provided at different distances from destination), a combined advice (used when there’s no time to play all the previous advice instructions) and 1 advice for the Web API (which is optional, as it’s not used by the mobile SDK)
- The conventions are:
- text is interpreted as an audio file name – i.e. in_open means that the in_open.mp3 file has to be played
- different audio files are separated by the pipe (|) separator
- $ signifies that something is to be replaced at run time with a particular value.
- $distance will get replaced with the actual distance
- take_the_$(roundaboutExit)_exit_at_the_roundabout will become take_the_1th_exit_at_the_roundabout (which is a file name)
- The only conditional operator that is used by default is $hasRef: if the existing road has a reference in OSM, then what follows the $hasRef conditional will be played; otherwise it won’t be played
- (for more conditional operators see the _TTS config files, but keep in mind that via audio files you cannot reliably say non standard/dynamic phrases – i.e. street names)
- @ is an operator only valid for the “final” advice in a navigation – you will see it only at the end of advice for “destination …” or “destination …”. It is required to signal our SDK that navigation has ended (it’s an internal SDK requirement)
- $direction converts to what is defined in “direction names in user language” section (inside the General.csv file).
- $(side) converts to what is defined in “#street sides” section (inside the General.csv file)
- This is the file that you should (mainly) modify when creating a new voice pack
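The conventions above can be illustrated with a small sketch that turns one advice string into the list of audio files to play. This is my own illustration under stated assumptions, not the SDK’s parser: the function name resolve_audio_files is hypothetical, the sample advice string is made up from the file names mentioned above, and conditionals such as $hasRef as well as the number splitting defined in numbers.adv are deliberately left out.

```python
import re

END_MARKER = "@"  # signals the end of navigation; not an audio file

# Matches $(name) or $name placeholders (letters only, by assumption).
PLACEHOLDER = re.compile(r"\$\(([A-Za-z]+)\)|\$([A-Za-z]+)")


def resolve_audio_files(advice, values):
    """Split an advice on '|' and return the .mp3 file names to play.

    values maps placeholder names to their run-time replacement.
    """
    files = []
    for token in advice.split("|"):
        if token == END_MARKER:
            continue
        resolved = PLACEHOLDER.sub(
            lambda m: str(values[m.group(1) or m.group(2)]), token
        )
        files.append(resolved + ".mp3")
    return files
```

For instance, resolve_audio_files("in_open|take_the_$(roundaboutExit)_exit_at_the_roundabout", {"roundaboutExit": "1th"}) yields in_open.mp3 followed by take_the_1th_exit_at_the_roundabout.mp3.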
Other considerations regarding audio file support:
- Only the sound files referenced in general.csv are used. The audio file folders hold more files than those actually used, for historic reasons (they were either used at some point or planned to be used)
- The text instructions associated with a particular advice are also based on the General (and General_TTS) file: the config is converted to text by replacing _ and | with a space ” ” . The _open and _closed suffixes are stripped out from the final text
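As a rough sketch of that text conversion (again my own illustration, with a hypothetical function name, not the SDK’s code): split on the pipe, drop the @ marker, strip the _open/_closed suffixes, substitute placeholders and replace underscores with spaces.

```python
import re

# Matches $(name) or $name placeholders (letters only, by assumption).
PLACEHOLDER = re.compile(r"\$\(([A-Za-z]+)\)|\$([A-Za-z]+)")


def advice_to_text(advice, values):
    """Convert a pipe-separated advice config into a readable instruction."""
    words = []
    for token in advice.split("|"):
        if token == "@":  # end-of-navigation marker, never part of the text
            continue
        for suffix in ("_open", "_closed"):
            if token.endswith(suffix):
                token = token[: -len(suffix)]
        token = PLACEHOLDER.sub(
            lambda m: str(values[m.group(1) or m.group(2)]), token
        )
        words.append(token.replace("_", " "))
    return " ".join(words)
```

Calling advice_to_text("now|leave_the_main_road|your_destination|is_on_the_$(side)_side|@", {"side": "left"}) returns “now leave the main road your destination is on the left side”.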
- For TTS support (general_TTS.adv) the config is converted to real text (similar to the text instruction generation for audio files), which is then read out by the TTS engine. For example, “now|leave_the_main_road|your_destination|is_on_the_$(side)_side|@” will become “now leave the main road your destination is on the left side” (notice that $(side) is replaced following the same rules defined in General.csv and that the ending @ is not part of the advice, as it’s used only as a signaling mechanism)
- As TTS engines can also pronounce street names, you can include this information in the advice via the $nameOrRef parameter (which returns the street’s name or reference, giving priority to the name tag)
- The name corresponds to whatever the value of the “name” tag is (e.g. “Tri-State Tollway”), the reference to whatever the value of the “ref” tag is (e.g. “I 294”)
- Bundled with the $nameOrRef parameter you have the $hasNameOrRef conditional – if that particular road has a name or a ref value the text following the conditional will be used, otherwise the instruction will stop
- A “name only” variant of the above parameter is “$name” usually accompanied by the conditional “$hasName”
- As a general rule, always use the $ref, $name and $nameOrRef parameters after their respective conditionals, otherwise the generated advice may sound/read “weird”
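The interplay between parameters and conditionals can be sketched as follows. Please treat this as an assumption-heavy illustration: I’m assuming here that the conditional appears as its own pipe-separated token, which may not match the exact syntax of the _TTS files, and the function names and sample tokens are hypothetical.

```python
def name_or_ref(road):
    """Return the road's name, falling back to its ref; None if it has neither."""
    return road.get("name") or road.get("ref") or None


def build_tts_advice(tokens, road):
    """Build the spoken text from config tokens (already split on '|').

    road is a dict of OSM tags, e.g. {"name": "...", "ref": "..."}.
    """
    parts = []
    for token in tokens:
        if token == "$hasNameOrRef":
            if name_or_ref(road) is None:
                break  # nothing to say about the road: the instruction stops
            continue
        if token == "$hasName":
            if not road.get("name"):
                break
            continue
        if token == "@":  # end-of-navigation marker, not spoken
            continue
        text = token.replace("_", " ")
        # Replace $nameOrRef before $name, since the former contains the latter.
        text = text.replace("$nameOrRef", name_or_ref(road) or "")
        text = text.replace("$name", road.get("name") or "")
        parts.append(text)
    return " ".join(parts)
```

With tokens = ["turn_right", "$hasNameOrRef", "onto", "$nameOrRef"], a road tagged with name “Tri-State Tollway” and ref “I 294” gives “turn right onto Tri-State Tollway”, a road with only the ref gives “turn right onto I 294”, and a road with neither gives just “turn right”.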
With this knowledge at hand you should have the tools required to modify the audio/text advice, to create your own language pack or add support for a whole new language in order to create a new experience.
Take, for example, the original idea that Forsman & Bodenfors created for If Insurance: a sat nav app that comes with a pretty childish feature. When driving in areas where children are more likely to be around, the regular navigation voice automatically switches to a child’s voice. A simple feature like this immediately changes the driver’s experience, causing a visceral, physical, cognitive and behavioral effect when hearing a child’s voice.
Based on experience, I think that more questions will come up once you get more familiar with the structure, so feel free to contact us (choose the channel you prefer) and I’ll do my best to address some of them in a future blog post.