An audio user interface for a music and sound editor

Julio D'Escrivan (original concept and research)

Jan Trutzschler (programmer and developer)

The auditory user interface for a music and sound editor arose out of the need to look after one of my own visually impaired students at Anglia Ruskin University. The original project is rather ambitious as it aims to provide, eventually, a software application of professional standard. Thanks to the JISC TechDis/HEAT scheme, we have been able to create a working prototype which can already be used by students, although more money and time are needed to fully realise this idea and make it even more accessible and reliable. Our brief was to create a simple software application that would allow a visually impaired user to aurally access the controls in a sound editing environment.

Software music and sound editing environments allow users to select audio files, place them at different time positions relative to each other, cut, copy and paste from and into them, and finally mix the audio down to a single stereo file (this process is called bouncing). These environments are also referred to as ‘audio sequencers’ or simply ‘sequencers’. The problem is that their user interfaces are predominantly based on visual paradigms, usually with few or no accessibility options. The challenge for us was to use custom software programming to embed these options within the sequencer itself.

In July we started work by creating a specification of what we thought the software should be able to do. The following are my notes to Jan Trutzschler, who carried out the programming of the auditory feedback in SuperCollider, an ‘Object Oriented Programming’ language. Jan had been working on a visual multitracking application and his experience with sound sequencing was a key element in the realisation of this project.

“I suggest we go for sfx audio feedback because the machine talking is so obtrusive as you try and navigate the interface, as we have found from using the mac voice over. For example we could have short sounds that signify each action, a sort of audio equivalent to visual icons. These sounds could be either a sample or a synthetic sound. When the user first begins to utilise the auditory user interface multitracker, they should receive both the spoken feedback followed by the sfx. I can design these sounds myself. The idea is that once the user can relate the word 'play' to, for example, a short click such as you would get from pressing a button, then he can shut off the speaking part and just have the click sound. This would make their work very speedy and efficient as they don't have to wait for the computer to pronounce the whole word.

At any given point the user should be able to query what a certain command is by using a qwerty key modifier, for example, that would speak out the command they are trying to use -this is in case they have forgotten what a sound stands for. I did this on the FM synth because I found that once the student knew what keys did, he quickly began to change the sound and now and again he could confirm what he was doing by pressing the 'query' key.

Every command must be actioned by a key or combination of keys, or by an assignable controller number, we can negotiate these, but I believe from teaching experience that transport commands should live on the numeric keypad as this makes them very quick to use, e.g., enter->play, '0'-> return to zero, '-' ->forward, "-"->backward, *->record, "."->locate ... (and of course spacebar should also be ‘play’ as in all sequencers!)”
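The feedback scheme described in these notes (speech followed by sfx for new users, sfx alone once the sounds are learnt, and a ‘query’ modifier that speaks the command name) can be sketched roughly as follows. This is an illustrative Python sketch, not the SuperCollider code used in the project; the class name, key names and event tuples are assumptions made for the example. Only the keypad bindings that the notes state unambiguously are included, so forward and backward are left out.

```python
from dataclasses import dataclass

# Hypothetical key names. Only bindings stated unambiguously in the notes.
KEYPAD_BINDINGS = {
    "enter": "play",
    "space": "play",
    "0": "return to zero",
    "*": "record",
    ".": "locate",
}


@dataclass
class FeedbackPlanner:
    """Decides which feedback events a key press should trigger (assumed design)."""

    speech_enabled: bool = True  # new users hear the spoken word, then the sfx

    def on_key(self, key, query=False):
        """Return the feedback events for a key press, in playback order."""
        command = KEYPAD_BINDINGS.get(key)
        if command is None:
            return []  # unbound key: no feedback
        if query:
            # Query modifier held: speak the command name, even if speech is off.
            return [("speak", command)]
        events = []
        if self.speech_enabled:
            events.append(("speak", command))
        events.append(("sfx", command))
        return events
```

A user who has learnt the sounds would set `speech_enabled` to `False` and hear only the short sfx, while still being able to confirm a command with the query modifier.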

As expressed in these notes, the need to provide sounds or auditory icons (Gaver, 1986) was obvious from the start. A consideration of the issues involved in communicating graphical information using musical elements (Alty & Rigas, 1998) was also interesting, but the demands of getting a basic application working made us concentrate on speech and short sfx as audio feedback. What we did not envisage at that stage, but hope to incorporate into future versions, were earcons (Blattner & Sumikawa, 1989) and spearcons (Walker, Nance & Lindsay, 2006). Earcons are abstract melodic elements that can be associated with user interface elements. Spearcons are rapid speech-based sounds that have been shown to enhance accessibility in auditory displays (this is what auditory user interfaces are generally called in the human-computer interface research field). As the project developed, we found that, since Jan had to program ‘from the ground up’ in order to build these accessibility features into his existing object-oriented multitracking classes, we had to choose what to implement in order to have a working version by the project deadline. This does not mean that the ideas expressed above will not be implemented, simply that pragmatism guided our choices during the creation process.

I then provided a list of specific functions that the software should have for transport control, and Jan suggested which combinations of keys (key commands) could be used to control them. I assumed these basic controls should be represented aurally; by experimenting with the interface we eventually identified which ones worked best and added others. The next step was to find out how the manipulation of tracks and sound files could be made accessible, so we tabulated more commands and I added feedback after testing. Jan suggested that everything to do with selecting an audio region should use the option (alt) key, and everything that actually moves audio regions should use the ctrl key. Finally, Jan added the speech feedback for sound file management, an area which should benefit from working with earcons and spearcons in the next stage of our research and development.
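The modifier convention Jan suggested (option/alt for selecting regions, ctrl for moving them) amounts to a simple rule that can be sketched as below. This is a hypothetical Python illustration rather than the project's SuperCollider code; the function name and the handling of key presses with both or neither modifier are assumptions, since the text does not specify them.

```python
SELECT_MODIFIERS = {"alt", "option"}  # treated as the same modifier here
MOVE_MODIFIERS = {"ctrl"}


def region_command_kind(modifiers):
    """Classify a key command by the modifier keys held down with it."""
    held = {name.lower() for name in modifiers}
    selecting = bool(held & SELECT_MODIFIERS)
    moving = bool(held & MOVE_MODIFIERS)
    if selecting and not moving:
        return "select"  # operates on which audio region is selected
    if moving and not selecting:
        return "move"    # actually moves an audio region
    return "other"       # assumption: anything else is not a region command
```

Keeping the rule this uniform means a user only has to remember one modifier per family of commands, which matters when there is no screen to consult.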

Once all these features were working, we proceeded to test them ourselves for consistency. We found stability issues that were solved gradually, but in the end it is clear that we need more development time. The main tester was a blind PhD student named David Hindmarch. He uses a commercial sequencer but was thrilled at the prospect of a purpose-built sequencer with auditory display. He raised many important issues and gave us ideas for further development. Jan had a meeting with him in Birmingham and reported back as follows (please note that the working name for the sequencer is ‘teatracks’):

“On the 16th of November 2007 I met David Hindmarch, a blind student under Jonty Harrison at the University of Birmingham, to show him the current state of development of the multi-tracker sound program. First of all he was very excited to hear about the project. He has been using Windows PCs until now, on which he has been working with SoundForge and Sonar. These two programs are scripted in order to give spoken feedback and to be controlled with keyboard commands. However they were not designed for visually disabled users and David showed me a few examples of graphical interfaces, which act very intuitive, when one can see them, but otherwise are not accessible at all. David was very happy about all spoken and sonic feedback, which one can get from the multi-tracker "tea-tracks". And since he has a lot of experience with computer programs of that kind, he had some very useful suggestions, which made me realise how much one is focused on the visual representation even when trying to forget about them. He sketched out a work flow, which was partly already possible and which I took as a guideline for further development. The workflow can be described as the following: One imports audio files into the program's sound-file pool, moves the time cursor to a certain position, places a sound file on a selected track and uses copy-, cut- and paste-operations on the sound-file. Multiple sound files can be placed and mixed to another file on the hard-disk. At the meeting I also met Zlatko Baracskai, who has started to develop some small applications for David. Since he also uses SuperCollider, as I do, we decided that it would be of advantage for all of us to try to incorporate these applications as plug-ins into "tea-tracks". Jan Trutzschler.”

I also made contact with David Hindmarch and exchanged ideas along the same lines, in fact, regarding our project versus existing commercial sequencers he said this in an email exchange on 18th November 2007:

“Sound forge is becoming increasingly clunky and unreliable, if blind users of Computers can start to use readily designed Apps built in SC [SuperCollider], it will begin to level the playing field in regard to Sighted musicians.”

In fact, quite early on, in May 2007, another interesting comment regarding inclusion was communicated to me via email by Dr. Tony Stockman, a blind senior academic researcher into auditory displays at Queen Mary, University of London:

"I am very excited about this project, it seems to me remarkable that, as far as I am aware, no one has embarked on it before, as you say in the abstract, current interfaces provide a major barrier to visually impaired students of electronic music, there is a real need for something of this kind."

What we have done and intend to carry on pursuing has great potential in terms of enabling greater accessibility for the visually impaired. At present this software enables a blind musician to do basic sound editing and in that way it has been successful, but the real benefit of having developed this project is the research and development opportunities it opens. There is a real need to create applications that provide auditory displays for visually impaired users, especially in music. It seems that many blind people tend to be especially sensitive to auditory feedback and this works to their advantage in the fields of music and audio. Blind users who are also musicians are especially advantaged in that they can differentiate a greater range of sound meanings. Anecdotally, they seem also able to identify pitch and timbre with particular precision. What is needed is the development of music and audio tools that are based on these natural advantages as opposed to ‘remedy’ tools which attempt to translate sighted interfaces for non-sighted users. It requires a paradigm shift. I am very interested in this and I believe that what we have done, as well as having a concrete output, provides a useful starting point.

The methodology employed, that of identifying and implementing appropriate audio feedback, is not exclusive to one programming language. We used SuperCollider because it is specifically designed for music, but the guiding design principles are not subject to this choice. Even the programming that has been done is simply an instance of an approach which can be applied in many different ways: that of systematically creating an auditory display for music editing based on auditory icons and speech. The advantage of programming from the ‘ground level up’, as it were, is great in that accessible functionality can be implemented at every stage. It also means that it is easier to adapt the software to new findings. For example, now that we have a software construct (a ‘class’) that can produce a sound effect in response to an action, it is a trivial matter to substitute it with a musical one (earcon) or a rapid speech sample (spearcon). It must be noted that the programming of spoken feedback by Jan was particularly creative and that he had to develop a parallel ‘speech server’ to that of the Apple Macintosh computer. Previous attempts I had made in this area were unsuccessful, as the voice over utility conflicts with SuperCollider speech unless a separate class is developed. It must also be noted that although the functionality to respond to the USB mixing interfaces provided by TechDis is already enabled (without aural representation as yet), we had to postpone the level mixing capabilities of the sequencer as a matter of priority. We needed to have a working application and, as mentioned earlier, more research and development time is needed to provide all the desired functionalities.
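The point made above about substituting a sound effect with an earcon or spearcon can be illustrated with a small sketch. It is written in Python for readability and is not the SuperCollider class developed for the project; the class name, the renderer functions and the event tuples are all hypothetical, and no audio is actually played.

```python
class ActionFeedback:
    """Produces one feedback event per user action via a swappable renderer."""

    def __init__(self, renderer):
        self.renderer = renderer  # any callable taking an action name
        self.history = []         # events produced so far, for inspection

    def notify(self, action):
        """Render feedback for an action and record it."""
        event = self.renderer(action)
        self.history.append(event)
        return event


# Hypothetical renderers: each returns a (kind, action) description
# instead of playing sound, so the sketch stays self-contained.
def sound_effect(action):
    return ("sfx", action)


def earcon(action):
    return ("earcon", action)


def spearcon(action):
    return ("spearcon", action)
```

Because the rest of the application would only ever call `notify`, changing `renderer` from `sound_effect` to `earcon` or `spearcon` is the only step needed to try a different style of auditory display.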

Further links

A number of video clips of a blind user using the TeaTracks application are available by following the links below:

Adding files to TeaTracks
Focussing in TeaTracks
Navigating in TeaTracks
Placing sounds in TeaTracks
Splitting regions in TeaTracks

The full project report, including a manual for the TeaTracks application, is also available to download:

TeaTracks project report and manual (PDF - 151 KB)