Computing:
Can We Talk?

Developers are banking on a pervasive future. But not so fast: issues remain around wireless networks, standards, and basic ergonomics.

by Nancy Cohen

Technology’s pundits are telling America to prepare for technology’s next two waves. The first wave will hurl us all away from the keyboard, monitor, and mouse and toward smart devices of all shapes and sizes. The second wave will engage us in something called multimodal computing. That last bit about multimodal has become a favored enterprise message at vendor-sponsored trade shows and briefings, as computing moves beyond keyboard-bound PCs and into phones and handhelds.

Multimodal refers to a mix of voice, data, and images in which people can operate in several modes at once. From a wireless handset, you can speak, type, and send numbers within the same session, form, and context. Multiple input and output methods, including stylus, touchscreen, keypad, and voice, can be used simultaneously. For example, users can request information from a wireless PDA by voice and get the answers back as charts. They can speak, write, and see images without being limited to any single interface during the same transaction. Users making travel arrangements on a mobile device can use voice to request arrival and departure routes, then use voice or stylus interchangeably to complete the booking.
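To make the idea of interchangeable modes concrete, here is a minimal Python sketch of the travel-booking case: one form whose fields can be filled from any input mode within a single session. It is not based on X+V, SALT, or any vendor API; the `InputEvent` and `BookingForm` names, the field names, and the mode labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical set of input modes; a real multimodal platform defines its own.
MODES = {"voice", "stylus", "keypad", "touchscreen"}


@dataclass
class InputEvent:
    mode: str        # which modality produced the value, e.g. "voice"
    field_name: str  # which form field the value fills
    value: str       # the recognized, written, or typed value


@dataclass
class BookingForm:
    # One shared form state per session, so context survives mode switches.
    fields: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"departure": None, "arrival": None, "date": None}
    )
    modes_used: List[str] = field(default_factory=list)

    def apply(self, event: InputEvent) -> None:
        if event.mode not in MODES:
            raise ValueError(f"unknown mode: {event.mode}")
        if event.field_name not in self.fields:
            raise KeyError(f"unknown field: {event.field_name}")
        # Any mode may fill or correct any field; the latest input wins.
        self.fields[event.field_name] = event.value
        self.modes_used.append(event.mode)

    def missing(self) -> List[str]:
        # Fields still empty, in form order, so the app can prompt in any mode.
        return [name for name, value in self.fields.items() if value is None]


if __name__ == "__main__":
    form = BookingForm()
    form.apply(InputEvent("voice", "departure", "Boston"))   # spoken
    form.apply(InputEvent("stylus", "arrival", "Chicago"))   # handwritten
    form.apply(InputEvent("keypad", "date", "2002-10-15"))   # typed
    print(form.missing())  # prints []
```

Keeping a single form state, rather than one per interface, is what lets a user start a booking by voice and finish it with a stylus.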

In business terms, multimodal applications promise infrastructure savings, not to mention more efficient handling of customer calls. In industries such as manufacturing, retail, and transportation, employees can request inventory information hands-free or relay information from the factory floor, and get answers back as text or graphics. Analysts at The Kelsey Group, in Princeton, NJ, predict $6.4 billion in subscription revenues for in-car voice and wireless services by 2006.

In July, IBM announced new tools and software to propel development of multimodal technologies. Soon after came the IBM-Opera announcement: the two companies are teaming to develop a multimodal browser based on the XHTML+Voice (X+V) spec. X+V competes for developer attention with SALT (see snapshot), which Microsoft backs for .NET.

X + V SNAPSHOT

What is X+V?

Shorthand for XHTML + Voice. A multimodal standard that combines XHTML and VoiceXML to support multimodal interactions on PDAs and larger devices. With it, the user gets more options: to operate in a voice-only, visual-only, or multimodal environment.

Developers:
IBM, Motorola, and Opera

W3C Status:
Under consideration

http://www3.org/2002/Talks/www2002-voice/l

SALT Alternative
The Speech Application Language Tags (SALT) Forum is developing a royalty-free, platform-independent standard for multimodal and telephony-enabled access to information and web services for PDAs and other devices.

Founders:
Cisco Systems, Intel, Philips, Microsoft, and SpeechWorks

W3C Status:
Under consideration

The big deal with the IBM-Opera alliance is that users will be able to access Web and voice information at the same time. As for Opera Software, its CEO, Jon von Tetzchner, has seen an opportunity for his browser in the embedded market all along. Opera’s strengths that carry over readily to that market include its speed and configurability, its cross-platform core, and its simplified maintenance.

Two years ago, Opera announced it was teaming with several embedded Linux companies (Lineo, Insignia, and Trolltech) on something called the Embedix Plus PDA platform. Designed as a turnkey software solution for manufacturers, Embedix Plus bundles the basic prerequisites for mobile handheld devices and other appliances.

Then Sharp announced that its Zaurus Linux/Java-based handheld would use the Opera for Linux web browser. Opera also reached a deal with Symbian Ltd., the London-based consortium of mobile phone makers whose partners include Nokia and Ericsson, under which Opera was to become the default browser on a number of Internet-connected wireless devices. Von Tetzchner expects browsers to show up increasingly in mobile phones, set-top boxes, and cars in the near future.

Sunil Soares, director of product management for IBM’s Pervasive Computing Division, needs only to think about his interactions with customers, business partners, and developers in pervasive computing to speak confidently on one point: advances in tools and middleware that let people use voice, keypad, and stylus interchangeably will be a big deal for business users and for IBM.

Open magazine asked Soares to discuss IBM’s decision to turn to Opera for work on a multimodal browser based on the XHTML + Voice specification:

“We decided to partner with Opera for two key reasons. First, we were attracted to them because of their key partners in the embedded market, such as Lineo. The second reason is the fact that their browsers support multiple operating systems. IBM sells on leveraging customers’ existing infrastructure.”

We also asked Soares to step beyond the lofty language of multimodal press announcements and illustrate what effect this will have on business. “I like to paint in the vision with some anecdotal evidence of a brokerage. Brokerage people tell me that their customers, in asking about stock portfolios, are looking for input capabilities as speech with the response coming back to them as text.” It’s convenient for a brokerage customer to request trading positions by voice, but he doesn’t want to hear the results; he would rather see them. With multimodal interfaces, verbal requests can get visual responses.

The standard also saves companies money by combining Web and voice infrastructures, eliminating the need for redundant systems. Avoiding such investments, along with the efficiencies gained through voice functions, will make for significant sales stories. “Multimodal applications are very much about reducing costs,” says Soares.

As for how this will impact embedded Linux, Soares says, “We believe that Linux is key to our customers as it harnesses power with simplicity. We’re seeing a fair amount of embedded Linux focus, especially from smart-phone manufacturers, and particularly in the Asia-Pacific region.”

But the battle to get multimodal products into full gear is not over. Application designers have a number of issues to confront. Software consultant Harsha Srivatsa offers an informative overview of where multimodal application technologies now stand in an article on IBM’s developerWorks web site.

Srivatsa’s reality check on multimodal applications includes the following:

• Standards for multimodal applications are just being developed.

• Multimodal applications are bandwidth-hungry. The capacity and bandwidth constraints of 2G wireless networks still stand in the way of multimodal applications coming of age. How quickly 3G wireless networks roll out will, in turn, affect how widely multimodal applications are adopted.

• Developers will need to handle ergonomic issues carefully as users switch from one mode to another, such as from listening to watching. Srivatsa refers to the risk of “feature bloats with the various interfaces.”

• Achieving synchronization between modes to capture events and unify server-client interactions is a challenge.
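The synchronization problem in the last point can be illustrated with a small Python sketch. It assumes each mode delivers its own timestamped event stream on a shared session clock, merges the streams into one timeline with `heapq.merge`, applies a simple last-writer-wins rule, and flags near-simultaneous conflicting inputs. The `ModeEvent` type, the shared clock, and the 0.5-second window are hypothetical choices for illustration, not part of X+V, SALT, or Srivatsa’s article.

```python
import heapq
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ModeEvent(NamedTuple):
    timestamp: float  # seconds on a shared session clock (hypothetical)
    mode: str         # e.g. "voice" or "stylus"
    field_name: str   # form field the event fills
    value: str


def merge_streams(*streams: Iterable[ModeEvent]) -> List[ModeEvent]:
    """Merge per-mode streams, each already sorted by time, into one timeline."""
    return list(heapq.merge(*streams, key=lambda e: e.timestamp))


def synchronize(
    events: List[ModeEvent], window: float = 0.5
) -> Tuple[Dict[str, str], List[Tuple[ModeEvent, ModeEvent]]]:
    """Apply events in time order (last writer wins) and flag conflicts:
    two different modes giving different values for the same field
    within `window` seconds of each other."""
    state: Dict[str, str] = {}
    last_seen: Dict[str, ModeEvent] = {}
    conflicts: List[Tuple[ModeEvent, ModeEvent]] = []
    for event in events:
        previous = last_seen.get(event.field_name)
        if (
            previous is not None
            and previous.mode != event.mode
            and previous.value != event.value
            and event.timestamp - previous.timestamp <= window
        ):
            conflicts.append((previous, event))
        state[event.field_name] = event.value
        last_seen[event.field_name] = event
    return state, conflicts
```

A real application would likely ask the user to resolve a flagged conflict rather than silently keep the later input; last-writer-wins is used here only to keep the sketch short.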