I am personally fascinated by the concept of Multimodal Interaction, yet I find it surprising how slowly we are moving toward using voice and touch as common ways to communicate and interact with computers.
Touch Revolution
Touch screen technology has been around for a long time. I remember overlay touch screen kiosks back in the early '90s. However, the technology didn't really take off until recently, with the advent of multi-touch; the Apple iPhone in particular has become a major catalyst for touch technology.
Multi-touch enables gestures that are more intuitive than ever before. What sold me on the iPhone when it was introduced was the ability to zoom in and out with two fingers, not just on pictures but on web page content and email. It makes the Human Computer Interaction (HCI) experience more natural. The iPhone is also very responsive, making it usable and practical. You could say its haptic aptitude is very good.
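The pinch-to-zoom gesture mentioned above is conceptually simple, which is part of why it feels so natural: the zoom factor is just the ratio of how far apart the two fingers are now versus when the gesture began. A minimal sketch (the function name and coordinate convention are my own, not any particular platform's API):

```python
import math

def pinch_scale(p1_start, p2_start, p1_now, p2_now):
    """Scale factor for a two-finger pinch gesture.

    Each point is an (x, y) tuple in screen pixels. The result is the
    ratio of the current finger separation to the initial separation:
    > 1.0 means the fingers spread apart (zoom in),
    < 1.0 means they pinched together (zoom out).
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return dist(p1_now, p2_now) / dist(p1_start, p2_start)

# Fingers start 100 px apart and spread to 200 px: content zooms to 2x.
print(pinch_scale((0, 0), (100, 0), (0, 0), (200, 0)))  # → 2.0
```

A real gesture recognizer also has to track which touch is which across frames and apply the scale smoothly, but the core math is no more than this ratio.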
Because of this catalytic success, there is a sea change occurring in wireless devices: touch is wanted and will be expected going forward as the primary HCI interface for some time to come, at least for smartphones (formerly PDAs).
A New Generation of Laptops and PCs
HP has invested in leading the industry toward a new generation of touch-enabled laptops and PCs. It sold 400,000 TouchSmart PCs in 2008 (according to the WSJ) and introduced the first 'consumer' touch laptop earlier this year. I bought one; read about it here. Unfortunately, I haven't had the same satisfaction with it as I have with touch on the iPhone, though I expect it will get better soon. Many other hardware manufacturers are focused on bringing multi-touch to market, and Microsoft is investing significant resources and dollars into making Windows 7 a multi-touch success.
Microsoft Surface has been around for several years, yet is still considered a high-end technology demo in most respects. It's really cool to play with, but still too expensive to be practical. It reminds me of the late '80s, when CD-R technology was introduced. I was doing a research project on practical 'multimedia' applications, and the use of CD-R as a microfiche replacement was one of the areas my team was investigating. The corporation I worked for, like many others, was spending tens of thousands of dollars per month to have paper documents transferred to microfiche for archival. The thought of recording to CD and having a searchable document repository was compelling. This was obviously before the Internet took off… Long story short: the CD-R drives at that time cost $12,000 to $15,000 for a single drive, and the media was close to $100 per disc. Today you can buy a high-speed DVD/CD-R drive for under $100, and the media is a cheap commodity. I think touch technology is making a similar transition, and I expect it will accelerate even faster than the CD-R example above.
Notably, Microsoft just announced this week a touch-enabled Zune HD device and a Touch Pack for Windows 7, which will enable hardware OEMs to provide a few of the Surface apps on Windows 7.
The old and the new
We've been using a form of multimodal interaction for some time. Can you imagine using a PC without a mouse, or a laptop without a touchpad or pointing stick? The mouse was obviously another example of Apple (and Steve Jobs in particular) creating a catalyst that would forever change how we interact with computers.
Going forward, I expect the interfaces to expand dramatically. I am personally excited about using voice as a practical HCI interface. Although voice hasn't had a very good track record to date, I expect it will have as big an impact as touch, if not bigger. More importantly, I think voice and touch together will replace the mouse and keyboard. However, change is difficult, and the transition will take more time than you might think.
Paradigm Shift for Voice
As humans we communicate with each other in a multimodal form, most of the time without even knowing it. I use a keyboard and mouse today naturally, without even thinking about it. Eventually, we will use voice and touch with similar familiarity. However, the transition requires incremental change and a new paradigm.
Voice technology on the PC to date has been primarily focused on becoming a keyboard and mouse replacement: in other words, Command and Control. This is important and will eventually succeed, once we've vastly improved speech recognition. However, I believe there is another use for voice technology that is more passive and complementary to existing Command and Control interfaces.
I call this new paradigm Listening Applications. A Listening Application enables passive capture of voice. Passive, meaning that it's not a command and control paradigm; it captures the information and does something useful with it that adds value to end users, without requiring them to change their habits in a big way. Another word that may describe this is Annotating, though not in the traditional sense of adding a comment bubble to a Word document. A good example would be capturing voice notes while you are researching a subject on the web. The voice notes are captured in the context of something else you are doing, converted to text, indexed, and easily organized along with your traditional web bookmarks. This is the multimodal experience I am looking forward to, soon…
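The voice-notes example above can be sketched as a small pipeline: capture audio in the context of the current web page, transcribe it, and index the text for later search alongside bookmarks. This is a minimal illustration of the idea, not a real implementation: the class and function names are hypothetical, and the speech-to-text step is a stub where an actual speech recognition engine would plug in.

```python
from dataclasses import dataclass, field
from datetime import datetime

def transcribe_stub(audio: bytes) -> str:
    # Placeholder for a real speech recognition engine; here we pretend
    # the audio bytes are already the spoken words.
    return audio.decode("utf-8")

@dataclass
class VoiceNote:
    text: str          # transcribed text of the spoken note
    context_url: str   # the web page the user was viewing at the time
    timestamp: datetime = field(default_factory=datetime.now)

class ListeningApp:
    """Passively captures voice notes and indexes them for search."""

    def __init__(self):
        self.notes = []   # all captured VoiceNote objects, in order
        self.index = {}   # word -> set of positions in self.notes

    def capture(self, audio: bytes, context_url: str) -> VoiceNote:
        # Passive capture: no command grammar, just transcribe and index.
        note = VoiceNote(text=transcribe_stub(audio), context_url=context_url)
        pos = len(self.notes)
        self.notes.append(note)
        for word in note.text.lower().split():
            self.index.setdefault(word, set()).add(pos)
        return note

    def search(self, word: str):
        # Return notes containing the word, oldest first.
        return [self.notes[i] for i in sorted(self.index.get(word.lower(), ()))]

app = ListeningApp()
app.capture(b"multi touch looks promising", "http://example.com/touch")
app.capture(b"compare touch laptops", "http://example.com/laptops")
print([n.context_url for n in app.search("touch")])
```

The key property of the sketch is that `capture` never interprets the speech as a command; it only stores and indexes it, which is what distinguishes a Listening Application from Command and Control.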