Mastering & Understanding the Android Media Framework

We have kind of an ambitious title, called “Mastering the Media Framework.” I think the reality is that if you believe that– that we’re going to do that in an hour, it’s probably pretty ambitious. And if you do believe that, I have a bridge just north of here that you might be interested in. But I think we actually will be able to cover a few kind of interesting things. In thinking about this topic, I wanted to cover stuff that wasn’t really available in the SDK, so we’re really going to del– delve into the lower parts of the framework, the infrastructure that basically everything’s built on. Kind of explain some of the design philosophy. So…

So on the agenda, in the cutesy fashion of the thing, we’re talking about the architecture– Frank Lloyd Android. What’s new in our Cupcake release, which just came out recently. And those of you who have the phone, you’re running that on your device today. And then a few common problems that people run into when they’re writing applications for the framework. And then there probably will be a little bit of time left over at the end for anybody who has questions.

So moving along, we’ll start with the architecture. So when we first started designing the architecture, we had some goals in mind. One of the things was to make development of applications that use media, rich media applications, very easy to develop. And so that was one of the key goals that we wanted to accomplish in this. And I think you’ll see it as we look at the framework. It’s really simple to play audio, to display a video, and things like that. One of the key things, because this is a multi-tasking operating system, is we have– you could potentially have things happening in the background.

For example, you could have a music player playing in the background. We need the ability to share resources among all these applications, and so that’s one of the key things, was to design an architecture that could easily share resources. And the other thing is, you know, paramount in Android is the security model. And if you’ve looked over the security stuff– I’m not sure we had a talk today on security. But security is really important to us. And so we needed a way to be able to sandbox parts of the application that are– that are particularly vulnerable, and I think you’ll see as we look at the– the framework, that it’s designed to isolate parts of the system that are particularly vulnerable to hacking. And then, you know, providing a way to add features in the future that are backwards compatible. So that’s the– the room for future growth. So here’s kind of a 30,000-foot view of the way the media framework works. So on the left side, you’ll notice that there is the application. And the red line– red dashed line there– is denoting the process boundary.

So applications run in one process. And the media server runs in its own process, which is brought up during boot time. And so the codecs and the file parsers and the network stack and everything that has to do with playing media is actually sitting in a separate process. And then underneath that are the hardware abstractions for the audio and video paths. So Surface Flinger is the abstraction for video and graphics. And Audio Flinger’s the abstraction for audio. So looking at a typical media function, because of this inter-process communication that’s going on, there are a lot of things involved in moving a call down the stack.

So I wanted to give you an idea. For those of you who’ve looked at the source code, it’s sometimes hard to follow. A question that comes up quite frequently is how does a function call like, you know, prepare make its way all the way down through the framework and into the media engine? So this is kind of a top-level view of what a stack might look like.

At the very top is the Dalvik VM proxy. So that’s the Java object that you’re actually talking to. So, for example, for a media player, there’s a media player object. If you look at the media player definition, it’s a pretty– I mean, there’s not a lot of code in Java. It’s pretty simple. And basically, it’s a proxy for– in this case, actually, the native proxy, which it’s underneath, and then eventually, the actual implementation. So from that, we go through JNI, which is the Java Native Interface. And that is just a little shim layer that’s static bindings to an actual MediaPlayer object. So when you create a MediaPlayer in Java, what you’re actually doing is making a call through this JNI layer to instantiate a C++ object. That’s actually the MediaPlayer.

And there’s a reference to that that’s held in the Java object. And then some tricky stuff– weak references to garbage collection and stuff like that, which is a little bit too deep for the talk today. Like I said, you’re not going to master the framework today, but at least get an idea of what’s there. So in the native proxy, this is actually a proxy object for the service. So there is a little bit of code in the native code. You know, a little bit of logic in the native code.

MediaPlayer objects

But primarily, most of the implementation is actually sitting down in this media server process. So the native proxy is actually the C++ object that talks through this binder interface. The reason we have a native proxy, instead of going directly through JNI like a lot of the other pieces of the framework do, is that we wanted to be able to provide access to native applications in the future to use MediaPlayer objects. So it makes it relatively easy, because that’s something you’d probably want to do with games and things like that that are kind of more natural to write in native code.

We wanted to provide the ability to do that. So that’s why the native proxy sits there and then the Java layer just sits on top of that.

So the binder proxy and the binder native piece. Binder is our abstraction for inter-process communication. Basically, what Binder does is marshal objects across this process boundary through a special kernel driver. And through that, we can do things like move data, and move file descriptors that are duped across processes so that they can be accessed by different processes. And we can also share memory between processes. And this is a really efficient way of moving data back and forth between the application and the media server. And this is used extensively in Audio Flinger and Surface Flinger. So the binder proxy is basically the marshalling code on the application side. And the binder native code is the marshalling code for the server side of the process.

And if you’re looking at all the pieces of the framework, they start with, for example, android_media_mediaplayer.cpp, which is the JNI piece. There’s a mediaplayer.cpp, which is the native proxy object. Then there’s an imediaplayer.cpp, which is actually the binder proxy and the binder native code in one chunk. So you actually see the marshalling code for both pieces in that one file. One is called the BpMediaPlayer object, and the other is the BnMediaPlayer object. So when you’re looking at that code, you can see the piece that’s on the native side, the server side, and the proxy.

And then the final piece of the puzzle is the actual implementation itself. So in the case of the MediaPlayer, there’s a MediaPlayer service which instantiates a MediaPlayer object in the service that’s, you know, proxied in the application by this other MediaPlayer object.
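To make the proxy/native split a bit more concrete, here is a minimal sketch of the marshalling idea in Python. It is not the real Binder code: the JSON wire format, the `on_transact` method and the direct function call standing in for the kernel driver are all assumptions made for illustration. Only the Bp/Bn naming and the prepare/start calls come from the talk.

```python
import json


class MediaPlayerService:
    """Stands in for the real implementation living in the media server."""

    def __init__(self):
        self.state = "idle"

    def prepare(self):
        self.state = "prepared"
        return self.state

    def start(self):
        if self.state != "prepared":
            raise RuntimeError("start() called before prepare()")
        self.state = "started"
        return self.state


class BnMediaPlayer:
    """Server-side ("native") half: unmarshals a request and dispatches it."""

    def __init__(self, impl):
        self._impl = impl

    def on_transact(self, payload):
        # Hypothetical wire format: a JSON object with a method name and args.
        request = json.loads(payload.decode("utf-8"))
        method = getattr(self._impl, request["method"])
        result = method(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class BpMediaPlayer:
    """Client-side proxy half: marshals each call into bytes and sends it."""

    def __init__(self, transact):
        self._transact = transact

    def _call(self, method, *args):
        payload = json.dumps({"method": method, "args": list(args)})
        reply = self._transact(payload.encode("utf-8"))
        return json.loads(reply.decode("utf-8"))["result"]

    def prepare(self):
        return self._call("prepare")

    def start(self):
        return self._call("start")


service = MediaPlayerService()
stub = BnMediaPlayer(service)
# A direct function call stands in for the kernel driver hop between processes.
proxy = BpMediaPlayer(stub.on_transact)
```

The application only ever touches `proxy`; every call is turned into bytes, carried across, and replayed on the object that lives on the server side.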

That’s basically– each one of the calls goes through this stack. Now, because the stack is, you know, fairly lightweight in terms of we don’t make a lot of calls through it, we can afford a little bit of overhead here. So there’s a bit of code that you go through to get to this place, but once you’ve started playing, and you’ll see this later in the slides, you don’t have to do a lot of calls to maintain the application playing. So this is actually kind of a top-level diagram of what the media server process looks like. So I’ve got this media player service. And it can instantiate a number of different players. So on the left-hand side, you’ll see, bottom, we have OpenCORE, Vorbis, and MIDI. And these are three different media player types. So going from the simplest one, which is the Vorbis player– Vorbis basically just plays Ogg Vorbis files, which is a– we’ll get into the specifics of the codec, but it’s a psycho-acoustic codec that’s open sourced.

We use this for a lot of our internal sounds, because it’s very lightweight. It’s pretty efficient. And so we use that for our ringtones and for our application sounds. The MIDI player is a little more complex. But basically, it’s just another instantiation of a media player. These all share a common interface, so if you look at the interface, there’s almost, you know, a one-for-one correspondence between what you see there and what’s actually happening in the players themselves. And then the final one is OpenCORE. So anything that isn’t an Ogg file or a MIDI file is routed over to OpenCORE. And OpenCORE is basically the bulk of the framework. It consists of all of the major codecs, like, you know, MP3 and AAC and AMR, and the video codecs, H.263 and H.264 (AVC). So any file that’s not specifically one of those two ends up going to OpenCORE to be played. Now, this provides some extensibility.
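The routing rule just described (Ogg to the Vorbis player, MIDI to the MIDI player, everything else to OpenCORE) can be sketched in a few lines of Python. This is an illustration only: the function name, the constants and the extension table are hypothetical, and matching on file extension alone is an assumption, since the talk doesn't say how the service recognizes a file type.

```python
import os

VORBIS_PLAYER = "vorbis"
MIDI_PLAYER = "midi"
OPENCORE_PLAYER = "opencore"

# Hypothetical extension table; the real service may inspect file contents.
_EXTENSION_TO_PLAYER = {
    ".ogg": VORBIS_PLAYER,
    ".mid": MIDI_PLAYER,
    ".midi": MIDI_PLAYER,
}


def select_player(path):
    """Pick a player type for a file; anything unrecognized goes to OpenCORE."""
    extension = os.path.splitext(path)[1].lower()
    return _EXTENSION_TO_PLAYER.get(extension, OPENCORE_PLAYER)
```

Adding a new player type in this picture is just one more entry in the table, which is the extensibility point being made here.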

The media player service is smart enough to sort of recognize these file types. And we have a media scanner that runs at boot time– that goes out, looks at the files, figures out what they are. And so we can actually, you know, replace or add new player types by just instantiating a new type of player. In fact, there are some projects out there where they’ve replaced OpenCORE with GStreamer or other media frameworks. And we’re talking to some other– some different types of player applications that might have new codecs and new file types, and that’s one way of doing it. The other way of doing it is you– if you wanted to add a new file type, you could actually implement it inside of OpenCORE. And then on the right-hand side, we have the media recorder service. Prior to– in the 1.0, 1.1 releases, that was basically just an audio record path. In Cupcake, we’ve added video recording. So this is now integrated with a camera service. And so the media recorder– again, it’s sort of a proxy.

There’s a proxy. It uses the same sort of thing, where there’s a media recorder object in the Java layer. And there’s a media recorder service that actually does the recording. And for the actual authoring engine, we’re using OpenCORE. And it has the encoder side. So we’ve talked about the decoders, and the encoders would be H.263, H.264 (that is, AVC), and MPEG-4 SP. And then, the audio codecs. So all those sit inside of OpenCORE. And then the camera service both operates in conjunction with the media recorder and also independently for still images. So if your application wants to take a still image, you instantiate a camera object, which again is just a proxy for this camera service.

The camera service takes care of handling preview for you, so again, we wanted to limit the amount of traffic between the application and the hardware. So this actually provides a way for the preview frames to go directly out to the display. Your application doesn’t have to worry about it, it just happens. And then in the case where the media recorder is actually doing video record, we take those frames into OpenCORE and it does the encoding there.

So kind of looking at what a media playback session would look like. The application provides three main pieces of data. It’s going to provide the source URI, the “where is this file coming from.” It’ll either come from a local file that’s on the, you know, on the SD card. It could come from a resource that’s in the application, the .apk, or it could come from a network stream. And so the application provides that information. It provides a surface, which at the application level is called a surface view. This, at the binder level, is an ISurface interface, which is an abstraction for the view that you see. And then it also provides the audio type, so that the hardware knows where to route the audio.

So once those have been established, the media server basically takes care of everything from that point on. So once you have called the prepare function and the start function, the frames, video frames, audio frames, whatever, are going to be decoded inside the media server process. And they get output directly to either Audio Flinger or Surface Flinger, depending on whether it’s an audio stream or a video stream. And all the synchronization is handled for you automatically. Again, it’s very low overhead. There’s no data that’s flowing back up to the application at this point. It’s all happening inside the hardware.

One other reason for doing that, which we mentioned earlier, is that in many cases, for example the G1 and the Sapphire, the device that you guys got today, those devices actually have hardware codecs. And so we’re able to take advantage of a DSP that’s in the device to accelerate. In the case of, for example, H.264, we can accelerate the video decode in there and offload some of that from the main processor. And that frees the processor up to do other things, either, you know, doing sync in the background, or just all sorts of things that you might need those cycles for. So again, all that is happening inside the media server process. We don’t want to give applications direct access to the hardware, so it’s another good reason for putting this inside the media server process.

So on the media recorder side, we have a similar sort of thing. It’s a little more complex. The application can either create its own camera and then pass that to the media server, or it can let the media server create a camera for it. And then the frames from the camera go directly into the encoders. It again is going to provide a surface for the preview, so as you’re taking your video, the preview frames are going directly to the display surface so you can see what you’re recording. And then you can select an audio source. Right now that’s just the microphone input, but in the future, it could be other sources. You know, potentially you could be recording from, you know, TV or some other hardware that’s on the device. And then once you’ve established that, the camera service will start feeding frames up to the media server, and then they’re pushed out to the Surface Flinger and they’re also pushed out into OpenCORE for encoding. And then there’s a file authoring piece that actually takes the frames from audio and video, muxes them together, and writes them out to a file.

So, get into a little more detail about the codecs. We have a number of different– we have three different video codecs. So one of the questions that comes a lot– comes up a lot from the forums is what kind of codecs are available, what should they be used for, and things like that. So just kind of a little bit of history about the different codecs. So H.263 is a codec from– I think it was– came out about 1996, was when it was standardized. It was originally intended for video conferencing, so it’s really low bit-rate stuff. You know, designed to go over an ISDN line or something like that. So it’s actually worked out pretty well for mobile devices, and a lot of mobile devices support H.263. The encoder is pretty simple. The decoder is pretty simple. So it’s a lightweight kind of codec for an embedded device. It’s part of the 3GPP standard. So it’s adopted by a number of different manufacturers. And it’s actually used by a number of existing video sites– of websites– for their encode. For example, YouTube– if you go to, like, the, typically you’ll end up at an H.263 stream. Because it’s supported on most mobile devices. So MPEG-4 SP was originally designed as a replacement for MPEG-1 and MPEG-2. MPEG-1, MPEG-2–fairly early standardized codecs. They wanted to do something better. Again, it has a very simple encoder model, similar to H.263. There’s just single frame references. And there’s some question about whether it’s actually a better codec or not than H.263, even though they’re– they came out very close together. It’s missing the deblocking filter, so– I didn’t mention that before. H.263 has a deblocking filter. If you’ve ever looked at video, it typically comes out in, like, 8×8 pixel blocks. And you get kind of a blockiness. So there’s an in-loop deblocking filter in H.263, which basically smooths some of those edges out. The MPEG-4 SP, in its basic profile, is missing that. 
So it–the quality of MPEG-4, some people don’t think it’s quite as good, even though it came out at roughly the same time. Then the final codec we support is a fairly recent development. I think it’s a 2003, or something like that. The H.264 AVC codec came out. Compression’s much better. It includes the ability to have multiple reference frames, although on our current platforms, we don’t actually support that. But theoretically, you could get better compression in the main– what’s called the main profile. We support base profile. It has this mandatory in-loop deblocking filter that I mentioned before, which gets rid of the blockiness in the frames. One of the really nice things is it has a number of different profiles. And so different devices support different levels of–of profiles. It specifies things like frame sizes, bit rates, the–the types of advanced features that it has to support. And there’s a number of optional features in there. And basically, each of those levels and profiles defines what’s in those codecs. It’s actually used in a pretty wide range of things.

Everything from digital cinema, now, HDTV broadcasts, and we’re starting to see it on mobile devices like the G1. If you’re using the device itself today, and you do a YouTube playback on Wi-Fi, you’re actually getting an H.264 stream, which is why it’s so much better quality. On the downside, it’s a lot more complex than H.263 because it has these advanced features in it. So it takes a lot more CPU. And in the case of the G1, for example, that particular hardware, some of the acceleration happens in the DSP, but there’s still some stuff that has to go on the application processor.

On the audio side, MP3 is something everybody’s pretty familiar with. It uses what’s called a psycho-acoustic model, which is why we get better compression than a typical, you know, straight compression algorithm. So psycho-acoustic means you look for things that are hidden within the audio. There are certain sounds that are going to be masked by other sounds. And so the psycho-acoustic model will try to pick out those things, get rid of them, and you get much better compression there. You get approximately 10:1 compression over a straight linear PCM at 128kbits per second, which is pretty reasonable, especially for a mobile device. And then if you want to, you know, be a purist, most people figure you get full sonic transparency at about 192kbits per second. So that’s where most people won’t be able to hear the difference between the original and the compressed version.

For a more advanced codec, AAC came out sometime after MP3. It’s built on the same basic principles, but it has much better compression ratios. You get sonic transparency at roughly 128kbits per second. So, you know, much, much better compression. And another mark that people use is that 128kbits per second MP3 is roughly equivalent to 96kbits per second AAC. We also find it’s commonly used in MPEG-4 streams.

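The "approximately 10:1" figure is easy to sanity-check. The sketch below assumes the linear PCM source is CD-style audio (44.1 kHz, 16-bit, stereo); the talk doesn't state the source format, so treat those numbers as an assumption.

```python
def pcm_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed linear PCM, in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compression_ratio(pcm_rate_kbps, coded_rate_kbps):
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_rate_kbps / coded_rate_kbps


# Assumed source format: 44.1 kHz, 16-bit, stereo.
cd_pcm_kbps = pcm_kbps(44100, 16, 2)
mp3_ratio = compression_ratio(cd_pcm_kbps, 128)
aac_ratio = compression_ratio(cd_pcm_kbps, 96)
```

Under that assumption the PCM stream is about 1411 kbits per second, so 128kbits per second MP3 comes out near 11:1 and 96kbits per second AAC near 15:1.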
So if you have an MPEG-4 audio-video stream, you’re likely to find an AAC codec with it. In the case of our high-quality YouTube streams, they’re typically a 96kbits per second AAC format.

And then finally, Ogg Vorbis, which I’d mentioned earlier, we’re using for a lot of our sounds. Again, it’s another psycho-acoustic model. It’s an open source codec, so it doesn’t have any patent, you know, issues in terms of licensing, whereas with any of the other codecs, if you’re selling a device, you need to go, you know, get the appropriate patent licenses. Or I probably shouldn’t say that, because I’m not a lawyer, but you should probably see your lawyer. From our perspective, it’s very low overhead. It doesn’t bring in all of the OpenCORE framework, ’cause it’s just an audio codec. So it’s very lightweight in terms of the amount of memory it uses and also the amount of code space that it has to load in order to play a file. So that’s why we use it for things like ringtones and other things that need fairly low latency and that we know we’re gonna use a lot.

The other thing is that MP3 doesn’t have a native way of specifying a seamless loop. For those of you who aren’t audio experts, “seamless loop” basically means you can play the whole thing as one seamless, no clicks, no pops loop, over and over again.

A typical application for that would be a ringtone, where you want it to continue playing the same sound over and over again without–without the pops and clicks. MP3 doesn’t have a way to specify that accurately enough that you can actually do that without having some sort of gap. There are people that have added things in the ID3 tags to get around that, but there isn’t any standardized way to do it. Ogg does it– actually, both Ogg and AAC have conventions for specifying a seamless loop. So that’s another reason why we use Ogg is that we can get that nice seamless loop. So if you’re doing anything in a game application where you want to get, you know, some sort of– a typical thing would be like an ambient sound that’s playing over and over in the background. You know, the factory sound or, you know, some eerie swamp noises or whatever. That’s the way to do it is to use the Ogg file. You’ll get pretty good compression. It’s pretty low overhead for decoding it. And you can get those loops that won’t click. And then finally, the last codecs we’re going to talk about in terms of audio are the AMR codecs.

AMR is a speech codec, so it doesn’t get the full bandwidth. If you ever try to encode something with music on it, it will sound pretty crappy. That’s because it wants to kind of focus in on one central tone. That’s how it gets its high compression rate. But at the same time, it throws away a lot of audio. So it’s typically used for voice codecs. And in fact, GSM basically is based on AMR-type codecs. The input for AMR narrow band is 8 kilohertz. So going back to Nyquist, that basically means the highest frequency you can represent is just shy of 4 kilohertz. And the output bit-rates are, you know, anywhere from just under 5kbits per second up to 12.2. AMR wide band is a little bit better quality. It’s got a 16 kilohertz input, and slightly higher bandwidths. But again, it’s a speech codec primarily, and so you’re not going to get great audio out of it.

We do use these, because in the OpenCORE package, the AMR narrow band codec is the only native audio encoder we have in software. So if your hardware platform doesn’t have an encoder, that’s kind of the fallback codec. And in fact, if you use an application like MMS and attach an audio recording, this is the codec you’re going to get. If you do a video record today, that’s the codec you’re going to get. We’re expecting that future hardware platforms will provide, you know, native encoders for AAC. It’s a little too heavy to do AAC on the application processor while you’re doing video record and everything else. So we really need the acceleration in order to do it. AMR is specified in 3GPP streams. So most phones that will decode an H.263 will also decode the AMR. So it’s a fairly compatible format. If you look at the other phones that are out there that support, you know, video playback, they typically will support AMR as well. So we’ve talked about codecs.
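The Nyquist remark is just "half the sample rate", and the AMR numbers can be laid out next to it. In this sketch the list of AMR narrow band modes comes from the 3GPP AMR specification rather than from the talk, which only gives the range, and the 16-bit mono PCM baseline is an assumption used to get a rough compression figure.

```python
def nyquist_hz(sample_rate_hz):
    """Upper bound on the frequencies a given sample rate can represent."""
    return sample_rate_hz / 2.0


# AMR narrow band modes in kbits per second, per the 3GPP AMR specification.
AMR_NB_MODES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)

amr_nb_limit_hz = nyquist_hz(8000)    # narrow band, 8 kHz input
amr_wb_limit_hz = nyquist_hz(16000)   # wide band, 16 kHz input

# Assumed baseline: 8 kHz, 16-bit, mono linear PCM.
pcm_baseline_kbps = 8000 * 16 / 1000.0
ratio_at_top_mode = pcm_baseline_kbps / max(AMR_NB_MODES_KBPS)
```

With an 8 kHz input nothing above 4 kHz survives, which is fine for speech and is exactly why music sounds poor through this codec.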

Both audio and video codecs. The other piece of it, when you’re doing a stream, is what’s the container format? And so I’m going to talk a little bit about that. So 3GPP is the stream that’s defined by the 3GPP organization. These are phones that support that standard and are going to support these types of files. 3GPP is actually an MPEG-4 file format. But it’s–very, very restricted set of– of things that you can put into that file, designed for compatibility with these embedded devices. So you really want to use a H.263 video codec for–for broad compatibility across a number of phones. You probably want to use a low bit rate for the video, typically like 192kbits per second. And you also want to use the AMR narrow band codec. For MPEG-4 streams, which we also support, they’re typically higher quality. They typically are going to use either an H.264 or a higher– bigger size H.263 format.

Usually they use an AAC codec. And then our particular devices, the G1 and the device that you just received today (I’m not even sure what we’re calling it), are capable of up to 500kbits per second on the video side and 96kbits per second on the audio side. So a total of about 600kbits per second, sustained. If you do your encoding well, you’re going to actually get more than that out of it. We’ve actually been able to do better than 1 megabit per second, but you have to have a really good encoder. If it gets “burst-y,” it will interfere with the performance of the codec. So one question that comes up a lot on the forums is what container should I use if I’m either authoring or if I’m doing video recording? So for authoring for our Android device, if you want the best quality, the most bang for your bits, so to speak, you want to use an MPEG-4 container file with an H.264 encoded stream. It needs to be, for these devices today, baseline profile, roughly, as I was saying before, at 500kbits per second, HVGA or smaller, and an AAC codec up to 96kbits per second.

That will get you a pretty high quality– that’s basically the screen resolution. So it looks really good on– on the display. For other– you’re going to create content on an Android device, so you have a video record application, for example. And you want to be able to send that via MMS or some other email or whatever to another phone, you probably want to stick to a 3GPP format, because not all phones will support an MPEG-4 stream, particularly the advanced codecs. So in that case we recommend… I’m getting ahead of myself here. So in that case we recommend using the QCIF format. That’s 192kbits per second. Now, if you’re creating content on the Android device itself, intended for another Android device, we have an H.263 encoder. We don’t have an H.264 encoder, so you’re restricted to H.263. And for the same reason I’ve discussed before, we won’t have an AAC encoder, so you’re going to use an AMR narrow band encoder, at least on the current range of devices. So those are kind of the critical things in terms of inter-operability with other devices. And then the other thing is– a question that comes up a lot is if I want to stream to an Android device, what do I need to do to make that work?

The thing where most people fail on that is the “moov” atom, which is the index of frames that basically tells the organization of the file. It needs to precede the movie data atom. Most applications will not do that naturally. I mean, it’s easier for a programmer to write something that builds that index afterwards. So you typically have to, you know, turn something on, depending on what the application is, or if you’re using FFmpeg, you have to give it a command line option that tells it to put that atom at the beginning instead of the end. So, we just recently came out with what we’ve been calling the Cupcake release, or the 1.5 release.

That’s the release that’s on the phones you just received today. Some of the new features we added in the media framework. We talked about video recording before. We added an AudioTrack interface and an AudioRecord interface in Java, which allows direct access to raw audio. And we added the JET interactive MIDI engine. These are kind of the– the highlights in the media framework area. So kind of digging into the specifics here… AudioTrack– we’ve had a lot of requests for getting direct access to audio. And…so what AudioTrack does is allow you to write a raw stream from Java directly to the Audio Flinger mixer engine. Audio Flinger is a software mixer engine that abstracts the hardware interface for you. So it could actually– it could mix multiple streams from different applications. To give you an example, you could be listening to an MP3 file while the phone rings. And the ringtone will play while the MP3 file is still playing. Or a game could have multiple sound effects that are all playing at the same time. And the mixer engine takes care of that automatically for you. You don’t have to write a special mixer engine. It’s in– built into the device. Potentially could be hardware accelerated in the future. And it also allows you to… It does sample rate conversion for you. So you can mix multiple streams at different sample rates.

You can modify the pitch and so on and so forth. So what AudioTrack does is give you direct access to that mixer engine. So you can take a raw Java stream, you know, 16-bit PCM samples, for example, and you can send that out to the mixer engine. Have it do the sample rate conversion for you. Do volume control for you. It has anti-zipper volume filters. If anybody’s ever played with audio before, when you change the volume, it ramps to the new level in small steps rather than jumping, so you don’t get the pops or clicks or what we typically refer to as zipper noise. And that’s all done with either writes on a thread in Java, or you can use the callback engine to fill the buffer. Similarly, AudioRecord gives you direct access to the microphone. So in the same sort of way, you could pull a stream from the microphone.

You specify the sample rate you want it in. And, you know, with the combination of the two of those, you can now take a stream from the microphone, do some processing on it, and put it back out via the AudioTrack interface to that mixer engine. And that mixer engine will go wherever audio is routed. So, for example, a question that comes up a lot is, well, what if they have a Bluetooth device? Well, that’s actually handled for you automatically. There’s nothing you have to do as an application programmer. If there’s a Bluetooth device paired that supports A2DP, then that audio is going to go directly to the A2DP device, whether it’s a headset or even your car or whatever. And then we’ve got this callback mechanism so you can actually just set up a buffer and, when you get a callback, you fill it. You know, if you’re doing a ping-pong buffer, you have half of it being filled while the other half is actually being output to the device. And there’s also a static buffer mode where you give it, for example, a sound effect that you want to play and it only does a single copy. And then it just automatically mixes it, so each time you trigger the sound, it will mix it for you, and you don’t have to do additional memory copies. So those are kind of the big highlights in terms of the audio pieces of it.
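To give a feel for what a software mixer does with those raw 16-bit PCM streams, here is a small Python sketch of two of the ideas mentioned: summing streams with saturation, and ramping a volume change across a buffer so it doesn't jump. This is not Audio Flinger's actual algorithm; the function names and the linear ramp are assumptions chosen to keep the example short.

```python
PCM16_MIN = -32768
PCM16_MAX = 32767


def mix_pcm16(stream_a, stream_b):
    """Sum two sequences of 16-bit samples, clamping to the 16-bit range."""
    length = max(len(stream_a), len(stream_b))
    mixed = []
    for i in range(length):
        a = stream_a[i] if i < len(stream_a) else 0
        b = stream_b[i] if i < len(stream_b) else 0
        mixed.append(max(PCM16_MIN, min(PCM16_MAX, a + b)))
    return mixed


def apply_volume_ramp(samples, start_gain, end_gain):
    """Move linearly from start_gain to end_gain across the buffer.

    Spreading the change over many samples is what avoids the audible
    step (zipper noise) that a sudden gain change would produce.
    """
    count = len(samples)
    ramped = []
    for i, sample in enumerate(samples):
        position = i / (count - 1) if count > 1 else 1.0
        gain = start_gain + (end_gain - start_gain) * position
        ramped.append(int(round(sample * gain)))
    return ramped
```

A ringtone mixed over a playing MP3 is, in this picture, just `mix_pcm16` applied buffer after buffer, with each stream's own gain ramp applied first.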

Then another new piece that’s actually been in there for a while, but for which we’ve finally implemented the Java support, is the JET Interactive MIDI Engine. So JET is based upon the EAS MIDI engine. And what it does is allow you to pre-author some content that is very interactive. So if you’re an author, you’re going to create content in your favorite authoring tool, a digital audio workstation tool. It has a VST plugin, so that you can, you know, basically write your game audio in the tool and hear it played back as it would be played on the device. You can have multiple tracks that are synchronized, and mute them and unmute them synchronous with the segment.

So basically, your piece is going to be divided up into a bunch of little segments. And just as an example, I might have an A section, like the intro, and maybe I have a verse and I have a chorus. And I can interactively get those to play one after another. So, for example, if I have a game that has kind of levels, I might start with a certain background noise, and perhaps, you know, my character’s taking damage. So I bring in some little element that heightens the tension in the game, and this is all done seamlessly. And it’s very small content, because it’s MIDI. And then you can actually have little flourishes that play in synchronization with the music that’s going on. So, for example, let’s say you, you know, take out an enemy. There’s a little trumpet sound or whatever. A sound effect that’s synchronized with the rest of the audio that’s playing. Now all this is done under program control.

In addition to that, you also have the ability to have callbacks that are synchronized. So a good example would be a Guitar Hero type game where you have music playing in the background. What you really want to do is have the player do something in synchronization with the rhythm of the sound.

So you can get a callback in your Java application that tells you when a particular event occurred. So you could create these tracks of–of events that you’ve been– you know, measured– did they hit before or after? And we actually have a sample application in the SDK that shows you how to do this. It’s a–I think a, like, two- or three-level game that with– complete with graphics and sound and everything to show you how to do it. The code–the code itself is written in native code that’s sitting on top of the EAS engine, so again, in keeping with our philosophy of trying to minimize the– the overhead from the application, this is all happening in background. You don’t have to do anything to keep it going other than keep feeding it segments. So periodically, you’re going to wake up and say, “Oh, well, here’s the next segment of audio to play,” and then it will play automatically for whatever the length of that segment is. It’s all open source. Not only is the– the code itself open source, but the tools are open sourced, including the VST plugin. So if you are ambitious and you want to do something interesting with it, it’s all sitting out there for you to play with. I think it’s out there now. If not, it will be shortly. And so those are the big highlights of the– the MIDI– the MIDI engine. Oh, I forgot. One more thing.

The DLS support– so one of the critiques of general MIDI, or MIDI in general, is the quality of the instruments. And admittedly, what we ship with the device is pretty small. We try to keep the code size down. But what the DLS support does with JET is allow you to load your own samples. So you can either author them yourself or you can go to a content provider and author these things. So if you want a high-quality piano or you want, you know, a particular drum set, you’re going for a techno sound or whatever, you can actually, you know, put these things inside the game, use them as a resource, load them in and– and your game will have a unique flavor that you don’t get from the general MIDI set. So… I wanted to talk about a few common problems that people run into. Start with the first one here. This one I see a lot. And that is the behavior of the application for the volume control is– is inconsistent. So, volume control on Android devices is an overloaded function. And as you can see from here, if you’re in a call, what the volume control does is adjust the volume that you’re hearing from the other end of the phone. If you’re not in a call, if it’s ringing, pressing the volume button mutes the–the ringer.

Oh, panic. I’m in a, you know, middle of a presentation and my phone goes off. So that’s how you mute it. If we can detect that a media track is active, then we’ll adjust the volume of whatever is playing. But otherwise, it adjusts the ringtone volume. The issue here is that if your– if your game is– or your application is just sporadically making sounds, like, you know, you just have little UI elements or you play a sound effect periodically, you can only adjust the volume of the application during that short period that the sound is playing. It’s because we don’t actually know that you’re going to make sound until that particular instant. So if you want to make it work correctly, there’s an– there’s an API you need to call. It’s in–it’s part of the activity package. It’s called setVolumeControlStream. So you can see a little chunk of code here.
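As a rough illustration of the decision order just described, here is a minimal plain-Java sketch. It is a model, not Android code: the `Stream` enum, the `onVolumeKey` method, and the returned strings are hypothetical. On Android itself, the call the talk refers to is `setVolumeControlStream`, typically invoked from an activity's `onCreate` with a stream type such as `AudioManager.STREAM_MUSIC`.

```java
class VolumeKeyRouting {
    /** Stand-ins for audio stream types; only the names are borrowed from Android. */
    enum Stream { VOICE_CALL, RING, MUSIC }

    /**
     * Models which action a volume key press triggers.
     * `declared` is the stream the foreground activity registered as its
     * default, or null if it never registered one.
     */
    static String onVolumeKey(boolean inCall, boolean ringing,
                              boolean mediaActive, Stream declared) {
        if (inCall) return "adjust " + Stream.VOICE_CALL;   // in a call: call volume
        if (ringing) return "mute ringer";                  // phone ringing: silence it
        if (mediaActive) return "adjust active media";      // sound playing right now
        if (declared != null) return "adjust " + declared;  // the app's declared default
        return "adjust " + Stream.RING;                     // fallback: ringtone volume
    }

    public static void main(String[] args) {
        // A game that is silent at this instant and never declared a stream:
        System.out.println(onVolumeKey(false, false, false, null));
        // The same game after declaring MUSIC as its volume control stream:
        System.out.println(onVolumeKey(false, false, false, Stream.MUSIC));
    }
}
```

The two calls in `main` show the problem case from the talk: an application that only makes sporadic sounds falls through to the ringtone volume unless it has declared a stream.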

In your onCreate, you’re going to call this setVolumeControlStream and tell it what kind of stream you’re going to play. In the case of most applications that are in the foreground and playing audio, you probably want STREAM_MUSIC, which is kind of our generic placeholder for audio that’s in the foreground. If you’re a ringtone application, you know, you’re playing ringtones, then you would select a different type. But this basically tells the activity manager, when you press the audio button, if none of those previous things are happening, in other words, if we’re not in a call, if it’s not ringing, and if none of these other things are happening, then that’s the default behavior of the volume control. Without that, you’re probably going to get pretty inconsistent behavior and frustrated users. That’s probably the number one problem I see with applications in the marketplace today: they’re not using that. Another common one I see on the forums is people saying, “How do I play a file from my APK? I just want to have an audio file that I ship with the package,” and they get this wrong for whatever reason. I think we have some code out there from a long time ago that looks like this. And so this doesn’t work. This is the correct way to do it. So there’s this AssetFileDescriptor. I talked a little bit earlier about the binder object and how we pass things through, so we’re going to pass the file descriptor, which is a pointer to your resource, through the binder to the… I don’t know how that period got in there. It should be setDataSource. So it’s setDataSource, takes a FileDescriptor, StartOffset, and a Length, and so what this will do is, using a resource ID, it will open it and find the offset where that raw resource starts.
And it will, you know, pass– set those values so that we can tell the media player where to find it, and the media player will then play that from that offset in the FileDescriptor. I had another thought there.
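The reason an offset and a length are needed can be shown with a small plain-Java sketch. It is a model under stated assumptions, not Android code: the `PackedResourceDemo` class, the fake package layout, and the byte strings are all hypothetical. The idea it illustrates is that a resource shipped in a package is a byte range inside one larger file, so a reader needs the file plus a start offset and a length. On Android, the corresponding values come from an `AssetFileDescriptor` (`getFileDescriptor()`, `getStartOffset()`, `getLength()`) and are passed to `MediaPlayer.setDataSource`.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class PackedResourceDemo {
    /** Reads `length` bytes starting at `startOffset` from the file at `path`. */
    static byte[] readSlice(Path path, long startOffset, int length) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path.toFile(), "r")) {
            byte[] out = new byte[length];
            f.seek(startOffset);    // jump to where the embedded resource begins
            f.readFully(out);       // read exactly `length` bytes, no more
            return out;
        }
    }

    /** Builds a tiny stand-in "package" file and extracts the middle entry from it. */
    static String demo() {
        try {
            Path pkg = Files.createTempFile("package", ".bin");
            try {
                // Hypothetical layout: 6-byte header, 11-byte audio payload, trailer.
                byte[] all = "HEADERAUDIO-BYTESTRAILER".getBytes(StandardCharsets.US_ASCII);
                Files.write(pkg, all);
                byte[] slice = readSlice(pkg, 6, 11);
                return new String(slice, StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(pkg);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Reading from offset 0 instead would hand the reader the package header rather than the audio data, which is why passing only the file without its offset and length is not enough.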

Raw resources, make sure that when you put your file in, you’re putting it in as a raw resource, so it doesn’t get compressed. We don’t compress things like MP3 files and so on. They have to be in the raw directory. Another common one I see on the forums is people running out of MediaPlayers. And this is kind of an absurd example, but, you know, just to give you a point. There is a limited amount of resources. This is an embedded device. A lot of people who are moving over from the desktop don’t realize that they’re working with something that’s, you know, equivalent to a desktop system from maybe ten years ago. So don’t do this. If you’re going to use MediaPlayers, try to recycle them. So our solution is, you know, there are resources that are actually allocated when you create a MediaPlayer. It’s allocating memory, it may be loading codecs. It may–there may actually be a hardware codec that’s been instantiated that you’re preventing the rest of the system from using. So whenever you’re done with them, make sure you release them. So you’re going to call release, you set null on the MediaPlayer object. Or you can call reset and set– do a new setDataSource, which, you know, is basically just recycling your MediaPlayer. And try to keep it to, you know, two or three maximum. ‘Cause you are sharing with other applications, hopefully. And so if you get a little piggy with your MediaPlayer resources, somebody else can’t get them. And also, if you go into the background– so, and you’re in– on pause, you definitely want to release all of your MediaPlayers so that other applications can get access to them. Another big one that happens a lot is the CPU… “My CPU is saturated.” And you look at the logs and you see this. You know, CPU is– is– can’t remember what the message is now. But it’s pretty clear that the CPU is unhappy. And this is kind of the typical thing, is that you’re trying to play too many different compressed streams at a time. 
Codecs take a lot of CPU resources, especially ones that are running in software. So a typical MP3 decode of a high-quality MP3 might take 20% of the CPU. You add up two or three of those things, and you’re talking about some serious CPU resources. And then you wonder why the frame rate on your game is pretty bad. Well, that’s why. So we actually have a solution for this problem. It’s called SoundPool. Now, SoundPool had some problems in the 1.0 and 1.1 releases. We fixed those problems in Cupcake. It’s actually pretty useful. So what it allows you to do is take resources that are encoded in MP3 or AAC or Ogg Vorbis, whatever your preferred audio format is. It decodes them and loads them into memory so they’re ready to play, and then uses the AudioTrack interface to play them out through the mixer engine just like we were talking about before. And so you can get much lower overhead, somewhere on the order of about 5% per stream as compared to these 20% or 30%, depending on what the audio codec is. So it gives you the same sort of flexibility. In fact, it actually gives you a little more flexibility, because you can set the rates. It will manage streams for you. So if you want to limit the number of streams that are playing, you tell it up front, “I want,” let’s say, “eight streams maximum.” If you exceed that, it will automatically, based on the priority, select the lowest-priority one, get rid of that one, and start the new sound. So it’s kind of managing resources for you. And then you can do things like pan in real time. You can change the pitch. So if you want to get a Doppler effect or something like that, this is the way to do it. So that’s pretty much it. We have about ten minutes left for questions, if anybody wants to go up to a microphone. [applause] Thank you.

man: Hi, thank you. That was a great talk. Is setting the STREAM_MUSIC, so you can respond to the volume control, something you have to do every time you create a new activity, or is it sticky for the life of the app? Sparks: It’s sticky. You’re going to call it in your onCreate function. man: But in every single activity?

Sparks: Yeah, yeah. man: Okay. man: Hi, my first question is that currently, Android app store using the OpenCORE for the multimedia framework. And my question is that does Google has any plan to support any other middleware, such as GStreamer or anything else? Sparks: Not at this time. We don’t have any plans to support anything else. man: Okay. What’s the strategy of Google for supporting other pioneers providing this multimedia middleware? Sparks: Well, so, because of the flexibility of the MediaPlayer service, you could easily add another code–another media framework engine in there and replace OpenCORE. man: Okay. So my second question is that, um– [coughs] that currently– Google, you mentioned implementing the MediaPlayer and the recording service. Is there any plan to support the mobile TV and other, such as video conference, in frameworks? Sparks: We’re–we’re looking at video conferencing. Digital TV is probably a little bit farther out. We kind of need a platform to do the development on. So we’ll be working with partners. Basically, if there’s a partner that’s interested in something that isn’t there, we will–we can work with you on it. man: Okay, thank you. man: Does the media framework support RTSP control?

Sparks: Yes. So RTSP support is not as good as we’d like it to be. It’s getting better with every release. And we’re expecting to make some more strides in the next release after this. But Cupcake is slightly better. man: And that’s specified by… in the URL, by specifying the RTSP? Sparks: Yeah. Right. man: Okay. And you mentioned, like, 500 kilobits per second being the maximum, or– What if you tried to play something that is larger than that? Sparks: Well, the codec may fall behind. What will typically happen is that you’ll get a– if you’re using our MovieView, you’ll get an error message that says that it can’t keep up. man: Mm-hmm. So it will try, but it will– It might fall behind.

Sparks: Yeah. man: Thank you. man: My question is ask– how about– how much flexibility we have to control the camera services? For example, can I control the frame rate, and the color tunings, and et cetera? Sparks: Yeah, some of that’s going to depend on the– on the device. We’re still kind of struggling with some of the device-specific things, but in the case of the camera, there’s a setParameters interface. And there’s access, depending on the device, to some of those parameters. The way you know that is, you do a setParameter. Let’s say you ask for a certain frame rate. You–you do a getParameter. You find out if it accepted your frame rate or not. Because there’s a number of parameters. man: Yeah, but also, in the– for example, the low light. So you want–not only you want to slow the frame rate, but also you want to increase the integration time. Sparks: Right. man: So in the– sometimes you want, even in the low light, but you want to slow the frame rate. But you still want to keep the normal integration time. So how you–do you have those kind of flexibility to control? Sparks: Well, so that’s going to depend on whether the hardware supports it or not. If the hardware supports it, then there should be a parameter for that. One of the things we’ve done is– for hardware dev– manufacturers that have specific things that they want to support, that aren’t like, standard– they can add a prefix to their parameter key value pairs. So that will, you know– it’s unique to that device. And we’re certainly open to manufacturers suggesting, you know, new– new standard parameters. And we’re starting to adopt more of those. So, for example, like, white balance is in there. Scene modes, things like that are all part of it. man: Okay.

Sparks: Yeah. man: I was wondering what kind of native code hooks the audio framework has? I’m working on an app that basically would involve, like, actively doing a fast Fourier transform, you know, on however many samples you can get at a time. And so, it seems like for now– or in the Java, for example, it’s mostly built toward recording audio and– and doing things with that. What sort of active control do you have over the device?

Sparks: So officially, we don’t support native API access to audio yet. The reason for that is, any API we publish, we’re going to have to live with for a long time. We’re still playing with APIs, trying to make them better. And so the audio APIs have changed a little bit in Cupcake. They’re going to change again in the next two releases. At that point, we’ll probably be ready to start providing native access. What you can do: very shortly we’ll have a native SDK, which will give you access to libc and libm. You can get access to the audio from the official Java APIs, do your processing in native code, and then feed it back, and you’ll be able to do that without having to do memory copies. man: And so basically, that would just be accessing the buffer that the audio writes to. And also, just a very tiny question about the buffer.

Does it– does it loop back when you record the audio? Or is it–does it record in, essentially, like, blocks? Do you record an entire buffer once in a row, or does it sort of go back to the start and then keep going?

Sparks: You can either have it cycle through a static buffer, or you can just pass in new buffers each time, depending on how you want to use it. man: Okay. Thanks. man: Let’s say you have a game where you want to generate a sound instantly on a button press or a touch. Sparks: “Instantly” is a relative term. man: As instantly as you can get. Would you recommend, then, the JET MIDI stuff, or an Ogg, or what? Sparks: You–you’re probably going to get best results with SoundPool, because SoundPool’s really aimed at that. What SoundPool doesn’t give you– and we don’t have an API for it, we get a lot of requests for it, so, you know, it’s on my list of things to do– is synchronization. So if you’re trying to do a rhythm game where you–you want to be able to have very precise control of–of, say, a drum track– you–there isn’t a way to do that today. But if you’re just trying to do– man: Like gunfire kind of thing.

Sparks: Gunfire? SoundPool is perfect for that. That’s–that’s what it was intended for. man: Yeah, if I use the audio mixer, can I control the volume of the different sources differently? Sparks: Yes. man: Okay. Sparks: So, SoundPool has a volume control for each of its channels that you– basically, when you trigger a SoundPool sound, you get an ID back. And you can use that to control that sound. If you’re using the AudioTrack interface, there’s a volume control interface on it. man: My question is, for the testing sites, how– does Google have a plan to release a certain application or testing program to verify MediaPlayer and other media middleware like this? Sparks: Right. man: 3D and everything else? Sparks: So we haven’t announced what we’re doing there yet. I can’t talk about it. But it’s definitely something we’re thinking about. man: Okay. Another question is about the concurrency there for the mobile devices. The resource is very limited. So for example, the service you mentioned. The memory is very limited. So how do we handle any– or maybe you have any experience– handle the 3D surface and also the multimedia surface and put together a raw atom surface or something like that? Sparks: So when you say “3D,” you’re talking about– man: Like OpenGL, because you do the overlay and you use the overlay and you– Sparks: Yeah, I’m– I’m not that up on it. I’m not a graphics guy.

I’m really an audio guy. But I actually manage the team that does the 3D stuff. So I’m kind of familiar with it. There’s definitely limited texture memory that’s available–that’s probably the most critical thing that we’re running into– but obviously, you know, that– we’re going to figure out how to share that. And so– I don’t have a good answer for you, but we’re aware of the problem. man: Okay. Yeah. Just one more question is do you have any plan to move OpenGL 2.0 for the Android? Sparks: Yes. If you– man: Do you have a time frame? Sparks: Yeah, if you’re following the master source tree right now, you’ll start to see changes come out for– we’re–we’re marrying 2D and 3D space. So the 2D framework will be running as an OpenGL context, which will allow you, then, to, you know– ES 2.0 context. So you’ll be able to share between the 3D app and the 2D app. Currently, if you have a 3D app, it takes over the frame buffer and nothing else can run. You’ll actually be able to run 3D inside the 2D framework. man: Okay, thank you. man: I think this question is sort of related. I was wondering how would you take, like, the– the surface that you use to play back video and use it as a texture, like in OpenGL? Sparks: That’s coming, yeah. Yeah, that–so you actually would be able to map that texture onto a 3D– man: Is there any way you can do that today with the current APIs?

Sparks: Nope. Yeah, there’s no access to the video after it leaves the media server. man: And no time frame as far as when there’ll be some type of communication as far as how to go about doing that in your applications? Sparks: Well, it’s in what we call our Eclair release. So that’s master today. man: Okay. Okay, thank you. Sparks: I think, are we out of time? woman: [indistinct] Sparks: Okay. woman: Hi, do you have any performance metrics as to what are the performance numbers with the certain playback of audio and video to share, or any memory footprints available that we can look up, maybe?

Sparks: Not today. It’s actually part of some of the work we’re doing that somebody was asking about earlier. That I can’t talk about yet. But yeah. There’s definitely some– some plans to do metrics and to have baselines that you can depend on. woman: And then the second question that I have is that do you have any additional formats that are lined up or are in the roadmap? Like VC-1 and additional audio formats? Sparks: No, not– not officially, no. woman: Okay. woman: Hi, this is back to the SoundPool question. Is it possible to calculate latency or at least know, like, when the song actually went to the sound card so I could at least know when it actually did play– if there’s any sort of callback or anything? Sparks: So you can get a playback complete callback that tells you when it left the player engine. There’s some additional latency in the hardware that we…we don’t have complete visibility into, but it’s reported back through the audio track interface, theoretically, if it’s done correctly. So at the MediaPlayer level, no. At the AudioTrack level, yes. If that’s…makes any sense. woman: Okay, so I can at least get that, even if I can’t actually calculate latency for every single call?

Sparks: Right, right. woman: Okay. Thank you. Sparks: Uh-huh. man: Yeah, this is a question about the samples processing. You partially touched upon that. But in your architecture diagram, where do you think the sound processing effect really has to be placed? For example, it could be an equalizer or different kind of audio post processing that needs to be done. Because in the current Cupcake version, 1.5, I do not see a placeholder or any implementation of that sort. Sparks: So one of the things we’re in the process of doing is we’re– we’re looking at OpenAL– Have I got that right? OpenAL ES? As the, um–possibly the– an abstraction for that. But it definitely is something you want to do on an application-by-application basis. For example, you don’t want to have effects running on, you know, a notification if… The–you–you wouldn’t want the application in the foreground and forcing something on some other application that’s running in background. So that’s kind of the direction we’re headed with that. man: What’s the current recommendation? How do you want the developers to address? Sparks: Well, the– since there isn’t any way, there’s no recommendation. I mean, if you were doing native code, it’s kind of up to you. But our recommendation would be if you’re, you know, doing some special version of the code, you would probably want to insert it at the application level and not sitting at the bottom of the Audio Flinger stack. man: Okay, thanks. woman: Is it better to get the system service once and share it across activities in an application, or let each activity fetch the service? Sparks: I mean, there’s a certain amount of overhead, ’cause it’s a binder call to do it. So if you know you’re going to use it, I would just keep it around. I mean, it’s just a– a Java object reference. So it’s pretty cheap to hold around. man: Is there any way to listen to music on a mono Bluetooth? Sparks: Ah, on a SCO? Yeah, no. 
[chuckles] The reason we haven’t done that is the audio quality is really pretty poor. I mean, it’s designed for– for call audio. So the experience isn’t going to be very good.

Theoretically, you know, it’s possible. We just don’t think it’s a good idea. [chuckling] man: If you want to record for a long period of time, you know, like a half-hour, can you frequency scale the processor or put it to sleep, or… Sparks: It–well, that happens automatically. I mean, it’s– it’s actually going to sleep and waking up all the time. So it’s just depending on what’s– man: But if you’re doing, like, a raw 8k sample rate, how big a buffer can you have, and then will it sleep in– while that buffer’s filling? Sparks: So the–the size of those buffers is defined in the media recorder service. And I think they’re… I want to say they’re like 2– 2k at… whatever the output rate is. So they’re pretty good size. I mean, it’s like a half a second of audio. So the processor, theoretically, would be asleep for quite some time. man: So is that handled by the codec, or is it handled by– I mean, the DSP on a codec? Or is it handled by– Sparks: So the… the process is going to wake up when there’s audio available. It’s going to… you know, route it over to the AMR encoder. It’s going to do its thing. Spit out a bunch of bits that’ll go to the file composer to be written out. And then theoretically, it’s gonna go back to sleep again. man: No, I mean on the recorder. If you’re recording the audio. If you’re off the microphone. Sparks: I’m sorry? man: If you’re recording raw audio off the microphone. Sparks: Yeah. Oh, oh, are you talking about using the AudioTrack or AudioRecord interface? man: The AudioRecord interface. ADPCM. Sparks: Yeah, that’s… So it’s pretty much the same thing. I mean, if you define your buffer size large enough, whatever that buffer size is, that’s the buffer size it’s going to use at the lower level. So it’ll be asleep for that amount of time. man: And the DSP will be the one filling the buffer? Sparks: Yeah, yeah. The DSP fills the buffer. man: All right, thanks. man: One last question. 
From a platform perspective, would you be able to state a minimum requirement on OpenGL performance? Sparks: I’m not ready to say that today. But… at some point we’ll– we’ll be able to tell you about that. man: Okay, thanks. Sparks: Uh-huh. Guess that’s my time. Thanks, everyone. [applause]
