Wednesday, July 7, 2010

What is a score video, and how is one made?

Lately, I have been receiving the following question a lot on YouTube:

How do you make this?

The question refers to a genre of videos I call "score videos" (see Vid. 1). In the most general sense, a score video is a slide show of screen captures of digitally rendered music notation, accompanied by a computer-generated audio track that is loosely synchronized with the notation displayed on screen at any given time.

Vid. 1. Score video of my original piano arrangement of Silence and Motion, a piece of incidental music composed by Nobuo Uematsu for the game Final Fantasy VIII (PlayStation).

Creating a score video takes many steps. The entire process can take anywhere from a few hours to several days, depending on factors such as my interest in the project and how much time I have. While the order given below is not fixed, it describes one possible work flow.

Conceptualize

You can't make a score video without a score (haha). So the first step is to compose the darn thing. There are several ways I go about this. I may:

  • improvise at the piano and come up with something that I think has potential for a composition;
  • come up with something in my head on the bus on the way to work; or
  • decide to arrange an existing piece of music, one not originally written for solo piano, for solo piano.

Composing and outputting a score

When people talk about composing a piece of music, we may imagine a composer improvising at the piano and jotting down ideas on paper with a pencil. Or, if music notation is not your thing, you may prefer to memorize the general chord structure of your composition and recall it from memory when it is time to share it with others. Why am I saying all this? One of my key points in this section is that the act of composing a work is not well defined; it varies among composers from different backgrounds. Composing, to me, then, is the combined act of conceptualizing and notating one's musical composition in its clearest form and in as much detail as necessary to communicate the composer's intent.

Using my own definition of composing music from above, the first step in the process, then, is to conceptualize musical ideas and then to notate and organize them using music notation software, such as Sibelius. This includes notating details such as phrasing, articulations, dynamics, etc. Once you are satisfied with the layout of everything, you are ready to output the composition as a score. (At this point, the work flow diverges and becomes non-linear.) What I do is first save the score as a PDF and then manipulate it later. I use a printer driver called CutePDF, which allows you to save any printable output as a PDF. PDFs are quite handy in that, unlike editable files such as Word documents, everything is locked in place, and all fonts are automatically embedded, so you don't have to worry about sharing files with people who may not have your fonts.

(The following is an aside.) As a classically trained musician, I hold music notation in high regard. In the most general sense, music notation is a universal means of communicating musical ideas to others without having to perform the piece, and it is quite useful for musical analysis. What I am about to say may not be well received, especially by composers of contemporary music, but it is a belief that forms the basis of my philosophy on music composition: I feel that for a composer to call something his own composition, he has to be able to preserve it in one way or another, whether that means being able to perform it more than once or keeping a record of the work by notating it. Whether this is accurate is a discussion for another time, but it is a point worth mentioning.

Cut and splice the score

Since we are uploading our score video to YouTube, the video dimensions put forth by YouTube constrain how we can optimize our screen captures. Generally speaking, we want to display as much notation as possible on a screen with a width:height ratio of 16:9. (There are other constraints and aesthetic decisions that come later.) Now, the PDF we generated was probably optimized for letter-sized printing (8.5 inches by 11 inches), so we have to render, scale, and crop.

So I open the PDF page by page in Photoshop, starting with the first page. Since Photoshop deals with raster graphics, which is the format we will be using, we first need to render the vector graphics as bitmaps. Keeping the original dimensions (8.5 inches by 11 inches), I set the DPI (or PPI) to 300, which produces a bitmap that is neither too large nor too small to work with (I think this was an arbitrary decision on my part). Keeping with the constraint that all screen captures must be optimized for a 16:9 screen, I cut the bitmap into smaller bitmaps, which I refer to as “views.” (In fact, I name these bitmaps “view01.bmp”, “view02.bmp”, etc.) Since each screen capture may contain a different number of systems, the page breaks of the original PDF may not always coincide with a (for lack of a better word) “screen-capture break.” So you may have to splice together two systems from adjacent pages.
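The arithmetic behind this cropping is easy to pin down. Here is a small sketch of my own (not part of the Photoshop workflow, and the function name is hypothetical) showing how a letter page rendered at 300 DPI divides into full-width 16:9 views, with a leftover strip that would get spliced together with systems from the next page:

```python
# Sketch: how many full-width 16:9 "views" fit on a letter page
# rendered at 300 DPI, and where each crop begins and ends.
# (Illustration only; real views are cut at system boundaries.)

PAGE_W_IN, PAGE_H_IN = 8.5, 11.0   # letter-size page, portrait
DPI = 300                           # rendering resolution

page_w_px = int(PAGE_W_IN * DPI)    # 2550 px wide
page_h_px = int(PAGE_H_IN * DPI)    # 3300 px tall

# A view spans the full page width; its height follows from 16:9.
view_h_px = round(page_w_px * 9 / 16)   # 1434 px (rounded)

def view_crops(page_height, view_height):
    """Return (top, bottom) pixel rows of each full-height crop,
    plus the leftover rows at the bottom of the page."""
    crops = []
    top = 0
    while top + view_height <= page_height:
        crops.append((top, top + view_height))
        top += view_height
    return crops, page_height - top

crops, leftover = view_crops(page_h_px, view_h_px)
print(crops)     # [(0, 1434), (1434, 2868)]
print(leftover)  # 432
```

So a little over two views fit per page, which is exactly why page breaks and “screen-capture breaks” drift apart and splicing becomes necessary.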

Free-floating elements that are not time dependent, such as the (i) title and credits; (ii) copyright info; and (iii) footnotes, are handled separately. The title and credits I usually include on the first view, above the first system. Copyright info, which generally appears on the first page of the PDF, is moved onto the last view of the slide show. Similarly, footnotes that appear at the bottom of a page in the PDF are moved to the bottom of the screen on which the reference now resides.

This step is essentially reflowing your PDF for a format whose pages have 16:9 dimensions, sort of like a music book in landscape orientation with very large print. On a more aesthetic note (pun intended!), you may prefer to display the same number of systems on each page, or follow some other convention of your own. Interestingly, because we have reflowed the document, the repetition points have also moved around: a start-repeat barline that appeared at the top of a page in the PDF may well appear on the second system of a view. When we put together the slide show later on, this may confuse a viewer trying to find his place on the screen after a jump caused by an end-repeat barline.

So what we will have is a collection of views optimized for a slide show that will be uploaded to YouTube.

Audio production

The most obvious way would be, perhaps, to record yourself or an ensemble performing the music, maybe in a recording studio or something to that effect. This is a lengthy process, and depending on what resources you have, it can also be quite costly. So instead, what I do is output an audio file directly from Sibelius. But before we do so, we have to tweak the performance so that it sounds less mechanical.

One of the things I love about Sibelius is that it supports sample libraries, so you can preview your work on the fly with pre-sampled sounds. The material you notate in Sibelius is stored as MIDI instructions: a single note has properties such as pitch, duration, and even articulation. However, the audio generated by Sibelius is not perfect (perfect in the sense that it captures your intended ideal performance). Maybe a note will be too loud. Maybe you want to manually control the rubato (the natural variation in tempo used to express emotion). So we have to tweak it. Sibelius facilitates this with its “dictionary” mechanism, a collection of keywords that automatically apply or remove audio effects. For example, if you want a particular section of your composition played using the pizzicato technique, all you need to do is enter pizz. or pizzicato in your score and attach it to the note at the point where you would like the technique to begin. To end the technique, you do the same at a later point with the indication arco or norm. These keywords are stored in the dictionary; if Sibelius comes across one in your score, the corresponding audio effect is applied.
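The idea behind the dictionary is simple enough to mimic in a few lines. This is a toy model of my own (the names and data layout are invented, and this is not how Sibelius works internally): a keyword attached to a beat position switches the playback technique for that note and every note after it, until another keyword switches it back.

```python
# Toy model of the "dictionary" idea: keywords attached to beat
# positions toggle the playback technique for all later notes.
# (Invented names; not Sibelius's actual internals.)

DICTIONARY = {
    "pizz.": "pizzicato",
    "pizzicato": "pizzicato",
    "arco": "arco",
    "norm.": "arco",
}

def apply_techniques(notes, markings, default="arco"):
    """notes: list of (beat, pitch); markings: {beat: keyword}.
    Returns (beat, pitch, technique) with the technique in force."""
    technique = default
    out = []
    for beat, pitch in sorted(notes):
        if beat in markings:
            technique = DICTIONARY.get(markings[beat], technique)
        out.append((beat, pitch, technique))
    return out

notes = [(1, "G3"), (2, "A3"), (3, "B3"), (4, "C4")]
markings = {2: "pizz.", 4: "arco"}
print(apply_techniques(notes, markings))
# [(1, 'G3', 'arco'), (2, 'A3', 'pizzicato'),
#  (3, 'B3', 'pizzicato'), (4, 'C4', 'arco')]
```

Note how the pizz. marking at beat 2 carries forward on its own until arco cancels it at beat 4; that "in force until revoked" behavior is the whole point of the mechanism.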

So this step is basically doing the above for every instance where you want to breathe some life into your computer-generated audio. For me, it is generally one of the most time-consuming steps of the process. When you have previewed the audio in Sibelius and are more or less satisfied with the results, it is time to output the audio. Currently, Sibelius can only export WAV files for this purpose, but that is fine, because we want high-quality media: when the final product is uploaded to YouTube, it will undergo some degree of compression, so the more detail we start with, the better! Once you have exported the WAV file, you may want to edit it further with audio editing tools. I use a program called Audacity. Sometimes the audio isn't loud enough, so I use a “compress” effect, which amplifies the entire waveform so that its loudest peak sits at the maximum level without clipping.
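That amplification step has a precise description: pick one gain factor so that the loudest sample lands exactly at full scale, then scale every sample by it, so the whole file gets louder but nothing clips. A minimal sketch of my own on raw 16-bit sample values (this is the general idea, not Audacity's actual implementation):

```python
# Minimal sketch of peak amplification on signed 16-bit PCM sample
# values: one gain factor brings the loudest sample to full scale
# (32767), so nothing clips. Not Audacity's implementation.

FULL_SCALE = 32767  # maximum value for signed 16-bit audio

def normalize(samples):
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)        # silence stays silence
    gain = FULL_SCALE / peak        # one gain for the whole file
    return [round(s * gain) for s in samples]

quiet = [0, 8000, -16384, 4096]     # peak is 16384, about half scale
print(normalize(quiet))             # [0, 16000, -32767, 8192]
```

Because a single gain is applied everywhere, the relative dynamics of the performance are untouched; only the overall level changes.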

Putting it all together

Just to summarize, this is all the usable media we have at this point:

  • a collection of slide show views (bitmap files); and
  • a computer-generated audio representation of your composition (WAV file).

Now what we want to do is put together a slide show that synchronizes the audio with the collection of views. The software I use for this step is

(drum roll please ...)

Windows Movie Maker. Appalled? I am sure there are far more efficient methods of doing what I do, but I have used Windows Movie Maker for quite some time, and I suppose I have gotten used to the work flow.

This step is probably the most straightforward of them all. First, I import all my usable media into the “collections” interface. I then place the audio onto the timeline. Then, one by one, I place the views onto the visual track of the timeline, adjusting the viewing durations as necessary. From my experience in my early days of making score videos, when a video is uploaded to YouTube, the audio and visuals lose some degree of synchronization partway into the video. For this reason, fading between views is a good idea: it gives the viewer time to direct his eyes to the correct location on the next screen as it comes into view.

This concludes my explanation of the entire score video process. I hope it has been insightful and inspiring in some way. (If you are still reading, I assume that is true to some extent.) Please feel free to leave a comment or question. I am always up for discussion.