What do spectrograms of sentences show

A voiced plosive may have a low-frequency voicing bar of striations, usually thought of as the sound of voicing being transmitted through the flesh of the vocal tract. However, due to passive devoicing, it may not. And due to perseverative voicing, even a 'voiceless' plosive may show some vibration as the pressures equalize and before the vocal folds fully separate. But let's not get lost in too many details. Generally we can think about the English plosives as occurring at three places of articulation: at the lips, behind the incisors, and at the velum, with some room to play around each.

The bilabial plosives, [p] and [b], are articulated with the lower lip pressed against the upper lip. The coronal plosives, [t,d], are made with the tongue blade pressing against the alveolar ridge or thereabouts.

I tend to use 'dorsal' and 'velar' interchangeably, which is very bad. I use 'coronal' because it's more accurate than 'alveolar', in the sense that everybody uses their tongue blade if not the apex for [t,d], but not everybody uses only their alveolar ridge. That controversy aside, the thing to remember is that during a closure, there's no useful sound coming at you; there's basically silence. So while the gap tells you it's a plosive, the transitions into and out of the closure i.

Figure 4 contains spectrograms of me saying 'bab', 'dad' and 'gag'.

Figure 4. Spectrograms of "bab", "dad" and "gag".

There's no voicing during the initial closure of any of these plosives, confirming what your teachers have always told you: "voiced" plosives in English aren't always fully voiced during closure.

Then suddenly, there's a burst of energy and the voicing begins and goes on for a couple hundred milliseconds or so, followed by an abrupt loss of energy in the upper frequencies (above Hz or so), followed by another burst of energy, and some noise.

The first burst of energy is the release of the initial plosive. Notice the formants move or change following the burst, hold more or less steady during the middle of the vowel, and then move again into the following consonant. We know there's a closure because of the cessation of energy at most frequencies. The little blob of energy at the bottom is voicing, only transmitted through flesh rather than resonating in the vocal tract. Look closely, and you'll see that it's striated, but very weak.

The final burst is the release of the final plosive, and the last bit of noise is basically just residual stuff echoing around the vocal tract. Take a look at those formant transitions out of and into each plosive. Notice how the transitions in the F2 of 'bab' point down i. Notice how in 'gag' the F2 and F3 start out and end close together? Notice how the F3 of 'dad' points slightly up at the plosives?

Notice how the F1 always starts low, rises into the vowel, and then falls again. Okay, these aren't necessarily the best examples, but basically, labials have downward-pointing transitions (usually all visible formants, but especially F2 and F3), dorsals tend to have F2 and F3 transitions that 'pinch' together (hence 'velar pinch'), and the F3 of coronals tends to point upward. The direction any transition points obviously is going to depend on the position of the formant for the vowel, so F2 of [t,d] might go up or down.
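As a rough illustration only, the three rules of thumb above can be written as a tiny decision function. Everything here is an assumption made for the sketch: the function name, the reading of "pointing down" as "lower at the consonant edge than in mid-vowel", the order in which the rules are tried, and the `pinch_ratio` threshold. The formant values in the usage lines are invented, not measurements from Figure 4.

```python
def guess_place(f2_edge, f3_edge, f2_vowel, f3_vowel, pinch_ratio=0.6):
    """Guess plosive place from F2/F3 (in Hz) at the consonant edge vs. mid-vowel.

    pinch_ratio is an arbitrary illustrative threshold, not a published value.
    """
    edge_gap = f3_edge - f2_edge      # F2-F3 spacing next to the closure
    vowel_gap = f3_vowel - f2_vowel   # F2-F3 spacing in the steady vowel

    # 'Velar pinch': F2 and F3 sit much closer together at the edge.
    if edge_gap < pinch_ratio * vowel_gap:
        return "dorsal"
    # Labial: both F2 and F3 are lower at the edge than in the vowel.
    if f2_edge < f2_vowel and f3_edge < f3_vowel:
        return "labial"
    # Coronal: F3 is higher at the edge than in the vowel.
    if f3_edge > f3_vowel:
        return "coronal"
    return "unclear"


# Invented example values (Hz), for illustration only.
print(guess_place(1400, 2300, 1700, 2500))  # both formants dip at the edge
print(guess_place(2100, 2400, 1700, 2500))  # F2 and F3 converge at the edge
print(guess_place(1800, 2700, 1700, 2500))  # F3 rises at the edge
```

Real transitions are messier than this, and as noted above the direction of F2 in particular depends on the vowel, which is why the sketch leaves F2 out of the coronal rule and falls back to "unclear".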

A lot of people say coronal transitions point to about or Hz, but that's going to depend a lot on speaker-individual factors. Generally, I think of coronal F2 transitions as pointing upward unless the F2 of the vowel is particularly high. Another thing to notice is the burst energy.

Notice that the bursts for "dad" are darker (stronger) than the others. Notice also that they get darker in the higher frequencies than the lower. The burst of [b] is sort of broad, spread across all frequencies, but concentrated in the lower frequencies, if anywhere. So bursts and transitions also give you information about place. In final position, they tend to surface as voiced, although there's room for variation here too.

Frankly, fricatives are not my favorite. They're acoustically and aerodynamically complex, not to mention phonologically and phonetically volatile. There's not a lot you can say about them without getting way too complicated, but I'll try. Fricatives, by definition, involve an occlusion or obstruction in the vocal tract great enough to produce noise (frication). Frication noise is generated in two ways, either by blowing air against an object (obstacle frication) or moving air through a narrow channel into a relatively more open space (channel frication).

In both cases, turbulence is created, but in the second case, it's turbulence caused by sudden 'freedom' to move sideways (Keith Johnson uses the terrific analogy of a road suddenly widening from two to four lanes, with a lot of sideways movement into the extra space), as opposed to air crashing around itself having bounced off an obstacle (Keith's freeway analogy of a road narrowing from four lanes to two works here, but I don't really want to think about serious sibilance in this respect). Sibilant fricatives involve a jet of air directed against the teeth.

While there is some channel turbulence, the greater proportion of actual noise is created by bouncing the jet of air against the upper teeth. The result is very high amplitude noise. Non-sibilant fricatives are more likely 'pure' channel fricatives, particularly bilabial and labiodental fricatives, where there's not a lot of stuff in front to bounce the air off of.

In Figure 5, there are spectrograms of the fricatives, extracted from nonce words ("uffah", "ussah", etc.).

Figure 5. Top row, left to right: f, theta, s, esh. Bottom row, left to right: v, eth, z, yogh.

Let's start with the sibilants "s" and "sh", in the upper right of Figure 5. They are by far the loudest fricatives. The darkest part of [s] noise is off the top of the spectrograms, even though these spectrograms have a greater frequency range than the others on this page.

The postalveolar "sh", on the other hand, while almost as dark, has most of its energy concentrated in the F3-F4 range. Remember, however, that a lot of underlyingly voiced fricatives in English have voiceless allophones.

What other cues are there to underlying voicing? Take a good look at the voicing bar through the fricatives in the bottom row. You may never see a fully voiced fricative from me again. Labiodental and (inter)dental nonsibilant fricatives are notoriously difficult to distinguish, since they're made at about the same place in the vocal tract i.

Having established in a mystery spectrogram that a fricative isn't loud enough to be a sibilant, you can sometimes tell from transitions whether it is labiodental or interdental—labiodental will have labial-looking transitions, interdentals might have slightly more coronal looking transitions. But that's poor consolation—often underlying labiodental and interdental fricatives don't have a lot of noise in the spectrogram at all, looking more like approximants.

Sometimes, they lenite into approximants, or fortisize to stoppy-looking things. I hate fricatives. Before moving on, we need to talk about [h]. Aspiration noise, which is also [h]-like, is produced by moving a whole lot of air through a very open glottis. I heard a paper once where they described the spectrum of [h]-noise as 'epiglottal', implying that the air is being directed at the epiglottis as an obstacle. Generally speaking, we don't think of the vocal cords moving together to form a 'channel' in [h], although breathy voicing and voiced [h]s in English (as many intervocalic [h]s are produced) may be produced this way.

So I don't know. What I do know about [h]s is that the noise is produced far enough back in the vocal tract that it excites all the forward cavities, so it's a lot like voicing in that respect. It's common to see 'formants' excited by noise rather than harmonics in spectrograms of [h].

Certainly, the noise will be concentrated in the formant regions. Compare the spectrograms in Figure 6.

Figure 6. Spectrograms of "hee", "ha" and "who".

Notice how different the frication looks in each spectrogram.

In "hee", the noise is concentrated in F2, F3 and higher, with very little in the Hz range. In "ha", in which F1 and F2 straddle Hz, the [h] noise is right down there. In "who", there is a lot less amplitude to the noise between and Hz, but around F2 (around Hz) and lower, there's a great deal. You can even see F2 really clearly in the [h] of "who". So that's [h]. Don't ask me. It's not very common in my spectrograms.

Nasals have some formant structure, but are better identified by the relative 'zeroes', or areas of little or no spectral energy.

In Figure 7, the final nasals have identifiable formants that are lesser in amplitude than in the vowel, and the regions between them are blank. Nasality on vowels can result in broadening of the formant bandwidths (fuzzying the edges), and the introduction of zeroes in the vowel filter function. Nasals can be tough, and I hope to get someone who knows more about them than I do to say something else useful about them.

You can sometimes tell from the frequency of the nasal formant and zero what place of articulation was, but it's usually easier to watch the formant transitions.

This is particularly true of initial nasals; final nasals I usually don't worry about--if you can figure out the rest of the word, there's only three possible nasals it could end with. Actually, being loose with the amount of information you actually have before you start trying to fit words to the spectrogram is one of the tricks to the whole operation.

Figure 7. Spectrograms of "dinner", "dimmer", "dinger".

The real trick to recognizing nasal stops is (a) formant structure, but (b) relatively lower-than-vowel amplitude. In the nasal itself, the pole (nasal formant) is up in the neutral F3 region. The pole for [m] in 'dimmer' is lower, closer to Hz, but there's still a zero between it and what we might call F1.

Note also that the transitions moving into the [m] of 'dimmer' are all sharply down-pointing, even in the higher formants, a very strong clue to labiality, if you're lucky enough to see it.

In case you're not familiar with the term (generally attributed to Ladefoged's Phonetic Study of West African Languages, or as modified in Catford's Fundamental Problems in Phonetics), the approximants are non-vowel oral sonorants.

They are characterized by formant structure (like vowels), but constrictions of about the degree of high vowels or slightly closer. Generally there's no friction associated with them, but the underlying approximants can have fricative allophones, just as fricative phonemes can occasionally have frictionless i.

Canonically, the English approximants are those consonants which have obvious vowel allophones. The classic examples are the [j-i] pair and the [w-u] pair. Syllabic [l]s are all at least plausibly derived from underlying consonants, but I'm guessing that'll change in the next hundred years.

Figure 8. Spectrograms of 'ball', 'bar', 'bough', 'buy'.

Note that in all four words, the F1 is mid-to-high, indicating a more open constriction than with a typical high vowel. The F3, on the other hand, is very high, higher than one ever sees unless the F2 is pushing it up out of the way. Compare the position of the F3 in "bar" with that in "bough" and "buy", where the F3 is relatively unaffected by the constriction.

In "bough", the F2 is very low, as the tongue position is relatively back and the lips are relatively rounded. Note that this has no effect on F3, so let it be known that lip rounding has minimal effect on F3. The next reviewer who brings up lip rounding without having some data to back it up is going to get it between the eyes. In 'buy', the offglide has a clearly fronting (rising) F2.

One of the absolutely characteristic features of American English is "flapping".

I refer the reader to Susan Banner-Inouye's M. But the easiest thing to do is compare them.

Figure 9. Spectrograms of "a toe", "a doe" and "otto".

The actual length varies a lot, but notice how short the 'closure' of the flapped case is in comparison. It's just a slight 'interruption' of the normal flow, a momentary thing, not something that looks very forceful or controlled. It doesn't even really have any transitions of its own.
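This section characterizes the flap's interruption as roughly three glottal pulses, or between 10 and 30 ms. A quick piece of arithmetic shows how those two figures relate: the duration of a fixed number of pulses depends on the speaker's fundamental frequency. The function name and the f0 values below are assumptions chosen for illustration, not measurements from Figure 9.

```python
def pulses_duration_ms(n_pulses, f0_hz):
    """Duration in milliseconds of n_pulses glottal cycles at f0_hz."""
    return 1000.0 * n_pulses / f0_hz


# Illustrative f0 values only: three pulses at each pitch.
for f0 in (100, 150, 300):
    print(f0, "Hz:", round(pulses_duration_ms(3, f0), 1), "ms")
```

On these assumed values, three pulses span 30 ms at 100 Hz and 10 ms at 300 Hz, so counting pulses and measuring milliseconds are two views of the same short interruption.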

The interruption is something on the order of three pulses long, between 10 and 30 ms. That's basically the biggest thing. Sometimes they're longer, sometimes they're voiceless (occasionally even aspirated), but basically a flap will always be significantly shorter than a corresponding plosive. Okay, so let's turn back to the proper plosives. Frankly, we're lucky to get any real voicing during the closure at all.

Our spectrogram web pages show groups of spectrograms from multiple stations, usually 5 to 7, with the stations ordered from closest (top of the multi-spectrogram display) to furthest (bottom) relative to a point of interest such as a volcano.
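One way such a closest-to-furthest ordering could be produced is sketched below. This is not the network's actual code: the station names, coordinates, and point of interest are all hypothetical, and great-circle (haversine) distance on a spherical Earth is an assumed choice of distance measure.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def order_stations(stations, poi):
    """Return stations sorted from closest to furthest from poi (lat, lon)."""
    return sorted(
        stations,
        key=lambda s: haversine_km(s[1], s[2], poi[0], poi[1]),
    )


# Hypothetical stations (name, lat, lon) and point of interest.
STATIONS = [
    ("STA_FAR", 46.90, -121.76),
    ("STA_NEAR", 46.21, -122.19),
    ("STA_MID", 46.35, -122.00),
]
POI = (46.20, -122.18)

# Top of the display first, bottom last.
for name, lat, lon in order_stations(STATIONS, POI):
    print(name, round(haversine_km(lat, lon, POI[0], POI[1]), 1), "km")
```

With the display ordered this way, a signal that shows up first and strongest in the top panels and later or weaker further down is likely to originate near the point of interest.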

Weak seismic sources that originate at or above the surface (wind gusts, animal footsteps, helicopters, thunder, car traffic, etc.)

Interpreting spectrograms

There are three major categories of earthquake that you may see on spectrograms. These are divided according to distance from the seismic network and are traditionally called local (within the Pacific Northwest), regional (near the Pacific Northwest, such as from British Columbia, California or offshore) and teleseisms (more than 1, km or miles from the Pacific Northwest).



