What is the Philadelphia Neighborhood Corpus?

Since 1972, W. Labov (later with G. Sankoff) has been teaching a bi-yearly course at the University of Pennsylvania entitled "The Study of the Speech Community" (LING560). The LING560 reports and recordings comprise by far the largest single sociolinguistic corpus of any speech community.

A total of 59 studies compiled 1,087 recordings. The selection of neighborhoods in the LING560 studies was informed by the NSF-funded research project on Linguistic Change and Variation in Philadelphia in the 1970s [LCV], which was designed to identify the social location of the leaders of linguistic change (Labov 1980, 2001). The LCV base sample of 116 speakers ranged socio-economically from lower working class to upper class. The most advanced speakers were found in the middle working class neighborhoods of Kensington and the upper working class neighborhoods of South Philadelphia.

Google map of the Philadelphia neighborhoods sampled in the PNC corpus.

The LING560 studies were concentrated in these two regions of the city, Kensington and South Philadelphia. Many of the LING560 reports study residents of similar neighborhoods, so that it is possible to trace the similarities and differences in linguistic behavior at closely spaced intervals.

The recordings generated by each group have been deposited in the Linguistics Laboratory archive at the University of Pennsylvania, along with the final report at the end of the academic year. The final report contains a history of the fieldwork, with all personal and place names replaced by pseudonyms. It includes demographic information on each speaker, sociometric diagrams of local social structure, and analyses of the linguistic variables selected by the group for quantitative and experimental study.


The interviews currently included in the corpus have been transcribed as part of the ongoing NSF research project "Automatic Alignment and Analysis of Linguistic Change" (NSF grant 921643 to W. Labov).

All interviews were transcribed according to the PNC transcription guidelines.

Size of the corpus

The transcribed material in the corpus currently comprises data from

  • 49 different Philadelphia neighborhoods
  • 318 different speakers
  • over 150 hours of speech
  • over 1.6 million words
  • more than half a million vowel tokens

More detailed information on the contents of the corpus can be found here.

(Please note that the corpus is still under construction, and new material will be added regularly.)

Access policy

  1. Only members of the research group have access.
  2. Contributing to the transcription or analysis of the PNC corpus qualifies a person as a member.
  3. Researchers wishing to access the corpus must sign the confidentiality agreement.

For more information on how to obtain access to the corpus, please contact W. Labov.