It strikes me that we are not being very explicit about genre, purpose of recording, and purpose of publication. If we were talking about a conversation, for example, there would be no question that *everything* would be reproduced – hesitations, back-channelling, repairs, pauses, code-switches, the lot. But such transcripts can be quite difficult to read, especially when interlinearisation and free translations are added.
Most descriptive linguists don’t work with conversation data, though – they work with narratives. Narratives are often recorded as much for the information they contain as for the linguistic structures they exhibit. I suspect that this is a hang-over from the days of descriptive linguistic anthropology and the Boasian tradition.
Individual storytellers have different styles and different levels of fluency (independent of their fluency in the language). Speakers themselves may want to edit out some of the hesitations and code-switches in order to present a text which conforms more closely to the constraints of written genres (that is, making the text a written document, not a transcription of a spoken document).
The problems that Peter talks about arise when we treat a written text and a transcription as the same type of document (or mistake one for the other).
A question of this type has come up fairly often with the Laves materials. These are texts which were dictated; we do not have any original sound files. The texts have little punctuation. If we were to make a faithful reproduction of those texts, it would already be one or two steps removed from the original performance. It would be like trying to recreate an authentic 19th-century performance of a Bach cantata. There are reasons to do such a thing, but there are also reasons not to.
The approach I’ve taken in the Laves materials is a classical apparatus criticus – brief annotations at the bottom of the page for textual emendations, major spelling variants, etc., and endnotes for interpretive comments to guide the reader. It’s not ideal – I suspect it’s almost as irritating for those not familiar with this sort of textual work as reading a conversation transcript. But it is a compromise between an almost uninterpretable string of words and a glossed text with hazy relations to the ‘original’.