Semantic Fields and Polysemy: A Correspondence Analysis Approach
Semantic Fields
Analyzing semantic fields or literary themes in texts quickly confronts the researcher with a paradox. A computer string search will produce a list of the frequencies of words potentially related to the semantic field. But polysemy (the fact that many words have multiple meanings) means that there is an unmeasured difference between the potential and the actual allusions.
The semantic field of "solitude" or "loneliness" is a case in point. Important from a sociological and psychological perspective as an indication of imperfect adaptation to one's surroundings, it is also a frequently occurring literary theme. In French, the problem of lexical ambiguity is acute because "seul" means both "alone" and "only", so it is hard to imagine the expression "ma seule cravate (my only tie)" having much to do with solitude. Similarly, the verb "abandonner (to abandon)" is indistinguishable by computer from "s'abandonner (to let oneself go)".
The Problem of Disambiguation
The only practical response to such difficulties seems to be disambiguation by human informants. But one can legitimately be concerned about the reliability of such a process. The consistency of results from one informant to another when analyzing the same data would seem to be a reasonable touchstone for the reliability of the disambiguation process. It seems appropriate to expect some personal variability among informants, so the choice of statistical methods to be used requires some care.
An intraclass correlation coefficient similar to Cohen's kappa has been chosen as a measure of agreement among the individual choices made by each informant on each word potentially evoking solitude. Correspondence analysis has been used in order to provide a visual representation of how the data interact.
Nine twentieth-century novels written in the first person were chosen for this analysis: Bernanos, Journal d'un curé de campagne; Camus, L'Étranger and La Chute; Céline, Voyage au bout de la nuit; Gide, L'Immoraliste and La Porte étroite; Mauriac, Le Nœud de vipères; Proust, La Fugitive; and Sartre, La Nausée. The size of these novels ranges between 31,272 and 192,559 words.
Some 70 words (in the sense of lemmas) related to the concept of solitude can be identified in a French thesaurus. They reduce to thirty strings of the type "seul*". These strings were used to search the texts for words related to solitude in the ARTFL database. Results ranged between 473 and 73 occurrences, and it must be recalled that these numbers relate to potential evocations of solitude only.
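The wildcard search described above can be illustrated with a minimal sketch. It is not the ARTFL search engine: the function name `kwic`, the `width` parameter, and the sample sentence are all hypothetical, and the stems are treated simply as word prefixes.

```python
import re

def kwic(text, stems, width=30):
    """Return (match, context) pairs for words beginning with any stem.

    `stems` are wildcard strings such as "seul*"; `width` is the number of
    characters of context kept on each side (an illustrative choice).
    """
    # Turn each "stem*" into a prefix and match whole words starting with it.
    prefixes = "|".join(re.escape(s.rstrip("*")) for s in stems)
    pattern = re.compile(r"\b(?:" + prefixes + r")\w*", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        hits.append((m.group(0), text[start:end]))
    return hits

# Hypothetical sample text, not taken from the nine novels.
sample = "Il portait sa seule cravate. Elle vivait seule, dans la solitude."
hits = kwic(sample, ["seul*", "solitude*"])
for word, context in hits:
    print(word, "|", context)
```

The first hit ("sa seule cravate") is returned alongside the genuine evocations of solitude, which is exactly the gap between potential and actual allusions that a human reader has to close.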
The words found by the ARTFL search engine, centred in 60 characters of context, were downloaded and given to a team of informants with minimal instructions: choose the words evoking human solitude from a reading of the context, and go back to the ARTFL database for more context in doubtful cases. Eight informants were used: two French literature professors, two French literature graduate students whose native language was French, two French literature graduate students whose native language was English, and two high school students who had taken immersion French. They provided results ranging between 122 and 11.
Analysis of the frequencies provided by the informants as representing the true allusions to solitude in the nine texts demonstrates that the frequencies provided are not a linear function of the number of potential allusions to solitude. Pearson's correlation coefficient and the chi-squared contingency table test would seem like appropriate analytic tools, but they have disadvantages. The examination of either the correlation coefficient table or the individual chi-squared values making up the chi-squared statistic for trends or tendencies is rendered quite difficult by the volume of the data (nine novels and eight informants).
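To make the difficulty concrete, here is a minimal sketch of how the chi-squared statistic decomposes into one contribution per cell of the contingency table. The function name and the toy counts are hypothetical; with nine novels and eight informants the real table would have 72 such cells to inspect.

```python
def chi_squared_contributions(table):
    """Return (statistic, contributions) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        contrib_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            contrib_row.append((observed - expected) ** 2 / expected)
        contributions.append(contrib_row)
    statistic = sum(sum(r) for r in contributions)
    return statistic, contributions

# Hypothetical counts: rows stand for novels, columns for informants.
toy = [[10, 20], [20, 10]]
stat, cells = chi_squared_contributions(toy)
print(stat, cells)
```

The sketch assumes every row and column total is non-zero; it computes the statistic only and does not look up a p-value.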
These results have the disadvantage of applying to aggregate data, collapsing into a single total what may well have been different individual choices. The intraclass correlation coefficient is a measure of agreement among individual choices. Although it has been modified using Cronbach's alpha to take into account that a majority opinion has been used as a standard, it is similar to Cohen's kappa, and like the latter measure, a value of 0.55 or greater can be taken as indicative of good agreement among individual choices (May). The score for the seven informants for whom individual scoring data were available was 0.55125. Application of the Cohen's kappa routine in the JMP-IN software on a pair-wise basis produces low readings for Student 2 (data for Student 1 were not available). In the light of these inconsistent results a means of visualizing what is going on in the data is desirable, and correspondence analysis was chosen for this.
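For readers unfamiliar with the pair-wise measure, the following is a minimal sketch of plain Cohen's kappa for two informants. It is not the modified intraclass coefficient used in this study, nor the JMP-IN routine; the function name and the toy decisions (1 = "evokes solitude", 0 = "does not") are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same sequence of items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must judge the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical decisions on eight candidate words.
informant_1 = [1, 1, 1, 0, 0, 0, 1, 0]
informant_2 = [1, 1, 0, 0, 0, 0, 1, 1]
kappa = cohen_kappa(informant_1, informant_2)
print(kappa)
```

In this toy case the informants agree on six of eight items, but half of that agreement is expected by chance, which is why kappa is lower than the raw agreement rate.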
The correspondence analysis technique (Benzécri, Greenacre) is mathematically complex, but widely available. Basically, it provides the user with a representation of the variability of the data by projecting onto a two-dimensional space the contributions of both the rows and the columns of the chi-squared contingency table to the chi-squared statistic, in such a way that the further two points are apart the greater their difference, and the closer they are the greater the similarity of the distributions they represent. Figure 1 illustrates the relationship of the frequencies for solitude by all eight informants, as well as the raw frequencies of the data on which they worked. The position of the raw frequencies at the extreme left of the map and of Student 2 in the upper right quadrant clearly identifies them as outliers. The rest of the informants are clustered in the lower right quadrant.
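The projection can be sketched with a singular value decomposition of the standardized residuals of the contingency table. This is one common formulation of simple correspondence analysis, offered as an illustration rather than as the software used for the figures; it assumes NumPy is installed, and the function name and toy counts are hypothetical.

```python
import numpy as np

def correspondence_analysis(counts, n_dims=2):
    """Return row coordinates, column coordinates and principal inertias.

    Assumes every row and column of `counts` has a non-zero total.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()              # correspondence matrix (proportions)
    r = P.sum(axis=1)            # row masses
    c = P.sum(axis=0)            # column masses
    # Standardized residuals; their squared sum times the grand total
    # equals the chi-squared statistic of the table.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates for rows and columns on the first n_dims axes.
    rows = (U[:, :n_dims] * sv[:n_dims]) / np.sqrt(r)[:, None]
    cols = (Vt.T[:, :n_dims] * sv[:n_dims]) / np.sqrt(c)[:, None]
    return rows, cols, sv ** 2

# Hypothetical counts: rows stand for novels, columns for informants.
toy_counts = [[20, 5, 5], [5, 20, 5], [5, 5, 20]]
row_xy, col_xy, inertia = correspondence_analysis(toy_counts)
print(row_xy)
print(col_xy)
```

Plotting `row_xy` and `col_xy` on the same axes gives a map of the kind described above, in which points with similar profiles lie close together.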
Figure 2 shows the same frequencies after the raw frequencies and Student 2 have been removed, as well as Student 1, whose consistency could not be measured on the basis of individual choices. The structure of the data is manifest. There is variation, owing to differences in judgment. The texts group well, with Sartre and Camus, existentialist writers, occupying the left side of the map. Proust, Gide and Mauriac, all bourgeois authors, are at the lower right. Céline and Bernanos, both right-wing critics of society, are in the upper right quadrant.
Most important, the informants do not cluster according to level of education or linguistic background, as shown by the distance between p1 and p2, as well as between s3 and s4. In short, the data present no clear pattern on the basis of background or level of education, and variation can reasonably be ascribed to differences in personal interpretation.
Most of us study language and literature by computer because we have deep-rooted reservations about techniques that rely only on the impressions of the researcher. Many of us, particularly in literature, are reluctant to hand over to student assistants the job of doing preliminary analysis of material on which we will later base our interpretations. Many of us would prefer to do the work ourselves rather than rely on the opinion of others. Linguists, on the other hand, have long used native-speaker informants.
The results reported here illustrate the usefulness of correspondence analysis for interpreting complex data. They also suggest that a person with native-speaker ability in a language, even an originally English-speaking graduate student in French, will produce about the same results as a professor of French literature. It would seem, then, that the use of informants for studying semantic fields, or literary themes, is a justifiable enterprise from the statistical perspective.