Exploration server on opera

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Document clustering : Applications in a collaborative digital library

Internal identifier: 001935 (Main/Exploration); previous: 001934; next: 001936


Authors: Fuad Rahman [United States]; Aman Kumar [United States]; Yuilya Tamikova [United States]; Hassan Alam [United States]

Source: Proceedings of SPIE, the International Society for Optical Engineering (ISSN 0277-786X), 2006

RBID : Pascal:07-0377980

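English descriptors: Algorithm; Automatic classification; Electronic library; Experimental device; High precision; Internet; K means algorithm; Signal classification; Unsupervised classification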

Abstract

This paper introduces a document clustering method for FileShare®, a commercial collaborative digital library that lets groups of people working on common projects share and access documents through a standard web browser (e.g. Microsoft® Internet Explorer®, Netscape® or Opera®). As the number of documents in a digital library grows, presenting them in such a browser-based environment becomes a major challenge. The proposed method categorizes the documents in the FileShare® repository by theme, combining lexical chaining with a modified version of the traditional K-Means algorithm. The algorithm is unsupervised and has shown very high accuracy in a typical experimental setup.
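The abstract names the approach (a modified K-Means over lexical chains) but not the modification itself. As a point of reference only, here is a minimal sketch of the unmodified baseline: spherical K-Means (cosine similarity) on term-frequency vectors, in pure Python. The crude tokenizer, the bag-of-words features used in place of lexical chains, and the farthest-first seeding are all assumptions of this sketch, not the authors' method.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens longer than two characters (assumed tokenizer)."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return [tok for tok in cleaned.split() if len(tok) > 2]


def normalise(vec):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def vectorise(docs):
    """Turn each document into a unit-length term-frequency vector over a shared vocabulary.

    Bag-of-words stands in here for the paper's lexical-chain features.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return [normalise([float(c[t]) for t in vocab]) for c in counts]


def similarity(a, b):
    """Dot product; equals cosine similarity because all vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, max_iter=50):
    """Plain spherical K-Means with deterministic farthest-first seeding (k <= len(vectors))."""
    # Seeding: start from the first document, then repeatedly add the document
    # that is least similar to every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        nxt = min(vectors, key=lambda v: max(similarity(v, c) for c in centroids))
        centroids.append(nxt)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [max(range(k), key=lambda j: similarity(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: each centroid becomes the normalised mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = normalise([sum(col) / len(members) for col in zip(*members)])
    return labels


docs = [
    "opera aria soprano stage",
    "soprano aria opera orchestra",
    "cluster centroid vector algorithm",
    "algorithm vector cluster distance",
]
labels = spherical_kmeans(vectorise(docs), k=2)
print(labels)  # [0, 0, 1, 1]
```

Cosine similarity on normalised vectors is the usual choice for text because it ignores document length; the deterministic seeding simply keeps the toy example reproducible.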


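Affiliation (all four authors): Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203, Santa Clara, CA 95050, USA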




The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Document clustering : Applications in a collaborative digital library</title>
<author>
<name sortKey="Rahman, Fuad" sort="Rahman, Fuad" uniqKey="Rahman F" first="Fuad" last="Rahman">Fuad Rahman</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Kumar, Aman" sort="Kumar, Aman" uniqKey="Kumar A" first="Aman" last="Kumar">Aman Kumar</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Tamikova, Yuilya" sort="Tamikova, Yuilya" uniqKey="Tamikova Y" first="Yuilya" last="Tamikova">Yuilya Tamikova</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Alam, Hassan" sort="Alam, Hassan" uniqKey="Alam H" first="Hassan" last="Alam">Hassan Alam</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">07-0377980</idno>
<date when="2006">2006</date>
<idno type="stanalyst">PASCAL 07-0377980 INIST</idno>
<idno type="RBID">Pascal:07-0377980</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000332</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000235</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000311</idno>
<idno type="wicri:doubleKey">0277-786X:2006:Rahman F:document:clustering:applications</idno>
<idno type="wicri:Area/Main/Merge">001970</idno>
<idno type="wicri:Area/Main/Curation">001935</idno>
<idno type="wicri:Area/Main/Exploration">001935</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Document clustering : Applications in a collaborative digital library</title>
<author>
<name sortKey="Rahman, Fuad" sort="Rahman, Fuad" uniqKey="Rahman F" first="Fuad" last="Rahman">Fuad Rahman</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Kumar, Aman" sort="Kumar, Aman" uniqKey="Kumar A" first="Aman" last="Kumar">Aman Kumar</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Tamikova, Yuilya" sort="Tamikova, Yuilya" uniqKey="Tamikova Y" first="Yuilya" last="Tamikova">Yuilya Tamikova</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Alam, Hassan" sort="Alam, Hassan" uniqKey="Alam H" first="Hassan" last="Alam">Hassan Alam</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithm</term>
<term>Automatic classification</term>
<term>Electronic library</term>
<term>Experimental device</term>
<term>High precision</term>
<term>Internet</term>
<term>K means algorithm</term>
<term>Signal classification</term>
<term>Unsupervised classification</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Algorithme</term>
<term>Classification automatique</term>
<term>Bibliothèque électronique</term>
<term>Internet</term>
<term>Algorithme k moyenne</term>
<term>Classification non supervisée</term>
<term>Précision élevée</term>
<term>Dispositif expérimental</term>
<term>Classification signal</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper introduces a document clustering method within a commercial document repository, FileShare®. FileShare® is a commercial collaborative digital library offering facilities for sharing and accessing documents over a simple Internet browser (e.g. Microsoft® Internet Explorer®, Netscape® or Opera®) within groups of people working on common projects. As the number of documents increases within a digital library, displaying these documents in this environment poses a huge challenge. This paper proposes a document clustering method that uses a modified version of the traditional K-Means algorithm to categorize documents by their themes using lexical chaining within the FileShare® repository. The proposed algorithm is unsupervised, and has shown very high accuracy in a typical experimental setup.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Californie</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Californie">
<name sortKey="Rahman, Fuad" sort="Rahman, Fuad" uniqKey="Rahman F" first="Fuad" last="Rahman">Fuad Rahman</name>
</region>
<name sortKey="Alam, Hassan" sort="Alam, Hassan" uniqKey="Alam H" first="Hassan" last="Alam">Hassan Alam</name>
<name sortKey="Kumar, Aman" sort="Kumar, Aman" uniqKey="Kumar A" first="Aman" last="Kumar">Aman Kumar</name>
<name sortKey="Tamikova, Yuilya" sort="Tamikova, Yuilya" uniqKey="Tamikova Y" first="Yuilya" last="Tamikova">Yuilya Tamikova</name>
</country>
</tree>
</affiliations>
</record>
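The record is ordinary TEI-style XML, so its bibliographic fields can be read with a standard parser. Here is a minimal sketch using Python's `xml.etree.ElementTree`. The export uses the `inist:` and `wicri:` prefixes without declaring them, which a namespace-aware parser rejects, so the sketch binds them to placeholder URIs (hypothetical, not the real namespaces) before parsing. The element paths mirror the record shown here; other records may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder URIs: the record uses these prefixes without declaring them.
NAMESPACES = {
    "inist": "urn:example:inist",
    "wicri": "urn:example:wicri",
}


def parse_record(text):
    """Bind the undeclared prefixes on the bare <record> root element, then parse."""
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NAMESPACES.items())
    return ET.fromstring(text.replace("<record>", f"<record {decls}>", 1))


def record_summary(text):
    """Return (title, author names, English keywords) from a record."""
    root = parse_record(text)
    file_desc = root.find("TEI/teiHeader/fileDesc")
    title = file_desc.findtext("titleStmt/title")
    authors = [name.text for name in file_desc.findall("titleStmt/author/name")]
    keywords = [term.text for term in root.findall(".//keywords[@scheme='KwdEn']/term")]
    return title, authors, keywords


# A trimmed excerpt of the record above, kept well-formed for the example.
SAMPLE = """<record><TEI><teiHeader>
<fileDesc><titleStmt>
<title xml:lang="en" level="a">Document clustering : Applications in a collaborative digital library</title>
<author><name sortKey="Rahman, Fuad">Fuad Rahman</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s3>USA</s3></inist:fA14></affiliation></author>
<author><name sortKey="Kumar, Aman">Aman Kumar</name></author>
</titleStmt></fileDesc>
<profileDesc><textClass>
<keywords scheme="KwdEn" xml:lang="en"><term>Algorithm</term><term>K means algorithm</term></keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Algorithme</term></keywords>
</textClass></profileDesc>
</teiHeader></TEI></record>"""

print(record_summary(SAMPLE))
```

Reading authors from `titleStmt` rather than `biblStruct/analytic` avoids counting each author twice, since the record repeats the author list in both places.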

To process this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Musique/explor/OperaV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001935 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001935 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Musique
   |area=    OperaV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:07-0377980
   |texte=   Document clustering : Applications in a collaborative digital library
}}

Wicri

This area was generated with Dilib version V0.6.21.
Data generation: Thu Apr 14 14:59:05 2016. Site generation: Thu Oct 8 06:48:41 2020