Document clustering : Applications in a collaborative digital library
Internal identifier: 001935 (Main/Exploration); previous: 001934; next: 001936
Authors: Fuad Rahman [United States]; Aman Kumar [United States]; Yuilya Tamikova [United States]; Hassan Alam [United States]
Source:
- Proceedings of SPIE, the International Society for Optical Engineering [0277-786X]; 2006.
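French descriptors
- Pascal (Inist): Algorithme, Classification automatique, Bibliothèque électronique, Internet, Algorithme k moyenne, Classification non supervisée, Précision élevée, Dispositif expérimental, Classification signal
English descriptors
- KwdEn: Algorithm, Automatic classification, Electronic library, Experimental device, High precision, Internet, K means algorithm, Signal classification, Unsupervised classification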
Abstract
This paper introduces a document clustering method within a commercial document repository, FileShare®. FileShare® is a commercial collaborative digital library offering facilities for sharing and accessing documents over a simple Internet browser (e.g. Microsoft® Internet Explorer®, Netscape® or Opera®) within groups of people working on common projects. As the number of documents increases within a digital library, displaying these documents in this environment poses a huge challenge. This paper proposes a document clustering method that uses a modified version of the traditional K-Means algorithm to categorize documents by their themes using lexical chaining within the FileShare® repository. The proposed algorithm is unsupervised, and has shown very high accuracy in a typical experimental setup.
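As a rough illustration of the clustering step mentioned in the abstract, the sketch below runs the textbook K-Means algorithm over normalized term-frequency vectors. It is not the paper's modified K-Means and does not perform lexical chaining; the sample documents, the helper names (`tf_vector`, `normalize`, `sq_dist`, `kmeans`), and the choice of the first `k` vectors as initial centroids are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_vector(text, vocab):
    """Term-frequency vector of `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-Means. Assumption: the first k vectors seed the centroids,
    which keeps the example deterministic; iterations must be >= 1."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus: two themes, clustering and document sharing.
docs = [
    "kmeans clustering algorithm centroid",
    "library repository browser sharing",
    "clustering algorithm centroid update",
    "browser sharing library access",
]
vocab = sorted({word for doc in docs for word in doc.lower().split()})
vectors = [normalize(tf_vector(doc, vocab)) for doc in docs]
labels = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

With this toy input the two clustering-themed documents share one label and the two sharing-themed documents share the other. A real repository would need a better initialization and a richer document representation than raw term counts.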
Affiliations:
Links to previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000332
- to stream PascalFrancis, to step Curation: 000235
- to stream PascalFrancis, to step Checkpoint: 000311
- to stream Main, to step Merge: 001970
- to stream Main, to step Curation: 001935
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Document clustering : Applications in a collaborative digital library</title>
<author><name sortKey="Rahman, Fuad" sort="Rahman, Fuad" uniqKey="Rahman F" first="Fuad" last="Rahman">Fuad Rahman</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Kumar, Aman" sort="Kumar, Aman" uniqKey="Kumar A" first="Aman" last="Kumar">Aman Kumar</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Tamikova, Yuilya" sort="Tamikova, Yuilya" uniqKey="Tamikova Y" first="Yuilya" last="Tamikova">Yuilya Tamikova</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Alam, Hassan" sort="Alam, Hassan" uniqKey="Alam H" first="Hassan" last="Alam">Hassan Alam</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">07-0377980</idno>
<date when="2006">2006</date>
<idno type="stanalyst">PASCAL 07-0377980 INIST</idno>
<idno type="RBID">Pascal:07-0377980</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000332</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000235</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000311</idno>
<idno type="wicri:doubleKey">0277-786X:2006:Rahman F:document:clustering:applications</idno>
<idno type="wicri:Area/Main/Merge">001970</idno>
<idno type="wicri:Area/Main/Curation">001935</idno>
<idno type="wicri:Area/Main/Exploration">001935</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Document clustering : Applications in a collaborative digital library</title>
<author><name sortKey="Rahman, Fuad" sort="Rahman, Fuad" uniqKey="Rahman F" first="Fuad" last="Rahman">Fuad Rahman</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Kumar, Aman" sort="Kumar, Aman" uniqKey="Kumar A" first="Aman" last="Kumar">Aman Kumar</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Tamikova, Yuilya" sort="Tamikova, Yuilya" uniqKey="Tamikova Y" first="Yuilya" last="Tamikova">Yuilya Tamikova</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Alam, Hassan" sort="Alam, Hassan" uniqKey="Alam H" first="Hassan" last="Alam">Hassan Alam</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Human Computer Interaction Group, BCL Technologies Inc., 990 Linden Drive, Suite #203</s1>
<s2>Santa Clara, CA 95050</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<idno type="ISSN">0277-786X</idno>
<imprint><date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithm</term>
<term>Automatic classification</term>
<term>Electronic library</term>
<term>Experimental device</term>
<term>High precision</term>
<term>Internet</term>
<term>K means algorithm</term>
<term>Signal classification</term>
<term>Unsupervised classification</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Algorithme</term>
<term>Classification automatique</term>
<term>Bibliothèque électronique</term>
<term>Internet</term>
<term>Algorithme k moyenne</term>
<term>Classification non supervisée</term>
<term>Précision élevée</term>
<term>Dispositif expérimental</term>
<term>Classification signal</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper introduces a document clustering method within a commercial document repository, FileShare®. FileShare® is a commercial collaborative digital library offering facilities for sharing and accessing documents over a simple Internet browser (e.g. Microsoft® Internet Explorer®, Netscape® or Opera®) within groups of people working on common projects. As the number of documents increases within a digital library, displaying these documents in this environment poses a huge challenge. This paper proposes a document clustering method that uses a modified version of the traditional K-Means algorithm to categorize documents by their themes using lexical chaining within the FileShare® repository. The proposed algorithm is unsupervised, and has shown very high accuracy in a typical experimental setup.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Californie</li>
</region>
</list>
<tree><country name="États-Unis"><region name="Californie"><name sortKey="Rahman, Fuad" sort="Rahman, Fuad" uniqKey="Rahman F" first="Fuad" last="Rahman">Fuad Rahman</name>
</region>
<name sortKey="Alam, Hassan" sort="Alam, Hassan" uniqKey="Alam H" first="Hassan" last="Alam">Hassan Alam</name>
<name sortKey="Kumar, Aman" sort="Kumar, Aman" uniqKey="Kumar A" first="Aman" last="Kumar">Aman Kumar</name>
<name sortKey="Tamikova, Yuilya" sort="Tamikova, Yuilya" uniqKey="Tamikova Y" first="Yuilya" last="Tamikova">Yuilya Tamikova</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Musique/explor/OperaV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001935 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001935 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Musique |area= OperaV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:07-0377980 |texte= Document clustering : Applications in a collaborative digital library }}
This area was generated with Dilib version V0.6.21.