Cross-domain Collaboration Recommendation

Jie Tang, Sen Wu, Jimeng Sun, and Hang Su
Department of Computer Science and Technology, Tsinghua University; IBM TJ Watson Research Center, USA
jietang@tsinghua.edu.cn, ronaldosen@gmail.com, jimeng@us.ibm.com, suhang@sse.buaa.edu.cn
Interdisciplinary collaborations have generated huge impact on society. However, it is often hard for researchers to establish such cross-domain collaborations. What are the patterns of cross-domain collaborations? How do those collaborations form? Can we predict this type of collaboration?

Cross-domain collaborations exhibit very different patterns compared to traditional collaborations in the same domain: 1) sparse connection: cross-domain collaborations are rare; 2) complementary expertise: cross-domain collaborators often have different expertise and interest; 3) topic skewness: cross-domain collaboration topics are focused on a subset of topics. All these patterns violate fundamental assumptions of traditional recommendation systems.

In this paper, we analyze the cross-domain collaboration data from research publications and confirm the above patterns. We propose the Cross-domain Topic Learning (CTL) model to address these challenges. For handling sparse connections, CTL consolidates the existing cross-domain collaborations through topic layers instead of at author layers, which alleviates the sparseness issue. For handling complementary expertise, CTL models topic distributions from source and target domains separately, as well as the correlation across domains. For handling topic skewness, CTL only models topics relevant to the cross-domain collaboration.

We compare CTL with several baseline approaches on large publication datasets from different domains. CTL outperforms baselines significantly on multiple recommendation metrics. Beyond accurate recommendation performance, CTL is also insensitive to parameter tuning, as confirmed in the sensitivity analysis.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Text Mining; J.4 [Social Behavioral Sciences]: Miscellaneous
General Terms
Algorithms, Experimentation
Collaboration recommendation, Social network, Social influence
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD'12, August 12–16, 2012, Beijing, China. Copyright 2012 ACM 978-1-4503-1462-6/12/08 ...$10.00.
Social network analysis focuses on modeling interactions between people. Researchers have studied various issues in social networks, such as network properties [6, 11] and generation processes [18], link predictions [19, 20, 21, 32] and recommendations [2, 7, 17]. Despite all the existing research in social networks, little has been done on analyzing collaborations across two different domains.

Interdisciplinary collaborations have generated huge impact on society. For example, collaborations between biology and computer science revolutionized the field of bioinformatics. Because of these cross-domain collaborations, originally extremely expensive tasks such as DNA sequencing have become scalable and affordable to a much broader population. Now medicine and data mining are working together in the field of medical informatics, which is a big growth area that is expected to have huge impact on medicine [24].

Indeed, cross-domain collaboration has become increasingly important. Figure 1 shows the increasing trend of cross-domain collaborations over the past fifteen years across different domains in a publication database (cf. §4 for details). In most of the cases, there exists a clear increasing trend of cross-domain collaborations.

However, it is often hard for researchers to establish such cross-domain collaborations. What are the patterns of cross-domain collaborations? How do those collaborations form? Can we predict this type of collaboration? Cross-domain collaborations often exhibit very different challenges compared to traditional collaborations in the same domain:

First, sparse connection: cross-domain collaborations are rare compared to traditional collaborations within a domain, partly because it is difficult for an outsider to find the right collaborator in a field that one does not know. This also makes it challenging to directly use a supervised learning approach, due to the lack of training samples.
Second, complementary expertise: cross-domain collaborators often have different expertise and interest. For example, data mining researchers can easily identify who they want to work with in the data mining field, because the topics are known to them. However, for a cardiologist who wants to apply data mining techniques to predict heart failures, it will be difficult for her to find the right collaborators in data mining, because these two fields (cardiology and data mining) are completely different, with different terminology and problems. It is very difficult for one from cardiology to identify the right topics in data mining to look for collaborators.

Third, topic skewness: not all topics are relevant for cross-domain collaborations. In fact, in our study, fewer than 9% of all possible topic pairs across domains have collaborations. Therefore, for the task of cross-domain collaboration recommendation, we should focus on better modeling those topics with high probability of having cross-domain collaborations.

[Figure 1: four panels, (a) DM - TH, (b) DM - MI, (c) DM - VIS, (d) MI - DB, each plotting probability against year (1990 to 2005) for existing collaborations and new collaborations.]

Figure 1: The comparison of existing collaboration and new collaboration trends over years. DM - Data Mining domain; MI - Medical Informatics domain; TH - Theory domain; VIS - Visualization domain; DB - Database domain. The trends of cross-domain collaborations in all but one case are growing (the exception, between DM and VIS, remains roughly constant over time). Newly formed cross-domain collaborations are significant in all cases.

Despite the above challenges, once such a cross-domain collaboration is successfully formed, its impact is usually tremendous. In our study, cross-domain collaborations constitute a small portion of all possible collaborations, as shown in Figure 1. The trends of cross-domain collaboration in many cases are growing. Newly formed cross-domain collaborations are significant in all cases, which confirms the potential need for cross-domain collaborations.

Based on these observations, we propose the Cross-domain Topic Learning (CTL) method that addresses all three challenges: sparse connection, complementary expertise and topic skewness. CTL is a generative topic model that differentiates topics relevant to cross-domain collaboration from other topics.

We compare CTL with several baseline approaches on large publication data sets of different domains. CTL outperforms others significantly on recommendation metrics. Beyond accurate recommendation performance, CTL is also insensitive to parameter tuning, as confirmed in the sensitivity analysis. Finally, we integrate CTL into a large-scale web application for recommending cross-domain research collaborators, which further demonstrates the scalability of CTL in handling real-time queries.
The rest of this paper is organized as follows: Section 2 formulates the cross-domain recommendation problem formally; Section 3 presents our proposed methods on cross-domain recommendation; Section 4 describes the experiments; Section 5 presents the related work; then we conclude in Section 6.
We present required definitions and formulate the problem of cross-domain collaboration recommendation. Without loss of generality, we assume there are two domains, the source domain and the target domain. Our goal is to recommend potential collaborators in the target domain for a specific user from the source domain.

Definition 1. Source/Target domain. The source (or target) domain can be represented as a social network G = (V, E, X), where V is a set of |V| = N users, E ⊆ V × V is a set of undirected (collaborative) relationships between users, and X is an N × d attribute matrix in which every row corresponds to a vector of attribute values of a user. We use x_j to denote the jth attribute.

We use superscripts S and T to differentiate the source domain and the target domain. If there is no ambiguity, we will omit S for the source domain and use a prime (′) for the target, for brevity. Suppose each user v_i is associated with d attributes. For example, in the research collaboration network, each user is associated with a set of published papers or a set of words appearing in those papers. Given this, we have the following definition:

Definition 2. Domain-specific topic models. A topic model θ_i of a user v_i is a multinomial distribution of attributes {P(x_j | θ_i)}_j. Then a domain is considered as a mixture of multiple user-specific topic models. The underlying assumption is that attributes associated with the user are sampled following a distribution corresponding to each topic, i.e., P(x | θ_i). Such a definition is usually used in LDA/PLSI-style topic models [4, 15].

According to the above definition, the attributes with the highest probability associated with each topic suggest the semantics represented by the topic. For example, a "Data Mining" topic discovered from the publication data can be represented by keywords such as "clustering", "learning", and "classification".
The input of our problem consists of a source domain G^S and a target domain G^T, each associated with topic models. Please note that the source domain and the target domain can be overlapping, i.e., V^S ∩ V^T ≠ ∅. Given this, we can precisely define the following problem:

Problem 1. Cross-domain collaboration recommendation. Given (1) a source domain G^S and a target domain G^T, and (2) topic models θ and θ′ associated with users in the two domains respectively, the goal is to rank and recommend potential collaborators in the target domain for a specific user v_q from the source domain.

The fundamental challenge of this problem is how to capture the collaboration patterns across different domains. Within the same domain, homophily is often considered the driving force for the formation of collaborative relationships, which suggests that people with similar interests (topic model θ) tend to collaborate with each other. However, in the cross-domain setting, the problem is very different. Technically, it is challenging to extract and discriminate topics in the two domains. In particular, given a specific user and her topic distribution from the source domain, on which topics and with whom should she collaborate in the target domain?
We begin by considering some baseline solutions and then propose our cross-domain topic learning approach. A simple approach to the problem is to construct a collaboration graph connecting users between source and target domains and then use a random walk with restart algorithm [28] to rank collaborators in the target domain. We call this method Author Matching. The details of the algorithm are described in Section 3.1.

The problem with Author Matching is the sparse connections between authors across two domains. To alleviate this problem, the second model consolidates the correlation between the underlying topics. Suppose each domain has T different topics and each user has a distribution over the topics. We can augment the collaboration graph with two topic layers (as shown in Figure 2(b)). The links between the two topic layers indicate the alignment between topics, which implicitly represents the complementary expertise between users. Based on this representation, a random walk with restart algorithm can again be applied to the graph to rank (both topic and user) nodes in the target domain. We call this method Topic Matching, and details are described in Section 3.2.

Topic Matching improves the cross-domain connections through a subset of topic pairs from source domain to target domain. However, not all topic pairs are relevant for collaboration (topic skewness). Therefore, blindly computing all topics from source and target domains is not necessary for collaboration recommendation and often leads to suboptimal results. One challenge here is how to differentiate relevant "collaboration" topics from other topics. We further design a Cross-domain Topic Learning (CTL) algorithm to address this challenge in Section 3.3.

[Figure 2: three panels. (a) Author matching: users v_q, v_2, ..., v_N of the source domain G^S linked to users v′_1, v′_2, ..., v′_N′ of the target domain G^T. (b) Topic matching: the same graph augmented with source topics z_1, ..., z_T and target topics z′_1, ..., z′_T. (c) CTL: the same graph augmented with one topic layer z_1, ..., z_K.]

Figure 2: Graphical representation of the three recommendation models: author matching, topic matching, and CTL.
3.1 Author Matching
Based on the historic cross-domain collaborations, we create a collaboration graph, as shown in Figure 2(a). The problem is to rank relevant nodes in the target domain G^T for a given query node v_q in the source domain G^S. Measuring the relatedness of two nodes in the graph can be achieved using the Random Walk with Restart (RWR) theory [22, 28]. Starting from node v_q, an RWR is performed by following a link to another node according to the weight of the link at each step (see footnote 1). Also, in every step, there is a probability τ of returning to the node v_q. The relatedness score of node v_i with respect to node v_q is defined as the stationary probability r_i that the random walk will finally arrive at node v_i, i.e.,

$$\mathbf{r}^{(t+1)} = (1 - \tau)\, S \cdot \mathbf{r}^{(t)} + \tau\, \mathbf{q} \qquad (1)$$

where r^(t) is a vector with each element r_i^(t) denoting the probability that the random walk at step t arrives at node v_i; q is a vector of zeros with the element corresponding to the starting node v_q set to 1, i.e., q_{v_q} = 1; S defines the transition probability of the random walk, with element S_ij denoting the random walking probability from node v_i to node v_j.

Footnote 1: In the author matching method, we use a uniform weight, i.e., the weights of links of a node v to its neighbors are defined as 1/NB(v), where NB(v) is the number of neighbors of node v. In §3.2, we will introduce how to define the weight based on the topic model.
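The iteration in Eq. 1 can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy graph, the restart probability of 0.15, and the fixed iteration count are all assumptions made for illustration. The sketch pushes probability mass from each node to its neighbours with the uniform weight 1/NB(v), which is one reading of the product S · r.

```python
def random_walk_with_restart(neighbors, query, tau=0.15, iters=200):
    """Iterate r <- (1 - tau) * S r + tau * q for a fixed number of steps.

    neighbors maps every node to the list of its neighbours; the graph is
    undirected, so each edge must be listed in both directions.
    tau and iters are illustrative defaults, not values from the paper.
    """
    r = {v: 0.0 for v in neighbors}
    r[query] = 1.0                      # the walk starts at the query node
    for _ in range(iters):
        nxt = {v: 0.0 for v in neighbors}
        for v, nbrs in neighbors.items():
            if not nbrs:
                continue                # an isolated node passes no mass on
            share = (1.0 - tau) * r[v] / len(nbrs)   # uniform weight 1 / NB(v)
            for u in nbrs:
                nxt[u] += share
        nxt[query] += tau               # restart: jump back to the query node
        r = nxt
    return r


def rank_targets(neighbors, query, targets, tau=0.15):
    """Rank target-domain nodes by their relatedness score to the query."""
    scores = random_walk_with_restart(neighbors, query, tau)
    return sorted(targets, key=lambda v: scores[v], reverse=True)


# Hypothetical toy graph: a and b are source users, x and y are target users.
graph = {
    "a": ["b", "x"],
    "b": ["a", "y"],
    "x": ["a"],
    "y": ["b"],
}
scores = random_walk_with_restart(graph, "a")
ranking = rank_targets(graph, "a", ["x", "y"])
```

In this toy graph the target user x is directly linked to the query user a, while y is reachable only through b, so x receives the higher score.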
3.2 Topic Matching
The author matching method only considers the network structure information, but ignores the content (topic) information. How do people collaborate across different domains? And what are the hottest topics on which people from different domains tend to collaborate?

Recently, probabilistic topic models have been successfully applied to multiple text mining tasks to extract topics from text [4, 15, 27]. We employ an Author-Conference-Topic (ACT) model [31], which utilizes the topic distribution to represent the inter-dependencies among authors, papers, and publication venues (see footnote 2). The model simulates the process when people collaborate on a work, e.g., writing a scientific paper, using a series of probabilistic steps. In essence, for each object it estimates a mixture of topic distributions which represent the probability of the object being associated with every topic. For example, for each author v, we have a set of probabilities {P(z_i | v)}_i or {θ_{v z_i}}_i, denoting how likely author v is to be interested in topic z_i. Similarly, we have {P(x_j | z)}_j or {φ_{z x_j}}_j, the probability of attribute x_j (e.g., a keyword) given topic z. We use Gibbs sampling to learn the probabilities. The interested reader can refer to [31] for more details.

Combining topic model into random walk. We now discuss how to combine the topic model into the random walk framework. First, we apply the ACT model to the source and the target domains respectively and obtain two sets of topic distributions. Then we estimate the alignment between topics of these two domains. We calculate the alignment according to the historic cross-domain collaborations. Specifically, the strength of the alignment between topic z_i from the source domain and topic z′_j from the target domain is estimated by:

$$S_{z_i z'_j} = \frac{1}{\psi} \sum_{(v, v') \in E^{ST}} \left[ P(z_i \mid v) + P(z'_j \mid v') \right] \qquad (2)$$

where ψ is a normalization factor; (v, v′) ∈ E^{ST} indicates a cross-domain collaboration between v and v′.

We augment the graph generated in the author matching method with topic nodes {z} and {z′} extracted from the two domains. Figure 2(b) shows the graphical structure, which suggests that a random walk can be performed from a user v to a topic z and from a topic z of the source domain to a topic z′ of the target domain (and vice versa). The link weight between user node v and topic node z is defined as the probability P(z | v) obtained from the ACT model. Then the relatedness of the query node to a target topic z′ is defined by a formula similar to Eq. 1, and analogously we can define the relatedness between the query node and user nodes in the target domain.

Footnote 2: The ACT model can be considered as an extension of LDA [4], but considers the collaborative relationships between users and the difference of various objects (e.g., author, paper, and conference/journal).

Algorithm 1: Probabilistic generative process in CTL.

Input: a source domain G^S and a target domain G^T
Output: estimated parameters θ, θ′, φ, ϑ, and λ
Initialize an ACT model in G^S by learning from documents written by authors only from G^S;
Similarly, initialize an ACT model for target domain G^T;
foreach collaborated document d do
    foreach word x_di ∈ d do
        Toss a coin s_di according to bernoulli(s_di) ∼ beta(γ_t, γ), where beta(.) is a Beta distribution, and γ_t and γ are two parameters;
        if s_di = 0 then
            Randomly select a pair (v, v′) from d's authors, where v is an author from G^S and v′ from G^T;
            Draw a topic z_di ∼ multi(ϑ_vv′) from the topic mixture ϑ_vv′ specific to (v, v′);
        end
        if s_di = 1 then
            Randomly select a user v;
            Draw a topic z_di ∼ multi(θ_v) from the topic model of user v;
        end
        Draw a word x_di ∼ multi(φ_{z_di}) from the z_di-specific word distribution;
    end
end

[Figure 3: plate diagram for a collaborated document d, showing the author set A_d, the coin s, the sampled author v or author pair (v, v′), and the word x, connected to the source-domain and target-domain topic models.]

Figure 3: Graphical representation of CTL model.
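The topic alignment of Eq. 2 can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions rather than the authors' code: the function name and the toy distributions are hypothetical, and because the text does not specify the normalization factor, the sketch assumes it is chosen so that all alignment strengths sum to 1.

```python
def topic_alignment(theta_src, theta_tgt, cross_edges):
    """Estimate alignment strengths between source and target topics.

    theta_src[v][i]  = P(z_i | v)    for a source-domain author v
    theta_tgt[v2][j] = P(z'_j | v2)  for a target-domain author v2
    cross_edges      = list of (v, v2) cross-domain collaborations
    Assumption: the normalization factor makes all entries sum to 1.
    """
    n_src = len(next(iter(theta_src.values())))
    n_tgt = len(next(iter(theta_tgt.values())))
    strength = [[0.0] * n_tgt for _ in range(n_src)]
    for v, v2 in cross_edges:
        for i in range(n_src):
            for j in range(n_tgt):
                # Summand of Eq. 2 for one collaborating pair.
                strength[i][j] += theta_src[v][i] + theta_tgt[v2][j]
    total = sum(sum(row) for row in strength)
    if total > 0:
        strength = [[s / total for s in row] for row in strength]
    return strength


# Hypothetical toy input: one source author, one target author, one collaboration.
theta_src = {"a": [0.9, 0.1]}
theta_tgt = {"x": [0.2, 0.8]}
alignment = topic_alignment(theta_src, theta_tgt, [("a", "x")])
```

With this input the strongest alignment is between source topic 0 and target topic 1, the two topics on which the collaborating authors put most of their probability. The resulting strengths could serve as link weights between the two topic layers of the augmented graph.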
3.3 Cross-domain Topic Learning (CTL)
The topic matching method does not discriminate the "collaboration" topics from those topics existing in only one domain. As a result, the "irrelevant" topics (irrelevant to collaboration) may hurt the collaboration recommendation performance. We develop a new topic modeling approach called Cross-domain Topic Learning (CTL) to model topics of the source domain and the target domain simultaneously.

Model description. The basic idea here is to use two correlated generative processes to model the source and the target domains together. The first process models documents written by authors from a single domain (either source or target). The second process models collaborated documents. For each word in a collaborated document, we use a Bernoulli distribution to determine whether it is generated from a "collaboration" topic or a topic specific to one domain only. Figure 3 shows the graphical structure of the CTL model. (For simplicity, we omit the modeling part for a single domain and focus on the modeling of the collaborated documents.) CTL models each cross-domain collaborated document using topic models of authors from the source domain and the target domain.

Let us briefly introduce notations. A_d is the set of authors of document d; v is an author and (v, v′) is an author pair randomly sampled to be responsible for word x; s is a binary variable indicating whether the current word inherits the topic from a single domain (s = 1) or from a cross-domain collaboration (s = 0); θ and θ′ are topic models from the source domain and the target domain, respectively; ϑ_vv′ is a collaboration topic model specific to author pair (v, v′); α is the Dirichlet hyperparameter; λ is a parameter for sampling the binary variable s; γ and γ_t are Beta parameters to generate λ. Table 1 summarizes the notations used in the CTL model.

Table 1: Notations in the CTL model.

SYMBOL      DESCRIPTION
T           number of topics
d           a collaborated document
A_d         the set of authors of document d
x_di        the ith attribute (word) in document d
z_di        the topic assigned to attribute x_di
s_di        whether x_di is a word from a single domain or from a cross-domain collaboration
θ_v         multinomial distribution over topics specific to author v
ϑ_vv′       multinomial distribution over topics specific to author pair (v, v′)
φ_z         multinomial distribution over words specific to topic z
α, β        Dirichlet priors to multinomial distributions θ, θ′ and φ
λ           parameter for sampling the binary variable s
γ, γ_t      Beta parameters to generate λ

Formally, the generative process is described in Algorithm 1. First, documents of the two domains G^S and G^T are partitioned into three clusters: documents written by authors only from the source domain, documents written by authors only from the target domain, and documents collaborated on by authors from both domains. Then CTL extracts topics of authors from the first two document clusters (without cross-domain collaborations) according to the distributions p(θ_v | α) and p(θ′_v′ | α), respectively, where α is the Dirichlet prior. For simplicity, we use the same prior α for both source and target domains.

Second, CTL models the cross-domain collaboration documents. For each word x_di in document d, a coin s is tossed according to p(s | d) ∼ beta(γ_t, γ), where beta(.) is a Beta distribution. When s = 1, a single user v (or v′) is chosen according to a uniform distribution; then the word x_di is sampled from a selected topic z_di specific to the user v, according to θ_v (therefore, this is not a cross-domain collaboration).
When s = 0, a pair of cross-domain collaborators (v, v′) is selected, and a new multinomial distribution ϑ_vv′ is constructed by combining θ_v and θ′_v′ (therefore, a cross-domain collaboration is formed). More specifically, we first expand the source and target topic spaces to be of the same dimension. For example, if the source domain has 10 topics and the target domain 5 topics, the expanded topic space will have 15 topics (10 from the source domain and 5 from the target domain). The expanded source topic distribution is θ̃_v = ⟨θ_v, 0, ..., 0⟩, where we set 0 on the target topics. Similarly, we define the expanded target topic distribution to be θ̃′_v′ = ⟨0, ..., 0, θ′_v′⟩. The new distribution ϑ_vv′ is then defined as θ̃_v + θ̃′_v′, a simple mixture of the two expanded multinomials of θ_v and θ′_v′ [5]. Finally, the word x_di is sampled from a collaboration topic z_di according to the new distribution ϑ_vv′.

Figure 4 illustrates an example of CTL learning. Before CTL learning, each author only has a topic distribution in either the source or the target domain (zero probability on topics from the other domain). Then, CTL smooths topic distributions across the two domains. Users from the source domain will also have a probability over topics extracted from the target domain, and vice versa. After training the CTL model, we also obtain a set of "collaboration topics" between the two domains, i.e., topics with the highest posterior probabilities P(z | s = 0, ·) (or P(z | s = 0, ·) > ϵ) in the collaborated documents. (Here, · indicates all the other parameters we should consider when calculating the probability.) For example, on the right-hand side of Figure 4, the box indicates those collaboration topics.

[Figure 4: two panels, before and after CTL learning, showing users v_2, v′_1, v′_2 of G^S and G^T linked by weighted edges to source topics z_1, z_2, z_3 and target topics z′_1, z′_2, z′_3; the second panel carries the annotations P(z_1 | s = 0) < ϵ and P(z′_3 | s = 0) < ϵ.]

Figure 4: Intuitive explanation of the CTL learning. ϵ is a parameter to select collaboration topics.

Model inference. We use Gibbs sampling to estimate unknown parameters {θ, θ′, ϑ, φ, λ} in the CTL model. In particular, we evaluate (a) the posterior distribution on z (or z′) for each word in the documents written by authors only from a single domain and then use the results to infer θ (or θ′); (b) the posterior distribution on s, and then use the sampling results of z and z′ according to s to update ϑ, θ and θ′. Finally, φ and λ can be inferred from the obtained topic models. More specifically, we begin with the joint probability of all documents in the two domains, and then, using the chain rule, we obtain the posterior probability of sampling the topic for each word. For (a) we use the same sampling algorithm as that for the LDA model (or the ACT model) (cf. [13] or [31]), i.e., with the posterior probability:
$$P(z_{di} \mid \mathbf{z}_{-di}, \mathbf{x}, \cdot) = \frac{n^{-di}_{v z_{di}} + \alpha}{\sum_{z} \left( n^{-di}_{v z} + \alpha \right)} \times \frac{m^{-di}_{z_{di} x_{di}} + \beta}{\sum_{x} \left( m^{-di}_{z_{di} x} + \beta \right)} \qquad (3)$$

where n_vz is the number of times that topic z has been sampled from the multinomial distribution specific to a randomly selected author v; m_zx is the number of times that word x has been generated by topic z; a count with the superscript −di, such as n^{−di}, denotes a quantity excluding the current instance. We use a similar process for both domains.

For parameter estimation in (b), we consider a two-step Gibbs sampling. We first sample the coin s according to the posterior probability (the detailed derivation is given in the Appendix):
$$P(s_{di} = 0 \mid \mathbf{s}_{-di}, \mathbf{z}) = \frac{n^{-di}_{d s_0} + \gamma_t}{n^{-di}_{d s_0} + n^{-di}_{d s_1} + \gamma_t + \gamma} \times \frac{n^{-di}_{v v' z_{di}} + \left( n_{v z_{di}} + n_{v' z_{di}} \right) + \alpha}{\sum_{z} \left( n^{-di}_{v v' z} + \left( n_{v z} + n_{v' z} \right) + \alpha \right)} \qquad (4)$$

where n_{d s_0} is the number of times that s = 0 has been sampled in document d; (v, v′) is the selected user pair responsible for x_di; n_{vv′z} is the number of times that topic z has been sampled from user pair (v, v′). P(s_di = 1) can be defined analogously to the above equation. The only difference is to replace the sum of the two terms (n_{v z_di} + n_{v′ z_di}) with the count for a selected single user v (or v′). The posterior probability of topic z is defined as:
$$P(z_{di} \mid s_{di} = 0, \mathbf{x}, \mathbf{z}_{-di}, \cdot) = \frac{m^{-di}_{z_{di} x_{di}} + m_{z_{di} x_{di}} + m'_{z_{di} x_{di}} + \beta}{\sum_{x} \left( m^{-di}_{z_{di} x} + m_{z_{di} x} + m'_{z_{di} x} + \beta \right)} \times \frac{n^{-di}_{v v' z_{di}} + \left( n_{v z_{di}} + n_{v' z_{di}} \right) + \alpha}{\sum_{z} \left( n^{-di}_{v v' z} + \left( n_{v z} + n_{v' z} \right) + \alpha \right)} \qquad (5)$$

where m^{−di}_{zx} is the number of times that word x has been generated by topic z in the collaborated documents; m_zx and m′_zx respectively represent the number of times that word x has been generated by topic z in the source domain and in the target domain.

During the parameter estimation, the algorithm keeps track of a V × T (user by topic) count matrix for both domains, a D × 2 (collaborated document by coin) count matrix, a 2 × T (coin by topic) count matrix, and an AP × T (user pair by topic) count matrix (AP is the number of user pairs). Given these matrices, we can estimate the probabilities of θ, θ′, ϑ, φ, and λ.

Cross-domain recommendation via random walk. We combine the "collaboration" topics learned by CTL into the collaboration graph (cf. Figure 2(c)). In principle, there could be a link between any user node and topic node (the difference is the link weight). To control the density of the constructed network, we define a parameter ϵ and add links between users and topics only when P(z | s = 0, ·) > ϵ. A smaller ϵ results in a denser network. Random walk with restart is then performed on the topic-augmented graph to calculate the relatedness between users from the target domain and the query user node, analogously to Eq. 1. Finally, we rank users in the target domain according to the estimated relatedness scores and recommend users with the highest relatedness.

One advantage of the CTL model is that it is able to recommend "related" collaboration topics based on the relatedness scores between the query node and the topic nodes. In topic matching, we could also consider recommending topics based on the relatedness scores; however, the recommended topics might be irrelevant to collaboration. In CTL, the recommended topics directly reflect existing collaborations across the two domains.

The CTL model can also be generalized to multiple domains. The basic idea is to use a multinomial distribution to replace the Bernoulli distribution.
The multinomial represents collaboration topics among multiple domains, between two specific domains, or those in a single domain. Based on the learned topics, we can construct a topic-centered network (similar to Figure 2(c)). Then the random walk with restart can be performed on the network to estimate the relatedness scores of users from different domains.
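The generative process of Algorithm 1 for a collaborated document can be sketched as follows. This is a simplified illustration, not the authors' implementation, and it rests on several assumptions: the document has exactly one source author and one target author; the coin probability λ = P(s = 0) is drawn once per document from Beta(γ_t, γ); the mixture ϑ_vv′ is renormalised so that it sums to 1; and all function names and toy values are hypothetical.

```python
import random


def expand_and_mix(theta_src, theta_tgt):
    """Build the pair-specific mixture over the expanded topic space."""
    expanded_src = list(theta_src) + [0.0] * len(theta_tgt)   # zeros on target topics
    expanded_tgt = [0.0] * len(theta_src) + list(theta_tgt)   # zeros on source topics
    mixed = [a + b for a, b in zip(expanded_src, expanded_tgt)]
    norm = sum(mixed)          # assumption: renormalise the sum to a distribution
    return [p / norm for p in mixed]


def generate_word(rng, lam, theta_src, theta_tgt, phi, vocab):
    """Draw (s, z, word) for one word; lam is the probability that s = 0."""
    if rng.random() < lam:
        s = 0                                   # cross-domain collaboration topic
        topic_dist = expand_and_mix(theta_src, theta_tgt)
    else:
        s = 1                                   # topic of a single, uniformly chosen author
        if rng.random() < 0.5:
            topic_dist = list(theta_src) + [0.0] * len(theta_tgt)
        else:
            topic_dist = [0.0] * len(theta_src) + list(theta_tgt)
    z = rng.choices(range(len(topic_dist)), weights=topic_dist)[0]
    word = rng.choices(vocab, weights=phi[z])[0]   # word from the topic's word distribution
    return s, z, word


def generate_document(rng, n_words, gamma_t, gamma, theta_src, theta_tgt, phi, vocab):
    """Generate n_words (s, z, word) triples for one collaborated document."""
    lam = rng.betavariate(gamma_t, gamma)       # assumed reading of the Beta prior on the coin
    return [generate_word(rng, lam, theta_src, theta_tgt, phi, vocab)
            for _ in range(n_words)]


# Hypothetical toy model: 2 source topics, 1 target topic, 3-word vocabulary.
vocab = ["cluster", "patient", "query"]
phi = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]
doc = generate_document(random.Random(0), 20, 3.0, 0.1, [0.5, 0.5], [1.0], phi, vocab)
```

The sketch covers only the sampling direction of the model; estimating the parameters from observed documents requires the Gibbs sampler of Eqs. 3 to 5, which is not shown here.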
In this section, we evaluate the proposed methods on large publication datasets of different domains. All data sets and code are publicly available (see footnote 3).
4.1 Experimental Setup
Data sets. The data set is extracted from Arnetminer.org [31], an academic search system, which contains 1,436,990 authors and 1,932,442 publications. The data we used in our experiments spans from 1990 to 2005. We consider the following five sub-domains:

Data Mining: We use papers of the following data mining conferences: KDD, SDM, ICDM, WSDM and PKDD as ground truth, which results in a network with 6,282 authors and 22,862 co-author relationships.

Medical Informatics: We include the following journals: Journal of the American Medical Informatics Association, Journal of Biomedical Informatics, Artificial Intelligence in Medicine, IEEE Trans. Med. Imaging and IEEE Transactions on Information and Technology in Biomedicine, from which we obtain a network of 9,150 authors and 31,851 co-author relationships.

Theory: We include the following conferences, i.e., STOC, FOCS and SODA, from which we get 5,449 authors and 27,712 co-author relationships.

Visualization: We include the following conferences and journals: CVPR, ICCV, VAST, TVCG, IEEE Visualization and Information Visualization. The obtained co-author network comprises 5,268 authors and 19,261 co-author relationships.

Database: We include the following conferences, i.e., SIGMOD, VLDB and ICDE. From those conferences, we extract 7,590 authors and 37,592 co-author relationships.

Based on the above five sub-domains, we create four cross-domain test cases: Data Mining to Theory, Medical Informatics to Database, Medical Informatics to Data Mining, and Visualization to Data Mining.

Comparison methods. We compare the following methods for collaboration recommendation:

Content Similarity (Content): It calculates similarity between authors based on papers published by them. Specifically, we construct feature vectors w_q and w_v′ of words used in papers published by query author q and target author v′, respectively. Those feature vectors are normalized by TFIDF [1].
The similarity score is the cosine similarity between w_q and w_v′:

$$\mathrm{Sim}(v_q, v') = \frac{\mathbf{w}_q \cdot \mathbf{w}_{v'}}{\|\mathbf{w}_q\| \, \|\mathbf{w}_{v'}\|} \qquad (6)$$

Collaborative Filtering (CF): It leverages the existing collaborations to make the recommendation. The basic idea is that if a query author q has the same or similar collaborators as a person x within the same domain, q is then likely to have the same cross-domain collaborators as x. We employ a memory-based collaborative filtering algorithm [8], in which recommendations are made for a query user q using the following formula:

$$\mathrm{CF\_score}(q, v') = \sum_{x \in V^S} I(x, v')\, r(q, x) \qquad (7)$$

where r(q, x) is the similarity between authors in the source domain, e.g., cosine similarity based on collaboration connections; the indicator variable I(x, v′) is 1 if the author x has a cross-domain collaboration with v′ and 0 otherwise.

Hybrid: It considers a linear combination of the scores obtained by the Content and the CF methods, specifically,

$$\mathrm{Hybrid}(v_q, v') = \mu\, \mathrm{CF\_score}(v_q, v') + (1 - \mu)\, \mathrm{Sim}(v_q, v') \qquad (8)$$

where μ is a balance parameter. We empirically set it to 0.5.

Katz: It is the best link predictor in [20]. It sums over all possible paths between the query user and a candidate user, and then uses the summation score to rank all candidates.

Author Matching: (cf. §3.1) It makes recommendations by performing the random walk with restart on the collaboration graph.

Topic Matching: (cf. §3.2) It makes recommendations by combining the extracted topics into the random walk algorithm.

CTL: (cf. §3.3) It is the proposed method, which considers topic skewness and extracts topics relevant to cross-domain collaboration. The relevant topics are then integrated into the random walk framework for recommendation (see footnote 4).

Evaluation metrics. To quantitatively evaluate the proposed methods, in each test case, we use historic collaboration data (data before 2001) for training and the last four years (2001-2005) for validation. In evaluation, we consider those candidates who already have cross-domain collaborations, and our task is then to predict whether they will maintain the collaborations or expand to new cross-domain collaborations. If the system recommends a cross-domain collaboration and the collaboration is later built, then we say the system made a correct recommendation; otherwise we say the system made a wrong recommendation.
Based on this notion of a correct recommendation, we evaluate the recommendation performance in terms of P@10 (precision of the top 10 recommended results), P@20, R@100 (recall of the top 100 results), MAP (Mean Average Precision), and Average Reciprocal Hit-Rank (ARHR) [9].

All code is implemented in C++, and all experiments are conducted on an x64 server with an Intel Xeon E7520 1.87GHz CPU and 128GB of RAM, running Microsoft Windows Server 2008 R2 Enterprise. Training the ACT and the CTL models takes about 12 hours and 15 hours, respectively, on the entire data set (1,436,990 authors and 1,932,442 publications). Recognizing the computational complexity of LDA-style models, we are currently looking into more efficient computation mechanisms to speed up the process.
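The ranking metrics named above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the evaluation code used in the experiments: the function names are hypothetical, average precision is normalized by the number of relevant candidates, and the reciprocal hit-rank uses the first hit within the top k (one common reading of ARHR [9] when a query has several correct candidates).

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that are correct."""
    return sum(1 for c in ranked[:k] if c in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all correct candidates that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for c in ranked[:k] if c in relevant) / len(relevant)


def average_precision(ranked, relevant):
    """Mean of precision@rank over the ranks at which a correct candidate occurs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def reciprocal_hit_rank(ranked, relevant, k):
    """1 / rank of the first correct candidate in the top k, or 0 if there is none."""
    for rank, candidate in enumerate(ranked[:k], start=1):
        if candidate in relevant:
            return 1.0 / rank
    return 0.0


def mean_over_queries(metric, runs, **kwargs):
    """Average a per-query metric over (ranked, relevant) pairs, e.g. MAP or ARHR."""
    return sum(metric(ranked, relevant, **kwargs) for ranked, relevant in runs) / len(runs)


# Toy rankings (made up for illustration only).
runs = [(["a", "b", "c", "d"], {"a", "c", "z"}), (["b", "a"], {"a"})]
map_score = mean_over_queries(average_precision, runs)
arhr_10 = mean_over_queries(reciprocal_hit_rank, runs, k=10)
```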
4.2 Recommendation Performance Analysis
Table 2 lists the performance of cross-domain collaboration recommendation by the comparison methods on the four test cases. The proposed CTL method clearly outperforms the baseline methods (+2.2-30% in terms of MAP). Content considers only the content information, which leads to poor performance. The two methods that combine content and network information (Hybrid and Topic Matching) improve the recommendation performance compared to simple baselines such as Content, CF, and Author Matching. Moreover, Topic Matching considers the topic information extracted from the two domains and thus performs better than the Hybrid method, which adopts a simple combination. CTL differentiates "collaboration topics" from irrelevant topics and obtains a significant improvement over both Hybrid and Topic Matching.

Table 2: Recommendation performance by different methods on the four cross-domain test cases (%). Content: Content Similarity; CF: Collaborative Filtering; Author: Author Matching; Topic: Topic Matching.

Cross domain                           ALG      P@10  P@20  MAP   R@100  ARHR-10  ARHR-20
Data Mining (S) to Theory (T)          Content  10.3  10.2  10.9  31.4    4.9      2.1
                                       CF       15.6  13.3  23.1  26.2    4.9      2.8
                                       Hybrid   17.4  19.1  20.0  29.5    5.0      2.4
                                       Author   27.2  22.3  25.7  32.4   10.1      6.4
                                       Topic    28.0  26.0  32.4  33.5   13.4      7.1
                                       Katz     30.4  29.8  31.6  27.4   11.2      5.9
                                       CTL      37.7  36.4  40.6  35.6   14.3      7.5
Medical Info. (S) to Database (T)      Content  10.1  10.9  12.5  45.9    3.6      2.1
                                       CF       18.3  20.2  21.4  47.6    5.3      3.9
                                       Hybrid   25.0  26.5  28.4  59.1    6.4      4.2
                                       Author   26.2  29.6  32.2  54.8   10.5      5.4
                                       Topic    29.4  26.3  34.7  59.3   11.5      5.2
                                       Katz     27.5  28.3  30.7  57.2   10.5      5.0
                                       CTL      32.5  30.0  36.9  59.8   11.4      5.4
Medical Info. (S) to Data Mining (T)   Content   5.8   5.7   9.5  19.8    1.9      0.9
                                       CF       13.7  17.8  18.9  34.3    2.7      1.3
                                       Hybrid   18.0  19.0  19.8  36.7    3.4      1.3
                                       Author   20.1  23.8  29.3  64.4    5.3      2.1
                                       Topic    26.0  25.0  33.9  48.1   10.7      5.6
                                       Katz     21.2  23.8  32.4  48.1   10.2      4.8
                                       CTL      30.0  24.0  35.6  49.6   12.2      6.0
Visual. (S) to Data Mining (T)         Content   9.6  11.8  13.2  18.9    3.1      1.8
                                       CF       14.0  20.8  26.4  29.4    6.9      4.3
                                       Hybrid   16.0  20.0  27.6  30.1    6.3      4.4
                                       Author   22.0  25.2  27.7  31.1   11.9      6.7
                                       Topic    26.3  25.0  32.3  31.4   13.2      8.8
                                       Katz     23.0  25.1  29.3  30.2   10.4      5.4
                                       CTL      28.3  26.0  32.8  36.3   14.0      9.1

Footnote 4: As for the hyperparameters $\alpha$, $\alpha_t$, and $\beta$, following LDA [4], we empirically take fixed values (i.e., $\alpha = \alpha_t = 50/T$ and $\beta = 0.01$). $\gamma$ and $\gamma_t$ are defined to represent our preference for cross-domain collaborations (i.e., $\gamma_t = 3.0$ and $\gamma = 0.1$). We tried different settings and found that the estimated topic models are not very sensitive to the hyperparameters.

Cross-domain topics analysis. How many topics are enough for cross-domain recommendation? We perform an analysis by varying the number of cross-domain topics in the proposed CTL method. Figure 5(a) shows its MAP performance as the number of cross-domain topics varies. We see that when the number is small (< 80), increasing it often yields a performance improvement. The trend becomes stable when the number reaches 150. This demonstrates the stability of the CTL method with respect to the number of topics.

Hyperparameter analysis. We use $\alpha$ as the example to analyze how a hyperparameter influences the performance of the CTL method. Figure 5(b) shows the performance of CTL as $\alpha$ varies (all the other hyperparameters are fixed and the number of topics is set to T = 120). Although the performance changes with the value of $\alpha$, the largest difference is less than 0.03. This confirms that the CTL method is not sensitive to the particular choice of $\alpha$.

Restart parameter analysis. We study how the restart parameter $\tau$ influences the random walk with restart. Figure 5(c) plots the performance of the CTL method on the four test cases as $\tau$ varies. In general, the recommendation performance is not sensitive to the restart parameter $\tau$. On closer inspection, we find that a small $\tau$ makes the random walk diffuse too quickly and can thus hurt precision, while a large $\tau$ limits the diffusion process and can thus result in lower recall.

Convergence analysis. We further investigate the convergence of the random walk with restart algorithm. Figure 5(d) shows the convergence analysis of different models on the test case of Visualization-Data Mining. All three models converge within 10 iterations, and CTL converges even faster (within 5 iterations). This fast convergence of the CTL model enables real-time query support, which is crucial in the deployed system that we discuss next.

Figure 6: Performance on new collaboration prediction of all algorithms. (Plot of Mean Average Precision, from 0 to 0.35, on the four test cases DM-TH, MI-DB, MI-DM, and CV-DM for Content, CF, Hybrid, Author Matching, Topic Matching, Katz, and CTL.)

New Collaboration Prediction. The collaboration network is dynamic in nature, with collaborative relationships created over time. In general, there are two types of collaborative behavior: maintaining existing collaborations and building new collaborations. Can we predict who will create a new collaboration in the future? This is a more difficult task. We conduct an experiment to evaluate the performance of the proposed method for new collaboration prediction. In particular, we still use the publication data before 2001 for training and the data between 2001-2005 for testing, and in the evaluation we consider only new collaborations in the test data. Figure 6 shows the performance of new collaboration prediction by the six comparison algorithms. On average, the performance of all algorithms drops a bit, but the algorithms behave similarly to Table 2. In particular, it is encouraging to see that CTL still maintains a MAP of about 0.3, which is significantly higher than the baseline methods.
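The distinction between maintained and new collaborations in this experiment can be made concrete with a small sketch. The Python below is only an illustration under assumptions: the record layout `(source_author, target_author, year)` and the function name are hypothetical, and the test window is taken to be 2001 through 2005 inclusive.

```python
def split_cross_domain_pairs(records, cutoff_year=2001, last_year=2005):
    """Split cross-domain coauthor records into training pairs and test pairs.

    records: iterable of (source_author, target_author, year) tuples
             (hypothetical layout chosen for this example).
    Returns (train, maintained, new), where `maintained` holds test pairs that
    already collaborated before the cutoff and `new` holds test pairs that did not.
    """
    train, test = set(), set()
    for source_author, target_author, year in records:
        pair = (source_author, target_author)
        if year < cutoff_year:
            train.add(pair)
        elif year <= last_year:
            test.add(pair)
        # Records after last_year are ignored in this sketch.
    maintained = test & train
    new = test - train
    return train, maintained, new


# Toy records (made up for illustration only).
records = [
    ("a", "x", 1998),
    ("a", "x", 2003),  # same pair again in the test window: a maintained collaboration
    ("a", "y", 2004),  # first collaboration in the test window: a new collaboration
    ("b", "x", 2010),  # outside the evaluation window
]
train, maintained, new = split_cross_domain_pairs(records)
```

Evaluating only against `new` corresponds to the harder setting reported in Figure 6, while evaluating against the union of `maintained` and `new` corresponds to the setting of Table 2.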
4.3 Prototype System
We have developed and deployed a web application for cross-domain recommendation based on the proposed CTL method. The system trains a CTL model offline using all the publication data (1,932,442 papers) in Arnetminer.org. When a user wants to find cross-domain collaborators, they first enter their profile (including organization and research interests) or select an existing author profile from the Arnetminer system, which includes more than 1 million researcher profiles. The user then specifies the target domain (by keywords) in which they want to find collaborators. The system runs the random walk with restart algorithm (Cf. §3.3) online against the CTL model to rank potential topics/collaborators in the target domain.
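The online ranking step relies on random walk with restart (RWR). A minimal power-iteration sketch is given below. It is a generic RWR on a single weighted graph, not the topic-augmented graph of §3.3: the dense adjacency matrix, the parameter name `restart`, the convergence tolerance, and the handling of nodes without outgoing edges are all assumptions made for this example.

```python
import numpy as np


def random_walk_with_restart(adj, query, restart=0.15, tol=1e-8, max_iter=100):
    """Rank all nodes by their RWR score with respect to one query node.

    adj[i, j] is the non-negative weight of the edge i -> j.
    Returns (scores, iterations_used).
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    row_sum = adj.sum(axis=1, keepdims=True)
    # Row-normalize the weights into transition probabilities.
    trans = np.divide(adj, row_sum, out=np.zeros_like(adj), where=row_sum > 0)
    e = np.zeros(n)
    e[query] = 1.0
    # Assumption: a node without outgoing edges jumps back to the query node.
    trans[row_sum[:, 0] == 0] = e
    r = e.copy()
    for iteration in range(1, max_iter + 1):
        r_next = (1.0 - restart) * (trans.T @ r) + restart * e
        if np.abs(r_next - r).sum() < tol:
            return r_next, iteration
        r = r_next
    return r, max_iter


# Toy path graph 0 - 1 - 2 (made up for illustration only).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
scores, iterations = random_walk_with_restart(adj, query=0, restart=0.5)
ranking = np.argsort(-scores)  # candidate nodes ordered by decreasing score
```

Each iteration shrinks the remaining error by a factor of at most `1 - restart`, so a larger restart probability converges faster but keeps the walk closer to the query node.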
Figure 5: Parameter analysis. (a) Performance of the cross-domain topic learning model when varying the number of topics T; (b) performance of cross-domain topic learning (CTL) is stable when varying the hyperparameter $\alpha$; (c) performance of CTL is stable when varying the restart parameter $\tau$ in the random walk process on the four test cases; (d) convergence analysis of different models on the test case of Visualization-Data Mining. (All panels plot Mean Average Precision. Panels (a)-(c) show one curve per test case: Data Mining - Theory, Medical Info. - Database, Medical Info. - Data Mining, and Visualization - Data Mining. Panel (d) compares Author Matching, Topic Matching, and CTL over 1 to 10 iterations.)

Collaboration recommendation plays an important role in many fields and has attracted a lot of research interest. Chen et al. [7] developed a system called CollabSeer for discovering potential collaborators for a given author based on the structure of the coauthor network and the user's research interests. This is the most relevant paper to our work. However, it does not consider the cross-domain problem. Konstas et al. [17] investigated how social relationships can help recommendation. They developed a track recommendation system by considering both social annotation and friendship inherent in the social graph established among users, items, and tags. Kautz et al. [16] introduced a system called ReferralWeb, which attempts to combine social networks with collaborative filtering. There is a large body of research on collaborative filtering. For example, [2] introduced a system called Fab that combines content-based filtering and collaborative filtering. Shi et al. [26] proposed a large-scale machine learning system for recommending heterogeneous content in social networks, and Sculley [25] presented a ranking method that combines regression and ranking. Yuan et al. [35] aimed to fuse heterogeneous social relationships for recommendation using factorization and regularization techniques. Wang and Blei [34] developed an algorithm to recommend scientific articles to users of an online community by combining traditional collaborative filtering and probabilistic topic modeling. However, most existing works consider the recommendation problem only within a single domain and do not consider the cross-domain recommendation problem. In addition, we propose a novel cross-domain topic learning method, which supports recommending collaboration topics as well.

Our work is also related to expert finding [3, 30, 36] and expertise matching [23, 33]. Mimno and McCallum [23] and Tang et al. [33] studied the problem of paper-reviewer recommendation, a subtask of expert finding. The proposed algorithms can be leveraged for collaboration recommendation. However, expert finding and expertise matching are by nature different from the problem of collaboration recommendation. The idea of differentiating irrelevant topics has also been studied in previous work such as the query-oriented topic model (qLDA) proposed in [29], which tries to identify the topics relevant to queries in multi-document summarization.
In this paper, we study the problem of cross-domain collaboration recommendation. We precisely define the problem and present three models for ranking and recommending potential collaborators. A cross-domain topic modeling approach is proposed to learn and differentiate collaboration topics from other topics. Experimental results on a coauthor network demonstrate the effectiveness and efficiency of the proposed approach.

As for future work, it is intriguing to connect cross-domain collaborative relationships with social theories, for example, how cross-domain relationships correlate with strong/weak ties [12] and how such correlation can help spread knowledge from one domain to another. It would also be interesting to apply the proposed method to other networks, e.g., software development.

Acknowledgements.
The work is supported by the Natural Science Foundation of China (No. 61073073, No. 61170061) and Chinese National Key Foundation Research (No. 60933013, No.61035004), 973 Program (No. 2011CB302302), a special fund for Fast Sharing of Science Paper in Net Era by CSTD, and Tsinghua-Tencent innovation funding.
[1] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM Press, 1999.
[2] M. Balabanovic and Y. Shoham. Fab: content-based, collaborative recommendation. Commun. ACM, 40:66–72, March 1997.
[3] K. Balog, L. Azzopardi, and M. de Rijke. Formal models for expert finding in enterprise corpora. In SIGIR'06, pages 43–55, 2006.
[4] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.
[5] W. Buntine and A. Jakulin. Applying discrete PCA in data analysis. In UAI'04, pages 59–66, 2004.
[6] D. Chakrabarti and C. Faloutsos. Graph mining: Laws, generators, and algorithms. ACM Comput. Surv., 38(1):2, 2006.
[7] H.-H. Chen, L. Gou, X. Zhang, and C. L. Giles. CollabSeer: a search engine for collaboration discovery. In JCDL'11, pages 231–240, 2011.
[8] A. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: Scalable online collaborative filtering. In WWW'07, 2007.
[9] M. Deshpande and G. Karypis. Item-based top-n recommendation algorithms. ACM Trans. Inf. Syst., 22(1):143–177, Jan. 2004.
[10] A. Doucet, N. de Freitas, K. Murphy, and S. Russell. Rao-Blackwellised particle filtering for dynamic Bayesian networks. In UAI'00, pages 176–183, 2000.
[11] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM'99, pages 251–262, 1999.
[12] M. Granovetter. The strength of weak ties. American Journal of Sociology, 78(6):1360–1380, 1973.
[13] T. L. Griffiths and M. Steyvers. Finding scientific topics. In PNAS'04, pages 5228–5235, 2004.
[14] G. Heinrich. Parameter estimation for text analysis. Technical report, University of Leipzig, Germany, 2004.
[15] T. Hofmann. Probabilistic latent semantic indexing. In SIGIR'99, pages 50–57, 1999.
[16] H. Kautz, B. Selman, and M. Shah. Referral web: Combining social networks and collaborative filtering. Communications of the ACM, 40(3):63–65, 1997.
[17] I. Konstas, V. Stathopoulos, and J. M. Jose. On social networks and collaborative recommendation. In SIGIR'09, pages 195–202, 2009.
[18] J. Leskovec and C. Faloutsos. Sampling from large graphs. In KDD'06, pages 631–636, 2006.
[19] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive and negative links in online social networks. In WWW'10, pages 641–650, 2010.
[20] D. Liben-Nowell and J. M. Kleinberg. The link-prediction problem for social networks. JASIST, 58(7):1019–1031, 2007.
[21] R. Lichtenwalter, J. T. Lussier, and N. V. Chawla. New perspectives and methods in link prediction. In KDD'10, pages 243–252, 2010.
[22] L. Lovász. Random walks on graphs: A survey. Combinatorics, 2(1):1–46, 1993.
[23] D. Mimno and A. McCallum. Expertise modeling for matching papers with reviewers. In KDD'07, pages 500–509, 2007.
[24] J. Quackenbush. Microarray analysis and tumor classification. New England Journal of Medicine, 354:2463–2472, June 2006.
[25] D. Sculley. Combined regression and ranking. In KDD'10, pages 979–988, 2010.
[26] Y. Shi, D. Ye, A. Goder, and S. Narayanan. A large scale machine learning system for recommending heterogeneous content in social networks. In SIGIR'11, pages 1337–1338, 2011.
[27] M. Steyvers, P. Smyth, and T. Griffiths. Probabilistic author-topic models for information discovery. In KDD'04, pages 306–315, 2004.
[28] J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. In ICDM'05, pages 418–425, 2005.
[29] J. Tang, L. Yao, and D. Chen. Multi-topic based query-oriented summarization. In SDM'09, pages 1147–1158, 2009.
[30] J. Tang, J. Zhang, R. Jin, Z. Yang, K. Cai, L. Zhang, and Z. Su. Topic level expertise search over heterogeneous networks. Machine Learning Journal, 82(2):211–237, 2011.
[31] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. Arnetminer: Extraction and mining of academic social networks. In KDD'08, pages 990–998, 2008.
[32] L. Tang and H. Liu. Relational learning via latent social dimensions. In KDD'09, pages 817–826, 2009.
[33] W. Tang, J. Tang, T. Lei, C. Tan, B. Gao, and T. Li. On optimization of expertise matching with various constraints. Neurocomputing, 76(1):71–83, 2012.
[34] C. Wang and D. M. Blei. Collaborative topic modeling for recommending scientific articles. In KDD'11, pages 448–456, 2011.
[35] Q. Yuan, L. Chen, and S. Zhao. Factorization vs. regularization: fusing heterogeneous social relationships in top-n recommendation. In RecSys'11, pages 245–252, 2011.
[36] J. Zhang, J. Tang, and J. Li. Expert finding in a social network. In DASFAA'07, pages 1066–1069, 2007.
According to the generative process, we can integrate out the multinomial (Bernoulli) distributions $\theta$, $\theta'$, $\vartheta$, $\phi$, and $\lambda$, because the model uses only conjugate priors [10]. We use Eq. 4 as the example to explain its derivation. First we write the joint probability:
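$$P(x, x', z, z', s, v, v' \mid \alpha, \beta, \gamma_t, \gamma, A) \propto \int P(s \mid \lambda) P(\lambda \mid \gamma, \gamma_t)\, d\lambda \int P(v \mid A) P(z \mid v, s, \theta) P(\theta \mid \alpha)\, d\theta \int P(v' \mid A') P(z' \mid v', s, \theta') P(\theta' \mid \alpha)\, d\theta' \int P(x \mid z, \phi) P(\phi \mid \beta)\, d\phi \int P((v, v') \mid A) P(z \mid (v, v'), s, \vartheta) P(\vartheta \mid \alpha)\, d\vartheta \qquad (9)$$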
The conditional of $s_i$ is obtained by dividing the joint distribution of all variables by the joint with all variables but $s_i$ (denoted by $s_{-i}$) and canceling the factors that do not depend on $s_i$.
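$$p(s_i = 0 \mid s_{-i}, z, \cdot) = \frac{P(x, x', z, z', s, v, v' \mid \alpha, \beta, \gamma_t, \gamma, A)}{P(x, x', z, z', s_{-i}, v, v' \mid \alpha, \beta, \gamma_t, \gamma, A)} = \frac{\int P(s \mid \lambda) P(\lambda \mid \gamma, \gamma_t)\, d\lambda}{\int P(s_{-i} \mid \lambda) P(\lambda \mid \gamma, \gamma_t)\, d\lambda} \cdot \frac{\int P((v, v') \mid A) P(z \mid (v, v'), s, \vartheta) P(\vartheta \mid \alpha)\, d\vartheta}{\int P((v, v') \mid A) P(z \mid (v, v'), s_{-i}, \vartheta) P(\vartheta \mid \alpha)\, d\vartheta} \qquad (10)$$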
We now derive the first fraction of Eq. 10. As we assume that $s_i$ is generated from a Bernoulli distribution with parameter $\lambda$, whose Beta prior has parameters $\gamma$ and $\gamma_t$, we get $p(s \mid \lambda) = \prod_d \lambda_d^{n_{ds0}} (1 - \lambda_d)^{n_{ds1}}$, where $n_{ds0}$ is the number of times that s = 0 has been sampled in document d and $n_{ds1}$ is the number of times that s = 1 has been sampled in d. Because Beta is the conjugate prior of Bernoulli, the Bernoulli-Beta integral has a closed form, which is what the Gibbs sampler uses. Specifically,
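$$\int P(s \mid \lambda) P(\lambda \mid \gamma, \gamma_t)\, d\lambda = \prod_d \frac{1}{B(\gamma_t, \gamma)} \int \lambda_d^{\,n_{ds0} + \gamma_t - 1} (1 - \lambda_d)^{\,n_{ds1} + \gamma - 1}\, d\lambda_d = \prod_d \frac{B(n_{ds0} + \gamma_t,\, n_{ds1} + \gamma)}{B(\gamma_t, \gamma)} = \prod_d \frac{\Gamma(n_{ds0} + \gamma_t)\, \Gamma(n_{ds1} + \gamma)\, \Gamma(\gamma_t + \gamma)}{\Gamma(n_{ds0} + n_{ds1} + \gamma_t + \gamma)\, \Gamma(\gamma_t)\, \Gamma(\gamma)} \qquad (11)$$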
To yield the first fraction of Eq. 10, we apply the above equation twice and obtain the following equation:
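$$\frac{\int P(s \mid \lambda) P(\lambda \mid \gamma, \gamma_t)\, d\lambda}{\int P(s_{-i} \mid \lambda) P(\lambda \mid \gamma, \gamma_t)\, d\lambda} = \frac{\prod_d B(n_{ds0} + \gamma_t,\, n_{ds1} + \gamma) / B(\gamma_t, \gamma)}{\prod_d B(n^{-di}_{ds0} + \gamma_t,\, n^{-di}_{ds1} + \gamma) / B(\gamma_t, \gamma)} = \frac{n^{-di}_{ds0} + \gamma_t}{n^{-di}_{ds0} + n^{-di}_{ds1} + \gamma_t + \gamma} \qquad (12)$$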
Here, we use the identity $\Gamma(x + 1) = x\,\Gamma(x)$; the superscript $-di$ denotes a quantity that excludes the current instance. The second fraction of Eq. 10 can be derived analogously. Specifically, as $P((v, v') \mid A)$ is a uniform distribution, and $P(z \mid (v, v'), s, \vartheta)$ and $P(\vartheta \mid \alpha)$ form a Multinomial-Dirichlet conjugate pair, we can obtain [14]:
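$$\int P((v, v') \mid A) P(z \mid (v, v'), s, \vartheta) P(\vartheta \mid \alpha)\, d\vartheta = \prod_d \frac{1}{\Omega(A_d)} \cdot \frac{1}{\Delta(\alpha)} \int \prod_z \vartheta_{vv'z}^{\,n_{vz} + n_{v'z} + n_{vv'z} + \alpha - 1}\, d\vartheta_{vv'} = \prod_d \frac{1}{\Omega(A_d)} \frac{\Delta(n_d + \alpha)}{\Delta(\alpha)}, \quad \text{with } n_d = \{ n_{vz} + n_{v'z} + n_{vv'z} \}_{z=1}^{T} \qquad (13)$$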
where $\Omega(A_d)$ is the total number of cross-domain user pairs generated from the authors of document d (for a specific document, this number is a constant); $\Delta(\alpha) = \Gamma(\alpha)^T / \Gamma(T\alpha)$; $n_{vv'z}$ denotes the number of times that topic z has been sampled by user pair (v, v'); and $n_{vz}$ and $n_{v'z}$ are the two counts obtained when combining the two distributions $\theta_v$ and $\theta'_{v'}$. Please note that although we write the sum of the two counts, in practice, when sampling a specific topic, we consider only one of them. This is because, for example, if a topic z is from the source domain, the count $n_{v'z}$ will be 0. Accordingly, the second fraction of Eq. 10 can be written as:
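$$\frac{\int P((v, v') \mid A) P(z \mid (v, v'), s, \vartheta) P(\vartheta \mid \alpha)\, d\vartheta}{\int P((v, v') \mid A) P(z \mid (v, v'), s_{-i}, \vartheta) P(\vartheta \mid \alpha)\, d\vartheta} = \frac{\prod_d \frac{1}{\Omega(A_d)} \frac{\Delta(n_d + \alpha)}{\Delta(\alpha)}}{\prod_d \frac{1}{\Omega(A_d)} \frac{\Delta(n_{d,\neg i} + \alpha)}{\Delta(\alpha)}} = \frac{\Gamma(n_{vv'z_{di}} + n_{vz_{di}} + n_{v'z_{di}} + \alpha)}{\Gamma\big(\sum_z (n_{vv'z} + n_{vz} + n_{v'z} + \alpha)\big)} \cdot \frac{\Gamma\big(\big[\sum_z (n_{vv'z} + n_{vz} + n_{v'z} + \alpha)\big] - 1\big)}{\Gamma(n_{vv'z_{di}} + n_{vz_{di}} + n_{v'z_{di}} + \alpha - 1)} = \frac{n^{-di}_{vv'z_{di}} + (n_{vz_{di}} + n_{v'z_{di}}) + \alpha}{\sum_z \big(n^{-di}_{vv'z} + (n_{vz} + n_{v'z}) + \alpha\big)} \qquad (14)$$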
Finally, by combining Eqs. 12 and 14, we obtain Eq. 4.
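As a numerical sanity check of the cancellation behind Eq. 12, the Python sketch below compares the closed-form ratio with the explicit ratio of Beta-Bernoulli marginals for a single document. It is only an illustration: the function and argument names (`gamma_t`, `gamma` for the two Beta hyperparameters) are chosen for this example, the counts are made up, and the values 3.0 and 0.1 are the hyperparameter settings quoted in footnote 4.

```python
import math


def log_marginal(n0, n1, gamma_t, gamma):
    """log of B(n0 + gamma_t, n1 + gamma) / B(gamma_t, gamma) for one document,
    i.e. the Bernoulli likelihood with its Beta-distributed parameter integrated out."""
    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_beta(n0 + gamma_t, n1 + gamma) - log_beta(gamma_t, gamma)


def prob_s0_closed_form(n0_excl, n1_excl, gamma_t, gamma):
    """Right-hand side of Eq. 12, using counts that exclude the current instance."""
    return (n0_excl + gamma_t) / (n0_excl + n1_excl + gamma_t + gamma)


def prob_s0_by_ratio(n0_excl, n1_excl, gamma_t, gamma):
    """Left-hand side of Eq. 12: marginal with s_di = 0 included, divided by
    the marginal with the current instance removed."""
    with_instance = log_marginal(n0_excl + 1, n1_excl, gamma_t, gamma)
    without_instance = log_marginal(n0_excl, n1_excl, gamma_t, gamma)
    return math.exp(with_instance - without_instance)


# Made-up counts: 4 other instances with s = 0 and 7 with s = 1 in the document.
closed = prob_s0_closed_form(4, 7, gamma_t=3.0, gamma=0.1)
ratio = prob_s0_by_ratio(4, 7, gamma_t=3.0, gamma=0.1)
```

Working in log space with `math.lgamma` avoids overflow in the Gamma functions for large counts; the closed form is what a sampler would evaluate in practice.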