Empirical Modeling and Data Analysis for Engineers and Applied Scientists


Insofar as choice homophily is concerned, it is not well understood why people tend to select network partners with similar traits. This is further complicated by the fact that different traits might matter differently for homophily, and these differences are likely to depend heavily on context. For example, a person might select their friends based on similarity in political views but not based on similarity in musical tastes.

Third, homophily is not just a matter of tie formation. Choice homophily in particular may also influence decisions to terminate social ties, or to avoid certain relationships altogether. In the extensive literature on homophily, scholars almost exclusively consider homophily in terms of the hypothesis that similar individuals tend to form network ties, rather than the hypothesis that dissimilar individuals tend to dissolve or avoid ties. Proof of the former statement (the classic understanding of homophily) does not imply proof of the latter.

Many communities exhibit racial segregation, such that people of similar races tend to live close to one another. Starting from this observation, a natural explanation is that people have a strict preference for living in a racially homogeneous neighborhood; that is, individuals exhibit attraction homophily in their choice of neighborhood, based on race. Such a strong preference is not necessary to produce segregation, however. Rather, it is sufficient to assume that individuals have a slight bias against being a minority in their local neighborhood; that is, people exhibit a small amount of aversion homophily in their choice of residential neighborhood, based on race.

When this process is modeled in a network context, it can be shown that slight levels of aversion homophily are sufficient to produce global patterns of network segregation and community structure that are characteristic of many real-world networks [56]. In this paper, we view both types of homophily as independent processes that influence the structure of observed networks. The first is the deletion of ties as a result of aversion homophily (Fig 1, process 1). The second is the formation of new ties as a result of attraction homophily (Fig 1, process 2).
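The segregation argument above is classically illustrated with Schelling's grid model. The sketch below is a minimal version written only for illustration; it is not the network model of [56].

Agents of two types sit on a grid, and an agent is unhappy only when fewer than `threshold` of its neighbors share its type. Unhappy agents move to a random empty cell. The grid size, the 10% vacancy rate, and the 0.3 tolerance threshold are hypothetical choices.

```python
import random

def neighbor_types(grid, size, r, c):
    """Types of the occupied cells among the 8 cells around (r, c); edges wrap around."""
    types = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0):
                v = grid[(r + dr) % size][(c + dc) % size]
                if v is not None:
                    types.append(v)
    return types

def same_share(grid, size, r, c):
    """Share of occupied neighbors with the same type as the agent at (r, c); 1.0 if none."""
    types = neighbor_types(grid, size, r, c)
    if not types:
        return 1.0
    return sum(t == grid[r][c] for t in types) / len(types)

def mean_similarity(grid, size):
    """Average same-type share over all agents: a simple segregation index."""
    shares = [same_share(grid, size, r, c)
              for r in range(size) for c in range(size) if grid[r][c] is not None]
    return sum(shares) / len(shares)

def schelling(size=30, empty_frac=0.1, threshold=0.3, steps=50, seed=1):
    """Run the model and return (segregation index before, segregation index after)."""
    rng = random.Random(seed)
    n = size * size
    n_empty = int(n * empty_frac)
    n_agents = n - n_empty
    cells = [None] * n_empty + [0] * (n_agents // 2) + [1] * (n_agents - n_agents // 2)
    rng.shuffle(cells)
    grid = [cells[i * size:(i + 1) * size] for i in range(size)]
    before = mean_similarity(grid, size)
    for _ in range(steps):
        # Mild aversion: unhappy only if fewer than `threshold` of neighbors are alike.
        unhappy = [(r, c) for r in range(size) for c in range(size)
                   if grid[r][c] is not None and same_share(grid, size, r, c) < threshold]
        if not unhappy:
            break
        empties = [(r, c) for r in range(size) for c in range(size) if grid[r][c] is None]
        rng.shuffle(unhappy)
        for r, c in unhappy:
            # Relocate to a randomly chosen empty cell; the vacated cell becomes empty.
            k = rng.randrange(len(empties))
            er, ec = empties[k]
            grid[er][ec], grid[r][c] = grid[r][c], None
            empties[k] = (r, c)
    return before, mean_similarity(grid, size)

print(schelling())
```

The two returned numbers compare the average share of same-type neighbors before and after the moves. The initial share is near one half because the two types are mixed at random. Varying `threshold` shows how little local bias is needed before clusters appear.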

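As noted above, we study homophily in terms of the various tastes that people have regarding Web browsing behavior and mobile application usage. We therefore consider two homophily hypotheses:

H1: In a social network, actors tend to form ties with others who have similar attributes (attraction homophily).

H2: In a social network, actors tend to dissolve ties with others who have dissimilar attributes (aversion homophily).

While actor attributes, such as tastes, shape network structures, networks also have a reciprocal effect on attributes. Many attributes that drive network structures are malleable, particularly cognitive and behavioral attributes such as opinions, values, and cultural traits.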

Social influence is depicted as process 3 in Fig 1. Scholars have long been interested in developing models of social influence, with early applications in the modeling of how groups with varied opinions on an issue eventually reach a consensus [58]. More recent work seeks to reconcile the idea that the opinions of connected individuals tend to become more similar over time with the observation that we live in a diverse society where people hold diverse opinions.

If individuals always adopt the opinions or other traits of their neighbors, the result is inevitably consensus, unless networks are fragmented to such an extent that certain groups of people never interact with one another [23, 59]. There are, however, theoretical mechanisms that allow diversity to emerge even when social influence is at work.

Applications of social influence models are not limited to investigations of why people hold certain opinions, attitudes, and beliefs, which are the focus of scholars such as [57]. Models of social influence may also be applied to related phenomena, including:

- the adoption of new interests and tastes in an online social network [22];
- the spread of emotions due to online interactions [60, 61];
- the adoption of communications technologies [20];
- sustainability behaviors [62, 63];
- the diffusion of innovations across organizations [64] and governments [65].
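The claim that always adopting neighbors' views yields consensus, unless the network is fragmented, can be checked with a DeGroot-style averaging model. This is one classic formalization of consensus formation; it is not necessarily the specific model cited as [58].

The minimal pure-Python sketch below uses two four-actor networks and starting opinions that are made-up examples. The functions `row_normalize` and `degroot` are hypothetical helpers.

```python
def row_normalize(adj):
    """Turn a 0/1 adjacency matrix (self-loops included) into averaging weights; rows sum to 1."""
    return [[a / sum(row) for a in row] for row in adj]

def degroot(weights, opinions, steps=500):
    """Each step, every actor's opinion becomes the weighted average of its neighbors' opinions."""
    x = list(opinions)
    n = len(x)
    for _ in range(steps):
        x = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x

# A connected path 0-1-2-3 versus two groups {0, 1} and {2, 3} that never interact.
connected = row_normalize([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]])
fragmented = row_normalize([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]])
start = [0.0, 0.2, 0.8, 1.0]
print(degroot(connected, start))   # every entry is (numerically) 0.5: consensus
print(degroot(fragmented, start))  # about 0.1 in one group and 0.9 in the other
```

In the connected case the common value is the degree-weighted mean of the starting opinions. In the fragmented case each group settles on its own average, so diversity survives only because the groups never interact.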

H3: In a social network, actors tend to adopt the attributes of others with whom they share direct connections.

This hypothesis is not expected to apply to every situation where actors exercise, or are prone to, social influence in a network setting. For instance, when individuals learn about scientific information characterized by uncertainty, they may place different levels of trust in different information sources due to biased assimilation [66–68]. This may create and reinforce polarization in a social network [21].

Another important caveat to this hypothesis is that social influence effects are likely to be mediated by various properties of the network. For instance, social influence, measured as the degree of convergence in tastes between two actors, is likely to be stronger when actors have higher levels of trust or more frequent interactions. Actors are therefore more likely to adopt the tastes of others who are more similar to themselves, or with whom they share both direct and indirect ties [46, 69].

Moreover, actors are influenced not only by those in their immediate network neighborhood but likely also by global shifts. Research on copycat suicides, that is, suicides resulting from social influence, illustrates these local and global influences. Such suicides may be attributed to interpersonal connections as well as to news about suicides committed by prominent individuals [70].

This possibility is examined below in the analysis of a large empirical dataset on social connections and tastes. To test the above hypotheses of homophily and social influence in networks, we examine a unique dataset on instant messaging and user tastes from a popular mobile phone platform in the United States. Attributes of network nodes are viewed as a combination of Web browser activity and application download history. While taste categories form a tree structure, in this paper we examine only the seventeen top-level taste categories: lifestyle, finance, business, entertainment, ringtones, photo, news, utilities, health, sports, themes, books, games, social, productivity, education and navigation.

The company uses a proprietary algorithm to infer its users' tastes automatically, without any intervention, by mining their activities on the system [71, 72]. This algorithm assigns taste scores between 0 and 1 to individual users over time in each of the seventeen taste categories. Higher values indicate stronger interest in Web searches and applications relevant to the given category.

This provides a dynamic view of how the interests and behaviors of users change over time. While the details of the algorithm are private to the mobile device company, inferring tastes from application and Web use is a widespread practice [71, 72]. These data are typically used to model customer characteristics and provide personalized services. They are particularly useful when direct information about mobile device users cannot be obtained due to privacy concerns.

Tastes are estimated only for users with activity in a given month, and only for users who allow their usage history to be tracked. Privacy settings allow users to prevent the mobile company from recording their usage history.

Furthermore, the mobile phone company made this dataset available for analysis only after it had been completely anonymized. The dataset used here is drawn from two snapshots of the contact network and user taste attributes at two points in time: October 19 and November 22. The analysis includes all mobile platform users in the United States with taste data in a given month. This excludes all users who had no recorded activity, as well as those who used their privacy settings to prevent tracking of their usage history.

Table 1 provides descriptive statistics for the two network snapshots, and degree distributions for both networks are depicted in Fig 2. Between the two time periods, there were approximately , removals of ties and 1. If we examine the full network (and not only the users included in this analysis), we see that the network grew in size between the two time periods.

As seen in Fig 2, both network snapshots are heavy-tailed, with a maximal degree of about six thousand. Both are also sparse, with an average degree of 3. These characteristics are typical of communication networks among mobile phone users [73, 74].
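The degree statistics mentioned above can be computed directly from a list of ties. The sketch below assumes the contact network is undirected; the text does not say whether ties are directed.

`degree_stats` is a hypothetical helper, and the four-actor example is made up. A heavy tail would show up in the histogram as many low-degree actors and a few actors with very high degree.

```python
from collections import Counter

def degree_stats(edges):
    """Summarize an undirected contact network given as a list of (u, v) ties.

    Assumes no self-loops or duplicate ties; actors without any tie do not appear.
    Returns (number of actors, average degree, maximum degree, degree histogram).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    average = 2 * len(edges) / n  # each tie adds 1 to the degree of two actors
    histogram = dict(sorted(Counter(degree.values()).items()))  # degree -> number of actors
    return n, average, max(degree.values()), histogram

print(degree_stats([("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]))
# (4, 2.0, 3, {1: 1, 2: 2, 3: 1})
```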

These data are used to fit statistical models that test our hypotheses about two things: how attribute similarity or difference influences tie formation or deletion, and how tie structures influence changes in tastes. The basic logic underlying these models is to propose a general functional form in which the dependent variable is either a logistic or a linear function of one or more independent variables. The dependent variable is tie formation or deletion in the case of homophily, and taste change in the case of social influence.

These independent variables include the primary causal effect of interest, i.e., attribute similarity for the homophily hypotheses and tie existence for the social influence hypothesis. Crucially, the temporal ordering of our observations across two closely spaced time periods allows us to use these models to infer causal relationships. In particular, the homophily hypotheses are tested with the following functional forms:

- the dependent variable is the change in network ties between the first and second time periods;
- the main independent variable is attribute similarity in the first time period.

The social influence hypothesis is tested with a functional form in which:

- the dependent variable is the change in tastes between the first and second periods;
- the main independent variable is tie existence or absence in the first time period.

The functional forms used to test these hypotheses encode assumptions about which variables might influence each dependent variable. They also include parameters (weights on each independent variable) that represent the strength of each causal effect in the model.

Models are estimated using standard statistical methods, namely linear and logistic regression, in which parameters are chosen to best fit the observed dependent variables [75, 76]. Linear models are fitted by minimizing squared prediction errors (ordinary least squares), and logistic models by maximizing the likelihood.

The following sections provide more detail on the precise functional forms used to test our three hypotheses. The structure of these data allows us to explicitly test the two distinct views of homophily discussed above: homophily as a process of attraction to others with similar attributes (H1), and homophily as a process of aversion from others with dissimilar attributes (H2).

To do this, we take all dyads (i.e., pairs of actors) and sort them into four groups according to whether a tie was present in each time period. Each group identifies what happened to the tie in that dyad:

- Group 0–1: a newly formed tie;
- Group 1–0: a deleted tie;
- Group 1–1: a tie present in both time periods (stable ties);
- Group 0–0: a tie absent in both time periods (stable non-ties).
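The four-way grouping of dyads can be sketched from two tie lists, one per snapshot. `classify_dyads` is a hypothetical helper, and its group labels use ASCII hyphens ("0-1" for Group 0–1).

Enumerating every pair takes time proportional to n², so this only works for small examples. That cost is one reason the analysis works with a random sample of dyads.

```python
from itertools import combinations

def classify_dyads(actors, ties_t1, ties_t2):
    """Assign every unordered pair of actors to Group '0-0', '0-1', '1-0' or '1-1'.

    The first digit records whether the tie exists at time 1, the second at time 2.
    """
    t1 = {frozenset(t) for t in ties_t1}
    t2 = {frozenset(t) for t in ties_t2}
    groups = {"0-0": [], "0-1": [], "1-0": [], "1-1": []}
    for u, v in combinations(sorted(actors), 2):
        pair = frozenset((u, v))
        groups[f"{int(pair in t1)}-{int(pair in t2)}"].append((u, v))
    return groups

groups = classify_dyads("abcd", [("a", "b"), ("b", "c")], [("a", "b"), ("c", "d")])
print(groups)
# {'0-0': [('a', 'c'), ('a', 'd'), ('b', 'd')], '0-1': [('c', 'd')], '1-0': [('b', 'c')], '1-1': [('a', 'b')]}
```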

For each dyad in the system, we also calculate two alternative measures of how different the two actors are in terms of their attributes: Hamming distance and Euclidean distance. We use a measure of difference rather than similarity because zero has a natural interpretation in a difference measure: a difference of zero means two actors have identical tastes. Hamming distance is based on the number of shared tastes, while Euclidean distance is computed by weighting the shared tastes. Since tastes change between the two time periods, the taste difference variable is recalculated for each snapshot. Given both the taste difference measures and the grouping of dyads, we are able to test the homophily hypotheses according to the following logic.
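The text does not specify how continuous taste scores in [0, 1] are turned into "shared tastes" for the Hamming distance. The sketch below assumes a hypothetical cutoff of 0.5 for counting a taste as held, while the Euclidean distance uses the raw scores.

The names `HELD`, `hamming_distance`, and `euclidean_distance` are illustrative. Only three of the seventeen categories are used, with made-up scores.

```python
import math

HELD = 0.5  # hypothetical cutoff: a taste counts as "held" when its score is at least this value

def hamming_distance(u, v, cutoff=HELD):
    """Number of taste categories held by exactly one of the two users."""
    return sum((a >= cutoff) != (b >= cutoff) for a, b in zip(u, v))

def euclidean_distance(u, v):
    """Euclidean distance between the two users' raw taste-score vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

u = [0.9, 0.1, 0.6]
v = [0.8, 0.7, 0.2]
print(hamming_distance(u, v))              # 2: the users disagree on the 2nd and 3rd tastes
print(round(euclidean_distance(u, v), 3))  # 0.728
print(hamming_distance(u, u), euclidean_distance(u, u))  # 0 0.0: identical tastes
```

Both measures are zero for identical taste vectors, which is the natural zero point that motivates using a difference rather than a similarity.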

The attraction hypothesis is tested by fitting a logistic regression model (Eq 1) on dyads in Groups 0–0 and 0–1. In this model, the probability of tie formation is a function of taste difference in the first time period. Two important issues with this approach require discussion.

First, logistic regression assumes that observations are independent of one another. In other words, it assumes that the probability of tie formation between one pair of actors is uncorrelated with tie formation between any other two actors in the system. This is not a good assumption here, especially given that the same actors appear in multiple dyads.
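A minimal version of this kind of model assumes a single covariate, the first-period taste difference d, and no control variables:

P(tie forms) = 1 / (1 + exp(-(b0 + b1 d)))

H1 predicts b1 < 0. The same fit applied to Groups 1–0 and 1–1, with deletion as the outcome, would test H2, which predicts b1 > 0.

The sketch below fits the model by Newton-Raphson maximum likelihood in pure Python. The data are synthetic and the "true" coefficients are made up; a real analysis would typically use a statistics package.

```python
import math
import random

def fit_logistic(x, y, iters=25):
    """Maximum-likelihood fit of P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p               # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                   # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det  # Newton step using the closed-form 2x2 inverse
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Synthetic dyads: x = Hamming distance (0 to 17), y = 1 if a tie formed; made-up b0 = 1.0, b1 = -0.3.
rng = random.Random(0)
x = [rng.randint(0, 17) for _ in range(5000)]
y = [1 if rng.random() < 1 / (1 + math.exp(-(1.0 - 0.3 * xi))) else 0 for xi in x]
b0, b1 = fit_logistic(x, y)
print(round(b1, 2), round(math.exp(b1), 2))  # slope, and odds multiplier per extra differing taste
```

With Hamming distance as the covariate, exp(b1) is the factor by which the odds of tie formation change for each additional differing taste category. This is why the Hamming-distance models are the easier ones to interpret.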

To deal with this problem, we fit models only on a random sample of dyads in the network. This sample is a relatively small percentage of all actor pairs, spread out over an entire national market. The chance that the same actor appears in more than one sampled dyad is therefore very small, and we can safely assume that the dyads included in this analysis are, for the most part, independent.
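A minimal sketch of the sampling step follows, along with a check of how often the same actor reappears across sampled dyads. The helper names and the population and sample sizes are hypothetical; the text does not give the actual sampling fraction.

```python
import random
from collections import Counter

def sample_dyads(actors, k, seed=0):
    """Draw k distinct unordered pairs of distinct actors uniformly at random.

    Rejection sampling avoids enumerating all n*(n-1)/2 pairs.
    """
    rng = random.Random(seed)
    actors = list(actors)
    n = len(actors)
    if k > n * (n - 1) // 2:
        raise ValueError("k exceeds the number of possible dyads")
    chosen = set()
    while len(chosen) < k:
        i, j = rng.sample(range(n), 2)      # two different actors
        chosen.add((min(i, j), max(i, j)))  # store the pair in canonical order
    return [(actors[i], actors[j]) for i, j in sorted(chosen)]

def repeated_actors(dyads):
    """Number of actors that appear in more than one sampled dyad."""
    counts = Counter(a for dyad in dyads for a in dyad)
    return sum(1 for c in counts.values() if c > 1)

sample = sample_dyads(range(100_000), 500)
print(len(sample), repeated_actors(sample))  # repeats are rare when k is small relative to n
```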


The second issue is that Group 0–0 is extremely large compared to Group 0–1. Comparing samples drawn from both groups implicitly assumes that all disconnected actor pairs have an equal opportunity to form a link between the two time periods, which is unrealistic. We therefore need a way to narrow the population of stable non-tie dyads (Group 0–0) to a subgroup of dyads that we can realistically assume had an opportunity to form a link between the two time steps.

Our approach is to look at the geodesic path distances between actor pairs in Group 0–0 in the first time step, and compare them with the geodesic path distances between actor pairs in Group 0–1 (newly formed ties).


These distributions, depicted in Fig 3, suggest that there is a maximum path length beyond which it can be safely assumed that actors are in completely different communities, with no opportunity to form ties with one another. In Fig 3, the orange region represents the density of geodesic path lengths between disconnected actors who formed a tie before the second time period (members of Group 0–1). The white region represents the geodesic path lengths between disconnected actors who did not form a tie before the second time period (members of Group 0–0).

Examining differences between these distributions enables us to develop a heuristic, based on path length, for whether two actors had an opportunity to form a tie. The 98th percentile of path lengths among actors with newly formed ties (Group 0–1) is six. This means that fewer than two percent of all actors who formed a tie started with more than six degrees of separation.
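The path-length heuristic can be sketched in three steps:

1. Compute time-1 geodesic distances with breadth-first search.
2. Take the 98th percentile of the distances observed for newly formed ties.
3. Keep only the Group 0–0 dyads that lie within that distance.

The nearest-rank percentile definition, the toy path network, and the distance values are illustrative assumptions, as are the helper names. On a national-scale network each search would need to stop at the cutoff depth, which is what the `cutoff` argument does.

```python
import math
from collections import deque

def bfs_distances(adj, source, cutoff=None):
    """Geodesic (unweighted shortest-path) lengths from source, optionally stopping at cutoff."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if cutoff is not None and dist[u] >= cutoff:
            continue  # nodes at the cutoff depth are recorded but not expanded
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def percentile_nearest_rank(values, q):
    """q-th percentile (0 < q <= 100) by the nearest-rank definition."""
    ordered = sorted(values)
    return ordered[math.ceil(q / 100 * len(ordered)) - 1]

def plausible_non_ties(adj, non_tie_dyads, max_len):
    """Keep Group 0-0 dyads within max_len steps at time 1 (unreachable pairs are dropped)."""
    return [(u, v) for u, v in non_tie_dyads
            if bfs_distances(adj, u, cutoff=max_len).get(v) is not None]

# Toy time-1 network: a path 1-2-3-4-5-6-7-8.
adj = {i: [j for j in (i - 1, i + 1) if 1 <= j <= 8] for i in range(1, 9)}
new_tie_lengths = [2, 3, 3, 4, 5, 6]  # made-up time-1 distances for Group 0-1 dyads
cut = percentile_nearest_rank(new_tie_lengths, 98)
print(cut)                                                              # 6
print(plausible_non_ties(adj, [(1, 3), (1, 8), (2, 5)], max_len=cut))  # [(1, 3), (2, 5)]
```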

Based on this heuristic, we exclude from Group 0–0 all dyads whose actors are separated by more than six degrees. This narrows the universe of Group 0–0 dyads to those that plausibly had an opportunity to form a tie. As predicted by H1, we find strong negative coefficients on both measures of taste difference: as taste differences in the first time period increase, the probability of tie formation between the two time periods decreases.

Testing the aversion aspect of homophily is more straightforward, since it is not necessary to estimate the population of ties that are candidates for deletion.

Dyads in Groups 1–0 and 1–1 all have an existing tie in the first time period, and all of these ties are treated as candidates for deletion. As in the analysis reported above, the probability of tie deletion is estimated as a logistic function of attribute differences in the first time period (Eq 2), using dyads drawn from Groups 1–0 and 1–1. Results from this logistic regression analysis are reported as Models 3 and 4 in Table 3. As with H1, these results lend strong support to the aversion hypothesis (H2): as taste differences increase, the probability of tie deletion also increases.

This effect is also represented graphically in Fig 4. The models including Hamming distance allow for a slightly easier interpretation, since their coefficients tell us how dissimilarity in each additional taste dimension changes the log-odds of tie formation or deletion.

The longitudinal nature of this dataset also allows us to directly test the social influence hypothesis (H3), that individuals tend to adopt the attributes of those they are connected to in a social network.

In the context of this study, we examine whether changes in tastes between the two time periods can be explained by social contacts in the first time period.


The constant coefficients a and b are weights that indicate the degree to which a user is influenced by these local and global trends. A value of zero for a or b indicates that users are not at all influenced by shifts in local or global tastes, respectively, and a value of one indicates that users are strongly influenced by shifts in these tastes. The model (Eq 3) expresses changes in user tastes between the two time periods as a linear combination of observed local and global shifts in tastes. The coefficients a and b may therefore be estimated using standard ordinary least squares (OLS) regression.
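The exact definitions of the local and global shifts in Eq 3 are not given in this section, so the sketch below makes assumptions about them. For each user and taste, it assumes:

- a local regressor: the change in the average taste score of the user's time-1 contacts;
- a global regressor: the change in the population-wide average;
- no intercept.

The data and the "true" values a = 0.4 and b = 0.8 are made up, and `ols_two` is a hypothetical helper that solves the 2x2 normal equations.

```python
import random

def ols_two(y, x1, x2):
    """Least-squares a, b for y = a * x1 + b * x2 (no intercept), from the 2x2 normal equations."""
    s11 = sum(p * p for p in x1)
    s22 = sum(q * q for q in x2)
    s12 = sum(p * q for p, q in zip(x1, x2))
    s1y = sum(p * t for p, t in zip(x1, y))
    s2y = sum(q * t for q, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    a = (s22 * s1y - s12 * s2y) / det
    b = (s11 * s2y - s12 * s1y) / det
    return a, b

# Synthetic users for one taste category (all values made up).
rng = random.Random(42)
local = [rng.gauss(0.0, 0.1) for _ in range(2000)]     # shift in contacts' average taste
global_ = [rng.gauss(0.05, 0.02) for _ in range(2000)]  # population-wide shift seen by each user
shift = [0.4 * loc + 0.8 * glo + rng.gauss(0.0, 0.01) for loc, glo in zip(local, global_)]
a, b = ols_two(shift, local, global_)
print(round(a, 2), round(b, 2))  # estimates close to the made-up a = 0.4, b = 0.8
```

If the global shift were the same number for every user, its column would act like a rescaled intercept. In that case b would be identified only through the average taste change.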

Table 4 reports the results of seventeen regression models, one for each measured user taste. These results suggest that the tastes of users are indeed influenced by their social environment. As noted above, other aspects of the network may moderate these processes of social influence; that is, depending on their local network, some users may be more or less susceptible to social influence.


One such factor is node degree: users with more contacts tend to experience smaller shifts in their own tastes. Shifts in four tastes are shown here for illustrative purposes; however, this decreasing trend is seen across all measured tastes. This suggests that as individuals are exposed to greater numbers of social contacts, they tend to be less influenced by their social environment.
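The figure summarizes taste shifts conditional on degree with boxplots. The sketch below computes a simpler summary for one taste: the median absolute shift per degree bin, which corresponds to the center line of each box.

The bin edges, the four-user example, and the helper name `shift_by_degree` are hypothetical.

```python
from collections import defaultdict
from statistics import median

def shift_by_degree(degree, shift, bin_edges):
    """Median absolute taste shift per degree bin.

    degree: user -> time-1 degree; shift: user -> change in one taste score;
    bin_edges: sorted lower bin edges; a user joins the largest edge not above its degree.
    """
    groups = defaultdict(list)
    for user, k in degree.items():
        lower = max((e for e in bin_edges if e <= k), default=None)
        if lower is not None and user in shift:
            groups[lower].append(abs(shift[user]))
    return {lower: median(vals) for lower, vals in sorted(groups.items())}

degree = {"u1": 1, "u2": 1, "u3": 3, "u4": 12}
shift = {"u1": 0.2, "u2": -0.4, "u3": 0.1, "u4": 0.02}
print(shift_by_degree(degree, shift, bin_edges=[1, 2, 5, 10]))
```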

In the accompanying figure, boxplots represent distributions of individual user shifts in tastes between October 19th and November 22nd, conditional on each user's degree on October 19th.

The reservoirs are interconnected in that they are placed along major waterways, with some downstream of others. They are also interconnected because they can receive similar amounts of water input and can be subject to similar management decisions.

The managers of these facilities must maintain a baseline amount of water in each reservoir. As the water level drops closer to that minimum mark, they dial back the amount of water released, which in turn affects all of the reservoirs downstream. Reservoir managers try to avoid having to shut off the water release completely, since that can have catastrophic consequences for farms and communities that rely on the water. The behavior of a reservoir—the rising and falling of the water level—is determined in part by shifts in the climate and in part by the humans managing the outflow of the reservoir.

These two components can make reservoir storage challenging to predict. "But the behavior of these reservoirs is not solely determined by physical laws of the water cycle, but also by demands and what these reservoirs are being used for," says Caltech graduate student Armeen Taeb. Taeb is lead author of a paper about the model that was published on November 22 by the journal Water Resources Research.

To solve this issue, Taeb and his colleagues used statistical techniques to learn from the past to shed light on how reservoirs will respond to different climate patterns in the future. His colleagues were Venkat Chandrasekaran, professor of computing and mathematical sciences and electrical engineering at Caltech, and John Reager and Michael Turmon of JPL.


They compared fluctuations in reservoir water levels between and to a variety of factors, such as precipitation, the severity of the drought, the snowpack levels in the Sierras, and levels of other California reservoirs. The researchers found that the biggest predictor of changes in the reservoir network was the Palmer Drought Severity Index, which was developed by the National Weather Service in With this empirical model, Taeb says, managers can get a clearer picture of the demands that will be placed on their reservoirs, and can adjust their behavior earlier by curtailing water releases more gradually—reducing the possibility of having to cut off water releases altogether.

Work by Hassan's team is already integrated into products used by millions of users worldwide. Hassan is the named inventor of patents in several jurisdictions around the world, including the United States, Europe, India, Canada, and Japan.

Validity oriented:

- replication and repeatability of previous work using predictive modelling in software engineering;
- assessment of measurement metrics for reporting the performance of predictive models;
- evaluation of predictive models with industrial collaborators.

We invite all kinds of empirical studies on the topics of interest.

Both positive and negative results are welcome, though negative results should still be based on rigorous research and provide details on lessons learned. It is encouraged, but not mandatory, that conference attendees contribute the data used in their analysis on-line.


Submissions can be of the following kinds:

Journal Special Issue

Following the conference, the authors of the best papers will be invited for consideration in a special issue of the Empirical Software Engineering journal by Springer.

Keynote

Prof.