
April 10, 2011 Salzburg, Austria - WOMBAT project


Attribute    Cardinality   H(A)     I(·, A)
-            -             -        2.436 × 10^-2
Type         4             170.8    7.705 × 10^-3
Country      12            250.9    1.496 × 10^-2
Category     15            258.5    1.401 × 10^-2
l            48            297.9    1.288 × 10^-2
Dept         79            328.8    1.345 × 10^-2
Title        249           319.0    3.387 × 10^-3
Table 3: Entropy of Customer 1

Attribute    Cardinality   H(A)     I(·, A)
-            -             -        9.226 × 10^-3
C            2             8.14     6.454 × 10^-6
G            2             72.70    4.263 × 10^-4
B            6             97.96    7.820 × 10^-4
A            48            845.86   7.370 × 10^-3
M            53            816.69   5.131 × 10^-3
H            254           878.00   3.663 × 10^-3
X            756           858.61   4.033 × 10^-17
L            849           857.27   4.661 × 10^-17
Table 4: Entropy of Customer 2

attribute assignment in bits. The results are shown in Tables 3–5 for customer 1, customer 2, and customer 3, respectively. To help further analyze the results of our entropy calculations, we also plot the distribution of the twenty most frequent values for each attribute in Figures 3–5. If an attribute has more than twenty values, the tail is aggregated into a final black bar.

We expect that attributes that significantly decrease the entropy of permission assignments, and have low entropy themselves, are likely to be useful for making access control predictions and attributing authorizations. Attributes with high entropy, or high cardinality, are unlikely to be useful. In [13] it was illustrated how an attribute that significantly reduces the entropy of permission assignments, such as the user's last name, may be semantically meaningless from a security context. If the data are too heavily anonymized, including the attribute names, it may be difficult to make such a distinction.

There are several interesting observations we can draw from the entropy of user-attributes and the entropy of permission assignments conditioned on attributes. For example, consider the C and G attributes from the customer 2 dataset. Both are binary variables, yet the G attribute has almost 70 times more entropy for the dataset, and decreases permission uncertainty by two orders of magnitude more.
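As a rough illustration of how two binary attributes can differ this much, the sketch below computes the Shannon entropy of an attribute and the reduction in the entropy of a single permission once the attribute is known. It is only a sketch under stated assumptions: the excerpt does not show how H(A) or I(·, A) are normalized, so the per-user definitions used here are assumed, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy_bits(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def conditional_entropy_bits(targets, attrs):
    """H(target | attr): per-group entropy of `targets`, weighted by group size."""
    n = len(targets)
    groups = {}
    for t, a in zip(targets, attrs):
        groups.setdefault(a, []).append(t)
    return sum(len(g) / n * entropy_bits(g) for g in groups.values())


# Hypothetical toy data: 8 users, one permission bit, two binary attributes.
permission = [1, 1, 1, 1, 0, 0, 0, 0]
balanced = ["x", "x", "x", "x", "y", "y", "y", "y"]  # splits the users evenly
skewed = ["x", "x", "x", "x", "x", "x", "x", "y"]    # almost all users share one value

for name, attr in [("balanced", balanced), ("skewed", skewed)]:
    h_attr = entropy_bits(attr)
    reduction = entropy_bits(permission) - conditional_entropy_bits(permission, attr)
    print(name, round(h_attr, 3), round(reduction, 3))
```

On this toy data the skewed attribute has both lower entropy and a smaller entropy reduction than the balanced one, even though both are binary.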
By looking at the distribution of these attributes in Figure 4, we can see there are very few users with the second attribute type. Other interesting properties are found in customer 2, such as the distributions of attributes X, H, and L, which are almost completely contained within the long tail.

[Figure 3: Customer 1 User Attribute Distributions. Panels: Country, Dept, Category, Type, Title, and one unlabeled panel.]

Attribute      Cardinality   H(A)      I(·, A)
-              -             -         7.181 × 10^-3
Contractor     2             592.1     1.02 × 10^-3
Organization   12            3132.6    5.87 × 10^-3
Level          17            618.2     1.50 × 10^-3
Location       53            3278.4    6.85 × 10^-3
Dept           192           3195.8    6.01 × 10^-3
Manager        298           3152.2    4.87 × 10^-3
Title          525           3184.1    4.53 × 10^-3
Table 5: Entropy of Customer 3

4.2 Prediction Using Attributes

In the previous section we used the entropy-reduction measure from Frank et al. [6] to predict which attributes will provide the best predictive performance. Previously, Molloy et al. [13] illustrated how to use user-attributes to detect errors in access control policies using collective matrix factorization [14], a process that learns factorizations of both the user-permission and the user-attribute relations such that they share a common factor.

In matrix decomposition, one decomposes a matrix $Y$, such as the user-permission relation, into two matrices such that $Y \approx AB^{T}$. If $A$ and $B$ are binary, these are the user-assignment and permission-assignment relations in RBAC; otherwise they have limited semantic utility. In collective matrix factorization we have a second matrix $X$ (or more), such that $X$ and $Y$ share a dimension (in this instance, the users). We wish to decompose both matrices such that they share a common factor, $X \approx CA^{T}$ and $Y \approx AB^{T}$, where the factor $A$ is shared. When decomposing a single matrix, one is interested in minimizing a distance function $D(Y \,\|\, AB^{T})$, such as the Frobenius norm in singular value decomposition.
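The single-matrix case can be sketched as follows. This is a minimal illustration, not the method used in the paper: it fits a rank-1 factorization $Y \approx ab^{T}$ by alternating least squares under the Frobenius distance, using only the standard library. The function names and the toy user-permission matrix are hypothetical, and the code assumes a nonzero, nonnegative matrix so that neither factor collapses to zero.

```python
def rank1_als(Y, iterations=20):
    """Alternating least squares for Y ~ a b^T under squared Frobenius loss.

    Assumes Y is a nonzero, nonnegative list-of-lists matrix.
    """
    rows, cols = len(Y), len(Y[0])
    a = [0.0] * rows
    b = [1.0] * cols  # arbitrary positive starting point
    for _ in range(iterations):
        # With b fixed, the least-squares optimum for each a[i] has a closed form.
        bb = sum(v * v for v in b)
        a = [sum(Y[i][j] * b[j] for j in range(cols)) / bb for i in range(rows)]
        # With a fixed, the same holds for each b[j].
        aa = sum(v * v for v in a)
        b = [sum(Y[i][j] * a[i] for i in range(rows)) / aa for j in range(cols)]
    return a, b


def frobenius_error(Y, a, b):
    """Frobenius norm of the residual Y - a b^T."""
    return sum(
        (Y[i][j] - a[i] * b[j]) ** 2
        for i in range(len(Y))
        for j in range(len(Y[0]))
    ) ** 0.5


# Hypothetical user-permission matrix: users 0 and 1 hold permissions 0 and 2.
Y = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 0, 0],
]
a, b = rank1_als(Y)
print(frobenius_error(Y, a, b))
```

Because this toy matrix is exactly rank 1, the residual is numerically zero; a matrix with more structure would leave a nonzero residual that a higher-rank factorization could reduce.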
When we have two (or more) relations, we minimize a linear combination of their losses, $\alpha D(X \,\|\, CA^{T}) + (1 - \alpha) D(Y \,\|\, AB^{T})$. Here $\alpha$ is a mixing parameter that weights the importance of reconstructing $X$ versus $Y$. Using collective matrix factorization, we can find a good value of $\alpha$ and balance the utility of reconstructing the user-permission relation; from a security standpoint, this is the most important relation. We use logistic PCA to reconstruct both the user-permission and user-attribute relations, which have been converted to binary relations. We hold out 20% of the data and measure the mean absolute error (MAE) of the zero-one loss for reconstructing the binary relation. This measures how well we can predict missing values and how well we can model the access control policies. The results, varying the mixing parameter $\alpha$, are shown in Figure 6, and the improvement in predictive performance is given in Table 6. There is a correlation of 0.68 between I(·, A) and the predictive improvement.

[Figure 4: Customer 2 User Attribute Distributions. Panels: A, C, G, X, B, H, L, M.]

Attribute      Predictive Improvement   Std. Dev.
Manager        20.34%                   0.14
Department     25.41%                   0.24
Title          15.03%                   0.02
Location       21.44%                   0.14
Organization   18.53%                   0.51
Level          18.59%                   0.1
Contractor     12.25%                   0.24
Table 6: The total entropy of the user-permission relation given knowledge of a user's attribute.

While collective matrix factorization worked well for the customer 3 dataset, where key attributes such as manager, department, and location each decreased the reconstruction error by over 20%, we did not observe equally impressive results for the other datasets. On the customer 1 dataset, in fact, we did not observe a decrease in reconstruction error by adding user-attributes. The MAE losses for both the department user-attribute and user-permission relations using 15 latent variables are shown in Figure 7. We observed similar behavior for the other attributes and saw no remarkable improvement by either increasing or decreasing the number of latent variables.

Similar behavior has been observed in the context of role mining when trying to find semantically meaningful roles. For example, Frank et al. [6] select roles with high attribute compliance from roles generated with multi-assignment clustering [16].
Their experiments on a real-world dataset from a bank found that increasing the attribute compliance of mined roles increased the number of errors, both over- and under-assigning permissions to users.

[Figure 5: Customer 3 User Attribute Distributions. Panels: Location, Dept, Manager, Organization, Title, Contractor, Level.]

One possible explanation is that collective matrix factorization (and the method in [6]), as applied above, uses a linear model of the user attributes. For example, a user's department or title may be used to determine factors influencing assignments, but the model cannot express preconditions such as "Department = 'Research' AND Title = 'Manager'." It may be possible to obtain better predictive performance by applying a kernel trick, at the expense of increasing the effective number of attributes. How best to model access control policies leveraging user-attributes remains an open problem and requires additional real-world datasets to validate new models.

Another hypothesis is that customer 3 is stricter about how it manages its access control policies and ensures all data is accurate and up-to-date. The industry in which customer 3 operates imposes more laws and regulations governing its IT infrastructure than those of either customer 1 or customer 2. Further, some of customer 3's operations have greater security needs and requirements, and this in turn may result in user entitlements being traced back to user attributes more effectively for auditing purposes.

Finally, we should note that just because user attributes result in a decrease in entropy, does not imply the permis-

Figure 6: The MAE for the user-permission relation X given several attributes.
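To make the evaluation in Section 4.2 concrete, the sketch below evaluates the mixed objective $\alpha D(X \,\|\, CA^{T}) + (1 - \alpha) D(Y \,\|\, AB^{T})$ for fixed factors and computes a zero-one MAE over held-out cells. It is an illustration under assumptions, not the authors' pipeline: the paper uses logistic PCA, whereas this sketch substitutes squared Frobenius distance for D, thresholds reconstructions at 0.5, and uses hypothetical function names and toy factors.

```python
def matmul_t(P, Q):
    """Return P Q^T for matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(rp, rq)) for rq in Q] for rp in P]


def frobenius_sq(M, N):
    """Squared Frobenius distance between two equally sized matrices."""
    return sum((m - n) ** 2 for rm, rn in zip(M, N) for m, n in zip(rm, rn))


def collective_loss(X, Y, A, B, C, alpha):
    """alpha * D(X || C A^T) + (1 - alpha) * D(Y || A B^T), with D assumed squared Frobenius."""
    loss_x = frobenius_sq(X, matmul_t(C, A))
    loss_y = frobenius_sq(Y, matmul_t(A, B))
    return alpha * loss_x + (1 - alpha) * loss_y


def heldout_mae(Y, Y_hat, heldout):
    """Mean zero-one error over held-out (row, col) cells after thresholding at 0.5."""
    errors = [abs(Y[i][j] - (1 if Y_hat[i][j] >= 0.5 else 0)) for i, j in heldout]
    return sum(errors) / len(errors)


# Hypothetical toy factors with one latent dimension.
A = [[1.0], [1.0], [0.0]]  # users x k (shared factor)
B = [[1.0], [0.0]]         # permissions x k
C = [[1.0]]                # attributes x k

Y = [[1, 0], [1, 0], [0, 1]]  # user-permission relation
X = [[1, 1, 0]]               # attribute-user relation

Y_hat = matmul_t(A, B)
print(collective_loss(X, Y, A, B, C, alpha=0.5))
print(heldout_mae(Y, Y_hat, heldout=[(0, 0), (2, 1)]))
```

Sweeping `alpha` between 0 and 1 shifts the weight between the two reconstruction terms, which is the trade-off the mixing parameter controls in the text.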
