# Analyzing Censored Data in SEM

Censored data occurs when the value of a measurement is only partially known. You will often see censored data in clinical trials where a drug is intended to prevent patient death. For some of the patients, we have a definite birth and death date, but for patients who have not died, we do not yet know how long each will live. The same type of data appears in business when we are trying to understand customer defection. We may know when a customer started patronizing a business and when they left, but other patrons have not yet left, and so their “customer lifetime” with the business is not known.

You typically have three different types of censoring. One type is left censoring, where the unknown value is below a certain point. Second, there is right censoring, where the unknown value is above a certain point. The last type of censoring is interval censoring, where the unknown value is between two known data points.
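The three types can be told apart by which bounds on the true value are known. The following is a minimal sketch of that idea in Python; the `Observation` class and `censoring_type` function are hypothetical names used only for illustration and are not part of SPSS or AMOS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """A measurement known only through its bounds (None means no known bound)."""
    lower: Optional[float]
    upper: Optional[float]


def censoring_type(obs: Observation) -> str:
    """Classify an observation by which of its bounds are known."""
    if obs.lower is not None and obs.upper is not None:
        # Both bounds known: exact value if they coincide, otherwise an interval.
        return "uncensored" if obs.lower == obs.upper else "interval"
    if obs.lower is not None:
        return "right"  # the true value lies above the known point
    if obs.upper is not None:
        return "left"   # the true value lies below the known point
    return "unknown"


# A current customer who has been with the retailer at least 1,225 days.
print(censoring_type(Observation(lower=1225, upper=None)))
```

A customer who has not yet left is therefore a right-censored case: only the lower bound on the lifetime is known.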

Let’s look at an example to see how AMOS can help analyze censored data. We have a retailer that is trying to understand how long customers shop with them before defecting to another retailer. To understand this, the retailer has captured the year the customer initially started patronizing the retailer, the frequency of purchases (the total number of purchases made while they were a customer), and total sales (how much money was spent over the tenure of the customer). Lastly, the retailer has captured in days how long an individual was a customer of the retailer (noted in the Lifetime column). The retailer has also denoted whether each person is a current or former customer. Figure 10.25 shows an example of the data coded in SPSS.

Figure 10.25 Raw Data Coded in SPSS to Test a Censored Model

With the classification of current and former customers, we are examining uncensored data for former customers and censored data for current customers. At this point, we do not know how much longer a current customer will stay with the retailer. We are going to have to account for censored data in the Lifetime column because we only partially know the value for current customers. You will also notice there is a column called SQLifetime. AMOS assumes censored data is normally distributed, and so to bring the data closer to a normal distribution, we are going to take the square root of the Lifetime column.

The data represents 132 respondents whose start year ranged from 1999 to 2018. With the former customers we have an exact lifetime in days, whereas with the current customers the lifetime cannot be determined yet. To use these unknown or censored data points, they have to be recoded in SPSS. For the current customers, the true lifetime in days is going to be greater than the one presented in the data. We need to reflect this in the data, and so we will put a greater than symbol in front of the lifetime values and the square root lifetime values for the current customers. To do this, you will need to convert those columns in SPSS from numeric to string.
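The recoding itself is done in SPSS, but the logic can be sketched in a few lines of Python. This is only an illustration: the two rows are made up, the `Status` field is an assumed way of flagging current versus former customers, and the exact string format AMOS expects should be checked against Figure 10.26.

```python
import math

# Hypothetical rows; the Lifetime column name follows the text, the values are invented.
customers = [
    {"Status": "former", "Lifetime": 900},
    {"Status": "current", "Lifetime": 1225},
]


def recode(row):
    """Return string-coded Lifetime and SQLifetime, prefixing '>' for current customers."""
    sq_lifetime = math.sqrt(row["Lifetime"])
    prefix = ">" if row["Status"] == "current" else ""
    return {
        "Lifetime": f"{prefix}{row['Lifetime']}",
        "SQLifetime": f"{prefix}{sq_lifetime:.3f}",
    }


recoded = [recode(row) for row in customers]
for row in recoded:
    print(row)
```

Both columns end up as strings, which is why the SPSS columns have to be converted from numeric to string before the greater than symbol can be added.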

Figure 10.26 Data Coded for Unknown or Censored Points in SPSS

After altering all the current customers to reflect a greater than value for the lifetime of the customer, we are ready to save the data file and proceed to AMOS.

In AMOS, we are going to conceptualize a model that will help explain and project the lifetime of current customers by using the length of time they have been a customer (StartYear), how many times they have purchased in the past (Frequency), and how much money they have spent (TotalSales). The first thing we need to do is read the data into AMOS. Since we have a string variable in SPSS that has a greater than symbol in a field, we need to make sure to check the box that states, “Allow non-numeric data”.

Figure 10.27 Allowing Non-Numeric Data With a Censored Model Test

After reading in the data, we are going to conceptualize the model just like a path diagram where StartYear, Frequency, and TotalSales have a direct relationship to the square root lifetime value.
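Written as an equation, this path model is a regression of the square root lifetime on the three predictors. The coefficient symbols $\beta_0, \dots, \beta_3$, the error term $\varepsilon_i$, and the bound $c_i$ are notation assumed here for illustration; they are not labels taken from the AMOS output.

$$
\sqrt{\text{Lifetime}_i} = \beta_0 + \beta_1\,\text{StartYear}_i + \beta_2\,\text{Frequency}_i + \beta_3\,\text{TotalSales}_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)
$$

For a former customer the left-hand side is observed exactly. For a current customer we only know that $\sqrt{\text{Lifetime}_i} > c_i$, where $c_i$ is the square root of the days recorded so far.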

Figure 10.28 Censored Model in AMOS

Once the model is drawn, we are going to use the Bayesian Analysis option in AMOS. After clicking this button, you will see the Bayesian window pop up. The analysis will initially show a red frowny face, which means the analysis is not ready; when that frowny face turns into a yellow smiley face, you can interpret the data. The Bayesian window will initially give you a lot of descriptives. The first subheading is our regression weights, which show how each independent variable influences the square root value of the customer lifetime. Next, you will see the mean value for each construct along with the variances, covariances, and intercept of the analysis.

Looking at the regression weights, you will see that with every increase of one year (StartYear), the square root value of the customer lifetime decreases by 3.75 (or 14 days). With every purchase made (Frequency), the square root value of the customer lifetime increases (0.12). Total sales in this example had relatively no impact on the lifetime of the customer.
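One way to read the 14-day figure is as 3.75 squared. Because squaring is nonlinear, the number of days implied by a 3.75 shift on the square root scale depends on where a customer starts. The sketch below makes that arithmetic explicit; the baseline values are hypothetical and the `change_in_days` helper is an illustrative name, not an AMOS function.

```python
def change_in_days(baseline_sqrt: float, delta_sqrt: float) -> float:
    """Change in lifetime (in days) when the square root value shifts by delta_sqrt."""
    return (baseline_sqrt + delta_sqrt) ** 2 - baseline_sqrt ** 2


# The regression weight for StartYear reported in the text.
weight = -3.75
print(weight ** 2)  # 14.0625, the source of the "14 days" reading

# Hypothetical baselines on the square root scale.
for baseline in (20.0, 35.0, 50.0):
    print(baseline, change_in_days(baseline, weight))
```

When translating a regression weight back into days for a particular customer, it is therefore safer to square the predicted value itself than to square the weight.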

Figure 10.29 Bayesian Estimation of Censored Dependent Variable

If you want to see the posterior distribution for one of the independent variables, you need to click/highlight the row of interest and then right click, which gives you an option to see the prior or posterior distribution. The posterior distribution will give you different options on how to view the data. See Figure 10.30.

Figure 10.30 Show Posterior Prediction Graph of Relationship

So far, we have just examined the customer base as a whole, but what if we had a specific customer we wanted to examine and estimate how long the customer was going to be with the retailer? In the Bayesian window there is an icon at the top called posterior predictive. By clicking this button, you will see the posterior predictive distribution for every row in your data set. You will notice that the current customers do not have a value in the square root of the customer lifetime column. It simply has a greater than symbol. If you click on that greater than symbol, AMOS will create another pop-up window that will show you the predictive distribution for that specific case.
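Conceptually, the predictive distribution for a censored case can only place values above the recorded bound, since the greater than coding rules out anything lower. The sketch below illustrates that idea with simple rejection sampling from a normal distribution. It is not a description of how AMOS computes its results: the mean, standard deviation, and bound are hypothetical, and `sample_right_censored` is an illustrative name.

```python
import random


def sample_right_censored(mean, sd, lower_bound, n, seed=0):
    """Draw n values from Normal(mean, sd), keeping only those above lower_bound."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        x = rng.gauss(mean, sd)
        if x > lower_bound:  # discard values inconsistent with the '>' coding
            draws.append(x)
    return draws


# Hypothetical case: recorded square root lifetime of 30, assumed mean 28 and sd 4.
draws = sample_right_censored(mean=28.0, sd=4.0, lower_bound=30.0, n=1000)
print(min(draws) > 30.0)
print(sum(draws) / len(draws))
```

Every retained draw sits above the bound, which is why the distribution shown for a current customer begins at that customer's recorded value.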

For instance, let’s examine customer number 26, who has been with the retailer since 2015 and has made eight purchases that total \$199. By clicking on that greater than symbol, we can see the expected lifetime (square root), which tells us how many days the customer is projected to stay with the retailer. See Figure 10.32 to view the predictive distribution for customer 26.

Figure 10.31 Individual Customer Predictions of Censored Data

Figure 10.32 Graph of Posterior Prediction for Customer 26

The posterior distribution for customer 26 starts at 35, which means the customer has been with the retailer $35^2$, or a total of 1,225 days. If we look at the projections, customer 26 is projected to almost certainly defect to another retailer by value 49, or by day 2,401. This means the customer is projected to leave the company within the next four years. This is the best-case scenario for the retailer. If you look at the mean value for this projection (38.242), this equates to the customer leaving at the 1,462-day mark, or less than a year from the starting point of this analysis.
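The conversions in this paragraph are just squares of the values read off the graph. A short sketch of the arithmetic, using the three values quoted above (the `sqrt_to_days` helper is an illustrative name):

```python
def sqrt_to_days(value: float) -> float:
    """Convert a value on the square root scale back to days."""
    return value ** 2


start = sqrt_to_days(35.0)            # days with the retailer so far
latest = sqrt_to_days(49.0)           # upper end of the projection
mean_exit = sqrt_to_days(38.242)      # day implied by the mean of the projection

print(start, latest, round(mean_exit))
print(latest - start)                 # days remaining in the best case
print(round(mean_exit - start))       # days remaining at the mean projection
```

The best case leaves 1,176 more days (a little over three years), while the mean projection leaves roughly 237 more days, which is the "less than a year" reading.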

Examining the posterior prediction distributions will give you a better understanding of how each independent variable is influencing your dependent variable, which in this case was the customer lifetime in days. If you have a specific customer that you really value, you can see the projections for how many days they are expected to be with the retailer. In this example, I used customer lifetime in days, but I could have just as easily used total sales as the dependent variable and examined what values promoted sales. Unknown or censored data can still be analyzed and projections made that can aid a researcher.

Source: Thakkar, J.J. (2020). “Procedural Steps in Structural Equation Modelling”. In: Structural Equation Modelling. Studies in Systems, Decision and Control, vol 285. Springer, Singapore.