Dr Zoom: Rating systems are too efficient a model not to spread

Josh Dzieza

Soon, you’ll be able to go to the Olive Garden and order your fettuccine alfredo from a tablet mounted to the table. After paying, you’ll rate the server.

Then you can use that tablet to hail an Uber driver, whom you’ll also rate, from one to five stars. You can take it to your Airbnb, which you’ll award one to five stars across several categories, and get a TaskRabbit or Postmates worker to pick up groceries — rate them too. Maybe you’ll check on the web developer you’ve hired through Upwork, perusing the screenshots taken automatically from her computer, and think about how you’ll rate her when the job is done. You could hire someone from Handy to clean the place before you leave. More stars.

The on-demand economy has scrambled the roles of employer and employee in ways that courts and regulators are just beginning to parse. So far, the debate has focused on whether workers should be contractors or employees, a question sometimes distilled into an argument about who’s the boss: are workers their own bosses, as the companies often claim, or is the platform their boss, policing their work through algorithms and rules?

But there’s a third party that’s often glossed over: the customer. The rating systems used by these companies have turned customers into unwitting and sometimes unwittingly ruthless middle managers, more efficient than any boss a company could hope to hire. They’re always there, working for free, hypersensitive to the smallest error. All the algorithm has to do is tally up their judgments and deactivate accordingly.

Ratings help these companies to achieve enormous scale, managing large pools of untrained contract workers without having to hire supervisors. It’s a nice arrangement for customers too, who get cheap service with a smile — even if it’s an anxious one. But for the workers, already in the precarious position of contract labor, making every customer a boss is a terrifying prospect. After all, they — we — can be entitled jerks.

"You get pretty good at kissing ass just because you have to," an Uber driver told me. "Uber and Lyft have created this monstrous brand of customer where they expect Ritz-Carlton service at McDonald's prices."

In March, when Judge Edward Chen denied Uber’s motion for summary judgment on the California drivers’ class action suit, he seized on the idea that ratings aren’t just a customer feedback tool. They represent a new level of monitoring, far more pervasive than any watchful boss. Customer ratings, Chen wrote, give Uber an "arguably tremendous amount of control over the ‘manner and means’ of its drivers’ performance." Quoting from Michel Foucault’s Discipline and Punish, he wrote that a "state of conscious and permanent visibility assures the automatic functioning of power."

Starting with eBay, rating systems have typically been described as a way of establishing trust between strangers. Some commentators go so far as to say ratings are more effective than government regulation. "Uber and Airbnb are in fact some of the most regulated ecosystems in the world," said Joshua Gans, an economist at the University of Toronto, at an FTC workshop earlier this year. Rather than passing a single certification before they can begin work, everyone is regulated constantly through a system of mutually assured judgment.

Certainly customers sometimes have awful experiences — reckless driving, creepy comments — and the rating system can help report them. But when it comes to policing dangerous behavior, most of these platforms have come to rely not on ratings but on traditional safety measures — identity verification, background checks, and the knowledge that any illegal actions can be investigated and enforced through the tracking devices every worker carries. We can’t rate for criminal histories, poor training, or negligent car maintenance.

So what do we rate for? We rate for the routes drivers take, for price fluctuations beyond their control, for slow traffic, for refusing to speed, for talking too much or too little, for failing to perform large tasks unrealistically quickly, for the food being cold when they delivered it, for telling us that, No, we can’t bring beer in the car and put our friend in the trunk — really, for any reason at all, including subconscious biases about race or gender, a proven problem on many crowdsourced platforms. This would be a nuisance if feedback were just feedback, but ratings have become the primary metric in automated systems determining employment. If you imagine the things customers rate down for as firing decisions in a traditional workplace, they look capricious and harsh. It’s a strange amount of power for customers to hold, all the more so considering that many don’t know they wield it.

Sometimes, as in Uber’s system, workers have the opportunity to rate customers back. An Uber spokesperson told me that, "Uber’s priority is to connect you with a safe, reliable ride — no matter who you are, where you’re coming from, or where you’re going. Achieving that goal for our community means maintaining an environment of mutual accountability and respect. We want everyone to have a great ride, every time, and two-way feedback is one of the many ways we work to make that possible."

A Handy spokesperson also emphasized the reciprocity of the rating system. "Handy is a transparent platform that takes care of every need in the home. As part of that transparency, we offer a two-way rating system so independent professionals can offer feedback on how a job went and customers can do likewise. This incentivizes excellence on both sides of the supply and demand equation, provides constructive feedback for professionals and customers, and opens up more lines of communication to ensure the best possible experience in the home."

The Verge also reached out to TaskRabbit and Postmates for comment about their rating systems, but they had not replied by the time of publication.

But two-way feedback is often an imbalanced arrangement. To start with the obvious, only one party’s livelihood depends on ratings. Workers and customers are also subject to different standards. Uber drivers are deactivated when their ratings fall below 4.6 (Uber says the exact threshold varies based on the median rating in the area), but there’s no point at which customers are banned for ratings. And while some drivers may skip low-rated passengers, they can be deactivated for skipping too many. There are also asymmetries of information. Handy and TaskRabbit workers both said that customer ratings aren’t displayed at all. Instead, TaskRabbit workers must swap information about bad customers in private Facebook groups.

This imbalance translates into extreme wariness about the customers who wield the ratings. The dozen Uber drivers I spoke with — all of whom requested anonymity — said they were skittish about whom they picked up, not based on what their customer ratings were, but on signs that they might give harsh ratings. Some avoided five-star passengers because it meant they were new and might not know that anything less than five stars is a failing grade. Others said that if they got stuck in traffic or the rider placed the location pin wrong, they’d cancel the ride rather than risk deactivation from a customer who’d be annoyed at the outset. If the riders are holding red solo cups or look like they might try to cram too many people in the car, drivers cancel rather than get rated down for telling them no. You have to be careful who you pick up, one driver told me. "Once you start the ride, you’re screwed, you put all the power in the customer’s hands."

Drivers joke about the way riders sit in the back seat following their route on Google Maps, silently monitoring whether they’re staying on course. If they take a wrong turn, drivers sometimes comp the rest of the trip. That is nice for the rider, but a significant loss for the driver, who has to choose between working for free or risking deactivation. "You’re in a state of neurotic anxious terror of making the tiniest slip up," said one driver.

More subtly, ratings result in a sort of coerced friendliness, a kind of emotional labor markedly different from what unrated taxi drivers perform. "Ratings create strong incentives for drivers to be subservient, to smile, to be happy even when they’re not," said Temple Law School professor Brishen Rogers. "Taxi drivers have this freedom to be grumpy. It’s an entitlement of the job."

A Los Angeles driver had a particularly clear-eyed view, saying that ratings were essential to "dealing with the weak link, the driver’s personality, until the self-driving cars come."

Many drivers advised smiling and nodding, deflecting potentially controversial topics, and generally avoiding engaging riders any more than absolutely necessary. Several drivers placate riders with bottled water and candy. Another offers them USB chargers and Tide pens. Some offer control of the music, a practice encouraged by Uber’s partnership with Spotify. Several drivers said the best way to behave is like a servant. "The servant anticipates needs, does them effortlessly, speaks when spoken to, and you don’t even notice they’re there," said a driver in Sacramento. She grew up in Hong Kong with servants and theorized that some of the angst customers have about Uber stems from the fact that "Americans have no concept of the master-servant relationship, it’s un-American to have a tiered class society."

The spread of ratings-enforced emotional labor also means that men are learning to cope with something women have long dealt with. "Emotional labor has been a gendered thing for a long time," said Kati Sipp, a former labor organizer and editor of Hack the Union. "One of the things you’re seeing, especially with Uber and Lyft, is men being expected to perform emotional labor in ways lots of service sector women have been accustomed to for years. I got fired from a college table-waiting job because I didn’t smile enough at my customers."

Ratings also give power to whatever biases customers have. A Muslim driver based in Tampa jokes that his rating drops as his beard gets longer. "It’s weird, because my rating is 4.78 but I’m friendly and I drive a nice car," he said. Last week he took a ride with a white Uber driver who "was driving like a maniac" but whose rating was 4.9. "You can’t really prove it, but I’d say half my passengers ask me where I’m from, and I say, America, I was born in America!"

Until companies release ratings data, we can’t know for certain whether this is true, but a study of Airbnb users found that black hosts get less money for similar listings than white hosts, and another study found that white taxi drivers get higher tips than black ones. There’s no reason such biases wouldn’t carry over to ratings. Last week, Harvard Law professor Benjamin Sachs pointed out that if Uber is ruled an employer, discrimination through customer ratings could put it in violation of Title VII of the Civil Rights Act, something that would be just as true for every platform that runs on ratings.

Uber said it doesn’t track race data, but given the high potential for widespread discrimination, these companies have a duty to address it, employer status or no. The algorithms used to determine employment are exactly as racist or sexist as the customers doing the rating.

When ratings drop, the penalties can be severe. If you’re dependent on a platform for work, getting deactivated is less like an independent contractor getting a poor Yelp review and more like being fired, having your livelihood cut off, except through anonymous and opaque reviews and with very little recourse.


"I went to go to work one day, got up, got dressed, headed out to the car, turned the app on, and it said you’ve been deactivated," said a Florida driver who guesses she was rated low for refusing to let people bring drinks in her car. "It’s so horrible, have you ever been fired? You feel absolutely powerless. I’m trying to get Uber’s attention, saying ‘Wait, don’t fire me, I didn’t do anything wrong, I’m following the rules, I won’t let them bring alcohol in my car and you’re firing me?’ I had no idea it would make me feel that way." She ended up paying $100 to an Uber-approved third party for a training class and is driving again, though she now balances risky, surge-priced, bar-hopping passengers with lower-paying daytime rides.

Work is similarly precarious on other platforms. A former Postmates worker was late to several deliveries during a blizzard in New York and woke to find himself deactivated. (Postmates’ cutoff is 4.7, according to workers.) On TaskRabbit, workers aren’t deactivated, but the penalties can still be high, especially for new workers. After watching friends get shunted into low-rated purgatory shortly after starting out, one worker said he took dozens of low-paying jobs primarily in order to get enough of a ratings buffer.

"We’re not just working for money," an Uber driver told me. "We’re working for ratings, but ratings have no value. Ratings serve only to prevent you from getting fired. Only bad things can happen to you. We’re scurrying like rats after these things with no value."

Rating systems fall along a continuum. They can be used to establish trust between customers and workers, but they can also be a way to enforce standards of service, said Arun Sundararajan, a professor at NYU’s Stern School of Business who studies digital platforms. "It’s not helping customers make better choices by giving more information, it’s having a mechanism that allows automated customer feedback to identify workers who are subpar." The rating systems used by on-demand platforms all have a disciplinary element. On Airbnb and TaskRabbit, hosts and workers with poor records don’t get booted off, but they appear far down the list of options, so they get less work and get paid less for it. Uber, Postmates, and Handy are stricter, deactivating anyone whose rating falls below a certain threshold. The result is a workforce that, despite being dispersed, atomized, and without official supervision, has remarkably consistent levels of service.

Businesses have incorporated customer feedback into management decisions for years, asking callers to stay on the line to fill out surveys or giving hotel guests questionnaires. But because apps can be designed to prompt ratings after every interaction, now customers actually fill them out. The high level of feedback allows companies to automate management, making the customer literally always right.

"Customers have always had power, there’s always been feedback, and workers could always be disciplined because a customer complained about you," said professor Sachs. "What’s new is the ease with which customers can give reviews and the almost complete substitution of customer reviews for management."

Replacing top-down management with distributed feedback is a very effective way of organizing a huge number of dispersed and untrained workers into an orderly and flexible network. But in traditional organizations, in addition to supervising workers, managers can also insulate them from irate customers. At the very least, managers have some legal restrictions on what they can fire for. Replacing them with customer judgments and automated systems puts workers in a precarious position.

"A lot of the history of the union movement was the fight against arbitrary managerial decision-making," said Sachs. "You don’t want to allow management to fire someone because they don’t like them, because they looked at them funny. One thing unions do is they impose rules, just-cause firing provisions to curtail managerial power so it’s fair. But there’s no check on what customers can do, and it’s almost impossible to imagine how you’d impose such a check."

But rating systems are too efficient a model not to spread. All the economists, investors, and even workers I spoke to were in agreement on that point. About a third of Americans do freelance work, according to a study commissioned by the Freelancers Union, and rating systems are extremely effective at organizing and disciplining them. Ratings are expanding into more traditional settings, like food service in the form of Ziosk tablets, and they form the backbone of new labor platforms like Uber, TaskRabbit, and Upwork. "How else could you make sure a million drivers are doing their job?" asked Gans, the economist.

A TaskRabbit worker told me about an experience that felt like a preview of the future. For his last gig, he was called into Handy, the cleaning startup whose workers must maintain very high ratings or risk decreased pay and deactivation, and which has been receiving harsh Yelp ratings itself. His task was to research which upcoming Handy customers were prolific Yelp reviewers so that the company could send them good cleaners and bolster its reputation. It’s ratings all the way down.

Despite all their complaints about ratings — and there were many — almost none of the workers I spoke with wanted them to go away. They wanted them improved in various ways: more transparency about what they were being rated down for, or an ability to protest ratings they felt were unfair, or better education for customers about what ratings meant.

Of course, implementing these changes would require more work from someone: either from the customer, in the form of more detailed evaluations, or from the company, in the form of a large human resources department.

There’s also a more radical proposal: seizing the means of reputation. Sipp, the editor of Hack the Union, believes workers should be able to own their ratings and take them to other platforms. She worries not only that rating systems can be implemented unfairly but also that they bolster the power of the companies that use them. For a worker who spends weeks or months building up a durable reputation on a particular platform, leaving for a competitor means starting from scratch. Ratings are one of the network effects that give these platforms their monopolistic tendencies.

"It puts a lot of power in hands of companies that own the data," Sipp said. "If I’m a person who wants to live my life as a driver in the on-demand economy, why shouldn’t I be able to transfer my reputation from Uber to Lyft? It takes away bargaining power from the worker and gives it to the platform, the boss."

Albert Wenger, a partner at Union Square Ventures, also sees portable reputations, through access to the platform’s API, as a partial solution to the power of platforms.

"We want these platforms. The question is how do we deal with the fact that network effects are very powerful, so many of them will become quasi monopolies. We have to make it so individuals can also be represented by code in their interactions," he said. "The problem right now is if you’re a driver and you feel mistreated by Uber, you can stop driving for Uber. There’s no sense in which drivers or passengers have any real power at all. The threat of people migrating en masse to different systems is only credible if people can participate in multiple systems at once."

These proposals are all highly theoretical at this point, and it’s unclear exactly what they’d look like. While some sort of universal reputation ID or API key would allow workers to switch between platforms more easily and give them the power to push for more forgiving systems, it could also make ratings more important and inescapable. In fact it sounds similar to Peeple, the "Yelp for people" that was greeted with universal horror last month.

In the near term, it seems unlikely that companies like Uber, Handy, or Upwork would open up their APIs unless regulators forced them to, since it would mean inviting competitors to bootstrap off their reputation databases. Attempts at creating third-party reputation repositories like Karma mostly just get used for unrated marketplaces like Craigslist. But as these companies grow to encompass more forms of work and larger pools of labor, the way they decide who can stay on the platform and who gets deactivated will become even more pressing, and something that any serious regulatory attempt or labor movement will need to address.

It’s also something customers need to think about, because they can’t help but be complicit in the system that’s emerging. They aren’t just customers anymore. They’re also terrifying bosses.

Dr Zoom

Monday, 29 February 2016

