Big Data, Big Brother


– I want to welcome you all to our March Hawkeye Lunch & Learn today, and want to remind you that this is a part of our 2017 theme semester,
where we’re focused on the Internet and technology, and if you want to know more
about a variety of events that are going on for our theme semester, please do stop at the table
as you’re leaving today, which is just to the back. It’s my pleasure today to introduce to you Dr. Zubair Sharif. Shafiq, I’m sorry, mispronounced his name. He’s here to speak to
us today about big data, and that’s a very important
topic on our university today, because we actually have an entire unit in my College of Public
Health on the fifth floor devoted to that entire
topic of informatics. So we’re very happy that he’s here today. He is an assistant professor
in computer science at the University of Iowa,
and he has done a great deal to look at the coordination of informatics across our campus. He received his PhD
from Michigan State University a very short time ago, in 2014,
and his current research involves conducting
large-scale measurements to study security, privacy,
and performance aspects of the Internet, a very
important topic today. So please join me in welcoming Dr. Shafiq. (audience applauds)

– Thank you for the introduction, Linda, and thank you all for
coming for this talk. The title of this talk
is Big Data, Big Brother, so hopefully most of you
have heard about big data and how it has the potential
to revolutionize our lives. The key idea behind
big data is to collect, combine, and analyze data
from heterogeneous sources to mine insights and to drive efficiencies in our business systems,
so this is going to be big over the next decade. In this talk, I will highlight
the dark side of big data, how big data can be exploited
to undermine our privacy. This talk is about how
private companies collect data about us when we browse the
Internet, use it to infer our behaviors, our preferences,
and then how they monetize these inferred behaviors and preferences. And then in the second part
of my talk, I will talk about how Big Brother piggybacks
on this information which is collected by private companies, and how governments use this information to do surveillance on their citizens. So, before we go and talk
about most of these things, let me give you a quick
background about what happens when you try to browse any
website on the Internet. So let’s say you open
your favorite browser. Next week is spring
break, maybe you’re trying to book a hotel for your spring break, so you go to hotels.com
and you press enter. Now if you peek behind the
browser and try to figure out what actually is going on in the background, so this is your laptop, for
example, and you’re trying to connect to hotels.com server, your browser will send out a
request to hotels.com server to download the webpage for the home page. So this request is called an HTTP request, HTTP is always included in the URL that you type in your web browser; HTTP is really just the standard
language which is spoken by web browsers and web servers. Hotels.com server receives this request from your web browser
and then it responds back with an HTML file. So this is really a text
file which lists a bunch of resources that need to
be subsequently requested by your browser to show
you the full webpage. So this full HTML text file
will contain references to images, to videos, and a
bunch of other information that is necessary to show you the whole webpage on hotels.com. So after downloading
this initial HTML file, your browser issues
multiple subsequent requests to download images, videos, style files, and a bunch of other script
files from hotels.com server. At least, that is what most
people think actually happens in the background, but if you
look a little deeper, what you find out is your
browser not only sends requests to hotels.com server, it is
actually sending out requests to other websites like
comScore.com and DoubleClick.com. Most people have not heard of these names. In fact, your browser connects
to more than 50 servers other than hotels.com when
you try to go to hotels.com and open their webpage. These domains like
comScore and DoubleClick, they are called third-party domains because these are not the
domains you intended to visit when you type this in in your web browser. These third-party domains,
they are typically called trackers or data brokers. These companies are
primarily in the business of collecting information about you as you go from one website
to another on the Internet, and as the name suggests, this is what their business model is. I will call them third-party trackers for the rest of my talk. So if there is a third-party tracker which is on multiple websites,
they will precisely know as you go from one website to the other. So let’s say first, you go to hotels.com and then you go to eBay.com,
and comScore.com is present on both of these websites,
comScore would know that you first went to hotels.com and then you went to eBay.com. And sometimes, you can
actually figure this out. When you go to eBay.com,
you might see an ad which would remind you to
complete your hotel purchase on hotels.com, so this is very
common, so this is exactly how this is done using these
third-party trackers. So let me give you a little
more technical information about how these third-party
trackers track you across different websites. To track you across different websites, they need to assign
you unique identifiers. So think of these as
social security numbers, every person is assigned a unique number. So this is the way they
also want to track you, so they know precisely
that it’s you who is going from one website to another. They can use one thing that
most of you, hopefully, have heard about, it’s
called IP addresses, to track you across different websites, but that is actually
not good information to uniquely identify people. Because let’s say you
have a laptop and you go from a coffee shop to campus,
your IP address would change. So because your IP address is going to change, IP addresses are not
good unique identifiers, although they are still very
powerful in narrowing down who was accessing a particular website. There are two techniques that they use to uniquely identify users. The first one which is commonly
used, it’s called cookies. These are not the sweet
cookies you just had; cookies actually are strings, so these are random numbers
which these websites ask your browsers to store
on your local machine. So when you go to hotels.com,
comScore will ask your browser to store a unique cookie, which
is just a unique long string which actually does not make much sense, and your browser stores it. Whenever you go to other websites and comScore is on those websites, your browser will automatically
attach this cookie to its requests to comScore.com, so comScore
will know this is the person who recently visited hotels.com and is going to multiple other websites. Some of you who are more computer-savvy, they would know that cookies can be bad, because they can be used to track you, so in your browser, you have
an option to delete cookies, or not allow certain
websites to store cookies in your browser. So users actually have some
control over whether they want to restrict cookies
from different websites. Some of these third-party
trackers have started to use this new technology
which is called fingerprinting. Fingerprinting does not require any cooperation from browsers. So fingerprinting relies on the fact that every workstation is unique. You have unique software, you have some unique hardware variation
on every workstation. So they ask every browser
to do a standardized task, let’s say they want them to draw a circle, but because every workstation has different software/hardware
configurations, every workstation does the task
in a slightly different way. And because everyone is doing this task in a slightly different
way, this information about how the circle was exactly drawn can be used as a unique
identifier for people. Now the key point here is,
most people actually don’t know how this fingerprinting works,
and if you visit a website, there is no way for you to figure out whether that website is involved in fingerprinting. And these fingerprinting-based
unique identifiers are very powerful, they actually
can be much more accurate than cookies and more stable than cookies, because users actually
don’t have any control over their fingerprint, unless you change the software
or hardware on your machine. So they are very powerful
ways of tracking users across different websites. So, let’s talk about how
many trackers there are. If you go to a typical
website, are there maybe a few trackers, are
there dozens of trackers? What’s the situation? It turns out, these
trackers are very common on news websites. If you go to, for example,
FoxNews.com or NYTimes.com, you will be tracked by dozens
of third-party trackers. So, they are plentiful
and they are everywhere. And news websites, probably,
okay, some people might think “They will figure out what
kind of news I am reading, that’s probably not too bad.” But they are also on some websites where you really start to wonder, do they actually have any
business being there? For example, if you go to WebMD.com, which is a website a lot of people go to to look up symptoms, there are about 17 trackers,
17 different companies tracking your information,
that you visited WebMD.com, and let’s say you were
searching for some symptoms of cancer or you had a lump
and you were just trying to look up your effects,
these 17 companies would know that you were actually doing that. So this is, hopefully, starting
to sound scary a little bit. (audience chuckles) Healthcare.gov. Probably won’t be here too long, but if you had a chance
to go to healthcare.gov to sign up for health insurance,
you would be surprised that there are six trackers tracking you. There are six private companies
which know which person is trying to sign up for
health care, so again, this is very odd. If you go to uiowa.edu, surprise surprise, there are seven companies
which are tracking you, and I looked up, actually, this week; there are companies like
DoubleClick, Facebook, Twitter, so University of Iowa
probably advertises to people when they want to, let’s
say, advertise some programs or a lecture like this. Obviously the university
is able to advertise, but then Google and Facebook and Twitter, they also get all this information, and then they can use this information with other information
that they have, then, to profile people, and I
will talk more about this in a little bit. But just to give you a quick sense of what are some of the big
trackers on the Internet. On this graph, the x-axis shows the big third-party tracking
companies on the Internet, and the y-axis shows what percentage
of the Internet they cover. It’s very alarming to know that Google can actually track you
across more than 80% of the top 1 million websites. So if you go to any of these websites, there’s a high likelihood
that Google is tracking you on those websites. These top trackers, they can
get a very precise picture of which websites you
are visiting over time. And there’s also a very
long tail of other trackers. They don’t cover big
swaths of the Internet but they actually combine
their data with each other. They cooperate with each other,
they exchange information, and by doing this information
sharing, they are able to actually get a much
clearer picture of users. So it’s not just the big
trackers we have to worry about, these small trackers are
actually also very powerful, they can also know a
lot about your behavior as you browse the Internet. Now, let’s ask a basic question:
why are they tracking you? And why are they everywhere? So let’s try to get a sense
of why tracking happens on the Internet. And this is what I call corporate
surveillance capitalism. Most of the Internet, most of
the websites that we visit, they are free, so how do you
think they actually make money? They make money off of advertising, that’s the business they are in. On the surface, advertising
might seem very simple, very innocuous, but let’s
roll back a little bit and get a historical
perspective of how advertising has evolved, and it
will give us more sense of why advertising is very
intricately intertwined with tracking on the Internet. Back in the day, good old newspapers, we used to have these things. Publishers wrote what they wanted and they left these empty
rectangles on newspapers. Advertisers could buy these
rectangles and they could decide what to put in these rectangles. Viewers would occasionally
look at these rectangles when you were reading a newspaper; maybe you like something,
you decided to buy something, so that’s how the
advertising business worked back in the day. Life was very simple. And then there were these ad
agencies that you would recruit to figure out what you
actually wanted to put in this ad, and there was a lot of effort that was put into designing
very nice-looking ads that really conveyed the
message that you wanted to send to the public. This was a primitive form of advertising, and there was a big problem with it. The big problem with this primitive
form of print advertising was that half of the money
that you spent was wasted, and the big problem was
that you did not know which half was wasted. There was no way for
advertisers to figure out how effective their advertising was. Just to recap, we
really have a simple trinity of three entities. We have advertisers, who
put their ads with publishers, like newspapers;
and on the other side, we have viewers who buy these newspapers and look at these ads. So this is a very simple
architecture of advertising. Advertising in newspapers
was actually a huge industry. It was a multi-billion dollar industry, and this peaked roughly around in 2000, there was $67 billion advertising revenue for print advertising in
newspapers, primarily. And then something happened. The revenues started to go
down at a very fast pace. What happened really was
companies like Google and Facebook emerged, and these companies were in the business of advertising. So in 2007, Google bought DoubleClick, which is an online advertising exchange, to show ads on their search engine. And then obviously, very quickly, Facebook also followed
suit, and then advertisers could also put ads on Facebook. In the last few years, Google’s
annual advertising revenue has exceeded $70 billion, while that of newspaper advertising
has gone way down. So this is the change that has happened in the last couple of decades. Online advertising. You don’t have those nice
advertisements anymore. Now you have these annoying,
flashy ads that you see on all websites. This has made advertising
very cheap, actually, so if you are actually
a very small business, you can advertise relatively cheaply and you can reach out a
large number of people. So that’s really the big
benefit of online advertising. But other than this big
benefit for advertisers, there were three big promises
of online advertising. The first one was that for the first time, an advertiser could actually
tell how many clicks happened on their advertisements. So you can actually tell
how effective your ad was, how many people actually
clicked on something, and they maybe went on to buy something. The second big advantage was
that you can target these ads. So not only can you target
your ads on specific websites, but the decision to show
a person a specific ad is made in real time. So based on what kind of
information they have on you, they might decide to
show you different ads. Let’s say if you go to NYTimes.com and they think that you’re
conscious about your weight, they will in real-time figure this out and show you an ad to lose weight. If they know that you are
planning to buy a new cellphone, they will show you an ad for that purpose. So this is very precisely targeted now. And the third big advantage
of this whole thing was, this ecosystem became much more complex. In the middle of this ecosystem, we have a bunch of companies
which are called ad exchanges. So a publisher actually
does not have to worry about selling out their ad spaces
to different advertisers. Ad exchanges would do this for them. So if you have a website,
you don’t have to worry about contacting Nike to figure out if they want to advertise on your website. Companies like DoubleClick,
which is the ad exchange owned by Google, will do that for you. This actually was huge, and
obviously, companies started to make a lot of money. Everybody was making more money. Everyone looked on this
and they were very happy, until this happened. So there was a big problem
on the Internet in 2000. There was this problem
that advertising companies were facing, it was called click fraud. So if someone clicks on your
ad, you cannot actually tell whether they are a human
or a bot. There were spammers and hackers who would actually make bogus websites and they would ask Google to
put ads on their websites, and then they would actually
write computer programs to click on those ads. So as those programs click on
ads, people would make money, and this actually was a huge business, and still actually is, to some extent, a huge business in many
developing countries. So companies wrote some
programs to try to eliminate this click fraud, so they
figured out what are bots and how can we separate
them from real users, but then these spammers and hackers, they stepped up their game. They actually hired actual
people in developing countries who would wake up, they would
go to a computer center, and they would open a bunch of websites and they would start clicking. These were real people,
so it was very hard to distinguish between these
people and actual people who were clicking on your ads. So really, the big challenge
for advertisers and publishers was how to figure out if
these clicks were real. And the response was this. They simply increased the
complexity of this ecosystem. The key idea was that if
we can collect more data about people, we can actually
even try to figure out whether this person is
actually part of a click farm. So the idea was, if you
get more information, we can use that information
to more accurately distinguish between real people and fake people. This prompted data wars
between different companies, all these companies which have now joined the simple trinity of advertising. And all these companies
are in the business of collecting information on consumers. They sell this information to advertisers, to ad exchanges, to drive up efficiency. So the goal of all this data tracking is to reduce click fraud and to do more precise, targeted advertising. Google was the big player initially in the online advertising ecosystem. They had a bunch of data
from their search engine and their email client, Gmail. Then Facebook came along,
and Facebook, obviously, started getting really popular; now more than 1.5 billion people around the world use Facebook. All these companies in this
ecosystem, including publishers, they started to collaborate
with each other, so all this data is being
shared with each other, and these companies are
building more and more accurate profiles of
people, what they do, what they are interested in. Now, the situation is vastly different. So this is a redo of the
famous New Yorker cartoon, where now, these companies
actually precisely know who you are. If you are a dog, they can actually figure that out. It’s not really a big deal.
example of one of the companies which is in this tracking business, and this company, hopefully
most of you have heard about, its name is Experian. They are in the credit business as well, and they combine information
from multiple offline and online sources; they also
are in the tracking business so they do track you across the Internet. The information I’m about to show you is from their public brochures, and they brag about
what kind of information they have on people in the United States in this particular example. So, they know a lot about us. They claim to have information
about 299 million people in the United States alone. More than 100 million
households, and they claim to have hundreds of data
points on every person. So they know information
like our age, our education, our gender, our income, our occupation, and whether we have kids. And then, if you really dig
deeper, they know a lot more. The US is a country of
immigrants, so they know your ethnicity, they
know your country of origin, they know if you speak
a particular language. And all this information
is available to us, and you can only imagine
how this information can be misused in certain
contexts, and I will give you some more examples later. One interesting thing that they
do is they divide all of us into different segments. Every person is usually part of hundreds of different segments,
and this is one segment that they actually advertise. So it turns out, a lot of
spending is driven by moms in our homes. So they actually have
multiple categories for moms. There are soccer moms,
there are couponing moms. There are moms with
one kid, there are moms which are more laid-back,
which are more outdoorsy. So again, all this information
is collected about us. And one of the things I
want to point out here is, you actually don’t know whether this segmentation is accurate. Sometimes, they can identify
you as part of a segment and that might be inaccurate,
but there is no way for you to actually tell them, “Hey, you have categorized
me incorrectly,” so you might be seeing some
nasty ads on the Internet, and it might just be their
algorithm working incorrectly. And there’s no way to actually figure out how much they know about us,
and what kind of segments they have actually divided us into. They know our little secrets. They know very small details about us. For example, they know if you
like to indulge in fast food. This is very easy for them to figure out. They know whether you play lotteries, or whether you have a
certain type of insurance, or they know whether you are
conscious about your weight. This is really powerful
stuff, so it’s hard to imagine there are thousands of companies out there who know all this
information about everyone in the US, and also around the world. So we don’t know what they know and we don’t have any control
over this information. So, let me first give
you some horror stories that have actually happened. So what could possibly go wrong? Yep? – [Audience Member] I’m sorry,
could you tell me the company which you were talking about? – Experian. E-x-p-e-r-i-a-n. I can spend the whole day talking about some of the horror
stories that have happened, but I would like to highlight
some of the big ones which I think have broader impacts, other than exactly what just happened, and obviously are more relevant
in the current context. So, the first one. Collecting information about ethnicity, about country of origin,
about languages people speak is not itself actually
illegal, or even unethical. But if you use this information
in specific contexts, it’s actually unlawful. For example, in the US, we
have a Fair Housing Act, so you cannot discriminate between people on the basis of their race,
of their country of origin, or the languages they speak. So Facebook was actually
letting advertisers who were advertising
different properties for rent target people on the basis of their
race or their ethnicity, and this, it turns out, was actually illegal under the Fair Housing
Act that we have in the US. Then, obviously, Facebook
thankfully pulled this information from advertisers, so now they do actually scan and make sure that certain types of ads
are not being targeted. Another way to think about
targeting is discrimination, so you cannot discriminate between people when you are trying to advertise for some of these kinds of things which are protected by the Constitution. Another thing is credit agencies. If you are advertising for credit, we have this Consumer
Credit Protection Act which bars companies from discriminating based, again, on race, sexual orientation, ethnicity, and so on. So this
is just one example, from
Facebook, but no one knows what else is going on in other
types of online advertising right now. And advertisers are allowed
to “target,” or discriminate against, people on the basis
of these protected classes, like race, religion, sexual orientation, ethnicity, and so on. So this is really bad. Another example I would
like to point out is there have been some talks recently about creating a registry of all Muslims in the United States. Some people think, actually,
this would be very hard for the government to do,
because the government agencies have to go out, they have
to collect this information, they have to compile this information. Maybe you have to, next
time in the US census, it will be unprecedented, ask
people about their religion, which actually has not
happened in a long time. But the important thing to note here is the government actually
does not have to collect any information, this
information is already out there. Recently, there have
been a lot of hate crimes against minorities: there
have been hate crimes against Jews, against Muslims,
recently against people from India, against Sikhs
very recently as well. And all this data about
different religious minorities in the US is actually available from many third-party data brokers. Amnesty International
recently did a sting operation where they actually contacted
one of the companies called ExactData.com. You can go to their website actually, this is right up there, all of you can go and do this right now, and
you can go and actually select different categories, and
within four to five clicks, you can get a quote of what
it would take to get information about all Muslims in the United States. Turns out you can buy this information, and this information will
contain hundreds of data points about every Muslim,
including their location, what kind of business they are in, and again, other things
that I’ve talked about, for less than $150,000. For example, for smaller minorities, this information is much cheaper. So you can get the list of
all Sikhs in the United States for less than $15,000. So this is really scary,
and it’s not hard to imagine other potential applications of this. Amnesty International
contacted a bunch of companies who are third-party trackers
asking them for information about all undocumented
immigrants in California, and several companies had this information and they were willing to
share this information, and they were very
confident about the accuracy of their information. Using this kind of information,
the government can also try to maintain a registry of
all people who own guns in the United States,
and some people think that is against the Second
Amendment that we have in the US Constitution. So there are serious implications of private companies
having this information when this kind of
information is not regulated. And very recently, in the
2016 presidential election, this technology was also used to do micro-targeted election campaigning. This was actually not new,
the Obama campaign used this in 2012, but most recently,
this is a picture of a company called Cambridge Analytica,
they are based in the UK, and they were working
with the Cruz campaign and the Trump campaign to do
targeted advertising on people. So the key idea of
micro-targeting election campaigns based on the data which
these companies have is that you actually send out a
different version of a message to different people. And you can think that
this is actually very bad for our democracy,
because people are hearing different things, so they
cannot agree on the facts. This really has the potential
to divide our society and make everything more polarized. So what these micro-targeting
election campaign companies like Cambridge
Analytica claim to do is that “We are going to
drive up efficiencies.” So you are going to be spending
fewer dollars on advertising and you will be reaching more consumers. And in the picture which
is shown in the background, this company has information
about all voters in Iowa. So this is the map of Iowa
and they precisely have the location of all Democratic
and Republican voters. Again, you can target a specific message for people who are Democrats
but they feel strongly about gun ownership, for example. So you can slice and dice
people into different segments and show them different messages. So as I pointed out, this
is bad for our democracy because people would like
to have a candidate say the same thing to everyone all the time. So this is only going
to make our democracy more contentious, more polarized. This kind of technology is
not only used in the US, it has been used in referendums in the UK, and this kind of technology will be used going forward as well. With almost unlimited money flowing into our elections, again,
you can only imagine what other possible applications could be. So, in the second part of my talk, I would like to switch gears and focus more on the government. So not on private companies,
but how governments actually piggyback on this data, which is collected by companies,
to spy on their citizens. In the US, the Fourth Amendment,
actually at a high level, defines the basic right to privacy. It means that government
cannot spy on their citizens without a court order, so
unless they have due suspicion that someone has done something wrong. Fourth Amendment was ratified in 1792. Again, this was back in the day, modern forms of communication
were not prevalent at that point in time, so it’s very hard to take the Fourth Amendment
and try to apply it to internet systems. After the Watergate scandal, in 1978, a new act was passed,
which was called FISA, the Foreign Intelligence Surveillance Act, and this was the first
act which actually tried to translate some of the protections which were in the Fourth Amendment and apply them to modern
communication technologies like the telephone and the Internet, and the key provision
in this new act was that the government cannot do
surveillance on its citizens without an explicit court order, and if there was a
reason, that court order could remain
secret; there were secret FISA courts. So the barrier for the government was low, but still you had to
go in front of a judge and ask for a court order. However, after 9/11, a new act was passed which was called the Patriot
Act, and the Patriot Act significantly reduced the
burden on the government to go to the court and ask
for these court orders. Under the Patriot Act, the NSA, the National Security
Agency, secretly started a bulk data-collection program
where they were collecting telephone and internet data
about citizens in the US and also citizens abroad. This program was
borderline unconstitutional and there were many people inside the NSA who were concerned about the
legality of this program. Under this program, the
government could actually ask internet companies and telephone companies to provide them this information, and the key thing that was
enabled by the Patriot Act was these companies were required by law not to disclose this information publicly. So they had these so-called gag orders which stopped these
companies from disclosing that they were actually
providing information to the government. All of this was done in secret, until this happened. In 2013, Ed Snowden,
who was hired by the NSA as a contractor, leaked
classified information which actually revealed the
magnitude of the surveillance which was going on. He revealed thousands
of classified documents to reputed journalists, who later on did a lot of investigation on this, but the immediate response
from the government was that, “We are not looking
at your conversations, we are not reading your emails, we are not listening to your phone calls. We are only collecting metadata.” And as we just discussed,
cookies are metadata. Fingerprints are metadata. So they don’t have to read your emails to precisely know what you are doing. If you combine metadata
from multiple sources, all of this can be combined to
have a very powerful picture of what people are doing. So, some of the revelations,
this was a big one. This was the infamous PRISM program. Under this program, the
NSA had direct access to major internet
companies, going as far back as 2007, so they had
Microsoft, and then later on they had Google and
Facebook, and more recently Apple as well, and when
this was leaked out in 2013, there was a lot of public backlash for these internet
companies, and the companies actually started to fight
back against the government. The companies tried to add protection and they wanted to make
sure this consumer data is protected, because if
there is a public perception that the US government gets
everything that you upload on Facebook, people will
stop using Facebook. So there was some resistance
after these public revelations. But there were other programs. This was another program called MUSCULAR, and with this program, the NSA was tapping onto the big internet cables which were going across continents. So these undersea cables and other cables, they see a majority of
the internet traffic that flows on the Internet. The NSA, again, this did
not require any cooperation from internet companies,
was collecting all this data from these undersea cables,
underground internet cables, and they would simply
get all the information that they needed from
this kind of program. So these taps, and you
can see them on the map, were located primarily outside the US so they did not have to
follow the US Constitution, to some extent. And then very recently, two years after Ed Snowden’s revelations,
there was finally some change: a new law was passed which was
called the USA Freedom Act, which, again, tried to restore
some of the constraints that the government actually has to meet to do surveillance on people. So this has slowed down but
this has not completely stopped. And while there is resistance
to this kind of surveillance in the US, there is very little resistance in many other countries
which are not as democratic as the US. For example, the Chinese
government is actually planning to launch a very ambitious
project, where they’re planning to assign a social credit
score to every citizen. China already has a very strong and very comprehensive censorship program; they monitor all the traffic
which goes in and out of China, this is called
the Great Firewall of China. The goal is to use all that information to then assign citizens a credit score, something like a social credit score. For example, if you are involved
in some unwanted activities your score would go down,
and the implications could be your kids would not be allowed
to go to a particular school, you would not be allowed
to stay at a particular set of hotels, and so on. So again, this kind of
government surveillance, based on this data which is
collected by internet companies can be very powerful, can have
really negative repercussions in other countries. This is what I call an ideal
marriage between corporate and government surveillance. The government actually does
not have to do anything. The NSA and CIA, they probably
love Google and Facebook, because people are voluntarily uploading all this information about
minor details of their life, so they don’t have to expend any resources to collect this information themselves, and the broader sentiment in
the intelligence community is that if people are okay
with giving up this information to Facebook and Google so that
they can use these services for free, then people probably are okay with giving up this information to shore up national security, to stop terrorist attacks in the country. So, this kind of mindset
really undermines our privacy and this kind of logic can be extended, and then the government will be looking at all actions of their citizens. So this can be really bad. The broader question here
is, do people really care about our privacy? Do people like us, do
people who live in the US, they care about their privacy? And it turns out they do. There was this survey done by Pew Research where more than 90% of people
said that they really care about what kind of information
is collected about them and who is collecting this information. However, more than 90% of people also said that there is actually
a feeling of giving up. They have no idea how to
control this information. So companies and regulators know that people do care about their privacy, but there’s really not much
that it seems, on the surface, that you can do as a common citizen. So let’s look at how we can
try to change some of this. Obviously, you can be
involved in activism, so this is a picture of a balloon which was flown out by Greenpeace and the Electronic Frontier Foundation. This was done on the 4th of July,
after the Snowden revelations, and the balloon actually was
saying “illegal spying below,” and this was flown above
an NSA data center in Utah which was collecting information
and storing information about US citizens. Again, I would encourage all of you to become part of these organizations, and next time when you try
to think about donating some of your money, do
consider organizations like Electronic Frontier Foundation, ACLU, and Amnesty International. Other than this, how can we bring change? Obviously, we can pass new laws. We have the new Congress. So do we have any hope of
actually passing new regulations to protect our privacy? And there are some government agencies, like the Federal Trade Commission, Federal Communications Commission, which control and regulate some
of these internet companies. I will talk a bit about that
and I will also briefly talk about some of the
technical countermeasures you can do as an individual
user to protect your privacy. There was, in 2009,
after Google was really getting really big and
Facebook also went mainstream, there was a push from the
Federal Trade Commission asking online advertisers and
companies to self-regulate. These companies were
making billions of dollars, they were creating thousands of jobs, so government was really reluctant to regulate these companies. These companies, in 2011,
came up with this program called AdChoices, so whenever
you see an ad on the Internet, you have this blue thing on the top right, and you can click here and the companies would actually tell you why
you are seeing a particular ad. But again, this kind of program
does not give you any option to opt out of tracking. You cannot stop these companies from collecting information about you. There was this other
program called Do Not Track, which was supposed to
be a voluntary program. So if you have this Do Not Track setting enabled in your web browsers, companies would voluntarily not track you. Again, this is too good to
be true; nothing came of it. (audience chuckles) And then finally, the FCC
was one of the agencies which, right before the
new administration, passed some regulations called
Broadband Privacy Regulations, which stopped internet service providers like Mediacom and CenturyLink
from collecting information about their consumers and
selling it to make a profit off of advertising. Last week, actually,
the new FCC commissioner actually rolled back all
of these regulations. So, my point is, there’s very little hope that any regulation or government agencies would do something, at
least in the near future, to prioritize your
privacy over the profits these companies are making. So, as a consumer, you
really only have one option, and when you use a web browser, you should be using
privacy-enhancing tools. These are very easy to install. One of the popular tools
is called Privacy Badger, which was developed by
Electronic Frontier Foundation. There was another tool called Ghostery. What these tools do is, as
you go to different websites, they block all the third-party trackers that are on these websites, so companies cannot collect this information. And then there are tools like ad blockers. These tools go way ahead. They say that, okay, there
are all these trackers, but why are they tracking you everywhere? They want to show you ads. So how about we get rid of
ads as well as trackers? So that’s exactly what they do. So if you install an ad blocker, most of the ad blockers are open source, they use publicly available lists so you exactly know what kind
of things they are blocking. You can not only get
rid of all the trackers, but you can also get rid of all the ads as you go on the Internet. So, just a quick show of hands, how many in the room
actually use an ad blocker or heard about an ad blocker? Okay, not as high as I
would have hoped for. But the number of users
who are using ad blockers is increasing exponentially. There was a recent estimate that more than 600 million people around the
world now use an ad blocker when they browse the Internet. There was a recent study done by comScore which showed that around
20% of users in the US use an ad blocker. In other places, for example in the EU, the percentage of users using
ad blockers is much higher, because they are in general
more privacy-conscious than us. So, the online advertising industry sees these ad-blocking
and tracker-blocking tools as a threat to their business model. If you’re not going to see
any ads while using Google or Facebook, how are they
going to make money off of you? This really breaks their
advertising ecosystem. So what these companies
have started to do, they have started to
detect your ad blockers. They now use these anti-ad blockers. So if you go to a website and
you’re using an ad blocker, they will figure that out
and they will stop you, and they will ask you to
disable your ad blockers so they can track you and
they can show you advertising, so they can make money off of you. Many popular publishers, for
example the Washington Post, Wired, Forbes, have recently started to use some of these
techniques, and these attempts to undermine ad blockers can
mean a return to the status quo to widespread tracking. So, we really need to develop effective and long-lasting countermeasures
to make good ad blockers which can protect your privacy. So in our lab, we are working
on a stealthy ad blocker. So our lab and a bunch
of other researchers around the world, we are
trying to improve ad blockers, so we are specifically working
on a stealthy ad blocker. These ad blockers are
anti-anti-ad blockers, (audience chuckles) so we are stepping up
the game a little bit, and we are targeting the
strategies using which these websites figure out that
you have an ad blocker on. So we want to make sure you can choose to protect your privacy when
you browse the Internet. I would like to conclude
here with a big picture. The big picture really here is big data, and the key term here is data. The Internet is fueled
by data, by advertising which relies on this data
which is collected on us. So the bigger question
that we need to ask is how can we sustainably
maximize the contribution of big data, to our
economy, to our society, to individual lives? And right now, the order
seems to be prioritizing the economy,
and then we have society, and at the very end we have individuals, and we really need to
think hard about this. Is this the order we want? We believe that this order should actually be changed and it should be the individuals
who should be put first in this system. So we are doing research
to put users in control of their privacy, so you
should be able to control what kind of information
is collected about you as you browse the Internet. So please use an ad blocker; an ad blocker or a tracker
blocker puts you in control, and you can choose when it is okay to leak some of your
information, and you can decide not to leak that information. So our research is aiming
to develop more effective and more robust privacy-enhancing tools, and just to conclude, I
have a call to action. All of you who are not
using an ad blocker, please use an ad blocker. Because if you use an ad
blocker, then it actually reduces the marginal benefit for these companies. Imagine if more than 70% to 80% of people start to use ad blockers,
these companies have no choice but to change their course of action, and some of these things have
recently started to happen. So we really need to show
them that we as consumers have a voice, and we
are privacy-conscious. So all of you who installed an ad blocker, you are telling these companies that you are privacy-conscious, and unless these companies change the way they’re operating right now, and try to give you more control over what kind of information
they are collecting on you, you will not give up your information, you will not let these companies
monetize your information. With that, I will thank all
of you for coming today. Thank you. (audience applauds) Time for some questions. – [Audience Member] You showed a list of all private entities,
Facebook and Google, et cetera, et cetera, and then you brought in the government. Are these all sources for
the government to link in and get to know all information about you? Can the government actually
penetrate these other entities? – So I showed you two
programs that the NSA had. The first program was that government was directly getting information
from these companies, so it’s not actually hard for them. But even if these companies don’t comply, the US government controls
all the infrastructure. So the NSA and other spy
agencies around the world, they can tap into internet cables and they can collect all
the traffic off the wire. And all this traffic contains
metadata, like cookies and fingerprints, and
using this information, they can figure out
what everyone is doing. So NSA actually has a search
engine, which is called XKeyscore, it’s like
Google but they can search, “Give me all the people in
Germany who spoke Arabic,” and it would pop out all the information, again, using information like cookies which are considered
metadata by the government. So the government does not need
to work with these companies to get this information. So the point is, as soon
as your information leaks, in terms of cookies and fingerprints, as you browse the
Internet, companies can use this information, and then it’s very easy for governments to get a hand
on this kind of information. – [Audience Member] What are your thoughts on net neutrality? – At least in the next administration, there is no hope for net neutrality. There are a lot of steps
which have already been taken to undermine net neutrality. For example, I was telling
you that the new commissioner of the FCC, Ajit Pai, who is a very pro-free market person,
he recently, actually, has rolled back these privacy regulations. Now, not only internet companies
like Google and Facebook but your internet service
providers like Comcast can actually collect all
the information about you and sell it to advertisers,
so that’s really against the essence of net neutrality. Unfortunately, most of us, for example in Iowa City/Coralville area,
don’t really have a choice of internet service provider. So if Mediacom chooses to do so, I really don’t have
CenturyLink servicing my area. So this really undermines
competition, gives users no choice, and as these
net neutrality regulations are going to be rolled back,
your information will be exposed to more players
to make money off of you in the online advertising ecosystem. – [Audience Member] Is there any hope for dealing successfully
with fake information, namely fake news? – Yes. A lot of fake news
websites which popped up, and there has been a lot of
reporting so I will not go into the detail of what they were doing, but many of these fake news websites, their incentive to
actually spread fake news was to earn money. So there were hackers
in Romania, in Russia, who were creating these fake websites and they were putting
ads on their websites. Now, the key thing for
them was to drive traffic to their websites, how can
they get people clicking on their websites? So what they did was
they would create stories which would appeal very
much to a particular segment of the society, and then
they would spread this kind of information on Facebook,
and then some people would choose to believe it, unfortunately, and then start sharing it,
and then millions of people would be reaching their websites and they would make thousands of dollars. So really, the thing I
want to point out here is that the core of the fake news
problem is also advertising, to some extent. So if we can get rid of
this widespread tracking and the incentive for
profit in advertising, we can try to reduce that
problem to some extent. But again, it’s a very complex problem, I’ll be happy to talk
more with you on this. – [Audience Member] In light
of all the blocking ads and whatnot, what role do
even VPNs or other ways of masking data traffic, how
do they play into big data? – VPNs protect you from your
internet service provider or any other entities which
are between you and the website that you are trying to visit. So if you use a VPN, Mediacom
cannot, even if they collect all of your traffic, they will
be seeing encrypted traffic, they cannot monetize the traffic. So VPNs do provide a benefit. But then again, the website
and all the trackers on those websites, they
are still tracking you; VPNs don’t stop that. So you need to use all
of these tools together, unfortunately there is no single program that you can install and you can be safe. VPNs and the thing that I am
trying to advertise today, ironically, ad blockers;
I’m not advertising them, I’m trying to promote them. So please install an ad blocker, and use VPNs, use as many as
these privacy-enhancing tools you can to protect your privacy. – [Audience Member] You
mentioned earlier that uiowa.edu, there were seven third parties. Now, do those third
parties pay the university? – University pays them. That’s ironic. That’s actually mind-boggling. All these websites, they
voluntarily put these trackers on their websites, and
let me give you a gist of why they actually would do that. The promise here is, if Google’s
tracker is on uiowa.edu, then uiowa.edu can actually pay Google to advertise to people on the Internet and they can specifically
target who are the people who came to uiowa.edu and were looking at different graduate programs. So this obviously makes
some sense for uiowa.edu, but the key problem here is,
when this information goes from uiowa.edu to Google,
uiowa.edu does not have any control to take that information back, and Google is free to
use all this information with info they are
collecting on more than 80% of top one million
websites on the Internet. – [Audience Member] How
do you know how many third-party tracking tools
are on a given website? Is there a way for us to tell? – If you install an ad blocker or Ghostery, they will actually show you a number. So as they are blocking
these trackers and ads, you will get some satisfaction and you will see these numbers going up. (audience chuckles) – [Audience Member] There’s
this cat-and-mouse game, cookies and fingerprints
and as you mentioned, VPNs. Why doesn’t Google just put
some code in their browser so they know where I’m going? Is there something that
prevents them from doing that? – That’s an excellent question. And yes, if everyone was using Chrome, that would be the case. There is a competition here
between different browsers, and many of these browsers
are now open source. So if Google does decide to put a tracker right in their web browser
that everyone is using, then people will stop using that browser. And thankfully, we
still have some options. Mozilla Firefox is a browser
which is made by a non-profit entity, the Mozilla Foundation, which
is a very good alternative, even if you think Internet Explorer sucks. (audience titters) – [Audience Member] As
the somewhat recent rise of social consumerism, do you see that as moving into this realm
of a viable strategy to combat the big data question? – Social consumerism, that’s
a very loaded term, (chuckles) so I’m trying to unravel
what exactly do you mean by– – [Audience Member]
Just to clarify for you, I’m thinking in terms of where
we start preferring sites that either have less
trackers or no trackers, or even situations where a
certain company could then change their revenue streams
by monetizing not tracking. – I guess the key here is, if users show intent and users show, for example by using an
ad blocker, to websites that you are not okay with tracking, these companies would try to
adjust their business models, and this would change the status quo and hopefully the status
quo will actually change for the good, and as companies realize they cannot do tracking,
there are actually some new initiatives
which have been going on which are non-intrusive advertising, or advertising which actually
does not rely on tracking and only on context. There are some of these
advertising exchanges which are coming up, but again, these exchanges would only be successful if more and more of us
actually use an ad blocker and force companies to
change their behavior, and hopefully, as a
consumer, the only way for us to show intent to these companies is by showing that we
care about our privacy. – [Audience Member] I
don’t want to open up an even bigger can of worms,
but the kind of volume that we do now, if you could
have a post-big data world, how would a revenue model for internet web browsing even work? – There are many alternate models, and people who are working
in marketing and business, there is a lot of
research on that already. There is one interesting business model, this is a browser which
is actually supported by the Mozilla Foundation. It’s called Clicks, they
have an ad blocker built in. And one of the revenue models
they are thinking about is as a consumer, you value your privacy, you are, let’s say, okay
with paying $5 a month for your web browsing
and you want to ensure that no one tracks you, so
this money will be stored in your web browser, and as
you visit different websites, your web browser would
actually pay proportionally to different websites what they deserve based on how much time you
spend on these websites. So these anonymous
micro-payments is one option that people are thinking
about, and there could be other exciting new
paradigms that can come up. So let’s not stop
ourselves from installing these ad blockers simply based on the fact that some people will
say that you are going to break the Internet, all
these websites would go away and then there won’t be any Facebook. Trust me, they will still be there, they will find other ways
to monetize their services. – [Audience Member] So,
a lot of this was based on the US government but there’s a lot of just-developing governments. Do the various government entities talk about how they’re tracking each other, or is there a governing body or conference for lack of a better term that they talk about– – They don’t. There was a recent leak about
the CIA, so governments have their own in-house hacking
surveillance operations. Governments collaborate with each other, so for example, the NSA has
a very good collaboration with spy agencies in the UK
and some of the spy agencies in Europe, so they do
collaborate with each other. So if it is illegal for the US government to spy on their citizens,
sometimes they would ask the UK counterpart to do it for them. And then they would later on
exchange this information, it goes multiple ways. So that’s some collaboration
which has been documented by journalists, but other than that, no. They really on black-hat marketplaces, underground marketplaces, where
they secretly exchange tools and security flaws in
devices and internet systems. – Okay, thank you all for your questions and thank you so much Dr. Shafiq. – Thank you.
(audience applauds)
