Surprise surprise, Xiaomi web browser and music player are sending data about you to China

When he looked around the Web on the device’s default Xiaomi browser, it recorded all the websites he visited, including search engine queries whether with Google or the privacy-focused DuckDuckGo, and every item viewed on a news feed feature of the Xiaomi software. That tracking appeared to be happening even if he used the supposedly private “incognito” mode.

The device was also recording what folders he opened and to which screens he swiped, including the status bar and the settings page. All of the data was being packaged up and sent to remote servers in Singapore and Russia, though the Web domains they hosted were registered in Beijing.

Meanwhile, at Forbes’ request, cybersecurity researcher Andrew Tierney investigated further. He also found browsers shipped by Xiaomi on Google Play—Mi Browser Pro and the Mint Browser—were collecting the same data. Together, they have more than 15 million downloads, according to Google Play statistics.

[…]

And there appear to be issues with how Xiaomi is transferring the data to its servers. Though the Chinese company claimed the data was being encrypted when transferred in an attempt to protect user privacy, Cirlig found he was able to quickly see just what was being taken from his device by decoding a chunk of information that was hidden with a form of easily crackable encoding, known as base64. It took Cirlig just a few seconds to change the garbled data into readable chunks of information.
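
To make concrete why base64 offers no real protection, here's a minimal Python sketch (the payload below is illustrative, not Xiaomi's actual format): anything encoded this way can be reversed in one call, with no key involved.

```python
import base64

# Base64 is an encoding, not encryption: anyone who intercepts the payload can
# reverse it instantly with standard tools and no key.
payload = base64.b64encode(b"https://duckduckgo.com/?q=private+search").decode()
print(payload)                             # looks garbled, but isn't protected
print(base64.b64decode(payload).decode())  # the original data, back in one call
```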

“My main concern for privacy is that the data sent to their servers can be very easily correlated with a specific user,” warned Cirlig.

[…]

But, as pointed out by Cirlig and Tierney, it wasn’t just the website or Web search that was sent to the server. Xiaomi was also collecting data about the phone, including unique numbers for identifying the specific device and Android version. Cirlig said such “metadata” could “easily be correlated with an actual human behind the screen.”

Xiaomi’s spokesperson also denied that browsing data was being recorded under incognito mode. Both Cirlig and Tierney, however, found in their independent tests that their web habits were sent off to remote servers regardless of what mode the browser was set to, providing both photos and videos as proof.

[…]

Both Cirlig and Tierney said Xiaomi’s behavior was more invasive than other browsers like Google Chrome or Apple Safari. “It’s a lot worse than any of the mainstream browsers I have seen,” Tierney said. “Many of them take analytics, but it’s about usage and crashing. Taking browser behavior, including URLs, without explicit consent and in private browsing mode, is about as bad as it gets.”

[…]

Cirlig also suspected that his app use was being monitored by Xiaomi, as every time he opened an app, a chunk of information would be sent to a remote server. Another researcher who’d tested Xiaomi devices, though unable to discuss the matter openly because of an NDA, said he’d seen the manufacturer’s phones collect such data. Xiaomi didn’t respond to questions on that issue.

[…]

Late in his research, Cirlig also discovered that Xiaomi’s music player app on his phone was collecting information on his listening habits: what songs were played and when.

Source: Exclusive: Warning Over Chinese Mobile Giant Xiaomi Recording Millions Of People’s ‘Private’ Web And Phone Use

It’s a bit of a puff piece, as American software also records all this data and sends it home. The article also seems to suggest that the whole phone is constantly sending data home, but it only really covers the browser and the music player app. So yes, you should have installed Firefox and used that as your browser as soon as you got the phone, but that goes for any phone that ships with Safari or Chrome as its browser too. A bit of an anti-Chinese storm in a teacup.

Australian contact-tracing app leaks telling info and increases chances of third-party tracking, say security folks. That’s OK says maker, you download worse stuff as games.

The design of Australia’s COVIDSafe contact-tracing app creates some unintended surveillance opportunities, according to a group of four security pros who unpacked its .APK file.

Penned by independent security researcher Chris Culnane; University of Melbourne tutor, cryptography researcher and master’s student Eleanor McMurtry; developer Robert Merkel; and Australian National University associate professor and Thinking Security CEO Vanessa Teague, and posted to GitHub, the analysis notes three concerning design choices.

The first issue addressed is the decision to change UniqueIDs – the identifier the app shares with other users – once every two hours, and to have devices only accept a new UniqueID if the app is running. The four researchers say this makes it possible for the government to tell whether users are running the app.

“This means that a person who chooses to download the app, but prefers to turn it off at certain times of the day, is informing the Data Store of this choice,” they write.

The authors also suggest that persisting with a UniqueID for two hours “greatly increases the opportunities for third-party tracking.”

“The difference between 15 minutes’ and two hours’ worth of tracking opportunities is substantial. Suppose for example that the person has a home tracking device such as a Google home mini or Amazon Alexa, or even a cheap Bluetooth-enabled IoT device, which records the person’s UniqueID at home before they leave. Then consider that if the person goes to a shopping mall or other public space, every device that cooperates with their home device can share the information about where they went.”
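
To see why the rotation interval matters, here's a toy Python sketch (not the actual COVIDSafe protocol; real UniqueIDs are issued by the server, not derived locally like this). Two cooperating Bluetooth devices at different locations can link their sightings whenever the ID they both saw falls inside the same rotation window:

```python
# Toy model: a UniqueID is constant within a rotation window, so any two
# devices that log it can link their sightings if both fall in the same window.
ROTATION_MINUTES = 120   # COVIDSafe's two-hour window; try 15 for comparison

def unique_id(device: str, minute_of_day: int) -> str:
    # Stand-in only: the real app gets IDs from the Data Store, it does not
    # derive them locally like this.
    return f"{device}:{minute_of_day // ROTATION_MINUTES}"

home_sighting = unique_id("alices-phone", 8 * 60 + 50)   # logged at home, 08:50
shop_sighting = unique_id("alices-phone", 9 * 60 + 40)   # logged at a mall, 09:40

# With a 2-hour window both sightings share an ID, so colluding devices can tie
# "the person who was at this home" to "the person now in this shop"; with a
# 15-minute rotation the two sightings no longer match.
print(home_sighting == shop_sighting)
```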

The analysis also notes that “It is not true that all the data shared and stored by COVIDSafe is encrypted. It shares the phone’s exact model in plaintext with other users, who store it alongside the corresponding Unique ID.”

That’s worrisome as:

“The exact phone model of a person’s contacts could be extremely revealing information. Suppose for example that a person wishes to understand whether another person whose phone they have access to has visited some particular mutual acquaintance. The controlling person could read the (plaintext) logs of COVIDSafe and detect whether the phone models matched their hypothesis. This becomes even easier if there are multiple people at the same meeting. This sort of group re-identification could be possible in any situation in which one person had control over another’s phone. Although not very useful for suggesting a particular identity, it would be very valuable in confirming or refuting a theory of having met with a particular person.”

The authors also worry that the app shares all UniqueIDs when users choose to report a positive COVID-19 test.

“COVIDSafe does not give them the option of deleting or omitting some IDs before upload,” they write. “This means that users consent to an all-or-nothing communication to the authorities about their contacts. We do not see why this was necessary. If they wish to help defeat COVID-19 by notifying strangers in a train or supermarket that they may be at risk, then they also need to share with government a detailed picture of their day’s close contacts with family and friends, unless they have remembered to stop the app at those times.”

The analysis also calls out some instances of UniqueIDs persisting for up to eight hours, for unknown reasons.

The authors conclude the app is not an immediate danger to users. But they do say it presents “serious privacy problems if we consider the central authority to be an adversary.”

None of which seems to be bothering Australians, who have downloaded it more than two million times in 48 hours and blown away adoption expectations.

Atlassian co-founder Mike Cannon-Brookes may well have helped things along by suggesting it’s time to “turn the … angry mob mode off”. He also offered the following advice:

When asked by non technical people “Should I install this app? Is my data / privacy safe? Is it true it doesn’t track my location?” – say “Yes” and help them understand. Fight the misinformation. Remind them how little time they think before they download dozens of free, adware crap games that are likely far worse for their data & privacy than this ever would be!

Source: Australian contact-tracing app leaks telling info and increases chances of third-party tracking, say security folks • The Register

Why should the UK pensions watchdog be able to spy on your internet activities? Same reason as the Environment Agency and more than 50 more

It has been called the “most extreme surveillance in the history of Western democracy.” It has not once but twice been found to be illegal. It sparked the largest ever protest of senior lawyers who called it “not fit for purpose.”

And now the UK’s Investigatory Powers Act of 2016 – better known as the Snooper’s Charter – is set to expand to allow government agencies you may never have heard of to trawl through your web histories, emails, or mobile phone records.

In a memorandum [PDF] first spotted by The Guardian, the British government is asking that five more public authorities be added to the list of bodies that can access data scooped up under the nation’s mass-surveillance laws: the Civil Nuclear Constabulary, the Environment Agency, the Insolvency Service, the UK National Authority for Counter Eavesdropping (UKNACE), and the Pensions Regulator.

The memo explains why each should be given the extraordinary powers, in general and specifically. In general, the five agencies “are increasingly unable to rely on local police forces to investigate crimes on their behalf,” and so should be given direct access to the data pipe itself.

Five Whys

The Civil Nuclear Constabulary (CNC) is a special armed police force that does security at the UK’s nuclear sites and when nuclear materials are being moved. It should be given access even though “the current threat to nuclear sites in the UK is assessed as low” because “it can also be difficult to accurately assess risk without the full information needed.”

The Environment Agency investigates “over 40,000 suspected offences each year,” the memo stated. Which is why it should also be able to ask ISPs to hand over people’s most sensitive communications information, in order “to tackle serious and organised waste crime.”

The Insolvency Service investigates breaches of company director disqualification orders. Some of those it investigates get put in jail so it is essential that the service be allowed “to attribute subscribers to telephone numbers and analyse itemised billings” as well as be able to see what IP addresses are accessing specific email accounts.

UKNACE, a little known agency that we have taken a look at in the past, is home of the real-life Qs, and one of its jobs is to detect attempts to eavesdrop on UK government offices. It needs access to the nation’s communications data “in order to identify and locate an attacker or an illegal transmitting device”, the memo claimed.

And lastly, the Pensions Regulator, which checks that companies have added their employees to their pension schemes, needs to be able to delve into anyone’s emails so it can “secure compliance and punish wrongdoing.”

Taken together, the requests reflect exactly what critics of the Investigatory Powers Act feared would happen: that a once-shocking power granted on the back of terrorism fears is being slowly extended to even the most obscure government agency, for no reason other than that it will make bureaucrats’ lives easier.

None of the agencies would be required to apply for warrants to access people’s internet connection data, and they would be added to another 50-plus agencies that already have access, including the Food Standards Agency, Gambling Commission, and NHS Business Services Authority.

Safeguards

One of the biggest concerns remains that there are insufficient safeguards in place to prevent the system being abused; concerns that only grow as the number of people that have access to the country’s electronic communications grows.

It is also still not known precisely how all these agencies access the data that is accumulated, or what restrictions are in place beyond a broad-brush “double lock” authorization process that requires a former judge (a judicial commissioner, or JC) to sign off on a minister’s approval.

Source: Why should the UK pensions watchdog be able to spy on your internet activities? Same reason as the Environment Agency and many more • The Register

Payment provider Stripe is Silently Recording Your Movements On its Customers’ Websites

Among startups and tech companies, Stripe seems to be the near-universal favorite for payment processing. When I needed paid subscription functionality for my new web app, Stripe felt like the natural choice. After integration, however, I discovered that Stripe’s official JavaScript library records all browsing activity on my site and reports it back to Stripe. This data includes:

  1. Every URL the user visits on my site, including pages that never display Stripe payment forms
  2. Telemetry about how the user moves their mouse cursor while browsing my site
  3. Unique identifiers that allow Stripe to correlate visitors to my site against other sites that accept payment via Stripe

This post shares what I found, who else it affects, and how you can limit Stripe’s data collection in your web applications.
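
One mitigation the post discusses is loading Stripe.js only on pages that actually take payments, so it never sees the rest of your site. Here is a minimal sketch of that idea, assuming a Flask app (the routes and markup are mine, purely illustrative):

```python
# Keep Stripe.js off every page except the ones that take a payment, so it
# cannot observe browsing or mouse movement anywhere else on the site.
from flask import Flask, render_template_string

app = Flask(__name__)

PAGE = """<html><body>{{ body }}
{% if load_stripe %}<script src="https://js.stripe.com/v3/"></script>{% endif %}
</body></html>"""

@app.route("/")
def home():
    # Ordinary pages never include the Stripe script tag.
    return render_template_string(PAGE, body="Welcome", load_stripe=False)

@app.route("/checkout")
def checkout():
    # Only the checkout page pulls in Stripe.js.
    return render_template_string(PAGE, body="Checkout", load_stripe=True)

if __name__ == "__main__":
    app.run()
```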

Source: Stripe is Silently Recording Your Movements On its Customers’ Websites · mtlynch.io

Zoom sex party moderation: app uses machine-learning to patrol nudity – will it record them and put them up on the web?

As Rolling Stone reported, the app is now playing host to virtual sex parties,  “play parties,” and group check-ins which have become, as one host said, “the mutual appreciation jerk-off society.”

According to Zoom’s “acceptable use” policy, users may not use the technology to “engage in any activity that is harmful, obscene, or indecent, particularly as such would be understood in the context of business usage.” The policy specifies that this includes “displays of nudity, violence, pornography, sexually explicit material, or criminal activity.”

Zoom says that the platform uses ‘machine learning’ to identify accounts in violation of its policies — though it has remained vague about its methods for identifying offending users and content.

“We encourage users to report suspected violations of our policies, and we use a mix of tools, including machine learning, to proactively identify accounts that may be in violation,” a spokesperson for Zoom told Rolling Stone.

While Zoom executives did not respond to the outlet’s questions about the specifics of the machine-learning tools or how the platform might be alerted to nudity and pornographic content, a spokesperson did add that the company will take a “number of actions” against people found to be in violation of the specified acceptable use.

When reached for comment, a spokesperson for Zoom referred Insider to the “acceptable use” policy as well as the platform’s privacy policy which states that Zoom “does not monitor your meetings or its contents.”

The spokesperson also pointed to Yuan’s message in which he addressed how the company has “fallen short” of users’ “privacy and security expectations,” referencing instances of harassment and Zoom-bombing, and laid out the platform’s action plan going forward.

Source: Zoom sex party moderation: app uses machine-learning to patrol nudity – Insider

It’s not unthinkable that they will record the videos and then just leave them on the web for anyone to download. After all, they’ve left thousands of video calls just lying about before.

India says ‘Zoom is a not a safe platform’ and bans government users

India has effectively banned videoconferencing service Zoom for government users and repeated warnings that consumers need to be careful when using the tool.

The nation’s Cyber Coordination Centre has issued advice (PDF) titled “Advisory on Secure use of Zoom meeting platform by private individuals (not for use by government offices/officials for official purpose)”.

The document refers to past advisories that offered advice on how to use Zoom securely and warned that Zoom has weak authentication methods. Neither of those notifications mentioned policy about government use of the tool, meaning the new document is a significant change in position!

The document is otherwise a comprehensive-if-dull guide to using Zoom securely.

[…]

Source: India says ‘Zoom is a not a safe platform’ and bans government users • The Register

Apple: We respect your privacy so much we’ve revealed a little about what we can track when you use Maps

Apple has released a set of “Mobility Trends Reports” – a trove of anonymised and aggregated data that describes how people have moved around the world in the three months from 13 January to 13 April.

The data measures walking, driving and public transport use. And as you’d expect and as depicted in the image atop this story, human movement dropped off markedly as national coronavirus lockdowns came into effect.

Apple has explained the source of the data as follows:

This data is generated by counting the number of requests made to Apple Maps for directions in select countries/regions and cities. Data that is sent from users’ devices to the Maps service is associated with random, rotating identifiers so Apple doesn’t have a profile of your movements and searches. Data availability in a particular country/region or city is subject to a number of factors, including minimum thresholds for direction requests made per day.

Apple justified the release by saying it thinks it’ll help governments understand what their citizens are up to in these viral times. The company has also said this is a limited offer – it won’t be sharing this kind of analysis once the crisis passes.

But the data is also a peek at what Apple is capable of. And presumably also what Google, Microsoft, Waze, Mapquest and other spatial services providers can do too. Let’s not even imagine what Facebook could produce.

Source: Apple: We respect your privacy so much we’ve revealed a little about what we can track when you use Maps • The Register

Twitter Obliterates Its Users’ Privacy Choices

The EFF’s staff technologist, also an engineer on Privacy Badger and HTTPS Everywhere, writes: Twitter greeted its users with a confusing notification this week. “The control you have over what information Twitter shares with its business partners has changed,” it said. The changes will “help Twitter continue operating as a free service,” it assured. But at what cost?

Twitter has changed what happens when users opt out of the “Allow additional information sharing with business partners” setting in the “Personalization and Data” part of its site. The changes affect two types of data sharing that Twitter does… Previously, anyone in the world could opt out of Twitter’s conversion tracking (type 1), and people in GDPR-compliant regions had to opt in. Now, people outside of Europe have lost that option. Instead, users in the U.S. and most of the rest of the world can only opt out of Twitter sharing data with Google and Facebook (type 2).
The article explains how last August Twitter discovered that its option for opting out of device-level targeting and conversion tracking “did not actually opt users out.” But after fixing that bug, “advertisers were unhappy. And Twitter announced a substantial hit to its revenue… Now, Twitter has removed the ability to opt out of conversion tracking altogether.”

While users in Europe are protected by GDPR, “users in the United States and everywhere else, who don’t have the protection of a comprehensive privacy law, are only protected by companies’ self-interest…” BoingBoing argues that Twitter “has just unilaterally obliterated all its users’ privacy choices, announcing the change with a dialog box whose only button is ‘OK.’”

Source: Twitter Accused of Obliterating Its Users’ Privacy Choices – Slashdot

Mozilla installs Scheduled Telemetry Task on Windows with Firefox 75 – if you have telemetry turned on

Observant Firefox users on Windows who have updated the web browser to Firefox 75 may have noticed that the upgrade brought along with it a new scheduled task. The scheduled task is also added if Firefox 75 is installed on a Windows device.

The task’s name is Firefox Default Browser Agent and it is set to run once per day. Mozilla published a blog post on the official blog of the organization that provides information on the task and why it has been created.


According to Mozilla, the task has been created to help the organization “understand changes in default browser settings”. At its core, it is a Telemetry task that collects information and sends the data to Mozilla.

Here are the details:

  • The Task is only created if Telemetry is enabled. If Telemetry is set to off (in the most recently used Firefox profile), it is not created and thus no data is sent. The same applies if Telemetry is disabled via Enterprise policies. Update: some users report that the task is created even though Telemetry was set to off on their machines.
  • Mozilla collects information “related to the system’s current and previous default browser setting, as well as the operating system locale and version”.
  • Mozilla notes that the data cannot be “associated with regular profile based telemetry data”.
  • The data is sent to Mozilla every 24 hours using the scheduled task.

Mozilla added the file default-browser-agent.exe to the Firefox installation folder on Windows which defaults to C:\Program Files\Mozilla Firefox\.

Firefox users have the following options if they don’t want the data sent to Mozilla:

  • Firefox users who opted out of Telemetry are fine: they don’t need to make any changes, as the new Telemetry data is not sent to Mozilla. This applies both to users who opted out of Telemetry in Firefox and to those who used Enterprise policies to do so.
  • Firefox users who have Telemetry enabled can either opt-out of Telemetry or deal with the task/executable that is responsible.

Disable the Firefox Default Browser Agent task


Here is how you disable the task:

  1. Open Start on the Windows machine and type Task Scheduler.
  2. Open the Task Scheduler and go to Task Scheduler Library > Mozilla.
  3. There you should find listed the Firefox Default Browser Agent task.
  4. Right-click on the task and select Disable.
  5. Note: Nightly users may see the Firefox Nightly Default Browser Agent task there as well and may disable it.

The task won’t be executed anymore once it is disabled.
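
If you'd rather do this from a script than the Task Scheduler UI, the same thing can be done with schtasks. A hedged sketch, assuming the task lives at the path shown above under Task Scheduler Library > Mozilla (verify the exact name on your machine, and expect to need an elevated prompt):

```python
import subprocess

# Task path as it appears under Task Scheduler Library > Mozilla; confirm it
# exists on your machine first. Disabling usually requires an elevated prompt.
TASK = r"\Mozilla\Firefox Default Browser Agent"

subprocess.run(["schtasks", "/Query", "/TN", TASK])                # show status
subprocess.run(["schtasks", "/Change", "/TN", TASK, "/Disable"])   # disable it
```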

Closing Words

The new Telemetry task is only introduced on Windows and runs only if Telemetry is enabled (which it is by default [NOTE: Is it? I don’t think so! It asks at install!]). Mozilla is transparent about the introduction, and while that is good, I’d have preferred it if the company had informed users about it in the browser after the upgrade to Firefox 75 or installation of the browser, and before the task is executed for the first time.

Source: Mozilla installs Scheduled Telemetry Task on Windows with Firefox 75 – gHacks Tech News

Go to about:telemetry in Firefox to see what it’s collecting. In my case this was nothing, because when FF was installed it asked me whether I wanted telemetry on or off and I said off.

Facebook asks users about coronavirus symptoms, releases friendship data to researchers

Facebook Inc said on Monday it would start surveying some U.S. users about their health as part of a Carnegie Mellon University research project aimed at generating “heat maps” of self-reported coronavirus infections.

The social media giant will display a link at the top of users’ News Feeds directing them to the survey, which the researchers say will help them predict where medical resources are needed. Facebook said it may make surveys available to users in other countries too, if the approach is successful.

Alphabet Inc’s Google, Facebook’s rival in mobile advertising, began querying users for the Carnegie Mellon project last month through its Opinion Rewards app, which exchanges responses to surveys from Google and its clients for app store credit.

Facebook said in a blog post that the Carnegie Mellon researchers “won’t share individual survey responses with Facebook, and Facebook won’t share information about who you are with the researchers.”

The company also said it would begin making new categories of data available to epidemiologists through its Disease Prevention Maps program, which is sharing aggregated location data with partners in 40 countries working on COVID-19 response.

Researchers use the data to provide daily updates on how people are moving around in different areas to authorities in those countries, along with officials in a handful of U.S. cities and states.

In addition to location data, the company will begin making available a “social connectedness index” showing the probability that people in different locations are Facebook friends, aggregated at the zip code level.

Laura McGorman, who runs Facebook’s Data for Good program, said the index could be used to assess the economic impact of the new coronavirus, revealing which communities are most likely to get help from neighboring areas and others that may need more targeted support.

New “co-location maps” can similarly reveal the probability that people in one area will come in contact with people in another, Facebook said.

Source: Facebook asks users about coronavirus symptoms, releases friendship data to researchers – Reuters

This might actually be a good way to use all that privacy-invading data.

A Feature on Zoom Secretly Displayed Data From People’s LinkedIn Profiles

But what many people may not know is that, until Thursday, a data-mining feature on Zoom allowed some participants to surreptitiously have access to LinkedIn profile data about other users — without Zoom asking for their permission during the meeting or even notifying them that someone else was snooping on them.

The undisclosed data mining adds to growing concerns about Zoom’s business practices at a moment when public schools, health providers, employers, fitness trainers, prime ministers and queer dance parties are embracing the platform.

An analysis by The New York Times found that when people signed in to a meeting, Zoom’s software automatically sent their names and email addresses to a company system it used to match them with their LinkedIn profiles.

The data-mining feature was available to Zoom users who subscribed to a LinkedIn service for sales prospecting, called LinkedIn Sales Navigator. Once a Zoom user enabled the feature, that person could quickly and covertly view LinkedIn profile data — like locations, employer names and job titles — for people in the Zoom meeting by clicking on a LinkedIn icon next to their names.

The system did not simply automate the manual process of one user looking up the name of another participant on LinkedIn during a Zoom meeting. In tests conducted last week, The Times found that even when a reporter signed in to a Zoom meeting under pseudonyms — “Anonymous” and “I am not here” — the data-mining tool was able to instantly match him to his LinkedIn profile. In doing so, Zoom disclosed the reporter’s real name to another user, overriding his efforts to keep it private.

Reporters also found that Zoom automatically sent participants’ personal information to its data-mining tool even when no one in a meeting had activated it. This week, for instance, as high school students in Colorado signed in to a mandatory video meeting for a class, Zoom readied the full names and email addresses of at least six students — and their teacher — for possible use by its LinkedIn profile-matching tool, according to a Times analysis of the data traffic that Zoom sent to a student’s account.

The discoveries about Zoom’s data-mining feature echo what users have learned about the surveillance practices of other popular tech platforms over the last few years. The video-meeting platform that has offered a welcome window on American resiliency during the coronavirus — providing a virtual peek into colleagues’ living rooms, classmates’ kitchens and friends’ birthday celebrations — can reveal more about its users than they may realize.

“People don’t know this is happening, and that’s just completely unfair and deceptive,” Josh Golin, the executive director of the Campaign for a Commercial-Free Childhood, a nonprofit group in Boston, said of the data-mining feature. He added that storing the personal details of schoolchildren for nonschool purposes, without alerting them or obtaining a parent’s permission, was particularly troubling.

Source: A Feature on Zoom Secretly Displayed Data From People’s LinkedIn Profiles – The New York Times

Thousands of recorded Zoom Video Calls Left Exposed on Open Web

Thousands of personal Zoom videos have been left viewable on the open Web, highlighting the privacy risks to millions of Americans as they shift many of their personal interactions to video calls in an age of social distancing. From a report: Many of the videos appear to have been recorded through Zoom’s software and saved onto separate online storage space without a password. But because Zoom names every video recording in an identical way, a simple online search can reveal a long stream of videos that anyone can download and watch. Zoom videos are not recorded by default, though call hosts can choose to save them to Zoom servers or their own computers. There’s no indication that live-streamed videos or videos saved onto Zoom’s servers are publicly visible. But many participants in Zoom calls may be surprised to find their faces, voices and personal information exposed because a call host can record a large group call without participants’ consent.

Source: Thousands of Zoom Video Calls Left Exposed on Open Web – Slashdot

Someone Convinced Google To Delist Our Entire Right To Be Forgotten Tag In The EU For Searches On Their Name, which means we can’t tell if they are abusing the system

The very fact that the tag being delisted when searching for this unnamed individual is the “right to be forgotten” tag shows that whoever this person is, they recognize that they are not trying to cover up the record of, say, an FTC case against them from… oh, let’s just say 2003… but rather are now trying to cover up their current effort to abuse the right to be forgotten process.

Anyway, in theory (purely in theory, of course) if someone in the EU searched for the name of anyone, it might be helpful to know if the Director of the FTC’s Bureau of Consumer Protection once called him a “spam scammer” who “conned consumers in two ways.” But, apparently, in the EU, that sort of information is no longer useful. And you also can’t find out that he’s been using the right to be forgotten process to further cover his tracks. That seems unfortunate, and entirely against the supposed principle behind the “right to be forgotten.” No one is trying to violate anyone’s “privacy” here. We’re talking about public court records, and an FTC complaint and later settlement on a fairly serious crime that took place not all that long ago. That ain’t private information. And, even more to the point, the much more recent efforts by that individual to then hide all the details of this public record.

Source: Someone Convinced Google To Delist Our Entire Right To Be Forgotten Tag In The EU For Searches On Their Name | Techdirt

US Officials Use Mobile Ad Location Data to Study How COVID-19 Spreads, not cellphone tower data

Government officials across the U.S. are using location data from millions of cellphones in a bid to better understand the movements of Americans during the coronavirus pandemic and how they may be affecting the spread of the disease…

The data comes from the mobile advertising industry rather than cellphone carriers. The aim is to create a portal for federal, state and local officials that contains geolocation data in what could be as many as 500 cities across the U.S., one of the people said, to help plan the epidemic response… It shows which retail establishments, parks and other public spaces are still drawing crowds that could risk accelerating the transmission of the virus, according to people familiar with the matter… The data can also reveal general levels of compliance with stay-at-home or shelter-in-place orders, according to experts inside and outside government, and help measure the pandemic’s economic impact by revealing the drop-off in retail customers at stores, decreases in automobile miles driven and other economic metrics.

The CDC has started to get analyses based on location data through an ad hoc coalition of tech companies and data providers — all working in conjunction with the White House and others in government, people said.

The CDC and the White House didn’t respond to requests for comment.
It’s the cellphone carriers turning over pandemic-fighting data in Germany, Austria, Spain, Belgium and the U.K., according to the article, while Israel mapped infections using its intelligence agencies’ antiterrorism phone-tracking. But so far in the U.S., “the data being used has largely been drawn from the advertising industry.”

“The mobile marketing industry has billions of geographic data points on hundreds of millions of U.S. cell mobile devices…”

Source: US Officials Use Mobile Ad Location Data to Study How COVID-19 Spreads – Slashdot

I am unsure whether this says more about the legality of the move or about the decentralisation of cellphone tower data, which makes it technically difficult to track the whole population.

Israel uses anti-terrorist tech to monitor phones of virus patients

Israel has long been known for its use of technology to track the movements of Palestinian militants. Now, Prime Minister Benjamin Netanyahu wants to use similar technology to stop the movement of the coronavirus.

Netanyahu’s Cabinet on Sunday authorized the Shin Bet security agency to use its phone-snooping tactics on coronavirus patients, an official confirmed, despite concerns from civil-liberties advocates that the practice would raise serious privacy issues. The official spoke on condition of anonymity pending an official announcement.

Netanyahu announced his plan in a televised address late Saturday, telling the nation that the drastic steps would protect the public’s health, though it would also “entail a certain degree of violation of privacy.”

Israel has identified more than 200 cases of the coronavirus. Based on interviews with these patients about their movements, health officials have put out public advisories ordering tens of thousands of people who may have come into contact with them into protective home quarantine.

The new plan would use mobile-phone tracking technology to give a far more precise history of an infected person’s movements before they were diagnosed and identify people who might have been exposed.

In his address, Netanyahu acknowledged the technology had never been used on civilians. But he said the unprecedented health threat posed by the virus justified its use. For most people, the coronavirus causes only mild or moderate symptoms. But for some, especially older adults and people with existing health problems, it can cause more severe illness.

“They are not minor measures. They entail a certain degree of violation of the privacy of those same people, who we will check to see whom they came into contact with while sick and what preceded that. This is an effective tool for locating the virus,” Netanyahu said.

The proposal sparked a heated debate over the use of sensitive security technology, who would have access to the information and what exactly would be done with it.

Nitzan Horowitz, leader of the liberal opposition party Meretz, said that tracking citizens “using databases and sophisticated technological means are liable to result in a severe violation of privacy and basic civil liberties.” He said any use of the technology must be supervised, with “clear rules” for the use of the information.

Netanyahu led a series of discussions Sunday with security and health officials to discuss the matter. Responding to privacy concerns, he said late Sunday he had ordered a number of changes in the plan, including reducing the scope of data that would be gathered and limiting the number of people who could see the information, to protect against misuse.

Source: Israel takes step toward monitoring phones of virus patients – ABC News

What I’m missing is a maximum duration for these powers to be used.

Zoom Removes Code That Sends Data to Facebook – but there is still plenty of nasty stuff in there

On Friday video-conferencing software Zoom issued an update to its iOS app which stops it sending certain pieces of data to Facebook. The move comes after a Motherboard analysis of the app found it sent information such as when a user opened the app, their timezone, city, and device details to the social network giant.

When Motherboard analyzed the app, Zoom’s privacy policy did not make the data transfer to Facebook clear.

“Zoom takes its users’ privacy extremely seriously. We originally implemented the ‘Login with Facebook’ feature using the Facebook SDK in order to provide our users with another convenient way to access our platform. However, we were recently made aware that the Facebook SDK was collecting unnecessary device data,” Zoom told Motherboard in a statement on Friday.

Source: Zoom Removes Code That Sends Data to Facebook – VICE

But there is still plenty of data being hoovered up by Zoom:
Yeah, that Zoom app you’re trusting with work chatter? It lives with ‘vampires feeding on the blood of human data’

As the global coronavirus pandemic pushes the popularity of videoconferencing app Zoom to new heights, one web veteran has sounded the alarm over its “creepily chummy” relationship with tracking-based advertisers.

Doc Searls, co-author of the influential internet marketing book The Cluetrain Manifesto last century, today warned [cached] Zoom not only has the right to extract data from its users and their meetings, it can work with Google and other ad networks to turn this personal information into targeted ads that follow them across the web.

This personal info includes, and is not limited to, names, addresses and any other identifying data, job titles and employers, Facebook profiles, and device specifications. Crucially, it also includes “the content contained in cloud recordings, and instant messages, files, whiteboards … shared while using the service.”

Searls said reports outlining how Zoom was collecting and sharing user data with advertisers, marketers, and other companies, prompted him to pore over the software maker’s privacy policy to see how it processes calls, messages, and transcripts.

And he concluded: “Zoom is in the advertising business, and in the worst end of it: the one that lives off harvested personal data.

“What makes this extra creepy is that Zoom is in a position to gather plenty of personal data, some of it very intimate (for example with a shrink talking to a patient) without anyone in the conversation knowing about it. (Unless, of course, they see an ad somewhere that looks like it was informed by a private conversation on Zoom.)”

The privacy policy, as of March 18, lumps together a lot of different types of personal information, from contact details to meeting contents, and says this info may be used, one way or another, to personalize web ads to suit your interests.

“Zoom does use certain standard advertising tools which require personal data,” the fine-print states. “We use these tools to help us improve your advertising experience (such as serving advertisements on our behalf across the internet, serving personalized ads on our website, and providing analytics services) … For example, Google may use this data to improve its advertising services for all companies who use their services.”

Searls, a former Harvard Berkman Fellow, said netizens are likely unaware their information could be harvested from their Zoom accounts and video conferences for advertising and tracking across the internet: “A person whose personal data is being shed on Zoom doesn’t know that’s happening because Zoom doesn’t tell them. There’s no red light, like the one you see when a session is being recorded.

“Nobody goes to Zoom for an ‘advertising experience,’ personalized or not. And nobody wants ads aimed at their eyeballs elsewhere on the ‘net by third parties using personal information leaked out through Zoom.”

Speaking of Zoom…

Zoom’s iOS app sent analytics data to Facebook even if you didn’t use Facebook, due to the application’s use of the social network’s Graph API, Vice discovered. The privacy policy stated the software collects profile information when a Facebook account is used to sign into Zoom, though it didn’t say anything about what happens if you don’t use Facebook. Zoom has since corrected its code to not send analytics in these circumstances.

It should go without saying but don’t share your Zoom meeting ID and password in public, such as on social media, as miscreants will spot it, hijack it, and bomb it with garbage. And don’t forget to set a strong password, too. Zoom had to beef up its meeting security after Check Point found a bunch of weaknesses, such as the fact it was easy to guess or brute-force meeting IDs.

Source: Yeah, that Zoom app you’re trusting with work chatter? It lives with ‘vampires feeding on the blood of human data’ • The Register

Android Apps Are Transmitting What Other Apps You Have Ever Installed to Marketing People

At this point we’re all familiar with apps of all sorts tracking our every move and sharing that info with pretty much every third party imaginable. But it actually may not be as simple as tracking where you go and what you do in an app: It turns out that these apps might be dropping details about the other programs you’ve installed on your phone, too.

This news comes courtesy of a new paper out from a team of European researchers who found that some of the most popular apps in the Google Play store were bundled with certain bits of software that pull details of any apps that were ever downloaded onto a person’s phone.

Before you immediately chuck your Android device out the window in some combination of fear and disgust, we need to clarify a few things. First, these bits of software—called IAMs, or “installed application methods”—have some decent uses. A photography app might need to check the surrounding environment to make sure you have a camera installed somewhere on your phone. If another app immediately glitches out in the presence of an on-phone camera, knowing the environment—and the reason for that glitch—can help a developer know which part of his app to tinker with to keep that from happening in the future.

Because these IAM-specific calls are technically for debugging purposes, they generally don’t need to secure permissions the same way an app usually would when, say, asking for your location. Android devices have actually gotten better about clamping down on that form of invasive tracking after struggling with it for years, with Google recently announcing that Android 11 will formally require devs to apply for location permissions access before Google grants it.

But at the same time, surveying the apps on a given phone can go the invasive route very easily: The apps we download can tip developers off about our incomes, our sexualities, and some of our deepest fears.

The research team found that, of the roughly 4,200 commercial apps it surveyed making these IAM calls, almost half were strictly grabbing details on the surrounding apps. For context, most other calls—which were for monitoring details about the app like available updates, or the current app version—together made up less than one percent of all calls they observed.

There are a few reasons for the prevalence of this errant app-sniffing behavior, but for the most part it boils down to one thing: money. A lot of these IAMs come from apps that are on-boarding software from adtech companies offering developers an easy way to make quick cash off their free product. That’s probably why the lion’s share—more than 83%—of these calls were being made on behalf of third-party code that the dev onboarded for their commercially available app, rather than code that was baked into that app by design.

And for the most part, these third parties are—as you might have suspected—companies that specialize in targeted advertising. Looking over the top 20 libraries that pull some kind of data via IAMs, some of the top contenders, like ironSource or AppNext, are in the business of getting the right ads in front of the right player at the right time, offering the developer the right price for their effort.

And because app developers—like most people in the publishing space—are often hard-up for cash, they’ll onboard these money-making tools without asking how they make that money in the first place. This kind of daisy-chaining is the same reason we see trackers of every shape and size running across every site in the modern ecosystem, at times without the people actually behind the site having any idea.

Source: Android Apps May Be Snooping on You More Than You Realize

Ring, maker of corporate surveillance doorbells, Continues To Insist Its Cameras Reduce Crime, But Crime Data Doesn’t Back Those Claims Up

Despite evidence to the contrary, Amazon’s Ring is still insisting it’s the best thing people can put on their front doors — an IoT camera with PD hookups that will magically reduce crime in their neighborhoods simply by being a mute witness to criminal acts.

Boasting over 1,000 law enforcement partnerships, Ring talks a good game about crime reduction, but its products haven’t proven to be any better than those offered by competitors — cameras that don’t come with law enforcement strings attached.

Last month, Cyrus Farivar undid a bit of Ring’s PR song-and-dance by using public records requests and conversations with law enforcement agencies to show any claim Ring makes about crime reduction probably (and in some cases definitely) can’t be linked to the presence of Ring’s doorbell cameras.

CNET has done the same thing and come to the same conclusion: the deployment of Ring cameras rarely results in any notable change in property crime rates. That runs contrary to the talking points deployed by Dave Limp — Amazon’s hardware chief — who “believes” adding Rings to neighborhoods makes neighborhoods safer. Limp needs to keep hedging.

CNET obtained property-crime statistics from three of Ring’s earliest police partners, examining the monthly theft rates from the 12 months before those partners signed up to work with the company, and the 12 months after the relationships began, and found minimal impact from the technology.

The data shows that crime continued to fluctuate, and analysts said that while many factors affect crime rates, such as demographics, median income and weather, Ring’s technology likely wasn’t one of them.

Worse for Ring — which has used its partnerships with law enforcement agencies to corner the market for doorbell cameras — law enforcement agencies are saying the same thing: Ring isn’t having any measurable impact on crime.

“In 2019, we saw a 6% decrease in property crime,” said Kevin Warych, police patrol commander in Green Bay, Wisconsin, but he noted, “there’s no causation with the Ring partnership.”

[…]

“I can’t put numbers on it specifically, if it works or if it doesn’t reduce crime,” [Aurora PD public information officer Paris] Lewbel said.

But maybe it doesn’t really matter to Ring if law enforcement agencies believe the crime reduction sales pitch. What ultimately matters is that end users might. After all, these cameras are installed on homes, not police departments. As long as potential customers believe crime in their area (or at least their front doorstep) will be reduced by the presence of camera, Ring can continue to increase market share.

But the spin is, at best, inaccurate. Crime rates in cities where Ring has partnered with law enforcement agencies continue to fluctuate. Meanwhile, Ring has fortuitously begun its mass deployment during a time of historically-low crime rates which have dropped steadily for more than 20 years. Hitting the market when things are good and keep getting better makes for pretty good PR, especially when company reps are willing to convert correlation to causation to sell devices.

Source: Ring Continues To Insist Its Cameras Reduce Crime, But Crime Data Doesn’t Back Those Claims Up | Techdirt

HP printers try to send loads of data back to HP about your devices and what you print

NB: you can disable outgoing communication on the public network using Windows Defender by following the instructions here (HP).

They come down to opening Windows Defender Firewall, choosing “Allow an app or feature through Windows Defender Firewall”, searching for HP, and then deselecting the public zone.
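
A rough command-line alternative to the GUI steps, for anyone who prefers scripting it: add an explicit outbound block rule for the HP telemetry executable with netsh. The path below is hypothetical; point it at whatever HP software your printer actually installed, and run it from an elevated prompt.

```python
import subprocess

# Hypothetical path: substitute the actual HP executable(s) on your system.
hp_exe = r"C:\Program Files\HP\HP Smart\HPTelemetry.exe"

# Explicit outbound block on the public profile, roughly matching the
# "deselect the public zone" step above. Requires an elevated prompt.
subprocess.run([
    "netsh", "advfirewall", "firewall", "add", "rule",
    "name=Block HP telemetry (public)",
    "dir=out", "action=block", "profile=public",
    f"program={hp_exe}",
])
```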

At first the setup process was so simple that even a computer programmer could do it. But then, after I had finished removing pieces of cardboard and blue tape from the various drawers of the machine, I noticed that the final step required the downloading of an app of some sort onto a phone or computer. This set off my crapware detector.

It’s possible that I was being too cynical. I suppose that it was theoretically possible that the app could have been a thoughtfully-constructed wizard, which did nothing more than gently guide non-technical users through the sometimes-harrowing process of installing and testing printer drivers. It was at least conceivable that it could then quietly uninstall itself, satisfied with a simple job well done.

Of course, in reality it was a way to try and get people to sign up for expensive ink subscriptions and/or hand over their email addresses, plus something even more nefarious that we’ll talk about shortly (there were also some instructions for how to download a printer driver tacked onto the end). This was a shame, but not unexpected. I’m sure that the HP ink department is saddled with aggressive sales quotas, and no doubt the only way to hit them is to ruthlessly exploit people who don’t know that third-party cartridges are just as good as HP’s and are much cheaper. Fortunately, the careful user can still emerge unscathed from this phase of the setup process by gingerly navigating the UI patterns that presumably do fool some people who aren’t paying attention.

But it is only then, once the user has found the combination of “Next” and “Cancel” buttons that lead out of the swamp of hard sells and bad deals, that they are confronted with their biggest test: the “Data Collection Notice & Settings”.

In summary, HP wants its printer to collect all kinds of data that a reasonable person would never expect it to. This includes metadata about your devices, as well as information about all the documents that you print, including timestamps, number of pages, and the application doing the printing (HP state that they do stop short of looking at the contents of your documents). From the HP privacy policy, linked to from the setup program:

Product Usage Data – We collect product usage data such as pages printed, print mode, media used, ink or toner brand, file type printed (.pdf, .jpg, etc.), application used for printing (Word, Excel, Adobe Photoshop, etc.), file size, time stamp, and usage and status of other printer supplies. We do not scan or collect the content of any file or information that might be displayed by an application.

Device Data – We collect information about your computer, printer and/or device such as operating system, firmware, amount of memory, region, language, time zone, model number, first start date, age of device, device manufacture date, browser version, device manufacturer, connection port, warranty status, unique device identifiers, advertising identifiers and additional technical information that varies by product.

HP wants to use the data they collect for a wide range of purposes, the most eyebrow-raising of which is for serving advertising. Note the last column in this “Privacy Matrix”, which states that “Product Usage Data” and “Device Data” (amongst many other types of data) are collected and shared with “service providers” for purposes of advertising.

HP delicately balances short-term profits with reasonable-man-ethics by only half-obscuring the checkboxes and language in this part of the setup.

At this point everything has become clear – the job of this setup app is not only to sell expensive ink subscriptions; it’s also to collect what apparently passes for informed consent in a court of law. I clicked the boxes to indicate “Jesus Christ no, obviously not, why would anyone ever knowingly consent to that”, and then spent 5 minutes Googling how to make sure that this setting was disabled. My research suggests that it’s controlled by an item in the settings menu of the printer itself labelled “Store anonymous usage information”. However, I don’t think any reasonable person would think that the meaning of “Store anonymous usage information” includes “send analytics data back to HP’s servers so that it can be used for targeted advertising”, so either HP is being deliberately coy or there’s another option that disables sending your data that I haven’t found yet.

I bet there’s also a vigorous debate to be had over whether HP’s definition of “anonymous” is the same as mine.


I imagine that a user’s data is exfiltrated back to HP by the printer itself, rather than any client-side software. Once HP has a user’s data then I don’t know what they do with it. Maybe if they can see that you are printing documents from Photoshop then they can send you spam for photo paper? I also don’t know anything about how much a user’s data is worth. My guess is that it’s depressingly little. I’d almost prefer it if HP was snatching highly valuable information that was worth making a high-risk, high-reward play for. But I can’t help but feel like they’re just grabbing whatever data is lying around because they might as well, it might be worth a few cents, and they (correctly) don’t anticipate any real risk to their reputation and bottom line from doing so.

Recommended for who?

Source: HP printers try to send data back to HP about your devices and what you print | Robert Heaton

Private By Design: Free and Private Voice Assistants

Science fiction has whetted our imagination for helpful voice assistants. Whether it’s JARVIS from Iron Man, KITT from Knight Rider, or Computer from Star Trek, many of us harbor a desire for a voice assistant to manage the minutiae of our daily lives. Speech recognition and voice technologies have advanced rapidly in recent years, particularly with the adoption of Siri, Alexa, and Google Home.

However, many in the maker community are concerned — rightly — about the privacy implications of using commercial solutions. Just how much data do you give away every time you speak with a proprietary voice assistant? Just what are they storing in the cloud? What free, private, and open source options are available? Is it possible to have a voice stack that doesn’t share data across the internet?

Yes, it is. In this article, I’ll walk you through the options.

WHAT’S IN A VOICE STACK?

Some voice assistants offer a whole stack of software, but you may prefer to pick and choose which layers to use.

» WAKE WORD SPOTTER — This layer is constantly listening until it hears the wake word or hot word, at which point it will activate the speech-to-text layer. “Alexa,” “Jarvis,” and “OK Google” are wake words you may know.

» SPEECH TO TEXT (STT) — Also called automatic speech recognition (ASR). Once activated by the wake word, the job of the STT layer is just that: to recognize what you’re saying and turn it into written form. Your spoken phrase is called an utterance.

» INTENT PARSER — Also called natural language processing (NLP) or natural language understanding (NLU). The job of this layer is to take the text from STT and determine what action you would like to take. It often does this by recognizing entities — such as a time, date, or object — in the utterance.

» SKILL — Once the intent parser has determined what you’d like to do, an application or handler is triggered. This is usually called a skill or application. The computer may also create a reply in human-readable language, using natural language generation (NLG).

» TEXT TO SPEECH — Once the skill has completed its task, the voice assistant may acknowledge or respond using a synthesized voice.

Some layers work on device, meaning they don’t need an internet connection. These are a good option for those concerned about privacy, because they don’t share your data across the internet. Others do require an internet connection because they offload processing to cloud servers; these can be more of a privacy risk.
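To make the hand-offs between layers concrete, here is a minimal, hypothetical Python sketch of the loop a voice assistant runs. Every function in it is a stand-in invented for illustration (a real stack would swap in something like Porcupine, DeepSpeech, Padatious, and eSpeak for the respective layers), and it works on typed text so you can follow the flow without any audio hardware.

# Hypothetical sketch of a voice-assistant loop; each function is a
# placeholder for a real component (wake word, STT, intent parser,
# skill, TTS) and operates on typed text so the flow is easy to follow.

def listen_for_wake_word() -> None:
    input("Press Enter to simulate hearing the wake word... ")

def record_and_transcribe() -> str:
    # A real stack records audio here and runs speech to text.
    return input("You say: ")

def parse_intent(utterance: str) -> str:
    # A real stack uses an intent parser (neural or slot matching).
    return "greeting" if "hello" in utterance.lower() else "unknown"

def run_skill(intent: str) -> str:
    # The skill handles the intent and produces a human-readable reply.
    return "Hi there!" if intent == "greeting" else "Sorry, I didn't catch that."

def speak(reply: str) -> None:
    # A real stack sends the reply to a text-to-speech engine.
    print("Assistant:", reply)

if __name__ == "__main__":
    listen_for_wake_word()
    speak(run_skill(parse_intent(record_and_transcribe())))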

Before you pick a voice stack for your project you’ll need to ask key questions such as:

• What’s the interface of the software like — how easy is it to install and configure, and what support is available?

• What sort of assurances do you have around the software? How accurate is it? Does it recognize your accent well? Is it well tested? Does it make the right decisions about your intended actions?

• What sort of context, or use case, do you have? Do you want your data going across the internet or being stored on cloud servers? Is your hardware constrained in terms of memory or CPU? Do you need to support languages other than English?

ALL-IN-ONE VOICE SOLUTIONS

If you’re looking for an easy option to start with, you might want to try an all-in-one voice solution. These products often package other software together in a way that’s easy to install. They’ll get your DIY voice project up and running the fastest.

Jasper is designed from the ground up for makers and is intended to run on a Raspberry Pi. It’s a great first step for integrating voice into your projects. With Jasper, you choose which software components you want to use and write your own skills, and it’s possible to configure it so that it doesn’t need an internet connection to function.

Rhasspy also uses a modular framework and can be run without an internet connection. It’s designed to run under Docker and has integrations for NodeRED and for Home Assistant, a popular open source home automation software.

Mycroft is modular too, but by default it requires an internet connection. Skills in Mycroft are easy to develop and are written in Python 3; existing skills include integrations with Home Assistant and Mozilla WebThings. Mycroft also builds open-source hardware voice assistants similar to Amazon Echo and Google Home. And it has a distribution called Picroft specifically for the Raspberry Pi 3B and above.
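To give a sense of how simple Mycroft skills are, here is a minimal sketch of one. The skill name and the intent/dialog file names are made up for illustration, but the MycroftSkill base class, the intent_file_handler decorator, and the create_skill() factory are the standard skill structure.

# Minimal Mycroft skill sketch (the skill folder's __init__.py).
# Assumes an intent file "hello.world.intent" and a dialog file
# "hello.world.dialog" exist in the skill's locale directory.
from mycroft import MycroftSkill, intent_file_handler

class HelloWorldSkill(MycroftSkill):
    @intent_file_handler("hello.world.intent")
    def handle_hello_world(self, message):
        # Speak a random line from hello.world.dialog via TTS.
        self.speak_dialog("hello.world")

def create_skill():
    # Mycroft calls this factory function when loading the skill.
    return HelloWorldSkill()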

Almond is a privacy-preserving voice assistant from Stanford that’s available as a web app, for Android, or for the GNOME Linux desktop. Almond is very new on the scene, but already has an integration with Home Assistant. It also has options that allow it to run on the command line, so it could be installed on a Raspberry Pi (with some effort).

The languages supported by all-in-one voice solutions are dependent on what software options are selected, but by default they use English. Other languages require additional configuration.

WAKE WORD SPOTTERS

PocketSphinx is a great option for wake word spotting. It’s available for Linux, Mac, and Windows, as well as Android and iOS; however, installation can be involved. PocketSphinx works on-device by recognizing phonemes, the smallest units of sound that make up a word.

For example, hello and world each have four phonemes:

hello H EH L OW

world W ER L D
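As an illustration, the pocketsphinx Python package can do on-device keyword spotting in a few lines; the keyphrase and threshold below are just example values you would tune for your own wake word and microphone.

# On-device wake word spotting with PocketSphinx's keyword search mode.
# Requires the pocketsphinx Python package and a working microphone.
from pocketsphinx import LiveSpeech

speech = LiveSpeech(
    lm=False,                  # keyword mode only, no full language model
    keyphrase="hey computer",  # example wake phrase
    kws_threshold=1e-20,       # tune to trade missed detections vs. false alarms
)

for phrase in speech:          # yields a result each time the phrase is heard
    print("Wake word heard:", phrase)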

The downside of PocketSphinx is that its core developers appear to have moved on to a for-profit company, so it’s not clear how long PocketSphinx or its parent CMU Sphinx will be around.

Precise by Mycroft.AI uses a recurrent neural network to learn what is and isn’t a wake word. You can train your own wake words with Precise, but it takes a lot of training data to get accurate results.

Snowboy lets makers train their own wake words for free using Kitt.AI’s (proprietary) training service, and it comes with several pre-trained models and wrappers for several programming languages, including Python and Go. Once you’ve trained your wake word, you no longer need an internet connection. It’s an easier option for beginners than Precise or PocketSphinx, and it has a very small CPU footprint, which makes it ideal for embedded electronics. Kitt.AI was acquired by Chinese giant Baidu in 2017, although to date it appears to remain its own entity.

Porcupine from Picovoice is designed specifically for embedded applications. It comes in two variants: a complete model with higher accuracy, and a compressed model with slightly lower accuracy but a much smaller CPU and memory footprint. It provides examples for integration with several common programming languages. Ada, the voice assistant recently released by Home Assistant, uses Porcupine under the hood.
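A rough sketch of using Porcupine’s Python bindings with one of its built-in keywords follows. The audio capture uses pyaudio, and depending on the pvporcupine version you may also need to pass an AccessKey from the Picovoice console.

# Wake word detection with Picovoice Porcupine and a built-in keyword.
# Requires the pvporcupine and pyaudio packages; newer pvporcupine
# releases also need access_key=... passed to create().
import struct
import pvporcupine
import pyaudio

porcupine = pvporcupine.create(keywords=["porcupine"])

pa = pyaudio.PyAudio()
stream = pa.open(
    rate=porcupine.sample_rate,            # 16 kHz mono audio
    channels=1,
    format=pyaudio.paInt16,
    input=True,
    frames_per_buffer=porcupine.frame_length,
)

try:
    while True:
        frame = stream.read(porcupine.frame_length)
        pcm = struct.unpack_from("h" * porcupine.frame_length, frame)
        if porcupine.process(pcm) >= 0:    # returns keyword index, -1 for no match
            print("Wake word detected")
finally:
    stream.close()
    pa.terminate()
    porcupine.delete()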

SPEECH TO TEXT

Kaldi has for years been the go-to open source speech-to-text engine. Models are available for several languages, including Mandarin. It works on-device but is notoriously difficult to set up and is not recommended for beginners. You can use Kaldi to train your own speech-to-text model if you have spoken phrases and matching recordings, for example in another language. Researchers at the Australian Centre for the Dynamics of Language have recently developed Elpis, a wrapper for Kaldi that makes transcription a lot easier; it’s aimed at linguists who need to transcribe lots of recordings.

CMU Sphinx, like its child PocketSphinx, is based on phoneme recognition, works on-device, and is complex for beginners.

DeepSpeech, part of Mozilla’s Common Voice project, is another major player in the open source space that’s been gaining momentum. DeepSpeech comes with a pre-trained English model but can be trained on other data sets, though this requires a compatible GPU. Trained models can be exported using TensorFlow Lite for inference, and it’s been tested on a RasPi 4, where it comfortably performs real-time transcriptions. Again, it’s complex for beginners.
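Transcribing a short recording with the deepspeech Python package looks roughly like this; the model and audio file names are placeholders, and the WAV file is assumed to already be 16kHz, 16-bit mono, which is what the pre-trained English model expects.

# Offline transcription with a pre-trained DeepSpeech English model.
# Requires the deepspeech and numpy packages plus the released model files.
import wave
import numpy as np
import deepspeech

model = deepspeech.Model("deepspeech-models.pbmm")        # placeholder model path
model.enableExternalScorer("deepspeech-models.scorer")    # optional language model

with wave.open("utterance.wav", "rb") as wav:             # 16 kHz, 16-bit mono assumed
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

print(model.stt(audio))                                    # the transcribed text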

INTENT PARSING AND ENTITY RECOGNITION

There are two general approaches to intent parsing and entity recognition: neural networks and slot matching. The neural network is trained on a set of phrases, and can usually match an utterance that “sounds like” an intent that should trigger an action. In the slot matching approach, your utterance needs to closely match a set of predefined “slots,” such as “play the song [songname] using [streaming service].” If you say “play Blur,” the utterance won’t match the intent.

Padatious is Mycroft’s new intent parser, which uses a neural network. They also developed Adapt, which uses the slot matching approach.
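The neural approach is easy to try with Padatious on its own; this is essentially its documented hello-world, with the cache directory and sample phrases chosen for illustration.

# Train a tiny Padatious intent parser and match an utterance against it.
# Requires the padatious package (which pulls in fann2 for the neural net).
from padatious import IntentContainer

container = IntentContainer("intent_cache")   # directory for cached training data
container.add_intent("hello", ["Hi there!", "Hello.", "Hey, how are you?"])
container.add_intent("play.song", ["Play {song} by {artist}", "Put on {song}"])
container.train()

intent = container.calc_intent("hello there, how are you doing?")
print(intent.name, intent.conf)               # best-matching intent and its confidence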

For those who use Python and want to dig a little deeper into the structure of language, the Natural Language Toolkit is a powerful tool that can do part-of-speech tagging, for example recognizing the names of places.
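For instance, NLTK can tokenize an utterance, tag each word with its part of speech, and chunk named entities such as place names. The one-off download() calls fetch the models it needs, and the sample sentence is just an illustration.

# Part-of-speech tagging and named-entity chunking with NLTK.
import nltk

for pkg in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
    nltk.download(pkg, quiet=True)            # fetch the required models once

utterance = "What's the weather like in Melbourne tomorrow?"
tagged = nltk.pos_tag(nltk.word_tokenize(utterance))
print(tagged)                                 # e.g. ('Melbourne', 'NNP')
print(nltk.ne_chunk(tagged))                  # groups 'Melbourne' as a place entity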

Rasa is a set of tools for conversational applications, such as chatbots, and includes a robust intent parser. Rasa makes predictions about intent based on the entire context of a conversation. Rasa also has a training tool called Rasa X, which helps you train the conversational agent to your particular context. Rasa X comes in both an open source community edition and a licensed enterprise edition.

Picovoice also has Rhino, which comes with pre-trained intent parsing models for free. However, customization of models — for specific contexts like medical or industrial applications — requires a commercial license.

TEXT TO SPEECH

Just like speech-to-text models need to be “trained” for a particular language or dialect, so too do text-to-speech models. However, text to speech is usually trained on a single voice, such as “British Male” or “American Female.”

eSpeak is perhaps the best-known open source text-to-speech engine. It supports over 100 languages and accents, although the quality of the voice varies between languages. eSpeak supports the Speech Synthesis Markup Language (SSML) format, which can be used to add inflection and emphasis to spoken language. It is available for Linux, Windows, Mac, and Android systems, and it works on-device, so it can be used without an internet connection, making it ideal for maker projects.
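Because eSpeak is a command-line tool, it’s trivial to drive from a maker project; here is a small sketch that shells out to the espeak binary (assumed to be installed and on the PATH) with an example voice and speaking rate.

# Speak a sentence with eSpeak by shelling out to the espeak binary.
import subprocess

def say(text, voice="en-us", words_per_minute=150):
    subprocess.run(
        ["espeak", "-v", voice, "-s", str(words_per_minute), text],
        check=True,
    )

say("Hello from an on-device, open source voice assistant.")
# Pass the -m flag as well if you want eSpeak to interpret SSML markup.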

Festival is now quite dated, and needs to be compiled from source for Linux, but does have around 15 American English voices available. It works on-device. It’s mentioned here out of respect; for over a decade it was considered the premier open source text-to-speech engine.

Mimic2 is a Tacotron fork from Mycroft AI, who have also released tooling to allow you to build your own text-to-speech voices. Getting a high-quality voice requires up to 100 hours of “clean” speech, and Mimic2 is too large to work on-device, so you need to host it on your own server or connect your device to the Mycroft Mimic2 server. Currently it only has a pre-trained voice for American English.

Mycroft’s earlier Mimic TTS can work on-device, even on a Raspberry Pi, and is another good candidate for maker projects. It’s a fork of CMU Flite.

Mary Text to Speech supports several, mainly European languages, and has tools for synthesizing new voices. It runs on Java, so can be complex to install.

So, that’s a map of the current landscape in open source voice assistants and software layers. You can compare all these layers in the chart at the end of this article. Whatever your voice project, you’re likely to find something here that will do the job well — and will keep your voice and your data private from Big Tech.

WHAT’S NEXT FOR OPEN SOURCE VOICE?

As machine learning and natural language processing continue to advance rapidly, we’ve seen the decline of the major open source voice tools. CMU Sphinx, Festival, and eSpeak have become outdated as their supporters have adopted other tools, or maintainers have gone into private industry and startups.

We’re going to see more software that’s free for personal use but requires a commercial license for enterprise, as Rasa and Picovoice do today. And it’s understandable; dealing with voice in an era of machine learning is data intensive, a poor fit for the open source model of volunteer development. Instead, companies are driven to commercialize by monetizing a centralized “platform as a service.”

Another trajectory this might take is some form of value exchange. Training all those neural networks and machine learning models — for STT, intent parsing, and TTS — takes vast volumes of data. More companies may provide software on an open source basis and in return ask users to donate voice samples to improve the data sets. Mozilla’s Common Voice follows this model.

Another trend is voice moving on-device. The newer, machine-learning-driven speech tools originally were too computationally intensive to run on low-end hardware like the Raspberry Pi. But with DeepSpeech now running on a RasPi 4, it’s only a matter of time before the newer TTS tools can too.

We’re also seeing a stronger focus on personalization, with the ability to customize both speech-to-text and text-to-speech software.

WHAT WE STILL NEED

What’s lacking across all these open source tools are user-friendly interfaces to capture recordings and train models. Open source products must continue to improve their UIs to attract both developer and user communities; failure to do so will see more widespread adoption of proprietary and “freemium” tools.

As always in emerging technologies, standards remain elusive. For example, skills have to be rewritten for different voice assistants. Device manufacturers, particularly for smart home appliances, won’t want to develop and maintain integrations for multiple assistants; much of this will fall to an already-stretched open source community until mechanisms for interoperability are found. Mozilla’s WebThings ecosystem may plug the interoperability gap if it can garner enough developer support.

Regardless, the burden rests with the open source community to find ways to connect to proprietary systems, because there’s no incentive for manufacturers to do the converse.

The future of open source rests in your hands! Experiment and provide feedback, issues, pull requests, data, ideas, and bugs. With your help, open source can continue to have a strong voice.


Source: Private By Design: Free and Private Voice Assistants

Pervasive digital locational surveillance of citizens deployed in COVID-19 fight

Pervasive surveillance through digital technologies is the business model of Facebook and Google. And now governments are considering the web giants’ tools to track COVID-19 carriers for the public good.

Among democracies, Israel appears to have gone first: prime minister Benjamin Netanyahu has announced “emergency regulations that will enable the use of digital means in the war on Corona. These means will greatly assist us in locating patients and thereby stop the spread of the virus.”

Speaking elsewhere, Netanyahu said the digital tools are those used by Israeli security agency Shin Bet to observe terrorists. Netanyahu said the tools mean the government “will be able to see who they [people infected with the virus] were with, what happened before and after [they became infected].”

Strict oversight and a thirty-day limit on the use of the tools are promised. But the tools’ use was announced as a fait accompli, before Israel’s Parliament or the relevant committee could properly authorise it. And that during a time of caretaker government!

The idea of using tech to spy on COVID-carriers may now be catching.

The Washington Post has reported that the White House has held talks with Google and Facebook about how the data they hold could contribute to analysis of the virus’ spread. Both companies already share some anonymised location data with researchers. The Post suggested anonymised location data could be used by government agencies to understand how people are behaving.

Thailand recently added a COVID-19-screening form to the Airports of Thailand app. While the feature is a digital replica of a paper registration form offered to incoming travellers, the app asks for location permission and tries to turn on Bluetooth every time it is activated. The Register has asked the app’s developers to explain the permissions it seeks, but has not received a reply in 48 hours.

Nariman Gharib, chief incident response officer of the Computer Emergency Response Team in Farsi, has claimed that the Iranian government’s COVID-diagnosis app tracks its users.

China has admitted it’s using whatever it wants to track its people – the genie has been out of the bottle there for years.

If other nations follow suit, will it be possible to put the genie back in?

Probably not: plenty of us give away our location data to exercise-tracking apps for the sheer fun of it, and government agencies gleefully hoover up what they call “open source intelligence”.

Source: Pervasive digital surveillance of citizens deployed in COVID-19 fight, with rules that send genie back to bottle • The Register

Brave Browser Delivers on Promise, Files GDPR Complaint Against Google

Earlier today, March 16, Brave filed a formal complaint against Google with the lead General Data Protection Regulation (GDPR) enforcer in Europe.

In a February Cointelegraph interview, Dr. Johnny Ryan, Brave’s chief policy and industry relations officer, explained that Google is abusing its power by sharing user data collected by dozens of its distinct services, creating a “free for all” data warehouse. According to Ryan, this was a clear violation of the GDPR.

Aggravated by the situation and the lack of enforcement against the giant, Ryan promised to take Google to court if things didn’t change for the better.

Complaint against Google

Now, the complaint is with the Irish Data Protection Commission. It accuses Google of violating Article 5(1)b of the GDPR. Dublin is Google’s European headquarters and, as Dr. Ryan explained to Cointelegraph, the Commission “is responsible for regulating Google’s data protection across the European Economic Area”.

Article 5(1)b of the GDPR requires that data be “collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes”. According to Dr. Ryan:

“Enforcement of Brave’s GDPR ‘purpose limitation’ complaint against Google would be tantamount to a functional separation, giving everyone the power to decide what parts of Google they chose to reward with their data.”

Google is a “black box”

Dr. Ryan has spent six months trying, to no avail, to elicit a response from Google to a basic question: “What do you do with my data?”

Alongside the complaint, Brave released a study called “Inside the Black Box”, which:

“Examines a diverse set of documents written for Google’s business clients, technology partners, developers, lawmakers, and users. It reveals that Google collects personal data from integrations with websites, apps, and operating systems, for hundreds of ill-defined processing purposes.”

Brave does not need regulators to compete with Google

Cointelegraph asked Dr. Ryan how Google’s treatment of user data frustrates Brave as a competitor, to which Dr. Ryan replied:

“The question is not relevant. Brave does not —  as far as I am aware — have direct frustrations with Google. Brave is growing nicely by being a particularly fast, excellent, and private browser. (It doesn’t need regulators to help it grow.)”

A recent privacy study indicated that Brave protects user privacy much better than Google Chrome or any other major browser.

In addition to filing a formal complaint with the Irish Data Protection Commission, Brave has reportedly written to the European Commission, German Bundeskartellamt, UK Competition & Markets Authority, and French Autorité de la concurrence.

If none of these regulatory bodies take action against Google, Brave has suggested that it may take the tech giant to court itself.

Source: Brave Browser Delivers on Promise, Files GDPR Complaint Against Google

Data of millions of eBay and Amazon shoppers exposed by VAT analysing 3rd party

Researchers have discovered another big database containing millions of European customer records left unsecured on Amazon Web Services (AWS) for anyone to find using a search engine.

A total of eight million records were involved, collected via marketplace and payment system APIs belonging to companies including Amazon, eBay, Shopify, PayPal, and Stripe.

Discovered by Comparitech’s noted breach hunter Bob Diachenko, the AWS instance containing the MongoDB database became visible on 3 February and remained indexable by search engines for five days.

Data in the records included names, shipping addresses, email addresses, phone numbers, items purchased, payments, order IDs, links to Stripe and Shopify invoices, and partially redacted credit cards.

Also included were thousands of Amazon Marketplace Web Services (MWS) queries, an MWS authentication token, and an AWS access key ID.

Because a single customer might generate multiple records, Comparitech wasn’t able to estimate how many customers might be affected.

About half of the customers whose records were leaked are from the UK; as far as we can tell, most if not all of the rest are from elsewhere in Europe.

How did this happen?

According to Comparitech, the unnamed company involved was a third party conducting cross-border value-added tax (VAT) analysis.

That is, a company none of the affected customers would have heard of or have any relationship with:

This exposure exemplifies how, when handing over personal and payment details to a company online, that info often passes through the hands of various third parties contracted to process, organize, and analyze it. Rarely are such tasks handled solely in house.

The exposed queries and credentials could be used against the MWS API, Comparitech said, potentially allowing an attacker to request records from sales databases. For that reason, it recommended that the companies involved immediately change their passwords and keys.

Banjo, the company that will use an AI to spy on all of Utah through all its cams, Used a Secret Company and Fake Apps to Scrape Social Media

Banjo, an artificial intelligence firm that works with police, used a shadow company to create an array of Android and iOS apps that looked innocuous but were specifically designed to secretly scrape social media, Motherboard has learned.

The news signifies an abuse of data by a government contractor, with Banjo going far beyond what companies which scrape social networks usually do. Banjo created a secret company named Pink Unicorn Labs, according to three former Banjo employees, with two of them adding that the company developed the apps. This was done to avoid detection by social networks, two of the former employees said.

Three of the apps created by Pink Unicorn Labs were called “One Direction Fan App,” “EDM Fan App,” and “Formula Racing App.” Motherboard found these three apps on archive sites and downloaded and analyzed them, as did an independent expert. The apps—which appear to have been originally compiled in 2015 and were on the Play Store until 2016 according to Google—outwardly had no connection to Banjo, but an analysis of its code indicates connections to the company. This aspect of Banjo’s operation has some similarities with the Cambridge Analytica scandal, with multiple sources comparing the two incidents.

“Banjo was doing exactly the same thing but more nefariously, arguably,” a former Banjo employee said, referring to how seemingly unrelated apps were helping to feed the activities of the company’s main business.

[…]

Last year Banjo signed a $20.7 million contract with Utah that granted the company access to the state’s traffic, CCTV, and public safety cameras. Banjo promises to combine that input with a range of other data such as satellites and social media posts to create a system that it claims alerts law enforcement of crimes or events in real-time.

“We essentially do most of what Palantir does, we just do it live,” Banjo’s top lobbyist Bryan Smith previously told police chiefs and 911 dispatch officials when pitching the company’s services.

[…]

Motherboard found the apps developed by Pink Unicorn Labs included code mentioning signing into Facebook, Twitter, Instagram, Russian social media app VK, FourSquare, Google Plus, and Chinese social network Sina Weibo.

[…]

One of the former employees said they saw one of the apps when it was still working and it had a high number of logins.

“It was all major social media platforms,” they added. The particular versions of the apps Motherboard obtained, when opened, asked a user to sign in with Instagram.

Business records for Pink Unicorn Labs show the company was originally incorporated by Banjo CEO Damien Patton. Banjo employees worked directly on Pink Unicorn Labs projects from Banjo’s offices, several of the former employees said, though they added that Patton made it clear in recent years that Banjo needed to wind down Pink Unicorn Labs’ work and not be linked to the firm.

“There was something about Pink Unicorn that was important for Damien to distance himself from,” another former employee told Motherboard.

[…]

Some similar companies, like Dataminr, have permission from social media sites to use large amounts of data; Twitter, which owns a stake in Dataminr, gives the firm exclusive access to its so-called “fire hose” of public posts.

Banjo did not have that sort of data access. So it created Pink Unicorn Labs, which one former employee described as a “shadow company,” that developed apps to harvest social media data.

“They were shitty little apps that took advantage of some of the data that we had but the catch was that they had a ton of OAuth providers,” one of the former employees said. OAuth providers are methods for signing into apps or websites via another service, such as Facebook’s “Facebook Connect,” Twitter’s “Sign In With Twitter,” or Google’s “Google Sign-In.” These providers mean a user doesn’t have to create a new account for each site or app they want to use, and can instead log in via their already established social media identity.

But once users logged into the innocent looking apps via a social network OAuth provider, Banjo saved the login credentials, according to two former employees and an expert analysis of the apps performed by Kasra Rahjerdi, who has been an Android developer since the original Android project was launched. Banjo then scraped social media content, those two former employees added. The app also contained nonstandard code written by Pink Unicorn Labs: “The biggest red flag for me is that all the code related to grabbing Facebook friends, photos, location history, etc. is directly from their own codebase,” Rahjerdi said.

[…]

“Banjo was secretly farming peoples’ user tokens via these shadow apps,” one of the former employees said. “That was the entire point and plan,” they added when asked if the apps were specifically designed to steal users’ login tokens.

[…]

The apps request a wide range of permissions, such as access to location data, the ability to create accounts and set passwords, and find accounts on the device.

Multiple sources said Banjo tried to keep Pink Unicorn Labs a secret, but Motherboard found several links between the two. An analysis of the Android apps revealed all three had code that contained web links to Banjo’s website; each app contained a set of identical data that appeared to be pulled from social network sites, including repeatedly the Twitter profile of Jennifer Peck, who works for Banjo and is also married to Banjo’s Patton. In registration records for the two companies, both Banjo and Pink Unicorn Labs shared the same address in Redwood, California; and Patton is listed as the creator of Pink Unicorn Labs in that firm’s own public records.

Source: Surveillance Firm Banjo Used a Secret Company and Fake Apps to Scrape Social Media – VICE