
  原文来自:DataColumn Will data analysis be automated?

  文末附:生僻词汇和词组搭配,以供大家学习交流。

  中文注释:由Eknowns翻译组提供

  原文建议阅读时间: 10分钟

  

  Will data analysis be automated?

  数据分析会变成完全自动化么?

  In 2011, McKinsey & Company released a report on the shortage of analytical talent, marking the beginning of the era of big data. The following year, an article in the Harvard Business Review called data scientist “the sexiest job of the 21st century.” Businesses and organizations, in the private and public sectors alike, continue to express their interest in hiring analytical talent, and some even report difficulty in finding any.

  ▲2011年,麦肯锡发布了一篇关于数据分析人才短缺的报告,标志着大数据时代的开启。第二年,哈佛商业评论的一篇文章把数据科学家称作“21世纪最性感的工作”。各类企业和组织,私营部门和公共部门,都在不断表达他们有兴趣雇佣数据分析人才,其中一些组织甚至表达出找到这类人才的难度。

  While the future seems bright and shining for analytical talent, those who have just set their minds on becoming analysts probably share the same deepest fear that I do: will my job as a data analyst be automated in the future? This is not just a question for data analysts. The world changes faster than we can keep up with or even imagine. Everyone in their early 20s should consider how technology could affect their future jobs before deciding which career to pursue.

  ▲虽然数据分析人才的前途看似一片光明,但是那些已经决定成为分析师的人大概都会有一个共同的顾虑——我的这份数据分析工作将来会被自动化替代么?我也有同样的担忧,而且这不仅仅是数据分析师要面对的问题。世界正在以我们难以跟上甚至超乎想象的速度变化着,每个人在20出头的时候,都应该考虑一下这个事实,然后再去决定从事什么职业,并且判断技术将可能如何影响他们的工作。

  After reflecting on my own experience in the MSA program over the last six months, I would like to share some insights on this question, and hopefully alleviate some of the anxiety.

  ▲回顾我过去六个月在北卡州立大学的分析硕士学习经历,我想分享一些关于上面问题的个人见解,希望可以帮助大家缓解一些焦虑。

  First, let’s consider why people think analytics jobs could potentially be automated by computers. It is no secret that data analysis and modeling rely heavily on computers. Nowadays, computers can not only show the modeling results but also help analysts pick the best model by applying each candidate model to a test set or validation set, or alternatively by using techniques such as cross-validation.

  ▲首先,我们先思考为什么人们觉得数据分析工作可能会被计算机取代。实际上数据分析和建模工作高度依赖计算机已经不是什么秘密了。如今,计算机不仅能给出模型运行结果,还能通过测试、验证、交叉验证等方式帮助分析师选择最佳模型。

  However, to pick the best model, we first need to decide on the criteria for picking it, and this is where human intervention comes in. Oftentimes in analytics, there is no universal answer that fits every situation (this is where the running joke “it depends” at the Institute comes from), and thus the final call still relies on human judgment.

  ▲然而,为了选择最佳模型,我们需要选定选择模型的条件,这部分需要人为干预。数据分析中往往没有适用于所有情况的通用答案(这就是我们内部关于“视情况而定”这个笑话的出处),因此,最终的决定依然需要人为的判断。
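
  The selection loop described above can be sketched in a few lines. This is only a minimal illustration, not the method used at the Institute: the data, the two candidate models, and the helper names (`k_fold_indices`, `cv_score`, and so on) are all made up for the example. The computer runs the cross-validation, but the `loss` argument, which is the criterion, still has to be chosen by a person.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Least-squares straight line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def squared_error(actual, predicted):
    """One possible criterion; a human decides whether it is the right one."""
    return (actual - predicted) ** 2

def cv_score(fit, xs, ys, k, loss):
    """Average held-out loss of the model produced by `fit` over k folds."""
    fold_losses = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        fold_losses.append(mean(loss(ys[i], model(xs[i])) for i in test_idx))
    return mean(fold_losses)

# Made-up, roughly linear data for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

scores = {
    "mean": cv_score(fit_mean, xs, ys, k=4, loss=squared_error),
    "line": cv_score(fit_line, xs, ys, k=4, loss=squared_error),
}
best = min(scores, key=scores.get)  # the candidate with the lowest held-out loss
```

  Swapping `squared_error` for a different loss (absolute error, or a cost that penalizes under-prediction more heavily) can change which candidate wins, which is the “it depends” part that stays with the analyst.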

  In my opinion, though, the biggest obstacle that prevents analytical jobs from being automated lies in the nature of data itself. To be more specific, let’s talk about two of the Four Vs of Big Data: variety and veracity.

  ▲然而我认为,数据分析工作完全自动化的最大障碍在于数据的属性本身。具体来说,我们讨论下大数据四个性质(4V)中的两个:多样性和真实性(另外两个是数量和速度)。

  In most real-world analytics projects, a large amount of effort goes into preparing the data in an analytics-ready format. Depending on the sources of your data, the resources required to get through this preparation stage vary. Let’s assume you would like to predict power usage over the next week for an energy company. The information you need may be saved in a single database (in which case you’re extremely lucky), or it could be spread across several databases in different departments of the organization, each with its own format.

  ▲大多数的实际分析项目中,大量的工作是把数据提前处理成可分析的格式。根据数据来源的不同,完成数据准备阶段所需的资源也不一样。假设你要预测一家能源公司未来一周的用电量,你要用到的信息可能保存在同一个数据库中(遇到这种情况你简直太幸运了),也可能分散在不同部门各自的数据库中,而且格式各不相同。

  Sometimes the information you need doesn’t exist in the company’s database. For example, weather is highly related to power usage, so you would like to include weather data in your analysis, but it is not available in the current dataset. You may need to scrape the website that provides such information, convert it into the same format as your current dataset, and integrate the two.

  ▲有时你需要的信息并不在公司的数据库里。比如天气和用电量高度相关,因此你想在分析中加入天气数据,但是现有的数据集里并没有。你可能需要去抓取提供天气信息的网站,把这些信息转成和现有数据同样的格式,再把它们整合起来。

  All of this is just the tip of the iceberg when it comes to the variety of data. As you can see, pulling all the necessary information together and transforming it into an analytics-ready format requires lots of human intervention, not to mention all the data cleaning work (missing values, etc.) once the data is put together.

  ▲这一切只不过是数据多样性的冰山一角,正如你所见,把所有必需信息汇总到一起并且转成可供分析的格式就已经耗费很多人力了,更别说之后的数据清洗工作了。
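
  To make the integration step concrete, here is a small sketch of the power-usage example. Everything in it is hypothetical: the field names, the date formats, and the numbers are invented, and the weather rows stand in for whatever a scraper would return. It converts two date formats into one, joins the sources, and then shows how few rows are actually analytics-ready.

```python
from datetime import datetime

# Hypothetical internal records: ISO dates, one reading missing.
usage_rows = [
    {"date": "2016-01-04", "kwh": 1200.0},
    {"date": "2016-01-05", "kwh": None},   # missing meter reading
    {"date": "2016-01-06", "kwh": 1350.0},
]

# Hypothetical scraped weather records: US-style dates, one day absent.
weather_rows = [
    {"day": "01/04/2016", "temp_f": 30.0},
    {"day": "01/05/2016", "temp_f": 28.0},
]

def parse_date(text, fmt):
    """Convert a date string in the given format into a date object."""
    return datetime.strptime(text, fmt).date()

def merge_usage_weather(usage, weather):
    """Left-join usage onto weather by calendar date.

    Usage rows without a matching weather row are kept, with temp_f set to None.
    """
    temp_by_date = {parse_date(w["day"], "%m/%d/%Y"): w["temp_f"] for w in weather}
    merged = []
    for u in usage:
        d = parse_date(u["date"], "%Y-%m-%d")
        merged.append({"date": d, "kwh": u["kwh"], "temp_f": temp_by_date.get(d)})
    return merged

merged = merge_usage_weather(usage_rows, weather_rows)

# Rows with no gaps; what to do with the others is a human decision.
complete = [r for r in merged if r["kwh"] is not None and r["temp_f"] is not None]
```

  Only one of the three days survives with both values present. Whether to drop the incomplete days, impute them, or go back to the source is exactly the kind of call the code cannot make on its own.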

  Let’s move on to the veracity of the data, one of data analysts’ biggest nightmares. The quality and accuracy of the data can only be judged by a human. After all, a computer is just a machine; it takes whatever data you feed it and has no ability to question the quality of that data. In many cases, data analysts are not involved in the data collection process, and the data they are given may not be suitable for answering the questions that are posed.

  ▲下面我们来谈谈数据的真实性,这是数据分析师们最大的噩梦之一。数据的质量和准确度只能由人来判断。毕竟电脑只是机器,无论你喂给它什么数据它都照单全收,没有能力去怀疑数据的质量。很多时候,数据分析师并没有参与数据收集过程,他们拿到的数据也许根本不适合回答要解决的问题。

  Sometimes the analyst needs to communicate with those who designed the data collection in order to assess the quality of the data. Another factor that complicates the issue is privacy. Here at the Institute, every student is assigned to a practicum project and given the chance to work on a real-world problem for a sponsor.

  ▲因此,分析师有时必须和设计数据收集过程的人沟通来评估数据的质量。另一个使得问题复杂化的因素是隐私性。在我们学院内部,每个学生都被分配到一个实践项目以便有机会替赞助商解决实际问题。

  Because of privacy concerns, the data handed to the students must not contain personal identifiers, which sometimes poses extra challenges for the analysis. For example, if you can’t tell that two purchases were made by the same person, how could you find purchasing patterns at the individual level? As a result, analyzing data that has been masked to protect personal privacy requires lots of human intervention.

  ▲出于隐私考虑,交给学生的数据不可以包含个人信息标识符,这有时给数据分析工作带来更多的挑战。比如说,如果无法辨别两次购买是否出自同一个人,你怎么才能发现个人层面的购买模式呢?结果就是,分析那些为保护隐私而做了掩蔽处理的数据,需要大量的人工介入。
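
  The purchase example can be illustrated with a short sketch. The article does not say how the sponsors masked their data, so the approach here is an assumption chosen for illustration: identifiers are either dropped entirely, or replaced with a salted hash (`pseudonym` and `SALT` are hypothetical names). The records are invented.

```python
import hashlib
from collections import defaultdict

SALT = b"example-salt"  # hypothetical secret kept by the data owner

def pseudonym(customer_id):
    """Replace an identifier with a consistent, non-readable token."""
    digest = hashlib.sha256(SALT + customer_id.encode("utf-8")).hexdigest()
    return digest[:12]

# Invented purchase records: (customer identifier, item).
purchases = [
    ("alice@example.com", "coffee"),
    ("bob@example.com", "tea"),
    ("alice@example.com", "filter"),
]

# Option A: identifiers dropped. Nothing links the two purchases by the same person.
stripped = [item for _, item in purchases]

# Option B: identifiers replaced by a consistent token. Purchases can still be grouped.
masked = [(pseudonym(cid), item) for cid, item in purchases]
baskets = defaultdict(list)
for token, item in masked:
    baskets[token].append(item)
```

  With option A the analyst is left with three unrelated purchases; with option B the individual-level pattern is recoverable without seeing who the person is. Which of the two a dataset received, and whether the masking is strong enough for the sponsor’s needs, is something the analyst has to find out by asking.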

  So it seems that data analysis is nowhere near being automated, at least not in the next five years, and the demand for analytical talent might be larger than you think. If you think analytics is the right career for you, I would encourage you to pursue this path.

  ▲这样看来,数据分析离完全自动化还远——至少未来五年不会,而且对于数据分析人才的需求可能比你想象得还要多。如果你认为数据分析的工作适合你,我一定鼓励你从事这行。

  Again, one should never underestimate the disruptive power of technology. All of these arguments are based on current technology. If some unexpected technology comes into play, these arguments may no longer be valid.

  ▲再次提醒,切不可低估技术的破坏性力量。以上所有推论都是基于现有的技术。如果有意料之外的技术出现,这些推论可能都将失效。

  生词

  alleviate:减轻,缓和

  alternatively:另外

  validation:验证

  criteria:标准

  intervention:介入;调停;妨碍

  convert:使转变;转换…;转变,变换

  obstacle:障碍

  integrate:使…完整;使…成整体;求积分

  pursue:追求,进行

  disruptive:破坏的;分裂性的;制造混乱的

  特别注意两个常见动词的灵活用法

  pursue:pursue a career in;pursue a positive relationship with;pursue a Master's degree

  mask:注意其抽象的名词和动词用法,under the mask of;mask negative emotions

  词组

  reflect on:仔细考虑,思考;反省;回想,回顾

  go into:进入;加入;变得;探究

  get through:通过;到达;做完。注意go into和get through的抽象用法,汉译英的时候很难想到用这两个词组。

  be involved in:参与;涉及;包含;牵涉进

  be assigned to:被分配给,指派给

  come into play:开始活动;开始起作用;开始有(或发生)影响

  此外还有一些地道生动的表达,可以作为素材积累:

  release a report:发布报告

  express an interest in...:表示对...感兴趣

  set one's mind on...:下定决心做...

  bright/shining future:美好的未来

  share the same value/fear/interest:有着共同的...

  running joke:反复被提起的老笑话

  to be more specific:更具体些

  be highly related to:高度相关

  scrape the website:抓取网站数据(指用程序自动提取网页上的内容,注意它和search“搜索”的区别)

  tip of the iceberg:冰山一角

  complicate the issue:使问题更加复杂
