ACDH Lecture 4.1 – Jennifer Edmond – What can Big Data Research Learn from the Humanities?


Thank you. Thank you to the Center for having me, and thank you to Charlie for that lovely introduction. You may be wondering, given the biography that was just read: I do work a lot in digital research infrastructure, particularly at the European level. But when you work in research infrastructure, and I think to a certain extent these days when you work in the digital humanities, you start to find that you put yourself into sometimes uncomfortable positions. You find that you're creating systems, and very large systems, that may have an impact on how we understand history and how we understand culture. From that slight unease came, for me, an interest in the question of how the arts and humanities help us to understand technology. We talk a lot about how technology helps us understand the humanities, but I was interested in the other side, and that's a bit of what you're going to hear today (although if you want to talk about research infrastructure afterwards, I'm always happy to). Thinking about this question, then: if technology is helping us understand the humanities, what can the humanities help us understand about technology? There are a few friends I've made intellectually along the way. When Alan Liu started to talk about the cultural singularity, and how the lack of a cultural criticism was blocking the digital humanities from becoming a full partner of the humanities, I felt that this was an important moment. I also valued perspectives such as the one Gary Hall puts forward: that it's not just interesting what computer science can offer the humanities, but also what the humanities can offer computer science. I thought there was a really interesting question there. Within my institution I also wear another hat: I'm an investigator in a large national computer science research institute.
And this raises these questions: the conversations you have as someone trained in literature, sitting in a research institute for personalisation and adaptive computing, do lead you to ask these questions, and to come up with some answers. The main project I'm going to be talking about, the real context for what I'm going to be telling you about today, is a project we call KPLEX, or Knowledge Complexity. KPLEX is very interesting. I don't know if any of you have had the joy of applying

for European research funding, but this is what's called a 'sister' project. You probably didn't even know there was such a thing as sister projects. The sister projects are an instrument that was devised so that researchers coming from an arts and humanities background could help to expose bias in computational research, and I thought that's the kind of thing we need to be doing more of. So this is actually a project affiliated with the Big Data PPP, the public-private partnership in big data, which tells you that not only is there a research imperative there but also a corporate imperative that we are resisting, in our sister-project, bias-finding sort of way. What we proposed to do is to look at a number of things within the culture of big data research that we thought might expose some biases and give us some ways to look at possible interventions, from a humanistic point of view, that could be made to improve the research in terms of its social impact and in terms of its technological robustness. Because we do believe that if you improve the research generally, you can improve the technology as well. So the things that we're looking at primarily are, first of all, discourses of data: how we talk about data, because (and I don't need to tell people in this room this) how we talk about things is important to how we understand them. We talk about hidden data: not necessarily things that are hidden in order to keep them hidden, but hidden because of accidents of history, largely. This comes from a perspective of looking at cultural heritage collections in Europe, where you have many that are very, very well exposed (the UK, France, Germany), and then you have others that are essentially invisible from a digital point of view, Eastern Europe for example. We look at what we call the epistemic marking of data; I'll talk later about work on how data is never raw. You
always have someone who created the data, and if they didn't create the data, they created the instrument that generated it, they created the sensor. So

data always comes from somewhere; there's always a human bias in it. And finally, we're looking at complexity and the representations of complexity, and how, in technological systems, these representations of complexity can sometimes be lost, can be smoothed over, to our detriment as users. You can see we have a number of partners; it's not a big project, but it's been a very influential one. And it all started, I think, the day I saw this billboard in the London Underground (it wasn't exactly this billboard, because my picture is not as good as this one). The fact that there was a data analytics company out there who could imply that analyzing big data was the secret to living happily ever after disturbed me greatly, because there is this almost fetishisation of big data. We know that big data can be powerful; we know that it can be deployed, for example, towards public health crises, and to answer certain kinds of questions. But the idea that you could say this generically, that you could take the fairy-tale trope, the literary trope: that's my turf. So I felt like we had to push back against this and find out, well, where is the real intersection? Because we know as well that AI and big data (obviously they're related phenomena, quite different in some ways, but with similar effects sociologically) have baked-in prejudices, baked-in biases. So if we in this project were there to expose biases, then this was certainly the place to start. I'm going to take a few topics out of the universe of our project and expand a bit on them, and the first one I want to talk about is words. Amongst humanists, you can always talk about words. We
started by extracting (and you don't need to read all this; it's more there to show you that this exists) some of the definitions of data that are out there. Now, this is actually from the scientific literature about data, so these are the people who are actually writing so as to define data. And you find that there is such a variety there that it's actually quite difficult to find any sort of coherence. You have data as pre-analytical; it's pre-factual; it can be false, but data that is false is still data; data has no truth; it resists analysis; it's neither truth nor reality, but it may be facts. There's the fiction of data; it's an illusion; it's performative; it is a sort of actor, and it has a very distinct set of properties. For others there is, for example, the difference between data and capta: something that is given and something that is taken. So once we found this kind of diversity, even in the discourse, the scientific discourse, of people who are studying science and looking at data, we knew that we were going to find more when we looked into the practice of this. So the next thing we looked at is the ways in which big data researchers talk about data, and what we kept finding is: big data researchers talk about data all the time.
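Measuring that prevalence is straightforward to reproduce for any corpus. A minimal sketch of the kind of whole-word counting involved; the file names and paper snippets below are invented stand-ins for illustration, not the actual corpus:

```python
from collections import Counter
import re

def term_frequency(text: str, term: str = "data") -> int:
    """Count whole-word, case-insensitive occurrences of `term` in `text`."""
    return len(re.findall(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE))

# Invented stand-ins for a corpus of big data papers (not the real corpus).
papers = {
    "paper_a.txt": "The data pretreatment module pre-processes stream data "
                   "from the original data produced by the previous component.",
    "paper_b.txt": "We calculate the standard deviation for the entire data "
                   "in the stream.",
}

counts = Counter({name: term_frequency(body) for name, body in papers.items()})
total = sum(counts.values())
print(counts.most_common())  # papers ranked by how often they say 'data'
print(f"{total} occurrences across {len(papers)} papers")
```

A frequency count like this only measures prevalence, of course; it cannot tell you whether the word means the same thing twice, which is precisely the problem discussed here.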

You can see 659 occurrences across the 246 papers we have here, and in fact the worst offender we found, using the word 'data' so much that it almost becomes empty, was one paper, 21 pages long, in which the word 'data' was used five hundred times. When you look at that, you realize that it can't be meaning the same thing every time. And what we did find, again digging through some of these papers, is that data can mean comparatively simple strings, or it can be complex, human-created records: the same word for two very different phenomena. It can be simple records or complex hybrid objects; it can be individual records or collations of records; it can be something newly drawn out of the environment or something previously available for access, analysis and navigation; it can be unprocessed or it can be pre-processed; it can be of direct use to humans or purely machine-readable. And of course it can inhabit all sorts of different qualities: it can be relevant, contextual, various, external, complex, rich. Note that none of these actually tell us what the data is; they just tell us, more or less, how the researcher feels about it. My researcher pulled out a couple of quotes and was stomping around the office one day with these, simply because she felt that they were indicative of the way in which this is not only your normal sort of jargon: the word 'data' is so prevalent in these statements that it makes them almost obscure to understanding. So: 'data pretreatment module is outside from online component and it's done to pre-process stream data from the original data which is produced by the previous component in the form of data stream'; or 'we calculate the standard deviation for the entire data in the stream to check whether all of the data are of the same value or not'; and,
'due to visiting data once during the processing data, the performance of processing data is crucial'. Data, data, data. And we thought, OK, is this just the fact that we're looking at big data research, or is there something here that is unique? What's interesting is that you do have scenarios and schemas and standards for how to talk about different levels of data, for example the NASA data levels. What we found interesting about this is the transformations that occur as you work through data: the cleaning, the scrubbing (the cleaning and scrubbing as opposed to the 'dirty' data, by the way; note the words there, and how some are positively valenced and some are negatively valenced). What's interesting about NASA is that you can process data up to a level of maybe four, up to a level of five, and then you have much more refined data. If another

researcher takes that data to use in a different context, it reverts back to level zero. So even the more well-defined and well-developed schemas for working with data have a very different sort of way of viewing provenance, and the impact that individual researchers may have on what they're doing to the data. Now, I began to wonder if this was just an epistemic thing. This is based on some work I did a few years ago: does it have to do with the way humanists (in this case it was particularly work done with historians) create data, and with how they view data? Because if you read work on epistemic cultures, like Karin Knorr Cetina's Epistemic Cultures, you see that there's a real difference. There's a tendency to say that humanists don't collaborate, or that for humanists the epistemic process is entirely encapsulated in the writing, or you have others who say that humanists don't create knowledge at all, that they make it all up. You've heard all of these better and worse conceptions of what makes up the two-cultures debate. For me, it comes down to the instrumentation. For a physicist working at CERN or at the European Spallation Source, the question of instrumentation will have to do with physical instruments; in a microbiology lab it has to do with repeatable processes; for the humanist, it's about layering different kinds of source materials, your primary sources and your secondary sources. It's more like building a dry-stone wall, in which you'll see gaps. But one of the things you note, if you look at the kinds of things that go into that humanistic instrument, is that you're not really going to find anything that you can even pretend to call raw data. The fingerprints of the human beings who have come before are always front and center within that kind of source material. Which brought
us to thinking about the differences between this one epistemic culture, where the word 'data' was so prevalent, and our own culture. Which leads me to 'you say tomato, I say data': all of these words that we were finding, which are so diverse in humanities research, are actually treated as quite the same in computer science research. Every one of these can, in some way, be mapped to the word 'data'. So if there is one sort of leitmotif, or roter Faden, for the work I'm presenting to you today, it's that there's a lot more confidence that we as humanities researchers, or those from a humanities background, can take when looking at technology, because there's a lot that we can see, a lot that we can sense, and a lot that we do differently in very positive ways. So what does all this mean when we come to big data? Well, big data essentially magnifies these issues, bigly (any word that has anything to do with 'big' is very popular in my office right now). Obviously, magnification of errors makes them bigger, magnification of misunderstandings makes them bigger, and when you have larger and larger agglomerations, the likelihood rises that these will come in in ways that affect what can be done with the data. The black boxes get deeper; they get blacker. And then there's this risk of what we call epistemological fallout: if interdisciplinary work is grounded in manifold, unresolved, undocumented and potentially contradictory, aberrant and idiosyncratic understandings of the term, then you can come to a point of crisis. I think anyone who works in the digital humanities has had that conversation: 'I need the data.' 'I gave you the data.' 'What, do you want more data?' 'But I have the data.' You can have these entire conversations where two people think they agree, but
they mean something completely different around single terms, single words such as 'data', and data is only one of them. So you can have problems coming out of this, and one of the things we're realizing is that the problems are not just in research; the problems are also potentially social, because we know there are problems with how people out in the world deal with their data, in terms of privacy, and in terms of how they develop identities and how they interact with their worlds. I think a good example of this comes, again, from the digital humanities, but it points in the direction of the importance of what we call things. My engineering colleagues have often said to me, 'I have a problem to solve; I don't want to talk about what we call things, I want to solve the problem', and I recognize that that is almost a caricature

of an engineering bias. But we need to be very careful, and I don't know if you know this Twitter back-and-forth between Miriam Posner and Bethany Nowviskie. This came on the heels of a funding call, the Digging into Data Challenge, where the number of female applicants was so low as to be very noticeable, and when it was queried, the funder said, well, we'd love to have more female applicants, but there was no bias in the system. The discussion here is: is there actually a bias in words like 'dig' and 'mine'? Is there something inherently masculine about that language that causes people, maybe, to pull back if that's not how they see their research? So maybe it's not that what girls really dig is unicorns and sparkles; maybe it's that the whole digging-in trope is 'not my personal brand of scholarship', or a rhetorical turn-off. So again, you have to wonder whether words like 'data' can become, maybe not a rhetorical turn-off, but a sort of turn-off that leads people into a false sense that all data is the same, and a false inability to differentiate between the data they don't want widely shared and the data that we maybe do want broadcast and widely shared. So that's my first topic. My second topic is about memory. Humanists talk a lot about memory; I think memory and identity are probably two of the largest umbrellas under which you can group research in the humanities, whether it be into literature, languages or culture. But in KPLEX, in this European sister project, we talk a lot about memory as it is encoded, memory as it is held in institutions, memory as it is made accessible, as cultural memory, to people who might want to research it or people who might want to use it. You may or may not know the ENUMERATE survey, but when you look at the levels of how much cultural heritage material in Europe is digitised, particularly
If you look at our Chi ville material, 13%. And actually. If you dig into those numbers it's even a little bit lower because a lot of what you find that has been digitized, they're, more the administrative. Systems and and records within the the archives, that's. Fine but. More. And more there's going to be an expectation if. There, is there that in a big data universe, we, will all be able to access Big, Data approaches, that problems, will be able to be solved, questions, will be able to be asked, but. If the data is not there if the data remains hybrid. Between the analog, and the you know what. Happens then so. We, have questions there around, how, we deal, with the, cultural memory of Europe and, beyond, and I'm, always told I often, ask about provenance, as being an important part of cultural heritage data, and I'm. Often told well there are w3c. Standards. For provenance, and. This. Is a colleague. From a library actually tried to map some of that out, but. When, I think about the provenance, of culture, heritage, I think about things like this so, this was a record. I found about a collection, in the West of Ireland which. Is quite interesting because obviously the, collection, was related to papers. Of Roger Casement, and. It tells it tells the whole story and, I. Don't know what Bart, what parts, of this, history. Of this narrative, of that particular. Data, set, I don't, know what parts are the most important, is it that they relate to casement, and that they're in the Claire County Council, archives and by, the way there's, no particular reason, for casement records, to be in Claire he had no particularly. Strong link to Claire was. It the fact of who, they were found by the fact that they were kept under lock and key the. Fact that the council, didn't even know they had them well he was a controversial, figure. 
Is it the fact that they came from a German U-boat, or that he was on a German U-boat? Is it the fact that these records were handed over by a member of the European nobility? What is important about this provenance, and how does that map onto a standard? How could this be standardised at all? So again, there are things that we're going to remember and things that we're going to forget, and we in the digital humanities have always recognised problems with this. I think one of my favorite examples of how to really look at these problems is Todd Presner's article about the ethics of the algorithm, where he looks very much at how the Shoah Visual History Archive was marked up in a way to try to make it into a research resource. But because it was marked up by humans, you're always going to find human fallibility and human interpretation in it, and I would encourage you to read the article, because there are both things that make the algorithm more ethical, because it

allows you not to be distracted either by the paradigmatic individuals or by the sheer mass of the big data related to the Holocaust, but also ways in which that meso layer can be problematic if you have human consciousness behind it. Which leads me to the question of the European Open Science Cloud. Going from the Shoah Visual History Archive to the European Open Science Cloud implies all sorts of things which I'm not necessarily going to dig into, but there is an expectation that research in Europe, in the next (not even five years) two years, will become underpinned by this cloud of data where we're all going to share our data. Now, if you are, or ever were, humanists, you recognize that there's a problem here. I work a lot with historians, and they don't own their data; they have a shared ownership of their data with the cultural heritage institutions. I, as a literary scholar, don't own my data; I share it with the publishers, I share it with the authors. And it was really disturbing to me to see the programme for the governance of the European Open Science Cloud, which is going into its build phase (this is coming, and it will be something we will all have to use, which, with my DARIAH hat on, I do worry about). You look at that list of stakeholders: where are the publishers? Where are the libraries, where are the museums, where are the archives? They're actually not there. So the whole idea that there would be research data that has this kind of complex social embeddedness is something missing even as the European Commission looks at research data and tries to move us to the point where we can ask questions and discover knowledge in the big data of European research. Even there we're finding blind spots that, as a humanist, seem rather obvious. And of course there are other assumptions that we make around big data and memory. There's
a lot of people who think, well, the fact that we have the Internet Archive, the fact that we have the Wayback Machine, means that digital memory is protected. But I certainly was surprised when I first realized not only that there's a lot of link rot within digital archiving, but also that the use of the Memento protocol, which allows sites to be sampled at different times and in different measures, means that you can find sites that actually never existed: the pastiche of pieces coming together means that what you have is a record of a history that never was. And that's a little bit scary for someone who has a deep investment in the importance of historical research. Now, of course, there are social dimensions to memory and forgetting as well, and I think in Europe we are in an interesting place, because obviously we are the place where you can have a public dialogue, and a court-based, legal dialogue, about the right to be forgotten. And of course we're all looking towards the General Data Protection Regulation, which is going to make our lives as scientists perhaps more difficult, but also introduce protections for people in the world of big data. But on a more fundamental level, I think that there are things we are outsourcing about how we culturally remember and culturally forget, and I thought that this quotation from Mayer-Schönberger was really interesting: the whole idea that without some form of forgetting, forgiving becomes a difficult undertaking. This is precisely the kind of human value that we're seeing eroded in the anonymity of the Internet. So the question is: how can we build better structures for both the remembering and the forgetting in the digital age? The third topic I wanted to talk about is complexity. I love it when computer science researchers say 'we want to reduce complexity'. I say, no, don't
take away my complexity. I need my complexity, but I need a way through it. Again, one of the starting points for me in thinking about this is the fact that 'raw data' really is an oxymoron: there is no such thing as raw data. One of the examples I like to give, and one of the areas we're looking at in KPLEX as a sort of place to investigate this, is machine translation. Take Google Translate and a famous Japanese haiku: Google Translate gives us 'the sound of water to dive an old pond frog'. OK, so it gives us a bit of a word salad. But what I think is more interesting is what human beings have done with this in the past. Old pond, frogs jumped in, sound of the water: the lovely Lafcadio Hearn, the Irish-Japanese 'patriot of two countries'. The old pond, frog jumped in: kerplunk! Well, that's got to be Allen Ginsberg, with a nice sense of the rhythm and the sound of the language. And of course, I live not too far from Limerick in Ireland, so we have: there once was a curious frog, who sat by a pond on a log, and, to see what resulted, in the pond catapulted,

with a water-noise heard round the bog. Each of these takes the culture underpinning the haiku and makes use of it in a different way, exposes it and plays with it. How can that stand against the word salad? Now, OK, maybe giving a Japanese haiku to Google Translate wasn't fair. But then I see things like this and I think, OK, we don't play fair either. This was Mark Zuckerberg's post from the day when Facebook released the deep learning algorithms underlying their machine translation. I'm going to talk about deep learning in a second, but I want to talk about hubris first. Of course he's very pleased with himself, and it is good that Facebook was sharing their algorithms; I have no question that this is good for computer science research. But then we get to the end of the post: 'Throughout human history, language has been a barrier to communication.' I'd like to know what he'd suggest we use instead. 'It's amazing we get to live in a time when technology can change that. Understanding someone's language brings you closer to them, and I'm looking forward to making universal translation a reality. To help us get there faster, we're sharing our work publicly so that all researchers can use it to build better translation tools.' Knowing the translation of your words does not mean that I am closer to you; that does not build intimacy. It has a place, but I'm not sure this is it. And this is where I start to think, OK, where are the boundaries? How can we start to understand what technology can do and where technology ends? Because this is the conversation I keep having. To come back for a second to those deep learning algorithms (and I'm sorry for the quality of this slide): one of the partners in the KPLEX project is a Latvian SME, and they're very committed to building machine translation engines for smaller languages like Latvian. So
they're working on a newer, neural-network-based system, and they put these four source sentences into an engine: 'characteristic specialties of Latvian cuisine are bacon pies and refreshing cold sour cream soup'; 'demand for mobile telephones and Internet access has exploded'; 'an insider's guide to drinking sake in Tokyo'; and 'part bookshop, part gallery, it highlights Japan's deep appreciation for art and design'. OK, so they're doing this in a kind of tourism context; so far so good. All four of those statements came back with the same translation, which

is there in the Latvian, and which translates back to English as 'fast wireless Internet is available free of charge in the guest bedrooms'. (I'm so sorry; when I get excited I start to speak too fast.) So we have the fast wireless internet available free in the guest bedrooms. The way this was explained to me is that somewhere in the black box of the machine learning (although they couldn't say exactly where), a connection was made between those kinds of sentences and that sentence, and that connection started to dominate what the learning algorithm saw as a correct translation. And this is a real problem with these kinds of deep neural networks: as with a lot of machine learning, a lot of AI, we don't necessarily know what's happening in the black box. Which is really interesting, because in my mind, once you get back to that question of not really knowing what happened, and having to make a judgment call, having to make an informed analysis of material like that, you're coming back to the humanities. But that's another question. We also have work going on about the emotional side of things. Again, we talk about culture, we talk about memory; we also talk about identity and emotion, and this isn't anything new. Alone Together has been out since 2011, and if you go back you can find all of this kind of techno-skepticism much earlier. But one of the things we're looking at and querying is whether AI can be emotional, or indeed ethical. Clearly there are the developers of humanoid robots like Pepper, or Paro (I don't know if you know Paro; Paro is a fuzzy little fur seal which responds emotionally to you, and on some level it's interesting, on some level it's quite frightening, and quite touching, in a way). But the question you find when you talk to AI researchers is not can
it be done, or should it be done, but just how it can be done. And I really find interesting the question of, for example (you probably know the trolley problem if you've studied philosophy), whether, if you have a choice to bring about the death of many or the death of few, there are some deaths that mean more than others. I was told by AI researchers that that was an irrelevant question, that this was never going to happen. And yet Lexus was actually exposed as looking into whether protecting the life of the driver of a driverless car, of an automatically controlled vehicle, at all costs, was a corporate policy. So the trolley problem has become real. And of course, when you start to say 'I can program ethics, you just need to tell me how', I fall back on the fact that an ethical stance is an essentially human position: it is one fallible, mortal being being able to take responsibility for another. So there are a lot of questions being raised there. In light of where we are, and just to give you a sense of the project: we're about a year in to a project that is a year and three months long, so we're actually going into our write-up phase now, and we're looking towards the kinds of recommendations we can make. There's been quite a lot of quantitative work that will all be released in the fullness of time. But what I wanted to do as a way of wrapping up this presentation is make five modest proposals, and I think these are proposals for humanists but also for digital humanists, because if you work in the digital humanities, you generally occupy, I don't want to say a unique, but a privileged position of being able to understand both of those epistemic cultures: that of the humanities, where there's a certain prevalence of and preference for sources and for ways of thinking and ways of investigating, but also the
software engineering side, the big data and the AI, and the questions, the conditions of possibility for knowledge creation, in the other. So

I would say these are things that we can investigate.

So, the first modest proposal. This is specifically looking towards the European Open Science Cloud, but more than that, I think we really need a discussion of: if we're going to create knowledge from big data, what kinds of questions can you ask of big data? How do you learn to ask research questions that can engage data from a sensor, an environmental sensor, and a literary text and historical records? Do we know how to ask those questions, and if so, do we know how to ask them in a way that will engage the data that we are going to be offered?

And of course there is this question of shared ownership, because shared ownership is not just important for archives and researchers. Shared ownership also exists between you and me and Facebook, and the sensors that are taking our information, and the new Amazon grocery store where there's no one at the till, you just take what you want and walk out and it knows what you have. There's a shared ownership of data there as well, and this isn't always respected.

And one of the things we're looking at in K-PLEX in particular, in terms of shared ownership but also provenance, is the question of a data passport. If data is in a European Open Science Cloud, obviously it's going to have some metadata attached, but how can we get that beyond a standard into something that really reflects where this data has come from, what it has gone through and what has been done to it, how it has been transformed, and what can be done with it going forward?

And again, the Commissioner has said it: he believes that the most exciting and ground-breaking work is happening at the intersection of disciplines. So if we want to take him up on his offer of a European Open Science Cloud, we
really need to think about how to do it well, and that is for everyone; I think the humanists are in a good position to actually make a real impact there. But also, something that we're doing in DARIAH, the European research infrastructure that I mentioned, is bringing together stakeholders to try and develop a data reuse charter, because we recognize that the individual researcher does not feel empowered to necessarily reuse data. They don't know: the paper that they signed for the archive, does that mean that they can put the data in an open repository? Does it mean that they have to keep it private to themselves? What are the conditions for sharing data? Data

would be better available if it were shared more widely; it would perhaps be more sustainable if it were shared more widely. But there are still blockages in the cultures, especially between the researchers and the cultural heritage institutions, and we're trying to find ways of smoothing that over. So I think this is one of the things we need to look at.

Another thing, as I mentioned: a lot of the problems that we're coming to now really need a humanistic approach. I know there is science and technology studies, and I have a lot of respect for a lot of the work done in science and technology studies, but it does tend to be very social-science based; that's what it is. What are the cultural approaches? What about the understanding that humanists have of human motivation, of human values, of human activities and actions? I think there's a lot to be had there. And again, I can't necessarily recommend that book to you, which was written by someone coming out of Stanford, writing about the fuzzy and the techie and how together they make a perfect approach to technology. But it's interesting that the book exists at all, and the book in itself is interesting for how it views the way you can get a better intelligence out of combining these two approaches to knowledge creation.

And it's interesting: I mention fake science here because I was asked last week by someone in the Commission, "Are historians worried about fake science?" I thought this was a really interesting question. I said, well, you really can't prove a lot of things in literary research or historical research; you can't necessarily prove them right or wrong. So we've developed certain ways of actually showing an argument, of showing a provenance, of showing a way through a set of source material, which may or may not have biases,
but at least the biases are there exposed. This is what post-modernism meant to me: that I had to be careful about my own biases. So the idea that there are also things that we can say about the repeatability of science is another thing that has struck me recently, as an approach that humanists, and particularly digital humanists, might take. So in this world where knowledge will become more overtly messy, we need to approach it like a Beckett text: something with doodles and scribbles and cross-outs, and things that we know a lot about.

Next: we need to get past privacy protection and approach identity enrichment as a goal for big data and AI. Privacy protection is a term I've taken straight from the Big Data PPP; the companies are all on board for privacy-preserving technologies, which is putting, I think, the cart before the horse. But it's also ignoring the opportunity costs of what we allow ourselves not to be exposed to: the ways in which the digital, and in particular the social media platforms, are affecting identities by not exposing us to culture. So there's a gap there as well. I'd love to see us move from talking about privacy, which clearly has a monetary place in the minds of the companies, to something that is more holistic and that sees both the in and the out. Because you have people like this writer from The Guardian who says: I'm a typical millennial, I'm glued to my phone, my virtual life has fully merged with my real life, there's no difference anymore. If that's going to be the case, then it would be useful to think about what kinds of identities are being built there.
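The data-passport idea raised earlier, provenance metadata that travels with a dataset, can be sketched as a simple record of transformation events. This is a minimal illustration only: K-PLEX has not published any such schema, and every name here (`DataPassport`, `stamp`, the field names) is my own assumption, not part of the project.

```python
from dataclasses import dataclass, field
from typing import List

# A hypothetical sketch of a "data passport": provenance metadata
# recording where a dataset came from, what has been done to it,
# and under what conditions it may be reused. All names are
# illustrative assumptions, not a specified standard.

@dataclass
class ProvenanceEvent:
    actor: str        # who transformed the data
    action: str       # what was done (e.g. "digitised")
    description: str  # free-text account of the transformation

@dataclass
class DataPassport:
    origin: str       # originating archive or source
    licence: str      # conditions for sharing and reuse
    history: List[ProvenanceEvent] = field(default_factory=list)

    def stamp(self, actor: str, action: str, description: str) -> None:
        """Append a new transformation event to the passport."""
        self.history.append(ProvenanceEvent(actor, action, description))

    def summary(self) -> str:
        """Human-readable trail of everything done to the data."""
        trail = " -> ".join(e.action for e in self.history)
        return f"{self.origin} [{self.licence}]: {trail}"

passport = DataPassport(origin="City Archive", licence="CC BY 4.0")
passport.stamp("archivist", "digitised", "Scanned from microfilm")
passport.stamp("researcher", "OCR-corrected", "Manual correction of OCR output")
print(passport.summary())
```

The point of the sketch is only that provenance is an append-only history carried with the data, rather than a static descriptive record; anything beyond a metadata standard, as discussed above, would need machinery of roughly this shape.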

Two more, quickly. Problem solving isn't enough; we need to be thoughtful, imaginative, and disciplined about our engineering and how we speak about it. And here is a place where, again, you can see very good work starting. I was quite inspired when I first saw the Copenhagen Letter. I don't know how many of you know this, but it is an open letter, signed by I think about 5,000 people at the moment, saying that if we are contributing to the building of technology, we need to keep certain things in mind, and I would highly recommend you go and look at it as a move towards having a different kind of conscience within technology development. And of course there are things like the PLOS Computational Biology paper on ten simple rules for responsible big data research. It's not rocket science, actually (well, it's computational biology, of course): there are things that can be done. And if there are values that are going to be emerging, well, I'm glad to see open science emerging as a value for science in Europe, but I'd love to see things about protecting the user, protecting the individual; I would love to see values like that emerge in the way we talk about big data. And I'd like to see more focus, what I would call a scientific rigor, about the way we talk about this research coming through.

And finally, I have a sense, from the work I've been involved in, that there's always a push towards convergence: we want to converge everything, everything will be digitized, don't worry, so all we need is the right digital space, the right digital object, the right device. Well, I started doing this ethnographic work and I started taking pictures of my work spaces. These
work spaces are not really going to converge. They're messy, and I know they're messy for a reason: they're messy because the information I'm dealing with is messy; they're messy because it is heterogeneous; they're messy because I'm working at different levels on different things at the same time. And if you want me to put it into Microsoft-speak,

they're messy because I'm chunking, I'm micro-tasking; you know, some of this stuff is very sexy in the tech world. But that is something that I think we need to push for more. You know, don't give me another VRE, don't give me a one-stop shop. Give me a technical intervention that supports the way my research environment works; give me a technical intervention that helps the way my life works. And then I think we'll have a better chance of that more refined hybrid intelligence: not artificial, not human, not biased in one way or the other, but able to check and balance itself. Because I do believe in the end (I am a very human human, I suppose) that we can't feel data. This is why seeing everything printed strikes you: we are physical creatures, we need materiality. So I suppose to end a talk about big data with the words "we need materiality" is a slightly strong stance to take, but I hope that we can discuss it in the questions. Thank you.

2018-07-27 15:28
