Browsing by Subject "Data mining"
Now showing 1 - 17 of 17

Dissertation (Open Access): Abordagem probabilística para caracterização do sistema de marcação de sequenciamento multiplex na plataforma ABI SOLID (Universidade Federal do Pará, 2011-07-01)
LOBATO, Fábio Manoel França; SANTANA, Ádamo Lima de; http://lattes.cnpq.br/4073088744952858
Next-generation sequencers such as the Illumina and SOLiD platforms generate large amounts of data, commonly more than 10 gigabytes of text files. In particular, the SOLiD platform allows multiple samples to be sequenced in a single run (a multiplex run) through a marking system called Barcode. Because the sequencer delivers a mixture of all samples in a single output, this feature requires a computational process to separate the data by sample. This process must be reliable, since errors would compromise further analysis. In this context, this dissertation proposes a probabilistic model capable of characterizing the marking system used in multiplex sequencing. The results corroborate the adequacy of the model, which makes it possible, among other things, to identify faults at specific steps of the sequencing process, to adapt and develop new sample-preparation protocols, to assign a reliability grade to the generated data, and to guide a filtering process that respects the characteristics of each sequence without arbitrarily discarding useful sequences.

Dissertation (Open Access): Agrupamento de fornos de redução de alumínio utilizando os algoritmos Affinity Propagation, Mapa auto–organizável de Kohonen (som), Fuzzy C–Means e K–Means (Universidade Federal do Pará, 2017-10-11)
LIMA, Flávia Ayana Nascimento de; CARDOSO, Diego Lisboa; http://lattes.cnpq.br/0507944343674734; OLIVEIRA, Roberto Célio Limão de; http://lattes.cnpq.br/4497607460894318
The continuous development of technology enables measures that give industries profitability and competitive advantage.
In primary aluminum production, smelting usually requires a substantial number of cells, also known as reduction pots, operating in a continuous and complex process. Analytical monitoring is essential to these industries' competitive advantage, because during operation some cells behave similarly to others, thereby forming clusters of cells. These clusters depend on data patterns that are usually implicit or invisible to operators but can be found with data analysis techniques. This work presents four clustering techniques for that purpose: Affinity Propagation, the Kohonen Self-Organizing Map, Fuzzy C-Means, and K-Means. They are used to find and group cells with similar behavior by analyzing seven variables closely related to the aluminum reduction process. The work highlights the benefits of clustering, especially in simplifying potline analysis: a large group of cells can be summarized as a single group, which provides more compact yet rich information for data-driven modeling and control. Moreover, identifying similar data patterns in clusters makes the task easier for those in charge of analyzing these data. This work also identifies the ideal cluster size for each technique applied.

Dissertation (Open Access): Análise dos fatores relacionados ao desempenho das escolas no IDEB: estudo de caso no Estado do Pará (Universidade Federal do Pará, 2022-02-11)
GOMES, Vitor Hugo Macedo; SILVA, Marcelino Silva da
Identifying all the factors related to school performance on the Basic Education Development Index (IDEB) is an enormously complex task. In this study, three databases were analyzed to identify factors that correlate with low performance in state schools in the state of Pará.
The analysis first showed that 142 municipalities in the state were at risk of not meeting the target for reducing school dropout, which in turn affects school performance. This study used educational data mining techniques, first, to select variables describing structural characteristics of the teaching environment, comparing schools with higher and lower IDEB performance and identifying possible relationships with school dropout. Then, the Random Forest (RF) algorithm was used to select the most important variables that directly or indirectly affect the IDEB index. After this selection phase, the variables were submitted to the Linear Regression (LR) algorithm. The results reveal that, in the group of schools below the IDEB average, 60.6% of students live in families with incomes up to one minimum wage, while 37.5% live in families with incomes above one minimum wage. In the group of schools above the IDEB average, 42.4% live in families with incomes up to one minimum wage, while 51.6% live in families with incomes above one minimum wage. This indicates that family income is related to better IDEB scores and, consequently, to better infrastructure conditions. The results also indicate that the income of students' families is related to the average family income in the analyzed municipalities. Next, variables related to parents' income were used to identify a possible relationship between parents' schooling and students' performance. Finally, the analysis concludes by examining the impact of the Municipal Human Development Index (HDI) on variables related to students' grades, teachers' qualifications, and teachers' experience in the school environment. The results reveal a correlation between the index and student learning in the classroom.
On the other hand, better IDEB scores are directly related to the adequacy of the curriculum to the subject taught, in addition to good working conditions for teachers.

Dissertation (Open Access): Análise dos impactos harmônicos em uma indústria de manufatura de eletroeletrônicos utilizando árvores de decisão (Universidade Federal do Pará, 2015-03-27)
NOGUEIRA, Rildo de Mendonça; SANTANA, Ádamo Lima de; http://lattes.cnpq.br/4073088744952858; TOSTES, Maria Emília de Lima; http://lattes.cnpq.br/4197618044519148
Power Quality (PQ) is the subject of many studies, mainly those related to the industrial production sector, where large loads of the electrical system are concentrated. With the evolution of industrial production processes and the introduction of new technologies in the industrial sector, large amounts of electronic equipment have been added that are sources of disturbances in the system, affecting the quality of the product "electricity". To minimize the inconvenience caused by poor power quality and the damage to utilities and consumers (industrial, commercial and residential), the Distribution Procedures for the National Electric System (PRODIST) were created and developed in Brazil by the National Electric Energy Agency (ANEEL). PRODIST aims to regulate and standardize activities related to energy distribution, including product quality standards. This work was carried out in a company of the Manaus Industrial Pole (PIM) that has a low-voltage three-phase electrical system, in order to monitor the quality of the product "electricity" through the harmonic content generated by the electrical network involved in manufacturing. The collected data were processed with a computational intelligence technique (HF) using the Knowledge Discovery in Databases (KDD) process.
The objective is to analyze, identify and diagnose the coupling points and processes with significant harmonic content for the system, making it possible to check how much each analyzed process affects power quality, both within the industry itself and at the point of coupling with the utility, through the harmonic distortion it generates, thus avoiding penalties and other regulatory sanctions.

Dissertation (Open Access): Análise dos impactos harmônicos na qualidade da energia elétrica utilizando kdd – estudo de caso na Universidade Federal do Pará (Universidade Federal do Pará, 2019-03-18)
SILVA, Waterloo Ferreira da; TOSTES, Maria Emília de Lima; http://lattes.cnpq.br/4197618044519148
This work presents an analysis of data related to Power Quality (PQ). The increasing use of nonlinear loads and power-electronics-based equipment in residential, commercial and industrial installations is contributing to a significant increase in current harmonic distortion levels and, consequently, in voltage harmonic distortion, as observed in the Brazilian electricity distribution system. In Brazil, the Distribution Procedures for the National Electric System (PRODIST) were created and developed by the National Electric Energy Agency (ANEEL). PRODIST aims to regulate and standardize activities related to energy distribution, including product quality standards. To monitor the quality of the product "electric energy" through the harmonic content generated by the institution's electrical network, a methodology is proposed that uses computational intelligence (CI) and data mining techniques to analyze data collected by power quality meters installed in the institution's main sectors and at the consumer's point of common coupling, and thereby to establish the relationship between the harmonic currents of nonlinear loads and the harmonic distortion at the point of common coupling.
The KDD process was applied, including the collection, selection, cleaning, integration, transformation and reduction, mining, interpretation, and evaluation of the data, to monitor the quality of the product "electric energy" through the harmonic content generated by the educational institution's electrical grid. In the Data Mining phase, the Naive Bayes classifier was used. The results showed that the KDD process is applicable to the analysis of Voltage Total Harmonic Distortion at the Point of Common Coupling and can be applied in any commercial, residential, or industrial setting.

Dissertation (Open Access): Uma arquitetura de pré-processamento para análise de sentimento em mídias sociais em português brasileiro (Universidade Federal do Pará, 2018-08-23)
CIRQUEIRA, Douglas da Rocha; SANTANA, Ádamo Lima de; http://lattes.cnpq.br/4073088744952858
Web 2.0 and the evolution of Information Technologies have brought new channels of interaction and relationship. Online Social Networks (OSN) are one example: platforms that allow people to interact and share information. In this scenario, OSNs have been adopted as a channel for posting opinions about products and experiences. This presents an excellent opportunity for companies that aim to improve products, services and marketing strategies, since OSNs such as Facebook, Twitter and Instagram are powerful sources of massive unstructured user-generated content (UGC), with consumers' opinions and reviews about offerings. Brazil stands out in this scenario, as its population is one of the most active on social media platforms in the world, which makes it a country full of market opportunities.
In this context, computational techniques of Opinion Mining and Sentiment Analysis (SA) are applied to infer the polarity (positive, negative, neutral) of the sentiment associated with texts, and they can also be applied to OSN data to evaluate feedback from a target audience. Despite the diversity of SA strategies reported in the literature, challenges remain in applying SA to OSN text data, given the characteristics of the language used on such platforms. The state of the art focuses on SA for English, and existing proposals for Brazilian Portuguese lack a standardized methodology for the preprocessing steps. This research therefore investigates an approach without translation and proposes a novel preprocessing architecture for SA in Brazilian Portuguese, aiming to provide enriched features to SA algorithms. The proposal was compared with well-established baselines from the literature, and the results indicate that this architecture can surpass state-of-the-art recall by at least 3% on 6 of the 7 datasets evaluated.

Thesis (Open Access): Avaliação da distorção harmônica total de tensão no ponto de acoplamento comum industrial usando o processo KDD baseado em medição (Universidade Federal do Pará, 2018-03-27)
OLIVEIRA, Edson Farias de; TOSTES, Maria Emília de Lima; http://lattes.cnpq.br/4197618044519148
In recent decades, the manufacturing industry has introduced increasingly faster and more energy-efficient products for residential, commercial and industrial use; however, because of their nonlinearity, these loads have contributed significantly to the increase in voltage harmonic distortion levels resulting from harmonic currents, according to the Power Quality indicators of the Brazilian electricity distribution system.
The steady increase in distortion levels, especially at the point of common coupling, is now a major concern for utilities and electricity consumers because of the problems it causes, such as loss of power quality in the supply and in consumers' installations, and it has motivated several studies on the subject. To contribute to this subject, this thesis proposes a procedure based on the Knowledge Discovery in Databases (KDD) process to identify the loads responsible for voltage harmonic distortion at the point of common coupling. The proposed methodology uses computational intelligence and data mining techniques to analyze data collected by power quality meters installed on the main loads and at the consumer's point of common coupling, and thereby establishes the correlation between the harmonic currents of nonlinear loads and the harmonic distortion at the point of common coupling. The proposed process consists of analyzing the loads and the layout of the site where the methodology will be applied, choosing and installing the power quality (QEE) meters, and applying the complete KDD process, including the collection, selection, cleaning, integration, transformation and reduction, mining, interpretation, and evaluation of the data. The Decision Tree and Naïve Bayes data mining techniques were applied, and several algorithms were tested to find the one with the most significant results for this type of analysis, as presented in the results.
The results show that the KDD process is applicable to the analysis of Voltage Total Harmonic Distortion at the Point of Common Coupling, and the thesis contributes a complete description of each step of this process. The process was evaluated with different data-balancing ratios, training/test splits, and scenarios covering different analysis shifts, and it performed well, which allows its application to other types of consumers and to energy distribution companies. In the chosen application and across the different scenarios, the most impactful load for the collected data set was the seventh current harmonic of the air-conditioning units.

Dissertation (Open Access): Avaliação de desempenho em programa de formação massiva utilizando técnicas de mineração de dados (Universidade Federal do Pará, 2015-08-28)
PINHEIRO, Marcia Fontes; CARDOSO, Diego Lisboa; http://lattes.cnpq.br/0507944343674734; SANTANA, Ádamo Lima de; http://lattes.cnpq.br/4073088744952858
The growing application of Information and Communication Technologies (ICTs) in education has fostered new methods, techniques and procedures that favor active learning, course planning and management, and support for overcoming difficulties in the educational process, whether in distance learning or in face-to-face teaching. Virtual Learning Environments (VLEs) have become fundamental to conducting educational processes, democratizing education and enabling continuing education, while also generating large volumes of data about the learning process. Information about the learning process is extremely important for educators and students, as it supports decision-making and reflection on the teaching methodologies applied, the content used, and student performance.
In this sense, this research proposes a feature selection methodology for evaluating the performance of students in a Massive Training Program using data mining techniques. The methodology identifies attributes for making inferences about student performance and correlates them with social aspects through qualitative and quantitative analysis of the results. It was developed considering the educational context and valuing diversity in the process. To demonstrate its feasibility, a case study was carried out in a hybrid massive-learning environment, using proprietary databases of the Telecentros.BR program provided by the program's managers. In the case study, the feature selection methodology for Educational Data Mining was applied; classification tasks used the J48, Random Forest and Random Tree algorithms to predict student grades, and clustering tasks used the K-means algorithm to find student profiles from VLE usage logs and Self-Organizing Maps (SOM) to find educational quality features in qualitative textual assessments. The results of the case study demonstrated the feasibility of the methodology in the educational context and provide new performance indicators to the managers of the Telecentros.BR program, such as VLE usage profiles, dropout indicators, and student profiles.

Dissertation (Open Access): Classificação de dados utilizando algoritmos genéticos e lógica difusa (Universidade Federal do Pará, 2008-12-14)
KATO, Rodrigo Bentes; OLIVEIRA, Roberto Célio Limão de; http://lattes.cnpq.br/4497607460894318
Several traditional Data Mining techniques have been applied successfully, while others have limitations, both in performance and in the quality of the knowledge generated. Recent research has shown that techniques from the field of Artificial Intelligence (AI), such as Genetic Algorithms (GA) and fuzzy sets, can be used successfully.
This research investigates the applicability of a hybrid combination of genetic algorithms and fuzzy sets to find rules in large and complex search spaces. It presents a Genetic Algorithm (GA) that uses Fuzzy Logic for the coding, evaluation and reproduction of chromosomes, aiming to classify data with rules extracted automatically through the evolution of the chromosomes. Fuzzy Logic makes the rules clearer and closer to human language by using linguistic representations of continuous data.

Dissertation (Open Access): Detecção de fraudes no consumo de energia elétrica usando árvores de decisão (Universidade Federal do Pará, 2017-07-11)
MATOS, Yasmin Christine Correa; VIEIRA, João Paulo Abreu; http://lattes.cnpq.br/8188999223769913
In recent years, the damage caused by non-technical losses to power distribution utilities in Brazil has been estimated at R$ 7 billion per year. This represents a challenge for some of the country's utilities, which need effective measures to combat commercial losses. In this scenario, this dissertation presents a methodology capable of detecting fraud in electricity consumption using a data mining technique known as the decision tree. Performance tests of the method were carried out using real data from the electricity consumption history and from inspections of consumer units (CUs) suspected of irregularities in the metropolitan region of Belém.
The results showed that the proposed decision-tree-based method performs well in detecting fraud in electricity consumption.

Thesis (Open Access): Estratégia de otimização para a melhoria da interpretabilidade de redes bayesianas: aplicações em sistemas elétricos de potência (Universidade Federal do Pará, 2009-12-10)
ROCHA, Cláudio Alex Jorge da; FRANCÊS, Carlos Renato Lisboa; http://lattes.cnpq.br/7458287841862567
The study of methods, techniques and tools that can support decision processes in the many sectors of power systems is a subject of great interest. This decision support can be provided by many different techniques, particularly those based on computational intelligence, given their applicability to domains with uncertainty. In this proposal, Bayesian networks are used to extract knowledge models from the data available on power systems. Moreover, given the demands of these systems and some limitations imposed on inference in Bayesian networks, a method based on genetic algorithms is proposed to extend the comprehensibility of the discovered patterns. It aims to find the optimal scenario for attaining a given target, incorporating a priori knowledge from domain specialists and identifying the most influential variables in the domain for maximizing the target variable.

Thesis (Open Access): Experimentos de mineração de dados aplicados a sistemas scada de usinas hidrelétricas (Universidade Federal do Pará, 2012-04-13)
OHANA, Ivaldo; BEZERRA, Ubiratan Holanda; http://lattes.cnpq.br/6542769654042813
The current model of the Brazilian electricity sector gives equal terms to all participants and reduces the role of the State in the sector.
This model forces electric utilities to improve the quality of their products; as a prerequisite, they should make more effective use of the enormous amount of operational data stored in databases, acquired from the operation of electrical systems that use hydroelectric power plants as their main source of generation. One of the main tools for managing the operation of these plants is the Supervisory Control and Data Acquisition (SCADA) system. The large amount of data stored by SCADA systems, which certainly contains relevant information, should therefore be processed to discover relationships and patterns that help in understanding many important operational aspects and in evaluating the operational performance of electric power systems. Knowledge Discovery in Databases (KDD) is the process of identifying valid, novel and useful patterns in large data sets to improve the understanding of a problem or a decision-making procedure. Data Mining is the step within KDD that extracts useful information from large databases.
In this scenario, the objective of this study is to perform data mining experiments on data generated by the SCADA systems of power plants, producing relevant information to support the planning, operation, maintenance and security of hydroelectric plants, and to contribute to establishing a culture of applying data mining techniques to these plants.

Dissertation (Open Access): Mineração de dados educacionais aplicada à busca de perfis de alunos em casos de evasão ou retenção: uma abordagem através de Redes Bayesianas (Universidade Federal do Pará, 2017-09-12)
COUTO, Diego da Costa do; SANTANA, Ádamo Lima de; http://lattes.cnpq.br/4073088744952858
This work investigates the profiles of undergraduate students at the Federal University of Pará who are prone to two problems faced by many universities: dropout (evasion) and retention. These problems motivated the study of methodologies that detect patterns leading either to studies extending beyond the expected time or to their premature end. The tool chosen for this purpose, the Bayesian Network, is powerful for reasoning under uncertainty, especially in diagnosing causes and effects, since it represents the relationships among variables together with their probabilities of occurrence and marginal probabilities. Another inherent aspect of Bayesian Networks is the comprehensibility of their representation and results, which benefits both specialists and users in the domain. These strengths motivated the application of the methodology in this research. Academic records containing tens of thousands of samples from undergraduate students in face-to-face programs at the Federal University of Pará up to 2016 were submitted to the Knowledge Discovery in Databases process, and in the Data Mining step the desired patterns were extracted using the classification task.
In addition, several performance analyses were carried out during the Data Mining stage, comparing the Bayesian Network with other classic supervised learning algorithms; these revealed its high accuracy and efficiency, and since it ranked among the best solutions found, its use on the selected database was validated. In three case studies, the results show the quality of the Bayesian Network-based classifiers, which achieved an accuracy above 82%, a level that supports their usefulness in the researched domain. Thus, the results were satisfactory and revealed strong influences of some variables on the propensity for dropout or retention.

Dissertation (Open Access): Otimização do processo de aprendizagem da estrutura gráfica de Redes Bayesianas em BigData (Universidade Federal do Pará, 2014-02-20)
FRANÇA, Arilene Santos de; SANTANA, Ádamo Lima de; http://lattes.cnpq.br/4073088744952858
Automation of data management and analysis has been a crucial factor for companies that need efficient solutions in an increasingly competitive corporate world. The explosion in the volume of information in recent years has demanded ever more effort to find strategies to manage it and, especially, to extract valuable strategic information using data mining algorithms. These algorithms commonly need to run exhaustive queries against the database to obtain the statistics that solve or optimize the parameters of the selected knowledge discovery model, a process that requires intensive computation and frequent database access. Given their effectiveness in handling uncertainty, Bayesian networks have been widely used for this process; however, as the amount of data (records and/or attributes) increases, extracting relevant information from a knowledge base becomes even more costly and time-consuming.
The goal of this work is to propose a new approach to optimizing Bayesian Network structure learning in the context of BigData, using the MapReduce process to reduce processing time. To that end, a new methodology was developed that includes the creation of an Intermediary Database containing all the probabilities needed to compute the network structure. The analyses presented show that combining the proposed methodology with MapReduce is a good alternative for solving the scalability problem of the frequency-search steps of the K2 algorithm and, as a result, for reducing the time needed to generate the network.

Dissertation (Open Access): Stormsom: clusterização em tempo-real de fluxos de dados distribuídos no contexto de BigData (Universidade Federal do Pará, 2015-08-28)
LIMA, João Gabriel Rodrigues de Oliveira; CARDOSO, Diego Lisboa; http://lattes.cnpq.br/0507944343674734; SANTANA, Ádamo Lima de; http://lattes.cnpq.br/4073088744952858

Dissertation (Open Access): Uso de árvore de decisão para avaliação da segurança estática em tempo real de sistemas elétricos de potência (Universidade Federal do Pará, 2014-09-12)
RODRIGUES, Benedito das Graças Duarte; VIEIRA, João Paulo Abreu; http://lattes.cnpq.br/8188999223769913; BEZERRA, Ubiratan Holanda; http://lattes.cnpq.br/6542769654042813
Static Security Assessment techniques for power systems depend on running a large number of load flow cases for various topologies and system operating conditions. In real-time operation environments this practice is difficult to implement, especially in large systems, where running all the necessary load flow cases requires considerable time and computational effort even with currently available resources.
Data Mining techniques such as decision trees have been used in recent years with good results in static and dynamic security assessment of electric power systems. This work presents a methodology for real-time static security assessment of electric power systems using decision trees, in which offline load flow simulations performed with the ANAREDE software (CEPEL) generated an extensive labeled database describing the system state under various operating conditions. This database was used to induce decision trees, providing a fast and accurate prediction model that classifies the system state (secure or insecure) for real-time application. The methodology reduces computational effort in the online environment, because evaluating a decision tree only requires checking a few if-then rules, that is, a limited number of numerical tests on attribute values at the binary nodes, performed over a number of hierarchical levels that is usually small. With this simple processing, static security assessment can be performed in a fraction of the time required by even the fastest traditional methods. To validate the methodology, a case study based on a real power system was carried out in which, for every contingency classified as insecure, a corrective control action was executed based on the decision tree's information about the critical attribute affecting security.
The results showed that the methodology is an important tool for real-time static security assessment in a system operation center.

Dissertation (Open Access): Uso de técnicas de mineração de dados para a extração de indicação de falha na operação de hidrogeradores a partir de medidas de descargas parciais (Universidade Federal do Pará, 2016-06-17)
PARDAUIL, Ana Carolina Neves; BEZERRA, Ubiratan Holanda; http://lattes.cnpq.br/6542769654042813
Studies conducted by CIGRE in 2009 found that the main source of hydrogenerator failures is related to the machine's electrical insulation. For this reason, monitoring the condition of the stator winding has become an important supervision procedure. A widely used practice for this supervision is the measurement and analysis of partial discharges (PDs), one of the most effective and safe methods for analyzing generator stator insulation. However, although PDs follow well-defined patterns, it is not trivial to classify the measured PD signals into these patterns, mainly because of the large number and variety of PD occurrences. The significant increase in the amount of PD data available today is due to improvements in PD monitoring equipment and software, such as the IMA-DP system, which has enabled better planned and more frequent measurement campaigns. This work therefore proposes an intelligent tool to facilitate the identification and diagnosis of partial discharges, based on data mining with decision trees (DT), a solution suited to analyzing large amounts of data. In the specific case presented in this dissertation, 2,435 measurements obtained for phase A of a hydrogenerator at the Tucurui Hydroelectric Plant were used; they were essential to validate the proposed method because they are real data from plant operation.
A hybrid (supervised/unsupervised) approach was used to identify and classify PD patterns among the well-known PD forms. A fast and highly satisfactory PD classification procedure was achieved, especially when the data were converted from statistical maps into amplitude histograms, which yielded well-defined clusters and a decision tree with overall accuracy above 98%.
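The last abstract credits much of its accuracy to converting partial-discharge data from statistical maps into amplitude histograms. The dissertation does not describe the map format, so the minimal sketch below assumes, hypothetically, that a statistical map is a phase-resolved count matrix (rows are phase bins, columns are amplitude bins). Under that assumption the amplitude histogram is obtained by summing each amplitude column over all phase bins. The function name `amplitude_histogram` and the toy data are illustrative only and do not reflect the IMA-DP export format.

```python
from typing import List, Sequence


def amplitude_histogram(pd_map: Sequence[Sequence[int]], normalize: bool = True) -> List[float]:
    """Collapse a phase-resolved PD map into an amplitude histogram.

    Assumption (hypothetical format): pd_map[i][j] is the number of PD pulses
    recorded in phase bin i and amplitude bin j. Summing over the phase axis
    leaves one count per amplitude bin.
    """
    if not pd_map:
        return []
    n_amp = len(pd_map[0])
    if any(len(row) != n_amp for row in pd_map):
        raise ValueError("every phase row must have the same number of amplitude bins")

    # Marginalize over phase: total pulse count per amplitude bin.
    counts = [sum(row[j] for row in pd_map) for j in range(n_amp)]
    if not normalize:
        return [float(c) for c in counts]

    # Normalize to relative frequencies so measurements with different
    # pulse totals become comparable; an all-zero map stays all zeros.
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


# Toy map: 4 phase bins x 3 amplitude bins (illustrative numbers only).
toy_map = [
    [0, 2, 1],
    [3, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
print(amplitude_histogram(toy_map))  # → [0.4, 0.4, 0.2]
```

Each resulting vector has one entry per amplitude bin, so measurements recorded with the same binning become fixed-length feature rows that a decision-tree learner (for example scikit-learn's `DecisionTreeClassifier`) could consume; the clustering and classification steps themselves are not shown here.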
